Overhaul Django SearchMixin and docs

* overhauls SearchMixin to be more useful and flexible while keeping
  useful defaults
* this overhauls a lot of the documentation for the contrib.django stuff
  and makes more of it correct
commit 357f3ff19ac9087aa51602812de1e3a6ee72d3e9 1 parent 4b305a4
Authored by willkg
Showing with 316 additions and 78 deletions.
  1. +176 −54 docs/django.rst
  2. +140 −24 elasticutils/contrib/django/models.py
230 docs/django.rst
@@ -17,63 +17,65 @@ This chapter covers using ElasticUtils Django bits.
Configuration
=============
-ElasticUtils depends on the following settings:
+ElasticUtils depends on the following settings in your Django settings
+file:
.. module:: django.conf.settings
.. data:: ES_DISABLED
- Disables talking to ElasticSearch from your app. Any method
- wrapped with `es_required` will return and log a warning. This is
- useful while developing, so you don't have to have ElasticSearch
- running.
+ If `ES_DISABLED = True`, then any method wrapped with
+ `es_required` will log a warning and return without talking to
+ ElasticSearch. This is useful while developing, so you don't have
+ to have ElasticSearch running.
.. data:: ES_DUMP_CURL
- If set to a path all the requests that `ElasticUtils` makes will
- be dumped into the designated file.
+ If set to a file path, all the requests that `ElasticUtils` makes
+ will be dumped into that file as curl commands.
- .. note:: Python does not write this file until the process is
- finished.
+ If set to an object that has a ``.write()`` method, PyES calls
+ that method with the curl equivalents.
+ See :ref:`django-debugging` for more details.
.. data:: ES_HOSTS
- This is a list of hosts. In development this will look like::
+ This is a list of ES hosts. In development this will look like::
- ES_HOSTS = ['127.0.0.1:9200']
+ ES_HOSTS = ['127.0.0.1:9200']
.. data:: ES_INDEXES
- This is a mapping of doctypes to indexes. A `default` mapping is
- required for types that don't have a specific index.
+ This is a mapping of doctypes to indexes. A `default` mapping is
+ required for types that don't have a specific index.
- When ElasticUtils queries the index for a model, it derives the
- doctype from `Model._meta.db_table`. When you build your indexes
- and doctypes, make sure to name them after your model db_table.
+ When ElasticUtils queries the index for a model, by default it
+ derives the mapping type from `Model._meta.db_table`. When you
+ build your indexes and mapping types, make sure their names match
+ the index and mapping type names ElasticUtils will query.
- Example 1::
+ Example 1::
- ES_INDEXES = {'default': 'main_index'}
+ ES_INDEXES = {'default': 'main_index'}
- This only has a default, so ElasticUtils queries will look in
- `main_index` for all doctypes.
+ This only has a default, so all ElasticUtils queries will look in
+ `main_index` for all mapping types.
- Example 2::
+ Example 2::
- ES_INDEXES = {'default': 'main_index',
- 'splugs': 'splugs_index'}
+ ES_INDEXES = {'default': 'main_index',
+ 'splugs': 'splugs_index'}
- Assuming you have a `Splug` model which has a
- `Splug._meta.db_table` value of `splugs`, then ElasticUtils will
- run queries for `Splug` in the `splugs_index`. ElasticUtils will
- run queries for other models in `main_index` because that's the
- default.
+ Assuming you have a `Splug` model which has a
+ `Splug._meta.db_table` value of `splugs`, then ElasticUtils will
+ run queries for `Splug` in the `splugs_index`. ElasticUtils will
+ run queries for other models in `main_index` because that's the
+ default.
.. data:: ES_TIMEOUT
- Defines the timeout for the `ES` connection. This defaults to 5
- seconds.
+ Defines the timeout for the `ES` connection. This defaults to 5
+ seconds.
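Putting the settings above together, a development settings file might contain something like the following. The values are illustrative assumptions (the `splugs` entry is taken from Example 2); `ES_DUMP_CURL` is left out because it is only needed while debugging.

```python
# settings.py (illustrative values only; adjust for your deployment)

# Set to True to skip talking to ElasticSearch while developing;
# methods wrapped with es_required then log a warning and return.
ES_DISABLED = False

# ElasticSearch hosts to connect to.
ES_HOSTS = ['127.0.0.1:9200']

# Mapping type -> index name. 'default' is required for mapping types
# that don't have a specific index.
ES_INDEXES = {
    'default': 'main_index',
    'splugs': 'splugs_index',
}

# Connection timeout in seconds (5 is the documented default).
ES_TIMEOUT = 5
```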
ES
@@ -87,8 +89,8 @@ It is built with the settings from your `django.conf.settings`.
.. Note::
`get_es()` only caches the `ES` if you don't pass in any override
- arguments. If you pass in override arguments, it doesn't cache it,
- but instead creates a new one.
+ arguments. If you pass in override arguments, it doesn't cache it
+ and instead creates a new one.
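The caching rule in the note can be pictured with a small stand-alone sketch. This is not ElasticUtils' implementation: `make_es` is a hypothetical stand-in for whatever builds an `ES` from settings, `get_es_sketch` is a made-up name, and the `timeout` override used below is only an example keyword.

```python
# Hypothetical sketch of "cache only when there are no overrides".
_cached_es = None


def make_es(**kwargs):
    """Hypothetical factory standing in for building an ES object."""
    return {'settings': kwargs}


def get_es_sketch(**overrides):
    """Return a shared object unless override arguments are given."""
    global _cached_es
    if overrides:
        # Overrides given: build a fresh object and leave the cache alone.
        return make_es(**overrides)
    if _cached_es is None:
        _cached_es = make_es()
    return _cached_es
```

Calling `get_es_sketch()` twice returns the same object, while each call such as `get_es_sketch(timeout=10)` builds a new one, which matches the behavior the note describes for `get_es()`.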
Using with Django ORM models
@@ -97,7 +99,7 @@ Using with Django ORM models
:Requirements: Django
The `elasticutils.contrib.django.S` class takes a model in the
-constructor. That model is a Django ORM Models derivative. For example::
+constructor. That model is a Django ORM model class. For example::
from elasticutils.contrib.django import S
from myapp.models import MyModel
@@ -110,29 +112,143 @@ bunch of functionality that makes indexing data easier.
Two things to know:
-1. The doctype for the model is ``cls._meta.db_table``.
+1. The doctype for the model is ``cls._meta.db_table`` by default.
2. The index that's searched is ``settings.ES_INDEXES[doctype]`` and
if that doesn't exist, it defaults to
- ``settings.ES_INDEXES['default']``
+ ``settings.ES_INDEXES['default']``. Both can be overridden.
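The index rule above boils down to a dictionary lookup with a fallback. Here is a Django-free sketch: `resolve_index` is a hypothetical helper and `es_indexes` stands in for `settings.ES_INDEXES`. It mirrors the `indexes.get(...) or indexes['default']` expression used by `SearchMixin.get_index()` in this commit.

```python
def resolve_index(es_indexes, mapping_type):
    """Return the index to search for mapping_type.

    Falls back to es_indexes['default'] when there is no specific
    (non-empty) entry; raises KeyError if 'default' is missing, which
    is why a 'default' mapping is required.
    """
    return es_indexes.get(mapping_type) or es_indexes['default']


ES_INDEXES = {'default': 'main_index', 'splugs': 'splugs_index'}

print(resolve_index(ES_INDEXES, 'splugs'))    # splugs_index
print(resolve_index(ES_INDEXES, 'contacts'))  # main_index
```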
+
+
+For example, here's a minimal use of the SearchMixin::
+
+ from datetime import datetime
+
+ from django.db import models
+
+ from elasticutils.contrib.django.models import SearchMixin
+
+
+ class Contact(models.Model, SearchMixin):
+ name = models.CharField(max_length=50)
+ bio = models.TextField(blank=True)
+ age = models.IntegerField()
+ website = models.URLField(blank=True)
+ last_updated = models.DateTimeField(default=datetime.now)
+
+ @classmethod
+ def extract_document(cls, obj_id, obj=None):
+ """Takes an object id for this class, returns dict."""
+ if obj is None:
+ obj = cls.objects.get(pk=obj_id)
+
+ return {
+ 'id': obj.id,
+ 'name': obj.name,
+ 'bio': obj.bio,
+ 'age': obj.age,
+ 'website': obj.website,
+ 'last_updated': obj.last_updated
+ }
+
+
+This example doesn't specify a mapping. That's ok because ElasticSearch
+will infer from the shape of the data how it should analyze and store
+the data.
+
+If you want to specify this explicitly (and I suggest you do for
+anything that involves strings), then also override
+`.get_mapping()`. Let's refine the above example by explicitly
+specifying `.get_mapping()`.
+
+::
+
+ from datetime import datetime
+
+ from django.db import models
+
+ from elasticutils.contrib.django.models import SearchMixin
+
+
+ class Contact(models.Model, SearchMixin):
+ name = models.CharField(max_length=50)
+ bio = models.TextField(blank=True)
+ age = models.IntegerField()
+ website = models.URLField(blank=True)
+ last_updated = models.DateTimeField(default=datetime.now)
+
+ @classmethod
+ def get_mapping(cls):
+ """Returns an ElasticSearch mapping."""
+ return {
+ # The id is an integer, so store it as such. ES would have
+ # inferred this just fine.
+ 'id': {'type': 'integer'},
+
+ # The name is a name---so we shouldn't analyze it
+ # (de-stem, tokenize, parse, etc).
+ 'name': {'type': 'string', 'index': 'not_analyzed'},
+
+ # The bio has free-form text in it, so analyze it with
+ # snowball.
+ 'bio': {'type': 'string', 'analyzer': 'snowball'},
+
+ # The website also shouldn't be analyzed.
+ 'website': {'type': 'string', 'index': 'not_analyzed'},
+
+ # The last_updated field is a date.
+ 'last_updated': {'type': 'date'}
+ }
+
+ @classmethod
+ def extract_document(cls, obj_id, obj=None):
+ """Takes an object id for this class, returns dict."""
+ if obj is None:
+ obj = cls.objects.get(pk=obj_id)
+
+ return {
+ 'id': obj.id,
+ 'name': obj.name,
+ 'bio': obj.bio,
+ 'age': obj.age,
+ 'website': obj.website,
+ 'last_updated': obj.last_updated
+ }
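The `extract_document()` docstring requires the returned dict to be JSON serializable. The `last_updated` value in these examples is a `datetime`, which the standard library's `json.dumps` rejects unless it is told how to encode it. Whether PyES's own serializer handles datetimes is not covered here. The sketch below is a stdlib-only pre-flight check; `check_document` is a hypothetical helper, and encoding dates as ISO 8601 strings is an assumption about the format you want.

```python
import json
from datetime import date, datetime


def _encode_dates(value):
    # json.dumps calls this only for objects it can't serialize natively.
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError('%r is not JSON serializable' % (value,))


def check_document(doc):
    """Return doc as a JSON string; raise TypeError if it can't be
    serialized even with dates converted to ISO 8601 strings."""
    return json.dumps(doc, default=_encode_dates, sort_keys=True)


doc = {'id': 1, 'name': 'Jane', 'last_updated': datetime(2012, 8, 1, 12, 0)}
print(check_document(doc))
# {"id": 1, "last_updated": "2012-08-01T12:00:00", "name": "Jane"}
```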
+
+
+SearchMixin
+-----------
.. autoclass:: elasticutils.contrib.django.models.SearchMixin
:members:
+.. seealso::
+
+ http://www.elasticsearch.org/guide/reference/mapping/
+ The ElasticSearch guide on mapping types.
+
+ http://www.elasticsearch.org/guide/reference/mapping/core-types.html
+ The ElasticSearch guide on mapping type field types.
+
+
+
Other helpers
=============
:Requirements: Django, Celery
You can then utilize things such as
-:func:`~elasticutils.contrib.django.tasks.index_objects` to
+:func:`elasticutils.contrib.django.tasks.index_objects` to
automatically index all new items.
+
+Tasks
+-----
+
.. automodule:: elasticutils.contrib.django.tasks
.. autofunction:: index_objects(model, ids=[...])
+
+Cron
+----
+
.. automodule:: elasticutils.contrib.django.cron
.. autofunction:: reindex_objects(model, chunk_size[=150])
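`reindex_objects` takes a `chunk_size` (150 by default, per the signature above). The general idea of walking the ids from `get_indexable()` in fixed-size chunks can be sketched as follows. This is not the cron module's code: `chunked` and `reindex_in_chunks` are hypothetical names, and `FakeModel` is a stand-in exposing the `SearchMixin` methods so the sketch runs without Django or ElasticSearch.

```python
def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def reindex_in_chunks(model, chunk_size=150):
    """Index every id from model.get_indexable(), one chunk at a time."""
    count = 0
    for ids in chunked(model.get_indexable(), chunk_size):
        for obj_id in ids:
            doc = model.extract_document(obj_id)
            model.index(doc, id_=obj_id)
            count += 1
    model.refresh_index()
    return count


class FakeModel(object):
    """Stand-in with the SearchMixin methods the sketch calls."""
    indexed = []
    refreshed = False

    @classmethod
    def get_indexable(cls):
        return range(1, 8)

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        return {'id': obj_id}

    @classmethod
    def index(cls, document, id_=None, bulk=False, force_insert=False,
              es=None):
        cls.indexed.append(id_)

    @classmethod
    def refresh_index(cls, timesleep=0, es=None):
        cls.refreshed = True


print(list(chunked(range(1, 8), 3)))               # [[1, 2, 3], [4, 5, 6], [7]]
print(reindex_in_chunks(FakeModel, chunk_size=3))  # 7
```

With a real Django model, the payoff of chunking is fetching each chunk's objects in one query (for example `model.objects.filter(id__in=ids)`) and passing each object as the `obj` argument to `extract_document()`, which is what that argument is for.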
@@ -165,32 +281,38 @@ Example::
...
+.. _django-debugging:
+
Debugging
=========
-From Rob Hudson (with some minor editing):
+You can set ``settings.ES_DUMP_CURL`` to either of two kinds of
+value, both of which can help when debugging ElasticUtils:
+
+1. a file path
+
+ This will cause PyES to write the curl equivalents of the commands
+ it's sending to ElasticSearch to a file.
+
+ Example setting::
+
+ ES_DUMP_CURL = '/var/log/es_curl.log'
+
- I recently discovered a nice tool for helping solve ElasticSearch
- problems that I thought I'd share...
+ .. Note::
- While scanning the code of pyes I discovered that it has an option
- to dump the commands it is sending to the ES backend to whatever
- you give it that has a ``write()`` method [1]_. I also discovered
- that elasticutils will pass this through to pyes based on the
- ``settings.ES_DUMP_CURL`` [2]_.
+ The file is not closed until the process ends, so buffered
+ output may not show up in the file until then.
- I threw together a quick and ugly class just to dump output while
- debugging an ES problem::
+
+2. a class instance that has a ``.write()`` method
+
+ PyES will call the ``.write()`` method with the curl equivalent and
+ then you can do whatever you want with it.
+
+ For example, this writes curl equivalent output to stdout::
class CurlDumper(object):
    def write(self, s):
        print(s)

ES_DUMP_CURL = CurlDumper()
-
- This is pretty great when running a test with output enabled, or
- even in the runserver output. But to my surprise, when running
- tests with output not enabled I see the curl dump for only tests
- that fail, which has turned out to be very useful information.
-
-.. [1] https://github.com/aparo/pyes/blob/master/pyes/es.py#L496
-.. [2] https://github.com/mozilla/elasticutils/blob/master/elasticutils/__init__.py#L29
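Because any object with a `.write()` method works, the dumper can also hand the curl output to Python's `logging` module instead of stdout. This sketch assumes PyES passes each curl command to `write()` as a string; `LoggingCurlDumper` and the `elasticutils.curl` logger name are made up for illustration.

```python
import logging


class LoggingCurlDumper(object):
    """Hypothetical ES_DUMP_CURL target that logs each curl command."""

    def __init__(self, logger_name='elasticutils.curl'):
        self.log = logging.getLogger(logger_name)

    def write(self, s):
        # Strip the trailing newline; logging adds its own line breaks.
        self.log.debug('%s', s.rstrip())


# In your Django settings file:
ES_DUMP_CURL = LoggingCurlDumper()
```

The messages are logged at DEBUG level, so the `elasticutils.curl` logger (or one of its ancestors) needs a handler and a DEBUG level configured before anything shows up.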
164 elasticutils/contrib/django/models.py
@@ -1,44 +1,160 @@
from django.conf import settings
-from pyes import djangoutils
-
-import elasticutils
+from elasticutils.contrib.django import get_es, S
class SearchMixin(object):
- """This mixin correlates a Django model to an ElasticSearch index."""
+ """Mixin for indexing Django model instances
+
+ Add this mixin to your Django ORM model class and it gives you
+ super indexing power. This correlates an ES mapping type to a
+ Django ORM model. Using this allows you to get Django model
+ instances as ES search results.
+
+ """
@classmethod
- def _get_index(cls):
+ def get_index(cls):
+ """Gets the index for this model.
+
+ The index for this model is specified in `settings.ES_INDEXES`
+ which is a dict of mapping type -> index name.
+
+ By default, this uses `.get_mapping_type()` to determine the
+ mapping type and returns the value in `settings.ES_INDEXES` for
+ that or, if there isn't one, ``settings.ES_INDEXES['default']``.
+
+ Override this to compute it differently.
+
+ :returns: index name to use
+
+ """
indexes = settings.ES_INDEXES
- return indexes.get(cls._meta.db_table) or indexes['default']
+ return indexes.get(cls.get_mapping_type()) or indexes['default']
@classmethod
- def index(cls, document, id=None, bulk=False, force_insert=False):
- """Associates a document with a correlated id in ES.
+ def get_mapping_type(cls):
+ """Returns the name of the mapping type.
- Wrapper around pyes.ES.index.
+ By default, this is ``cls._meta.db_table``.
- Example::
+ Override this if you want to compute the mapping type name
+ differently.
+
+ :returns: mapping type string
- MyModel.index(instance.fields, id=instance.id)
"""
- elasticutils.get_es().index(
- document, index=cls._get_index(), doc_type=cls._meta.db_table,
- id=id, bulk=bulk, force_insert=force_insert)
+ return cls._meta.db_table
@classmethod
- def unindex(cls, id):
- """Removes a particular item from the search index."""
- elasticutils.get_es().delete(cls._get_index(), cls._meta.db_table, id)
+ def get_mapping(cls):
+ """Returns the mapping for this mapping type.
+
+ See the docs for details on how to specify a mapping.
+
+ Override this to return a mapping for this doctype.
+
+ :returns: dict representing the ES mapping or None if you
+ want ES to infer it. Defaults to None.
+
+ """
+ return None
+
+ @classmethod
+ def extract_document(cls, obj_id, obj=None):
+ """Extracts the ES index document for this instance
+
+ This must be implemented.
+
+ .. Note::
+
+ The resulting dict must be JSON serializable.
+
+ :arg obj_id: the object id for the instance to extract from
+ :arg obj: if this is not None, use this as the object to
+ extract from; this allows you to fetch a bunch of items
+ at once and extract them one at a time
+
+ :returns: dict of key/value pairs representing the document
+
+ """
+ raise NotImplementedError
+
+ @classmethod
+ def get_indexable(cls):
+ """Returns the queryset of ids of all things to be indexed.
+
+ Defaults to::
- def fields(self):
- """Returns a serialization of a Model instance.
+ cls.objects.order_by('id').values_list('id', flat=True)
- This can be used for indexing data.
+ :returns: iterable of ids of objects to be indexed
- .. warning::
- It is recommended that you override this method and selectively
- serialize fields.
"""
- return djangoutils.get_values(self)
+ return cls.objects.order_by('id').values_list('id', flat=True)
+
+ @classmethod
+ def index(cls, document, id_=None, bulk=False, force_insert=False,
+ es=None):
+ """Adds or updates a document in the index
+
+ :arg document: Python dict of key/value pairs representing
+ the document
+
+ .. Note::
+
+ This must be serializable into JSON.
+
+ :arg id_: the Django ORM model instance id---this is used to
+ convert an ES search result back to the Django ORM model
+ instance from the db. It should be an integer.
+ :arg bulk: Whether or not this is part of a bulk indexing. If
+ this is, you must provide an ES with the `es` argument,
+ too.
+ :arg force_insert: TODO
+ :arg es: The ES to use. If you don't specify an ES, it'll
+ use `elasticutils.contrib.django.get_es()`.
+
+ :raises ValueError: if `bulk` is True, but `es` is None.
+
+ TODO: add example.
+
+ """
+ if bulk and es is None:
+ raise ValueError('bulk is True, but es is None')
+
+ if es is None:
+ es = get_es()
+
+ es.index(
+ document, index=cls.get_index(), doc_type=cls.get_mapping_type(),
+ id=id_, bulk=bulk, force_insert=force_insert)
+
+ @classmethod
+ def unindex(cls, id, es=None):
+ """Removes a particular item from the search index.
+
+ TODO: document this better.
+
+ """
+ if es is None:
+ es = get_es()
+
+ es.delete(cls.get_index(), cls.get_mapping_type(), id)
+
+ @classmethod
+ def refresh_index(cls, timesleep=0, es=None):
+ """Refreshes the index.
+
+ TODO: document this better.
+
+ """
+ if es is None:
+ es = get_es()
+
+ es.refresh(cls.get_index(), timesleep=timesleep)
+
+ @classmethod
+ def search(cls):
+ """Returns a typed S for this class."""
+ return S(cls).indexes(cls.get_index()).doctypes(cls.get_mapping_type())
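Because `get_index()` now goes through `get_mapping_type()`, overriding `get_mapping_type()` also changes which `settings.ES_INDEXES` entry is used. The sketch below shows that without Django: `MixinSketch` is a hypothetical, stripped-down copy of those two methods, a module-level `ES_INDEXES` dict stands in for `settings.ES_INDEXES`, and a `db_table` class attribute stands in for `_meta.db_table`. `ContactSketch` and `NoteSketch` are plain classes, not Django models.

```python
ES_INDEXES = {'default': 'main_index', 'contact': 'contacts_index'}


class MixinSketch(object):
    """Stripped-down stand-in for SearchMixin's index lookup."""
    db_table = None  # stands in for cls._meta.db_table

    @classmethod
    def get_mapping_type(cls):
        return cls.db_table

    @classmethod
    def get_index(cls):
        return ES_INDEXES.get(cls.get_mapping_type()) or ES_INDEXES['default']


class ContactSketch(MixinSketch):
    db_table = 'myapp_contact'

    @classmethod
    def get_mapping_type(cls):
        # Use a shorter mapping type name than the db table name.
        return 'contact'


class NoteSketch(MixinSketch):
    db_table = 'myapp_note'  # no override: falls back to 'default'


print(ContactSketch.get_mapping_type(), ContactSketch.get_index())  # contact contacts_index
print(NoteSketch.get_mapping_type(), NoteSketch.get_index())        # myapp_note main_index
```

Overriding `get_mapping_type()` is usually enough; `get_index()` only needs its own override when the index can't be expressed as an `ES_INDEXES` entry.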