fa519e9 Nov 14, 2014
@willkg @davedash @noahmiller @kcolton @HonzaKral @emidln

Using ElasticUtils with Django

Summary

Django-specific code is all located in elasticutils.contrib.django.

This chapter covers using the ElasticUtils Django bits. For API documentation, see :ref:`django-api-docs-chapter`.

How to integrate ElasticUtils with Django

  1. add ElasticUtils configuration settings to your project's settings file
  2. write one or more MappingType classes
  3. write code to create the Elasticsearch index and populate it with documents based on your MappingType subclasses
  4. use :py:class:`elasticutils.contrib.django.S` to search and return results
  5. use :py:class:`elasticutils.contrib.django.estestcase.ESTestCase` to write tests

That's the gist of it. You can deviate on any of these depending on your needs, of course.

Configuration

ElasticUtils depends on the following settings in your Django settings file:

Elasticsearch

The get_es() function in the Django contrib uses the Django settings listed above to build the elasticsearch-py Elasticsearch object.
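As a rough illustration, a settings fragment might look like the one below. Only ES_DISABLED is named elsewhere in this chapter. The names ES_URLS, ES_INDEXES and ES_TIMEOUT, and every value shown, are assumptions here and should be checked against the API documentation before use.

```python
# settings.py (fragment)
# Setting names other than ES_DISABLED are assumed, as are all values.

# Set to True to stop the app from indexing (see "Writing tests").
ES_DISABLED = False

# Assumed: list of Elasticsearch hosts to connect to.
ES_URLS = ['localhost:9200']

# Assumed: maps a logical name to the index name to use.
ES_INDEXES = {'default': 'myproject_index'}

# Assumed: connection timeout in seconds.
ES_TIMEOUT = 5
```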

Using with Django ORM models

Requirements: Django

The elasticutils.contrib.django.S class takes a MappingType in the constructor. That allows you to tie Django ORM models to Elasticsearch index search results.

The elasticutils.contrib.django module provides its own MappingType, which adds some Django ORM-specific code to make this easier.

Define a MappingType subclass for your model. At a minimum, you need to define get_model.

Further, you can use the Indexable mixin to get a bunch of helpful indexing-related code.

For example, here's a minimal MappingType subclass:

from django.db.models import Model
from elasticutils.contrib.django import MappingType


class MyModel(Model):
    # Django model ...
    pass


class MyMappingType(MappingType):
    @classmethod
    def get_model(cls):
        return MyModel

searcher = MyMappingType.search()

Here's one that uses Indexable and handles indexing:

from django.db.models import Model
from elasticutils.contrib.django import Indexable, MappingType


class MyModel(Model):
    # Django model ...
    pass


class MyMappingType(MappingType, Indexable):
    @classmethod
    def get_model(cls):
        """Returns the Django model this MappingType relates to"""
        return MyModel

    @classmethod
    def get_mapping(cls):
        """Returns an Elasticsearch mapping for this MappingType"""
        return {
            'properties': {
                # The id is an integer, so store it as such. Elasticsearch
                # would have inferred this just fine.
                'id': {'type': 'integer'},

                # The name is a proper name, so we shouldn't analyze it
                # (stem, tokenize, parse, etc).
                'name': {'type': 'string', 'index': 'not_analyzed'},

                # The bio has free-form text in it, so analyze it with
                # snowball.
                'bio': {'type': 'string', 'analyzer': 'snowball'},

                # Age is an integer
                'age': {'type': 'integer'}
            }
        }

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        """Converts this instance into an Elasticsearch document"""
        if obj is None:
            obj = cls.get_model().objects.get(pk=obj_id)

        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age
        }


searcher = MyMappingType.search()
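Because extract_document accepts an already-loaded object, its output can be checked without a database or an Elasticsearch server. The sketch below is a stand-in rather than real ElasticUtils usage: DocumentExtractor is a hypothetical class that copies only the extract_document logic from the example above and omits the MappingType and Indexable bases so that it runs on its own, and a SimpleNamespace plays the part of a model instance.

```python
from types import SimpleNamespace


class DocumentExtractor(object):
    """Hypothetical stand-in carrying only the extract_document logic."""

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        if obj is None:
            # The real MappingType would load the object from the ORM here.
            raise LookupError('no object supplied for id %r' % obj_id)
        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age,
        }


# A fake "model instance" with the four attributes the mapping expects.
fake = SimpleNamespace(id=1, name='Ada Lovelace', bio='Wrote programs.', age=36)
document = DocumentExtractor.extract_document(fake.id, obj=fake)

# Fields declared in get_mapping() that the document fails to provide.
mapping_fields = {'id', 'name', 'bio', 'age'}
missing = mapping_fields - set(document)
```

An empty missing set means every property in the mapping has a value in the extracted document, which is a cheap check to keep the two methods in step.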

Celery tasks

Requirements: Django, Celery

You can then use tasks such as :py:func:`elasticutils.contrib.django.tasks.index_objects` to index new items automatically.
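One way to wire this up is a post_save handler that queues each saved object for indexing. The sketch below is hedged: make_index_handler is a hypothetical helper, not part of ElasticUtils, and it assumes index_objects can be called as (mapping_type, ids). The queueing callable is passed in so the block runs without Django or Celery. In a project it would be tasks.index_objects.delay, where delay is the standard Celery way to queue a task.

```python
def make_index_handler(enqueue, mapping_type):
    """Build a post_save-style handler that queues the saved object.

    enqueue is any callable taking (mapping_type, ids); with Celery this
    would be tasks.index_objects.delay (assumed signature).
    """
    def handler(sender, instance, **kwargs):
        enqueue(mapping_type, [instance.pk])
    return handler


# In a Django project the wiring would look roughly like this (not run here):
#
#   from django.db.models.signals import post_save
#   from elasticutils.contrib.django import tasks
#
#   update_in_index = make_index_handler(tasks.index_objects.delay, MyMappingType)
#   post_save.connect(update_in_index, sender=MyModel)

# Demonstration with stand-ins for the task and the model instance.
queued = []


class SavedObject(object):
    pk = 42


handler = make_index_handler(lambda mt, ids: queued.append((mt, ids)), 'MyMappingType')
handler(sender=None, instance=SavedObject())
```

Keeping the handler at module level, as in the commented wiring, matters because Django holds signal receivers by weak reference by default.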

Middleware

Requirements: Django

There's a middleware that catches all Elasticsearch-related exceptions and shows a 501/503 template accordingly. See :py:class:`elasticutils.contrib.django.ESExceptionMiddleware` for details.
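Enabling it is a settings change. The fragment below is a sketch that assumes the MIDDLEWARE_CLASSES setting used by Django releases contemporary with this chapter; the dotted path is taken from the class reference above, and the other entry is only a placeholder for whatever middleware your project already lists.

```python
# settings.py (fragment)
MIDDLEWARE_CLASSES = (
    # Placeholder for the middleware your project already uses.
    'django.middleware.common.CommonMiddleware',

    # Renders the 501/503 templates for Elasticsearch-related errors.
    'elasticutils.contrib.django.ESExceptionMiddleware',
)
```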

Writing tests

Requirements: Django

When writing test cases for your ElasticUtils-using code, you'll want to do a few things:

  1. Default ES_DISABLED to True. This way, the tests that kick off creating data but aren't testing search-specific things don't additionally index stuff. That'll save you a bunch of test time.
  2. When testing ElasticUtils things, override the settings and set ES_DISABLED to False.
  3. Use an ESTestCase that sets up the indexes before tests run and tears them down after they run.
  4. When testing, make sure you use an index name that's unique. You don't want to run your tests and have them affect your production index.

You can use :py:class:`elasticutils.contrib.django.estestcase.ESTestCase` for your app's tests. It's pretty basic but does all of the above except item 1, which you'll need to do in your test settings.
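For item 1, a test settings fragment could be as small as the sketch below. The file name is hypothetical. The ES_INDEXES line only matters if you write your own test case instead of using ESTestCase, and both that setting name and the index naming scheme are assumptions.

```python
# settings_test.py (fragment; hypothetical file name)
import os

# Item 1: indexing is off by default while tests run.
ES_DISABLED = True

# Item 4, for hand-written test cases only: a per-process index name that
# cannot collide with a production index. ES_INDEXES is an assumed name.
ES_INDEXES = {'default': 'test_myproject_%d' % os.getpid()}
```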

Example usage:

from elasticutils.contrib.django.estestcase import ESTestCase


class TestQueries(ESTestCase):
    # This class holds tests that do Elasticsearch things

    def test_query(self):
        # Test code ...
        pass

    def test_locked_filters(self):
        # Test code ...
        pass

ElasticUtils uses this for its Django tests. Look at the test code for more examples of usage:

https://github.com/mozilla/elasticutils/

If it's not what you want, you could subclass it and override behavior or just write your own.

Helpful things to know

Indexing and reset_queries

If you:

  1. are indexing a lot of data pulled out with the Django ORM, and
  2. have DEBUG = True (i.e. a development environment)

then you'll probably want to call django.db.reset_queries() periodically.

What's going on is that when DEBUG = True (i.e. a development environment), Django helpfully stores every query it makes. When you're indexing a lot of data, that adds up to a lot of stored queries. Calling django.db.reset_queries() periodically clears them so they don't steadily eat all your memory before the indexing is done.
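A minimal sketch of that pattern follows. index_in_chunks is a hypothetical helper, and both the indexing call and the reset function are passed in so the block runs without Django. In a project you would pass django.db.reset_queries as reset_queries and your own batch-indexing function as index_chunk.

```python
def index_in_chunks(ids, index_chunk, reset_queries, chunk_size=100):
    """Index ids in batches, clearing the query log after each batch.

    index_chunk and reset_queries are injected so the sketch runs without
    Django; in a project, pass django.db.reset_queries for the latter.
    """
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        index_chunk(ids[start:start + chunk_size])
        # With DEBUG = True Django records every query; drop the log now
        # so it does not keep growing for the whole indexing run.
        reset_queries()


# Demonstration with stand-ins for the real indexing call and reset_queries.
batches = []
resets = []
index_in_chunks(range(250), batches.append, lambda: resets.append(1), chunk_size=100)
```

Resetting once per batch rather than once per object keeps the overhead small while still bounding how many queries are held in memory at any time.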