c2cf947 Sep 11, 2014
@willkg @HonzaKral @catalanojuan

Indexing

Overview

ElasticUtils is primarily an API for searching. However, before you can search, you need to create an index and index your documents.

This chapter covers the indexing side of things, but only lightly. For more details, read through the elasticsearch-py documentation and the Elasticsearch guide.

Getting an Elasticsearch object

ElasticUtils uses elasticsearch-py, which comes with a handy Elasticsearch object. That object lets you:

  • create indexes
  • create mappings
  • apply settings
  • check status
  • etc.

To get one, call :py:func:`elasticutils.get_es`, which creates an Elasticsearch object for you.

See :py:func:`elasticutils.get_es` for more details.

Indexes

An index is a collection of documents.

Before you can index any documents, you need an index. You can create one with .indices.create().

For example:

from elasticutils import get_es

es = get_es()
es.indices.create(index='blog-index')

You can pass in settings, too. For example, you can set the refresh interval when creating the index:

es.indices.create(index='blog-index', body={'settings': {'refresh_interval': '5s'}})
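
Creating an index that already exists raises an error, so setup code often checks first. Here is a minimal sketch of that check. It is not part of ElasticUtils: `ensure_index` is a hypothetical helper name, and `es` is assumed to be the object returned by `get_es()`, which provides `indices.exists()` and `indices.create()`. Settings are nested under an explicit `settings` key in the request body.

```python
def ensure_index(es, index, settings=None):
    """Create `index` unless it already exists.

    `es` is assumed to be the client returned by get_es(). Returns True
    if the index was created and False if it was already there.
    """
    if es.indices.exists(index=index):
        return False

    # Only send a body when there are settings to apply.
    body = {'settings': settings} if settings else None
    es.indices.create(index=index, body=body)
    return True
```

You would call it as `ensure_index(get_es(), 'blog-index', {'refresh_interval': '5s'})`.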

Types and Mappings

A type is a set of fields. A document is of a given type if it has those fields. Whenever you index a document, you specify which type the document is. A type is sometimes called a "doctype", "document type" or "doc type".

A mapping defines the fields of a type and how each field should be indexed. In ElasticUtils, a "mapping type" is shorthand for a document type that has a defined mapping.

Elasticsearch can infer mappings to some degree, but you get a lot more value by specifying mappings explicitly.

To define a mapping, you use .indices.put_mapping().

For example:

es = get_es()
es.indices.put_mapping(
    index='blog-index',
    doc_type='blog-entry-type',
    body={
        'blog-entry-type': {
            'properties': {
                'id': {'type': 'integer'},
                'title': {'type': 'string'},
                'content': {'type': 'string'},
                'tags': {'type': 'string'},
                'created': {'type': 'date'}
            }
        }
    }
)

You can also define mappings when you create the index:

es = get_es()
es.indices.create(
    index='blog-index',
    body={
        'mappings': {
            'blog-entry-type': {
                'properties': {
                    'id': {'type': 'integer'},
                    'title': {'type': 'string'},
                    'content': {'type': 'string'},
                    'tags': {'type': 'string'},
                    'created': {'type': 'date'}
                }
            }
        }
    }
)

Note

If a document could get indexed between creating the index and defining the mapping, you have a race condition. To avoid it, create the index and define the mappings in the same call.
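
The two mapping examples above repeat the same field definitions inside slightly different request bodies. One way to keep them in sync is to define the properties once and build each body from them. This is only a sketch: `BLOG_ENTRY_PROPERTIES`, `mapping_body` and `create_index_body` are hypothetical names, not ElasticUtils API.

```python
# Field definitions copied from the examples above.
BLOG_ENTRY_PROPERTIES = {
    'id': {'type': 'integer'},
    'title': {'type': 'string'},
    'content': {'type': 'string'},
    'tags': {'type': 'string'},
    'created': {'type': 'date'},
}


def mapping_body(doc_type, properties):
    """Body for .indices.put_mapping(): {doc_type: {'properties': ...}}."""
    return {doc_type: {'properties': properties}}


def create_index_body(doc_type, properties, settings=None):
    """Body for .indices.create() that defines the mapping up front."""
    body = {'mappings': mapping_body(doc_type, properties)}
    if settings:
        body['settings'] = settings
    return body
```

You can then pass `mapping_body('blog-entry-type', BLOG_ENTRY_PROPERTIES)` as the `body` of `.indices.put_mapping()`, or `create_index_body('blog-entry-type', BLOG_ENTRY_PROPERTIES)` as the `body` of `.indices.create()`.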

Indexing documents

Use .index() to index a document.

For example:

es = get_es()

entry = {
    'id': 1,
    'title': 'First post!',
    'content': '<p>First post!</p>',
    'tags': ['status', 'blog'],
    'created': '2013-04-23T16:50:22'
}

es.index(index='blog-index', doc_type='blog-entry-type', body=entry, id=1)
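
In practice the `created` value usually starts out as a `datetime` rather than a hand-written string. The sketch below builds the document from Python values and serializes the timestamp with `isoformat()`. It assumes the `created` field uses Elasticsearch's default date format, which accepts ISO 8601 strings. `blog_entry` is a hypothetical helper name.

```python
from datetime import datetime


def blog_entry(entry_id, title, content, tags, created):
    """Build a document matching the blog-entry-type mapping.

    `created` is a datetime; it is serialized with isoformat() so the
    date field receives an ISO 8601 string.
    """
    return {
        'id': entry_id,
        'title': title,
        'content': content,
        'tags': list(tags),
        'created': created.isoformat(),
    }


entry = blog_entry(1, 'First post!', '<p>First post!</p>',
                   ['status', 'blog'], datetime(2013, 4, 23, 16, 50, 22))
# entry['created'] == '2013-04-23T16:50:22'
```

The resulting `entry` can be passed as `body` to `.index()` exactly as in the example above.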

If you're indexing a bunch of documents at the same time, you should use elasticsearch.helpers.bulk_index().

For example:

from elasticsearch.helpers import bulk_index

es = get_es()

entries = [{ '_id': 42, ... }, { '_id': 47, ... }]

bulk_index(es, entries, index='blog-index', doc_type='blog-entry-type')
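
The documents in the bulk example carry an `_id` key. If your documents keep their identifier under another key, such as the `id` field used earlier, a small generator can add it. This is a sketch: `with_bulk_ids` and `id_field` are hypothetical names, and it assumes the bulk helper takes each document's id from `_id`, as the example above suggests.

```python
def with_bulk_ids(entries, id_field='id'):
    """Yield a copy of each entry with `_id` set from `id_field`."""
    for entry in entries:
        doc = dict(entry)  # shallow copy; leave the caller's dict alone
        doc['_id'] = doc[id_field]
        yield doc
```

You could then pass `list(with_bulk_ids(entries))` to the bulk helper in place of `entries`.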

Deleting documents

You can delete documents with .delete().

For example:

es = get_es()

es.delete(index='blog-index', doc_type='blog-entry-type', id=1)

Refreshing

After you index documents, they're not available for searches until the index is refreshed. By default, the index refreshes every second. If you need the documents to show up in searches before that, call .indices.refresh().

For example:

es = get_es()

es.indices.refresh(index='blog-index')
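
Tests often need a document to be searchable immediately after it is written. A minimal sketch of that pattern follows. `index_and_refresh` is a hypothetical helper name, and `es` is assumed to be the object returned by `get_es()`. Refreshing after every write adds work for Elasticsearch, so this probably fits test code better than loading lots of documents.

```python
def index_and_refresh(es, index, doc_type, doc, doc_id):
    """Index one document, then refresh so a search can see it right away."""
    es.index(index=index, doc_type=doc_type, body=doc, id=doc_id)
    es.indices.refresh(index=index)
```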

Deleting indexes

You can delete indexes with .indices.delete().

For example:

es = get_es()

es.indices.delete(index='blog-index')
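
When you want a clean slate, for example at the start of a test run, you can combine deleting and creating. The sketch below deletes the index only if it exists and then creates it again. `recreate_index` is a hypothetical helper name, and `es` is assumed to be the object returned by `get_es()`.

```python
def recreate_index(es, index, body=None):
    """Delete `index` if it exists, then create it again from `body`."""
    if es.indices.exists(index=index):
        es.indices.delete(index=index)
    es.indices.create(index=index, body=body)
```

Passing a `body` that contains `mappings` recreates the index with its mappings already defined.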

Doing all of this with MappingTypes and Indexables

If you're using MappingTypes, then you can do much of the above using methods and classmethods on :py:class:`MappingType` and :py:class:`Indexable` classes. See :ref:`mapping-type-chapter` for more details.
