Declarative mapping

Rob Hudson edited this page Dec 8, 2013 · 4 revisions

Proposal

Declarative mapping will allow users to define their Elasticsearch document type mappings in a way that's Pythonic and easy to read.

Work is currently being done in the branch: https://github.com/robhudson/elasticutils/compare/declarative-mapping

What it will look like

Here is an example of what a Book document type might look like using various fields:

class BookDocumentType(DocumentType):

    id = fields.IntegerField(type='long')
    name = fields.StringField(analyzer='snowball')
    name_sort = fields.StringField(index='not_analyzed')
    authors = fields.StringField(is_multivalued=True)
    published_date = fields.DateField()
    price = fields.DecimalField()
    is_autographed = fields.BooleanField()
    sales = fields.IntegerField()

Each Field class maps to an Elasticsearch field type. The DocumentType base class and the Field classes will generate the JSON representation of the mapping themselves. For example, calling BookDocumentType().get_mapping() would produce the following JSON, which can be passed to Elasticsearch. Note that is_multivalued adds nothing to the output, since any Elasticsearch field can hold an array of values, and that DecimalField is emitted as "string" in this example:

{
    "properties": {
        "id": {
            "type": "long"
        },
        "name": {
            "type": "string",
            "analyzer": "snowball"
        },
        "name_sort": {
            "index": "not_analyzed",
            "type": "string"
        },
        "authors": {
            "type": "string"
        },
        "published_date": {
            "type": "date"
        },
        "price": {
            "type": "string"
        },
        "is_autographed": {
            "type": "boolean"
        },
        "sales": {
            "type": "integer"
        }
    }
}
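To make the generation step concrete, here is a minimal sketch of how the field classes and DocumentType might produce that mapping. It is not the code in the branch. The es_type attribute, the get_definition() method, and the pass-through of extra keyword arguments are all assumptions made for illustration; only the class names, the field arguments, and get_mapping() come from the example above.

```python
import json


class Field:
    """Hypothetical base class: one declared attribute, one mapping entry."""

    es_type = None  # default Elasticsearch type, set by subclasses

    def __init__(self, type=None, is_multivalued=False, **options):
        # `type` overrides the default, as in IntegerField(type='long').
        self.es_type = type or self.es_type
        # Assumption: is_multivalued only matters on the Python side, since
        # Elasticsearch needs no special mapping for arrays of values.
        self.is_multivalued = is_multivalued
        # Remaining keyword arguments (analyzer, index, ...) pass through as-is.
        self.options = options

    def get_definition(self):
        definition = {"type": self.es_type}
        definition.update(self.options)
        return definition


class StringField(Field):
    es_type = "string"


class IntegerField(Field):
    es_type = "integer"


class DocumentType:
    def get_mapping(self):
        properties = {}
        # Walk the MRO base-first so subclasses can override inherited fields.
        for klass in reversed(type(self).__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Field):
                    properties[name] = value.get_definition()
        return {"properties": properties}


class BookDocumentType(DocumentType):
    id = IntegerField(type="long")
    name = StringField(analyzer="snowball")
    name_sort = StringField(index="not_analyzed")
    authors = StringField(is_multivalued=True)
    sales = IntegerField()


mapping = BookDocumentType().get_mapping()
print(json.dumps(mapping, indent=4))
```

Scanning class attributes at call time keeps the sketch short. A metaclass that collects the fields once, when the class is defined, would be another reasonable choice.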

Things to do

  • Support the object type
  • Start thinking about other Elasticsearch features (parent/child, nested docs, etc.)
  • Consider attaching some of the things that are currently on Indexable and MappingType to DocumentType

Idea for the object type

The object type could be treated as a related mapping type complete with its own fields, but included in the main mapping type. For example:

# Defined first so the name exists when BookDocumentType refers to it.
class AuthorDocumentType(DocumentType):
    first_name = StringField(analyzer='snowball')
    last_name = StringField(analyzer='snowball')

class BookDocumentType(DocumentType):
    id = IntegerField(type='long')
    title = StringField(analyzer='snowball')
    # `class` is a reserved word in Python, so the keyword is spelled `cls` here.
    authors = ObjectField(cls=AuthorDocumentType, is_multivalued=True)

While scanning the fields to produce the mapping, we would find the ObjectField, look up the class it references, and nest that class's fields under the field's properties using "type": "object".

It's possible that classes used this way should subclass a separate ObjectType rather than DocumentType, to make the intent clearer and to give us a place to attach any special handling code for object types.
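The scanning idea above can be sketched as follows. This is only an illustration under stated assumptions, not the planned implementation: the keyword is named cls because class is a reserved word in Python, and es_type and get_definition() are hypothetical helpers. The sketch keeps AuthorDocumentType as a DocumentType subclass. A dedicated ObjectType base class would slot in the same way.

```python
import json


class Field:
    """Hypothetical base class: one declared attribute, one mapping entry."""

    es_type = None  # default Elasticsearch type, set by subclasses

    def __init__(self, type=None, is_multivalued=False, **options):
        self.es_type = type or self.es_type
        self.is_multivalued = is_multivalued
        self.options = options

    def get_definition(self):
        definition = {"type": self.es_type}
        definition.update(self.options)
        return definition


class StringField(Field):
    es_type = "string"


class IntegerField(Field):
    es_type = "integer"


class ObjectField(Field):
    es_type = "object"

    def __init__(self, cls, is_multivalued=False):
        super().__init__(is_multivalued=is_multivalued)
        # The related document type whose fields get nested in the mapping.
        self.cls = cls

    def get_definition(self):
        definition = super().get_definition()
        # Reuse the related class's own mapping as the nested properties.
        definition["properties"] = self.cls().get_mapping()["properties"]
        return definition


class DocumentType:
    def get_mapping(self):
        properties = {}
        for klass in reversed(type(self).__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Field):
                    properties[name] = value.get_definition()
        return {"properties": properties}


class AuthorDocumentType(DocumentType):
    first_name = StringField(analyzer="snowball")
    last_name = StringField(analyzer="snowball")


class BookDocumentType(DocumentType):
    id = IntegerField(type="long")
    title = StringField(analyzer="snowball")
    authors = ObjectField(cls=AuthorDocumentType, is_multivalued=True)


mapping = BookDocumentType().get_mapping()
print(json.dumps(mapping, indent=4))
```

Because ObjectField delegates to the related class's get_mapping(), object fields nested inside other object fields would be handled by the same recursion.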
