## Elasticsearch: The Definitive Guide - Python

Following the examples in the book, here are Python snippets that achieve the same effect.

Documentation for the Python libs:

Low-level API:

https://elasticsearch-py.readthedocs.io/en/master/index.html

Expressive DSL API (more "Pythonic"):

http://elasticsearch-dsl.readthedocs.io/en/latest/index.html

Github repo for DSL API:

https://github.com/elastic/elasticsearch-dsl-py


In [1]:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
from pprint import pprint

es = Elasticsearch(
    'localhost',
    # sniff before doing anything
    sniff_on_start=True,
    # refresh nodes after a node fails to respond
    sniff_on_connection_fail=True,
    # and also every 60 seconds
    sniffer_timeout=60
)

### Empty Search
From: https://www.elastic.co/guide/en/elasticsearch/guide/master/empty-search.html

>GET _search

In [2]:
res = es.search('_all') # same as es.search()

In [3]:
#from pprint import pprint
#pprint(res)

In [4]:
s = Search(using=es)
response = s.execute()
response

<Response: [<Hit(.kibana/config/5.2.2): {'defaultIndex': 'test', 'buildNum': 14723, 'discover:aggs:t...}>, <Hit(.kibana/index-pattern/test): {'title': 'test', 'fields': '[{"name":"tags","type":"string"...}>, <Hit(.kibana/index-pattern/megacorp): {'title': 'megacorp', 'fields': '[{"name":"last_name.keyword...}>, <Hit(.kibana/index-pattern/website): {'title': 'website', 'fields': '[{"name":"date","type":"date...}>, <Hit(us/tweet/14): {'date': '2014-09-24', 'user_id': 1, 'name': 'John Smith', '...}>, <Hit(gb/tweet/5): {'date': '2014-09-15', 'user_id': 2, 'name': 'Mary Jones', '...}>, <Hit(gb/tweet/9): {'date': '2014-09-19', 'user_id': 2, 'name': 'Mary Jones', '...}>, <Hit(us/tweet/8): {'date': '2014-09-18', 'user_id': 1, 'name': 'John Smith'}>, <Hit(us/tweet/10): {'date': '2014-09-20', 'user_id': 1, 'name': 'John Smith', '...}>, <Hit(us/tweet/12): {'date': '2014-09-22', 'user_id': 1, 'name': 'John Smith', '...}>]>

With timeout:

>GET /_search?timeout=10ms

In [5]:
res = es.search('_all', timeout='10ms') # same as es.search(timeout='10ms')
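As the book describes, a timeout does not abort the search: the response carries whatever was gathered in time and sets a `timed_out` flag. Below is a minimal sketch of checking that flag, assuming a response shaped like the dict the low-level API returns. The `describe_response` helper and the `sample` dict are made up for illustration and are not part of either library.

```python
def describe_response(res):
    """Summarise a search response dict from the low-level API."""
    hits = res['hits']
    # 'timed_out' is True when the timeout expired before all results were gathered
    status = 'partial (timed out)' if res.get('timed_out') else 'complete'
    return '{} results: {} of {} hits returned'.format(
        status, len(hits['hits']), hits['total'])


# Hypothetical response, trimmed to the fields used above.
sample = {
    'took': 4,
    'timed_out': True,
    'hits': {'total': 14, 'hits': [{'_id': '14'}, {'_id': '5'}]},
}
print(describe_response(sample))
```

With a live cluster, the same helper could be called on `res` from the cell above.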

In [6]:
# Iterating over the Search object executes it and yields each hit.
# Elasticsearch pages the results (10 hits by default).
for hit in s:
    print(hit)

<Hit(.kibana/config/5.2.2): {'defaultIndex': 'test', 'buildNum': 14723, 'discover:aggs:t...}>
<Hit(.kibana/index-pattern/test): {'title': 'test', 'fields': '[{"name":"tags","type":"string"...}>
<Hit(.kibana/index-pattern/megacorp): {'title': 'megacorp', 'fields': '[{"name":"last_name.keyword...}>
<Hit(.kibana/index-pattern/website): {'title': 'website', 'fields': '[{"name":"date","type":"date...}>
<Hit(us/tweet/14): {'date': '2014-09-24', 'user_id': 1, 'name': 'John Smith', '...}>
<Hit(gb/tweet/5): {'date': '2014-09-15', 'user_id': 2, 'name': 'Mary Jones', '...}>
<Hit(gb/tweet/9): {'date': '2014-09-19', 'user_id': 2, 'name': 'Mary Jones', '...}>
<Hit(us/tweet/8): {'date': '2014-09-18', 'user_id': 1, 'name': 'John Smith'}>
<Hit(us/tweet/10): {'date': '2014-09-20', 'user_id': 1, 'name': 'John Smith', '...}>
<Hit(us/tweet/12): {'date': '2014-09-22', 'user_id': 1, 'name': 'John Smith', '...}>


### Multi-index, Multitype

First, using the low-level API:

In [7]:
#/_search
#Search all types in all indices
res = es.search('_all')

#/gb/_search
#Search all types in the gb index
res = es.search(index='gb')

#/gb,us/_search
#Search all types in the gb and us indices
res = es.search(index=['gb','us'])

#/g*,u*/_search
#Search all types in any indices beginning with g or beginning with u
res = es.search(index=['g*','u*'])

#/gb/user/_search
#Search type user in the gb index
res = es.search(index='gb', doc_type='user')

#/gb,us/user,tweet/_search
#Search types user and tweet in the gb and us indices
res = es.search(index=['gb','us'], doc_type=['user', 'tweet'])
print(res['hits']['total'])

#/_all/user,tweet/_search
#Search types user and tweet in all indices
res = es.search(doc_type=['user', 'tweet'])
print(res['hits']['total'])

14
14
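To make the mapping between the `index` / `doc_type` arguments and the REST paths in the comments explicit, here is a small sketch that builds the path by hand. `search_path` is a hypothetical helper written for this note; it mirrors the URL patterns shown above and is not how the client library is implemented.

```python
def search_path(index=None, doc_type=None):
    """Build a /{index}/{type}/_search path from strings or lists of names."""
    def join(value):
        # A single name is used as-is; a list becomes a comma-separated string
        return value if isinstance(value, str) else ','.join(value)

    parts = []
    if index:
        parts.append(join(index))
    elif doc_type:
        # Types without an index need the _all placeholder in the path
        parts.append('_all')
    if doc_type:
        parts.append(join(doc_type))
    parts.append('_search')
    return '/' + '/'.join(parts)


print(search_path())
print(search_path(index=['gb', 'us'], doc_type=['user', 'tweet']))
print(search_path(doc_type=['user', 'tweet']))
```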


Next, the same searches using the DSL, which looks much the same for such basic searches:

In [8]:
#/_search
#Search all types in all indices
s = Search(using=es)
response = s.execute()

#/gb/_search
#Search all types in the gb index
s = Search(using=es, index='gb')
response = s.execute()

#/gb,us/_search
#Search all types in the gb and us indices
s = Search(using=es, index=['gb','us'])
response = s.execute()

#/g*,u*/_search
#Search all types in any indices beginning with g or beginning with u
s = Search(using=es, index=['g*','u*'])
response = s.execute()

#/gb/user/_search
#Search type user in the gb index
s = Search(using=es, index='gb', doc_type='user')
response = s.execute()


#/gb,us/user,tweet/_search
#Search types user and tweet in the gb and us indices
s = Search(using=es, index=['gb','us'], doc_type=['user','tweet'])
response = s.execute()

#/_all/user,tweet/_search
#Search types user and tweet in all indices
s = Search(using=es, doc_type=['user','tweet'])
response = s.execute()
print(response['hits']['total'])
print(len(response['hits']['hits']))

14
10


### Pagination

The last search reported 14 total hits, but the hits array holds only 10 documents.

This is due to pagination: Elasticsearch returns 10 results per page by default. The size and from parameters select which page to return:

>GET /_search?size=5

>GET /_search?size=5&from=5

>GET /_search?size=5&from=10


In [9]:
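# Low-level search API: skip the first 5 hits and return the next 5
# (from_ has a trailing underscore because from is a Python keyword)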
res = es.search(doc_type=['user', 'tweet'], from_=5, size=5)

In [10]:
print(res['hits']['total'])
print(len(res['hits']['hits']))

14
5
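The three REST calls above step through pages of 5. A minimal sketch of the arithmetic, assuming 1-based page numbers; `page_params` is a hypothetical helper, not part of either library.

```python
def page_params(page, size=10):
    """Return from_/size keyword arguments for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError('page and size must be positive')
    # Page 1 starts at offset 0, page 2 at offset size, and so on
    return {'from_': (page - 1) * size, 'size': size}


for page in (1, 2, 3):
    print(page, page_params(page, size=5))
```

The result can be unpacked into the low-level call, e.g. `es.search(doc_type=['user', 'tweet'], **page_params(2, size=5))`, which matches the cell above. With the DSL, the same window is expressed by slicing the Search object, e.g. `s[5:10]`.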


### Search Lite

These initial searches all use the Lucene Query String Syntax.

>GET /_all/tweet/_search?q=tweet:elasticsearch

For the low-level API, we use the q parameter:

In [11]:
res = es.search(doc_type=['tweet'], q='tweet:elasticsearch')
print('Total hits:{}\n'.format(res['hits']['total']))
pprint(res['hits']['hits'][0])

Total hits:7

{'_id': '13',
 '_index': 'gb',
 '_score': 0.7081689,
 '_source': {'date': '2014-09-23',
             'name': 'Mary Jones',
             'tweet': 'So yes, I am an Elasticsearch fanboy',
             'user_id': 2},
 '_type': 'tweet'}


For the DSL, the intended purpose is to avoid the query string syntax and build the search with the query DSL instead. For completeness, here is an equivalent search using a match query:

In [12]:
s = Search(using=es, doc_type=['tweet']) \
    .query('match', tweet='elasticsearch')
response = s.execute()
print('Total hits:{}\n'.format(response['hits']['total']))
pprint(response['hits']['hits'][0])

Total hits:7

{'_id': '13', '_type': 'tweet', '_source': {'date': '2014-09...}


Notice, however, that pprint has not given us the same JSON as the query string result from the low-level API above. This is because the DSL wraps the response in its own objects rather than returning plain dicts, and iterating over it yields Hit objects. These expose the individual document fields as object attributes (via `__getattr__`):

In [13]:
for hit in response:
    print(hit.tweet)

So yes, I am an Elasticsearch fanboy
However did I manage before Elasticsearch?
The Elasticsearch API is really easy to use
Elasticsearch surely is one of the hottest new NoSQL products
Elasticsearch means full text search has never been so easy
Elasticsearch is built for the cloud, easy to scale
Elasticsearch and I have left the honeymoon stage, and I still love her.
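To illustrate the attribute-access idea, here is a minimal stand-in that exposes dict keys through `__getattr__`. `AttrView` is a hypothetical class written for this note; it is a sketch of the technique, not the DSL library's actual Hit implementation. The sample document is the first hit shown earlier.

```python
class AttrView:
    """Read-only wrapper that exposes the keys of a dict as attributes."""

    def __init__(self, source):
        self._source = source

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so self._source itself is found without recursion.
        try:
            return self._source[name]
        except KeyError:
            raise AttributeError(name)


doc = {'date': '2014-09-23',
       'name': 'Mary Jones',
       'tweet': 'So yes, I am an Elasticsearch fanboy',
       'user_id': 2}
hit = AttrView(doc)
print(hit.tweet)
print(hit.user_id)
```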


### The _all field

> GET /_search?q=mary

In [14]:
res = es.search(q='mary')
print('Total hits:{}\n'.format(res['hits']['total']))
pprint(res['hits']['hits'][0])

Total hits:8

{'_id': '4',
 '_index': 'us',
 '_score': 0.6650044,
 '_source': {'date': '2014-09-14',
             'name': 'John Smith',
             'tweet': '@mary it is not just text, it does everything',
             'user_id': 1},
 '_type': 'tweet'}


For the DSL, we need to query the _all field explicitly:

In [15]:
s = Search(using=es) \
    .query('match', _all='mary')
response = s.execute()
print('Total hits:{}\n'.format(response['hits']['total']))
print(response[0].tweet)

Total hits:8

@mary it is not just text, it does everything


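A more complex query string, as in the book: the name field contains mary or john, the date is greater than 2014-09-10, and the _all field contains aggregations or geo:

> +name:(mary john) +date:>2014-09-10 +(aggregations geo)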

In [16]:
res = es.search(q='+name:(mary john) +date:>2014-09-10 +(aggregations geo)')
print('Total hits:{}\n'.format(res['hits']['total']))
pprint(res['hits']['hits'][0])

Total hits:1

{'_id': '9',
 '_index': 'gb',
 '_score': 2.3835227,
 '_source': {'date': '2014-09-19',
             'name': 'Mary Jones',
             'tweet': 'Geo-location aggregations are really cool',
             'user_id': 2},
 '_type': 'tweet'}
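For comparison, the query string above can be spelled out as a request body. The dict below is a sketch of a roughly equivalent bool query, assuming the default OR operator for the terms inside each clause. It has not been run against this dataset, so scores and hits may differ from the query string version.

```python
from pprint import pprint

# Each + clause in the query string becomes one entry in "must".
body = {
    'query': {
        'bool': {
            'must': [
                # +name:(mary john)
                {'match': {'name': 'mary john'}},
                # +date:>2014-09-10
                {'range': {'date': {'gt': '2014-09-10'}}},
                # +(aggregations geo), searched in the _all field
                {'match': {'_all': 'aggregations geo'}},
            ]
        }
    }
}
pprint(body)
```

With a live cluster this could be passed to the low-level API as `es.search(body=body)`.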
