<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Inserting-a-Document" data-toc-modified-id="Inserting-a-Document-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Inserting a Document</a></span></li><li><span><a href="#Retrieving-a-Document" data-toc-modified-id="Retrieving-a-Document-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Retrieving a Document</a></span></li><li><span><a href="#Deleting-a-Document" data-toc-modified-id="Deleting-a-Document-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Deleting a Document</a></span></li><li><span><a href="#Search-Lite:" data-toc-modified-id="Search-Lite:-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Search Lite:</a></span><ul class="toc-item"><li><span><a href="#match-operator:" data-toc-modified-id="match-operator:-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span><code>match operator:</code></a></span></li><li><span><a href="#bool-operator:" data-toc-modified-id="bool-operator:-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span><code>bool operator:</code></a></span></li><li><span><a href="#Filter-operator:" data-toc-modified-id="Filter-operator:-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Filter operator:</a></span></li></ul></li><li><span><a href="#Full-text-search" data-toc-modified-id="Full-text-search-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Full text search</a></span></li><li><span><a href="#Phrase-Search" data-toc-modified-id="Phrase-Search-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Phrase Search</a></span></li><li><span><a href="#Aggregations" data-toc-modified-id="Aggregations-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Aggregations</a></span></li></ul></div>

# Elasticsearch Using Python

In [2]:
# Import Elasticsearch package 
from elasticsearch import Elasticsearch 

In [3]:
# Connect to the elastic cluster
es=Elasticsearch([{'host':'localhost','port':9200}])
es

<Elasticsearch([{'host': 'localhost', 'port': 9200}])>

`Elasticsearch` is document oriented, meaning that it stores entire objects, or documents.

It not only stores them, but also indexes the content of each document in order to make them searchable. In Elasticsearch you `index`, `search`, `sort` and `filter` documents.
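To make "indexes the content of each document" more concrete, here is a minimal sketch of an inverted index in plain Python. It is only an illustration of the idea, not how Elasticsearch is implemented: the `tokenize` and `build_inverted_index` helpers are hypothetical names, and the whitespace tokenizer is a crude stand-in for a real analyzer.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for an analyzer)."""
    return str(text).lower().split()


def build_inverted_index(docs):
    """Map each (field, token) pair to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            # Treat a scalar as a one-element list so both cases share a loop.
            values = value if isinstance(value, list) else [value]
            for item in values:
                for token in tokenize(item):
                    index[(field, token)].add(doc_id)
    return index


# Two small sample documents keyed by a made-up ID.
docs = {
    1: {"first_name": "nitin", "about": "Love to play cricket",
        "interests": ["sports", "music"]},
    2: {"first_name": "Jane", "about": "I like to collect rock albums",
        "interests": ["music"]},
}

inverted = build_inverted_index(docs)
print(sorted(inverted[("interests", "music")]))  # [1, 2]
print(sorted(inverted[("about", "cricket")]))    # [1]
```

Looking up a token is then a dictionary access rather than a scan over every document, which is the property that makes searching fast.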

In [4]:
e1={
    "first_name":"nitin",
    "last_name":"panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports','music'],
}

In [5]:
print(e1)

{'first_name': 'nitin', 'last_name': 'panwar', 'age': 27, 'about': 'Love to play cricket', 'interests': ['sports', 'music']}


## Inserting a Document

In [6]:
#Now let's store this document in Elasticsearch 
res = es.index(index='megacorp',doc_type='employee',id=1,body=e1)

Simple! There was no need to perform any administrative tasks first, like creating an index or specifying the type of data that each field contains. We could just index a document directly. Elasticsearch ships with defaults for everything, so all the necessary administration tasks were taken care of in the background, using default values.
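One of those defaults is dynamic mapping: Elasticsearch guesses a type for each new field from the JSON value it sees. The sketch below approximates those guesses in plain Python, under simplifying assumptions. `guess_field_type` is a hypothetical helper, date and numeric detection in strings are ignored, and strings are reported simply as `text` even though the default mapping also adds a `keyword` sub-field.

```python
def guess_field_type(value):
    """Approximate the type dynamic mapping would pick for a JSON value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        non_null = [item for item in value if item is not None]
        return guess_field_type(non_null[0]) if non_null else None
    if isinstance(value, dict):
        return "object"
    return None


e1 = {
    "first_name": "nitin",
    "last_name": "panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ["sports", "music"],
}

guessed = {field: guess_field_type(value) for field, value in e1.items()}
print(guessed)
# {'first_name': 'text', 'last_name': 'text', 'age': 'long', 'about': 'text', 'interests': 'text'}
```

The mapping Elasticsearch actually created can be inspected on the cluster itself; this sketch only shows why no schema had to be declared up front.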

In [8]:
# Let's insert some more documents
e2={
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
e3={
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

res=es.index(index='megacorp',doc_type='employee',id=2,body=e2)
print(res['result'])
res=es.index(index='megacorp',doc_type='employee',id=3,body=e3)
print(res['result'])

## Retrieving a Document
Retrieving a document is just as easy. The client's `get` method issues an HTTP GET request for the address of the document: its index, type, and ID. With those three pieces of information, Elasticsearch returns the original JSON document.

In [11]:
res=es.get(index='megacorp',doc_type='employee',id=3)
print(res)

{'_index': 'megacorp', '_type': 'employee', '_id': '3', '_version': 1, '_seq_no': 3, '_primary_term': 1, 'found': True, '_source': {'first_name': 'Douglas', 'last_name': 'Fir', 'age': 35, 'about': 'I like to build cabinets', 'interests': ['forestry']}}


The actual document is in the `_source` field.

In [12]:
print(res['_source'])

{'first_name': 'Douglas', 'last_name': 'Fir', 'age': 35, 'about': 'I like to build cabinets', 'interests': ['forestry']}


## Deleting a Document

In [13]:
res=es.delete(index='megacorp',doc_type='employee',id=3)
print(res['result'])

deleted


In [16]:
# validate in Elasticsearch
res= es.search(index='megacorp',body={'query':{'match_all':{}}})
print('Got {} hits:'.format(res['hits']['total']['value']))

Got 2 hits:


## Search Lite:
A GET is fairly simple — you get back the document that you ask for. Let’s try something a little more advanced, like a simple search!

In [49]:
res= es.search(index='megacorp')

In [50]:
res.keys()

dict_keys(['took', 'timed_out', '_shards', 'hits'])

In [51]:
res['hits']['hits']

[{'_index': 'megacorp',
  '_type': 'employee',
  '_id': '1',
  '_score': 1.0,
  '_source': {'first_name': 'nitin',
   'last_name': 'panwar',
   'age': 27,
   'about': 'Love to play cricket',
   'interests': ['sports', 'music']}},
 {'_index': 'megacorp',
  '_type': 'employee',
  '_id': '4',
  '_score': 1.0,
  '_source': {'first_name': 'asd',
   'last_name': 'pafdfd',
   'age': 27,
   'about': 'Love to play football',
   'interests': ['sports', 'music']}},
 {'_index': 'megacorp',
  '_type': 'employee',
  '_id': '2',
  '_score': 1.0,
  '_source': {'first_name': 'Jane',
   'last_name': 'Smith',
   'age': 32,
   'about': 'I like to collect rock albums',
   'interests': ['music']}}]

### `match operator:`

In [27]:
res= es.search(index='megacorp',body={'query':{'match':{'first_name':'nitin'}}})
res['hits']['hits']

[{'_index': 'megacorp',
  '_type': 'employee',
  '_id': '1',
  '_score': 1.2039728,
  '_source': {'first_name': 'nitin',
   'last_name': 'panwar',
   'age': 27,
   'about': 'Love to play cricket',
   'interests': ['sports', 'music']}}]

In [37]:
import pandas as pd
pd.DataFrame.from_dict(res['hits']['hits'][0]['_source'])

| | first_name | last_name | age | about | interests |
|---|---|---|---|---|---|
| 0 | nitin | panwar | 27 | Love to play cricket | sports |
| 1 | nitin | panwar | 27 | Love to play cricket | music |
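The frame above has two rows for a single hit because the `interests` list is spread across rows while the scalar fields are repeated. If one row per hit is preferred, a possible approach is to flatten the hits into a list of dictionaries first. The `hits_to_rows` helper is a hypothetical name, and the sample response is a trimmed copy of the search output shown earlier.

```python
def hits_to_rows(response):
    """Return one flat dict per hit: _id, _score, then the _source fields."""
    rows = []
    for hit in response["hits"]["hits"]:
        row = {"_id": hit["_id"], "_score": hit["_score"]}
        row.update(hit["_source"])
        rows.append(row)
    return rows


# Trimmed copy of the search response from the match query above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "megacorp",
                "_id": "1",
                "_score": 1.2039728,
                "_source": {
                    "first_name": "nitin",
                    "last_name": "panwar",
                    "age": 27,
                    "about": "Love to play cricket",
                    "interests": ["sports", "music"],
                },
            }
        ]
    }
}

rows = hits_to_rows(sample_response)
print(len(rows))             # 1
print(rows[0]["interests"])  # ['sports', 'music']
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per hit, with the `interests` list kept in a single cell.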


### `bool operator:`
`bool` takes a dictionary containing at least one of `must`, `should`, `must_not`, or `filter`, each of which takes a list of `match` clauses or other query operators.

In [38]:
res= es.search(index='megacorp',body={
        'query':{
            'bool':{
                'must':[{
                        'match':{
                            'first_name':'nitin'
                        }
                    }]
            }
        }
    })
print(res['hits']['hits'])

[{'_index': 'megacorp', '_type': 'employee', '_id': '1', '_score': 1.2039728, '_source': {'first_name': 'nitin', 'last_name': 'panwar', 'age': 27, 'about': 'Love to play cricket', 'interests': ['sports', 'music']}}]
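Nested query dictionaries get hard to read as clauses are added. A small builder can help. The sketch below assumes a hypothetical `bool_query` helper that only assembles the request body. It does not talk to Elasticsearch, and the result would be passed as `body=` to `es.search`.

```python
def bool_query(must=None, should=None, must_not=None, filters=None):
    """Assemble a bool query body from lists of clauses, skipping empty ones."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filters,
    }
    bool_body = {name: list(value) for name, value in clauses.items() if value}
    if not bool_body:
        raise ValueError("a bool query needs at least one clause")
    return {"query": {"bool": bool_body}}


# Same query as the cell above, built with the helper.
body = bool_query(must=[{"match": {"first_name": "nitin"}}])
print(body)
# {'query': {'bool': {'must': [{'match': {'first_name': 'nitin'}}]}}}

# Combining clause types: first name must match, last name must not.
combined = bool_query(
    must=[{"match": {"first_name": "nitin"}}],
    must_not=[{"match": {"last_name": "smith"}}],
)
print(sorted(combined["query"]["bool"]))  # ['must', 'must_not']
```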


### Filter operator:

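We still want to find all employees with a first name of nitin, but only those older than a given age. The query changes a little to accommodate a `filter`, which lets us run structured searches efficiently. The first query below uses `gt: 25` and matches nitin (age 27); the second uses `gt: 27` and returns nothing, because `gt` is a strict comparison.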

In [39]:
res= es.search(index='megacorp',body={
        'query':{
            'bool':{
                'must':{
                    'match':{
                        'first_name':'nitin'
                    }
                },
                "filter":{
                    "range":{
                        "age":{
                            "gt":25
                        }
                    }
                }
            }
        }
    })
print(res['hits']['hits'])

[{'_index': 'megacorp', '_type': 'employee', '_id': '1', '_score': 1.2039728, '_source': {'first_name': 'nitin', 'last_name': 'panwar', 'age': 27, 'about': 'Love to play cricket', 'interests': ['sports', 'music']}}]


In [40]:
res= es.search(index='megacorp',body={
        'query':{
            'bool':{
                'must':{
                    'match':{
                        'first_name':'nitin'
                    }
                },
                "filter":{
                    "range":{
                        "age":{
                            "gt":27
                        }
                    }
                }
            }
        }
    })
print(res['hits']['hits'])

[]
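The empty result follows from `gt` being strictly greater than: an age of 27 passes `gt: 25` but not `gt: 27`. The sketch below mimics the comparison locally so the boundary behaviour is easy to check. `range_matches` is a hypothetical helper that only covers the `gt`, `gte`, `lt`, and `lte` keys of a `range` clause.

```python
import operator

# The four comparison keys a range clause accepts, mapped to Python operators.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def range_matches(value, bounds):
    """Return True if value satisfies every bound, e.g. {'gt': 25, 'lte': 40}."""
    return all(RANGE_OPS[name](value, bound) for name, bound in bounds.items())


print(range_matches(27, {"gt": 25}))              # True
print(range_matches(27, {"gt": 27}))              # False
print(range_matches(27, {"gte": 27}))             # True
print(range_matches(27, {"gte": 27, "lt": 30}))   # True
```

To include employees who are exactly 27, the query would use `gte` instead of `gt`.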


## Full text search

In [41]:
# Before starting the next type of search, let me insert one more document.
e4={
    "first_name":"asd",
    "last_name":"pafdfd",
    "age": 27,
    "about": "Love to play football",
    "interests": ['sports','music'],
}
res=es.index(index='megacorp',doc_type='employee',id=4,body=e4)

In [43]:
res= es.search(index='megacorp',body={
        'query':{
            'match':{
                "about":"play cricket"
            }
        }
})

In [44]:
for hit in res['hits']['hits']:
    print( hit['_source']['about'] )
    print( hit['_score'])
    print( '**********************')

Love to play cricket
2.4633062
**********************
Love to play football
0.9534808
**********************


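In the above example, two records are returned, but their scores differ: the document containing both `play` and `cricket` scores higher than the one containing only `play`.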
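Both documents come back because a `match` query, by default, accepts a document that contains any of the query terms. The sketch below reproduces that behaviour locally under simplifying assumptions: `matched_terms` is a hypothetical helper, tokenization is a plain lowercase split, and the count of matched terms is only a rough proxy for ranking. The real `_score` comes from Elasticsearch's relevance function, not from this count.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for an analyzer)."""
    return text.lower().split()


def matched_terms(query, field_text):
    """Return the query terms that also occur in the field text."""
    field_tokens = set(tokenize(field_text))
    return [term for term in tokenize(query) if term in field_tokens]


abouts = [
    "Love to play cricket",
    "Love to play football",
    "I like to collect rock albums",
]

for about in abouts:
    terms = matched_terms("play cricket", about)
    if terms:  # any matching term is enough for the document to be a hit
        print(about, terms)
# Love to play cricket ['play', 'cricket']
# Love to play football ['play']
```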

## Phrase Search
Finding individual words in a field is all well and good, but sometimes you want to match exact sequences of words, or phrases.

In [46]:
res= es.search(index='megacorp',body={
        'query':{
            'match_phrase':{
                "about":"play cricket"
            }
        }
    })
for hit in res['hits']['hits']:
    print( hit['_source']['about'] )
    print( hit['_score'])
    print( '**********************')

Love to play cricket
1.5408845
**********************
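Only the cricket document is returned this time because `match_phrase` requires the terms to appear next to each other and in the same order. A minimal local sketch of that check follows. `contains_phrase` is a hypothetical helper, and tokenization is again a plain lowercase split rather than a real analyzer.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for an analyzer)."""
    return text.lower().split()


def contains_phrase(phrase, field_text):
    """Return True if the phrase tokens occur consecutively, in order."""
    phrase_tokens = tokenize(phrase)
    field_tokens = tokenize(field_text)
    size = len(phrase_tokens)
    if size == 0:
        return False
    # Slide a window of the phrase length over the field tokens.
    return any(
        field_tokens[start:start + size] == phrase_tokens
        for start in range(len(field_tokens) - size + 1)
    )


print(contains_phrase("play cricket", "Love to play cricket"))   # True
print(contains_phrase("play cricket", "Love to play football"))  # False
print(contains_phrase("cricket play", "Love to play cricket"))   # False
```

The last call shows that word order matters: the same two words in reverse do not count as the phrase.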


## Aggregations