## Introduction to Databases

### Using Elasticsearch

Based on [this tutorial](https://medium.com/naukri-engineering/elasticsearch-tutorial-for-beginners-using-python-b9cb48edcedc) and [the official Elasticsearch introduction](https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html).

Installing: https://www.willandskill.se/en/install-elasticsearch-6-x-on-ubuntu-18-04-lts/

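The code below uses the 6.x client API (`doc_type`), so install a 6.x client to match the 6.x server from the guide above:

!pip install -U "elasticsearch>=6.0.0,<7.0.0"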

In [1]:
!curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

{"acknowledged":true}

In [2]:
# Import Elasticsearch package 
from elasticsearch import Elasticsearch 

### What is Elasticsearch?

You know, for search (and analysis)

Elasticsearch is the distributed search and analytics engine at the heart of the [Elastic Stack](https://www.elastic.co/pt/elk-stack). 


Elasticsearch provides near real-time search and analytics for all types of data. Whether you have structured or unstructured text, numerical data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast searches. You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data. And as your data and query volume grows, the distributed nature of Elasticsearch enables your deployment to grow seamlessly right along with it.

While not every problem is a search problem, Elasticsearch offers speed and flexibility to handle data in a wide variety of use cases:

- Add a search box to an app or website
- Store and analyze logs, metrics, and security event data
- Use machine learning to automatically model the behavior of your data in real time
- Automate business workflows using Elasticsearch as a storage engine
- Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS)
- Store and process genetic data using Elasticsearch as a bioinformatics research tool

We’re continually amazed by the novel ways people use search. But whether your use case is similar to one of these, or you’re using Elasticsearch to tackle a new problem, the way you work with your data, documents, and indices in Elasticsearch is the same.

In [3]:
# Connect to the elastic cluster
es = Elasticsearch([{'host':'localhost','port':9200}])
es

<Elasticsearch([{'host': 'localhost', 'port': 9200}])>

Elasticsearch is document-oriented, meaning that it stores entire objects, or documents. It not only stores them but also indexes the content of each document to make it searchable. In Elasticsearch you index, search, sort, and filter documents.
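To make "indexing the content" concrete, here is a toy inverted index in plain Python. It maps each term to the set of document IDs that contain it, which is the core idea behind Lucene, the search library Elasticsearch is built on. This is a simplified sketch, not Elasticsearch's implementation; the tokenizer (lowercase, split on non-alphanumeric characters) is an assumption that only roughly resembles the standard analyzer.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map every term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "Love to play cricket",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}
index = build_inverted_index(docs)
print(sorted(index["like"]))     # [2, 3]
print(sorted(index["cricket"]))  # [1]
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is why searches stay fast as the data grows.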

Elasticsearch uses JSON as the serialisation format for the documents.
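The Python client converts dictionaries to JSON before sending them, so an employee record like `e1` below travels to the server as a JSON object. A minimal sketch with the standard `json` module, only to show how Python values map onto JSON (the client does this step for you):

```python
import json

employee = {
    "first_name": "nitin",
    "last_name": "panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ["sports", "music"],
}

# dicts become JSON objects, lists become arrays, ints stay numbers.
payload = json.dumps(employee)
print(payload)

# Parsing the JSON back yields an equal dictionary.
assert json.loads(payload) == employee
```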

Now let’s start by indexing the employee documents.

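The act of storing data in Elasticsearch is called indexing. An Elasticsearch cluster can contain multiple indices, each index holds multiple documents, and each document has multiple fields. Documents here also carry a mapping type (`employee`); older versions allowed several types per index, but since 6.0 an index can have only one, types are deprecated in 7.x, and they were removed in 8.0.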

In [4]:
e1 = {"first_name":"nitin",
      "last_name":"panwar",
      "age": 27,
      "about": "Love to play cricket",
      "interests": ['sports','music'],
     }

print(e1)

{'first_name': 'nitin', 'last_name': 'panwar', 'age': 27, 'about': 'Love to play cricket', 'interests': ['sports', 'music']}


### Inserting a document:

In [5]:
# Now let's store this document in Elasticsearch

res = es.index(index='megacorp',
               doc_type='employee',
               id=1,
               body=e1)

In [6]:
# Let's insert some more documents
e2 = {"first_name" :  "Jane",
      "last_name" :   "Smith",
      "age" :         32,
      "about" :       "I like to collect rock albums",
      "interests":  ["music"]
     }

e3 = {"first_name" :  "Douglas",
      "last_name" :   "Fir",
      "age" :         35,
      "about":        "I like to build cabinets",
      "interests":  ["forestry"]}

res = es.index(index='megacorp',
               doc_type='employee',
               id=2,
               body=e2)

res = es.index(index='megacorp',
               doc_type='employee',
               id=3,
               body=e3)

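AuthorizationException: AuthorizationException(403, 'cluster_block_exception', 'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')

This error means the index was put into read-only mode. Elasticsearch applies the `index.blocks.read_only_allow_delete` block automatically when a node's disk usage passes the flood-stage watermark (95% by default). Free up disk space, clear the block with the `curl` command from the first cell, and rerun the cell.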

In [None]:
print(res)

### Retrieving a Document:

This is easy in Elasticsearch. We simply execute an HTTP GET request and specify the address of the document: its index, type, and ID. Using those three pieces of information, we get back the original JSON document.
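The document "address" maps directly onto the REST path `/{index}/{type}/{id}` used by Elasticsearch 6.x (7.x and later use `/{index}/_doc/{id}` instead). The sketch below builds that GET request with the standard library without sending it; `document_url` is a hypothetical helper, and `localhost:9200` is assumed to be the cluster connected to above.

```python
from urllib.parse import quote
from urllib.request import Request


def document_url(index, doc_type, doc_id, base="http://localhost:9200"):
    """Hypothetical helper: build the 6.x-style URL of a single document."""
    # Percent-encode each path segment so names with spaces or slashes stay intact.
    parts = (quote(str(p), safe="") for p in (index, doc_type, doc_id))
    return base + "/" + "/".join(parts)


url = document_url("megacorp", "employee", 3)
req = Request(url, method="GET")  # only built here; nothing is sent
print(req.get_method(), req.full_url)  # GET http://localhost:9200/megacorp/employee/3
```

Passing `req` to `urllib.request.urlopen` would send it to a running cluster; `es.get(...)` issues the same request through the client.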

In [None]:
res = es.get(index='megacorp',
             doc_type='employee',
             id=3)

print(res)

The original document is returned in the `_source` field:

In [None]:
print(res['_source'])

### Deleting a document:

In [None]:
res = es.delete(index='megacorp',
                doc_type='employee',
                id=3)

print(res['result'])

Now let’s verify the deletion by counting the documents left in the index:

In [None]:
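# Make recent changes (like the delete above) visible to search right away
es.indices.refresh(index='megacorp')

res = es.search(index='megacorp',
                body={'query': {'match_all': {}}})

print('Got %d hits:' % res['hits']['total'])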
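In Elasticsearch 6.x, `hits.total` is a plain integer, which is what the `%d` format above expects. From 7.0 onward it is an object such as `{"value": 3, "relation": "eq"}`, and the `%d` formatting raises a `TypeError`. A small version-tolerant helper could look like this; `total_hits` is hypothetical (not part of the client), and the response dictionaries are hand-written examples of the two shapes, not real output.

```python
def total_hits(res):
    """Return the hit count from a search response (6.x int or 7.x+ object)."""
    total = res["hits"]["total"]
    if isinstance(total, dict):
        # 7.x+: {"value": N, "relation": "eq" | "gte"}; "gte" means N is a lower bound.
        return total["value"]
    return total


old_style = {"hits": {"total": 3, "hits": []}}
new_style = {"hits": {"total": {"value": 3, "relation": "eq"}, "hits": []}}
print("Got %d hits:" % total_hits(old_style))  # Got 3 hits:
print("Got %d hits:" % total_hits(new_style))  # Got 3 hits:
```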

### Search Lite:

A GET is fairly simple: you get back the document that you ask for. Let’s try something a little more advanced, like a simple search!

In [None]:
# match_all returns every document (an empty 'query' object is not a valid query)
res = es.search(index='megacorp',
                body={'query': {'match_all': {}}})

print(res['hits']['hits'])

Now let’s search for employees whose first name is nitin.

### match operator:

In [None]:
res = es.search(index='megacorp',
                body={'query':{'match':{'first_name':'nitin'}}})

print(res['hits']['hits'])

### bool operator:

The `bool` query combines other queries through the clauses `must`, `should`, `must_not`, and `filter`. Each clause takes a single query or a list of queries, such as `match` clauses or further nested `bool` queries.
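Because a `bool` query is just a nested dictionary, it can be convenient to assemble it from Python lists. `bool_query` below is a hypothetical helper, not part of the elasticsearch client; it only includes the clause types you pass, and the trailing underscore in `filter_` avoids shadowing Python's built-in `filter`.

```python
def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Hypothetical helper: wrap lists of clauses in a bool query body."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filter_}
    # Keep only the clause types that were actually passed (and are non-empty).
    bool_body = {name: list(queries) for name, queries in clauses.items() if queries}
    return {"query": {"bool": bool_body}}


q = bool_query(must=[{"match": {"first_name": "nitin"}}])
print(q)

q2 = bool_query(must=[{"match": {"first_name": "nitin"}}],
                filter_=[{"range": {"age": {"gt": 25}}}])
print(q2)
```

`q` is the same body passed to `es.search` in the next cell, and `q2` is equivalent to the filtered query in the following section (a clause may hold a single query or a list).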

In [None]:
res = es.search(index='megacorp',
                body={'query':{'bool':{'must':[{'match':{'first_name':'nitin'}}]}}}
               )
print(res['hits']['hits'])

### Filter operator:

Let’s make the search a little more complicated. We still want to find all employees with a first name of nitin, but only those who are older than 25. The query changes slightly to add a filter, which lets us execute structured searches efficiently:

In [None]:
res= es.search(index='megacorp',
               body={'query':{'bool':{'must':{'match':{'first_name':'nitin'}},
                                      "filter":{"range":{"age":{"gt":25}}}}}}
              )
print(res['hits']['hits'])
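The `range` clause accepts the bounds `gt`, `gte`, `lt`, and `lte`, and clauses under `filter` only include or exclude documents without contributing to the relevance score. As a rough local illustration of which values pass such a range filter (plain Python, not how Elasticsearch evaluates it; `matches_range` is a hypothetical name, and the ages come from the sample employees):

```python
import operator

# Map each range bound keyword to the comparison it performs.
_RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def matches_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gt": 25, "lte": 40}."""
    return all(_RANGE_OPS[op](value, limit) for op, limit in bounds.items())


ages = {"nitin": 27, "Jane": 32, "Douglas": 35}
print([name for name, age in ages.items() if matches_range(age, {"gt": 25})])
print([name for name, age in ages.items() if matches_range(age, {"gt": 30, "lt": 35})])
```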

### Full text search

The searches so far have been simple. Let’s try a more advanced full-text search. Before that, let’s insert one more document.

In [None]:
e4 = {"first_name":"asd",
      "last_name":"pafdfd",
      "age": 27,
      "about": "Love to play football",
      "interests": ['sports','music'],}

res = es.index(index='megacorp',
               doc_type='employee',
               id=4,
               body=e4)

print(res['result'])  # 'created' for a new document, 'updated' if the ID already existed

In [None]:
res = es.search(index='megacorp',
                doc_type='employee',
                body={'query':{'match':{"about":"play cricket"}}})

for hit in res['hits']['hits']:
    print(hit['_source']['about']) 
    print(hit['_score'])
    print('**********************')

### Phrase Search

Finding individual words in a field is all well and good, but sometimes you want to match an exact sequence of words, that is, a phrase.

In [None]:
res = es.search(index='megacorp',
                doc_type='employee',
                body={'query':{'match_phrase':{"about":"play cricket"}}})

for hit in res['hits']['hits']:
    print(hit['_source']['about']) 
    print(hit['_score'])
    print('**********************')
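The difference between the two searches can be illustrated locally. A `match` query (with its default `or` operator) returns documents containing any of the analyzed query terms, so both "Love to play cricket" and "Love to play football" match "play cricket", while `match_phrase` requires the terms to appear consecutively and in order. This toy version assumes a simple lowercase tokenizer and ignores scoring entirely; the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (toy analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def match_any(query, text):
    """Like a match query with the default 'or' operator: any shared term."""
    return bool(set(tokenize(query)) & set(tokenize(text)))


def match_phrase(query, text):
    """True if the query terms appear consecutively, in order, in the text."""
    q, t = tokenize(query), tokenize(text)
    if not q:
        return False
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


abouts = ["Love to play cricket", "Love to play football", "I like to collect rock albums"]
print([a for a in abouts if match_any("play cricket", a)])
print([a for a in abouts if match_phrase("play cricket", a)])
```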

### Aggregations

Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. Aggregations are similar to `GROUP BY` in SQL, but more powerful.

In [None]:
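# Dynamically mapped strings are `text` with a `.keyword` sub-field; aggregate on the latter
res = es.search(index='megacorp',
                doc_type='employee',
                body={"size": 0,  # skip the hits, return only the aggregation
                      "aggs": {"all_interests": {"terms": {"field": "interests.keyword"}}}})
for bucket in res['aggregations']['all_interests']['buckets']:
    print(bucket['key'], bucket['doc_count'])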
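To see what the `terms` aggregation computes, here is the equivalent of `GROUP BY interests` with a count, done locally in plain Python over the sample employees still in the index (Douglas was deleted earlier). A `terms` aggregation counts documents per distinct value and, by default, orders buckets by `doc_count` descending; breaking ties alphabetically is a choice of this sketch, not a claim about Elasticsearch.

```python
from collections import Counter

# Interests of the sample employees still in the index (Douglas was deleted).
interests = {
    "nitin": ["sports", "music"],
    "Jane": ["music"],
    "asd": ["sports", "music"],
}

# Count each distinct value once per document, like doc_count in a terms bucket.
counts = Counter(term for terms in interests.values() for term in set(terms))

# Order by count descending, then alphabetically, and shape like ES buckets.
buckets = [
    {"key": term, "doc_count": n}
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
]
print(buckets)  # [{'key': 'music', 'doc_count': 3}, {'key': 'sports', 'doc_count': 2}]
```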