# Elasticsearch

In this notebook we will set up an Elasticsearch server, read in Shakespeare's works, and analyze them to understand term vectors.

You may mix direct API calls, the Python API, or URL calls from Python; use whatever gives you access to the data.



### Install the necessary elasticsearch Python packages

In [None]:
!pip install 'elasticsearch<7.14.0'

# docs are here https://elasticsearch-py.readthedocs.io/en/v7.13.4/#

### Import packages

In [None]:
import os
import time
from elasticsearch import Elasticsearch
import numpy as np
import pandas as pd

## Setup Elasticsearch Instance


In [None]:
%%bash

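# Download the OSS build of Elasticsearch 7.9.2 and its SHA-512 checksum file.
wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512
# Verify the archive before unpacking it.
shasum -a 512 -c elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
# Elasticsearch refuses to run as root, so hand the files to the daemon user.
sudo chown -R daemon:daemon elasticsearch-7.9.2/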

Run the instance as a daemon (background) process.

In [None]:
%%bash --bg

sudo -H -u daemon elasticsearch-7.9.2/bin/elasticsearch

In [None]:
# Sleep for a few seconds to let the instance start (needed when running the notebook end-to-end).
time.sleep(20)
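
A fixed sleep can be too short on a slow machine and wasteful on a fast one. As an alternative, here is a minimal sketch that polls the base endpoint until it answers. The helper names `es_is_up` and `wait_until` are hypothetical, and the URL assumes the default port used elsewhere in this notebook.

```python
import time
import urllib.error
import urllib.request


def es_is_up(url="http://localhost:9200/", timeout=2.0):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the node is not ready yet.
        return False


def wait_until(probe, timeout=60.0, interval=1.0):
    """Call `probe()` every `interval` seconds until it returns True.

    Returns True on success, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# In the notebook (assumes the server cell above has been run):
# wait_until(es_is_up, timeout=60)
```

Passing the probe in as a function keeps the retry loop independent of Elasticsearch, so the same loop could wait on any other readiness check.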

Query the base endpoint to retrieve information about the cluster.

In [None]:
%%bash

curl -sX GET "localhost:9200/"

### Data

Download the Shakespeare data.

In [None]:
%%bash 

wget 'https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json' -q

In [None]:
%%bash

head -5 shakespeare_6.0.json
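
The file is newline-delimited JSON meant for the bulk API. Assuming it follows the bulk layout of alternating action lines and document lines, the following sketch pairs them up so the documents can be inspected from Python. The helper name `iter_bulk_pairs` is hypothetical.

```python
import json
from itertools import islice


def iter_bulk_pairs(lines):
    """Yield (action, document) dict pairs from bulk-format NDJSON lines.

    Assumes every action line is followed by exactly one document line,
    which holds for `index` and `create` actions.
    """
    non_empty = (line for line in lines if line.strip())
    while True:
        pair = list(islice(non_empty, 2))
        if len(pair) < 2:
            return
        yield json.loads(pair[0]), json.loads(pair[1])


# In the notebook, after the download cell has run:
# with open("shakespeare_6.0.json") as fh:
#     for action, doc in islice(iter_bulk_pairs(fh), 3):
#         print(action, doc)
```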

In [None]:
from elasticsearch import helpers, Elasticsearch
import csv

ES_NODES = "http://localhost:9200"

es = Elasticsearch(hosts=[ES_NODES])
index_name = 'shakespeare'
doctype = 'shakespeare_works'

# Start from a clean index; ignore "bad request" and "not found" on delete.
es.indices.delete(index=index_name, ignore=[400, 404])

# Create the index with an explicit mapping. `text_entry` stores term vectors
# (with positions and offsets) and enables fielddata for aggregations.
es.indices.create(
    index=index_name,
    ignore=400,
    body={
        "mappings": {
            "properties": {
                "speaker": {"type": "keyword"},
                "play_name": {"type": "keyword"},
                "line_id": {"type": "integer"},
                "speech_number": {"type": "integer"},
                "text_entry": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "fielddata": True,
                },
            }
        }
    },
)

Bulk upload the data.

In [None]:
! curl -s -q -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare_6.0.json 
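
The bulk response is long, so failures are easy to miss when scrolling. The sketch below summarizes a parsed `_bulk` response body. It assumes you have captured the response as a Python dict (for example by saving the curl output to a file and loading it with `json.load`); the helper name `summarize_bulk` is hypothetical.

```python
def summarize_bulk(response):
    """Return (ok_count, failures) for a parsed _bulk response body.

    `failures` is a list of (document id, error object) tuples.
    """
    ok = 0
    failures = []
    for item in response.get("items", []):
        # Each item has a single key naming the action (index, create, ...).
        _action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((result.get("_id"), result["error"]))
        else:
            ok += 1
    return ok, failures


# Example use, assuming the response was saved to a hypothetical file:
# import json
# with open("bulk_response.json") as fh:
#     ok, failures = summarize_bulk(json.load(fh))
# print(ok, "indexed;", len(failures), "failed")
```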

In [None]:
! curl http://localhost:9200/_cat/indices

### Extract term vectors
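
One possible starting point, offered as a sketch rather than the intended solution: request the term vectors for a single document through the Python client's `termvectors` call and flatten the response into rows. The helper names `fetch_term_vector` and `term_table` are hypothetical, and the document id you pass in is up to you.

```python
def fetch_term_vector(es, doc_id, index="shakespeare", field="text_entry"):
    """Request term vectors, including corpus statistics, for one document.

    `es` is an Elasticsearch client such as the one created above.
    """
    return es.termvectors(
        index=index,
        id=doc_id,
        fields=[field],
        term_statistics=True,  # adds doc_freq and ttf to every term
    )


def term_table(response, field="text_entry"):
    """Flatten a _termvectors response into sorted rows.

    Each row is (term, term_freq, doc_freq, ttf). The last two are None
    when the request did not ask for term statistics.
    """
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    rows = [
        (term, stats["term_freq"], stats.get("doc_freq"), stats.get("ttf"))
        for term, stats in terms.items()
    ]
    return sorted(rows)


# In the notebook (the id is only an example):
# response = fetch_term_vector(es, "1")
# pd.DataFrame(term_table(response),
#              columns=["term", "term_freq", "doc_freq", "ttf"])
```

Here `term_freq` counts occurrences within the document, `doc_freq` counts documents containing the term, and `ttf` is the total number of occurrences across the index.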

### Find a rare term
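
One way to approach this, assuming you already hold a `_termvectors` response that was requested with `term_statistics=True`: rank the document's terms by `doc_freq` and take the lowest. The helper name `rarest_terms` is hypothetical, and this only ranks the terms of one document, not the whole corpus.

```python
def rarest_terms(response, field="text_entry", n=5):
    """Return up to `n` (term, doc_freq) pairs with the lowest doc_freq.

    Ties are broken alphabetically by term.
    """
    terms = response["term_vectors"][field]["terms"]
    ranked = sorted((stats["doc_freq"], term) for term, stats in terms.items())
    return [(term, doc_freq) for doc_freq, term in ranked[:n]]


# In the notebook, with `response` holding a _termvectors result:
# rarest_terms(response, n=3)
```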

### Search for the term
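
A possible sketch for this step: since terms taken from a term vector are already analyzed, a `term` query on `text_entry` can look them up directly. The helper names `build_term_query` and `hit_lines` are hypothetical, and the search term in the usage comment is only a placeholder.

```python
def build_term_query(term, field="text_entry", size=10):
    """Build a search body that matches documents containing `term` exactly.

    A `term` query is not analyzed, so pass the token as it appears in the
    term vector (typically lowercased by the standard analyzer).
    """
    return {"size": size, "query": {"term": {field: {"value": term}}}}


def hit_lines(response):
    """Extract (id, play_name, text_entry) tuples from a search response."""
    return [
        (hit["_id"],
         hit["_source"].get("play_name"),
         hit["_source"].get("text_entry"))
        for hit in response["hits"]["hits"]
    ]


# In the notebook, with `es` being the client created above:
# response = es.search(index="shakespeare", body=build_term_query("some_term"))
# print(response["hits"]["total"])
# for row in hit_lines(response):
#     print(row)
```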
