# Elasticsearch installation notes

* To download Elasticsearch on Linux:  
`wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.9.2-linux-x86_64.tar.gz`

Next, uncompress the archive file:

`tar -xzf elasticsearch-8.9.2-linux-x86_64.tar.gz`

Then, set the `ES_HOME` environment variable to the Elasticsearch installation directory. Finally, change into the `$ES_HOME/bin` directory and start the Elasticsearch server with the command:
`./elasticsearch`

In the output logs, note the password that is generated for the `elastic` user and store it in the `ELASTIC_PASSWORD` environment variable.


# Python ES installation

In [2]:
#!pip install elasticsearch

Collecting elasticsearch
  Downloading elasticsearch-8.9.0-py3-none-any.whl (395 kB)
     -------------------------------------- 395.5/395.5 kB 4.9 MB/s eta 0:00:00
Collecting elastic-transport<9,>=8
  Downloading elastic_transport-8.4.0-py3-none-any.whl (59 kB)
     ---------------------------------------- 59.5/59.5 kB ? eta 0:00:00
Installing collected packages: elastic-transport, elasticsearch
Successfully installed elastic-transport-8.4.0 elasticsearch-8.9.0


# Sample Python code

## Creating ES client

In [2]:
from elasticsearch import Elasticsearch
import os

# Password for the 'elastic' user generated by Elasticsearch
ELASTIC_PASSWORD = os.getenv("ELASTIC_PASSWORD")
ES_HOME = os.getenv("ES_HOME")

# Create the client instance
client = Elasticsearch(
    "https://localhost:9200",
    ca_certs=ES_HOME + "/config/certs/http_ca.crt",
    basic_auth=("elastic", ELASTIC_PASSWORD)
)

# Successful response!
client.info()
# {'name': 'instance-0000000000', 'cluster_name': ...}

ObjectApiResponse({'name': 'snow-mountain', 'cluster_name': 'elasticsearch', 'cluster_uuid': '5_UH2ScBTN6uswcLp5Zkwg', 'version': {'number': '8.9.2', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': 'e8179018838f55b8820685f92e245abef3bddc0f', 'build_date': '2023-08-31T02:43:14.210479707Z', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})
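
If either `ES_HOME` or `ELASTIC_PASSWORD` is unset, `os.getenv` returns `None` and the cell above fails with a `TypeError` on the string concatenation. The following is a minimal sketch of a check that fails with a clearer message; `load_es_settings` is a hypothetical helper name, not part of the `elasticsearch` package.

```python
import os
from pathlib import Path


def load_es_settings(env=None):
    """Return (ca_cert_path, password), or raise if a variable is missing.

    `env` is any mapping of variable names to values; it defaults to
    os.environ. This helper is illustrative only.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("ES_HOME", "ELASTIC_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Same certificate location as in the cell above.
    ca_cert = Path(env["ES_HOME"]) / "config" / "certs" / "http_ca.crt"
    return str(ca_cert), env["ELASTIC_PASSWORD"]


# Usage with the client from the cell above (needs a running server):
# ca_certs, password = load_es_settings()
# client = Elasticsearch("https://localhost:9200", ca_certs=ca_certs,
#                        basic_auth=("elastic", password))
```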

## Creating an index and adding a document

In [3]:
from datetime import datetime

doc = {
    'author': 'author_name',
    'text': 'Interesting content...',
    'timestamp': datetime.now(),
}
resp = client.index(index="test-index", id=1, document=doc)
print(resp['result'])

updated
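
`datetime.now()` produces a naive local timestamp, so the stored value carries no UTC offset. A small sketch of building the same document with a timezone-aware timestamp instead; `make_doc` is a hypothetical helper, and using UTC is an assumption about what is wanted here.

```python
from datetime import datetime, timezone


def make_doc(author, text):
    """Build a document like the one above, with a UTC-aware timestamp."""
    return {
        "author": author,
        "text": text,
        "timestamp": datetime.now(timezone.utc),
    }


doc = make_doc("author_name", "Interesting content...")
# The ISO form now ends with an explicit "+00:00" offset.
print(doc["timestamp"].isoformat())
```

Note that `resp['result']` is `created` the first time an ID is indexed and `updated` when a document with that ID already exists, so re-running the cell above prints `updated`.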


## Getting a document

In [4]:
resp = client.get(index="test-index", id=1)
print(resp['_source'])

{'author': 'author_name', 'text': 'Interesting content...', 'timestamp': '2023-09-15T20:17:11.789704'}
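
`client.get` raises an error when the ID does not exist. One way to handle that, sketched below, is to check with `client.exists` first; `get_source_or_none` is a hypothetical helper, and the two-step check costs an extra round trip.

```python
def get_source_or_none(client, index, doc_id):
    """Return the document's _source, or None if the ID is not in the index.

    `client` is expected to behave like the Elasticsearch client created
    earlier (it needs `exists` and `get` methods).
    """
    if not client.exists(index=index, id=doc_id):
        return None
    return client.get(index=index, id=doc_id)["_source"]
```

With the client from the earlier cell, `get_source_or_none(client, "test-index", 1)` would return the same dictionary that is printed above.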


## Refreshing an index

In [5]:
client.indices.refresh(index="test-index")

ObjectApiResponse({'_shards': {'total': 2, 'successful': 1, 'failed': 0}})
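
A refresh makes recently indexed documents visible to search. Instead of refreshing the whole index after a write, the index call itself can wait until the change is searchable. A minimal sketch, assuming the `refresh="wait_for"` option of the index API; `index_and_wait` is a hypothetical helper name.

```python
def index_and_wait(client, index, doc_id, doc):
    """Index `doc` and return only once it is visible to search.

    `client` is expected to behave like the Elasticsearch client created
    earlier. Returns the 'result' field of the response.
    """
    resp = client.index(index=index, id=doc_id, document=doc, refresh="wait_for")
    return resp["result"]
```

The `'total': 2, 'successful': 1` in the output above is most likely because a single-node cluster cannot allocate the index's default replica shard, so only the primary is refreshed.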

## Updating a document

In [6]:
doc = {
    'author': 'author_name',
    'text': 'Interesting modified content...',
    'timestamp': datetime.now(),
}
resp = client.update(index="test-index", id=1, doc=doc)
print(resp['result'])

updated
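
`client.update` with `doc=` performs a partial update: the fields passed in are merged into the stored document, so unchanged fields do not need to be sent again. The sketch below computes just the changed top-level fields; `changed_fields` is a hypothetical helper and it does not look inside nested objects.

```python
def changed_fields(old, new):
    """Return only the top-level fields of `new` that differ from `old`."""
    return {
        key: value
        for key, value in new.items()
        if key not in old or old[key] != value
    }


stored = {"author": "author_name", "text": "Interesting content..."}
edited = {"author": "author_name", "text": "Interesting modified content..."}

partial = changed_fields(stored, edited)
print(partial)  # {'text': 'Interesting modified content...'}

# With the client from the earlier cell:
# client.update(index="test-index", id=1, doc=partial)
```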


## Adding another document to the same index

In [7]:
import os
from tika import parser

In [8]:
path = '/home/asif/Downloads/docs'
parsed_docs = []
files = (file for file in os.listdir(path)
         if os.path.isfile(os.path.join(path, file)))
for file in files:
    file_with_path = os.path.join(path, file)
    parsed_docs.append(parser.from_file(file_with_path))
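
The directory listing above can also be written with `pathlib`, and it may be worth dropping parsed results that contain no text before indexing them. This is a sketch under the assumption that Tika returns a dictionary whose `content` entry can be `None` or blank when no text is extracted; `regular_files` and `non_empty` are hypothetical helper names.

```python
from pathlib import Path


def regular_files(directory):
    """Return the regular files directly inside `directory`, sorted by path."""
    return sorted(p for p in Path(directory).iterdir() if p.is_file())


def non_empty(parsed_docs):
    """Keep only parsed results whose 'content' holds some text."""
    return [d for d in parsed_docs if (d.get("content") or "").strip()]


# Usage with Tika, as in the cell above:
# parsed_docs = non_empty(parser.from_file(str(p)) for p in regular_files(path))
```

Sorting makes `parsed_docs[0]` refer to the same file on every run, which `os.listdir` does not guarantee.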

In [9]:
doc = {
    'author': 'Asif Qamar',
    'text': parsed_docs[0]['content'],
    'timestamp': datetime.now(),
}
resp = client.index(index="test-index", id=2, document=doc)
print(resp['result'])

updated


## Searching for all documents in the index with a match_all query

In [10]:
resp = client.search(index="test-index", query={"match_all": {}})
print("Got %d Hits:" % resp['hits']['total']['value'])
for hit in resp['hits']['hits']:
    print("%(timestamp)s %(author)s: " % hit["_source"])

Got 2 Hits:
2023-09-15T20:17:14.506734 author_name: 
2023-09-15T20:17:16.294440 Asif Qamar: 
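
The loop above prints nothing after the colon. A sketch of a formatter that also shows a short preview of each document's text; `summarize_hits` is a hypothetical helper, and the 60-character preview length is an arbitrary choice.

```python
def summarize_hits(resp, preview=60):
    """Return printable lines for a search response shaped like the one above."""
    lines = ["Got %d Hits:" % resp["hits"]["total"]["value"]]
    for hit in resp["hits"]["hits"]:
        src = hit["_source"]
        # Guard against a missing or None text field.
        text = (src.get("text") or "").strip()[:preview]
        lines.append("%s %s: %s" % (src["timestamp"], src["author"], text))
    return lines


# With the response from the cell above:
# print("\n".join(summarize_hits(resp)))
```

A search returns at most 10 hits by default, so the loop may see fewer documents than the reported total; the `size` parameter of `client.search` raises that limit.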


## Searching for all documents matching some keywords in the text

In [11]:
resp = client.search(index="test-index", query={"match": {"text": "Support Vector"}})
print("Got %d Hits:" % resp['hits']['total']['value'])
for hit in resp['hits']['hits']:
    print("%(timestamp)s %(author)s: " % hit["_source"])

Got 1 Hits:
2023-09-15T20:17:16.294440 Asif Qamar: 
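
By default a `match` query matches documents that contain any of the analyzed terms, so the query above also finds text that mentions only "support" or only "vector". The sketch below builds two stricter variants of the query body; the function names are hypothetical, while `operator` and `match_phrase` are part of the Elasticsearch query DSL.

```python
def match_query(field, text, operator="or"):
    """Build a match query; operator="and" requires every term to be present."""
    return {"match": {field: {"query": text, "operator": operator}}}


def phrase_query(field, text):
    """Build a match_phrase query: the terms must appear together, in order."""
    return {"match_phrase": {field: text}}


all_terms = match_query("text", "Support Vector", operator="and")
exact_phrase = phrase_query("text", "Support Vector")

# With the client from the earlier cell:
# resp = client.search(index="test-index", query=exact_phrase)
```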
