- <b>Create ElasticSearch Connection: </b> `es = Elasticsearch(ELASTIC_HOSTNAME, port=ELASTIC_PORT, timeout=ELASTIC_CONN_TIMEOUT)`

- A URL with embedded credentials also works: `Elasticsearch('http://user:pass@localhost:9200')`


- <b> Create ES index :</b> `es.indices.create(index='example_index')`

- <b> Add Single data to ES:</b>
   
    ```python
    doc = {
        'name': 11,
        'details': [
            {
                'age': 2.10,
                'adult': False
            },
            {
                'extra': 1
            }
        ]
    }
    # Index (create or overwrite) the document under id 1
    res = es.index(index="example_index", id=1, body=doc)
    print(res['result'])
    ```

- <b> Add Aggregation </b>: <br/>
``` json
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "distinct_state": {
      "terms": {
        "field": "address.state",
        "size": 10000
      }
    }
  }
}
```

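```python
from elasticsearch_dsl import A, Search

# Same aggregation with elasticsearch_dsl; `es` is the client created above
s = Search(using=es, index='my-index').extra(size=0)
agg = A('terms', field='address.state', size=100)
s.aggs.bucket('by_state', agg)  # modifies s in place
res = s.execute()
```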
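The terms aggregation above can also be written as a plain dict body, and the buckets read back from the response. This is a minimal sketch: the helper names `terms_agg_body` and `bucket_counts` and the sample response are made up for illustration, and no cluster is contacted.

```python
from typing import Any, Dict, List


def terms_agg_body(agg_name: str, field: str, size: int = 10) -> Dict[str, Any]:
    """Build a search body that returns only a terms aggregation (no hits)."""
    return {
        "size": 0,  # skip the hits, only the aggregation is wanted
        "aggs": {agg_name: {"terms": {"field": field, "size": size}}},
    }


def bucket_counts(response: Dict[str, Any], agg_name: str) -> Dict[str, int]:
    """Map each bucket key to its doc_count from a search response dict."""
    buckets: List[Dict[str, Any]] = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


body = terms_agg_body("distinct_state", "address.state", size=10000)

# Hand-written response in the shape a terms aggregation returns (not real data)
sample_response = {
    "aggregations": {
        "distinct_state": {
            "buckets": [
                {"key": "Kashmir", "doc_count": 3},
                {"key": "Punjab", "doc_count": 1},
            ]
        }
    }
}
print(bucket_counts(sample_response, "distinct_state"))
```

With a live cluster, and a client version that still accepts `body`, the dict could be sent as `es.search(index='my-index', body=body)`.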

#### Querying data using python
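```python
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Q, Search

es = Elasticsearch(ELASTIC_HOSTNAME, port=ELASTIC_PORT, timeout=ELASTIC_CONN_TIMEOUT)
s = Search(using=es, index=ELASTIC_INDEX).extra(size=10000)

# bool/must clause: documents must match this condition
must = {'match': {'address.state': 'Kashmir'}}
query = s.query(Q('bool', must=must))
res = query.execute()

# Keep only the stored documents from the hits
es_data = res.to_dict()['hits']['hits']
data = [es_obj['_source'] for es_obj in es_data]
```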
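For comparison, the same bool/must query and the `_source` extraction can be sketched with plain dicts. The helper names `bool_must_match` and `sources` and the sample response are illustrative assumptions, not part of any library.

```python
from typing import Any, Dict, List


def bool_must_match(field: str, value: Any, size: int = 10) -> Dict[str, Any]:
    """Search body with one match clause inside bool/must."""
    return {
        "size": size,
        "query": {"bool": {"must": [{"match": {field: value}}]}},
    }


def sources(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the stored documents (_source) from a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


body = bool_must_match("address.state", "Kashmir", size=10000)

# Hand-written response in the usual hits.hits shape (not real data)
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"address": {"state": "Kashmir"}}},
        ]
    }
}
print(sources(sample_response))
```

The `size` of 10000 matches the default `index.max_result_window`; larger result sets need another mechanism such as `search_after` or scroll.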

#### Create Mappings
``` json
PUT example_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "details": {
        "type": "object",
        "properties": {
          "age": { "type": "integer" },
          "adult": { "type": "boolean" }
        }
      }
    }
  }
}
```
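To see which dotted field paths a mapping like this defines, the `properties` tree can be flattened locally. This is a small sketch; `flatten_mapping` is a hypothetical helper, and the dict simply mirrors the JSON above.

```python
from typing import Any, Dict

mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "details": {
                "type": "object",
                "properties": {
                    "age": {"type": "integer"},
                    "adult": {"type": "boolean"},
                },
            },
        }
    }
}


def flatten_mapping(properties: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Return {dotted.field.path: type} for every leaf field in a mapping."""
    fields: Dict[str, str] = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # object field: recurse into its sub-fields
            fields.update(flatten_mapping(spec["properties"], prefix=path + "."))
        else:
            fields[path] = spec["type"]
    return fields


print(flatten_mapping(mapping["mappings"]["properties"]))
```

The resulting paths (`details.age`, `details.adult`) are the names to use in queries and aggregations on inner fields.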

#### Elastic terms
- The `terms` query matches documents whose field equals any value in a list:
```json
GET /_search
{
  "query": {
    "terms": {
      "user.id": [ "kimchy", "elkbee" ]
    }
  }
}
```

In Python: `qry = Search(using=es_client, index=my_index).query("terms", id=[3, 1, 22, 56]).extra(size=10000)`
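A `terms` query body can also be built as a plain dict. The sketch below additionally splits a long value list into several queries, on the assumption that the index keeps the default `index.max_terms_count` of 65536. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterator, Sequence


def terms_query(field: str, values: Sequence[Any]) -> Dict[str, Any]:
    """Search body matching documents whose field equals any of the values."""
    return {"query": {"terms": {field: list(values)}}}


def chunked_terms_queries(
    field: str, values: Sequence[Any], max_terms: int = 65536
) -> Iterator[Dict[str, Any]]:
    """Yield one terms query per slice of at most max_terms values."""
    for start in range(0, len(values), max_terms):
        yield terms_query(field, values[start:start + max_terms])


ids = [3, 1, 22, 56]
print(terms_query("id", ids))

# A tiny limit is used here only to make the splitting visible
for query in chunked_terms_queries("id", ids, max_terms=3):
    print(query)
```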

### BULK Import

`doc = [
    {
        '_op_type': 'delete',
        '_id': 10,
    },
    {
        '_op_type': 'update',
        '_index': 'my_index',
        '_id': 4,
        'doc': {
            'att': value
        }
    }
]
`

For new records, omit `_op_type` (it defaults to `index`) and put the fields at the top level of the action instead of under `doc`.
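The three action shapes can be sketched as small builder functions for the `elasticsearch.helpers.bulk` action format. The function names and the sample values are illustrative assumptions; the code only builds dicts and does not contact a cluster.

```python
from typing import Any, Dict


def index_action(doc_id: Any, source: Dict[str, Any]) -> Dict[str, Any]:
    """New record: no _op_type, fields sit at the top level next to _id."""
    return {"_id": doc_id, **source}


def update_action(doc_id: Any, changes: Dict[str, Any]) -> Dict[str, Any]:
    """Partial update: changed fields go under 'doc'."""
    return {"_op_type": "update", "_id": doc_id, "doc": changes}


def delete_action(doc_id: Any) -> Dict[str, Any]:
    """Delete: only the operation type and the id are needed."""
    return {"_op_type": "delete", "_id": doc_id}


actions = [
    index_action(1, {"name": "example", "age": 30}),
    update_action(4, {"age": 31}),
    delete_action(10),
]
for action in actions:
    print(action)
```

Such a list could then be passed as `bulk(es_client, actions, index="my_index")`, as in the import example in this section.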
```python
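from elasticsearch.helpers import bulk
from faker import Faker

# Option 1: low-level bulk API; `li` is a list of dicts that each carry an 'id' key
batch_size = 10
for i in range(0, len(li), batch_size):
    print("Total imports done: ", i)
    chunk = li[i:i + batch_size]
    body = []
    for entry in chunk:
        # each document is preceded by its action/metadata line
        body.append({'index': {'_id': entry['id']}})
        body.append(entry)
    # using bulk to import data
    es.bulk(index=ELASTIC_INDEX, body=body)

# Option 2: bulk helper with generated test data (50 batches of 100 documents)
fake = Faker()  # create the generator once, not per document
for j in range(50):
    doc = []
    for i in range(100):
        doc.append({
            "_id": (j * 100) + i,
            "name": fake.name(),
            "age": max(i % 100, 10),
            "state": fake.state(),
            "year": fake.year()
        })
    res = bulk(es_client, doc, index="my_index")
    print(res)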
```
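The batching logic of the first loop can be isolated into two helpers and checked without a cluster. This is a sketch; `chunked`, `bulk_body` and the sample records are assumptions made for illustration.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Low-level bulk body: an action line followed by the document itself."""
    body: List[Dict[str, Any]] = []
    for entry in entries:
        body.append({"index": {"_id": entry["id"]}})
        body.append(entry)
    return body


records = [{"id": n, "name": f"user-{n}"} for n in range(25)]
for chunk in chunked(records, 10):
    body = bulk_body(chunk)
    print(len(chunk), "documents ->", len(body), "bulk lines")
```

Each chunk's `body` is what the loop above hands to `es.bulk`.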


### ElasticSearch Automatic Backup
- Open Kibana
- Go to Stack Management
- Go to Snapshot & Restore
- Create a repository (a single one is enough)
- Add snapshot policies for it; one repository can have multiple policies
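The Kibana steps correspond to two REST requests: registering a snapshot repository and creating a snapshot lifecycle (SLM) policy. The sketch below only prints the request bodies. The repository name, location, policy id, schedule and retention values are placeholders chosen for illustration, and a shared-filesystem (`fs`) repository is assumed, which also requires `path.repo` to be configured on the nodes.

```python
import json

# Hypothetical names, used only for this example
REPOSITORY = "my_repository"
POLICY_ID = "nightly-snapshots"

# Body for: PUT _snapshot/my_repository
repository_body = {
    "type": "fs",
    "settings": {"location": "my_backup_location"},
}

# Body for: PUT _slm/policy/nightly-snapshots
policy_body = {
    "schedule": "0 30 1 * * ?",           # cron expression: every day at 01:30
    "name": "<nightly-snap-{now/d}>",      # date-math snapshot name
    "repository": REPOSITORY,
    "config": {"indices": "*", "include_global_state": True},
    "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
}

print(f"PUT _snapshot/{REPOSITORY}")
print(json.dumps(repository_body, indent=2))
print(f"PUT _slm/policy/{POLICY_ID}")
print(json.dumps(policy_body, indent=2))
```

Several policies can point at the same `repository` value, which is the "one repository, multiple policies" setup described above.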

## ElasticSearch Aggregations

- SQL equivalent: `select state, count(*) from table group by state order by count(*) desc limit 10`

- Metric aggregations: numerical aggregations such as `sum`, `avg`, and `cardinality`


- Bucket aggregations: the equivalent of SQL `GROUP BY`

Here `group_by_state` is the name chosen for the aggregation:
```json
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
```

- `select gender, avg(age) from table group by gender order by avg(age) desc limit 10`

Here `group_by_gender` is the outer (bucket) aggregation name and `avg_age` the inner (metric) aggregation name:
```json
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_gender": {
      "terms": {
        "field": "gender.keyword",
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
```
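What the bucket-plus-metric aggregation computes can be emulated locally on a list of dicts. This is a plain-Python illustration, not Elasticsearch code; `group_by_avg` and the sample records are made up.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_avg(
    docs: List[Dict[str, Any]], group_field: str, value_field: str, limit: int = 10
) -> List[Dict[str, Any]]:
    """Group docs by one field, average another, sort by the average descending."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg_age": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["avg_age"], reverse=True)
    return buckets[:limit]


people = [
    {"gender": "F", "age": 30},
    {"gender": "F", "age": 40},
    {"gender": "M", "age": 20},
    {"gender": "M", "age": 30},
    {"gender": "X", "age": 50},
]
for bucket in group_by_avg(people, "gender", "age", limit=2):
    print(bucket)
```

Each printed dict plays the role of one bucket: the key, its `doc_count`, and the nested metric value.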



### Python Aggregations
- Aggregations are applied directly on the search object: `s.aggs` is modified in place rather than returning a new `Search`
- Bucket aggregations also return a `doc_count` for each group/bucket
```python
agg = A('terms', field='state.keyword', size=100)
s.aggs.bucket('by_state', agg)
res=s.execute()
res.to_dict()['aggregations']['by_state']['buckets']
```

- use metric with bucket
```python
# Order the buckets by the nested avg_age metric
agg = A('terms', field='state.keyword', size=100, order={'avg_age': 'desc'})
s.aggs.bucket('by_state', agg).metric('avg_age', 'avg', field='age')

# OR: define the bucket inline, order by bucket key, and sum the ages
search.aggs.bucket('by_state', 'terms', field='state.keyword', size=100, order={'_key': 'desc'}).metric('total_age', 'sum', field='age')
```

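- By default `avg` skips documents that do not have the field (they still count toward the bucket's `doc_count`); pass `missing=<value>` to the metric to give them a default value
- To order buckets by their key, use `order={'_key': 'desc'}`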
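The effect of the `missing` parameter on an average can be shown with a local emulation (plain Python, not Elasticsearch code): documents lacking the field are skipped unless a default is supplied. The function name and sample data are illustrative.

```python
from typing import Any, Dict, List, Optional


def avg(docs: List[Dict[str, Any]], field: str, missing: Optional[float] = None) -> Optional[float]:
    """Average of `field`; docs without it are skipped unless `missing` is given."""
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)  # treat the absent field as this default
    return sum(values) / len(values) if values else None


docs = [{"age": 10}, {"age": 20}, {"extra_field": "no age here"}]
print(avg(docs, "age"))             # third document is ignored
print(avg(docs, "age", missing=0))  # third document counts as age 0
```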

### Imp elasticsearch points

- `match` breaks the string into tokens, i.e. 'rafiq nazir' is split into 2 terms.
If we want the whole phrase to match, we should use `match_phrase`:
`match` searches for 'rafiq' or 'nazir' in the text, while `match_phrase` searches for `rafiq nazir` as a phrase

- use the `nested` type for lists of objects

- use `column.keyword` for an exact match


- for a field within another field, use `outerfield__innerfield` (in Python keyword arguments) or `outerfield.innerfield`
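The difference between `match` and `match_phrase` can be illustrated with a toy model. It assumes lowercase whitespace tokenization, which is a simplification of what a real analyzer does, and the function names are made up for this example.

```python
from typing import List


def tokens(text: str) -> List[str]:
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(text: str, query: str) -> bool:
    """True if ANY query token occurs in the text (default OR behaviour)."""
    doc_tokens = set(tokens(text))
    return any(token in doc_tokens for token in tokens(query))


def match_phrase(text: str, query: str) -> bool:
    """True only if the query tokens occur consecutively and in order."""
    doc, phrase = tokens(text), tokens(query)
    return any(
        doc[i:i + len(phrase)] == phrase
        for i in range(len(doc) - len(phrase) + 1)
    )


print(match("Nazir Ahmad", "rafiq nazir"))
print(match_phrase("Nazir Ahmad", "rafiq nazir"))
print(match_phrase("Mr Rafiq Nazir", "rafiq nazir"))
```

For comparison, a `term` query on `name.keyword` would require the whole stored value to be equal, with no tokenization at all.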

### ElasticSearch Queries

- Get data where name is rafiq <br>
`search.query("match", name__keyword="rafiq")`  
`search.filter("term", state__keyword="New York")` <br>
<b> `match_phrase` can also be used to match a whole phrase</b> <br><br>

- Where an attribute is present:
`search.filter(Q("exists",field="age"))`

- Multiple should queries
``` python
s = s.filter('match',state__keyword='Texas')
should=[Q('match',age=14),Q('match',name__keyword="Rafiq Nazir")]
Query = s.query(Q('bool', should=should))
```
`must` can also be passed alongside `should`, e.g. `Q('bool', should=should, must=must)`
<br><br>

- Range Query
```python
# as a scoring query
s = s.query(Q('range', age={'gte': 40, 'lte': 50}))
# or as a non-scoring filter
s = s.filter('range', age={'gte': 40, 'lte': 50})
```
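Filters, `should` clauses and a range condition can be combined in one raw `bool` query. In Elasticsearch, once a `bool` query contains a `must` or `filter` clause, `should` clauses become optional unless `minimum_should_match` is set, so this sketch sets it explicitly. The helper name `bool_query` is an assumption for illustration.

```python
from typing import Any, Dict, List, Optional


def bool_query(
    filters: Optional[List[Dict[str, Any]]] = None,
    should: Optional[List[Dict[str, Any]]] = None,
    must: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a bool query body from optional must / filter / should lists."""
    clause: Dict[str, Any] = {}
    if must:
        clause["must"] = list(must)
    if filters:
        clause["filter"] = list(filters)
    if should:
        clause["should"] = list(should)
        # require at least one should clause even when filter/must are present
        clause["minimum_should_match"] = 1
    return {"query": {"bool": clause}}


body = bool_query(
    filters=[
        {"term": {"state.keyword": "Texas"}},
        {"range": {"age": {"gte": 40, "lte": 50}}},
    ],
    should=[
        {"match": {"age": 44}},
        {"match": {"name.keyword": "Rafiq Nazir"}},
    ],
)
print(body)
```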

In [277]:
from elasticsearch import Elasticsearch
from faker import Faker
from elasticsearch_dsl import Search,Q,A

ELASTIC_INDEX='my_index'
es_client = Elasticsearch('http://elastic:elastic@localhost:9200')


In [30]:
es = Elasticsearch()

In [377]:




s = Search(using = es_client, index='my_index')
qry=s.query("match",id=101010)

In [380]:
res = s.execute()

In [381]:
res.to_dict()['hits']['hits']

[{'_index': 'my_index',
  '_id': '5',
  '_score': 1.0,
  '_source': {'name': 'Suzanne Chandler',
   'age': 10,
   'state': 'Louisiana',
   'year': '1996'}},
 {'_index': 'my_index',
  '_id': '6',
  '_score': 1.0,
  '_source': {'name': 'Shawn Scott',
   'age': 10,
   'state': 'North Carolina',
   'year': '1971'}},
 {'_index': 'my_index',
  '_id': '7',
  '_score': 1.0,
  '_source': {'name': 'Patricia Mitchell',
   'age': 10,
   'state': 'Alabama',
   'year': '2013'}},
 {'_index': 'my_index',
  '_id': '8',
  '_score': 1.0,
  '_source': {'name': 'Christopher Pearson DDS',
   'age': 10,
   'state': 'Mississippi',
   'year': '2019'}},
 {'_index': 'my_index',
  '_id': '11',
  '_score': 1.0,
  '_source': {'name': 'Samantha Gray',
   'age': 11,
   'state': 'North Carolina',
   'year': '1999'}},
 {'_index': 'my_index',
  '_id': '12',
  '_score': 1.0,
  '_source': {'name': 'Virginia Kennedy',
   'age': 12,
   'state': 'Idaho',
   'year': '1984'}},
 {'_index': 'my_index',
  '_id': '13',
  '_score':

In [512]:
doc=[
{
        '_id':222222,
       'extra_field':'elastc3',
       'id':101010,
        'state': 'Ohio',
        'age':0
},
    {
        '_id':222229,
       'extra_field':'elastc3',
       'id':101010,
        'state': 'Ohio',
        'age':0
},
    {
        '_id':2222220,
       'extra_field':'elastc3',
       'id':101010,
        'state': 'Ohio',
        'age':0
}
]

In [513]:
res=bulk(es_client, doc, index="my_index")
print(res)

(3, [])


In [272]:
search = Search(using=es_client,index="my_index")
s=search.filter(Q("exists",field="extra_field"))

In [273]:
res = s.execute()

In [276]:
for r in res.to_dict()['hits']['hits']:
    print(r['_source'])

{'extra_field': 'elastc1', 'age': 10, 'id': 101010}
{'extra_field': 'elastc2', 'age': 10, 'id': 101010}
{'extra_field': 'elastc3', 'age': 10, 'id': 101010}
{'extra_field': 'elastc4', 'age': 10, 'id': 101010}
{'extra_field': 'elastc3', 'id': 101010}


In [518]:
s = Search(using=es_client,index="my_index").extra(size=0)
# s.filter("gte",field=)
agg = A('terms', field='state.keyword', size=100,order={'avg_age':'desc'})
# s=s.query(Q('range',age={'gte':30,'lte':50}))
# s=s.filter('range',age={'gte':40,'lte':50})
S=s.aggs.bucket('by_state', agg)
S.metric('avg_age','avg',field='age',missing='10000')
res = s.execute()

In [519]:
res.to_dict()['aggregations']
# 43.523809523809526

{'by_state': {'doc_count_error_upper_bound': 0,
  'sum_other_doc_count': 0,
  'buckets': [{'key': 'Utah',
    'doc_count': 22,
    'avg_age': {'value': 63.27272727272727}},
   {'key': 'Washington',
    'doc_count': 19,
    'avg_age': {'value': 59.73684210526316}},
   {'key': 'Alaska', 'doc_count': 29, 'avg_age': {'value': 58.3448275862069}},
   {'key': 'Nevada', 'doc_count': 30, 'avg_age': {'value': 58.1}},
   {'key': 'Iowa', 'doc_count': 22, 'avg_age': {'value': 57.27272727272727}},
   {'key': 'Louisiana',
    'doc_count': 19,
    'avg_age': {'value': 55.05263157894737}},
   {'key': 'Arkansas',
    'doc_count': 26,
    'avg_age': {'value': 54.73076923076923}},
   {'key': 'Vermont',
    'doc_count': 27,
    'avg_age': {'value': 54.44444444444444}},
   {'key': 'Wisconsin',
    'doc_count': 26,
    'avg_age': {'value': 54.38461538461539}},
   {'key': 'Nebraska',
    'doc_count': 33,
    'avg_age': {'value': 52.96969696969697}},
   {'key': 'Minnesota',
    'doc_count': 26,
    'avg_age': 

In [506]:
s = Search(using=es_client,index="my_index").extra(size=100)
must=[Q('range',age={'gte':40,'lte':50}),Q('match',state__keyword="Ohio")]
Query = s.query(Q('bool', must=must))
Query.execute().to_dict()['hits']['hits']

[{'_index': 'my_index',
  '_id': 'kgL4tYEBjKJ9kdwC8FA_',
  '_score': 5.150857,
  '_source': {'name': 'Erin Rojas',
   'age': 44,
   'state': 'Ohio',
   'year': '2011'}}]

In [532]:
search = Search(using=es_client,index="my_index").extra(size=100)

# search.filter("term", state__keyword="New York").execute().to_dict()['hits']['hits']

In [535]:
search.aggs.bucket('by_state','terms' ,field='state.keyword',size=100, order={'_key': 'desc'}).metric('total_age', 'sum', field='age',order={'total_age':'desc'})

Terms(aggs={'total_age': Sum(field='age', order={'total_age': 'desc'})}, field='state.keyword', order={'_key': 'desc'}, size=100)

In [536]:

search.execute().to_dict()['aggregations']

{'by_state': {'doc_count_error_upper_bound': 0,
  'sum_other_doc_count': 0,
  'buckets': [{'key': 'Wyoming',
    'doc_count': 35,
    'total_age': {'value': 1581.0}},
   {'key': 'Wisconsin', 'doc_count': 26, 'total_age': {'value': 1414.0}},
   {'key': 'West Virginia', 'doc_count': 30, 'total_age': {'value': 1444.0}},
   {'key': 'Washington', 'doc_count': 19, 'total_age': {'value': 1135.0}},
   {'key': 'Virginia', 'doc_count': 31, 'total_age': {'value': 1464.0}},
   {'key': 'Vermont', 'doc_count': 27, 'total_age': {'value': 1470.0}},
   {'key': 'Utah', 'doc_count': 22, 'total_age': {'value': 1392.0}},
   {'key': 'Texas', 'doc_count': 26, 'total_age': {'value': 995.0}},
   {'key': 'Tennessee', 'doc_count': 29, 'total_age': {'value': 1403.0}},
   {'key': 'South Dakota', 'doc_count': 31, 'total_age': {'value': 1558.0}},
   {'key': 'South Carolina', 'doc_count': 20, 'total_age': {'value': 826.0}},
   {'key': 'Rhode Island', 'doc_count': 23, 'total_age': {'value': 1193.0}},
   {'key': 'Penns