## Documentation

To read more about the search API, see the [Search your data guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html) and the [Search API reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html).



## Connect to Elasticsearch

In [None]:
from pprint import pprint
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')
client_info = es.info()
print('Connected to Elasticsearch!')
pprint(client_info.body)

## Inserting documents

In [None]:
es.indices.delete(index='my_index', ignore_unavailable=True)
es.indices.create(index='my_index')

Let's load the dummy documents and duplicate them several times so that there is a larger dataset to index into `my_index`.

In [None]:
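import json

# Load the sample documents; the context manager closes the file afterwards.
with open("../data/dummy_data_2.json") as f:
    dummy_data = json.load(f)

# Double the list 10 times to get a larger dataset.
for _ in range(10):
    dummy_data += dummy_data

len(dummy_data)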

Each pass through the loop doubles the list, so ten passes multiply its length by 2^10 = 1024 and we end up with `5120` documents. Let's use the bulk API, which we covered earlier, to index all of them in a single request.

In [None]:
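# The bulk API expects an action entry followed by the document source, for every document.
operations = []
for document in dummy_data:
    operations.append({'index': {'_index': 'my_index'}})
    operations.append(document)

# refresh=True makes the new documents visible to search as soon as the call returns.
es.bulk(operations=operations, refresh=True)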
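
Sending every document in one request works for a dataset of this size, but larger datasets are often split into batches. The following is a minimal sketch of that idea and not part of the original notebook: `build_bulk_operations` and `chunked` are hypothetical helper names, the batch size of 2 is arbitrary, and the sample documents are made up for illustration. The `es.bulk` call is left as a comment so the block runs without a cluster.

```python
from typing import Any, Dict, Iterator, List


def build_bulk_operations(documents: List[Dict[str, Any]], index_name: str) -> List[Dict[str, Any]]:
    """Interleave an 'index' action entry with each document source."""
    operations: List[Dict[str, Any]] = []
    for document in documents:
        operations.append({"index": {"_index": index_name}})
        operations.append(document)
    return operations


def chunked(items: List[Dict[str, Any]], chunk_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Made-up documents, used only to show the shape of the payload.
docs = [{"message": f"doc {i}"} for i in range(5)]
batches = list(chunked(docs, 2))

for batch in batches:
    ops = build_bulk_operations(batch, "my_index")
    # With a live client, each batch would be sent separately:
    # es.bulk(operations=ops)
    print(len(batch), "documents ->", len(ops), "bulk entries")
```

Each batch produces twice as many bulk entries as documents, because every document is preceded by its action entry. The Python client also ships `elasticsearch.helpers.bulk`, which takes care of batching for you.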

## Searching

### 1. Size + From

In this example, we perform a search that retrieves 10 documents, starting from the 11th document (i.e., skipping the first 10 results). This demonstrates pagination using the `size` and `from` parameters.


In [None]:
response = es.search(
    index="my_index",
    body={
        "query": {
            "match_all": {}
        },
        "size": 10,
        "from": 10
    },
)

for hit in response['hits']['hits']:
    print(hit['_source'])
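
To page through results, the `from` offset is usually derived from a page number. Here is a small sketch, assuming 1-based page numbers; `page_to_from` and `paginated_body` are hypothetical helpers, and the `max_window` default mirrors the default value of the `index.max_result_window` setting (10,000), beyond which `from` + `size` requests are rejected.

```python
def page_to_from(page: int, size: int) -> int:
    """Return the `from` offset for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


def paginated_body(page: int, size: int, max_window: int = 10_000) -> dict:
    """Build a match_all search body for the requested page."""
    offset = page_to_from(page, size)
    if offset + size > max_window:
        # Deep pagination needs a different mechanism, such as search_after.
        raise ValueError("from + size exceeds the result window")
    return {"query": {"match_all": {}}, "size": size, "from": offset}


# Page 2 with 10 results per page skips the first 10 hits.
body = paginated_body(page=2, size=10)
print(body)
```

With a live client, `es.search(index="my_index", body=body)` would return the same page as the cell above.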

### 2. Timeout

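This example shows how to set a timeout for the search query. The `timeout` value of `10s` (10 seconds) limits how long each shard may spend collecting hits. If a shard is still collecting when the time runs out, Elasticsearch by default uses only the hits gathered up to that point and sets `timed_out` to `true` in the response, so the results may be incomplete.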

In [None]:
response = es.search(
    index="my_index",
    body={
        "query": {
            "match": {
                "message": "search keyword"
            }
        },
        "timeout": "10s"
    },
)

response.body
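
The search response includes a `timed_out` flag and a `took` value in milliseconds, so it is worth checking them before trusting the hits. Below is a minimal sketch: `summarize_search` is a hypothetical helper, and `sample_response` is a hand-written stand-in for a real response, not output captured from a cluster.

```python
from typing import Any, Dict, Mapping


def summarize_search(response: Mapping[str, Any]) -> Dict[str, Any]:
    """Extract the fields that tell us whether the results are complete."""
    hits = response.get("hits", {}).get("hits", [])
    return {
        "timed_out": bool(response.get("timed_out", False)),
        "took_ms": response.get("took"),
        "hits_returned": len(hits),
    }


# Hand-written stand-in for a response whose timeout expired.
sample_response = {
    "took": 10000,
    "timed_out": True,
    "hits": {"hits": [{"_source": {"message": "search keyword"}}]},
}

summary = summarize_search(sample_response)
if summary["timed_out"]:
    print("Search timed out; results may be incomplete.")
print(summary)
```

With a live client, `summarize_search(response.body)` would work the same way, since `response.body` is a plain dictionary.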

### 3. Aggregation

In this example, we perform an aggregation to calculate the average value of the `age` field across all documents that match the query. The result of the aggregation is stored in the `avg_age` key.


In [None]:
response = es.search(
    index="my_index",
    body={
        "query": {
            "match_all": {}
        },
        "aggs": {
            "avg_age": {
                "avg": {
                    "field": "age"
                }
            }
        }
    }
)

average_age = response['aggregations']['avg_age']['value']
print(f"Average Age: {average_age}")
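
When only the aggregation result matters, setting `size` to 0 tells Elasticsearch not to return any hits. The sketch below also guards against a `null` value, which the `avg` aggregation reports when no matching document has the field. The helper names and both sample responses are hypothetical and hand-written for illustration.

```python
from typing import Any, Mapping, Optional


def avg_only_body(field: str, name: str) -> dict:
    """Build a body that returns the average of `field` and no hits."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {name: {"avg": {"field": field}}},
    }


def read_metric(response: Mapping[str, Any], name: str) -> Optional[float]:
    """Return the aggregation value, or None if nothing was aggregated."""
    return response["aggregations"][name]["value"]


def format_metric(label: str, value: Optional[float]) -> str:
    """Format a metric, falling back to 'n/a' when the value is missing."""
    return f"{label}: {value:.2f}" if value is not None else f"{label}: n/a"


body = avg_only_body("age", "avg_age")

# Hand-written stand-ins for real responses.
sample_response = {"aggregations": {"avg_age": {"value": 41.5}}}
empty_response = {"aggregations": {"avg_age": {"value": None}}}

print(format_metric("Average Age", read_metric(sample_response, "avg_age")))
print(format_metric("Average Age", read_metric(empty_response, "avg_age")))
```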

### 4. Combining size, from, timeout, and aggs

Here we combine multiple parameters: we limit the results to 5 documents (`size`), skip the first 20 matching documents (`from`), set a timeout of 5 seconds (`timeout`), and compute the maximum value of the `price` field with a `max` aggregation (`aggs`). Note that the aggregation is computed over all documents that match the query, not only the 5 returned hits.

In [None]:
response = es.search(
    index="my_index",
    body={
        "query": {
            "match": {
                "message": "important keyword"
            }
        },
        "aggs": {
            "max_price": {
                "max": {
                    "field": "price"
                }
            }
        },
        "size": 5,
        "from": 20,
        "timeout": "5s"
    },
)

for hit in response['hits']['hits']:
    print(hit['_source'])

max_price = response['aggregations']['max_price']['value']
print(f"Max Price: {max_price}")
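
Since all of these parameters live in the same request body, a small builder can keep the optional ones out of the way until they are needed. This is only a sketch: `build_search_body` is a hypothetical helper, and the `from_` argument name is an assumption chosen because `from` is a reserved word in Python.

```python
from typing import Any, Dict, Optional


def build_search_body(
    query: Dict[str, Any],
    *,
    size: Optional[int] = None,
    from_: Optional[int] = None,
    timeout: Optional[str] = None,
    aggs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a search body, leaving out parameters that were not set."""
    body: Dict[str, Any] = {"query": query}
    optional = {"size": size, "from": from_, "timeout": timeout, "aggs": aggs}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_search_body(
    {"match": {"message": "important keyword"}},
    size=5,
    from_=20,
    timeout="5s",
    aggs={"max_price": {"max": {"field": "price"}}},
)
print(body)
```

The resulting dictionary matches the body used in the cell above and could be passed to `es.search(index="my_index", body=body)`.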