## Documentation

To read more about the search API, see the Elasticsearch guides [Search your data](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html) and [Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html).

![search_api_docs](../images/search_api_docs.png)

## Connect to Elasticsearch

In [1]:
from pprint import pprint
from elasticsearch import Elasticsearch

HOST = "http://localhost:9200"

es = Elasticsearch(HOST)
client_info = es.info()
print("Connected to Elasticsearch!")
pprint(client_info.body)

Connected to Elasticsearch!
{'cluster_name': 'docker-cluster',
 'cluster_uuid': 'IzAz_bJfQnS_zfMDjIPmJA',
 'name': 'eb6cd056e782',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2025-01-09T14:09:01.578835424Z',
             'build_flavor': 'default',
             'build_hash': '0f88dde84795b30ca0d2c0c4796643ec5938aeb5',
             'build_snapshot': False,
             'build_type': 'docker',
             'lucene_version': '8.11.3',
             'minimum_index_compatibility_version': '6.0.0-beta1',
             'minimum_wire_compatibility_version': '6.8.0',
             'number': '7.17.27'}}




## Inserting documents

In [2]:
INDEX = "my_index"

settings = {
    "index": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}

es.indices.delete(index=INDEX, ignore_unavailable=True)
es.indices.create(index=INDEX, settings=settings)



ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'my_index'})

In [5]:
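import json

# Load the sample documents, closing the file once it has been read.
with open("../data/dummy_data_2.json") as f:
    dummy_data = json.load(f)

# Each pass doubles the list, so 10 passes multiply its length by 2**10 = 1024.
for _ in range(10):
    dummy_data += dummy_data

len(dummy_data)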

5120

Doubling the dummy data 10 times multiplies its length by 1024, so we end up with `5120` documents. Let's use the bulk API, which we covered earlier, to index all of them in a single request.

In [6]:
from tqdm import tqdm

operations = []

for document in tqdm(dummy_data, total=len(dummy_data)):
    operations.append({"index": {"_index": INDEX}})
    operations.append(document)

es.bulk(operations=operations)

100%|██████████| 5120/5120 [00:00<00:00, 275866.61it/s]


ObjectApiResponse({'took': 594, 'errors': False, 'items': [{'index': {'_index': 'my_index', '_type': '_doc', '_id': 'R3jZJJUBpQvCJGK5Y03x', '_version': 1, 'result': 'created', '_shards': {'total': 1, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1, 'status': 201}}, {'index': {'_index': 'my_index', '_type': '_doc', '_id': 'SHjZJJUBpQvCJGK5Y03y', '_version': 1, 'result': 'created', '_shards': {'total': 1, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1, 'status': 201}}, {'index': {'_index': 'my_index', '_type': '_doc', '_id': 'SXjZJJUBpQvCJGK5Y03y', '_version': 1, 'result': 'created', '_shards': {'total': 1, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1, 'status': 201}}, {'index': {'_index': 'my_index', '_type': '_doc', '_id': 'SnjZJJUBpQvCJGK5Y03y', '_version': 1, 'result': 'created', '_shards': {'total': 1, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1, 'status': 201}}, {'index': {'_index': 'my_index', '_type': '_doc',
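Sending everything in one request works for `5120` small documents, but a larger data set is usually split into batches. The following is a minimal sketch of that idea, not part of the original notebook: `chunked_bulk_operations` is a hypothetical helper, and the chunk size of 1000 documents is an arbitrary assumption. It produces lists in the same action/document layout as `operations` above.

```python
from typing import Any, Dict, Iterator, List


def chunked_bulk_operations(
    documents: List[Dict[str, Any]], index: str, chunk_size: int = 1000
) -> Iterator[List[Dict[str, Any]]]:
    """Yield bulk payloads containing at most `chunk_size` documents each."""
    for start in range(0, len(documents), chunk_size):
        operations: List[Dict[str, Any]] = []
        for document in documents[start:start + chunk_size]:
            # Every document is preceded by its action line.
            operations.append({"index": {"_index": index}})
            operations.append(document)
        yield operations


# Illustrative data only; the notebook would pass `dummy_data` instead.
docs = [{"message": f"doc {i}"} for i in range(2500)]
batches = list(chunked_bulk_operations(docs, "my_index"))

# Each payload holds two entries per document (action + source).
print([len(batch) // 2 for batch in batches])  # [1000, 1000, 500]
```

Each batch could then be sent with `es.bulk(operations=batch)`, checking the `errors` field of every response as in the output above.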

## Searching

### 1. Size + From

In this example, we perform a search that retrieves 10 documents, starting from the 11th document (i.e., skipping the first 10 results). This demonstrates pagination using the `size` and `from` parameters.


In [7]:
response = es.search(
    index=INDEX,
    body={
        "query": {
            "match_all": {}
        },
        "size": 10,
        "from": 10
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])

{'message': 'This is an important keyword search result.', 'age': 25, 'price': 100.0}
{'message': 'Another search result with an important keyword.', 'age': 30, 'price': 150.0}
{'message': 'Keyword match in this result as well.', 'age': 40, 'price': 200.0}
{'message': 'Important keyword again in this document.', 'age': 35, 'price': 120.0}
{'message': 'Final document with the important keyword.', 'age': 28, 'price': 180.0}
{'message': 'This is an important keyword search result.', 'age': 25, 'price': 100.0}
{'message': 'Another search result with an important keyword.', 'age': 30, 'price': 150.0}
{'message': 'Keyword match in this result as well.', 'age': 40, 'price': 200.0}
{'message': 'Important keyword again in this document.', 'age': 35, 'price': 120.0}
{'message': 'Final document with the important keyword.', 'age': 28, 'price': 180.0}
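To make the relationship between a page number and `from` explicit, here is a small sketch. `page_body` is a hypothetical helper introduced only for illustration, and it assumes pages are numbered from 1. Note that, by default, Elasticsearch rejects requests where `from + size` exceeds 10,000 (the `index.max_result_window` setting), so this style of pagination suits shallow paging.

```python
from typing import Any, Dict


def page_body(page: int, size: int = 10) -> Dict[str, Any]:
    """Build a match_all search body for a 1-based page number."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    return {
        "query": {"match_all": {}},
        "size": size,
        # Skip every hit that belongs to the earlier pages.
        "from": (page - 1) * size,
    }


body = page_body(page=2)
print(body["from"], body["size"])  # 10 10
```

Page 2 with a size of 10 yields the same `size` and `from` values as the request above.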




### 2. Timeout

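This example sets a `timeout` for the search. The timeout applies to each shard: if a shard has not finished collecting hits after `10s` (10 seconds), it stops and Elasticsearch returns the hits gathered so far, with `timed_out` set to `true` in the response.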

In [8]:
response = es.search(
    index=INDEX,
    body={
        "query": {
            "match": {
                "message": "search keyword"
            }
        },
        "timeout": "10s"
    }
)

response.body



{'took': 29,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 5120, 'relation': 'eq'},
  'max_score': 0.8941701,
  'hits': [{'_index': 'my_index',
    '_type': '_doc',
    '_id': 'R3jZJJUBpQvCJGK5Y03x',
    '_score': 0.8941701,
    '_source': {'message': 'This is an important keyword search result.',
     'age': 25,
     'price': 100.0}},
   {'_index': 'my_index',
    '_type': '_doc',
    '_id': 'SHjZJJUBpQvCJGK5Y03y',
    '_score': 0.8941701,
    '_source': {'message': 'Another search result with an important keyword.',
     'age': 30,
     'price': 150.0}},
   {'_index': 'my_index',
    '_type': '_doc',
    '_id': 'THjZJJUBpQvCJGK5Y03y',
    '_score': 0.8941701,
    '_source': {'message': 'This is an important keyword search result.',
     'age': 25,
     'price': 100.0}},
   {'_index': 'my_index',
    '_type': '_doc',
    '_id': 'TXjZJJUBpQvCJGK5Y03y',
    '_score': 0.8941701,
    '_source': {'message': 'Another 
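The response includes a `timed_out` flag, which is worth checking before trusting the hit count. The sketch below assumes that a timed-out search still returns whatever hits were collected; `summarize_search` is a hypothetical helper, and `sample` only mirrors the top-level fields of the response shown above.

```python
from typing import Any, Dict


def summarize_search(body: Dict[str, Any]) -> str:
    """Describe whether a search response is complete or partial."""
    total = body["hits"]["total"]["value"]
    if body.get("timed_out"):
        return f"partial results: {total} hits collected before the timeout"
    return f"complete results: {total} hits in {body['took']} ms"


# Hand-written stand-in for `response.body`, trimmed to the fields used here.
sample = {
    "took": 29,
    "timed_out": False,
    "hits": {"total": {"value": 5120, "relation": "eq"}, "hits": []},
}
print(summarize_search(sample))  # complete results: 5120 hits in 29 ms
```

In the notebook, the same check would be `summarize_search(response.body)`.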

### 3. Aggregation

In this example, we run an `avg` aggregation to calculate the average value of the `age` field across all documents that match the query. The result is returned under the `avg_age` key of the response's `aggregations` object.


In [9]:
response = es.search(
    index=INDEX,
    body={
        "query": {
            "match_all": {}
        },
        "aggs": {
            "avg_age": {
                "avg": {
                    "field": "age"
                }
            }
        }
    }
)

average_age = response["aggregations"]["avg_age"]["value"]
print(f"Average Age: {average_age}")

Average Age: 31.6
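The value can be sanity-checked locally. The sketch below assumes the source file holds only the five distinct documents seen in the earlier output (ages 25, 30, 40, 35, and 28); duplicating them does not change the mean. It also shows an aggregation-only request body, which sets `size` to `0` so that no hits are returned alongside the aggregation.

```python
# Ages of the five distinct documents seen in the search output (assumed complete).
ages = [25, 30, 40, 35, 28]

average_age = sum(ages) / len(ages)
print(f"Average Age: {average_age}")  # Average Age: 31.6

# Duplicating the documents 1024 times leaves the average unchanged.
duplicated = ages * 1024
duplicated_average = sum(duplicated) / len(duplicated)

# Request body that returns only the aggregation, without any hits.
agg_only_body = {
    "size": 0,
    "aggs": {"avg_age": {"avg": {"field": "age"}}},
}
```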




### 4. Combining size, from, timeout, and aggs

Here we combine multiple parameters: we limit the results to 5 documents (`size`), skip the first 20 documents (`from`), set a timeout of 5 seconds (`timeout`), and run a `max` aggregation (`aggs`) on the `price` field. Note that `size` and `from` only affect which hits are returned; the aggregation is still computed over every document that matches the query.

In [10]:
response = es.search(
    index=INDEX,
    body={
        "query": {
            "match": {
                "message": "important keyword"
            }
        },
        "aggs": {
            "max_price": {
                "max": {
                    "field": "price"
                }
            }
        },
        "size": 5,
        "from": 20,
        "timeout": "5s"
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])


max_price = response["aggregations"]["max_price"]["value"]
print(f"Max Price: {max_price}")

{'message': 'Important keyword again in this document.', 'age': 35, 'price': 120.0}
{'message': 'Final document with the important keyword.', 'age': 28, 'price': 180.0}
{'message': 'Important keyword again in this document.', 'age': 35, 'price': 120.0}
{'message': 'Final document with the important keyword.', 'age': 28, 'price': 180.0}
{'message': 'Important keyword again in this document.', 'age': 35, 'price': 120.0}
Max Price: 200.0
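As an alternative to `body=`, the 8.x Python client also accepts these options as keyword arguments. The sketch below assumes that client version, which the `ObjectApiResponse` output suggests. Because `from` is a reserved word in Python, the client spells it `from_`. The actual call is left commented out since it needs a running cluster.

```python
# Same request as above, expressed as keyword arguments for es.search().
search_kwargs = {
    "index": "my_index",
    "query": {"match": {"message": "important keyword"}},
    "aggs": {"max_price": {"max": {"field": "price"}}},
    "size": 5,
    "from_": 20,  # `from` is a Python keyword, hence the trailing underscore
    "timeout": "5s",
}

# With a connected client `es` (assumed, as created earlier in the notebook):
# response = es.search(**search_kwargs)
# max_price = response["aggregations"]["max_price"]["value"]

print(sorted(search_kwargs))
# ['aggs', 'from_', 'index', 'query', 'size', 'timeout']
```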


