## Documentation

To read more about filters, check out the docs [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html).

![filters_in_depth](../images/filters_in_depth.png)

## Connect to Elasticsearch

In [1]:
from pprint import pprint
from elasticsearch import Elasticsearch

HOST = "http://localhost:9200"

es = Elasticsearch(hosts=HOST)
client_info = es.info()
print("Connected to Elasticsearch!")
pprint(client_info.body)

Connected to Elasticsearch!
{'cluster_name': 'docker-cluster',
 'cluster_uuid': 'iugjHCt8SwCWRVd35xnJ0A',
 'name': '5013781c82bc',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2025-02-05T22:10:57.067596412Z',
             'build_flavor': 'default',
             'build_hash': '747663ddda3421467150de0e4301e8d4bc636b0c',
             'build_snapshot': False,
             'build_type': 'docker',
             'lucene_version': '9.12.0',
             'minimum_index_compatibility_version': '7.0.0',
             'minimum_wire_compatibility_version': '7.17.0',
             'number': '8.17.2'}}


## Index data

In [2]:
import json


INDEX = "my_index"

settings = {
    "index": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}

es.indices.delete(index=INDEX, ignore_unavailable=True)
es.indices.create(index=INDEX, settings=settings)

operations = []
# Use a context manager so the file handle is closed after loading.
with open("../data/clothes.json") as f:
    clothes_documents = json.load(f)

for document in clothes_documents:
    operations.append({"index": {"_index": INDEX}})
    operations.append(document)

response = es.bulk(operations=operations)
pprint(response.body)

{'errors': False,
 'items': [{'index': {'_id': 'TqcZNpUBObNT95MyJJ71',
                      '_index': 'my_index',
                      '_primary_term': 1,
                      '_seq_no': 0,
                      '_shards': {'failed': 0, 'successful': 1, 'total': 1},
                      '_version': 1,
                      'result': 'created',
                      'status': 201}},
           {'index': {'_id': 'T6cZNpUBObNT95MyJJ71',
                      '_index': 'my_index',
                      '_primary_term': 1,
                      '_seq_no': 1,
                      '_shards': {'failed': 0, 'successful': 1, 'total': 1},
                      '_version': 1,
                      'result': 'created',
                      'status': 201}},
           {'index': {'_id': 'UKcZNpUBObNT95MyJJ71',
                      '_index': 'my_index',
                      '_primary_term': 1,
                      '_seq_no': 2,
                      '_shards': {'failed': 0, 'successful': 1, '

In [3]:
count = es.count(index=INDEX)
print("Number of documents in index: ", count.body["count"])

Number of documents in index:  100
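
The bulk response printed earlier reports `'errors': False`, but a bulk request can succeed as a whole while individual items fail. The sketch below shows one way to collect failed items from a response dictionary. The helper name `failed_bulk_items` and both sample responses are hypothetical and hand-written for illustration; they are not output from the cluster used here.

```python
def failed_bulk_items(bulk_response):
    """Return (action, status, error) tuples for bulk items that failed."""
    failures = []
    if not bulk_response.get("errors"):
        return failures
    for item in bulk_response["items"]:
        # Each item holds a single key: the action name ("index", "create", ...).
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result["status"], result["error"]))
    return failures


# Hand-written sample responses (assumptions, not real cluster output).
ok_response = {
    "errors": False,
    "items": [{"index": {"_index": "my_index", "status": 201, "result": "created"}}],
}
bad_response = {
    "errors": True,
    "items": [
        {"index": {"_index": "my_index", "status": 201, "result": "created"}},
        {"index": {"_index": "my_index", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "example failure"}}},
    ],
}

print(failed_bulk_items(ok_response))
print(failed_bulk_items(bad_response))
```

In the notebook, the same helper could be called as `failed_bulk_items(response.body)` right after `es.bulk(...)`.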


## Simple filters

Previously, we learned about compound queries. The `filter` clause of a boolean query narrows results to documents that match specific criteria without affecting their relevance score. In this simple example, we keep only the documents whose brand is Adidas.

In [4]:
response = es.search(
    index=INDEX,
    body={
        "query": {
            "bool": {
                "filter": [
                    {
                        "term": {
                            "brand": "adidas"
                        }
                    }
                ]
            }
        },
        "size": 100
    }
)

hits = response["hits"]["hits"]
print(f"Found {len(hits)} documents")

Found 23 documents


Here, we apply multiple filters. Clauses listed under `filter` are combined with AND, so only documents where the brand is Adidas and the color is yellow are retained.

In [6]:
response = es.search(
    index=INDEX,
    body={
        "query": {
            "bool": {
                "filter": [
                    {
                        "term": {
                            "color": "yellow"
                        }
                    },
                    {
                        "term": {
                            "brand": "adidas"
                        }
                    }
                ]
            }
        }
    }
)

hits = response["hits"]["hits"]
print(f"Found {len(hits)} documents")

Found 6 documents
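
Because every clause under `filter` must match, the request body for an AND of several `term` filters can be generated from a plain dictionary. The following is a minimal sketch; the helper name `build_filter_query` is hypothetical and not part of the Elasticsearch client.

```python
def build_filter_query(terms):
    """Build a bool query whose term filters must all match (logical AND).

    `terms` maps field names to the exact values to filter on.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: value}} for field, value in terms.items()
                ]
            }
        }
    }


body = build_filter_query({"color": "yellow", "brand": "adidas"})
print(body)
```

The resulting dictionary has the same shape as the body written by hand in the cell above, so it could be passed as `es.search(index=INDEX, body=body)`.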


## Post filters

In this example, we'll explore the use of filters, aggregations, filtered aggregations, and post-filters.

We start by narrowing our search to documents where the `brand` is `gucci`. Next, we apply a `terms` aggregation to get the document count for each color. We then define a filtered aggregation, `color_red`, which counts documents per model among those whose color is `red`.

Finally, a `post_filter` is applied after the aggregations have been computed. It narrows the returned hits to documents with the color `red` without changing the aggregation counts.

In [7]:
response = es.search(
    index=INDEX,
    body={
        "query": {
            "bool": {
                "filter": {
                    "term": {
                        "brand": "gucci"
                    }
                }
            }
        },
        "aggs": {
            "colors": {
                "terms": {
                    "field": "color.keyword"
                }
            },
            "color_red": {
                "filter": {
                    "term": {
                        "color.keyword": "red"
                    }
                },
                "aggs": {
                    "models": {
                        "terms": {
                            "field": "model.keyword"
                        }
                    }
                }
            }
        },
        "post_filter": {
            "term": {
                "color": "red"
            }
        },
        "size": 20
    }
)
pprint(response.body)

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'aggregations': {'color_red': {'doc_count': 12,
                                'models': {'buckets': [{'doc_count': 3,
                                                        'key': 'model_1'},
                                                       {'doc_count': 1,
                                                        'key': 'model_14'},
                                                       {'doc_count': 1,
                                                        'key': 'model_16'},
                                                       {'doc_count': 1,
                                                        'key': 'model_2'},
                                                       {'doc_count': 1,
                                                        'key': 'model_26'},
                                                       {'doc_count': 1,
                                                        'key': 'model_2

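The `colors_aggregation` variable holds the document count for each color, as returned by the `colors` aggregation. Because aggregations are computed before the `post_filter` is applied, the buckets cover every color among the `gucci` documents, not only `red`.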

In [8]:
colors_aggregation = response["aggregations"]["colors"]["buckets"]
pprint(colors_aggregation)

[{'doc_count': 12, 'key': 'red'},
 {'doc_count': 8, 'key': 'blue'},
 {'doc_count': 6, 'key': 'green'},
 {'doc_count': 4, 'key': 'yellow'}]
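
A list of buckets is convenient to print but awkward to query. As a small sketch, the buckets can be turned into a dictionary keyed by color. The list below is copied from the output shown above so that the snippet runs without a cluster; `counts_by_color` is a name introduced here for illustration.

```python
# Buckets copied from the printed `colors` aggregation output.
colors_aggregation = [
    {"doc_count": 12, "key": "red"},
    {"doc_count": 8, "key": "blue"},
    {"doc_count": 6, "key": "green"},
    {"doc_count": 4, "key": "yellow"},
]

# Map each bucket key to its document count for direct lookups.
counts_by_color = {bucket["key"]: bucket["doc_count"] for bucket in colors_aggregation}

print(counts_by_color["red"])          # 12
print(sum(counts_by_color.values()))   # 30
```

The total of 30 is the number of `gucci` documents that fell into the listed color buckets.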


The `color_red_aggregation` variable contains the buckets of the `models` sub-aggregation inside the `color_red` filtered aggregation: the number of documents per model, counted only among documents where the color is `red`.

In [9]:
color_red_aggregation = response["aggregations"]["color_red"]["models"]["buckets"]
pprint(color_red_aggregation)

[{'doc_count': 3, 'key': 'model_1'},
 {'doc_count': 1, 'key': 'model_14'},
 {'doc_count': 1, 'key': 'model_16'},
 {'doc_count': 1, 'key': 'model_2'},
 {'doc_count': 1, 'key': 'model_26'},
 {'doc_count': 1, 'key': 'model_28'},
 {'doc_count': 1, 'key': 'model_3'},
 {'doc_count': 1, 'key': 'model_4'},
 {'doc_count': 1, 'key': 'model_6'},
 {'doc_count': 1, 'key': 'model_8'}]
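
A `terms` aggregation returns only the top buckets (10 by default) and reports the documents left out in `sum_other_doc_count`. A quick sanity check is to compare the sum of the listed buckets with the parent `doc_count`. In the sketch below the buckets are copied from the output above, while the value of `sum_other_doc_count` is an assumption, since that part of the printed response is cut off.

```python
# Parent filter aggregation count, taken from the printed response.
color_red_doc_count = 12

models = {
    "sum_other_doc_count": 0,  # assumed value; truncated in the printed output
    "buckets": [
        {"doc_count": 3, "key": "model_1"},
        {"doc_count": 1, "key": "model_14"},
        {"doc_count": 1, "key": "model_16"},
        {"doc_count": 1, "key": "model_2"},
        {"doc_count": 1, "key": "model_26"},
        {"doc_count": 1, "key": "model_28"},
        {"doc_count": 1, "key": "model_3"},
        {"doc_count": 1, "key": "model_4"},
        {"doc_count": 1, "key": "model_6"},
        {"doc_count": 1, "key": "model_8"},
    ],
}

# Documents accounted for by the listed buckets.
listed = sum(bucket["doc_count"] for bucket in models["buckets"])

print(listed)                          # 12
print(listed == color_red_doc_count)   # True
```

Here the ten listed buckets already account for all 12 red documents. With more distinct models, raising the `size` option of the `terms` aggregation would be one way to see the remaining buckets.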


To get the search results after the post-filter has been applied, we read the hits from `response["hits"]["hits"]`. We can then iterate through each result and display details such as the shirt's brand, color, and model.

In [10]:
hits = response["hits"]["hits"]
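for hit in hits:
    # Reusing double quotes inside an f-string requires Python 3.12+, so read the fields via a local variable.
    source = hit["_source"]
    print(f"Shirt brand: {source['brand']}, color: {source['color']}, and model: {source['model']}")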

Shirt brand: gucci, color: red, and model: model_1
Shirt brand: gucci, color: red, and model: model_3
Shirt brand: gucci, color: red, and model: model_1
Shirt brand: gucci, color: red, and model: model_4
Shirt brand: gucci, color: red, and model: model_1
Shirt brand: gucci, color: red, and model: model_2
Shirt brand: gucci, color: red, and model: model_28
Shirt brand: gucci, color: red, and model: model_6
Shirt brand: gucci, color: red, and model: model_14
Shirt brand: gucci, color: red, and model: model_26
Shirt brand: gucci, color: red, and model: model_8
Shirt brand: gucci, color: red, and model: model_16
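
To make the order of operations concrete, the same three stages (query filter, aggregations, post-filter) can be mimicked in plain Python. This is only an illustration of the semantics, not how Elasticsearch executes the request, and the four documents are a hand-made sample rather than the contents of `clothes.json`.

```python
from collections import Counter

# Tiny hand-made sample (an assumption for illustration only).
docs = [
    {"brand": "gucci", "color": "red", "model": "model_1"},
    {"brand": "gucci", "color": "red", "model": "model_3"},
    {"brand": "gucci", "color": "blue", "model": "model_1"},
    {"brand": "adidas", "color": "red", "model": "model_2"},
]

# 1. The query filter decides which documents reach the aggregations.
matched = [d for d in docs if d["brand"] == "gucci"]

# 2. Aggregations are computed over everything the query matched.
colors = Counter(d["color"] for d in matched)
color_red_models = Counter(d["model"] for d in matched if d["color"] == "red")

# 3. The post-filter trims only the returned hits, leaving the counts untouched.
hits = [d for d in matched if d["color"] == "red"]

print(dict(colors))            # {'red': 2, 'blue': 1}
print(dict(color_red_models))  # {'model_1': 1, 'model_3': 1}
print(len(hits))               # 2
```

The `colors` counts still include `blue` even though no blue document appears in `hits`, which mirrors the response above: the aggregation buckets list all four colors while the hits are all red.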
