## Documentation

To read more about filters, check out the [Elasticsearch documentation on filtering search results](https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html).



## Connect to Elasticsearch

In [None]:
from pprint import pprint
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')
client_info = es.info()
print('Connected to Elasticsearch!')
pprint(client_info.body)

## Index data

In [None]:
import json

from pprint import pprint

es.indices.delete(index='my_index', ignore_unavailable=True)
es.indices.create(index='my_index')

operations = []
# Use a context manager so the file handle is closed after loading
with open("../data/clothes.json") as f:
    clothes_documents = json.load(f)

for document in clothes_documents:
    operations.append({'index': {'_index': 'my_index'}})
    operations.append(document)

response = es.bulk(operations=operations)
pprint(response.body)

In [None]:
count = es.count(index='my_index')
print('Number of documents in index:', count.body['count'])

## Simple filters

Previously, we learned about compound queries. The `filter` clause of a `bool` query narrows down documents based on specific criteria without affecting their relevance score. In this simple example, we keep only the documents where the brand is Adidas.

In [None]:
response = es.search(
    index="my_index",
    body={
        "query": {
            "bool": {
                "filter": [
                    {
                        "term": {
                            "brand": "adidas"
                        }
                    }
                ]
            }
        },
        "size": 100
    },
)

hits = response.body['hits']['hits']
print(f"Found {len(hits)} documents")

Here, we apply multiple filters. Every clause in the `filter` list must match, so the clauses are combined with AND and we retain only documents where the brand is Adidas and the color is yellow.

In [None]:
response = es.search(
    index="my_index",
    body={
        "query": {
            "bool": {
                "filter": [
                    {
                        "term": {
                            "color": "yellow"
                        }
                    },
                    {
                        "term": {
                            "brand": "adidas"
                        }
                    }
                ]
            }
        },
        "size": 100  # without an explicit size, at most 10 hits are returned
    },
)

hits = response.body['hits']['hits']
print(f"Found {len(hits)} documents")
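
Both queries above follow the same pattern: one `term` clause per field inside `bool.filter`. As a rough sketch, a small helper could build that body from a dictionary of field/value pairs. The name `build_term_filter_query` and its parameters are hypothetical and are not part of the Elasticsearch client.

```python
from pprint import pprint


def build_term_filter_query(filters, size=10):
    """Build a search body that ANDs one term filter per field (hypothetical helper)."""
    # Each field/value pair becomes its own term clause in the filter list
    term_clauses = [{"term": {field: value}} for field, value in filters.items()]
    return {
        "query": {"bool": {"filter": term_clauses}},
        "size": size,
    }


adidas_yellow_body = build_term_filter_query({"color": "yellow", "brand": "adidas"})
pprint(adidas_yellow_body)
```

The resulting dictionary has the same shape as the `body` argument passed to `es.search` in the cells above.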

## Post filters

In this example, we'll explore the use of filters, aggregations, filtered aggregations, and post-filters.

We start by narrowing our search to documents where the `brand` is `gucci`. Next, we apply aggregations to determine the document count for each color. We then define a filtered aggregation, `color_red`, which counts the models in documents where the color is `red`.

Finally, a `post_filter` is applied to the search hits after the aggregations have been computed. The returned hits include only documents with the color `red`, while the aggregation counts still cover every `gucci` document.

In [None]:
response = es.search(
    index="my_index",
    body={
        "query": {
            "bool": {
                "filter": {
                    "term": {
                        "brand": "gucci"
                    }
                }
            }
        },
        "aggs": {
            "colors": {
                "terms": {
                    "field": "color.keyword"
                }
            },
            "color_red": {
                "filter": {
                    "term": {
                        "color.keyword": "red"
                    }
                },
                "aggs": {
                    "models": {
                        "terms": {
                            "field": "model.keyword"
                        }
                    }
                }
            }
        },
        "post_filter": {
            "term": {
                "color": "red"
            }
        },
        "size": 20
    }
)
pprint(response.body)
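
To make the order of operations concrete, the sketch below mimics it in plain Python on a handful of in-memory documents. It only illustrates the semantics and does not reflect how Elasticsearch executes the request; the sample documents and all variable names are made up for this example and do not come from `clothes.json`.

```python
from collections import Counter

# Hypothetical sample documents, not taken from clothes.json
sample_docs = [
    {"brand": "gucci", "color": "red", "model": "polo"},
    {"brand": "gucci", "color": "red", "model": "slim"},
    {"brand": "gucci", "color": "blue", "model": "polo"},
    {"brand": "adidas", "color": "red", "model": "sport"},
]

# 1. The query filter narrows the document set used by every later step
matched_docs = [doc for doc in sample_docs if doc["brand"] == "gucci"]

# 2. The terms aggregation counts colors over all matched documents
color_counts = Counter(doc["color"] for doc in matched_docs)

# 3. The filtered aggregation counts models among red documents only
red_model_counts = Counter(
    doc["model"] for doc in matched_docs if doc["color"] == "red"
)

# 4. The post filter trims the returned hits; the counts above stay unchanged
sample_hits = [doc for doc in matched_docs if doc["color"] == "red"]

print("Colors:", dict(color_counts))
print("Red models:", dict(red_model_counts))
print("Hits returned:", len(sample_hits))
```

Note that the blue document still appears in the color counts even though it is absent from the returned hits.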

The `colors_aggregation` variable holds the buckets returned by the `colors` aggregation: one bucket per color, each with the number of matching documents.

In [None]:
colors_aggregation = response.body['aggregations']['colors']['buckets']
pprint(colors_aggregation)

The `color_red_aggregation` variable holds the `models` buckets from the `color_red` filtered aggregation: the document count per model, restricted to documents where the color is `red`.

In [None]:
color_red_aggregation = response.body['aggregations']['color_red']['models']['buckets']
pprint(color_red_aggregation)
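
Each bucket is a dictionary with a `key` and a `doc_count`. If a flat mapping is easier to work with, the buckets could be collapsed as sketched below. The helper name `buckets_to_counts` and the sample buckets are hypothetical and only imitate the shape of a terms aggregation result.

```python
def buckets_to_counts(buckets):
    """Map each bucket key to its doc_count (hypothetical helper)."""
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical buckets in the shape returned by a terms aggregation
sample_buckets = [
    {"key": "red", "doc_count": 3},
    {"key": "blue", "doc_count": 1},
]
print(buckets_to_counts(sample_buckets))
```

The same call should work on `colors_aggregation` and `color_red_aggregation` from the cells above.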

To get the search results after the post-filter has been applied, we access the `hits` from `response.body['hits']['hits']`. This allows us to iterate through each result and display details such as the shirt's brand, color, and model.

In [None]:
hits = response.body['hits']['hits']
for hit in hits:
    source = hit['_source']
    print(f"Shirt brand: {source['brand']}, color: {source['color']}, and model: {source['model']}")