## Setup packages

Install Elasticsearch locally or create an account on bonsai.io.

The `elasticsearch` cluster used in this notebook only prepares and stores the data in Elasticsearch indices for demonstration purposes. In real-world production clusters with many nodes, the data would typically arrive through connectors such as Logstash.



### Install the required packages

In [None]:
!pip install elasticsearch==7.10.1
!pip install tensorflow
!pip install urllib3

Collecting elasticsearch==7.10.1
  Downloading elasticsearch-7.10.1-py2.py3-none-any.whl (322 kB)
Collecting urllib3<2,>=1.21.1 (from elasticsearch==7.10.1)
  Downloading urllib3-1.26.18-py2.py3-none-any.whl (143 kB)
Installing collected packages: urllib3, elasticsearch
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.0.7
    Uninstalling urllib3-2.0.7:
      Successfully uninstalled urllib3-2.0.7
Successfully installed elasticsearch-7.10.1 urllib3-1.26.18


### Import packages

In [None]:
import os
import time
from collections.abc import Mapping
from elasticsearch import Elasticsearch
import numpy as np
import pandas as pd


### Explore the dataset

For the purpose of this tutorial, let's download the PetFinder mini dataset (a simplified version of the [PetFinder](https://www.kaggle.com/c/petfinder-adoption-prediction) dataset) and load the data into Elasticsearch manually. The goal of the original classification problem is to predict whether a pet will be adopted.


In [None]:
import tensorflow as tf

dataset_url = 'http://storage.googleapis.com/download.tensorflow.org/data/petfinder-mini.zip'
csv_file = 'datasets/petfinder-mini/petfinder-mini.csv'
tf.keras.utils.get_file('petfinder_mini.zip', dataset_url,
                        extract=True, cache_dir='.')
pf_df = pd.read_csv(csv_file)

Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/petfinder-mini.zip


In [None]:
pf_df.head()

| | Type | Age | Breed1 | Gender | Color1 | Color2 | MaturitySize | FurLength | Vaccinated | Sterilized | Health | Fee | Description | PhotoAmt | AdoptionSpeed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cat | 3 | Tabby | Male | Black | White | Small | Short | No | No | Healthy | 100 | Nibble is a 3+ month old ball of cuteness. He ... | 1 | 2 |
| 1 | Cat | 1 | Domestic Medium Hair | Male | Black | Brown | Medium | Medium | Not Sure | Not Sure | Healthy | 0 | I just found it alone yesterday near my apartm... | 2 | 0 |
| 2 | Dog | 1 | Mixed Breed | Male | Brown | White | Medium | Medium | Yes | No | Healthy | 0 | Their pregnant mother was dumped by her irresp... | 7 | 3 |
| 3 | Dog | 4 | Mixed Breed | Female | Black | Brown | Medium | Short | Yes | No | Healthy | 150 | Good guard dog, very alert, active, obedience ... | 8 | 2 |
| 4 | Dog | 1 | Mixed Breed | Male | Black | No Color | Medium | Short | No | No | Healthy | 0 | This handsome yet cute boy is up for adoption.... | 3 | 2 |


In [None]:
# Number of datapoints and columns
len(pf_df), len(pf_df.columns)

(11537, 15)

### Store the data in an Elasticsearch index

Storing the data in an Elasticsearch cluster (local, or hosted on bonsai.io as in the client below) simulates an environment where data is retrieved remotely and continuously for training and inference.

In [None]:
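# Credentials are read from environment variables (names chosen for this example) instead of being hard-coded.
es_client = Elasticsearch([{'host': 'tec-search-834913877.us-east-1.bonsaisearch.net', 'port': 443}], use_ssl=True, verify_certs=True,
                       ssl_show_warn=False, http_auth=(os.environ['BONSAI_USER'], os.environ['BONSAI_PASSWORD']), request_timeout=100)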

In [None]:
def prepare_es_data(index, doc_type, df):
  """Build a bulk-request body: one action line followed by one source line per record."""
  records = df.to_dict(orient="records")
  es_data = []
  for idx, record in enumerate(records):
    meta_dict = {
        "index": {
            "_index": index,
            "_type": doc_type,  # type names are deprecated in Elasticsearch 7.x
            "_id": idx
        }
    }
    es_data.append(meta_dict)
    es_data.append(record)

  return es_data

def index_es_data(index, df):
  """Recreate `index` from scratch and bulk index every row of `df` into it."""
  if es_client.indices.exists(index=index):
    print("deleting the '{}' index.".format(index))
    res = es_client.indices.delete(index=index)
    print("Response from server: {}".format(res))

  print("creating the '{}' index.".format(index))
  res = es_client.indices.create(index=index)
  print("Response from server: {}".format(res))
  es_data = prepare_es_data(index=index, doc_type="_doc", df=df)
  print("bulk index the data")
  res = es_client.bulk(index=index, body=es_data, refresh=True)
  # "errors" is True if at least one item failed; len(res["items"]) counts every
  # attempted item, including failed ones, not only successfully indexed records.
  print("Errors: {}, Num of records indexed: {}".format(res["errors"], len(res["items"])))
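One possible contributor to a bulk `errors` flag (an assumption, not something the output confirms) is missing values: pandas represents them as `NaN`, and Python's `json` module serializes that as the bare token `NaN`, which is not valid JSON. The sketch below converts `NaN` to `None` (sent as `null`) before building the same action/source pairs that `prepare_es_data` produces. The helper names `records_for_es` and `build_bulk_body` are hypothetical, and the action line omits `_type`, which Elasticsearch 7.x treats as deprecated.

```python
import math

import pandas as pd


def records_for_es(df: pd.DataFrame) -> list:
    """Convert DataFrame rows to dicts, replacing float NaN with None (JSON null)."""
    records = df.to_dict(orient="records")
    return [
        {key: (None if isinstance(value, float) and math.isnan(value) else value)
         for key, value in record.items()}
        for record in records
    ]


def build_bulk_body(index: str, records: list) -> list:
    """Alternate one action line and one source line per document (bulk layout)."""
    body = []
    for idx, record in enumerate(records):
        body.append({"index": {"_index": index, "_id": idx}})
        body.append(record)
    return body


# Tiny made-up frame with one missing Description, for illustration only.
df = pd.DataFrame({"Type": ["Dog", "Cat"], "Description": ["Friendly dog", float("nan")]})
body = build_bulk_body("pets", records_for_es(df))
print(body)
```

With the real data this would be called as `build_bulk_body("pets", records_for_es(pf_df))` and passed to `es_client.bulk`.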


In [None]:
index_es_data("pets", pf_df)

deleting the 'pets' index.
Response from server: {'acknowledged': True}
creating the 'pets' index.
Response from server: {'acknowledged': True, 'shards_acknowledged': True, 'index': 'pets'}
bulk index the data
Errors: True, Num of records indexed: 11537
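`Errors: True` means at least one bulk item failed, and `len(res["items"])` counts every attempted item, so 11537 is not the number of successfully indexed documents. A minimal sketch of a helper (hypothetical name `summarize_bulk_response`) that counts failures from the per-item results of a bulk response:

```python
def summarize_bulk_response(res: dict, max_errors: int = 3) -> dict:
    """Count succeeded and failed items in an Elasticsearch bulk response."""
    items = res.get("items", [])
    failed = []
    for item in items:
        # Each item is keyed by its action type, e.g. {"index": {"_id": ..., "status": ...}}.
        result = next(iter(item.values()))
        if "error" in result:
            failed.append(result)
    return {
        "attempted": len(items),
        "failed": len(failed),
        "succeeded": len(items) - len(failed),
        "sample_errors": [f["error"] for f in failed[:max_errors]],
    }
```

Calling `print(summarize_bulk_response(res))` right after the `bulk` call in `index_es_data` would show how many records actually failed and why.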




### Finding pets
Let's find some dogs to adopt. We want a friendly and active dog, so let's search the `Description` field for "friendly and active dogs".



In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "friendly and active dogs",
      "default_operator": "AND"
    }
  }
}


response = es_client.search(index="pets",body=query)

Let's see how many pets we found, and a few examples.

In [None]:
print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

45 found!
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'Cream', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'No', 'Sterilized': 'No', 'Health': 'Healthy', 'Fee': 0, 'Description': 'Hi Coco is a rescued puppy from the streets. If you would like to adopt her pls contact me. She is very active and is very friendly with other dogs.', 'PhotoAmt': 1, 'AdoptionSpeed': 0}
{'Type': 'Dog', 'Age': 6, 'Breed1': 'Mixed Breed', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'Yes', 'Sterilized': 'Yes', 

That isn't bad, but what happens if we search for the singular "dog" instead?

In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "friendly and active dog",
      "default_operator": "AND"
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

67 found!
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'No Color', 'MaturitySize': 'Small', 'FurLength': 'Medium', 'Vaccinated': 'No', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': '2 months old female dog ... very friendly and active', 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'No Color', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'No', 'Sterilized': 'No', 'Health': 'Healthy', 'Fee': 0, 'Description': 'stray dog puppy about 3mths+. healthy and ac
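The jump from 45 to 67 hits is consistent with how the default `standard` analyzer works: it lowercases text and splits it into words but does not stem, so `dogs` and `dog` are different indexed terms, and with `default_operator: AND` every query term is required. The toy sketch below mimics that behaviour in plain Python; `toy_standard_tokens` is a rough approximation of the standard tokenizer, not Elasticsearch's implementation. The description is the first hit above.

```python
import re


def toy_standard_tokens(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, then split on non-word characters.
    Like the default analyzer, it does no stemming and no stopword removal."""
    return re.findall(r"\w+", text.lower())


def matches_all_terms(query: str, description: str) -> bool:
    """AND semantics: every query term must appear among the document's terms."""
    doc_terms = set(toy_standard_tokens(description))
    return all(term in doc_terms for term in toy_standard_tokens(query))


desc = "2 months old female dog ... very friendly and active"
print(matches_all_terms("friendly and active dogs", desc))  # False: there is no "dogs" term
print(matches_all_terms("friendly and active dog", desc))   # True
```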

And what if we also drop the stopword "and" from the query?

In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "friendly  active dog",
      "default_operator": "AND"
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

71 found!
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'No Color', 'MaturitySize': 'Small', 'FurLength': 'Medium', 'Vaccinated': 'No', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': '2 months old female dog ... very friendly and active', 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'White', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'No', 'Sterilized': 'No', 'Health': 'Healthy', 'Fee': 0, 'Description': 'stray dog puppy about 3mths+. friendly, healthy
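Because the standard analyzer keeps stopwords, `and` is itself a required term under `default_operator: AND`, so a description such as "friendly, active dog" is excluded until `and` is dropped from the query or removed at analysis time. A toy sketch of that effect; the description is made up for illustration and the tokenizer is the same rough approximation as before:

```python
import re


def toy_tokens(text: str, stopwords=frozenset()) -> list:
    """Lowercase, split on non-word characters, then drop any stopwords."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in stopwords]


def matches_all_terms(query: str, description: str, stopwords=frozenset()) -> bool:
    """AND semantics, with the same stopword handling applied to query and document."""
    doc_terms = set(toy_tokens(description, stopwords))
    return all(t in doc_terms for t in toy_tokens(query, stopwords))


desc = "Very friendly, active dog."  # hypothetical description without "and"
print(matches_all_terms("friendly and active dog", desc))           # False
print(matches_all_terms("friendly active dog", desc))               # True
print(matches_all_terms("friendly and active dog", desc, {"and"}))  # True
```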

### Mappings for processing data
---
We will recreate the index with a custom analyzer that lowercases tokens, removes English stopwords plus a custom stopword list, and applies stemming, and we will use it for the `Description` field.


In [None]:
new_index = {
  "settings":
  {
    "analysis":
    {
      "analyzer":
      {
        "my_analyzer":
        {
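          "tokenizer": "standard",
          "filter": [
            # Lowercase first so stopword matching and stemming see normalized tokens,
            # and drop English stopwords before stemming (a stemmer may rewrite them, e.g. "this").
            "lowercase",
            "english_stop",
            "my_stemmer",
            "my_stop_word"  # after the stemmer, so inflected forms such as "loves" are still removed
          ]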
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        },
        "english_stop":{
          "type": "stop",
          "stopwords": "_english_"
        },
        "my_stop_word": {
          "type": "stop",
          "stopwords": ["robot", "love", "affection", "play", "the"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "Description": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
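Token filters run in the order they are listed, so ordering matters. The toy sketch below compares running a stemmer before stopword removal and lowercasing against the usual lowercase, stop, stem order. `toy_stem` is a crude stand-in that only strips a trailing "s" (it is NOT the Porter stemmer behind `"english"`), and the stop filter is case-sensitive, matching the Elasticsearch stop filter's default `ignore_case: false`.

```python
ENGLISH_STOP = {"the", "this", "and", "is", "a"}  # small subset, for illustration only


def toy_stem(token: str) -> str:
    """Crude stand-in for a stemmer: strip one trailing 's' (but not 'ss'). Not Porter."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def lowercase(tokens):
    return [t.lower() for t in tokens]


def stem(tokens):
    return [toy_stem(t) for t in tokens]


def stop(tokens):
    # Case-sensitive membership test, like the stop filter's default behaviour.
    return [t for t in tokens if t not in ENGLISH_STOP]


def run_pipeline(tokens, steps):
    """Apply each filter in order, as an analyzer's filter chain would."""
    for step in steps:
        tokens = step(tokens)
    return tokens


tokens = ["This", "Dog", "loves", "The", "walks"]
print(run_pipeline(tokens, [stem, stop, lowercase]))  # ['thi', 'dog', 'love', 'the', 'walk']
print(run_pipeline(tokens, [lowercase, stop, stem]))  # ['dog', 'love', 'walk']
```

In the first order "This" is stemmed to "thi" and "The" keeps its capital letter, so neither matches the stopword list.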



In [None]:
# prepare_es_data is reused unchanged from the earlier cell; this version of index_es_data
# creates the index with the custom analyzer settings and mappings passed in `body`.
def index_es_data(index, df, body=None):
  """Recreate `index` (optionally with settings/mappings in `body`) and bulk index `df`."""
  if es_client.indices.exists(index=index):
    print("deleting the '{}' index.".format(index))
    res = es_client.indices.delete(index=index)
    print("Response from server: {}".format(res))

  print("creating the '{}' index.".format(index))
  res = es_client.indices.create(index=index, body=body)
  print("Response from server: {}".format(res))
  es_data = prepare_es_data(index=index, doc_type="_doc", df=df)
  print("bulk index the data")
  res = es_client.bulk(index=index, body=es_data, refresh=True)
  # "errors" is True if any item failed; len(res["items"]) counts attempted items, not successes.
  print("Errors: {}, Num of records indexed: {}".format(res["errors"], len(res["items"])))

index_es_data("pets", pf_df, body=new_index)

deleting the 'pets' index.
Response from server: {'acknowledged': True}
creating the 'pets' index.
Response from server: {'acknowledged': True, 'shards_acknowledged': True, 'index': 'pets'}
bulk index the data
Errors: True, Num of records indexed: 11537
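To check which tokens `my_analyzer` actually produces (the explain output at the end of this notebook, for instance, shows "friendly" indexed as `friendli`), the `_analyze` API can be called through the client's `indices.analyze` method. A small sketch; `analyze_tokens` is a hypothetical helper name:

```python
def analyze_tokens(client, index: str, text: str, analyzer: str = "my_analyzer") -> list:
    """Return the token strings an index analyzer produces for `text` via the _analyze API."""
    res = client.indices.analyze(index=index, body={"analyzer": analyzer, "text": text})
    # The response holds one entry per token with offsets, type and position.
    return [tok["token"] for tok in res["tokens"]]


# Usage against the live cluster (the result depends on the index settings):
# analyze_tokens(es_client, "pets", "Friendly and active dogs")
```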


In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "friendly and active dogs",
      "default_operator": "AND"
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

88 found!
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'No Color', 'MaturitySize': 'Small', 'FurLength': 'Medium', 'Vaccinated': 'No', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': '2 months old female dog ... very friendly and active', 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 9, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'White', 'MaturitySize': 'Large', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 0, 'Description': 'He is a friendly and active dog. He is also v

In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "friendly active dog",
      "default_operator": "AND"
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

88 found!
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'No Color', 'MaturitySize': 'Small', 'FurLength': 'Medium', 'Vaccinated': 'No', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': '2 months old female dog ... very friendly and active', 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 9, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'White', 'MaturitySize': 'Large', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 0, 'Description': 'He is a friendly and active dog. He is also v

### Filters
Instead of searching free text, we can segment the dataset by field values, much like a `WHERE` clause in SQL. Here this is done with `match` queries on individual fields.


In [None]:
query = {

  "query": {
    "match": {
      "Breed1": "Beagle"
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

86 found!
{'Type': 'Dog', 'Age': 22, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'Cream', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'Yes', 'Sterilized': 'No', 'Health': 'Healthy', 'Fee': 0, 'Description': 'Very aggressive, playful and obedient. He needs someone pay attention to him.', 'PhotoAmt': 0, 'AdoptionSpeed': 3}
{'Type': 'Dog', 'Age': 18, 'Breed1': 'Beagle', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'White', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'Yes', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': 'Chanelle found a home!', 'PhotoAmt': 10, 'AdoptionSpeed': 0}
{'Type': 'Dog', 'Age': 55, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'Yellow', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'Yes', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': 'Rocco is an adorable beagle. My wife bought her to me 4 years ago, almost when we just arr
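A `match` query runs in query context: the value is analyzed, results are scored, and "Beagle" would match any `Breed1` containing the word "beagle". A closer analogue of a SQL `WHERE` is filter context: `term` clauses on the dynamically mapped `.keyword` subfields (the same subfields the aggregations later in this notebook use) inside `bool.filter`, which are exact, case-sensitive and unscored. A sketch with a hypothetical helper name:

```python
def exact_filter_query(**fields) -> dict:
    """Build a non-scoring bool/filter query with one exact `term` clause per field.

    Assumes each field has a `.keyword` subfield, as dynamic mapping creates for strings.
    """
    return {
        "query": {
            "bool": {
                "filter": [{"term": {f"{field}.keyword": value}} for field, value in fields.items()]
            }
        }
    }


query = exact_filter_query(Breed1="Beagle", Gender="Male")
# response = es_client.search(index="pets", body=query)
```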

We can combine multiple conditions, for example male Beagles, which is equivalent to `Breed1 = 'Beagle' AND Gender = 'Male'` in SQL.

In [None]:
query ={
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Breed1": "Beagle"
          }
        },
        {
          "match": {
            "Gender": "Male"
          }
        }
      ]
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

49 found!
{'Type': 'Dog', 'Age': 22, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'Cream', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'Yes', 'Sterilized': 'No', 'Health': 'Healthy', 'Fee': 0, 'Description': 'Very aggressive, playful and obedient. He needs someone pay attention to him.', 'PhotoAmt': 0, 'AdoptionSpeed': 3}
{'Type': 'Dog', 'Age': 55, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'Yellow', 'MaturitySize': 'Medium', 'FurLength': 'Short', 'Vaccinated': 'Yes', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': 'Rocco is an adorable beagle. My wife bought her to me 4 years ago, almost when we just arrived. Rocco loves to be among people, she\'s very playful and excellent to get along with cats since she has grown up with couple of them. Unfortunately I\'m moving and I cannot bring her with me. It\'s really heartbreaking but I have no choice. Rocco is partially trained, understands "sit", "stay" & "

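We can also express OR conditions, for example `(Breed1 = 'Beagle' OR Breed1 = 'Rottweiler') AND Gender = 'Male'`. A nested `bool` query containing only `should` clauses requires at least one of them to match.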

In [None]:
query ={
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "Breed1": "Beagle"
                }
              },
              {
                "match": {
                  "Breed1": "Rottweiler"
                }
              }
            ]
          }
        },
        {
          "match": {
            "Gender": "Male"
          }
        }
      ]
    }
  }
}



response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

93 found!
{'Type': 'Dog', 'Age': 14, 'Breed1': 'Rottweiler', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Large', 'FurLength': 'Short', 'Vaccinated': 'Yes', 'Sterilized': 'No', 'Health': 'Healthy', 'Fee': 1, 'Description': 'healthy and active , skinny and tall, as the mother (doberman) fully vaccinated , dewormed. Authorities not allowed have more than 1 dog at home, they dont issue license .', 'PhotoAmt': 0, 'AdoptionSpeed': 3}
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Rottweiler', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Large', 'FurLength': 'Medium', 'Vaccinated': 'Not Sure', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 0, 'Description': 'We was found this dog from a roadside i think its abandoned, Its a Rottweiler mix breed.. If anybody would like to take care and give him a new home by ur kind adoption pls do contact me. -Rishi Location : Taman kinrara sek 2,Puchong **Your kind adoption will give them a new life,Tha
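In filter context, an `IN (...)` list maps onto a `terms` query, which matches documents whose field equals any of the listed values. A sketch that translates keyword arguments into exact filters (hypothetical helper name; it assumes `.keyword` subfields as above), where a list becomes `terms` (OR) and a scalar becomes `term`:

```python
def where_in_query(**fields) -> dict:
    """Roughly `WHERE f1 IN (...) AND f2 = ...` as a non-scoring bool/filter query."""
    filters = []
    for field, value in fields.items():
        key = f"{field}.keyword"
        if isinstance(value, (list, tuple)):
            filters.append({"terms": {key: list(value)}})  # any listed value matches (OR)
        else:
            filters.append({"term": {key: value}})  # exact single value
    return {"query": {"bool": {"filter": filters}}}


query = where_in_query(Breed1=["Beagle", "Rottweiler"], Gender="Male")
# response = es_client.search(index="pets", body=query)
```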

### Faceted Search

It is very common to combine filters with full-text search; this is known as faceted search, where each filterable attribute is a facet. For example, let's add an aggregation to the first search to find the (up to) 100 most common breeds among the documents matching "friendly and active dogs". A `terms` aggregation is roughly the equivalent of a `GROUP BY` clause in SQL.

In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "friendly and active dogs",
      "default_operator": "AND"
    }
  },
  "aggs": {
    "breeds": {
      "terms": {
        "field": "Breed1.keyword",
        "size": 100
      }
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])
print(f'{len(response["aggregations"]["breeds"]["buckets"])} breeds available to filter!')
for breed in response["aggregations"]["breeds"]["buckets"]:
  print(breed)

88 found!
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'No Color', 'MaturitySize': 'Small', 'FurLength': 'Medium', 'Vaccinated': 'No', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': '2 months old female dog ... very friendly and active', 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 9, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'White', 'MaturitySize': 'Large', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 0, 'Description': 'He is a friendly and active dog. He is also v
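Each `terms` bucket carries a `key` and a `doc_count`, ordered by count by default, so the aggregation corresponds roughly to `SELECT Breed1, COUNT(*) ... GROUP BY Breed1 ORDER BY COUNT(*) DESC LIMIT 100`. A small helper sketch (hypothetical name) that turns the buckets into an ordered dictionary of counts:

```python
def bucket_counts(response: dict, agg_name: str) -> dict:
    """Map each terms-aggregation bucket key to its doc_count, keeping Elasticsearch's order."""
    return {b["key"]: b["doc_count"] for b in response["aggregations"][agg_name]["buckets"]}


# Usage with the response above:
# bucket_counts(response, "breeds")
```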

We can now drill into a facet: filter on a specific breed, then see which fur lengths are available.

In [None]:
query = {
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "Description",
            "query": "friendly and active dogs",
            "default_operator": "AND"
          }
        },
        {
          "match": {
            "Breed1": "Beagle"
          }
        }
      ]
    }
  },
  "aggs": {
    "fur": {
      "terms": {
        "field": "FurLength.keyword",
        "size": 100
      }
    }
  }
}




response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])
print(f'{len(response["aggregations"]["fur"]["buckets"])} fur lengths available to filter!')
for bucket in response["aggregations"]["fur"]["buckets"]:
  print(bucket)

4 found!
{'Type': 'Dog', 'Age': 9, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'White', 'MaturitySize': 'Large', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 0, 'Description': 'He is a friendly and active dog. He is also very playful. His energy is like unlimited.', 'PhotoAmt': 2, 'AdoptionSpeed': 1}
{'Type': 'Dog', 'Age': 48, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 500, 'Description': "Bosco is looking for a new owner as he deserves a better home with more time and companionship. He is an active, loving and friendly dog. Smart dog as he picks up tricks very easily. Very upset to have to let him go as I've been posted to other states for my work.", 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 36, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 
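Putting the breed filter in the main query, as above, also narrows every aggregation. Elasticsearch's `post_filter` is an alternative technique: it filters the returned hits after aggregations are computed, so facet counts still reflect the text query alone, which is useful when a UI should keep showing the other breeds. A sketch under that assumption; `faceted_query` and its defaults are hypothetical, and aggregations are named after their fields:

```python
def faceted_query(text: str, selected=None, facets=("Breed1", "FurLength"), size: int = 100) -> dict:
    """Full-text query plus terms aggregations; selected facet values go in post_filter."""
    selected = selected or {}
    body = {
        "query": {
            "query_string": {
                "default_field": "Description",
                "query": text,
                "default_operator": "AND",
            }
        },
        # One terms aggregation per facet, on the keyword subfield.
        "aggs": {f: {"terms": {"field": f"{f}.keyword", "size": size}} for f in facets},
    }
    if selected:
        # Narrows the hits only; the aggregation counts above are unaffected.
        body["post_filter"] = {
            "bool": {"filter": [{"term": {f"{f}.keyword": v}} for f, v in selected.items()]}
        }
    return body


query = faceted_query("friendly and active dogs", {"Breed1": "Beagle"})
# response = es_client.search(index="pets", body=query)
```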

We can combine as many facets as we like, and request several aggregations in a single search, for example one per column of interest.

In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "friendly and active dogs",
      "default_operator": "AND"
    }
  },
  "aggs": {
    "breeds": {
      "terms": {
        "field": "Breed1.keyword",
        "size": 100
      }
    },
    "fur": {
      "terms": {
        "field": "FurLength.keyword",
        "size": 100
      }
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])
print(f'{len(response["aggregations"]["fur"]["buckets"])} fur lengths available to filter!')
for bucket in response["aggregations"]["fur"]["buckets"]:
  print(bucket)
print(f'{len(response["aggregations"]["breeds"]["buckets"])} breeds available to filter!')
for breed in response["aggregations"]["breeds"]["buckets"]:
  print(breed)

88 found!
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'No Color', 'MaturitySize': 'Small', 'FurLength': 'Medium', 'Vaccinated': 'No', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': '2 months old female dog ... very friendly and active', 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 9, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'White', 'MaturitySize': 'Large', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 0, 'Description': 'He is a friendly and active dog. He is also v
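With several aggregations in one response, it can be easier to read all facets as a single table. A minimal sketch (hypothetical helper `aggs_to_frame`) that flattens every `terms` aggregation into a long pandas DataFrame:

```python
import pandas as pd


def aggs_to_frame(response: dict) -> pd.DataFrame:
    """Flatten every terms aggregation in a search response into facet/value/count rows."""
    rows = [
        {"facet": name, "value": bucket["key"], "count": bucket["doc_count"]}
        for name, agg in response["aggregations"].items()
        for bucket in agg.get("buckets", [])
    ]
    return pd.DataFrame(rows, columns=["facet", "value", "count"])


# Usage with the response above:
# aggs_to_frame(response)
```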

### Fuzzy parameter
Search queries sometimes contain misspellings. One simple way to handle them is fuzzy matching: appending `~` to a term in a `query_string` query matches indexed terms within a small edit (Damerau-Levenshtein) distance of it, so "fliendly" can still match "friendly".

In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "fliendly and active dogs",
      "default_operator": "AND"
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

0 found!


In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "fliendly~ and active dogs",
      "default_operator": "AND"
    }
  }
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
for pet in response["hits"]["hits"][:5]:
  print(pet["_source"])

88 found!
{'Type': 'Dog', 'Age': 3, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Black', 'Color2': 'Brown', 'MaturitySize': 'Medium', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 100, 'Description': 'Friendly and active dog who loves to play with people and other dogs.', 'PhotoAmt': 3, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 2, 'Breed1': 'Mixed Breed', 'Gender': 'Female', 'Color1': 'Brown', 'Color2': 'No Color', 'MaturitySize': 'Small', 'FurLength': 'Medium', 'Vaccinated': 'No', 'Sterilized': 'Yes', 'Health': 'Healthy', 'Fee': 0, 'Description': '2 months old female dog ... very friendly and active', 'PhotoAmt': 0, 'AdoptionSpeed': 4}
{'Type': 'Dog', 'Age': 9, 'Breed1': 'Beagle', 'Gender': 'Male', 'Color1': 'Brown', 'Color2': 'White', 'MaturitySize': 'Large', 'FurLength': 'Medium', 'Vaccinated': 'Yes', 'Sterilized': 'Not Sure', 'Health': 'Healthy', 'Fee': 0, 'Description': 'He is a friendly and active dog. He is also v
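In `query_string` syntax, `term~` allows a maximum edit distance of 2 by default, and `term~1` narrows it to 1. Fuzzy terms are typically not stemmed, so (assuming that holds here) "fliendly~" is compared against the indexed stem `friendli`, which is still within distance 2. A plain dynamic-programming Levenshtein sketch to check those distances; Elasticsearch additionally counts a transposition of adjacent characters as a single edit by default, which this classic version does not:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("fliendly", "friendly"))  # 1
print(levenshtein("fliendly", "friendli"))  # 2
```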

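### Score analysis
Results are sorted by relevance score. Since version 5.0, Elasticsearch's default similarity is BM25, a refinement of TF/IDF, rather than classic TF/IDF; the explanation below shows BM25's `idf` and `tf` components. You can inspect how a score was computed by setting the `explain` parameter.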
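The explanation output of the next cell spells out the per-term BM25 computation. Written as formulas, with $N$ the number of documents that have the field, $n_t$ the number containing term $t$, $f_{t,d}$ the term's frequency in document $d$, $dl$ and $avgdl$ the document and average field lengths, $k_1 = 1.2$ (as shown in the output) and $b$ the length-normalization parameter (Lucene's default is 0.75; its value is cut off in the output). For this query the top-level value is, roughly, a sum of such per-term contributions:

$$
\begin{aligned}
\operatorname{score}(q, d) &\approx \sum_{t \in q} \text{boost}_t \cdot \operatorname{idf}(t) \cdot \operatorname{tf}(t, d) \\
\operatorname{idf}(t) &= \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right) \\
\operatorname{tf}(t, d) &= \frac{f_{t,d}}{f_{t,d} + k_1\left(1 - b + b \cdot \dfrac{dl}{avgdl}\right)}
\end{aligned}
$$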

In [None]:
query = {
  "query": {
    "query_string": {
      "default_field": "Description",
      "query": "fliendly~ and actiive~ dogs",
      "default_operator": "AND"
    }
  },
  "explain":True
}


response = es_client.search(index="pets",body=query)

print(f'{response["hits"]["total"]["value"]} found!')
print(response["hits"]["hits"][0]["_explanation"])

92 found!
{'value': 6.5442266, 'description': 'sum of:', 'details': [{'value': 2.218247, 'description': 'sum of:', 'details': [{'value': 2.218247, 'description': 'weight(Description:friendli in 3351) [PerFieldSimilarity], result of:', 'details': [{'value': 2.218247, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 1.6500001, 'description': 'boost', 'details': []}, {'value': 1.952136, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 1635, 'description': 'n, number of documents containing term', 'details': []}, {'value': 11519, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.6886773, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value':
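The numbers in the explanation can be reproduced by hand. This sketch plugs the values printed above for the stemmed term `friendli` (n = 1635, N = 11519, boost ≈ 1.65, tf ≈ 0.6887) into the BM25 formulas; `bm25_tf` is included for completeness, with `k1 = 1.2` from the output and `b = 0.75` assumed as Lucene's default, since `dl` and `avgdl` are cut off above.

```python
import math


def bm25_idf(N: int, n: int) -> float:
    """BM25 idf as printed in the explanation: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq: float, dl: float, avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 tf: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25_term_score(boost: float, idf: float, tf: float) -> float:
    return boost * idf * tf


# Values copied from the explanation above for the term "friendli".
idf = bm25_idf(N=11519, n=1635)
score = bm25_term_score(boost=1.6500001, idf=idf, tf=0.6886773)
print(round(idf, 4), round(score, 4))  # 1.9521 2.2182
```

Both results agree with the `1.952136` idf and `2.218247` term score reported by Elasticsearch.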

## References:

- [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/targz.html)

- [PetFinder Dataset](https://www.kaggle.com/c/petfinder-adoption-prediction)

