<a href="https://colab.research.google.com/github/kkrueger/Redis-Workshops/blob/main/03-Advanced_RedisSearch/03-Advanced_RedisSearch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced RediSearch

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook is an adapted and simplified version of the RedisInsight QuickGuide "Working with Hashes".

For the full experience, we recommend installing RedisInsight and going through the tutorial there.

https://redis.com/redis-enterprise/redis-insight/

In [None]:
# Install the requirements
!pip install -q redis

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install -y redis-stack-server > /dev/null 2>&1
redis-stack-server --daemonize yes


In [None]:
import redis
import os


In [None]:
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
# Replace the values above with your own if you are using a Redis Cloud instance
#REDIS_HOST="redis-12110.c82.us-east-1-2.ec2.cloud.redislabs.com"
#REDIS_PORT=12110
#REDIS_PASSWORD="pobhBJP7Psicp2gV0iqa2ZOc1XXXXXX"

#shortcut for redis-cli $REDIS_CONN command
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"

In [None]:
r = redis.Redis(
  host=REDIS_HOST,
  port=REDIS_PORT,
  password=REDIS_PASSWORD)
r.ping()

## Redis Hashes

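A Hash is one of the fundamental Redis data types: a single key that holds a collection of field-value pairs, which makes it a natural fit for representing objects such as the schools below.


See the full list of Redis Hash commands here: https://redis.io/commands/?group=hash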
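Because redis-py returns raw `bytes` by default (`decode_responses=False`), a call such as `r.hgetall("school:1")` gives back a dict of `bytes` keys and values, and every Hash value, including numbers like `students`, comes back as a string. The sketch below shows one way to post-process such a reply; `decode_hash` is a hypothetical helper, and the sample reply is written by hand rather than read from a server.

```python
from typing import Dict, Iterable


def decode_hash(raw: Dict[bytes, bytes], numeric_fields: Iterable[str] = ()) -> dict:
    """Decode a redis-py HGETALL reply (bytes -> bytes) into str keys and values,
    converting the listed fields to int. Hypothetical helper, not part of redis-py."""
    numeric = set(numeric_fields)
    result = {}
    for key, value in raw.items():
        name = key.decode("utf-8")
        text = value.decode("utf-8")
        # Redis stores every Hash value as a string, so numbers need an explicit cast
        result[name] = int(text) if name in numeric else text
    return result


# Hand-written reply shaped like r.hgetall("school:0") with decode_responses=False
sample = {b"name": b"Hall School", b"address_city": b"London", b"students": b"342"}
print(decode_hash(sample, numeric_fields=["students"]))
# {'name': 'Hall School', 'address_city': 'London', 'students': 342}
```

Alternatively, passing `decode_responses=True` to `redis.Redis(...)` returns `str` instead of `bytes`; the aggregation helper later in this notebook calls `.decode()` on each item, so it assumes the default.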




In [None]:
schools = [
    {"name":"Hall School",
     "description":"Spanning 10 states, this school's award-winning curriculum includes a comprehensive reading system (from letter recognition and phonics to reading full-length books), as well as math, science, social studies, and even philosophy.",
     "class":"independent",
     "type":"traditional",
     "address_city":"London",
     "address_street":"Manor Street",
     "students":342,
     "location":"51.445417, -0.258352"
     },
    {"name":"Garden School",
     "description":"Garden School is a new and innovative outdoor teaching and learning experience, offering rich and varied activities in a natural environment to children and families.",
     "class":"state",
     "type":"forest; montessori",
     "address_city":"London",
     "address_street":"Gordon Street",
     "students":1452,
     "location":"51.402926, -0.321523",
     },

    {"name":"Gillford School",
     "description":"Gillford School is an inclusive learning centre welcoming people from all walks of life, here invited to step into their role as regenerative agents, creating new pathways into the future and inciting an international movement of cultural, land, and social transformation.",
     "class":"private",
     "type":"democratic; waldorf",
     "address_city":"Goudhurst",
     "address_street":"Goudhurst",
     "students":721,
     "location":"51.112685, 0.451076"
     },

    {"name":"Forest School",
     "description":"The philosophy behind Forest School is based upon the desire to provide young children with an education that encourages appreciation of the wide world in nature while achieving independence, confidence and high self-esteem. ",
     "class":"independent",
     "type":"forest; montessori; democratic",
     "address_city":"Oxford",
     "address_street":"Trident Street",
     "students":1200,
     "location":"51.781756, -1.123196"
     }
    ]
# Load each school into Redis as a Hash stored under the key school:<index>
for i, school in enumerate(schools):
    r.hset(f"school:{i}", mapping=school)

In [None]:
!redis-cli $REDIS_CONN keys 'school:*'
!echo
!redis-cli $REDIS_CONN HGETALL school:1



In [None]:
# Uncomment to delete ALL keys in the current database (e.g. to start the workshop over)
#!redis-cli $REDIS_CONN flushdb

## RediSearch

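RediSearch adds the ability to query data stored in Hash or JSON data structures, essentially turning Redis into a document database.

With RediSearch you declare an index once. From then on, every key whose name matches the prefix defined in the index is added to the index automatically, in real time, as it is written. Keys that already exist when the index is created are indexed too, in the background, which is why the schools loaded above become searchable.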
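The prefix rule is a plain string-prefix test on key names. As a rough local illustration (not RediSearch's implementation), the hypothetical helper `keys_covered` below selects the keys that an index declared with `prefix=["school:"]` would cover. Note that `schools:1` is not covered, because the match is on the exact characters `school:`.

```python
from typing import Iterable, List


def keys_covered(keys: Iterable[str], prefixes: Iterable[str]) -> List[str]:
    """Return the key names that start with any of the given prefixes
    (a local imitation of how an index PREFIX selects keys)."""
    prefix_tuple = tuple(prefixes)  # str.startswith accepts a tuple of candidates
    return [k for k in keys if k.startswith(prefix_tuple)]


keys = ["school:0", "school:3", "schools:1", "user:1"]
print(keys_covered(keys, ["school:"]))  # ['school:0', 'school:3']
```

An index may declare several prefixes; a key is indexed if it matches any of them.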

For the full list of RediSearch commands see: https://redis.io/commands/?group=search

Python documentation: https://redis-py.readthedocs.io/en/stable/redismodules.html#redisearch-commands

In [None]:
from redis.commands.search.field import (
    NumericField,
    TagField,
    TextField,
    GeoField,
    VectorField
)
# Newer redis-py releases (6.x) name this module index_definition; fall back to the old name
try:
    from redis.commands.search.index_definition import IndexDefinition, IndexType
except ImportError:
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from redis.commands.search.aggregation import AggregateRequest
from redis.commands.search import reducers

# Map Hash fields to index fields; as_name sets the alias used in queries (e.g. @city)
schema = (
    TextField("name", as_name="name"),
    TextField("description", as_name="description"),
    TagField("address_city", as_name="city"),
    TagField("type", as_name="type", separator=";"),
    NumericField("students", as_name="students"),
    GeoField("location", as_name="location")
    )
# Index every Hash whose key starts with "school:"
r.ft("idx:schools").create_index(schema,
                    definition=IndexDefinition(prefix=["school:"],
                    index_type=IndexType.HASH)
                    )
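
The `create_index` call above is, roughly, a wrapper around the `FT.CREATE` command. As a sketch for readers who prefer `redis-cli`, the hypothetical helper `ft_create_args` below assembles an equivalent argument list for this schema; the exact arguments redis-py sends may differ slightly.

```python
from typing import List, Sequence, Tuple


def ft_create_args(index: str, prefixes: Sequence[str],
                   fields: Sequence[Tuple[str, ...]]) -> List[str]:
    """Build an FT.CREATE argument list for a HASH index.
    Each field tuple is (hash_field, alias, TYPE, *extra_options). Hypothetical helper."""
    args = ["FT.CREATE", index, "ON", "HASH",
            "PREFIX", str(len(prefixes)), *prefixes, "SCHEMA"]
    for hash_field, alias, field_type, *extra in fields:
        args += [hash_field, "AS", alias, field_type, *extra]
    return args


fields = [
    ("name", "name", "TEXT"),
    ("description", "description", "TEXT"),
    ("address_city", "city", "TAG"),
    ("type", "type", "TAG", "SEPARATOR", ";"),
    ("students", "students", "NUMERIC"),
    ("location", "location", "GEO"),
]
create_args = ft_create_args("idx:schools", ["school:"], fields)
print(" ".join(create_args))
```

The list can be sent as-is with `r.execute_command(*create_args)` (on a database where `idx:schools` does not exist yet). When typing the command into a shell, quote the separator (`SEPARATOR ";"`), since an unquoted `;` ends the shell command.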

In [None]:
#Retrieve index information
r.ft("idx:schools").info()


In [None]:
import pandas as pd

#helper function to display results of redis.ft().search() as a dataframe
def display_ft(res):
  if res.total==0:
    print("No matches found")
  else:
    res_df = pd.DataFrame([t.__dict__ for t in res.docs ]).drop(columns=["payload"])
    display(res_df)

# Helper function to convert an aggregation result into a DataFrame and display it
# (created with the help of ChatGPT: https://chat.openai.com/share/fc4e4ea5-d421-4aaf-a1b2-6fac02c96f20)
def display_ft_agg(res):
  data = res.rows
  data = [[item.decode('utf-8') for item in sublist] for sublist in data]
  column_dict = {}
  for sublist in data:
      for i in range(0, len(sublist), 2):
          column_name = sublist[i]
          column_value = sublist[i + 1]
          column_dict.setdefault(column_name, []).append(column_value)
  df = pd.DataFrame(column_dict)
  display(df)
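`display_ft_agg` assumes every aggregate row is a flat list of alternating field names and values, and builds one list per column. A shorter alternative under the same assumption is sketched below: it builds one dict per row and accepts both `bytes` and `str` items, so it also works with a client created with `decode_responses=True`. `rows_to_records` is a hypothetical name and the sample rows are hand-written.

```python
from typing import Dict, List, Sequence, Union

Item = Union[bytes, str]


def _to_str(item: Item) -> str:
    # Decode bytes replies; pass str replies through unchanged
    return item.decode("utf-8") if isinstance(item, bytes) else item


def rows_to_records(rows: Sequence[Sequence[Item]]) -> List[Dict[str, str]]:
    """Turn flat [name, value, name, value, ...] rows into one dict per row."""
    records = []
    for row in rows:
        flat = [_to_str(item) for item in row]
        # Pair even positions (field names) with odd positions (values)
        records.append(dict(zip(flat[0::2], flat[1::2])))
    return records


rows = [[b"city", b"London", b"count", b"2"], ["city", "Oxford", "count", "1"]]
print(rows_to_records(rows))
# [{'city': 'London', 'count': '2'}, {'city': 'Oxford', 'count': '1'}]
```

With it, the display step becomes `display(pd.DataFrame(rows_to_records(res.rows)))`. Building one dict per row also means that a row missing a field would show up as `NaN` in the DataFrame instead of raising a column-length error.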

## Text search

You can run full-text search queries on any field indexed as `TEXT` (`TextField()` in Python).

To restrict a search to a specific field, use the `@field:value` syntax.

You can also do prefix matching with `@field:val*`.
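Field-scoped and prefix clauses are plain strings, so they can be assembled in Python before being passed to `Query`. A minimal sketch with a hypothetical helper `text_clause` (not part of redis-py):

```python
def text_clause(field: str, term: str, prefix: bool = False) -> str:
    """Build a field-scoped full-text clause, e.g. '@name:garden' or '@name:gar*'.
    Hypothetical helper, not part of redis-py."""
    return f"@{field}:{term}{'*' if prefix else ''}"


print(text_clause("description", "nature"))     # @description:nature
print(text_clause("name", "gar", prefix=True))  # @name:gar*
```

For example, `r.ft("idx:schools").search(Query(text_clause("name", "gar", prefix=True)))` would search the `name` field for words starting with `gar`.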

In [None]:
#return the entire document
res=r.ft("idx:schools").search("nature")
display_ft(res)

In [None]:
#Full text search, return selected fields only
query=Query("nature") \
   .return_field("address_city", as_field="city") \
   .return_field("name", as_field="name")
res=r.ft("idx:schools").search(query)
display_ft(res)


## Search with multiple parameters
You can combine conditions on multiple fields, using a space as a logical AND and a pipe `|` as a logical OR.
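Since AND is just a space and OR is a pipe, composing conditions programmatically is mostly string joining. The hypothetical helpers `all_of` and `any_of` below reproduce the two query strings used in this section. Within a single TAG clause, `@type:{forest | montessori}` is a more compact way to match either tag.

```python
def all_of(*clauses: str) -> str:
    """Intersection (AND): clauses separated by spaces. Hypothetical helper."""
    return " ".join(clauses)


def any_of(*clauses: str) -> str:
    """Union (OR): each clause wrapped in parentheses, joined with '|'. Hypothetical helper."""
    return "|".join(f"({c})" for c in clauses)


print(all_of("@type:{forest}", "@type:{montessori}"))
# @type:{forest} @type:{montessori}
print(any_of("@city:{Goudhurst}", "@type:{montessori}"))
# (@city:{Goudhurst})|(@type:{montessori})
```

Wrapping each OR branch in parentheses keeps the grouping explicit when a branch itself contains several space-separated (AND) conditions.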

In [None]:
# Perform a search for documents that have all of the tags (AND condition)
query=Query("@type:{forest} @type:{montessori}")
res=r.ft("idx:schools").search(query)
display_ft(res)


In [None]:
# Perform a search for documents that are either in Goudhurst or of type montessori (OR condition)
query=Query("(@city:{Goudhurst})|(@type:{montessori})")
res=r.ft("idx:schools").search(query)
display_ft(res)

## TAG, Numeric and Geo search

For TAG fields - use `@field:{value}` syntax.

Geo radius matches on geo fields with the syntax `@field:[lon lat radius {m|km|mi|ft}]`

Numeric ranges look like `@students:[0 10000]`. Bounds inside the square brackets are inclusive by default (greater or equal, less or equal); prefix a bound with `(` to make it exclusive, e.g. `@students:[(0 10000]`, and use `-inf`/`+inf` for open-ended ranges.
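The numeric and geo syntaxes above can be generated the same way. The sketch below (hypothetical helpers `numeric_range` and `geo_radius`) formats a numeric range with optional exclusive bounds and `-inf`/`+inf` for open ends, and a geo radius clause with longitude first; the unit check only mirrors the units listed above.

```python
import math


def _fmt_bound(value: float, exclusive: bool) -> str:
    """Format one range bound; '(' marks it exclusive, infinities become -inf/+inf."""
    if value == math.inf:
        text = "+inf"
    elif value == -math.inf:
        text = "-inf"
    else:
        text = str(value)
    return ("(" if exclusive else "") + text


def numeric_range(field: str, low: float, high: float,
                  low_exclusive: bool = False, high_exclusive: bool = False) -> str:
    """Build '@field:[low high]' with optional exclusive bounds. Hypothetical helper."""
    return f"@{field}:[{_fmt_bound(low, low_exclusive)} {_fmt_bound(high, high_exclusive)}]"


def geo_radius(field: str, lon: float, lat: float, radius: float, unit: str = "km") -> str:
    """Build '@field:[lon lat radius unit]' (longitude first). Hypothetical helper."""
    if unit not in ("m", "km", "mi", "ft"):
        raise ValueError(f"unsupported unit: {unit}")
    return f"@{field}:[{lon} {lat} {radius} {unit}]"


print(numeric_range("students", 0, 10000))                         # @students:[0 10000]
print(numeric_range("students", 500, math.inf, low_exclusive=True))  # @students:[(500 +inf]
print(geo_radius("location", 0.32, 51.3, 30))                      # @location:[0.32 51.3 30 km]
```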

In [None]:
#Geolocation search
query = Query('@location:[51.3 0.32 30 km]')
res = r.ft('idx:schools').search(query)
display_ft(res)
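RediSearch reads a GEO value as `"longitude,latitude"`. The sample data above writes latitude first, so the index stores each school with its coordinates swapped, and the query above passes its center in the same swapped order: the two are consistent with each other, but the 30 km radius is measured on the swapped coordinates rather than on the real map. If you want real-world distances, one option is to swap each value before loading (hypothetical helper `to_lon_lat` below) and then query with longitude first, e.g. `@location:[0.32 51.3 30 km]`. The haversine function is only a local, spherical-Earth cross-check (it assumes a mean Earth radius of 6371 km), not how RediSearch computes distances internally.

```python
import math
from typing import Dict, List


def to_lon_lat(lat_lon: str) -> str:
    """Swap a 'lat, lon' string into the 'lon,lat' order GEO fields expect."""
    lat, lon = (part.strip() for part in lat_lon.split(","))
    return f"{lon},{lat}"


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def within_radius(places: Dict[str, str], lon: float, lat: float, radius_km: float) -> List[str]:
    """Names whose 'lat, lon' location lies within radius_km of the point (lon, lat)."""
    hits = []
    for name, lat_lon in places.items():
        p_lon, p_lat = (float(x) for x in to_lon_lat(lat_lon).split(","))
        if haversine_km(lon, lat, p_lon, p_lat) <= radius_km:
            hits.append(name)
    return hits


# Locations copied from the sample data (latitude first, as stored above)
places = {
    "Hall School": "51.445417, -0.258352",
    "Garden School": "51.402926, -0.321523",
    "Gillford School": "51.112685, 0.451076",
    "Forest School": "51.781756, -1.123196",
}
print(to_lon_lat(places["Hall School"]))      # -0.258352,51.445417
print(within_radius(places, 0.32, 51.3, 30))  # ['Gillford School']
```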

In [None]:
# Combining TAG and numeric conditions
query=Query('@city:{London} @students:[0 10000]') \
   .return_field("address_city", as_field="city") \
   .return_field("name", as_field="name") \
   .return_field("students", as_field="students")
res=r.ft("idx:schools").search(query)
display_ft(res)



## Aggregations
Aggregations are a way to process the results of a search query, group, sort and transform them - and extract analytic insights from them. Much like aggregation queries in other databases and search engines, they can be used to create analytics reports, or perform Faceted Search style queries.

For example, we can group schools by city and count the schools in each group, giving us the number of schools per city. Or we could group by school class (independent, state, private) and compute the average number of students per group.
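To make these aggregations concrete, here is what they compute, emulated in plain Python over a hand-copied subset of the sample data (`sample_schools` is a new name, so the `schools` list above is not overwritten). This illustrates the grouping logic only, not how RediSearch executes `FT.AGGREGATE`. The last part computes the average number of students per class; note that `class` is not part of the `idx:schools` schema, so doing this on the server would presumably require indexing that field or loading it into the aggregation (for example with `AggregateRequest.load()`).

```python
from collections import Counter, defaultdict

# Hand-copied subset of the sample data loaded above
sample_schools = [
    {"class": "independent", "address_city": "London", "students": 342},
    {"class": "state", "address_city": "London", "students": 1452},
    {"class": "private", "address_city": "Goudhurst", "students": 721},
    {"class": "independent", "address_city": "Oxford", "students": 1200},
]

# Group by city and count records (like a COUNT reducer)
count_by_city = Counter(s["address_city"] for s in sample_schools)

# Group by city and sum students (like a SUM reducer on @students)
students_by_city = defaultdict(int)
for s in sample_schools:
    students_by_city[s["address_city"]] += s["students"]

# Group by class and average students (like an AVG reducer)
by_class = defaultdict(list)
for s in sample_schools:
    by_class[s["class"]].append(s["students"])
avg_by_class = {c: sum(v) / len(v) for c, v in by_class.items()}

print(dict(count_by_city))     # {'London': 2, 'Goudhurst': 1, 'Oxford': 1}
print(dict(students_by_city))  # {'London': 1794, 'Goudhurst': 721, 'Oxford': 1200}
print(avg_by_class)            # {'independent': 771.0, 'state': 1452.0, 'private': 721.0}
```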

In [None]:
#Perform aggregation by city and count number of records
request = AggregateRequest('*').group_by('@city', reducers.count().alias('count'))
res = r.ft("idx:schools").aggregate(request)
display_ft_agg(res)



In [None]:
# Perform aggregation by city and sum the number of students per city
request = AggregateRequest('*').group_by('@city', reducers.sum('@students').alias('students_count'))
res = r.ft("idx:schools").aggregate(request)
display_ft_agg(res)