# Redis in Python

This notebook shows how to interact with a Redis database from Python using the redis-py client.

Useful resources:
<ul>
    <li><a href="https://redis-py.readthedocs.io/en/stable/" target="_blank">Redis Python documentation</a></li>
    <li><a href="https://redis.io/docs/getting-started/" target="_blank">Base Redis documentation</a></li>
    <li><a href="https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet" target="_blank">Markdown in jupyter cheatsheet</a></li>
</ul>

## Document Database Example

In this section we connect to our Redis database and add data from the University of Idaho Library (<a href="https://www.lib.uidaho.edu/digital/1918flu/assets/data/metadata.json" target="_blank">1918 Flu Pandemic Collection</a>). This example focuses on Redis as a document database: we set up a searchable index, add data, and finally search that data. For a detailed example, see the <a href="https://redis-py.readthedocs.io/en/stable/examples/search_json_examples.html" target="_blank">redis-py docs</a>.

Start by importing the required modules:

In [1]:
import redis
from redis.commands.search.field import TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
import json

Next, we connect to our Redis database. Setting decode_responses=True makes the client return strings instead of bytes:

In [2]:
redis_client = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

Before we add data, we create a searchable index. To do so, we create an index definition that sets the key prefix and the type of data the index covers. We also specify which fields should be searchable within the index.

In [3]:
index_def = IndexDefinition(
    index_type=IndexType.JSON,
    prefix = ['document:']
)
# $ refers to the root of the JSON document, so $.description is a top-level field
schema = (TextField('$.description', as_name='description'),)
# ft() gives access to the full-text search commands provided by the RediSearch module
redis_client.ft('py_document_idx').create_index(schema, definition = index_def)

'OK'
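
To build intuition for what a searchable index on the description field provides, here is a minimal conceptual sketch of an inverted index in plain Python. It is not how RediSearch is implemented; the helper names (`tokenize`, `build_index`, `search_exact`, `search_prefix`) and the two sample documents are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents, field="description"):
    """Map each term found in `field` to the set of document keys containing it."""
    index = defaultdict(set)
    for key, doc in documents.items():
        for term in tokenize(doc.get(field, "")):
            index[term].add(key)
    return index


def search_exact(index, term):
    """Return the keys of documents that contain exactly this term."""
    return sorted(index.get(term.lower(), set()))


def search_prefix(index, prefix):
    """Return the keys of documents with at least one term starting with `prefix`."""
    prefix = prefix.lower()
    hits = set()
    for term, keys in index.items():
        if term.startswith(prefix):
            hits |= keys
    return sorted(hits)


# hypothetical sample documents, keyed the same way as in this notebook
docs = {
    "document:a": {"description": "Patients with mild symptoms of the influenza."},
    "document:b": {"description": "Postcard depicting a hospital."},
}
idx = build_index(docs)
print(search_exact(idx, "symptoms"))  # ['document:a']
print(search_prefix(idx, "hosp"))     # ['document:b']
```

The point of the sketch is that a query looks up terms in a prebuilt mapping instead of scanning every document, which is why the index must know in advance which fields to cover.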

Read the JSON file that contains data:

In [4]:
with open('./data/uidaho_spanish_flu_metadata.json') as flu_file:
    data = json.load(flu_file)

In [22]:
print(str(data)[:2000])

{'objects': [{'objectid': 'spanishflu001', 'filename': 'pg90_8-1-84_11.jpg', 'title': 'Inland Hospital Postcard', 'creator': 'Ott, Clifford M.', 'description': "Postcard depicting Dr. Carither's Inland Hospital, Moscow, Idaho. In 1918, while Dr. Carither's was serving the nation, he allowed Inland Hospital to be used for influenza patients from the university's Student Army Training Corps.", 'date': '1910-01', 'date_is_approximate': 'yes', 'subject': 'hospitals; moscow; buildings;', 'location': 'United States--Idaho--Latah County--Moscow', 'digital_collection': 'https://www.lib.uidaho.edu/digital/ott/', 'source': 'Clifford M. Ott collection, PG 90, University of Idaho Library Special Collections and Archives', 'identifier': 'PG90_8-1-84_11', 'format_original': 'Negative', 'format': 'image/jpg', 'type': 'image;stillimage', 'rights': 'Material has likely passed into public domain. Digital reproductions are made available by University of Idaho Library for educational purposes, and future

We can see that our example data has a top-level key named objects that holds a list of items. These items are the documents that we will write to the database.

In [6]:
# iterate over the list of documents
for document in data['objects']:
    # create a unique key for each document
    document_key = 'document:{0}'.format(document['identifier'])
    # write the document to the database
    redis_client.json().set(document_key, '$', document)
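
Because JSON.SET at the root path replaces whatever is already stored under a key, two documents that share an identifier would silently overwrite each other. The following is a minimal sketch, in plain Python, of checking for that before writing. The helper `build_keys` and the sample records are hypothetical; it assumes every document has an `identifier` field, as the loop above does.

```python
from collections import Counter


def build_keys(documents, prefix="document:", id_field="identifier"):
    """Return a key -> document mapping plus the identifiers that occur more than once."""
    counts = Counter(doc[id_field] for doc in documents)
    duplicates = sorted(ident for ident, n in counts.items() if n > 1)
    # later documents win, mirroring what repeated writes to the same key would do
    keyed = {f"{prefix}{doc[id_field]}": doc for doc in documents}
    return keyed, duplicates


# hypothetical sample records; the third one reuses an identifier on purpose
sample = [
    {"identifier": "PG90_8-1-84_11", "title": "Inland Hospital Postcard"},
    {"identifier": "MG170_5", "title": "Alpha Kappa Epsilon House"},
    {"identifier": "MG170_5", "title": "hypothetical duplicate"},
]
keyed, duplicates = build_keys(sample)
print(duplicates)  # ['MG170_5']
print(len(keyed))  # 2
```

If the list of duplicates is empty, the number of keys written should equal the number of documents in the source file.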

After adding our data to the database, we can inspect the keys that were added. Only the first five are shown, since we don't need to see every key:

In [13]:
redis_client.keys()[0:5]

['document:arg1918_10_19p3',
 'document:arg1918_10_30p2b',
 'document:UG12_526_02',
 'document:arg1918_11_06p2b',
 'document:UG12_526_06']

Now that we know that certain keys exist, let's check the data types so that we can properly query the data:

In [23]:
redis_client.type('document:arg1918_10_19p3')

'ReJSON-RL'

<div class="alert alert-block alert-success">
    Keys can hold different data types, each of which requires type-specific commands to access.
</div>
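
One way to act on this is to look up the key's type first and then dispatch to a matching read command. The sketch below assumes a client created with decode_responses=True (so that `type()` returns a string) and covers only a handful of types. The helper name `read_key` is hypothetical; the client methods it calls (`type`, `get`, `lrange`, `hgetall`, `smembers`, `zrange`, `json().get`) are redis-py methods.

```python
def read_key(client, key):
    """Read `key` with the command that matches its Redis data type."""
    readers = {
        "string": lambda: client.get(key),
        "list": lambda: client.lrange(key, 0, -1),
        "hash": lambda: client.hgetall(key),
        "set": lambda: client.smembers(key),
        "zset": lambda: client.zrange(key, 0, -1, withscores=True),
        "ReJSON-RL": lambda: client.json().get(key),
    }
    key_type = client.type(key)
    if key_type not in readers:
        # includes the type "none", which Redis reports for a missing key
        raise ValueError(f"no reader registered for type {key_type!r}")
    return readers[key_type]()
```

With the client from this notebook, `read_key(redis_client, 'document:arg1918_10_19p3')` would take the `ReJSON-RL` branch and return the stored document as a Python dictionary.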

The RediSearch module supports a variety of search types. Here are a few examples:

In [9]:
# exact search for a word
results = redis_client.ft('py_document_idx').search("symptoms")
# prefix search: matches documents containing a term that starts with "th"
results_2 = redis_client.ft('py_document_idx').search("th*")
# fuzzy search: each pair of % allows one more edit (Levenshtein distance), so %%...%% allows two
results_3 = redis_client.ft('py_document_idx').search("%%simptom%%")

In [10]:
print(results.total)

1


In [30]:
print(results)

Result{1 total, docs: [Document {'id': 'document:MG170_5', 'payload': None, 'json': '{"objectid":"spanishflu007","filename":"mg202_1.jpg","title":"Alpha Kappa Epsilon House","creator":"Thomas, Esther E.","description":"Photograph of the Alpha Kappa Epsilon House. This house was converted into an auxiliary hospital for patients with mild symptoms of the influenza.","date":"1918-10","date_is_approximate":"yes","subject":"hospitals; fraternities; houses","location":"United States--Idaho--Latah County--Moscow","source":"Esther E. Thomas scrapbooks, 1915-1925, MG 170, University of Idaho Library Special Collections and Archives","identifier":"MG170_5","format_original":"Scrapbook","format":"image/jpg","type":"image;stillimage","rights":"Material has likely passed into public domain. Digital reproductions are made available by University of Idaho Library for educational purposes, and future use should acknowledge this repository. For more information, please contact University of Idaho Libra

In [12]:
print(results_3)

Result{1 total, docs: [Document {'id': 'document:MG170_5', 'payload': None, 'json': '{"objectid":"spanishflu007","filename":"mg202_1.jpg","title":"Alpha Kappa Epsilon House","creator":"Thomas, Esther E.","description":"Photograph of the Alpha Kappa Epsilon House. This house was converted into an auxiliary hospital for patients with mild symptoms of the influenza.","date":"1918-10","date_is_approximate":"yes","subject":"hospitals; fraternities; houses","location":"United States--Idaho--Latah County--Moscow","source":"Esther E. Thomas scrapbooks, 1915-1925, MG 170, University of Idaho Library Special Collections and Archives","identifier":"MG170_5","format_original":"Scrapbook","format":"image/jpg","type":"image;stillimage","rights":"Material has likely passed into public domain. Digital reproductions are made available by University of Idaho Library for educational purposes, and future use should acknowledge this repository. For more information, please contact University of Idaho Libra
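
The fuzzy query finds the same document as the exact query because the misspelled term is within two edits of the indexed word. As a minimal sketch of the underlying measure, the function below computes the Levenshtein distance between two strings in plain Python. It illustrates the metric only and makes no claim about how RediSearch evaluates fuzzy queries internally.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(levenshtein("simptom", "symptoms"))  # 2
print(levenshtein("simptom", "symptom"))   # 1
```

Since "simptom" is two edits away from "symptoms" (one substitution and one insertion), a query with a single pair of % signs would not be expected to match that term.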

## Geographic Data

Next, we will add data that contain latitude and longitude and demonstrate how geographic queries work in Redis. The example dataset comes from the linked <a href="https://www.kaggle.com/datasets/camnugent/california-housing-prices?resource=download" target="_blank">Kaggle repository</a>.

In [58]:
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'iframe'

In [37]:
# read the csv data into a pandas dataframe
housing_df = pd.read_csv('./data/housing.csv')
print('{0} data points'.format(len(housing_df)))

20640 data points


In [35]:
housing_df.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |


Below we use the geoadd() method to add location data to our database. For more information, see the <a href="https://redis.io/commands/geoadd/" target="_blank">GEOADD documentation</a>.

In [41]:
# iterate over the rows in the dataframe
for index, row in housing_df.iterrows():
    # 'locations' is the name of the geo index that we add data to
    redis_client.geoadd('locations', (row['longitude'], row['latitude'], 'location:{0}'.format(index)), nx=True)
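
GEOADD only accepts longitudes between -180 and 180 degrees and latitudes between -85.05112878 and 85.05112878 degrees, and it returns an error for anything outside those limits. The following is a minimal sketch of screening rows before they are sent to Redis. The helper names and the sample rows are hypothetical; the limits are the ones given in the GEOADD documentation.

```python
LON_RANGE = (-180.0, 180.0)
LAT_RANGE = (-85.05112878, 85.05112878)


def is_valid_position(longitude, latitude):
    """True when both coordinates fall inside the limits GEOADD accepts."""
    return (LON_RANGE[0] <= longitude <= LON_RANGE[1]
            and LAT_RANGE[0] <= latitude <= LAT_RANGE[1])


def split_rows(rows):
    """Split (longitude, latitude, member) rows into accepted and rejected lists."""
    valid, rejected = [], []
    for row in rows:
        (valid if is_valid_position(row[0], row[1]) else rejected).append(row)
    return valid, rejected


# hypothetical rows: one good, one with an impossible latitude, one with a missing value
sample_rows = [
    (-122.23, 37.88, "location:0"),
    (-122.22, 95.0, "location:bad_latitude"),
    (float("nan"), 37.85, "location:missing_longitude"),
]
valid, rejected = split_rows(sample_rows)
print(len(valid), len(rejected))  # 1 2
```

Missing values read by pandas arrive as NaN, and every comparison with NaN is false, so such rows land in the rejected list as well.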

Geo indices are stored as sorted sets in Redis. To list the members of a sorted set, use the zrange() function.

In [43]:
redis_client.zrange('locations', 0, -1)[0:5]

['location:9497',
 'location:9495',
 'location:9494',
 'location:9471',
 'location:9466']

In [51]:
location_list = redis_client.zrange('locations', 0, -1)
# geopos returns each position as a (longitude, latitude) tuple
position_list = redis_client.geopos('locations', *location_list)

print(position_list[0])

(-123.80999833345413, 39.310000651575066)
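
The returned position differs slightly from the two-decimal values in the CSV file because Redis does not store the raw coordinates. It encodes them into the sorted set score as a geohash, which is lossy. The sketch below shows the effect for a single longitude, assuming 26 bits per coordinate (a 52-bit geohash in total). The helper names are hypothetical and the code is a simplified illustration, not the Redis implementation.

```python
LON_MIN, LON_MAX = -180.0, 180.0
BITS = 26  # assumed number of bits per coordinate


def quantize(value, low, high, bits=BITS):
    """Return the index of the cell containing `value`, out of 2**bits equal cells."""
    cells = 1 << bits
    index = int((value - low) / (high - low) * cells)
    return min(index, cells - 1)  # keep the upper bound inside the last cell


def dequantize(index, low, high, bits=BITS):
    """Return the center of the cell with the given index."""
    cell_width = (high - low) / (1 << bits)
    return low + (index + 0.5) * cell_width


original = -123.81
stored = dequantize(quantize(original, LON_MIN, LON_MAX), LON_MIN, LON_MAX)
print(stored)
print(abs(stored - original))  # never more than half a cell width
```

Half a cell width here is 360 / 2**26 / 2, roughly 0.0000027 degrees, which is consistent with the size of the difference seen in the output above.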


In [52]:
full_lng_list = [x[0] for x in position_list]
full_lat_list = [x[1] for x in position_list]

In [59]:
fig = go.Figure(data=go.Scattergeo(
        lon = full_lng_list,
        lat = full_lat_list,
        text = position_list,
        mode = 'markers',
        ))

fig.update_layout(
        title = 'California Houses',
        geo_scope='usa',
    )
fig.show()

In [72]:
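# members within 50 km of (lng -122.22, lat 37.86); GEORADIUS is deprecated since Redis 6.2 in favor of GEOSEARCH
results = redis_client.georadius('locations', -122.22, 37.86, 50, unit='km', withcoord=True)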

In [73]:
print(results[0:5])

[['location:17170', (-122.27000266313553, 37.43000050241453)], ['location:17168', (-122.27000266313553, 37.449999452361716)], ['location:17100', (-122.25999802350998, 37.449999452361716)], ['location:17169', (-122.23999947309494, 37.43000050241453)], ['location:17155', (-122.23000019788742, 37.42000102744094)]]
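
As a sanity check on the radius query, the great-circle distance between the query center and a returned member can be recomputed by hand. The sketch below uses the haversine formula in plain Python. The Earth radius constant is an assumption chosen for this example, and the function name `haversine_km` is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6372.797560856  # assumed sphere radius for this sketch


def haversine_km(lng1, lat1, lng2, lat2):
    """Great-circle distance in kilometers between two (longitude, latitude) points."""
    lng1, lat1, lng2, lat2 = map(radians, (lng1, lat1, lng2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


center = (-122.22, 37.86)                              # query center used above
first_hit = (-122.27000266313553, 37.43000050241453)   # location:17170 from the output
distance = haversine_km(*center, *first_hit)
print(distance)  # roughly 48 km, inside the 50 km radius
```

Applying the same function to every returned coordinate is a simple way to confirm that no member lies outside the requested radius.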


In [74]:
label_list = [x[0] for x in results]
lng_list = [x[1][0] for x in results]
lat_list = [x[1][1] for x in results]

In [75]:
fig = go.Figure(data=go.Scattergeo(
        lon = lng_list,
        lat = lat_list,
        text = label_list,
        mode = 'markers',
        ))

fig.update_layout(
        title = 'California Houses (Redis GEORADIUS Query)',
        geo_scope='usa',
    )
fig.show()