<center>
    <h2>Online learning platform database - Redis</h2>
    <h3>Performing the queries and storing the query execution times</h3>
</center>

<h3>Python - Redis interaction</h3>

Before performing the queries, I import the required modules (the <i>redis-py</i> driver and the <i>time</i> and <i>csv</i> modules) and open connections to the four Redis instances running in Docker. I also import the redis-py classes needed to define indices and to build queries and aggregations.

In [89]:
# import modules
import redis              # Redis driver (redis-py)
import time               # time-related functions to register query execution times
import csv                # read and write csv files

# connect to the four Redis instances (already running in Docker)
myRedis = redis.Redis(host = 'localhost', port = 6379, decode_responses = True)
myRedis2 = redis.Redis(host = 'localhost', port = 6382, decode_responses = True)
myRedis3 = redis.Redis(host = 'localhost', port = 6383, decode_responses = True)
myRedis4 = redis.Redis(host = 'localhost', port = 6384, decode_responses = True)

# import query-related redis-py dependencies
from redis.commands.search.field import TextField, NumericField, TagField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
import redis.commands.search.aggregation as aggregations
import redis.commands.search.reducers as reducers

<h3>Query the Redis instances</h3>

I create one dictionary of lists for each of the four Redis instances. In each dictionary the keys are the query names and the values are lists of 31 query execution times: after every execution I append the measured time to the list of the corresponding query. Since execution times are required in milliseconds, I multiply each of them by 1000 and round the result to five decimal places before appending it.
As previously explained, these actions (for each of the four queries on each of the four instances) follow a standard succession of steps. Each step is encapsulated within a notebook cell, so each query is performed 31 times by using four notebook cells, as follows:
 - step 1: define the query: create a RediSearch object, define the schema (basically, the fields to be indexed for the specific query) and the index definition, then create the index;
 - step 2: perform the query for the first time, recording timestamps immediately before and after the execution, and print the query result;
 - step 3: compute the execution time of the first execution and store it in the corresponding dictionary list;
 - step 4: [thirty times] perform the query between two timestamps, compute the execution time and store it in the corresponding dictionary list.
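Steps 2 to 4 can also be condensed into a single helper. The following is only a minimal sketch, not the code used in this notebook: the name <i>time_calls</i> and its parameters are hypothetical, and it uses <i>time.perf_counter()</i> (a monotonic clock from the standard library) instead of the <i>time.time()</i> calls used in the cells below. A plain Python function stands in for the real query so that the sketch runs without a Redis server.

```python
import time

def time_calls(run_query, repetitions=31):
    """Call run_query() `repetitions` times and return every duration in milliseconds.

    Hypothetical helper: `run_query` is any zero-argument callable,
    e.g. lambda: small_redis1.aggregate(aggRequest1) in this notebook.
    """
    durations_ms = []
    for _ in range(repetitions):
        before = time.perf_counter()      # timestamp right before the query
        run_query()
        after = time.perf_counter()       # timestamp right after the query
        # seconds -> milliseconds, rounded to five decimal places
        durations_ms.append(round((after - before) * 1000, 5))
    return durations_ms

# Stand-in workload (an assumption, used only to make the sketch runnable)
timings = {'query1': time_calls(lambda: sum(range(1000)))}
```

The first element of the returned list corresponds to the first execution (steps 2 and 3), the remaining thirty to step 4.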

For each Redis instance, after having performed the four queries, I finally compute the mean of the thirty query executions from step 4. Together with the time of the first execution, this mean value is stored in a new dictionary, specific to a dataset. Originally, I intended to use these four new dictionaries to save the query execution times into a csv file for constructing histograms. I later decided to save all the 31 recorded execution times instead and to process them in Microsoft Excel.
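As an illustration of this summary, the sketch below reduces a list of timings to the pair "first execution, mean of the remaining executions". The function name <i>summarize</i> is hypothetical and the numbers are made up for the example; they are not measured values.

```python
from statistics import mean

def summarize(timings_ms):
    """Return [first execution time, mean of all later executions], in milliseconds."""
    first, repeats = timings_ms[0], timings_ms[1:]
    return [first, round(mean(repeats), 5)]

# Illustrative values only (not taken from the experiments)
example = {'query1': [40.0, 6.0, 7.0, 8.0]}
summary = {name: summarize(values) for name, values in example.items()}
```

Keeping the first execution separate from the mean makes it possible to compare the first run of a query with the repeated runs.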

N.B.:
Note that an index, once created, resides in the memory of the Redis instance. Hence, when a <i>step 1</i> cell is run again, the index must either be re-used directly or be dropped and created anew. Since I want to keep the code for documentation purposes, I keep a commented-out line with the instruction to drop the index in each <i>step 1</i> cell, for use when needed.
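The choice between re-using and re-creating an index could also be wrapped in a small function instead of a commented-out line. This is a sketch under stated assumptions: <i>create_or_reuse_index</i> and <i>FakeSearch</i> are hypothetical names, and the helper assumes that the driver reports an existing index with an error message containing "already exists" (to my knowledge redis-py raises <i>redis.exceptions.ResponseError</i> with the message "Index already exists", but this should be checked against the installed version). The stand-in class only mimics that behaviour so that the sketch runs without a Redis server.

```python
def create_or_reuse_index(search, schema, definition, recreate=False):
    """Create the index, or re-use it if it already exists.

    `search` is assumed to behave like the object returned by myRedis.ft(name).
    With recreate=True the index is dropped first and then created anew.
    """
    if recreate:
        try:
            search.dropindex()
        except Exception:
            pass                          # assumption: there was no index to drop
    try:
        search.create_index(schema, definition=definition)
        return 'created'
    except Exception as error:
        # assumption: the error message identifies an existing index
        if 'already exists' in str(error).lower():
            return 'reused'
        raise


class FakeSearch:
    """Hypothetical stand-in for myRedis.ft(...), for demonstration only."""
    def __init__(self):
        self.exists = False

    def create_index(self, schema, definition=None):
        if self.exists:
            raise RuntimeError('Index already exists')
        self.exists = True

    def dropindex(self):
        if not self.exists:
            raise RuntimeError('Unknown Index name')
        self.exists = False


fake = FakeSearch()
outcomes = [
    create_or_reuse_index(fake, schema=(), definition=None),
    create_or_reuse_index(fake, schema=(), definition=None),
    create_or_reuse_index(fake, schema=(), definition=None, recreate=True),
]
```

In the notebook the call would take the real objects, for instance <i>create_or_reuse_index(small_redis1, schema1, index_definition)</i>.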

In [2]:
smallDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
mediumDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
largeDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
humongousDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}

In [3]:
# mean function
def mean(aList):
    # arithmetic mean of a non-empty list of numbers;
    # uses the built-in sum() instead of shadowing it with a local variable
    return sum(aList) / len(aList)

<h3>Redis instance with 250k hashes</h3>

I start with the Redis instance with the smallest number of hash keys.

<h4>Query1</h4>

In [6]:
# step 1
small_redis1 = myRedis.ft('small_index1')
#small_redis1.dropindex()
schema1 = (TextField('courseID'), TextField('firstName'), TextField('lastName'))
index_definition = IndexDefinition(prefix = ['smallDB:'], index_type = IndexType.HASH)
small_redis1.create_index(schema1, definition = index_definition)

'OK'

In [7]:
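# step 2
# group by a list rather than a set: a set has no guaranteed order, so res[1] and res[3] would not reliably be lastName and firstName
aggRequest1 = aggregations.AggregateRequest('@courseID:192').group_by(['@lastName', '@firstName'])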
before = time.time()
small_query1 = small_redis1.aggregate(aggRequest1)
after = time.time()

for res in small_query1.rows:
    print(res[1], res[3])

Ferrán Narciso
Lie Cathrine
Nicolas Émile
Amundsen Ingeborg
Laroche Arthur
Lara Sarah
Leite Patrícia
Gaižauskas Vigilija
Hidalgo Custodia
Arenas Casandra
Soylu Ledün
Nicolas Nath
Narušis Ana


In [8]:
# step 3
msec_duration = (after - before) * 1000
smallDict['query1'].append(round(msec_duration, 5))

In [9]:
# step 4
for i in range(0, 30):
    before = time.time()
    small_redis1.aggregate(aggRequest1)
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query1'].append(round(msec_duration, 5))

<h4>Query2</h4>

In [11]:
# step 1
small_redis2 = myRedis.ft('small_index2')
#small_redis2.dropindex()
schema2 = (TextField('discipline'), TextField('courseYear'), TextField('courseName'))
index_definition = IndexDefinition(prefix = ['smallDB:'], index_type = IndexType.HASH)
small_redis2.create_index(schema2, definition = index_definition)

'OK'

In [12]:
# step 2
aggRequest2 = aggregations.AggregateRequest('@discipline:statistics @courseYear:2022').group_by('@courseName')
before = time.time()
small_query2 = small_redis2.aggregate(aggRequest2)
after = time.time()

for res in small_query2.rows:
    print(res[1])

Introduction to Probability and Data with R
Basic Statistics
Bayesian Statistics: From Concept to Data Analysis
Python and Statistics for Financial Analysis
Understanding Clinical Research: Behind the Statistics
Econometrics: Methods and Applications
Exploratory Data Analysis
Foundations: Data, Data, Everywhere
Introduction to Statistics


In [13]:
# step 3
msec_duration = (after - before) * 1000
smallDict['query2'].append(round(msec_duration, 5))

In [14]:
# step 4
for i in range(0, 30):
    before = time.time()
    small_redis2.aggregate(aggRequest2)
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query2'].append(round(msec_duration, 5))

<h4>Query3</h4>

In [16]:
# step 1
small_redis3 = myRedis.ft('small_index3')
#small_redis3.dropindex()
schema3 = (TextField('materialType'), TagField('discipline'), TextField('email'), TextField('firstName'))
index_definition = IndexDefinition(prefix = ['smallDB:'], index_type = IndexType.HASH)
small_redis3.create_index(schema3, definition = index_definition)

'OK'

In [21]:
# step 2
aggRequest3 = aggregations.AggregateRequest('@discipline:{maths} @materialType:\'lecture slides\' @email:*gmail.com').group_by('@discipline', reducers.count().alias('count'))
before = time.time()
small_query3 = small_redis3.aggregate(aggRequest3)
after = time.time()

print(small_query3.rows[0][3])

838


In [22]:
# step 3
msec_duration = (after - before) * 1000
smallDict['query3'].append(round(msec_duration, 5))

In [23]:
# step 4
for i in range(0, 30):
    before = time.time()
    small_query3 = small_redis3.aggregate(aggRequest3)
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query3'].append(round(msec_duration, 5))

<h4>Query4</h4>

In [25]:
# step 1
small_redis4 = myRedis.ft('small_index4')
#small_redis4.dropindex()
schema4 = (TagField('discipline'), TagField('courseYear'), TextField('country'), TextField('dateOfBirth'), TextField('firstName'), TextField('lastName', sortable = True))
index_definition = IndexDefinition(prefix = ['smallDB:'], index_type = IndexType.HASH)
small_redis4.create_index(schema4, definition = index_definition)

'OK'

In [26]:
# step 2
# group by a list rather than a set, so that the positions used in print() below are deterministic
aggRequest4 = aggregations.AggregateRequest('@discipline:{psychology} AND @courseYear:{2023} AND @country:*orea AND -@dateOfBirth:200*').group_by(['@lastName', '@dateOfBirth', '@country', '@firstName']).sort_by('@lastName')
before = time.time()
small_query4 = small_redis4.aggregate(aggRequest4)
after = time.time()

for res in small_query4.rows:
    print(res[1], res[5], res[7], res[3])

lie South Korea Cathrine 1986-7-12
reynolds Korea Lynda 1989-7-21
sura North Korea Raghav 1973-11-27


In [27]:
# step 3
msec_duration = (after - before) * 1000
smallDict['query4'].append(round(msec_duration, 5))

In [28]:
# step 4
for i in range(0, 30):
    before = time.time()
    small_redis4.aggregate(aggRequest4)
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query4'].append(round(msec_duration, 5))

In [30]:
smallDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in smallDict:
    smallDataset[key].append(smallDict[key][0])
    mean30 = mean(smallDict[key][1 : 31])
    smallDataset[key].append(round(mean30, 5))
smallDataset

{'query1': [43.05792, 6.59162],
 'query2': [12.70509, 19.69381],
 'query3': [17.30895, 17.22561],
 'query4': [14.01019, 7.57892]}

<h3>Redis instance with 500k hashes</h3>

<h4>Query1</h4>

In [34]:
# step 1
medium_redis1 = myRedis2.ft('medium_index1')
#medium_redis1.dropindex()
schema1 = (TextField('courseID'), TextField('firstName'), TextField('lastName'))
index_definition = IndexDefinition(prefix = ['mediumDB:'], index_type = IndexType.HASH)
medium_redis1.create_index(schema1, definition = index_definition)

'OK'

In [35]:
# step 2
aggRequest1 = aggregations.AggregateRequest('@courseID:192').group_by(['@lastName', '@firstName'])  # list, not set: deterministic field order
before = time.time()
medium_query1 = medium_redis1.aggregate(aggRequest1)
after = time.time()

for res in medium_query1.rows:
    print(res[1], res[3])

Kavaliauskas Joris
Ferrán Narciso
Nicolas Émile
Lie Cathrine
Henschel Christl
Amundsen Ingeborg
Laroche Arthur
Dara Yuvaan
Lara Sarah
Leite Patrícia
Christensen Karl
Vaz Débora
Naujokas Nedas
Gaižauskas Vigilija
Hidalgo Custodia
Arenas Casandra
Soylu Ledün
Nicolas Nath
Real Miguel
Narušis Ana


In [36]:
# step 3
msec_duration = (after - before) * 1000
mediumDict['query1'].append(round(msec_duration, 5))

In [37]:
# step 4
for i in range(0, 30):
    before = time.time()
    medium_redis1.aggregate(aggRequest1)
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query1'].append(round(msec_duration, 5))

<h4>Query2</h4>

In [39]:
# step 1
medium_redis2 = myRedis2.ft('medium_index2')
#medium_redis2.dropindex()
schema2 = (TextField('discipline'), TextField('courseYear'), TextField('courseName'))
index_definition = IndexDefinition(prefix = ['mediumDB:'], index_type = IndexType.HASH)
medium_redis2.create_index(schema2, definition = index_definition)

'OK'

In [40]:
# step 2
aggRequest2 = aggregations.AggregateRequest('@discipline:statistics @courseYear:2022').group_by('@courseName')
before = time.time()
medium_query2 = medium_redis2.aggregate(aggRequest2)
after = time.time()

for res in medium_query2.rows:
    print(res[1])

Introduction to Probability and Data with R
Basic Statistics
Bayesian Statistics: From Concept to Data Analysis
Python and Statistics for Financial Analysis
Understanding Clinical Research: Behind the Statistics
Econometrics: Methods and Applications
Exploratory Data Analysis
Foundations: Data, Data, Everywhere
Introduction to Statistics


In [41]:
# step 3
msec_duration = (after - before) * 1000
mediumDict['query2'].append(round(msec_duration, 5))

In [42]:
# step 4
for i in range(0, 30):
    before = time.time()
    medium_redis2.aggregate(aggRequest2)
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query2'].append(round(msec_duration, 5))

<h4>Query3</h4>

In [44]:
# step 1
medium_redis3 = myRedis2.ft('medium_index3')
#medium_redis3.dropindex()
schema3 = (TextField('materialType'), TagField('discipline'), TextField('email'), TextField('firstName'))
index_definition = IndexDefinition(prefix = ['mediumDB:'], index_type = IndexType.HASH)
medium_redis3.create_index(schema3, definition = index_definition)

'OK'

In [51]:
# step 2
aggRequest3 = aggregations.AggregateRequest('@discipline:{maths} @materialType:\'lecture slides\' @email:*gmail.com').group_by('@discipline', reducers.count().alias('count'))
before = time.time()
medium_query3 = medium_redis3.aggregate(aggRequest3)
after = time.time()

print(medium_query3.rows[0][3])

1698


In [52]:
# step 3
msec_duration = (after - before) * 1000
mediumDict['query3'].append(round(msec_duration, 5))

In [54]:
# step 4
for i in range(0, 30):
    before = time.time()
    medium_query3 = medium_redis3.aggregate(aggRequest3)
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query3'].append(round(msec_duration, 5))

<h4>Query4</h4>

In [56]:
# step 1
medium_redis4 = myRedis2.ft('medium_index4')
#medium_redis4.dropindex()
schema4 = (TagField('discipline'), TagField('courseYear'), TextField('country'), TextField('dateOfBirth'), TextField('firstName'), TextField('lastName', sortable = True))
index_definition = IndexDefinition(prefix = ['mediumDB:'], index_type = IndexType.HASH)
medium_redis4.create_index(schema4, definition = index_definition)

'OK'

In [57]:
# step 2
aggRequest4 = aggregations.AggregateRequest('@discipline:{psychology} AND @courseYear:{2023} AND @country:*orea AND -@dateOfBirth:200*').group_by(['@lastName', '@dateOfBirth', '@country', '@firstName']).sort_by('@lastName')  # list, not set: deterministic field order
before = time.time()
medium_query4 = medium_redis4.aggregate(aggRequest4)
after = time.time()

for res in medium_query4.rows:
    print(res[1], res[5], res[7], res[3])

horrocks Noord-Korea Ninthe 1962-10-15
lie South Korea Cathrine 1986-7-12
real República de Corea Miguel 1987-9-12
reynolds Korea Lynda 1989-7-21
sura North Korea Raghav 1973-11-27


In [58]:
# step 3
msec_duration = (after - before) * 1000
mediumDict['query4'].append(round(msec_duration, 5))

In [59]:
# step 4
for i in range(0, 30):
    before = time.time()
    medium_redis4.aggregate(aggRequest4)
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query4'].append(round(msec_duration, 5))

In [61]:
mediumDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in mediumDict:
    mediumDataset[key].append(mediumDict[key][0])
    mean30 = mean(mediumDict[key][1 : 31])
    mediumDataset[key].append(round(mean30, 5))
mediumDataset

{'query1': [9.85003, 8.29449],
 'query2': [16.55197, 37.00384],
 'query3': [28.56994, 41.98983],
 'query4': [7.14493, 8.67297]}

<h3>Redis instance with 750k hashes</h3>

<h4>Query1</h4>

In [63]:
# step 1
large_redis1 = myRedis3.ft('large_index1')
#large_redis1.dropindex()
schema1 = (TextField('courseID'), TextField('firstName'), TextField('lastName'))
index_definition = IndexDefinition(prefix = ['largeDB:'], index_type = IndexType.HASH)
large_redis1.create_index(schema1, definition = index_definition)

'OK'

In [64]:
# step 2
aggRequest1 = aggregations.AggregateRequest('@courseID:192').group_by(['@lastName', '@firstName'])  # list, not set: deterministic field order
before = time.time()
large_query1 = large_redis1.aggregate(aggRequest1)
after = time.time()

for res in large_query1.rows:
    print(res[1], res[3])

Real Miguel
Ferrán Narciso
Lie Cathrine
Nicolas Émile
Narušis Ana
Henschel Christl
Laroche Arthur
Lara Sarah
Vaz Débora
Hidalgo Custodia
Kavaliauskas Joris
Nicolas Nath
Dani Urvi
Abella Dorita
Amundsen Ingeborg
Heerkens Collin
Christensen Karl
Dara Yuvaan
Leite Patrícia
Naujokas Nedas
Gaižauskas Vigilija
Arenas Casandra
Soylu Ledün
Gül Özkutlu
Thompson Brian
Flaiano Liliana


In [65]:
# step 3
msec_duration = (after - before) * 1000
largeDict['query1'].append(round(msec_duration, 5))

In [66]:
# step 4
for i in range(0, 30):
    before = time.time()
    large_redis1.aggregate(aggRequest1)
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query1'].append(round(msec_duration, 5))

<h4>Query2</h4>

In [68]:
# step 1
large_redis2 = myRedis3.ft('large_index2')
#large_redis2.dropindex()
schema2 = (TextField('discipline'), TextField('courseYear'), TextField('courseName'))
index_definition = IndexDefinition(prefix = ['largeDB:'], index_type = IndexType.HASH)
large_redis2.create_index(schema2, definition = index_definition)

'OK'

In [69]:
# step 2
aggRequest2 = aggregations.AggregateRequest('@discipline:statistics @courseYear:2022').group_by('@courseName')
before = time.time()
large_query2 = large_redis2.aggregate(aggRequest2)
after = time.time()

for res in large_query2.rows:
    print(res[1])

Introduction to Probability and Data with R
Basic Statistics
Bayesian Statistics: From Concept to Data Analysis
Python and Statistics for Financial Analysis
Understanding Clinical Research: Behind the Statistics
Econometrics: Methods and Applications
Exploratory Data Analysis
Foundations: Data, Data, Everywhere
Introduction to Statistics


In [70]:
# step 3
msec_duration = (after - before) * 1000
largeDict['query2'].append(round(msec_duration, 5))

In [71]:
# step 4
for i in range(0, 30):
    before = time.time()
    large_redis2.aggregate(aggRequest2)
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query2'].append(round(msec_duration, 5))

<h4>Query3</h4>

In [73]:
# step 1
large_redis3 = myRedis3.ft('large_index3')
#large_redis3.dropindex()
schema3 = (TextField('materialType'), TagField('discipline'), TextField('email'), TextField('firstName'))
index_definition = IndexDefinition(prefix = ['largeDB:'], index_type = IndexType.HASH)
large_redis3.create_index(schema3, definition = index_definition)

'OK'

In [79]:
# step 2
aggRequest3 = aggregations.AggregateRequest('@discipline:{maths} @materialType:\'lecture slides\' @email:*gmail.com').group_by('@discipline', reducers.count().alias('count'))
before = time.time()
large_query3 = large_redis3.aggregate(aggRequest3)
after = time.time()

print(large_query3.rows[0][3])

2628


In [80]:
# step 3
msec_duration = (after - before) * 1000
largeDict['query3'].append(round(msec_duration, 5))

In [81]:
# step 4
for i in range(0, 30):
    before = time.time()
    large_query3 = large_redis3.aggregate(aggRequest3)
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query3'].append(round(msec_duration, 5))

<h4>Query4</h4>

In [83]:
# step 1
large_redis4 = myRedis3.ft('large_index4')
#large_redis4.dropindex()
schema4 = (TagField('discipline'), TagField('courseYear'), TextField('country'), TextField('dateOfBirth'), TextField('firstName'), TextField('lastName', sortable = True))
index_definition = IndexDefinition(prefix = ['largeDB:'], index_type = IndexType.HASH)
large_redis4.create_index(schema4, definition = index_definition)

'OK'

In [84]:
# step 2
aggRequest4 = aggregations.AggregateRequest('@discipline:{psychology} AND @courseYear:{2023} AND @country:*orea AND -@dateOfBirth:200*').group_by(['@lastName', '@dateOfBirth', '@country', '@firstName']).sort_by('@lastName')  # list, not set: deterministic field order
before = time.time()
large_query4 = large_redis4.aggregate(aggRequest4)
after = time.time()

for res in large_query4.rows:
    print(res[1], res[5], res[7], res[3])

castells República Popular Democrática de Corea Tere 1978-9-2
horrocks Noord-Korea Ninthe 1962-10-15
lie South Korea Cathrine 1986-7-12
real República de Corea Miguel 1987-9-12
reynolds Korea Lynda 1989-7-21
sura North Korea Raghav 1973-11-27


In [85]:
# step 3
msec_duration = (after - before) * 1000
largeDict['query4'].append(round(msec_duration, 5))

In [86]:
# step 4
for i in range(0, 30):
    before = time.time()
    large_redis4.aggregate(aggRequest4)
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query4'].append(round(msec_duration, 5))

In [88]:
largeDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in largeDict:
    largeDataset[key].append(largeDict[key][0])
    mean30 = mean(largeDict[key][1 : 31])
    largeDataset[key].append(round(mean30, 5))
largeDataset

{'query1': [8.6472, 7.94671],
 'query2': [14.91308, 31.10654],
 'query3': [42.06467, 32.61566],
 'query4': [8.33488, 10.09521]}

<h3>Redis instance with 1m hashes</h3>

<h4>Query1</h4>

In [90]:
# step 1
humongous_redis1 = myRedis4.ft('humongous_index1')
#humongous_redis1.dropindex()
schema1 = (TextField('courseID'), TextField('firstName'), TextField('lastName'))
index_definition = IndexDefinition(prefix = ['humongousDB:'], index_type = IndexType.HASH)
humongous_redis1.create_index(schema1, definition = index_definition)

'OK'

In [91]:
# step 2
aggRequest1 = aggregations.AggregateRequest('@courseID:192').group_by(['@lastName', '@firstName'])  # list, not set: deterministic field order
before = time.time()
humongous_query1 = humongous_redis1.aggregate(aggRequest1)
after = time.time()

for res in humongous_query1.rows:
    print(res[1], res[3])

Flaiano Liliana
Ferrán Narciso
Lie Cathrine
Nicolas Émile
Henschel Christl
Savorgnan Melania
Lara Sarah
Vaz Débora
Soylu Ledün
Miranda David
Hidalgo Custodia
Kavaliauskas Joris
Nicolas Nath
Dani Urvi
Rezende Eduardo
Abella Dorita
Webb Kristen
Schulz Torsten
Amundsen Ingeborg
Heerkens Collin
Dara Yuvaan
Christensen Karl
Leite Patrícia
Naujokas Nedas
Gaižauskas Vigilija
Karlsen Finn
Arenas Casandra
Gül Özkutlu
Real Miguel
Thompson Brian
Raju Shaan


In [92]:
# step 3
msec_duration = (after - before) * 1000
humongousDict['query1'].append(round(msec_duration, 5))

In [93]:
# step 4
for i in range(0, 30):
    before = time.time()
    humongous_redis1.aggregate(aggRequest1)
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query1'].append(round(msec_duration, 5))

<h4>Query2</h4>

In [95]:
# step 1
humongous_redis2 = myRedis4.ft('humongous_index2')
#humongous_redis2.dropindex()
schema2 = (TextField('discipline'), TextField('courseYear'), TextField('courseName'))
index_definition = IndexDefinition(prefix = ['humongousDB:'], index_type = IndexType.HASH)
humongous_redis2.create_index(schema2, definition = index_definition)

'OK'

In [96]:
# step 2
aggRequest2 = aggregations.AggregateRequest('@discipline:statistics @courseYear:2022').group_by('@courseName')
before = time.time()
humongous_query2 = humongous_redis2.aggregate(aggRequest2)
after = time.time()

for res in humongous_query2.rows:
    print(res[1])

Introduction to Probability and Data with R
Basic Statistics
Bayesian Statistics: From Concept to Data Analysis
Python and Statistics for Financial Analysis
Understanding Clinical Research: Behind the Statistics
Econometrics: Methods and Applications
Exploratory Data Analysis
Foundations: Data, Data, Everywhere
Introduction to Statistics


In [97]:
# step 3
msec_duration = (after - before) * 1000
humongousDict['query2'].append(round(msec_duration, 5))

In [98]:
# step 4
for i in range(0, 30):
    before = time.time()
    humongous_redis2.aggregate(aggRequest2)
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query2'].append(round(msec_duration, 5))

<h4>Query3</h4>

In [100]:
# step 1
humongous_redis3 = myRedis4.ft('humongous_index3')
#humongous_redis3.dropindex()
schema3 = (TextField('materialType'), TagField('discipline'), TextField('email'), TextField('firstName'))
index_definition = IndexDefinition(prefix = ['humongousDB:'], index_type = IndexType.HASH)
humongous_redis3.create_index(schema3, definition = index_definition)

'OK'

In [106]:
# step 2
aggRequest3 = aggregations.AggregateRequest('@discipline:{maths} @materialType:\'lecture slides\' @email:*gmail.com').group_by('@discipline', reducers.count().alias('count'))
before = time.time()
humongous_query3 = humongous_redis3.aggregate(aggRequest3)
after = time.time()

print(humongous_query3.rows[0][3])

3498


In [107]:
# step 3
msec_duration = (after - before) * 1000
humongousDict['query3'].append(round(msec_duration, 5))

In [109]:
# step 4
for i in range(0, 30):
    before = time.time()
    humongous_query3 = humongous_redis3.aggregate(aggRequest3)
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query3'].append(round(msec_duration, 5))

<h4>Query4</h4>

In [111]:
# step 1
humongous_redis4 = myRedis4.ft('humongous_index4')
#humongous_redis4.dropindex()
schema4 = (TagField('discipline'), TagField('courseYear'), TextField('country'), TextField('dateOfBirth'), TextField('firstName'), TextField('lastName', sortable = True))
index_definition = IndexDefinition(prefix = ['humongousDB:'], index_type = IndexType.HASH)
humongous_redis4.create_index(schema4, definition = index_definition)

'OK'

In [112]:
# step 2
aggRequest4 = aggregations.AggregateRequest('@discipline:{psychology} AND @courseYear:{2023} AND @country:*orea AND -@dateOfBirth:200*').group_by(['@lastName', '@dateOfBirth', '@country', '@firstName']).sort_by('@lastName')  # list, not set: deterministic field order
before = time.time()
humongous_query4 = humongous_redis4.aggregate(aggRequest4)
after = time.time()

for res in humongous_query4.rows:
    print(res[1], res[5], res[7], res[3])

castells República Popular Democrática de Corea Tere 1978-9-2
gailys Korea Leila 1998-9-7
lie South Korea Cathrine 1986-7-12
real República de Corea Miguel 1987-9-12
reynolds Korea Lynda 1989-7-21
sura North Korea Raghav 1973-11-27


In [113]:
# step 3
msec_duration = (after - before) * 1000
humongousDict['query4'].append(round(msec_duration, 5))

In [114]:
# step 4
for i in range(0, 30):
    before = time.time()
    humongous_redis4.aggregate(aggRequest4)
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query4'].append(round(msec_duration, 5))

In [116]:
humongousDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in humongousDict:
    humongousDataset[key].append(humongousDict[key][0])
    mean30 = mean(humongousDict[key][1 : 31])
    humongousDataset[key].append(round(mean30, 5))
humongousDataset

{'query1': [6.90603, 6.4638],
 'query2': [19.38391, 24.00498],
 'query3': [60.08697, 45.49086],
 'query4': [6.71697, 9.386]}

In [117]:
# write all 31 execution times of every query, one block of rows per dataset
with open('redis_tests.csv', 'w', newline = '') as redis_tests:
    writer = csv.writer(redis_tests, delimiter = ',')
    keys = list(smallDict.keys())
    limit = len(smallDict['query1'])

    writer.writerow(keys)    # header row: query names
    writer.writerow(['s'])   # s stands for small dataset
    for i in range(0, limit):
        writer.writerow([smallDict[k][i] for k in keys])
    writer.writerow(['m'])   # m stands for medium dataset
    for i in range(0, limit):
        writer.writerow([mediumDict[k][i] for k in keys])
    writer.writerow(['l'])   # l stands for large dataset
    for i in range(0, limit):
        writer.writerow([largeDict[k][i] for k in keys])
    writer.writerow(['h'])   # h stands for humongous dataset
    for i in range(0, limit):
        writer.writerow([humongousDict[k][i] for k in keys])