<center>
    <h2>Online learning platform database - Neo4j</h2>
    <h3>Performing the queries and storing the query execution times</h3>
</center>

<h3>Python - Neo4j interaction</h3>

Before running the queries, I import the required modules (the <i>Neo4j</i> Python driver and the <i>time</i> and <i>csv</i> modules) and create four driver objects, one for each of the four Neo4j instances running in Docker, passing each the connection URI and the credentials.

In [None]:
# import modules
from neo4j import GraphDatabase     # Neo4j driver
import time                         # time-related functions to register query execution times
import csv                          # read and write csv files

# create one driver object per Neo4j instance
driver = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7687', auth = ('neo4j', 'myPassword'))
driver2 = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7692', auth = ('neo4j', 'myPassword'))
driver3 = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7693', auth = ('neo4j', 'myPassword'))
driver4 = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7694', auth = ('neo4j', 'myPassword'))
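
As an alternative to four separate variables, the connections could be kept in a dictionary keyed by dataset name. The following is a minimal sketch, not the approach used in this notebook: the names <code>INSTANCE_URIS</code>, <code>open_drivers</code> and <code>close_drivers</code> are hypothetical, and only the pairing of port 7687 with the small instance is confirmed later on (the other pairings are assumed). The driver factory is passed in as an argument, so the sketch does not need a running database.

```python
# Hypothetical mapping from dataset name to connection URI. Only the "small"
# entry is confirmed by the notebook; the other pairings are assumptions.
INSTANCE_URIS = {
    "small": "neo4j://127.0.0.1:7687",
    "medium": "neo4j://127.0.0.1:7692",
    "large": "neo4j://127.0.0.1:7693",
    "humongous": "neo4j://127.0.0.1:7694",
}


def open_drivers(driver_factory, uris, auth):
    """Create one driver per instance and return them keyed by dataset name."""
    return {name: driver_factory(uri, auth=auth) for name, uri in uris.items()}


def close_drivers(drivers):
    """Release the connections held by every driver."""
    for drv in drivers.values():
        drv.close()
```

In the notebook this would be called as <code>open_drivers(GraphDatabase.driver, INSTANCE_URIS, ('neo4j', 'myPassword'))</code>, with <code>close_drivers()</code> called once all the measurements are done.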

<h3>Query the Neo4j instances</h3>

I create a dictionary of lists for each of the four Neo4j instances. In these dictionaries the keys are the query names and the values are lists holding the 31 recorded query execution times: after each execution I append its execution time to the corresponding list. Since the execution times are required in milliseconds, I multiply them by 1000 and round them to five decimal places before appending them.
As previously explained, these actions (for each of the four queries on each of the four instances) follow a standard sequence of steps. Each step is contained in its own notebook cell, so each query is executed 31 times using three notebook cells, as follows:
 - step 1: define the query as a string and pass it, together with the <code>database_</code> parameter, to the <code>execute_query()</code> method of the driver object (the keyword ends with an underscore; keyword arguments without it are passed to the query as parameters), recording a timestamp immediately before and after the execution. Finally, print the query results;
 - step 2: compute the execution time of the first execution and store it in the corresponding dictionary list;
 - step 3: [thirty times] execute the query between two timestamps, compute the execution time and store it in the corresponding dictionary list.
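
The three steps above can be condensed into one helper. This is a minimal sketch under a few assumptions: the name <code>time_executions</code> is hypothetical, the query is passed in as a zero-argument callable, and <code>time.perf_counter()</code> is swapped in for <code>time.time()</code> because it is the monotonic clock Python provides for measuring short intervals. Unlike step 1, the helper does not print the query results.

```python
import time


def time_executions(run_query, repetitions=31):
    """Call run_query() `repetitions` times and return the durations in ms."""
    durations_ms = []
    for _ in range(repetitions):
        before = time.perf_counter()
        run_query()
        after = time.perf_counter()
        # Seconds to milliseconds, rounded to five decimal places.
        durations_ms.append(round((after - before) * 1000, 5))
    return durations_ms
```

With the objects defined in this notebook, a call could look like <code>smallDict['query1'] = time_executions(lambda: driver.execute_query(small_neo4j1, database_ = 'neo4j'))</code>; the first element of the returned list is then the first (cold) execution.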

For each Neo4j instance, after running the four queries, I compute the mean of the 30 execution times from step 3. Together with the time of the first execution, this mean value is stored in a new dictionary, specific to a dataset. Originally, I intended to use these four new dictionaries to save the query execution times into a csv file for building histograms. I later decided to save all 31 recorded execution times and process them in Microsoft Excel.
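
Both options described above can be sketched as follows. The function names <code>summarise</code> and <code>write_times</code> and the CSV column names are hypothetical; the sketch assumes every list holds at least two timings, with the first execution in position 0.

```python
import csv
import statistics


def summarise(times_by_query):
    """Return {query: (first_ms, mean_of_repeats_ms)} for each timing list."""
    summary = {}
    for name, times in times_by_query.items():
        first, repeats = times[0], times[1:]
        summary[name] = (first, round(statistics.fmean(repeats), 5))
    return summary


def write_times(times_by_query, handle):
    """Write one CSV row per execution: query name, run number, time in ms."""
    writer = csv.writer(handle)
    writer.writerow(["query", "run", "time_ms"])
    for name, times in times_by_query.items():
        for run, value in enumerate(times, start=1):
            writer.writerow([name, run, value])
```

For example, <code>write_times(smallDict, handle)</code> with a file opened via <code>open('small_times.csv', 'w', newline='')</code> (a hypothetical file name) would produce a file that a spreadsheet can import directly.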

N.B.:
Contrary to what was introduced when presenting the methods for accessing and querying Neo4j, I don't need the information contained in the <i>summary</i> object, because I measure the query execution time with my own timestamps; I am only interested in the query result records, in order to display them. The object returned by <code>execute_query()</code> has a <code>records</code> attribute, so I can access the records with dot notation. Alternatively, I could use the index <code>0</code> on the returned object, since the records are the first element of the returned (named) tuple. By iterating over the records and applying the <code>data()</code> method to each of them, I turn each record into a dictionary of key-value pairs.
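
The two ways of reaching the records can be illustrated without a database. The classes <code>FakeResult</code> and <code>FakeRecord</code> below are stand-ins written for this sketch only; they mimic the shape of the driver's result (a named tuple with the fields <code>records</code>, <code>summary</code> and <code>keys</code>) and of a record with a <code>data()</code> method, and are not part of the Neo4j driver.

```python
from collections import namedtuple

# Stand-in for the object returned by execute_query(): a named tuple whose
# fields are (records, summary, keys).
FakeResult = namedtuple("FakeResult", ["records", "summary", "keys"])


class FakeRecord(dict):
    """Stand-in for a driver record: data() returns it as a plain dict."""

    def data(self):
        return dict(self)


result = FakeResult(
    records=[FakeRecord({"s.firstName": "Cathrine", "s.lastName": "Lie"})],
    summary=None,
    keys=["s.firstName", "s.lastName"],
)

# Attribute access and positional access reach the same list of records.
assert result.records is result[0]

# Each record becomes a dictionary keyed by the names used in RETURN.
rows = [record.data() for record in result.records]
print(rows)
```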

In [89]:
smallDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
mediumDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
largeDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
humongousDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
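
The four dictionaries share the same shape, so they could also be built by a small factory. This is a sketch only; <code>QUERY_NAMES</code>, <code>empty_timings</code> and the variable names are hypothetical. It also shows why <code>dict.fromkeys()</code> with a list as default value is not a safe shortcut here.

```python
QUERY_NAMES = ["query1", "query2", "query3", "query4"]


def empty_timings():
    """Return a fresh {query name: []} dictionary with independent lists."""
    return {name: [] for name in QUERY_NAMES}


small_times = empty_timings()
medium_times = empty_timings()

# Pitfall: dict.fromkeys(QUERY_NAMES, []) makes every key share ONE list,
# so a value appended under one query shows up under all of them.
shared = dict.fromkeys(QUERY_NAMES, [])
shared["query1"].append(1.0)
print(shared["query2"] is shared["query1"])
```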

In [81]:
# mean function
def mean(aList):
    # 'total' avoids shadowing the built-in sum()
    total = 0
    for value in aList:
        total += value
    return total / len(aList)

<h3>Neo4j instance with 250k hashes</h3>

I start with the smallest Neo4j instance.

<h4>Query 1</h4>

In [98]:
# step 1
small_neo4j1 = 'MATCH (s:student) -[:is_enrolled]-> (c:course) WHERE c.courseID = \'192\' RETURN s.firstName, s.lastName'

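# database_ (with the trailing underscore) selects the database; a plain database = ... keyword would be sent to the query as a parameter
before = time.time()
small_query1 = driver.execute_query(small_neo4j1, database_ = 'neo4j')
after = time.time()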

records = small_query1.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'])

Custodia Hidalgo
Ledün Soylu
Nath Nicolas
Ana Narušis
Émile Nicolas
Sarah Lara
Patrícia Leite
Cathrine Lie
Arthur Laroche
Casandra Arenas
Narciso Ferrán
Vigilija Gaižauskas
Ingeborg Amundsen
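
The course ID in Query 1 is written directly into the query string. A possible variant passes it as a query parameter instead, which lets the same query text be reused for other courses. This is a sketch under assumptions: <code>COURSE_STUDENTS</code> and <code>students_in_course</code> are hypothetical names, and the course ID is kept as a string because the original query compares it with <code>'192'</code>.

```python
COURSE_STUDENTS = (
    "MATCH (s:student)-[:is_enrolled]->(c:course) "
    "WHERE c.courseID = $courseID "
    "RETURN s.firstName, s.lastName"
)


def students_in_course(driver, course_id, database="neo4j"):
    """Run the Query 1 pattern with the course ID supplied as a parameter."""
    # Keyword arguments without a trailing underscore become query
    # parameters ($courseID); database_ selects the target database.
    result = driver.execute_query(
        COURSE_STUDENTS, courseID=course_id, database_=database
    )
    return [record.data() for record in result.records]
```

Calling <code>students_in_course(driver, '192')</code> would then return the same students as the cell above, as a list of dictionaries.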


In [91]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query1'].append(round(msec_duration, 5))

In [92]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j1, database_ = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query1'].append(round(msec_duration, 5))

In [93]:
smallDict

{'query1': [52.85406,
  16.70313,
  15.98001,
  27.03428,
  13.44514,
  11.38997,
  9.08017,
  9.57918,
  9.8598,
  12.26807,
  9.75084,
  20.99299,
  9.52601,
  7.74121,
  10.12492,
  8.45695,
  8.03399,
  7.53999,
  11.73592,
  10.29992,
  8.84414,
  11.32393,
  8.93307,
  8.61502,
  9.1269,
  8.71205,
  8.27789,
  7.74813,
  8.43811,
  8.33821,
  8.35586],
 'query2': [],
 'query3': [],
 'query4': []}
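
The list above shows a first execution that is markedly slower than the following ones. A quick way to look at this, sketched here with the standard <code>statistics</code> module (the variable names are hypothetical, the values are copied from the output above):

```python
import statistics

# Query 1 timings (ms) on the small instance, copied from the output above.
query1_ms = [
    52.85406, 16.70313, 15.98001, 27.03428, 13.44514, 11.38997, 9.08017,
    9.57918, 9.8598, 12.26807, 9.75084, 20.99299, 9.52601, 7.74121,
    10.12492, 8.45695, 8.03399, 7.53999, 11.73592, 10.29992, 8.84414,
    11.32393, 8.93307, 8.61502, 9.1269, 8.71205, 8.27789, 7.74813,
    8.43811, 8.33821, 8.35586,
]

# Separate the first execution from the thirty repetitions.
first_ms, repeats = query1_ms[0], query1_ms[1:]

print("first execution:", first_ms)
print("mean of repeats:", round(statistics.fmean(repeats), 5))
print("median of repeats:", round(statistics.median(repeats), 5))
print("min / max of repeats:", min(repeats), max(repeats))
```

Reporting the median next to the mean is a design choice: a few slow repetitions (such as the fourth value in the list) pull the mean upwards but barely move the median.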

<h4>Query 2</h4>

In [94]:
# step 1
small_neo4j2 = 'MATCH(c:course) WHERE c.discipline = \'statistics\' AND c.courseYear = \'2022\' RETURN c.courseName'

before = time.time()
small_query2 = driver.execute_query(small_neo4j2, database_ = 'neo4j')
after = time.time()

records = small_query2.records
for record in records:
    print(record.data()['c.courseName'])

Econometrics: Methods and Applications
Exploratory Data Analysis
Understanding Clinical Research: Behind the Statistics
Introduction to Probability and Data with R
Bayesian Statistics: From Concept to Data Analysis
Introduction to Statistics
Python and Statistics for Financial Analysis
Basic Statistics
Foundations: Data, Data, Everywhere


In [95]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query2'].append(round(msec_duration, 5))

In [96]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j2, database_ = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query2'].append(round(msec_duration, 5))

In [97]:
smallDict

{'query1': [52.85406,
  16.70313,
  15.98001,
  27.03428,
  13.44514,
  11.38997,
  9.08017,
  9.57918,
  9.8598,
  12.26807,
  9.75084,
  20.99299,
  9.52601,
  7.74121,
  10.12492,
  8.45695,
  8.03399,
  7.53999,
  11.73592,
  10.29992,
  8.84414,
  11.32393,
  8.93307,
  8.61502,
  9.1269,
  8.71205,
  8.27789,
  7.74813,
  8.43811,
  8.33821,
  8.35586],
 'query2': [21.36397,
  20.07294,
  17.77291,
  19.6979,
  15.54513,
  10.71,
  9.23896,
  8.56018,
  7.73001,
  14.97197,
  8.28695,
  8.30197,
  8.69179,
  8.27003,
  9.57799,
  9.25684,
  8.80218,
  8.51989,
  8.219,
  8.1172,
  8.06403,
  8.61907,
  7.99108,
  8.00729,
  9.07397,
  8.40282,
  8.18586,
  9.42898,
  8.66604,
  8.11815,
  7.586],
 'query3': [],
 'query4': []}

<h4>Query 3</h4>

In [107]:
# step 1
small_neo4j3 = 'MATCH (c:course {discipline: \'maths\'}) <-[:is_enrolled]- (s:student) -[:studies]-> (m:material {mType: \'lecture slides\'}) <-[:uses]- (c:course) WHERE right(s.email, 9) = \'gmail.com\' RETURN COUNT(m);'

before = time.time()
small_query3 = driver.execute_query(small_neo4j3, database_ = 'neo4j')
after = time.time()

records = small_query3.records
for record in records:
    print(record.data()['COUNT(m)'])

838


In [108]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query3'].append(round(msec_duration, 5))

In [109]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j3, database_ = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query3'].append(round(msec_duration, 5))

<h4>Query 4</h4>

In [103]:
# step 1
small_neo4j4 = 'MATCH (s:student) -[:is_enrolled]-> (c:course {discipline: \'psychology\'}) WHERE right(s.country, 4) = \'orea\' AND left(s.dob, 1) <> \'2\' RETURN DISTINCT s.firstName, s.lastName, s.country ORDER BY s.lastName;'

before = time.time()
small_query4 = driver.execute_query(small_neo4j4, database_ = 'neo4j')
after = time.time()

records = small_query4.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'], record.data()['s.country'])

Cathrine Lie South Korea
Lynda Reynolds Korea
Raghav Sura North Korea


In [104]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query4'].append(round(msec_duration, 5))

In [105]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j4, database_ = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query4'].append(round(msec_duration, 5))

In [110]:
smallDict

{'query1': [52.85406,
  16.70313,
  15.98001,
  27.03428,
  13.44514,
  11.38997,
  9.08017,
  9.57918,
  9.8598,
  12.26807,
  9.75084,
  20.99299,
  9.52601,
  7.74121,
  10.12492,
  8.45695,
  8.03399,
  7.53999,
  11.73592,
  10.29992,
  8.84414,
  11.32393,
  8.93307,
  8.61502,
  9.1269,
  8.71205,
  8.27789,
  7.74813,
  8.43811,
  8.33821,
  8.35586],
 'query2': [21.36397,
  20.07294,
  17.77291,
  19.6979,
  15.54513,
  10.71,
  9.23896,
  8.56018,
  7.73001,
  14.97197,
  8.28695,
  8.30197,
  8.69179,
  8.27003,
  9.57799,
  9.25684,
  8.80218,
  8.51989,
  8.219,
  8.1172,
  8.06403,
  8.61907,
  7.99108,
  8.00729,
  9.07397,
  8.40282,
  8.18586,
  9.42898,
  8.66604,
  8.11815,
  7.586],
 'query3': [97.00489,
  47.47319,
  55.26805,
  30.30396,
  18.94617,
  25.069,
  22.98999,
  15.56993,
  15.136,
  18.32294,
  26.66402,
  20.37597,
  15.25402,
  14.50014,
  12.85505,
  15.02204,
  20.55216,
  15.50817,
  12.85481,
  33.4599,
  13.94987,
  14.82797,
  15.23304,
  14.69