<center>
    <h2>Online learning platform database - Neo4j</h2>
    <h3>Performing the queries and storing the queries execution time</h3>
</center>

<h3>Python - Neo4j interaction</h3>

Before running the queries, I import the required modules (the <i>Neo4j</i> Python driver and the <i>time</i> and <i>csv</i> modules) and create four driver objects, passing each one the parameters needed to connect to one of the four Neo4j instances running in Docker.

In [1]:
# import modules
from neo4j import GraphDatabase     # Neo4j driver
import time                         # time-related functions to register query execution times
import csv                          # read and write csv files

# create the driver objects (one per Neo4j instance)
driver = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7687', auth = ('neo4j', 'myPassword'))
driver2 = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7692', auth = ('neo4j', 'myPassword'))
driver3 = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7693', auth = ('neo4j', 'myPassword'))
driver4 = GraphDatabase.driver(uri = 'neo4j://127.0.0.1:7694', auth = ('neo4j', 'myPassword'))
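
The four connections differ only in the port, so the URIs could also be derived from a small mapping. The following is a minimal sketch, not part of the original notebook: the instance names follow the dictionaries used later on, while <code>INSTANCE_PORTS</code> and <code>instance_uri</code> are hypothetical names introduced for illustration.

```python
# Hypothetical helper: build one connection URI per Neo4j instance.
# The ports are the ones passed to GraphDatabase.driver() in the cell above.
INSTANCE_PORTS = {
    'small': 7687,
    'medium': 7692,
    'large': 7693,
    'humongous': 7694,
}


def instance_uri(port, host='127.0.0.1'):
    """Return the neo4j:// URI for an instance listening on the given port."""
    return f'neo4j://{host}:{port}'


uris = {name: instance_uri(port) for name, port in INSTANCE_PORTS.items()}
for name, uri in uris.items():
    print(name, uri)
```

Each URI could then be passed to <code>GraphDatabase.driver()</code> together with the <code>auth</code> tuple, exactly as in the cell above.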

<h3>Query the Neo4j instances</h3>

For each of the four Neo4j instances I create a dictionary of lists. The keys are the query names and each value is the list of the 31 execution times recorded for that query: after every run I append the execution time of the most recent run to the list. Since execution times are required in milliseconds, I multiply them by 1000 and round them to five decimal places before appending them.
As previously explained, these actions (for each of the four queries on each of the four instances) follow a standard sequence of steps. Each step is encapsulated in a notebook cell, so each query is run 31 times using three notebook cells, as follows:
 - step 1: define the query as a string and pass it, together with the <code>database</code> parameter, to the <code>execute_query()</code> method of the driver object, recording the time immediately before and after the call. Finally, print the query results;
 - step 2: compute the execution time of the first run and store it in the corresponding dictionary list;
 - step 3: [thirty times] run the query between two timestamps, compute the execution time and store it in the corresponding dictionary list.
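
The timing part of the three steps can be condensed into a single helper. This is only a sketch under stated assumptions: <code>time_runs</code> and <code>run_query</code> are hypothetical names, the helper does not print the query results as step 1 does, and it uses <code>time.perf_counter()</code> (a clock intended for measuring elapsed time) rather than the <code>time.time()</code> calls used in this notebook. A stand-in workload replaces the database call so that the example runs on its own.

```python
import time


def time_runs(run_query, repetitions=30):
    """Call run_query once plus `repetitions` more times.

    Returns the duration of every call in milliseconds,
    rounded to five decimal places.
    """
    durations = []
    for _ in range(1 + repetitions):
        before = time.perf_counter()
        run_query()
        after = time.perf_counter()
        durations.append(round((after - before) * 1000, 5))
    return durations


# Stand-in workload so the sketch runs without a database.
timings = time_runs(lambda: sum(range(1000)))
print(len(timings))
```

In the notebook, the callable would wrap the driver call, for example <code>lambda: driver.execute_query(small_neo4j1, database = 'neo4j')</code>, and the returned list would take the place of one dictionary entry.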

For each Neo4j instance, once the four queries have been run, I compute the mean of the thirty execution times from step 3. This mean, together with the execution time of the first run, is stored in a new dictionary specific to that dataset. Originally, I intended to use these four new dictionaries to save the execution times to a csv file for constructing histograms. I later decided to save all 31 recorded execution times and process them in Microsoft Excel.
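
One possible way to hand all the recorded times to a spreadsheet is a long-format csv file with one row per run. The sketch below is an assumption about the layout, not the export actually used here: the function name <code>write_timings</code>, the column names and the file name are hypothetical, and the values are illustrative rather than measured.

```python
import csv


def write_timings(path, timings):
    """Write one row per recorded run: query name, run number, time in ms."""
    with open(path, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        writer.writerow(['query', 'run', 'msec'])
        for query, values in timings.items():
            for run, value in enumerate(values, start=1):
                writer.writerow([query, run, value])


# Illustrative values only, not measurements from this notebook.
example = {'query1': [120.5, 20.25], 'query2': [80.0, 15.75]}
write_timings('timings_example.csv', example)
```

Passing <code>newline=''</code> to <code>open()</code> is what the <code>csv</code> module documentation recommends, and it avoids blank rows when the file is opened on Windows.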

N.B.:
Contrary to what was introduced when presenting the methods for accessing and querying Neo4j, I don't need the information contained in the <i>summary</i> object to get the query execution time, so I am only interested in the query result records, which I display. I can use dot notation on the result object to access them (it has a <code>records</code> attribute). Alternatively, I could use the index <code>0</code> on the result object, since the records are its first element. By iterating over the records and applying the <code>data()</code> method to each one, I turn each record into key-value pairs (a dictionary, in fact).
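
The two access paths described above can be illustrated without a database. The sketch below uses stand-in objects, assuming the result behaves like a named tuple whose first field is the list of records; <code>FakeResult</code> and <code>FakeRecord</code> are hypothetical classes that only mimic that shape, and the names in the data are invented.

```python
from collections import namedtuple

# Stand-ins that only mimic the shape described above;
# the real objects are created by the Neo4j driver.
FakeResult = namedtuple('FakeResult', ['records', 'summary', 'keys'])


class FakeRecord:
    def __init__(self, fields):
        self._fields = fields

    def data(self):
        """Return the record as a plain dictionary."""
        return dict(self._fields)


result = FakeResult(
    records=[FakeRecord({'s.firstName': 'Ada'}),
             FakeRecord({'s.firstName': 'Alan'})],
    summary=None,
    keys=['s.firstName'],
)

# Attribute access and index 0 reach the same list of records.
same_object = result.records is result[0]
names = [record.data()['s.firstName'] for record in result.records]
print(same_object, names)
```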

In [2]:
smallDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
mediumDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
largeDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
humongousDict = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}

In [3]:
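# mean function
def mean(aList):
    # arithmetic mean of a non-empty list of numbers
    # (the accumulator is named total to avoid shadowing the built-in sum)
    total = 0
    for value in aList:
        total += value
    return total / len(aList)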
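
The same quantity is also available from the standard library. As a small sketch (the timings below are illustrative values, not measurements from this notebook), <code>statistics.fmean()</code> returns the arithmetic mean, while <code>statistics.median()</code> gives a summary that is less affected by an occasional slow run.

```python
import statistics

# Illustrative timings in milliseconds (not taken from the measurements).
warm_runs = [20.0, 22.0, 24.0, 90.0]

average = statistics.fmean(warm_runs)   # arithmetic mean, same quantity as mean()
middle = statistics.median(warm_runs)   # less affected by the single slow run
print(average, middle)
```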

<h3>Neo4j instance with 250k hashes</h3>

I start with the smallest Neo4j instance.

<h4>Query 1</h4>

In [5]:
# step 1
small_neo4j1 = 'MATCH (s:student) -[:is_enrolled]-> (c:course) WHERE c.courseID = \'192\' RETURN s.firstName, s.lastName'

before = time.time()
small_query1 = driver.execute_query(small_neo4j1, database = 'neo4j')
after = time.time()

records = small_query1.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'])

Custodia Hidalgo
Ledün Soylu
Nath Nicolas
Ana Narušis
Émile Nicolas
Sarah Lara
Patrícia Leite
Cathrine Lie
Arthur Laroche
Casandra Arenas
Narciso Ferrán
Vigilija Gaižauskas
Ingeborg Amundsen


In [6]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query1'].append(round(msec_duration, 5))

In [7]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j1, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query1'].append(round(msec_duration, 5))

In [8]:
smallDict

{'query1': [5489.99786,
  61.99098,
  39.325,
  47.12009,
  42.66763,
  27.58408,
  25.22206,
  23.50092,
  27.25601,
  22.06802,
  34.74927,
  19.09304,
  18.48698,
  17.29774,
  22.18509,
  20.05482,
  24.37592,
  18.41187,
  19.12689,
  19.3522,
  17.71808,
  16.66713,
  14.21213,
  15.78188,
  17.79509,
  20.87688,
  18.40186,
  21.54493,
  24.38712,
  19.82117,
  19.15503],
 'query2': [],
 'query3': [],
 'query4': []}

<h4>Query 2</h4>

In [9]:
# step 1
small_neo4j2 = 'MATCH(c:course) WHERE c.discipline = \'statistics\' AND c.courseYear = \'2022\' RETURN c.courseName'

before = time.time()
small_query2 = driver.execute_query(small_neo4j2, database = 'neo4j')
after = time.time()

records = small_query2.records
for record in records:
    print(record.data()['c.courseName'])

Econometrics: Methods and Applications
Exploratory Data Analysis
Understanding Clinical Research: Behind the Statistics
Introduction to Probability and Data with R
Bayesian Statistics: From Concept to Data Analysis
Introduction to Statistics
Python and Statistics for Financial Analysis
Basic Statistics
Foundations: Data, Data, Everywhere


In [10]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query2'].append(round(msec_duration, 5))

In [11]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j2, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query2'].append(round(msec_duration, 5))

In [12]:
smallDict

{'query1': [5489.99786,
  61.99098,
  39.325,
  47.12009,
  42.66763,
  27.58408,
  25.22206,
  23.50092,
  27.25601,
  22.06802,
  34.74927,
  19.09304,
  18.48698,
  17.29774,
  22.18509,
  20.05482,
  24.37592,
  18.41187,
  19.12689,
  19.3522,
  17.71808,
  16.66713,
  14.21213,
  15.78188,
  17.79509,
  20.87688,
  18.40186,
  21.54493,
  24.38712,
  19.82117,
  19.15503],
 'query2': [418.41102,
  27.74596,
  39.51097,
  21.12699,
  15.9111,
  15.93494,
  15.37609,
  20.39719,
  29.7482,
  19.72985,
  27.58718,
  15.83409,
  13.80086,
  12.52127,
  14.91094,
  13.86189,
  13.87501,
  13.00669,
  13.0291,
  15.30409,
  17.38095,
  14.43791,
  14.47678,
  17.9131,
  13.71598,
  14.62603,
  13.79895,
  14.51182,
  14.15801,
  16.29806,
  15.21301],
 'query3': [],
 'query4': []}

<h4>Query 3</h4>

In [13]:
# step 1
small_neo4j3 = 'MATCH (c:course {discipline: \'maths\'}) <-[:is_enrolled]- (s:student) -[:studies]-> (m:material {mType: \'lecture slides\'}) <-[:uses]- (c:course) WHERE right(s.email, 9) = \'gmail.com\' RETURN COUNT(m);'

before = time.time()
small_query3 = driver.execute_query(small_neo4j3, database = 'neo4j')
after = time.time()

records = small_query3.records
for record in records:
    print(record.data()['COUNT(m)'])

838


In [14]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query3'].append(round(msec_duration, 5))

In [15]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j3, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query3'].append(round(msec_duration, 5))

In [16]:
smallDict

{'query1': [5489.99786,
  61.99098,
  39.325,
  47.12009,
  42.66763,
  27.58408,
  25.22206,
  23.50092,
  27.25601,
  22.06802,
  34.74927,
  19.09304,
  18.48698,
  17.29774,
  22.18509,
  20.05482,
  24.37592,
  18.41187,
  19.12689,
  19.3522,
  17.71808,
  16.66713,
  14.21213,
  15.78188,
  17.79509,
  20.87688,
  18.40186,
  21.54493,
  24.38712,
  19.82117,
  19.15503],
 'query2': [418.41102,
  27.74596,
  39.51097,
  21.12699,
  15.9111,
  15.93494,
  15.37609,
  20.39719,
  29.7482,
  19.72985,
  27.58718,
  15.83409,
  13.80086,
  12.52127,
  14.91094,
  13.86189,
  13.87501,
  13.00669,
  13.0291,
  15.30409,
  17.38095,
  14.43791,
  14.47678,
  17.9131,
  13.71598,
  14.62603,
  13.79895,
  14.51182,
  14.15801,
  16.29806,
  15.21301],
 'query3': [1242.91325,
  68.9621,
  85.43992,
  62.51121,
  37.5371,
  35.65073,
  36.58605,
  31.99315,
  35.46286,
  30.77984,
  36.39507,
  31.90088,
  38.03611,
  38.8062,
  44.16895,
  45.27116,
  32.5439,
  37.34016,
  24.56403,
  

<h4>Query 4</h4>

In [17]:
# step 1
small_neo4j4 = 'MATCH (s:student) -[:is_enrolled]-> (c:course {discipline: \'psychology\'}) WHERE right(s.country, 4) = \'orea\' AND left(s.dob, 1) <> \'2\' RETURN DISTINCT s.firstName, s.lastName, s.country ORDER BY s.lastName;'

before = time.time()
small_query4 = driver.execute_query(small_neo4j4, database = 'neo4j')
after = time.time()

records = small_query4.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'], record.data()['s.country'])

Cathrine Lie South Korea
Lynda Reynolds Korea
Raghav Sura North Korea


In [18]:
# step 2
msec_duration = (after - before) * 1000
smallDict['query4'].append(round(msec_duration, 5))

In [19]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver.execute_query(small_neo4j4, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    smallDict['query4'].append(round(msec_duration, 5))

In [20]:
smallDict

{'query1': [5489.99786,
  61.99098,
  39.325,
  47.12009,
  42.66763,
  27.58408,
  25.22206,
  23.50092,
  27.25601,
  22.06802,
  34.74927,
  19.09304,
  18.48698,
  17.29774,
  22.18509,
  20.05482,
  24.37592,
  18.41187,
  19.12689,
  19.3522,
  17.71808,
  16.66713,
  14.21213,
  15.78188,
  17.79509,
  20.87688,
  18.40186,
  21.54493,
  24.38712,
  19.82117,
  19.15503],
 'query2': [418.41102,
  27.74596,
  39.51097,
  21.12699,
  15.9111,
  15.93494,
  15.37609,
  20.39719,
  29.7482,
  19.72985,
  27.58718,
  15.83409,
  13.80086,
  12.52127,
  14.91094,
  13.86189,
  13.87501,
  13.00669,
  13.0291,
  15.30409,
  17.38095,
  14.43791,
  14.47678,
  17.9131,
  13.71598,
  14.62603,
  13.79895,
  14.51182,
  14.15801,
  16.29806,
  15.21301],
 'query3': [1242.91325,
  68.9621,
  85.43992,
  62.51121,
  37.5371,
  35.65073,
  36.58605,
  31.99315,
  35.46286,
  30.77984,
  36.39507,
  31.90088,
  38.03611,
  38.8062,
  44.16895,
  45.27116,
  32.5439,
  37.34016,
  24.56403,
  

In [21]:
smallDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in smallDict:
    smallDataset[key].append(smallDict[key][0])
    mean30 = mean(smallDict[key][1 : 31])
    smallDataset[key].append(round(mean30, 5))
smallDataset

{'query1': [5489.99786, 24.541],
 'query2': [418.41102, 17.52477],
 'query3': [1242.91325, 39.00357],
 'query4': [449.72205, 14.5945]}

<h3>Neo4j instance with 500k hashes</h3>

<h4>Query 1</h4>

In [22]:
# step 1
medium_neo4j1 = 'MATCH (s:student) -[:is_enrolled]-> (c:course) WHERE c.courseID = \'192\' RETURN s.firstName, s.lastName'

before = time.time()
medium_query1 = driver2.execute_query(medium_neo4j1, database = 'neo4j')
after = time.time()

records = medium_query1.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'])

Custodia Hidalgo
Ledün Soylu
Nath Nicolas
Sarah Lara
Karl Christensen
Christl Henschel
Casandra Arenas
Narciso Ferrán
Vigilija Gaižauskas
Nedas Naujokas
Ana Narušis
Émile Nicolas
Joris Kavaliauskas
Yuvaan Dara
Patrícia Leite
Cathrine Lie
Arthur Laroche
Ingeborg Amundsen
Débora Vaz
Miguel Real


In [23]:
# step 2
msec_duration = (after - before) * 1000
mediumDict['query1'].append(round(msec_duration, 5))

In [24]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver2.execute_query(medium_neo4j1, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query1'].append(round(msec_duration, 5))

In [25]:
mediumDict

{'query1': [5931.72693,
  68.09807,
  43.63704,
  34.96695,
  29.41203,
  26.83902,
  22.995,
  17.44795,
  20.20907,
  22.19391,
  36.13091,
  19.34195,
  20.31398,
  17.07315,
  17.83705,
  17.75098,
  17.9069,
  14.66393,
  19.72103,
  20.25199,
  17.94791,
  20.53118,
  47.73498,
  28.41496,
  20.54787,
  21.09098,
  19.05608,
  20.98918,
  20.18905,
  17.34018,
  18.70394],
 'query2': [],
 'query3': [],
 'query4': []}

<h4>Query 2</h4>

In [26]:
# step 1
medium_neo4j2 = 'MATCH(c:course) WHERE c.discipline = \'statistics\' AND c.courseYear = \'2022\' RETURN c.courseName'

before = time.time()
medium_query2 = driver2.execute_query(medium_neo4j2, database = 'neo4j')
after = time.time()

records = medium_query2.records
for record in records:
    print(record.data()['c.courseName'])

Econometrics: Methods and Applications
Exploratory Data Analysis
Understanding Clinical Research: Behind the Statistics
Introduction to Probability and Data with R
Bayesian Statistics: From Concept to Data Analysis
Introduction to Statistics
Python and Statistics for Financial Analysis
Basic Statistics
Foundations: Data, Data, Everywhere


In [27]:
# step 2
msec_duration = (after - before) * 1000
mediumDict['query2'].append(round(msec_duration, 5))

In [28]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver2.execute_query(medium_neo4j2, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query2'].append(round(msec_duration, 5))

In [29]:
mediumDict

{'query1': [5931.72693,
  68.09807,
  43.63704,
  34.96695,
  29.41203,
  26.83902,
  22.995,
  17.44795,
  20.20907,
  22.19391,
  36.13091,
  19.34195,
  20.31398,
  17.07315,
  17.83705,
  17.75098,
  17.9069,
  14.66393,
  19.72103,
  20.25199,
  17.94791,
  20.53118,
  47.73498,
  28.41496,
  20.54787,
  21.09098,
  19.05608,
  20.98918,
  20.18905,
  17.34018,
  18.70394],
 'query2': [276.00718,
  55.47118,
  64.97097,
  29.74319,
  16.9661,
  18.83411,
  16.51096,
  18.66078,
  17.93933,
  19.37723,
  26.30472,
  17.78913,
  15.92398,
  15.064,
  18.78595,
  15.77711,
  16.21604,
  15.05113,
  15.69676,
  26.85523,
  15.3389,
  16.45112,
  15.56897,
  16.56413,
  18.15915,
  22.98808,
  16.66093,
  13.93914,
  15.5611,
  17.14778,
  13.14402],
 'query3': [],
 'query4': []}

<h4>Query 3</h4>

In [30]:
# step 1
medium_neo4j3 = 'MATCH (c:course {discipline: \'maths\'}) <-[:is_enrolled]- (s:student) -[:studies]-> (m:material {mType: \'lecture slides\'}) <-[:uses]- (c:course) WHERE right(s.email, 9) = \'gmail.com\' RETURN COUNT(m);'

before = time.time()
medium_query3 = driver2.execute_query(medium_neo4j3, database = 'neo4j')
after = time.time()

records = medium_query3.records
for record in records:
    print(record.data()['COUNT(m)'])

1698


In [31]:
# step 2
msec_duration = (after - before) * 1000
mediumDict['query3'].append(round(msec_duration, 5))

In [32]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver2.execute_query(medium_neo4j3, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query3'].append(round(msec_duration, 5))

In [33]:
mediumDict

{'query1': [5931.72693,
  68.09807,
  43.63704,
  34.96695,
  29.41203,
  26.83902,
  22.995,
  17.44795,
  20.20907,
  22.19391,
  36.13091,
  19.34195,
  20.31398,
  17.07315,
  17.83705,
  17.75098,
  17.9069,
  14.66393,
  19.72103,
  20.25199,
  17.94791,
  20.53118,
  47.73498,
  28.41496,
  20.54787,
  21.09098,
  19.05608,
  20.98918,
  20.18905,
  17.34018,
  18.70394],
 'query2': [276.00718,
  55.47118,
  64.97097,
  29.74319,
  16.9661,
  18.83411,
  16.51096,
  18.66078,
  17.93933,
  19.37723,
  26.30472,
  17.78913,
  15.92398,
  15.064,
  18.78595,
  15.77711,
  16.21604,
  15.05113,
  15.69676,
  26.85523,
  15.3389,
  16.45112,
  15.56897,
  16.56413,
  18.15915,
  22.98808,
  16.66093,
  13.93914,
  15.5611,
  17.14778,
  13.14402],
 'query3': [1381.90222,
  95.52097,
  43.06221,
  49.02887,
  40.16209,
  47.76406,
  42.09995,
  39.33477,
  43.90907,
  40.95101,
  68.295,
  36.95273,
  39.50906,
  38.29908,
  36.65686,
  41.01515,
  38.78593,
  31.67105,
  33.92196,
 

<h4>Query 4</h4>

In [34]:
# step 1
medium_neo4j4 = 'MATCH (s:student) -[:is_enrolled]-> (c:course {discipline: \'psychology\'}) WHERE right(s.country, 4) = \'orea\' AND left(s.dob, 1) <> \'2\' RETURN DISTINCT s.firstName, s.lastName, s.country ORDER BY s.lastName;'

before = time.time()
medium_query4 = driver2.execute_query(medium_neo4j4, database = 'neo4j')
after = time.time()

records = medium_query4.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'], record.data()['s.country'])

Ninthe Horrocks Noord-Korea
Cathrine Lie South Korea
Miguel Real República de Corea
Lynda Reynolds Korea
Raghav Sura North Korea


In [35]:
# step 2
msec_duration = (after - before) * 1000
mediumDict['query4'].append(round(msec_duration, 5))

In [36]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver2.execute_query(medium_neo4j4, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    mediumDict['query4'].append(round(msec_duration, 5))

In [37]:
mediumDict

{'query1': [5931.72693,
  68.09807,
  43.63704,
  34.96695,
  29.41203,
  26.83902,
  22.995,
  17.44795,
  20.20907,
  22.19391,
  36.13091,
  19.34195,
  20.31398,
  17.07315,
  17.83705,
  17.75098,
  17.9069,
  14.66393,
  19.72103,
  20.25199,
  17.94791,
  20.53118,
  47.73498,
  28.41496,
  20.54787,
  21.09098,
  19.05608,
  20.98918,
  20.18905,
  17.34018,
  18.70394],
 'query2': [276.00718,
  55.47118,
  64.97097,
  29.74319,
  16.9661,
  18.83411,
  16.51096,
  18.66078,
  17.93933,
  19.37723,
  26.30472,
  17.78913,
  15.92398,
  15.064,
  18.78595,
  15.77711,
  16.21604,
  15.05113,
  15.69676,
  26.85523,
  15.3389,
  16.45112,
  15.56897,
  16.56413,
  18.15915,
  22.98808,
  16.66093,
  13.93914,
  15.5611,
  17.14778,
  13.14402],
 'query3': [1381.90222,
  95.52097,
  43.06221,
  49.02887,
  40.16209,
  47.76406,
  42.09995,
  39.33477,
  43.90907,
  40.95101,
  68.295,
  36.95273,
  39.50906,
  38.29908,
  36.65686,
  41.01515,
  38.78593,
  31.67105,
  33.92196,
 

In [38]:
mediumDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in mediumDict:
    mediumDataset[key].append(mediumDict[key][0])
    mean30 = mean(mediumDict[key][1 : 31])
    mediumDataset[key].append(round(mean30, 5))
mediumDataset

{'query1': [5931.72693, 24.64457],
 'query2': [276.00718, 20.78204],
 'query3': [1381.90222, 39.59148],
 'query4': [504.84109, 19.65367]}

<h3>Neo4j instance with 750k hashes</h3>

<h4>Query 1</h4>

In [39]:
# step 1
large_neo4j1 = 'MATCH (s:student) -[:is_enrolled]-> (c:course) WHERE c.courseID = \'192\' RETURN s.firstName, s.lastName'

before = time.time()
large_query1 = driver3.execute_query(large_neo4j1, database = 'neo4j')
after = time.time()

records = large_query1.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'])

Custodia Hidalgo
Sarah Lara
Narciso Ferrán
Patrícia Leite
Vigilija Gaižauskas
Casandra Arenas
Ledün Soylu
Arthur Laroche
Ana Narušis
Nath Nicolas
Émile Nicolas
Cathrine Lie
Ingeborg Amundsen
Nedas Naujokas
Christl Henschel
Miguel Real
Karl Christensen
Joris Kavaliauskas
Yuvaan Dara
Débora Vaz
Urvi Dani
Collin Heerkens
Brian Thompson
Özkutlu Gül
Dorita Abella
Liliana Flaiano


In [40]:
# step 2
msec_duration = (after - before) * 1000
largeDict['query1'].append(round(msec_duration, 5))

In [41]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver3.execute_query(large_neo4j1, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query1'].append(round(msec_duration, 5))

In [42]:
largeDict

{'query1': [4215.55996,
  78.70698,
  53.45297,
  24.95694,
  27.54307,
  26.86,
  23.71407,
  22.05205,
  22.24684,
  22.88294,
  32.45807,
  20.17403,
  19.24586,
  18.68176,
  18.82911,
  19.21511,
  18.56303,
  17.56287,
  17.1659,
  18.63408,
  19.10901,
  16.783,
  17.061,
  15.33794,
  16.40797,
  16.80183,
  15.99193,
  15.64407,
  36.27396,
  14.68706,
  13.49497],
 'query2': [],
 'query3': [],
 'query4': []}

<h4>Query 2</h4>

In [43]:
# step 1
large_neo4j2 = 'MATCH(c:course) WHERE c.discipline = \'statistics\' AND c.courseYear = \'2022\' RETURN c.courseName'

before = time.time()
large_query2 = driver3.execute_query(large_neo4j2, database = 'neo4j')
after = time.time()

records = large_query2.records
for record in records:
    print(record.data()['c.courseName'])

Econometrics: Methods and Applications
Exploratory Data Analysis
Understanding Clinical Research: Behind the Statistics
Introduction to Probability and Data with R
Bayesian Statistics: From Concept to Data Analysis
Introduction to Statistics
Python and Statistics for Financial Analysis
Basic Statistics
Foundations: Data, Data, Everywhere


In [44]:
# step 2
msec_duration = (after - before) * 1000
largeDict['query2'].append(round(msec_duration, 5))

In [45]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver3.execute_query(large_neo4j2, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query2'].append(round(msec_duration, 5))

In [46]:
largeDict

{'query1': [4215.55996,
  78.70698,
  53.45297,
  24.95694,
  27.54307,
  26.86,
  23.71407,
  22.05205,
  22.24684,
  22.88294,
  32.45807,
  20.17403,
  19.24586,
  18.68176,
  18.82911,
  19.21511,
  18.56303,
  17.56287,
  17.1659,
  18.63408,
  19.10901,
  16.783,
  17.061,
  15.33794,
  16.40797,
  16.80183,
  15.99193,
  15.64407,
  36.27396,
  14.68706,
  13.49497],
 'query2': [314.49819,
  39.15715,
  25.54107,
  22.34411,
  14.85109,
  15.27667,
  19.50121,
  18.60189,
  15.02514,
  15.5642,
  22.07994,
  14.65797,
  13.07797,
  12.44402,
  13.47804,
  14.46509,
  15.0547,
  13.29398,
  13.5529,
  13.81993,
  15.30123,
  14.08482,
  13.36384,
  14.39571,
  13.00216,
  12.05516,
  10.09822,
  11.83987,
  12.9149,
  14.88495,
  15.56683],
 'query3': [],
 'query4': []}

<h4>Query 3</h4>

In [47]:
# step 1
large_neo4j3 = 'MATCH (c:course {discipline: \'maths\'}) <-[:is_enrolled]- (s:student) -[:studies]-> (m:material {mType: \'lecture slides\'}) <-[:uses]- (c:course) WHERE right(s.email, 9) = \'gmail.com\' RETURN COUNT(m);'

before = time.time()
large_query3 = driver3.execute_query(large_neo4j3, database = 'neo4j')
after = time.time()

records = large_query3.records
for record in records:
    print(record.data()['COUNT(m)'])

2628


In [48]:
# step 2
msec_duration = (after - before) * 1000
largeDict['query3'].append(round(msec_duration, 5))

In [49]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver3.execute_query(large_neo4j3, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query3'].append(round(msec_duration, 5))

In [50]:
largeDict

{'query1': [4215.55996,
  78.70698,
  53.45297,
  24.95694,
  27.54307,
  26.86,
  23.71407,
  22.05205,
  22.24684,
  22.88294,
  32.45807,
  20.17403,
  19.24586,
  18.68176,
  18.82911,
  19.21511,
  18.56303,
  17.56287,
  17.1659,
  18.63408,
  19.10901,
  16.783,
  17.061,
  15.33794,
  16.40797,
  16.80183,
  15.99193,
  15.64407,
  36.27396,
  14.68706,
  13.49497],
 'query2': [314.49819,
  39.15715,
  25.54107,
  22.34411,
  14.85109,
  15.27667,
  19.50121,
  18.60189,
  15.02514,
  15.5642,
  22.07994,
  14.65797,
  13.07797,
  12.44402,
  13.47804,
  14.46509,
  15.0547,
  13.29398,
  13.5529,
  13.81993,
  15.30123,
  14.08482,
  13.36384,
  14.39571,
  13.00216,
  12.05516,
  10.09822,
  11.83987,
  12.9149,
  14.88495,
  15.56683],
 'query3': [1225.1122,
  100.05879,
  93.817,
  50.27795,
  49.07227,
  40.24005,
  38.25688,
  28.53012,
  35.64906,
  48.3861,
  77.55804,
  42.37103,
  40.19618,
  47.94669,
  36.75699,
  33.10704,
  30.67183,
  29.32906,
  28.35894,
  27.9

<h4>Query 4</h4>

In [51]:
# step 1
large_neo4j4 = 'MATCH (s:student) -[:is_enrolled]-> (c:course {discipline: \'psychology\'}) WHERE right(s.country, 4) = \'orea\' AND left(s.dob, 1) <> \'2\' RETURN DISTINCT s.firstName, s.lastName, s.country ORDER BY s.lastName;'

before = time.time()
large_query4 = driver3.execute_query(large_neo4j4, database = 'neo4j')
after = time.time()

records = large_query4.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'], record.data()['s.country'])

Tere Castells República Popular Democrática de Corea
Ninthe Horrocks Noord-Korea
Cathrine Lie South Korea
Miguel Real República de Corea
Lynda Reynolds Korea
Raghav Sura North Korea


In [52]:
# step 2
msec_duration = (after - before) * 1000
largeDict['query4'].append(round(msec_duration, 5))

In [53]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver3.execute_query(large_neo4j4, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    largeDict['query4'].append(round(msec_duration, 5))

In [54]:
largeDict

{'query1': [4215.55996,
  78.70698,
  53.45297,
  24.95694,
  27.54307,
  26.86,
  23.71407,
  22.05205,
  22.24684,
  22.88294,
  32.45807,
  20.17403,
  19.24586,
  18.68176,
  18.82911,
  19.21511,
  18.56303,
  17.56287,
  17.1659,
  18.63408,
  19.10901,
  16.783,
  17.061,
  15.33794,
  16.40797,
  16.80183,
  15.99193,
  15.64407,
  36.27396,
  14.68706,
  13.49497],
 'query2': [314.49819,
  39.15715,
  25.54107,
  22.34411,
  14.85109,
  15.27667,
  19.50121,
  18.60189,
  15.02514,
  15.5642,
  22.07994,
  14.65797,
  13.07797,
  12.44402,
  13.47804,
  14.46509,
  15.0547,
  13.29398,
  13.5529,
  13.81993,
  15.30123,
  14.08482,
  13.36384,
  14.39571,
  13.00216,
  12.05516,
  10.09822,
  11.83987,
  12.9149,
  14.88495,
  15.56683],
 'query3': [1225.1122,
  100.05879,
  93.817,
  50.27795,
  49.07227,
  40.24005,
  38.25688,
  28.53012,
  35.64906,
  48.3861,
  77.55804,
  42.37103,
  40.19618,
  47.94669,
  36.75699,
  33.10704,
  30.67183,
  29.32906,
  28.35894,
  27.9

In [55]:
largeDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in largeDict:
    largeDataset[key].append(largeDict[key][0])
    mean30 = mean(largeDict[key][1 : 31])
    largeDataset[key].append(round(mean30, 5))
largeDataset

{'query1': [4215.55996, 23.35128],
 'query2': [314.49819, 15.97649],
 'query3': [1225.1122, 45.31332],
 'query4': [434.03172, 15.21886]}

<h3>Neo4j instance with 1m hashes</h3>

<h4>Query 1</h4>

In [56]:
# step 1
humongous_neo4j1 = 'MATCH (s:student) -[:is_enrolled]-> (c:course) WHERE c.courseID = \'192\' RETURN s.firstName, s.lastName'

before = time.time()
humongous_query1 = driver4.execute_query(humongous_neo4j1, database = 'neo4j')
after = time.time()

records = humongous_query1.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'])

Custodia Hidalgo
Sarah Lara
Narciso Ferrán
Patrícia Leite
Vigilija Gaižauskas
Casandra Arenas
Ledün Soylu
Arthur Laroche
Ana Narušis
Nath Nicolas
Émile Nicolas
Cathrine Lie
Ingeborg Amundsen
Nedas Naujokas
Christl Henschel
Miguel Real
Karl Christensen
Joris Kavaliauskas
Yuvaan Dara
Débora Vaz
Urvi Dani
Collin Heerkens
Özkutlu Gül
Brian Thompson
Dorita Abella
Liliana Flaiano
Finn Karlsen
David Miranda
Torsten Schulz
Kristen Webb
Shaan Raju
Giuseppina Scarfoglio
Mamen Teruel
Eduardo Rezende
Melania Savorgnan


In [57]:
# step 2
msec_duration = (after - before) * 1000
humongousDict['query1'].append(round(msec_duration, 5))

In [58]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver4.execute_query(humongous_neo4j1, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query1'].append(round(msec_duration, 5))

In [59]:
humongousDict

{'query1': [5255.56993,
  65.04512,
  53.98679,
  46.71192,
  24.19591,
  24.19925,
  24.81484,
  22.03536,
  17.21501,
  20.39909,
  36.15999,
  19.67287,
  21.77405,
  19.56296,
  22.46809,
  21.7011,
  19.57822,
  18.36085,
  19.979,
  20.65396,
  18.00489,
  18.1241,
  17.0002,
  16.3331,
  20.16783,
  18.66698,
  18.14485,
  16.75963,
  16.94226,
  16.00981,
  16.30807],
 'query2': [],
 'query3': [],
 'query4': []}

<h4>Query 2</h4>

In [60]:
# step 1
humongous_neo4j2 = 'MATCH(c:course) WHERE c.discipline = \'statistics\' AND c.courseYear = \'2022\' RETURN c.courseName'

before = time.time()
humongous_query2 = driver4.execute_query(humongous_neo4j2, database = 'neo4j')
after = time.time()

records = humongous_query2.records
for record in records:
    print(record.data()['c.courseName'])

Econometrics: Methods and Applications
Exploratory Data Analysis
Understanding Clinical Research: Behind the Statistics
Introduction to Probability and Data with R
Bayesian Statistics: From Concept to Data Analysis
Introduction to Statistics
Python and Statistics for Financial Analysis
Basic Statistics
Foundations: Data, Data, Everywhere


In [61]:
# step 2
msec_duration = (after - before) * 1000
humongousDict['query2'].append(round(msec_duration, 5))

In [62]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver4.execute_query(humongous_neo4j2, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query2'].append(round(msec_duration, 5))

In [63]:
humongousDict

{'query1': [5255.56993,
  65.04512,
  53.98679,
  46.71192,
  24.19591,
  24.19925,
  24.81484,
  22.03536,
  17.21501,
  20.39909,
  36.15999,
  19.67287,
  21.77405,
  19.56296,
  22.46809,
  21.7011,
  19.57822,
  18.36085,
  19.979,
  20.65396,
  18.00489,
  18.1241,
  17.0002,
  16.3331,
  20.16783,
  18.66698,
  18.14485,
  16.75963,
  16.94226,
  16.00981,
  16.30807],
 'query2': [338.23609,
  26.76678,
  23.49782,
  26.41606,
  23.66304,
  29.44207,
  15.87796,
  13.42392,
  12.1839,
  12.98833,
  20.33091,
  12.86292,
  12.52294,
  13.726,
  12.64215,
  14.28819,
  12.75182,
  12.25901,
  13.36503,
  12.57277,
  26.0272,
  13.67974,
  12.84909,
  13.4809,
  12.96997,
  13.92102,
  14.16111,
  20.82276,
  13.33714,
  25.60401,
  17.51089],
 'query3': [],
 'query4': []}

<h4>Query 3</h4>

In [64]:
# step 1
humongous_neo4j3 = 'MATCH (c:course {discipline: \'maths\'}) <-[:is_enrolled]- (s:student) -[:studies]-> (m:material {mType: \'lecture slides\'}) <-[:uses]- (c:course) WHERE right(s.email, 9) = \'gmail.com\' RETURN COUNT(m);'

before = time.time()
humongous_query3 = driver4.execute_query(humongous_neo4j3, database = 'neo4j')
after = time.time()

records = humongous_query3.records
for record in records:
    print(record.data()['COUNT(m)'])

3498


In [65]:
# step 2
msec_duration = (after - before) * 1000
humongousDict['query3'].append(round(msec_duration, 5))

In [66]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver4.execute_query(humongous_neo4j3, database = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query3'].append(round(msec_duration, 5))

In [67]:
humongousDict

{'query1': [5255.56993,
  65.04512,
  53.98679,
  46.71192,
  24.19591,
  24.19925,
  24.81484,
  22.03536,
  17.21501,
  20.39909,
  36.15999,
  19.67287,
  21.77405,
  19.56296,
  22.46809,
  21.7011,
  19.57822,
  18.36085,
  19.979,
  20.65396,
  18.00489,
  18.1241,
  17.0002,
  16.3331,
  20.16783,
  18.66698,
  18.14485,
  16.75963,
  16.94226,
  16.00981,
  16.30807],
 'query2': [338.23609,
  26.76678,
  23.49782,
  26.41606,
  23.66304,
  29.44207,
  15.87796,
  13.42392,
  12.1839,
  12.98833,
  20.33091,
  12.86292,
  12.52294,
  13.726,
  12.64215,
  14.28819,
  12.75182,
  12.25901,
  13.36503,
  12.57277,
  26.0272,
  13.67974,
  12.84909,
  13.4809,
  12.96997,
  13.92102,
  14.16111,
  20.82276,
  13.33714,
  25.60401,
  17.51089],
 'query3': [1648.39602,
  84.55014,
  131.42204,
  45.66669,
  45.17031,
  64.90612,
  57.19399,
  65.99879,
  51.59926,
  55.21107,
  87.50486,
  54.78597,
  96.40718,
  56.98109,
  71.29598,
  67.22999,
  52.48594,
  65.18292,
  1625.29206,

<h4>Query 4</h4>

In [68]:
# step 1
humongous_neo4j4 = 'MATCH (s:student) -[:is_enrolled]-> (c:course {discipline: \'psychology\'}) WHERE right(s.country, 4) = \'orea\' AND left(s.dob, 1) <> \'2\' RETURN DISTINCT s.firstName, s.lastName, s.country ORDER BY s.lastName;'

before = time.time()
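# database_ (with trailing underscore) selects the database; a bare database keyword is sent as a query parameter
humongous_query4 = driver4.execute_query(humongous_neo4j4, database_ = 'neo4j')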
after = time.time()

records = humongous_query4.records
for record in records:
    print(record.data()['s.firstName'], record.data()['s.lastName'], record.data()['s.country'])

Tere Castells República Popular Democrática de Corea
Leila Gailys Korea
Ninthe Horrocks Noord-Korea
Cathrine Lie South Korea
Miguel Real República de Corea
Lynda Reynolds Korea
Debra Shaw Korea
Raghav Sura North Korea


In [69]:
# step 2
msec_duration = (after - before) * 1000
humongousDict['query4'].append(round(msec_duration, 5))

In [70]:
# step 3
for i in range(0, 30):
    before = time.time()
    driver4.execute_query(humongous_neo4j4, database_ = 'neo4j')
    after = time.time()
    msec_duration = (after - before) * 1000
    humongousDict['query4'].append(round(msec_duration, 5))

In [71]:
humongousDict

{'query1': [5255.56993,
  65.04512,
  53.98679,
  46.71192,
  24.19591,
  24.19925,
  24.81484,
  22.03536,
  17.21501,
  20.39909,
  36.15999,
  19.67287,
  21.77405,
  19.56296,
  22.46809,
  21.7011,
  19.57822,
  18.36085,
  19.979,
  20.65396,
  18.00489,
  18.1241,
  17.0002,
  16.3331,
  20.16783,
  18.66698,
  18.14485,
  16.75963,
  16.94226,
  16.00981,
  16.30807],
 'query2': [338.23609,
  26.76678,
  23.49782,
  26.41606,
  23.66304,
  29.44207,
  15.87796,
  13.42392,
  12.1839,
  12.98833,
  20.33091,
  12.86292,
  12.52294,
  13.726,
  12.64215,
  14.28819,
  12.75182,
  12.25901,
  13.36503,
  12.57277,
  26.0272,
  13.67974,
  12.84909,
  13.4809,
  12.96997,
  13.92102,
  14.16111,
  20.82276,
  13.33714,
  25.60401,
  17.51089],
 'query3': [1648.39602,
  84.55014,
  131.42204,
  45.66669,
  45.17031,
  64.90612,
  57.19399,
  65.99879,
  51.59926,
  55.21107,
  87.50486,
  54.78597,
  96.40718,
  56.98109,
  71.29598,
  67.22999,
  52.48594,
  65.18292,
  1625.29206,

In [72]:
humongousDataset = {'query1' : list(), 'query2' : list(), 'query3' : list(), 'query4' : list()}
for key in humongousDict:
    humongousDataset[key].append(humongousDict[key][0])
    mean30 = mean(humongousDict[key][1 : 31])
    humongousDataset[key].append(round(mean30, 5))
humongousDataset

{'query1': [5255.56993, 23.6992],
 'query2': [338.23609, 16.86485],
 'query3': [1648.39602, 128.11236],
 'query4': [704.70405, 18.21342]}
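
The summary above keeps the first execution time of each query on its own and averages the 30 executions that follow it. As a minimal sketch of that computation, assuming <code>mean</code> is <code>statistics.mean</code> (its import is not shown in this part of the notebook): the function name <code>summarize</code> and the <code>sample</code> values are hypothetical and are not taken from the measurements.

```python
from statistics import mean

def summarize(timings_by_query):
    """Return {query: [first_run_ms, mean_of_remaining_runs_ms]}."""
    summary = {}
    for name, timings in timings_by_query.items():
        first, rest = timings[0], timings[1:]   # first run vs. all later runs
        summary[name] = [first, round(mean(rest), 5)]
    return summary

# Made-up timings in milliseconds, only to show the shape of the result.
sample = {'query1': [100.0, 10.0, 20.0, 30.0], 'query2': [50.0, 5.0, 7.0]}
print(summarize(sample))
# {'query1': [100.0, 20.0], 'query2': [50.0, 6.0]}
```

With 31 recorded values per query, the slice <code>timings[1:]</code> covers the same elements as <code>[1 : 31]</code> in the cell above.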

In [73]:
with open('neo4j_tests.csv', 'w', newline = '') as neo4j_tests:
    writer = csv.writer(neo4j_tests, delimiter = ',')
    keys = smallDict.keys()
    limit = len(smallDict['query1'])
    
    writer.writerow(keys)
    # each dataset is preceded by a one-letter marker row:
    # s = small, m = medium, l = large, h = humongous
    datasets = (('s', smallDict), ('m', mediumDict), ('l', largeDict), ('h', humongousDict))
    for label, timings in datasets:
        writer.writerow([label])
        for i in range(0, limit):
            writer.writerow([timings[k][i] for k in keys])
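
The resulting file has one header row with the query names, followed by four blocks, each made of a one-letter marker row and one row per execution. A minimal sketch of how such a layout could be read back is shown below; <code>read_tests</code> is a hypothetical helper that is not part of this notebook, and the in-memory <code>sample</code> holds made-up values rather than the real measurements.

```python
import csv
import io

def read_tests(lines, markers=('s', 'm', 'l', 'h')):
    """Parse the marker-separated layout into {marker: {query: [times]}}."""
    reader = csv.reader(lines)
    keys = next(reader)                        # header row with the query names
    datasets, current = {}, None
    for row in reader:
        if len(row) == 1 and row[0] in markers:
            current = {k: [] for k in keys}    # a marker row starts a new dataset
            datasets[row[0]] = current
        else:
            for k, value in zip(keys, row):
                current[k].append(float(value))
    return datasets

# Tiny made-up file: two queries, two datasets, two executions each.
sample = io.StringIO('query1,query2\ns\n1.5,2.5\n3.5,4.5\nm\n10.0,20.0\n30.0,40.0\n')
parsed = read_tests(sample)
print(parsed['s']['query1'])  # [1.5, 3.5]
print(parsed['m']['query2'])  # [20.0, 40.0]
```

For the real file, the same function could be given the object returned by <code>open('neo4j_tests.csv', newline = '')</code>.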