# Neo4j vs MySQL

## Project Description

Graph databases are specialized systems for querying interconnected data. To this end, they rely on optimised data structures and indexes that speed up the processing of highly complex queries.

This project requires you to implement a family of path queries in MySQL and evaluate the performance difference with respect to Neo4j.

The queries you have to implement investigate the FRIEND relationship over an increasing number of hops.

For example, given the following graph, treating FRIEND as transitive and requiring n.id <> m.id:

- Alice -[FRIEND]-> Bob
- Bob -[FRIEND]-> Carl
- Carl -[FRIEND]-> Dave

1. `(n) -[FRIEND]-> (m)` returns n=Alice,m=Bob; n=Bob,m=Carl; n=Carl,m=Dave

2. `(n) -[FRIEND*1..2]-> (m)` returns everything query 1 returned, plus n=Alice,m=Carl; n=Bob,m=Dave

and so on...
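
To make the expected results concrete, here is a minimal pure-Python sketch that computes the same pairs for the example graph with a depth-bounded breadth-first search. The function name `pairs_within` is an illustrative assumption, not part of the project. Note that it returns distinct pairs, whereas a Cypher variable-length match returns one row per matching path.

```python
from collections import deque

# Directed FRIEND edges from the example above.
edges = {"Alice": ["Bob"], "Bob": ["Carl"], "Carl": ["Dave"], "Dave": []}

def pairs_within(edges, max_hops):
    """Return the set of (n, m) pairs joined by a directed path of 1..max_hops edges, n != m."""
    result = set()
    for start in edges:
        # Breadth-first search from `start`, never expanding beyond depth `max_hops`.
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    result.add((start, nxt))
                    queue.append((nxt, depth + 1))
    return result

one_hop = pairs_within(edges, 1)
two_hops = pairs_within(edges, 2)
print(sorted(one_hop))
print(sorted(two_hops))
```

Because `start` is placed in `seen` before the search begins, a cycle leading back to the start node never produces a pair with n equal to m.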


You can:

- use code or SQL to compute the query result
- use or create any helper structure or table
- use any index in MySQL, provided you justify its presence by measuring its impact
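
One possible SQL-only approach is a recursive common table expression with a hop counter. The sketch below is an assumption about how such a query could look, not the required solution. It uses Python's built-in `sqlite3` as a stand-in so that it runs without a server; MySQL 8.0 and later also accept `WITH RECURSIVE`, but with mysql-connector the placeholder would be `%s` rather than `?`. The names `reach`, `BOUNDED_HOPS` and `friends_within` are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (person1 TEXT, person2 TEXT)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [("Alice", "Bob"), ("Bob", "Carl"), ("Carl", "Dave")],
)

# The anchor member selects all 1-hop pairs; the recursive member extends
# each known path by one edge until the hop limit is reached.
BOUNDED_HOPS = """
WITH RECURSIVE reach (n, m, hops) AS (
    SELECT person1, person2, 1 FROM friends
    UNION
    SELECT r.n, f.person2, r.hops + 1
    FROM reach AS r
    JOIN friends AS f ON f.person1 = r.m
    WHERE r.hops < ?
)
SELECT DISTINCT n, m FROM reach WHERE n <> m ORDER BY n, m
"""

def friends_within(conn, max_hops):
    """Return distinct (n, m) pairs connected by 1..max_hops FRIEND edges."""
    return conn.execute(BOUNDED_HOPS, (max_hops,)).fetchall()

print(friends_within(conn, 1))
print(friends_within(conn, 2))
```

The hop counter bounds the recursion, so the query terminates even when the graph contains cycles. An unbounded variant would need a different termination strategy.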

# Generate the Data


<img src="schema.png" alt="5" border="0">

Try different sizes: 100, 500, 1000, 10000.
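
Since the generator below gives every person half of the population as friends, the number of FRIEND edges grows quadratically with the size. A small sketch of that arithmetic (the helper name `edge_count` is illustrative only):

```python
def edge_count(num_people):
    """Edges produced when each person gets num_people // 2 outgoing FRIEND edges."""
    return num_people * (num_people // 2)

for n in (100, 500, 1000, 10000):
    print(f"{n:>6} people -> {edge_count(n):>11,} FRIEND edges")
```

The largest size therefore yields tens of millions of edges, which is worth keeping in mind before running the import cells.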

In [None]:
friendsNum = 100  # also try 500, 1000, 10000

In [None]:
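import random
import sys
import time

num_people = int(friendsNum)
num_friends = num_people // 2  # every person gets half of the population as friends

friendids = range(1, num_people + 1)
friends = {}
for i in friendids:
    # Resample until the person does not appear in their own friend list.
    while True:
        sample = random.sample(friendids, num_friends)
        if i not in sample:
            break
    friends[i] = sample
    if i % 10000 == 0:
        print(i)  # progress indicator for large graphs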

In [None]:
fplot = {}
for f,fs in friends.items():
    fplot[f]=len(fs)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
lists = sorted(fplot.items()) # sorted by key, return a list of tuples

x, y = zip(*lists) # unpack a list of pairs into two tuples

plt.plot(x, y)
plt.show()

Example graph with 10 People.

<img src="graph.png" alt="5" border="0">

## Import Data to Neo4J


### py2neo

py2neo is a community-maintained Python client library for Neo4j. It offers a fully-featured interface for interacting with your data in Neo4j. Install it with `pip install py2neo`.


In [1]:
! pip install py2neo

Collecting py2neo
  Downloading py2neo-2021.1.1-py2.py3-none-any.whl (203 kB)
Collecting neotime~=1.7.4
  Downloading neotime-1.7.4.tar.gz (17 kB)
Collecting docker
  Downloading docker-5.0.0-py2.py3-none-any.whl (146 kB)
Collecting pansi>=2020.7.3
  Downloading pansi-2020.7.3-py2.py3-none-any.whl (10 kB)
Collecting monotonic
  Downloading monotonic-1.6-py2.py3-none-any.whl (8.2 kB)
Collecting english
  Downloading english-2020.7.0-py2.py3-none-any.whl (8.1 kB)
Building wheels for collected packages: neotime
  Building wheel for neotime (setup.py) ... done
  Created wheel for neotime: filename=neotime-1.7.4-py3-none-any.whl size=20541 sha256=721cb50a81243a84e6db71978fbb00b6d71a30a15832ebc2a503edee27298dd9
  Stored in directory: /home/jovyan/.cache/pip/wheels/aa/47/bb/6e5c41d174666c8a7d870f7db23f120b1a70fa64b60154535f
Successfull


### Connect

Connect to Neo4j with the Graph class.


In [2]:
from py2neo import Node, Graph, Relationship, NodeMatcher
try:
    graph = Graph("bolt://neo:7687")
except Exception as e:
    # Report the underlying cause instead of silently swallowing it.
    print("Error connecting to Neo4j DB:", e)

## Create Nodes and Relationships

In [None]:
nodes = NodeMatcher(graph)
for p,fs in friends.items():
    pn = Node("Person", id=p, name="Person"+str(p))
    graph.create(pn)

In [None]:
for p, fs in friends.items():
    # Look up the source node once per person rather than once per edge.
    pn = nodes.match("Person", id=p, name="Person"+str(p)).first()
    for f in fs:
        fn = nodes.match("Person", id=f, name="Person"+str(f)).first()
        r = Relationship(pn, "FRIEND", fn)
        graph.create(r)

## RUN THIS QUERY 3-5 TIMES TO WARM UP THE CACHE

In [None]:
query = """ MATCH (n:Person)-[r]-> (m) RETURN n.id as n,m.id as m"""
ns = []
for node in graph.run(query):
     ns.append(node)

In [None]:
ns
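
Rather than re-running the warm-up cell by hand, the warm-up and the measurement can be wrapped in one helper. The following is a minimal sketch under stated assumptions: `time_query` is a hypothetical helper, and the stand-in workload would be replaced in this notebook by something like `lambda: list(graph.run(query))`.

```python
import statistics
import time

def time_query(run, warmup=3, repeats=5):
    """Call run() `warmup` times untimed, then return the median duration of `repeats` timed calls."""
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in workload so the sketch runs without a database.
elapsed = time_query(lambda: sum(range(10_000)))
print(f"median: {elapsed:.6f} s")
```

Reporting the median of several runs makes the comparison less sensitive to a single slow execution than one `time.time()` difference.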

## UP TO 2 HOPS

In [None]:
start = time.time()
query = """ MATCH (n:Person)-[r*1..2]-> (m) WHERE n.id <> m.id RETURN n.id as n,m.id as m"""
ns = []
for node in graph.run(query):
     ns.append(node)
print(time.time()-start)

In [None]:
ns

## UP TO 3 HOPS

In [None]:
start = time.time()
query = """ MATCH (n:Person)-[r*1..3]-> (m) WHERE n.id <> m.id RETURN n.id as n,m.id as m"""
ns = []
for node in graph.run(query):
     ns.append(node)
print(time.time()-start)

In [None]:
ns

## UP TO 5 HOPS

In [None]:
start = time.time()
query = """ MATCH (n:Person)-[r*1..5]-> (m) WHERE n.id <> m.id RETURN n.id as n,m.id as m"""
ns = []
for node in graph.run(query):
     ns.append(node)
print(time.time()-start)

In [None]:
ns

## N HOPS

In [None]:
start = time.time()
query = """ MATCH (n:Person)-[r*]-> (m) WHERE n.id <> m.id RETURN n.id as n,m.id as m"""
ns = []
for node in graph.run(query):
     ns.append(node)
print(time.time()-start)

In [None]:
ns

## Import Data to MySQL

### mysql-connector

In [3]:
! pip install mysql-connector-python 

Collecting mysql-connector-python
  Downloading mysql_connector_python-8.0.25-cp39-cp39-manylinux1_x86_64.whl (25.4 MB)
Installing collected packages: mysql-connector-python
Successfully installed mysql-connector-python-8.0.25


### Connect

Connect to MySQL and configure the database.




In [4]:
import mysql.connector

mydb = mysql.connector.connect(
  host="mysql",
  user="root",
  password="pass1234")


mycursor = mydb.cursor()

In [None]:
mycursor.execute("CREATE DATABASE IF NOT EXISTS graph")  # safe to re-run

In [None]:
mycursor.execute("SHOW DATABASES")

for x in mycursor:
  print(x) 

In [None]:
mycursor.execute("USE graph")

In [None]:
# CREATE TABLE returns no rows, so there is nothing to iterate over afterwards.
mycursor.execute("CREATE TABLE friends (person1 VARCHAR(255), person2 VARCHAR(255))")

In [None]:
mycursor.execute("SHOW TABLES")

for x in mycursor:
  print(x) 

In [None]:
sql = "INSERT INTO friends (person1, person2) VALUES (%s, %s)"
val = []
for p,fs in friends.items():
    for f in fs:
        val.append(("Person"+str(p), "Person"+str(f)))

mycursor.executemany(sql, val)

mydb.commit()

print(mycursor.rowcount, "records inserted.")
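
The project asks that any MySQL index be justified by measuring its impact. Below is a minimal sketch of a before/after comparison. It is an assumption about how the measurement could be organised, and it uses Python's built-in `sqlite3` as a stand-in so that it runs without a server. The index name `idx_friends_person1`, the helpers `plan` and `time_lookups`, and the synthetic rows are all hypothetical. In MySQL, `EXPLAIN` plays the role of `EXPLAIN QUERY PLAN`.

```python
import sqlite3
import time

# In-memory SQLite database as a stand-in for the MySQL `friends` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (person1 TEXT, person2 TEXT)")
rows = [
    (f"Person{p}", f"Person{(p + k) % 200 + 1}")
    for p in range(1, 201)
    for k in range(1, 51)
]
conn.executemany("INSERT INTO friends VALUES (?, ?)", rows)

LOOKUP = "SELECT person2 FROM friends WHERE person1 = ?"

def plan(conn):
    """Return the query plan text for LOOKUP (last column of each plan row)."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("Person1",)).fetchall()
    return " ".join(row[-1] for row in plan_rows)

def time_lookups(conn, repeats=200):
    """Total wall-clock time for `repeats` single-person lookups."""
    start = time.perf_counter()
    for i in range(repeats):
        conn.execute(LOOKUP, (f"Person{i % 200 + 1}",)).fetchall()
    return time.perf_counter() - start

plan_before, t_before = plan(conn), time_lookups(conn)
conn.execute("CREATE INDEX idx_friends_person1 ON friends (person1)")
plan_after, t_after = plan(conn), time_lookups(conn)

print("before:", plan_before, f"{t_before:.4f} s")
print("after: ", plan_after, f"{t_after:.4f} s")
```

Checking the plan as well as the timing confirms that a speed-up actually comes from the index being used rather than from caching.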

## RUN THIS QUERY 5 TIMES TO WARM UP THE CACHE

In [None]:
mycursor.execute("SELECT * FROM friends")
for x in mycursor:
  print(x) 

# Task

In [None]:
# Write your code here

## Cleanup

In case you need to clean up the databases.

In [None]:
graph.delete_all()

In [None]:
mycursor.execute("DROP TABLE friends")