# About

This notebook focuses on pattern matching in Cypher, the Neo4j query language.

In [None]:
from neo4j import GraphDatabase, Record, ResultSummary, EagerResult
from neo4j.time import Date

import pandas as pd
pd.set_option('display.max_colwidth', 100)

import os 
import sys
from dotenv import load_dotenv 
load_dotenv()

# Add the utils directory to sys.path
sys.path.append(os.path.abspath("../utils"))

from Neo4jParser import Neo4jParser


NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

**NOTES:**

* *Patterns:*
    * `(a)-->(b)` matches every pair of nodes where `a` has an outgoing relationship to `b`. Nodes connected this way are called "related nodes".
    * `(a)-->(c)<--(b)` chains more than one relationship in a single pattern. A sequence of nodes connected by relationships is called a "path".
    * `(a)--(b)` omits the direction, which is allowed. However, this typically produces a lot of extra results. Say node A points to node B like this: `(A)-->(B)`. The undirected pattern `(a)--(b)` matches that relationship in both orientations, so the pair comes back twice (once as A, B and once as B, A). This is usually not desired, but it is technically what the query author asked for.
    * `(a)-[r]->(b)` binds the relationship to a variable, in this case `r`. To access the relationship itself (its type or properties) elsewhere in the query, you need to declare a variable like this.
    * `(a)-[r:RelType]->(b)` returns all records of a specific relationship type.
    * `(a)-[r:FOLLOWS|BUYS]->(b)` returns all records of relationships that are either 'FOLLOWS' or 'BUYS'
    * `(a)-[*]->(b)` matches paths of any length (one or more relationships). This is an extremely expensive query, so it is not advisable to run it without some filtering, such as labels, properties, or an upper bound.
    * `(a)-[*3]->(b)` returns all paths that are exactly three relationships long.
    * `(a)-[*1..2]->(b)` returns all paths that are 1 or 2 relationships long.
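    * `(a)-[*0..4]->(b)` returns all paths that are between 0 and 4 relationships long. The lower bound of 0 makes the relationship optional, which is loosely comparable to `OPTIONAL MATCH`.
        * You can specify optional relationships with: `MATCH (p:Person {name:"Tom Hanks"})-[*0..1]->(m:Movie) RETURN *`
            * This query asks for paths of 0 or 1 relationships from Tom Hanks to a Movie. The '0' part is the zero-length path, where the start node and the end node are the same node. Unlike `OPTIONAL MATCH`, it never returns `null`. Because a zero-length match here would need a single node carrying both the `Person` and `Movie` labels, this particular query effectively returns only the one-hop matches.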
    * `(a)-[*..2]->(b)` returns paths that are at most 2 relationships long.
    * `(a)-[*2..]->(b)` returns paths that are at least 2 relationships long.

* Every pattern that uses an asterisk can take a relationship type in front of it to restrict which relationship types are allowed, for example `(a)-[:ACTED_IN*1..2]->(b)`.
* The `allShortestPaths` function takes a pattern as input and returns every shortest path between the two end nodes. It can be a very powerful function.
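
The hop bounds in the notes above can be illustrated without a database. The following is a minimal pure-Python sketch, not how Neo4j evaluates patterns internally. The toy `EDGES` list and the `var_length_paths` helper are hypothetical names introduced only for this illustration. It assumes directed traversal from a fixed start node and that a relationship may appear at most once per path.

```python
# Hypothetical toy graph as (start, relationship type, end) triples.
EDGES = [
    ("A", "FOLLOWS", "B"),
    ("B", "FOLLOWS", "C"),
    ("C", "FOLLOWS", "A"),
    ("A", "BUYS", "D"),
]


def var_length_paths(edges, start, min_hops=1, max_hops=None, rel_types=None):
    """Yield node lists for directed paths from `start` whose length is
    between min_hops and max_hops (None means unbounded).

    Each relationship is used at most once per path, which keeps the
    unbounded case finite even when the graph contains a cycle.
    """

    def walk(node, used, nodes):
        hops = len(used)
        if hops >= min_hops:
            yield list(nodes)
        if max_hops is not None and hops >= max_hops:
            return
        for index, (src, rel, dst) in enumerate(edges):
            if src != node or index in used:
                continue
            if rel_types is not None and rel not in rel_types:
                continue
            yield from walk(dst, used | {index}, nodes + [dst])

    yield from walk(start, frozenset(), [start])


# Roughly (a)-[*1..2]->(b) with a fixed to "A"
print(list(var_length_paths(EDGES, "A", 1, 2)))
# Roughly (a)-[*0..1]->(b): the zero-length path [A] is included
print(list(var_length_paths(EDGES, "A", 0, 1)))
# Roughly (a)-[:FOLLOWS*1..2]->(b): only FOLLOWS relationships are followed
print(list(var_length_paths(EDGES, "A", 1, 2, rel_types={"FOLLOWS"})))
```

Leaving `max_hops` as `None` corresponds to the unbounded `[*]` form, which is why the number of results grows quickly on a real graph.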

In [30]:
# Return all distinct relationship types in the database
result = driver.execute_query(
    """ 
    MATCH ()-[r]-()
    RETURN DISTINCT type(r) as relationships
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 7 records after 0 ms and completed after 0 ms.

Query executed against database: 'neo4j':  
    MATCH ()-[r]-()
    RETURN DISTINCT type(r) as relationships
    


{'relationships': ['ACTED_IN',
  'FRIENDS_WITH',
  'DIRECTED',
  'PRODUCED',
  'WROTE',
  'FOLLOWS',
  'REVIEWED']}

In [31]:
# Return every relationship in the graph together with the nodes on both ends (nodes without relationships are not matched)
result = driver.execute_query(
    """ 
    MATCH (n)-[r]-(m)
    RETURN *
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
# For each key returned, show the top 5 records
head = {}
for key in data.keys():
    head[key] = data[key][:5]

head

Started streaming 863 records after 0 ms and completed after 27 ms.

Query executed against database: 'neo4j':  
    MATCH (n)-[r]-(m)
    RETURN *
    


{'r': [{'startNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1',
    'labels': frozenset({'Person'}),
    'properties': {'born': 1964, 'name': 'Keanu Reeves'}},
   'elementId': '5:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1152921504606846977',
   'type': 'ACTED_IN',
   'properties': {'roles': ['Neo']},
   'endNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0',
    'labels': frozenset({'Movie'}),
    'properties': {'tagline': 'Welcome to the Real World',
     'title': 'The Matrix',
     'released': 1999}}},
  {'startNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1',
    'labels': frozenset({'Person'}),
    'properties': {'born': 1964, 'name': 'Keanu Reeves'}},
   'elementId': '5:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1152921504606846977',
   'type': 'ACTED_IN',
   'properties': {'roles': ['Neo']},
   'endNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0',
    'labels': frozenset({'Movie'}),
    'properties': {'tagline': 'Welcome to the Real
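
The query above uses an undirected pattern, which is why the first two records show the same relationship. As a rough illustration of that doubling, here is a small pure-Python sketch. The toy `EDGES` list and both helper functions are hypothetical and only mimic the row counts. The sketch assumes there are no self-loops in the toy data.

```python
# Hypothetical toy relationships as (start, type, end) triples; no self-loops.
EDGES = [
    ("A", "FOLLOWS", "B"),
    ("B", "FOLLOWS", "C"),
    ("A", "BUYS", "C"),
]


def match_directed(edges):
    """Rows for a pattern like (a)-[r]->(b): one row per relationship."""
    return [(src, rel, dst) for src, rel, dst in edges]


def match_undirected(edges):
    """Rows for a pattern like (a)-[r]-(b): each relationship matches in
    both orientations, so it contributes two rows."""
    rows = []
    for src, rel, dst in edges:
        rows.append((src, rel, dst))
        rows.append((dst, rel, src))
    return rows


directed = match_directed(EDGES)
undirected = match_undirected(EDGES)
print(len(directed), len(undirected))  # 3 6
```

If only one row per relationship is wanted, adding a direction to the pattern is usually the simplest option.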

In [32]:
# You can match variable-length paths in Neo4j by specifying "*x..y", where the path has at least "x" and at most "y" relationships
result = driver.execute_query(
    """ 
    MATCH (n:Person)-[r:ACTED_IN*1..2]-(m:Movie)
    RETURN *
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
# For each key returned, show the top 5 records
head = {}
for key in data.keys():
    head[key] = data[key][:5]

head

Started streaming 180 records after 2 ms and completed after 21 ms.

Query executed against database: 'neo4j':  
    MATCH (n:Person)-[r:ACTED_IN*1..2]-(m:Movie)
    RETURN *
    


{'r': [[<Relationship element_id='5:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1152921504606846977' nodes=(<Node element_id='4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1' labels=frozenset({'Person'}) properties={'born': 1964, 'name': 'Keanu Reeves'}>, <Node element_id='4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0' labels=frozenset({'Movie'}) properties={'tagline': 'Welcome to the Real World', 'title': 'The Matrix', 'released': 1999}>) type='ACTED_IN' properties={'roles': ['Neo']}>],
  [<Relationship element_id='5:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1152921504606846978' nodes=(<Node element_id='4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:2' labels=frozenset({'Person'}) properties={'born': 1967, 'name': 'Carrie-Anne Moss'}>, <Node element_id='4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0' labels=frozenset({'Movie'}) properties={'tagline': 'Welcome to the Real World', 'title': 'The Matrix', 'released': 1999}>) type='ACTED_IN' properties={'roles': ['Trinity']}>],
  [<Relationship element_id='5:552b0252-2f83-4c7e
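
The loop that trims every key to its first five records is repeated in several cells. A small helper could replace it. This is only a sketch: `head_per_key` is a hypothetical name, and it assumes the parsed result is a dict that maps each returned key to a list, as the outputs in this notebook suggest.

```python
def head_per_key(data, n=5):
    """Return a new dict holding the first `n` items of each list in `data`."""
    return {key: values[:n] for key, values in data.items()}


# Stand-in for a parsed result (hypothetical sample data).
sample = {"p": list(range(10)), "m": ["a", "b"]}
print(head_per_key(sample, 3))  # {'p': [0, 1, 2], 'm': ['a', 'b']}
```

With this in place, a cell could end with `head_per_key(data)` instead of rebuilding the `head` dict by hand.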

In [33]:
# Return one path that is exactly two relationships long between a Person and a Movie (any relationship type, any direction)
result = driver.execute_query(
    """ 
    MATCH x = (n:Person)-[*2]-(m:Movie)
    RETURN x LIMIT 1
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
# For each key returned, show the top 5 records
head = {}
for key in data.keys():
    head[key] = data[key][:5]

head

Started streaming 1 records after 1 ms and completed after 1 ms.

Query executed against database: 'neo4j':  
    MATCH x = (n:Person)-[*2]-(m:Movie)
    RETURN x LIMIT 1
    


{'x': [{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:5',
   'nodes': [{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:5',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1967, 'name': 'Andy Wachowski'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1964, 'name': 'Keanu Reeves'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0',
     'labels': frozenset({'Movie'}),
     'properties': {'tagline': 'Welcome to the Real World',
      'title': 'The Matrix',
      'released': 1999}}],
   'relationships': [{'startNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1',
      'labels': frozenset({'Person'}),
      'properties': {'born': 1964, 'name': 'Keanu Reeves'}},
     'elementId': '5:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:6917536724222476293',
     'type': 'FRIENDS_WITH',
     'properties': {},
     'endNode': {'elementId': '4:552b0252-2f83-4c7e-a0b

In [34]:
# A query to return all paths made of exactly one directed relationship from one node to another
result = driver.execute_query(
    """ 
    MATCH x = ()-[*1..1]->()
    RETURN x
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
# For each key returned, show the top 5 records
head = {}
for key in data.keys():
    head[key] = data[key][:5]

head

Started streaming 433 records after 0 ms and completed after 21 ms.

Query executed against database: 'neo4j':  
    MATCH x = ()-[*1..1]->()
    RETURN x
    


{'x': [{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1',
   'nodes': [{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1964, 'name': 'Keanu Reeves'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0',
     'labels': frozenset({'Movie'}),
     'properties': {'tagline': 'Welcome to the Real World',
      'title': 'The Matrix',
      'released': 1999}}],
   'relationships': [{'startNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1',
      'labels': frozenset({'Person'}),
      'properties': {'born': 1964, 'name': 'Keanu Reeves'}},
     'elementId': '5:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1152921504606846977',
     'type': 'ACTED_IN',
     'properties': {'roles': ['Neo']},
     'endNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0',
      'labels': frozenset({'Movie'}),
      'properties': {'tagline': 'Welcome to the Real World',
       'title': 'The Matrix',
  

In [35]:
# A query to return directed paths that are exactly four relationships long
result = driver.execute_query(
    """ 
    MATCH x = ()-[*4]->()
    RETURN x
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
# For each key returned, show the top 5 records
head = {}
for key in data.keys():
    head[key] = data[key][:5]

head

Started streaming 69 records after 0 ms and completed after 15 ms.

Query executed against database: 'neo4j':  
    MATCH x = ()-[*4]->()
    RETURN x
    


{'x': [{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:12',
   'nodes': [{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:12',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1975, 'name': 'Charlize Theron'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1956, 'name': 'Tom Hanks'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1956, 'name': 'Tom Hanks'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:5',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1967, 'name': 'Andy Wachowski'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0',
     'labels': frozenset({'Movie'}),
     'properties': {'tagline': 'Welcome to the Real World',
      'title': 'The Matrix',
      'released': 1999}}],
   'relationships': [{'startNode': {'elementId': '4:552b025

In [36]:
# Write a query that allows zero-length paths (0 or 1 relationships)
result = driver.execute_query(
    """ 
    MATCH (p:Person {name:"Tom Hanks"})-[*0..1]->(m:Movie)
    RETURN *
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
# For each key returned, show the top 5 records
head = {}
for key in data.keys():
    head[key] = data[key][:5]

head

Started streaming 13 records after 1 ms and completed after 4 ms.

Query executed against database: 'neo4j':  
    MATCH (p:Person {name:"Tom Hanks"})-[*0..1]->(m:Movie)
    RETURN *
    


{'p': [{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
   'labels': frozenset({'Person'}),
   'properties': {'born': 1956, 'name': 'Tom Hanks'}},
  {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
   'labels': frozenset({'Person'}),
   'properties': {'born': 1956, 'name': 'Tom Hanks'}},
  {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
   'labels': frozenset({'Person'}),
   'properties': {'born': 1956, 'name': 'Tom Hanks'}},
  {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
   'labels': frozenset({'Person'}),
   'properties': {'born': 1956, 'name': 'Tom Hanks'}},
  {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
   'labels': frozenset({'Person'}),
   'properties': {'born': 1956, 'name': 'Tom Hanks'}}],
 'm': [{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:67',
   'labels': frozenset({'Movie'}),
   'properties': {'tagline': 'At odds in life... in love on-line.',
    'title': "You've Got Mail",
    'released': 1998}},
  {'eleme

In [37]:
# You can find the shortest paths between two nodes by matching both nodes and then using the 'allShortestPaths' function
result = driver.execute_query(
    """ 
    MATCH 
        (jessica:Person {name:"Jessica Thompson"}), 
        (tom:Person {name: "Tom Hanks"}),
        shortestPath = allShortestPaths((jessica)-[*]-(tom))
    RETURN shortestPath
    """, 
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
# For each key returned, show the top 5 records
head = {}
for key in data.keys():
    head[key] = data[key][:5]

# This is a good example to run in Neo4j Desktop to better visualize the results. Note that there are two equally short paths.
head

Started streaming 2 records after 1 ms and completed after 9 ms.

Query executed against database: 'neo4j':  
    MATCH 
        (jessica:Person {name:"Jessica Thompson"}), 
        (tom:Person {name: "Tom Hanks"}),
        shortestPath = allShortestPaths((jessica)-[*]-(tom))
    RETURN shortestPath
    


{'shortestPath': [{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:167',
   'nodes': [{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:167',
     'labels': frozenset({'Person'}),
     'properties': {'name': 'Jessica Thompson'}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:105',
     'labels': frozenset({'Movie'}),
     'properties': {'tagline': 'Everything is connected',
      'title': 'Cloud Atlas',
      'released': 2012}},
    {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71',
     'labels': frozenset({'Person'}),
     'properties': {'born': 1956, 'name': 'Tom Hanks'}}],
   'relationships': [{'startNode': {'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:167',
      'labels': frozenset({'Person'}),
      'properties': {'name': 'Jessica Thompson'}},
     'elementId': '5:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1152927002164986023',
     'type': 'REVIEWED',
     'properties': {'summary': 'An amazing journey', 'rating': 95},
     'endNode': {'el
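
To make the idea of "all shortest paths" concrete, here is a minimal breadth-first search sketch in pure Python. It is not Neo4j's implementation. The toy `EDGES` list and the `all_shortest_paths` helper are hypothetical, and the sketch assumes an undirected graph with at most one relationship between any two nodes.

```python
from collections import deque

# Hypothetical toy graph: two 2-hop routes and one 3-hop route from J to T.
EDGES = [
    ("J", "M1"), ("M1", "T"),
    ("J", "M2"), ("M2", "T"),
    ("J", "X"), ("X", "Y"), ("Y", "T"),
]


def all_shortest_paths(edges, source, target):
    """Return every shortest path (as a node list) between source and target."""
    # Build an undirected adjacency list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # BFS that records, for each node, all predecessors lying on a shortest route.
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)

    if target not in dist:
        return []

    # Walk the predecessor lists backwards to rebuild every shortest path.
    def build(node):
        if node == source:
            return [[source]]
        return [path + [node] for pred in preds[node] for path in build(pred)]

    return build(target)


print(all_shortest_paths(EDGES, "J", "T"))
# [['J', 'M1', 'T'], ['J', 'M2', 'T']]
```

As in the query above, two paths tie for the shortest length, so both are returned, while the longer route through `X` and `Y` is left out.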