# Start Here
___________________________________________________________________________________________________________________________________________________________________________________________________

This notebook focuses on interacting with a Neo4j database from Python, concentrating on the `MATCH` and `RETURN` Cypher keywords. The downside of working from a notebook is that some of the queries being run are hard to visualize, so you are highly encouraged to switch between the notebook and your local Neo4j Desktop database.

To download Neo4j Desktop, follow the instructions [here](https://neo4j.com/download/).


🛑 **IMPORTANT**

In the "./utils" directory is a helper file, `Neo4jParser.py`. I've written some code there to assist with parsing the results of queries returned from Neo4j. According to Neo4j's [documentation](https://neo4j.com/docs/api/python-driver/current/api.html#), there are several methods to parse an `EagerResult`. Please go to the utils directory and open `Neo4jParser.py` to understand the functions available.
> If you are newer to Python, or are only looking for the data behind your queries, use `Neo4jParser.simple_parse()`. Be aware that output from the `neo4j.Record.data()` method looks different from the results you see in Neo4j Browser, which is why `Neo4jParser.parse()` is encouraged instead.

> For an all-encompassing view of the data, in a format closer to what you will see in Neo4j Browser, use `Neo4jParser.parse()`. This is the recommended approach and the one that will be used throughout this series.


⚠️ **NOTICE**

**If working from WSL...**

When working from WSL2 with Neo4j Desktop installed on the Windows side, you have to set up port forwarding. To do this, start your database, then run the remaining commands from an administrator PowerShell window:
1. Start the default Neo4j Desktop database "Movies DBMS".
2. Run `ipconfig` to fetch your machine's IP address.
    * From this point forward, assume you have a Windows IP address of '123.456.78.900' (a placeholder; substitute your own).
3. Run `netsh interface portproxy set v4tov4 listenport=7687 listenaddress=123.456.78.900 connectport=7687 connectaddress=127.0.0.1`
    * NOTE: 'listenport' and 'connectport' should both be the port your Neo4j database is running on.
4. To verify, run `netsh interface portproxy show v4tov4`
5. To disable the port forwarding, run `netsh interface portproxy delete v4tov4 listenport=7687 listenaddress=123.456.78.900`

If you are working from a Windows or macOS environment where Neo4j Desktop is installed, the default 'localhost' URI should be sufficient.
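Before creating the driver, it can help to confirm that the database port is actually reachable from where the notebook runs, especially after setting up port forwarding. The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and the host and port in the example are assumptions that should be swapped for your own values.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises an OSError subclass on refusal,
        # timeout, or an address that cannot be resolved
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed values: replace with your Windows IP address when on WSL2
    print(port_is_open("127.0.0.1", 7687))
```

If this prints `False` from WSL2, the port proxy rule or the database itself is the first thing to check.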

In [1]:
from neo4j import GraphDatabase, Record, ResultSummary, EagerResult
from neo4j.time import Date

import pandas as pd
pd.set_option('display.max_colwidth', 100)

import os 
import sys
import socket
from dotenv import load_dotenv 
load_dotenv()

# Add the utils directory to sys.path
sys.path.append(os.path.abspath("../utils"))

from Neo4jParser import Neo4jParser


NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
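
`os.getenv` returns `None` when a variable is missing, so a mistyped or absent `.env` entry only surfaces later as a confusing driver error. A small guard like the one below makes the problem obvious up front. This is a sketch, and `require_env` is a hypothetical helper that is not part of the notebook's utils.

```python
import os


def require_env(*names: str) -> dict:
    """Return the named environment variables, failing loudly if any is unset."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}


# Example usage (assumes the three variables are defined in your .env file):
# settings = require_env("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
```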

## `MATCH`

* `MATCH` -> Used much like SQL's `SELECT` statement. It is a read-only clause that finds data in the graph by pattern.
    * To return everything from the graph we can run `MATCH (n) RETURN n;`
* To return nodes of a given kind, we reference the node's label. We can write `MATCH (n:Person) RETURN n;` to return ALL 'Person' nodes.
    * To match on more than one label: `MATCH (n:Person:Doctor) RETURN n;` returns the nodes that carry both the 'Person' and 'Doctor' labels.
* We can match nodes via a pattern without specifying a direction in the relationship, for example: `MATCH (n:Person)--(d:Doctor) RETURN *` tells the query to return all connected instances of person and doctor regardless of the direction of the relationship.
* You can wrap a name in backticks (\`) to use uncommon characters, such as spaces, in your queries. For example: ``MATCH (`THIS IS MY NODE VARIABLE`) RETURN `THIS IS MY NODE VARIABLE`;``
* You can use `MATCH` to select a node and use its properties as properties for a new node. The new node needs its own variable, because `n` is already bound by the `MATCH`. For example:<br>

    `MATCH (n:Person {name:"Tom Hanks"})`<br>
    `CREATE (m:Person {name:n.name})`<br>
    `RETURN *`
* We can do the same for relationships, for example:<br>

    `MATCH (n:Person {name:"Sally"})-[r:PURCHASED]->(f:Food {item:"Pickles"})`<br>
    `CREATE (s:Person {name:"Sally", purchased_pickles_on: r.purchased_on})`<br>
    `RETURN *`
* To match any of several relationship types, separate the type names with "|" like so: `MATCH (p:Person)-[:ACTED_IN|DIRECTED]->(m:Movie) RETURN p, m;`
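
To make the "|" syntax concrete, here is a small sketch that assembles the multi-type pattern from a Python list. `match_any_relationship` is a hypothetical helper written for illustration. It assumes the type names are trusted constants from your own code, since relationship types are spliced into the query text rather than passed as parameters.

```python
def match_any_relationship(rel_types: list) -> str:
    """Build a query matching Person->Movie over any of the given relationship types."""
    if not rel_types:
        raise ValueError("at least one relationship type is required")
    # Cypher separates alternative relationship types with "|"
    joined = "|".join(rel_types)
    return f"MATCH (p:Person)-[:{joined}]->(m:Movie) RETURN p, m;"


query = match_any_relationship(["ACTED_IN", "DIRECTED"])
print(query)
```

The resulting string can be passed to `driver.execute_query` in the same way as the hand-written queries in the cells below.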

## `RETURN`

* At this point you are familiar with the basics of `RETURN`, but here we will look at it a little more deeply.
* `RETURN` is comparable to the column list of SQL's `SELECT`: where `MATCH` finds the pattern, `RETURN` chooses what is sent back.
* `RETURN` allows us to return everything, or we can return specific properties, aggregations, filtered data, etc., just like `SELECT` in SQL.
* The query `MATCH (n:Person) RETURN n;` returns the nodes, and Neo4j Browser shows them as a picture of the graph. To return the data in tabular format instead, we can return specific properties or use the `properties` function to return all properties of the nodes: `MATCH (n) RETURN properties(n) as prop;`. This will not include the labels of the nodes. To add them you can do: `MATCH (n) RETURN properties(n) as prop, labels(n) as n_label;`
* For returned relationship data, assuming we have the query `MATCH (n)-[r]-(m)`, our options include but are not limited to: `type(r)` to return the type (name) of the relationship, `r` to return all data about 'r', or `r.<property_name>` to return specific properties.
* You can use the `DISTINCT` keyword after `RETURN` the same way you would use it in SQL, to return non-duplicated information.
* You can use `RETURN relationships(p)`, where `p` is a path variable, to return all relationships in that path. This is helpful when working with a complicated path pattern.
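
As a rough illustration of the tabular shape described above, the sketch below flattens rows shaped like the result of `RETURN properties(n) AS prop, labels(n) AS n_label` into one flat dictionary per node. The rows are hard-coded plain dictionaries standing in for query results (the values are two `Person` nodes from the Movies dataset), and `flatten_row` is a hypothetical helper, not part of `Neo4jParser`.

```python
# Rows shaped like: MATCH (n) RETURN properties(n) AS prop, labels(n) AS n_label
rows = [
    {"prop": {"born": 1964, "name": "Keanu Reeves"}, "n_label": ["Person"]},
    {"prop": {"born": 1967, "name": "Carrie-Anne Moss"}, "n_label": ["Person"]},
]


def flatten_row(row: dict) -> dict:
    """Merge a node's properties and labels into one flat, table-friendly dict."""
    flat = dict(row["prop"])  # copy so the original row is left untouched
    flat["labels"] = ", ".join(row["n_label"])
    return flat


table = [flatten_row(row) for row in rows]
for line in table:
    print(line)
```

A list of flat dictionaries like `table` can be handed straight to `pd.DataFrame(table)` when a DataFrame is preferred.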

In [2]:
# Let's query our graph and process the results to extract the labels and properties of each Person node
result = driver.execute_query(
    """ 
    MATCH (p:Person)
    RETURN {
        nodes: {properties: properties(p), labels: labels(p)}
    } AS record
    """,
    database_="neo4j"
)

df = Neo4jParser.parse(result, True, True)
df.head()

Started streaming 139 records after 2 ms and completed after 14 ms.

Query executed against database: 'neo4j':  
    MATCH (p:Person)
    RETURN {
        nodes: {properties: properties(p), labels: labels(p)}
    } AS record
    


Unnamed: 0,record
0,"{'nodes': {'labels': ['Person'], 'properties': {'born': 1964, 'name': 'Keanu Reeves'}}}"
1,"{'nodes': {'labels': ['Person'], 'properties': {'born': 1967, 'name': 'Carrie-Anne Moss'}}}"
2,"{'nodes': {'labels': ['Person'], 'properties': {'born': 1961, 'name': 'Laurence Fishburne'}}}"
3,"{'nodes': {'labels': ['Person'], 'properties': {'born': 1960, 'name': 'Hugo Weaving'}}}"
4,"{'nodes': {'labels': ['Person'], 'properties': {'born': 1967, 'name': 'Andy Wachowski'}}}"


In [3]:
# Let's return all relationships in a very open ended path match pattern
result = driver.execute_query(
    """ 
    MATCH x = (p:Person {name:"Tom Hanks"})--()--()
    RETURN x
    """,
    database_="neo4j"
)

df = Neo4jParser.parse(result, True, True)
df.head()

Started streaming 59 records after 2 ms and completed after 22 ms.

Query executed against database: 'neo4j':  
    MATCH x = (p:Person {name:"Tom Hanks"})--()--()
    RETURN x
    


Unnamed: 0,x
0,"{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71', 'nodes': [{'elementId': '4:5..."
1,"{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71', 'nodes': [{'elementId': '4:5..."
2,"{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71', 'nodes': [{'elementId': '4:5..."
3,"{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71', 'nodes': [{'elementId': '4:5..."
4,"{'startNodeElementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:71', 'nodes': [{'elementId': '4:5..."


In [4]:
# Let's match all records where a person directed a movie
result = driver.execute_query(
    """ 
    MATCH (p:Person)-[:DIRECTED]->(m:Movie) RETURN p, m;
    """,
    database_="neo4j"
)

df = Neo4jParser.parse(result, True, True)
df.head()

Started streaming 44 records after 0 ms and completed after 2 ms.

Query executed against database: 'neo4j':  
    MATCH (p:Person)-[:DIRECTED]->(m:Movie) RETURN p, m;
    


Unnamed: 0,p,m
0,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:5', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0', 'labels': ('Movie'), 'properties': {'t..."
1,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:6', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0', 'labels': ('Movie'), 'properties': {'t..."
2,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:5', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:9', 'labels': ('Movie'), 'properties': {'t..."
3,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:6', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:9', 'labels': ('Movie'), 'properties': {'t..."
4,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:5', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:10', 'labels': ('Movie'), 'properties': {'..."


In [5]:
# Let's match all records where people either acted in or directed a movie. Notice the record count difference from the previous cell
result = driver.execute_query(
    """ 
    MATCH (p:Person)-[:ACTED_IN|DIRECTED]->(m:Movie) RETURN p, m;
    """,
    database_="neo4j"
)

df = Neo4jParser.parse(result, True, True)
df.head()

Started streaming 224 records after 1 ms and completed after 17 ms.

Query executed against database: 'neo4j':  
    MATCH (p:Person)-[:ACTED_IN|DIRECTED]->(m:Movie) RETURN p, m;
    


Unnamed: 0,p,m
0,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:1', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0', 'labels': ('Movie'), 'properties': {'t..."
1,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:2', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0', 'labels': ('Movie'), 'properties': {'t..."
2,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:3', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0', 'labels': ('Movie'), 'properties': {'t..."
3,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:4', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0', 'labels': ('Movie'), 'properties': {'t..."
4,"{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:8', 'labels': ('Person'), 'properties': {'...","{'elementId': '4:552b0252-2f83-4c7e-a0bf-f921a4b1b7cf:0', 'labels': ('Movie'), 'properties': {'t..."


In [7]:
# Who are other people who acted in Movies with Keanu Reeves?
result = driver.execute_query(
    """ 
    MATCH (keanu:Person {name:"Keanu Reeves"})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(p:Person) RETURN DISTINCT p.name;
    """,
    database_="neo4j"
)

df = Neo4jParser.parse(result, True, False)
df

Started streaming 14 records after 16 ms and completed after 17 ms.

Query executed against database: 'neo4j':  
    MATCH (keanu:Person {name:"Keanu Reeves"})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(p:Person) RETURN DISTINCT p.name;
    


{'p.name': ['Carrie-Anne Moss',
  'Laurence Fishburne',
  'Hugo Weaving',
  'Emil Eifrem',
  'Charlize Theron',
  'Al Pacino',
  'Brooke Langton',
  'Gene Hackman',
  'Orlando Jones',
  'Takeshi Kitano',
  'Dina Meyer',
  'Ice-T',
  'Jack Nicholson',
  'Diane Keaton']}
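
The query above hard-codes the actor's name. A possible variation, sketched below, passes the name as a `$name` query parameter so the same query text can be reused for any actor. With the Neo4j Python driver, extra keyword arguments to `execute_query` are sent as query parameters. `co_actor_names` is a hypothetical helper, and it reads the raw records directly instead of going through `Neo4jParser`.

```python
CO_ACTORS_QUERY = """
MATCH (:Person {name: $name})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(p:Person)
RETURN DISTINCT p.name AS name
"""


def co_actor_names(driver, name: str, database: str = "neo4j") -> list:
    """Return the distinct names of people who acted in a movie with `name`."""
    # execute_query returns an EagerResult, which unpacks into three parts
    records, summary, keys = driver.execute_query(
        CO_ACTORS_QUERY,
        name=name,  # bound to $name in the query text
        database_=database,
    )
    return [record["name"] for record in records]


# Example usage, assuming the `driver` created earlier in this notebook:
# co_actor_names(driver, "Keanu Reeves")
```

Parameters also avoid quoting problems when a name contains characters such as quotation marks.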