# Start Here
___________________________________________________________________________________________________________________________________________________________________________________________________

This notebook covers interacting with a Neo4j database from Python, focusing on the `CREATE` Cypher keyword. The downside of working in a notebook is that it's hard to visualize some of the queries being run, so you're highly encouraged to switch between the notebook and your local Neo4j Desktop database.

To download Neo4j Desktop, follow the instructions [here.](https://neo4j.com/download/)


🛑 **IMPORTANT**

In the "./utils" directory is a helper file, `Neo4jParser.py`. I've written some code there to assist with parsing the results of queries returned from Neo4j. According to Neo4j's [documentation](https://neo4j.com/docs/api/python-driver/current/api.html#), there are several methods to parse an `EagerResult`. Please go to the utils directory and open `Neo4jParser.py` to understand the functions available.
> If you are new to Python, or only looking for the data behind your queries, use: `Neo4jParser.simple_parse()`. Please be aware that the output of the `neo4j.Record.data()` method will look different from the results you see in Neo4j Browser. For this reason, it's encouraged to use `Neo4jParser.parse()`.

> For an all-encompassing view of the data, in a format closer to what you will see in Neo4j Browser, use: `Neo4jParser.parse()`. This is the recommended approach and the one that will be used throughout this series.


⚠️ **NOTICE**

**If working from WSL...**

When working from WSL2 with Neo4j Desktop installed on the Windows side, you have to set up port forwarding. To do this, open a PowerShell window as administrator and follow these steps:
1. Start the default Neo4j Desktop database "Movies DBMS".
2. Run `ipconfig` to fetch your machine's IP address
    * From this point forward, assume you have a Windows IP address of '123.456.78.900' (a placeholder; substitute your own).
3. Run `netsh interface portproxy set v4tov4 listenport=7687 listenaddress=123.456.78.900 connectport=7687 connectaddress=127.0.0.1`
    * NOTE: The 'listenport' and 'connectport' should be the same port your Neo4j database is running on.
4. To verify, run `netsh interface portproxy show v4tov4`
5. To disable the port forwarding, run `netsh interface portproxy delete v4tov4 listenport=7687 listenaddress=123.456.78.900`
    * NOTE: Use the same 'listenport' and 'listenaddress' you set in step 3.

If working from a Windows/macOS environment where Neo4j Desktop is installed, the default 'localhost' URI should be sufficient.
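
Before connecting the driver, it can help to confirm that the Bolt port is actually reachable from where the notebook runs. The following is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and port 7687 is taken from the steps above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# From WSL2, replace "127.0.0.1" with the Windows IP address used as `listenaddress`.
print(port_is_open("127.0.0.1", 7687))
```

A `False` result only says that nothing accepted a TCP connection on that address and port; it does not distinguish a stopped database from a missing port-forwarding rule.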

## `CREATE`

`CREATE` -> Used to create nodes and relationships in a graph.
* **Creating Nodes**
    * **WARNING:** `CREATE` always adds new data, so running it repeatedly generates duplicate nodes. For example, `CREATE (n:Person {id: 1}) RETURN n;` generates a new 'Person' node with the property `{id: 1}` every time the query is executed. Swapping `CREATE` for `MERGE` prevents these duplicates, because `MERGE` only creates the pattern when no match already exists.
        * Prefer `MERGE` over `CREATE` when a query may run more than once and duplicates are unwanted.
    * Create a node with a single label by using: `MERGE (n:Person) RETURN n`. Specify multiple labels like `MERGE (n:Person:Animal:Professional) RETURN n`.
    * Create a node with properties inside curly brackets like so: `MERGE (n:Person {name: "Henry", age: 17}) RETURN n`.

* **Creating Relationships**
    * This is best illustrated through an example. Let's say Sally goes to the store to buy some pickles. To represent this as a graph, first create two nodes for the entities Sally (person) and pickles (food). The relationship between the two is the act of purchasing. So, in Cypher we would write either of the following:
        > `CREATE (p:Person {name: "Sally"}), (f:Food {item: "Pickles"}), (p)-[:PURCHASES]->(f) RETURN p, f;`

        > `MERGE (p:Person {name: "Sally"})`<br>
          `MERGE (f:Food {item: "Pickles"})`<br>
          `MERGE (p)-[:PURCHASES]->(f)`<br>
          `RETURN p, f;`<br>

`RETURN` -> Instructs the database to send back data from the graph based on what follows the `RETURN` keyword.
* `RETURN *` returns every variable in scope in the query (for example `p`, `f`, and `r`), including the properties of the nodes and relationships they are bound to.
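
To make the `CREATE` versus `MERGE` warning above concrete, here is a toy in-memory model of the two behaviours. It is only a sketch: `ToyGraph` is a hypothetical class invented for illustration and says nothing about how Neo4j stores or matches data internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """A tiny in-memory stand-in for a graph; NOT how Neo4j works internally."""
    nodes: list = field(default_factory=list)

    def create(self, label: str, **props) -> dict:
        # CREATE-like behaviour: always append a new node, even if an identical one exists.
        node = {"label": label, "properties": dict(props)}
        self.nodes.append(node)
        return node

    def merge(self, label: str, **props) -> dict:
        # MERGE-like behaviour: reuse a node that has the label and all given
        # property values; create one only when no such node exists.
        for node in self.nodes:
            if node["label"] == label and all(
                node["properties"].get(k) == v for k, v in props.items()
            ):
                return node
        return self.create(label, **props)


created = ToyGraph()
for _ in range(3):
    created.create("Person", id=1)

merged = ToyGraph()
for _ in range(3):
    merged.merge("Person", id=1)

print(len(created.nodes), len(merged.nodes))  # prints: 3 1
```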

In [1]:
from neo4j import GraphDatabase, Record, ResultSummary, EagerResult
from neo4j.time import Date

import pandas as pd
pd.set_option('display.max_colwidth', 100)

import os 
import sys
import socket
from dotenv import load_dotenv 
load_dotenv()

# Add the utils directory to sys.path
sys.path.append(os.path.abspath("../utils"))

from Neo4jParser import Neo4jParser

NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

In [2]:
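# We are going to create a dev database so that we can do whatever we want in it.
# Administrative commands such as CREATE DATABASE are run against the "system" database.
driver.execute_query("CREATE DATABASE dev IF NOT EXISTS;", database_="system")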

EagerResult(records=[], summary=<neo4j._work.summary.ResultSummary object at 0x7fab99760950>, keys=[])

What the heck is an ['EagerResult'](https://neo4j.com/docs/api/python-driver/current/api.html#neo4j.EagerResult)?
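
In short, an `EagerResult` is a named tuple with three fields: `records`, `summary`, and `keys`. The sketch below uses a hypothetical stand-in class, `FakeEagerResult`, so that it runs without a database; with the real driver, `records` holds `neo4j.Record` objects and `summary` is a `ResultSummary`.

```python
from typing import Any, NamedTuple


class FakeEagerResult(NamedTuple):
    """Stand-in with the same three field names as neo4j.EagerResult."""
    records: list
    summary: Any
    keys: list


# Mirrors the empty result printed above; the summary is just a placeholder string.
result = FakeEagerResult(records=[], summary="<summary placeholder>", keys=[])

# Access the fields by name ...
print(result.records, result.keys)

# ... or unpack them positionally, as with any tuple.
records, summary, keys = result
print(len(records))  # prints: 0
```

The same unpacking works on the value returned by `driver.execute_query(...)`.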

In [3]:
# Let's create 5 nodes 
for _ in range(5):
    result: EagerResult[list[Record], ResultSummary, list[str]] = driver.execute_query(
        """ 
        CREATE (n) RETURN n;
        """,
        database_="dev"
    )

**NOTE:**

To keep things clean, I'll only annotate the 'result' variable once. Clearly the response returned from Neo4j is a pretty complex object that holds a ton of useful information. I would encourage you to explore their documentation and modify the `Neo4jParser.parse` function to provide you with information that is the most helpful to you!

In [5]:
# Let's see how many nodes are in our graph
result: EagerResult[list[Record], ResultSummary, list[str]] = driver.execute_query(
    """ 
    MATCH (n) RETURN n;
    """,
    database_="dev"
)

df = Neo4jParser.parse(result, True)
df.head()

Started streaming 21 records after 1 ms and completed after 3 ms.

Query executed against database: 'dev':  
    MATCH (n) RETURN n;
    


| | n |
|---|---|
| 0 | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:30', 'labels': ('Person'), 'properties': {... |
| 1 | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:31', 'labels': ('Food'), 'properties': {'i... |
| 2 | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:32', 'labels': ('Person'), 'properties': {... |
| 3 | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:33', 'labels': ('Food'), 'properties': {'i... |
| 4 | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:34', 'labels': ('Person'), 'properties': {... |


**NOTE:**

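Notice how `CREATE` made 5 separate nodes from the same query, one per execution. (The 21 records reported above include labeled nodes, most likely left over from earlier runs of this notebook.)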

In [6]:
# Let's create a person in our graph named Sally
result = driver.execute_query(
    """ 
    CREATE (p:Person {name: "Sally"})
    RETURN p
    """,
    database_="dev"
)

df = Neo4jParser.parse(result, True)

# Let's take a look at how our record is represented in python
df.head()

Started streaming 1 records after 16 ms and completed after 17 ms.

Query executed against database: 'dev':  
    CREATE (p:Person {name: "Sally"})
    RETURN p
    


| | p |
|---|---|
| 0 | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:29', 'labels': ('Person'), 'properties': {... |


In [7]:
# Let's delete all records in Neo4j
result = driver.execute_query(
    """ 
    MATCH (n) DETACH DELETE n;
    """,
    database_="dev"
)

df = Neo4jParser.parse(result, True)

Started streaming 0 records after 18 ms and completed after 18 ms.

Query executed against database: 'dev':  
    MATCH (n) DETACH DELETE n;
    


In [8]:
# Let's create a relationship with one direction to represent the scenario "Sally goes to the store and purchases pickles"
result = driver.execute_query(
    """ 
    CREATE (p:Person {name: "Sally"}), (f:Food {item: "Pickles"}), (p)-[r:PURCHASES]->(f) RETURN *;
    """,
    database_="dev"
)

df = Neo4jParser.parse(result, True, False)
df

Started streaming 1 records after 1 ms and completed after 2 ms.

Query executed against database: 'dev':  
    CREATE (p:Person {name: "Sally"}), (f:Food {item: "Pickles"}), (p)-[r:PURCHASES]->(f) RETURN *;
    


{'r': [{'startNode': {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:30',
    'labels': frozenset({'Person'}),
    'properties': {'name': 'Sally'}},
   'elementId': '5:ff73e06d-56ad-4959-b409-fcc3d9dce978:1152921504606847006',
   'type': 'PURCHASES',
   'properties': {},
   'endNode': {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:31',
    'labels': frozenset({'Food'}),
    'properties': {'item': 'Pickles'}}}],
 'p': [{'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:30',
   'labels': frozenset({'Person'}),
   'properties': {'name': 'Sally'}}],
 'f': [{'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:31',
   'labels': frozenset({'Food'}),
   'properties': {'item': 'Pickles'}}]}

In [9]:
# Add a date property to our relationship for event or transactional data
purchase_date = Date(year=2025, month=1, day=31)

result = driver.execute_query(
    """ 
    CREATE (p:Person {name: "Sally"}), (f:Food {item: "Pickles"}), (p)-[r:PURCHASES {purchased_on: $purchase_date}]->(f) RETURN *;
    """,
    database_="dev",
    purchase_date=purchase_date
)

df = Neo4jParser.parse(result, True)

Started streaming 1 records after 0 ms and completed after 0 ms.

Query executed against database: 'dev':  
    CREATE (p:Person {name: "Sally"}), (f:Food {item: "Pickles"}), (p)-[r:PURCHASES {purchased_on: $purchase_date}]->(f) RETURN *;
    


**NOTE:**

Notice the additional argument in the `driver.execute_query` function. Neo4j lets you parameterize queries for a number of reasons. In this case, we are defining a variable using a Neo4j datatype so we know it will be inserted into the graph the way we expect. Parameterize a query by prefixing the parameter name with "$" in the query text and passing the value as a keyword argument of the same name. Check out the datatypes for the Neo4j Python driver [here](https://neo4j.com/docs/python-manual/current/data-types/)!
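
The pattern above extends to the other values in the query as well. The sketch below parameterizes the person, the item, and the date, and uses a hypothetical `RecordingDriver` stand-in so that it runs without a database. The helper name `record_purchase` and the use of Python's standard `datetime.date` are assumptions made to keep the example self-contained; against a real database you can pass the `driver` from this notebook and a `neo4j.time.Date` as in the cell above.

```python
from datetime import date

PURCHASE_QUERY = """
MERGE (p:Person {name: $person})
MERGE (f:Food {item: $item})
CREATE (p)-[r:PURCHASES {purchased_on: $purchase_date}]->(f)
RETURN p, r, f
"""


class RecordingDriver:
    """Hypothetical stand-in that records calls instead of contacting Neo4j."""

    def __init__(self):
        self.calls = []

    def execute_query(self, query, **kwargs):
        self.calls.append((query, kwargs))


def record_purchase(driver, person: str, item: str, purchased_on: date) -> None:
    # Keyword arguments that do not end in an underscore are sent as query
    # parameters; `database_` is a setting for the driver itself.
    driver.execute_query(
        PURCHASE_QUERY,
        person=person,
        item=item,
        purchase_date=purchased_on,
        database_="dev",
    )


fake = RecordingDriver()
record_purchase(fake, "Sally", "Pickles", date(2025, 1, 31))
query, kwargs = fake.calls[0]
print(kwargs["purchase_date"])  # prints: 2025-01-31
```

The query text never contains the values themselves, only the `$` placeholders, which is what keeps the same query reusable for any person, item, or date.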

In [10]:
# Let's create a bi-directional relationship. Let's say Sally has a friend, Sarah
driver.execute_query(
    """ 
    CREATE (p:Person {name: "Sarah"});
    """,
    database_="dev"
)

result = driver.execute_query(
    """ 
    MATCH (p:Person {name: "Sally"}), (f:Person {name: "Sarah"})
    WITH p, f
    CREATE (p)-[r:HAS_FRIEND]->(f)-[rr:HAS_FRIEND]->(p)
    RETURN *;
    """,
    database_="dev"
)

df = Neo4jParser.parse(result, True, True)
df.head()

Started streaming 2 records after 1 ms and completed after 2 ms.

Query executed against database: 'dev':  
    MATCH (p:Person {name: "Sally"}), (f:Person {name: "Sarah"})
    WITH p, f
    CREATE (p)-[r:HAS_FRIEND]->(f)-[rr:HAS_FRIEND]->(p)
    RETURN *;
    


| | r | p | rr | f |
|---|---|---|---|---|
| 0 | {'startNode': {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:30', 'labels': ('Person'), '... | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:30', 'labels': ('Person'), 'properties': {... | {'startNode': {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:34', 'labels': ('Person'), '... | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:34', 'labels': ('Person'), 'properties': {... |
| 1 | {'startNode': {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:32', 'labels': ('Person'), '... | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:32', 'labels': ('Person'), 'properties': {... | {'startNode': {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:34', 'labels': ('Person'), '... | {'elementId': '4:ff73e06d-56ad-4959-b409-fcc3d9dce978:34', 'labels': ('Person'), 'properties': {... |
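
The query above streamed 2 records rather than 1. The element IDs in the output suggest why: two `Person {name: "Sally"}` nodes exist (ending in `:30` and `:32`) because `CREATE` ran for Sally in more than one earlier cell, and `MATCH` pairs each of them with Sarah (`:34`). The toy sketch below reproduces that row count in plain Python; the shortened ids are illustrative stand-ins for the real element IDs.

```python
from itertools import product

# Toy data mirroring the output above: two Sally nodes, one Sarah node.
sallys = [{"id": 30, "name": "Sally"}, {"id": 32, "name": "Sally"}]
sarahs = [{"id": 34, "name": "Sarah"}]

# MATCH (p ...), (f ...) yields one row per combination of matching nodes.
rows = list(product(sallys, sarahs))

# The CREATE clause then runs once per row, adding two relationships each time.
relationships = []
for p, f in rows:
    relationships.append((p["id"], "HAS_FRIEND", f["id"]))
    relationships.append((f["id"], "HAS_FRIEND", p["id"]))

print(len(rows), len(relationships))  # prints: 2 4
```

Using `MERGE` for Sally in the earlier cells would have left a single Sally node and therefore a single row here.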


**NOTE:**

You can also use paths to create a complex pattern.

In [11]:
# Use a path to create a pattern where Mike follows Jackie who is followed by Tim
result = driver.execute_query(
    """ 
    CREATE p=(mike:Person {name:"Mike"})-[:FOLLOWS]->(jackie:Person {name:"Jackie"})<-[:FOLLOWS]-(tim:Person {name:"Tim"})
    RETURN p;
    """,
    database_="dev"
)

df = Neo4jParser.parse(result, True)

Started streaming 1 records after 0 ms and completed after 0 ms.

Query executed against database: 'dev':  
    CREATE p=(mike:Person {name:"Mike"})-[:FOLLOWS]->(jackie:Person {name:"Jackie"})<-[:FOLLOWS]-(tim:Person {name:"Tim"})
    RETURN p;
    


**NOTE:**

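Cypher keywords *ARE NOT* case-sensitive, so `create` and `return` work just like `CREATE` and `RETURN`. Labels, relationship types, and property keys, however, are case-sensitive. Let's try it below.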

In [12]:
# Create a long series of nodes and relationships for a family tree
result = driver.execute_query(
    """ 
    create 
    (D:Person{name:'Dan'}),
    (K:Person{name:'Kate'}),
    (M:Person{name:'Mike'}),
    (L:Person{name:'Luke'}),
    (S:Person{name:'Steve'}),
    (F:Person{name:'Favour'}),
    (faith:Person{name:'Faith'}),
    (J:Person{name:'Jane'}),
    (D)-[:MARRIED_TO]->(K)-[:MARRIED_TO]->(D),
    (D)-[:PARENT_OF]->(M)<-[:PARENT_OF]-(K),
    (D)-[:PARENT_OF]->(L)<-[:PARENT_OF]-(K),
    (D)-[:PARENT_OF]->(S)<-[:PARENT_OF]-(K),
    (F)-[:MARRIED_TO]->(S)-[:MARRIED_TO]->(F),
    (F)-[:PARENT_OF]->(faith)<-[:PARENT_OF]-(S),
    (F)-[:PARENT_OF]->(J)<-[:PARENT_OF]-(S)
    return *
    """,
    database_="dev"
)

df = Neo4jParser.parse(result, True)

Started streaming 1 records after 1 ms and completed after 2 ms.

Query executed against database: 'dev':  
    create 
    (D:Person{name:'Dan'}),
    (K:Person{name:'Kate'}),
    (M:Person{name:'Mike'}),
    (L:Person{name:'Luke'}),
    (S:Person{name:'Steve'}),
    (F:Person{name:'Favour'}),
    (faith:Person{name:'Faith'}),
    (J:Person{name:'Jane'}),
    (D)-[:MARRIED_TO]->(K)-[:MARRIED_TO]->(D),
    (D)-[:PARENT_OF]->(M)<-[:PARENT_OF]-(K),
    (D)-[:PARENT_OF]->(L)<-[:PARENT_OF]-(K),
    (D)-[:PARENT_OF]->(S)<-[:PARENT_OF]-(K),
    (F)-[:MARRIED_TO]->(S)-[:MARRIED_TO]->(F),
    (F)-[:PARENT_OF]->(faith)<-[:PARENT_OF]-(S),
    (F)-[:PARENT_OF]->(J)<-[:PARENT_OF]-(S)
    return *
    
