# Graph Storage of Venues
Now that we have extracted the keywords from our venue list, it is time to store the venues and keywords in a graph database. To do this, we parse the extracted keyword-venue JSON objects into Cypher statements that write the entities and relationships to our Neo4j database.

This notebook walks through a few principal steps:
1. Creating Cypher statements from the extracted JSON data
2. Using those Cypher statements to write to Neo4j
3. Basic querying and exploration of the graph



In [25]:
import os
from dotenv import load_dotenv

load_dotenv()


True

In [26]:
# Function for extracting the data from the JSON file into the Cypher Text
import json
import re

from tqdm import tqdm

def format_string_value(val: str, split_delim: str) -> str:
    """Title-case the pieces of `val` split on `split_delim`, join them without a
    separator, and wrap digit runs and special characters in backticks."""
    words = val.split(split_delim)
    title_case = "".join([word.title() for word in words])

    # Surround any numbers with backticks
    num_pattern = r"\d+"
    title_case = re.sub(num_pattern, lambda match: f"`{match.group()}`", title_case)

    # Surround any special characters with backticks
    special_chars_pattern = r"[\'\(\),\.\:\;\!\/\&\-\ ]"
    title_case = re.sub(special_chars_pattern, lambda match: f"`{match.group()}`", title_case)

    return title_case

with open("../data/cypher_entities.json", 'r') as location_data:
    locations = json.load(location_data)

Now we have the `locations` list, which is a list of dictionaries with the format:
```python
{
    'venue': {
        'id': 'LslssTS75mcFf-6pttxKBQ',
        'name': 'Panetino Bakery',
        'city': 'NYC',
        'rating': 5.0
    },
    'keywords': ['Coffee Culture', 'Baking Passion', 'Gourmet']
}
```

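Each string in the `keywords` list becomes the `value` property of a `Keyword` node. The venue's properties are stored under the `venue` key and are written onto a node with the `Venue` label. Note that `generate_cypher` below unpacks every entry of `keywords` as a `(keyword, weight)` pair, so it expects each keyword to carry a weight rather than being the bare string shown above.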
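
Because the example record lists bare keyword strings while the generation code unpacks `(keyword, weight)` pairs, a small normalisation step can make both shapes usable. The following is a minimal sketch, not part of the original pipeline: the helper name `normalize_keywords` and the `default_weight` of `1.0` are assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple, Union

# A keyword entry is either a bare string or a (keyword, weight) pair.
# JSON has no tuples, so pairs loaded from disk arrive as two-element lists.
KeywordEntry = Union[str, Tuple[str, float], List[Union[str, float]]]


def normalize_keywords(
    entries: Iterable[KeywordEntry],
    default_weight: float = 1.0,  # assumed placeholder weight for bare strings
) -> List[Tuple[str, float]]:
    """Return every entry as a (keyword, weight) tuple."""
    pairs = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare string: attach the assumed default weight
            pairs.append((entry, default_weight))
        else:
            # Already a pair: unpack and coerce to consistent types
            keyword, weight = entry
            pairs.append((str(keyword), float(weight)))
    return pairs


print(normalize_keywords(["Coffee Culture", ["Gourmet", 0.8]]))
# [('Coffee Culture', 1.0), ('Gourmet', 0.8)]
```

Whether a default weight is meaningful depends on how the weights were produced during keyword extraction, so treat the placeholder as something to revisit.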

In [27]:
from typing import Any, Dict, List, Tuple, Union

def make_safe(val: Union[str, float]) -> Union[str, float]:
    """Make a string with ' characters safe for Cypher"""
    if isinstance(val, str):
        return val.replace("'", "\\'")
    return val

def generate_cypher(venue: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (entity statements, relationship statements) for one venue record"""
    e_statements = []
    r_statements = []

    # Every property value is written as a quoted string, including numeric ones such as rating
    venue_properties = ", ".join([f"{key}: '{make_safe(value)}'" for key, value in venue['venue'].items()]) if 'venue' in venue else ""
    venue_cypher = f"MERGE (v:Venue {{ {venue_properties} }})"
    e_statements.append(venue_cypher)

    # Each entry in venue['keywords'] is unpacked as a (keyword, weight) pair
    for i, keyword_data in enumerate(venue['keywords']):
        keyword, weight = keyword_data
        keyword_cypher = f"MERGE (k{i+1}:Keyword {{ value: '{make_safe(keyword)}' }})"
        e_statements.append(keyword_cypher)
        r_statements.append(f"MERGE (v)-[r{i+1}:HAS_KEYWORD]->(k{i+1}) SET r{i+1}.weight = {weight}")

    return e_statements, r_statements
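
To see the shape of the generated text, it helps to run the logic on a single record. The sketch below restates the two functions in compact form so that it runs on its own. The record itself (`abc123`, `Joe's Cafe`, the weight `0.9`) is made up for illustration and does not come from the dataset.

```python
def make_safe(val):
    """Escape single quotes so a string can sit inside a '...' Cypher literal."""
    return val.replace("'", "\\'") if isinstance(val, str) else val


def generate_cypher(venue):
    """Compact restatement of the generation logic, for illustration only."""
    props = ", ".join(
        f"{key}: '{make_safe(value)}'" for key, value in venue.get("venue", {}).items()
    )
    e_statements = [f"MERGE (v:Venue {{ {props} }})"]
    r_statements = []
    for i, (keyword, weight) in enumerate(venue["keywords"], start=1):
        e_statements.append(f"MERGE (k{i}:Keyword {{ value: '{make_safe(keyword)}' }})")
        r_statements.append(
            f"MERGE (v)-[r{i}:HAS_KEYWORD]->(k{i}) SET r{i}.weight = {weight}"
        )
    return e_statements, r_statements


# Hypothetical record: the apostrophe shows the escaping at work
sample = {
    "venue": {"id": "abc123", "name": "Joe's Cafe", "rating": 4.5},
    "keywords": [("Coffee Culture", 0.9)],
}

entities, relationships = generate_cypher(sample)
print("\n".join(entities + relationships))
# MERGE (v:Venue { id: 'abc123', name: 'Joe\'s Cafe', rating: '4.5' })
# MERGE (k1:Keyword { value: 'Coffee Culture' })
# MERGE (v)-[r1:HAS_KEYWORD]->(k1) SET r1.weight = 0.9
```

Note that `rating` ends up as the string `'4.5'` in the graph, because every property value is wrapped in quotes, while the relationship weight is written unquoted and stays numeric.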



In [28]:

cypher_statements = []

# Convert the cypher entity data into cypher statements
for venue in locations:
    e_statements, r_statements = generate_cypher(venue)
    cypher_statement = "\n".join(e_statements + r_statements)
    cypher_statements.append(cypher_statement)

len(cypher_statements)

2813

In [29]:

with open("../data/cypher_statements.json", "w") as f:
    json.dump(cypher_statements, f, indent=4)

In [31]:
import os
from neo4j import GraphDatabase

DB_USER = os.getenv("NEO4J_DATABASE_USERNAME")
DB_URL = os.getenv("NEO4J_DATABASE_URL")
DB_PASSWORD = os.getenv("NEO4J_DATABASE_PASSWORD")

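driver = GraphDatabase.driver(DB_URL, auth=(DB_USER, DB_PASSWORD))

def execute_query(driver, query):
    """Run one Cypher statement in its own session and return its result summary"""
    with driver.session() as session:
        # Consume inside the session: the result cannot be read once the session closes
        return session.run(query).consume()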

for statement in cypher_statements:
    execute_query(driver, statement)

driver.close()
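
The loop above interpolates every value into the query text, which is why manual escaping is needed. An alternative worth considering is to pass the values as query parameters, so the driver handles quoting and numeric values keep their type. The following is a sketch under stated assumptions, not the approach this notebook uses: `VENUE_QUERY`, `build_params`, and `write_record` are hypothetical names, and merging venues on `id` alone (rather than on all properties) is a deliberate change in behavior.

```python
from typing import Any, Dict

# One fixed query text; all values arrive through the $venue and $keywords parameters.
# Assumption: every venue record has an 'id' property that identifies it uniquely.
VENUE_QUERY = """
MERGE (v:Venue {id: $venue.id})
SET v += $venue
WITH v
UNWIND $keywords AS kw
MERGE (k:Keyword {value: kw.value})
MERGE (v)-[r:HAS_KEYWORD]->(k)
SET r.weight = kw.weight
"""


def build_params(record: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one venue record into the parameter map VENUE_QUERY expects."""
    return {
        "venue": dict(record["venue"]),
        # Each keyword entry is assumed to be a (keyword, weight) pair
        "keywords": [
            {"value": keyword, "weight": weight}
            for keyword, weight in record["keywords"]
        ],
    }


def write_record(driver, record: Dict[str, Any]) -> None:
    """Write one record using an already-open neo4j driver."""
    with driver.session() as session:
        # consume() makes sure the write has finished before the session closes
        session.run(VENUE_QUERY, build_params(record)).consume()


# Building the parameters needs no database connection
print(build_params({
    "venue": {"id": "abc123", "name": "Joe's Cafe", "rating": 4.5},
    "keywords": [("Coffee Culture", 0.9)],
}))
```

With this shape the apostrophe in a venue name needs no escaping, and `rating` is stored as a number rather than as a quoted string.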
