<div style="text-align: center; line-height: 0; padding-top: 2px;">
  <img src="https://www.quantiaconsulting.com/logos/quantia_logo_orizz.png" alt="Quantia Consulting" style="width: 600px; height: 250px">
</div>

# Neo4j

In this notebook you will see how to interact with Neo4j using Python.


In [None]:
from neo4j import GraphDatabase
import pandas as pd
import qcutils

In [None]:
NEO4J_HOST=qcutils.read_config_value(key="neo4j.host", cf_path="config/nosql-config.yaml")
NEO4J_PORT=qcutils.read_config_value(key="neo4j.port", cf_path="config/nosql-config.yaml")

NEO4J_URL = "bolt://{}:{}".format(NEO4J_HOST, NEO4J_PORT)
USER = qcutils.read_config_value(key="neo4j.username", cf_path="config/nosql-config.yaml")
PASSWORD = qcutils.read_config_value(key="neo4j.pwd", cf_path="config/nosql-config.yaml")

## Loading the driver

In [None]:
driver = GraphDatabase.driver(NEO4J_URL, auth=(USER, PASSWORD))
db = driver.session()

## Data model

In this exercise we will use the following data model:

<div style="text-align: center; line-height: 0; padding-top: 2px;">
  <img src="img/datamodel_neo4j.png" alt="datamodel" style="width: 600px; height: 250px">
</div>

## Read


#### Query by node property

Retrieve all the listings with 2 bedrooms.

You can use the **AS** keyword in the RETURN clause to create aliases. The result is a sequence of **Record** objects, whose fields you can access the same way you access the entries of a Python dictionary.

In [None]:
query = """MATCH (l:Listing {bedrooms:2})
RETURN l.name as name, l.listing_id as id, l.property_type as type
"""

results = db.run(query)

for r in results:
    print(r["name"])

To better visualize the results, we can load them into a DataFrame.

Note that we need to run the query again, because iterating over a Neo4j Result exhausts it.

In [None]:
query = """MATCH (l:Listing {bedrooms:2}) 
RETURN l.name as name, l.listing_id as id, l.property_type as type
"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df.head()
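
The exhaustion behaviour mentioned above is the same one you get from any single-pass Python iterator. The sketch below uses a plain generator as a stand-in for the Neo4j result (the `fake_result` function and its rows are made up for illustration, no database is needed) and shows that materializing the rows into a list lets you reuse them without running the query again.

```python
def fake_result():
    """Hypothetical stand-in for a Neo4j result: it yields its rows only once."""
    yield {"name": "Loft", "id": "1"}
    yield {"name": "Studio", "id": "2"}


results = fake_result()
first_pass = [r["name"] for r in results]   # consumes every row
second_pass = [r["name"] for r in results]  # nothing is left to read

# Materializing the rows once lets you reuse them as many times as needed.
rows = list(fake_result())
names = [r["name"] for r in rows]
ids = [r["id"] for r in rows]

print(first_pass, second_pass, names, ids)
```

With a real result, the equivalent would be to build a list of records first and then use that list both for printing and for the DataFrame.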

#### Query using **comparison** operators

Retrieve all the listings having 2 bedrooms with price lower than 300$. Here we need to use the **WHERE** keyword.

In [None]:
query = """MATCH (l:Listing) 
WHERE l.bedrooms = 2 AND l.price < 300
RETURN l.name as name, l.listing_id as id, l.property_type as type, l.price as price
"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df.head()

#### Query by navigating the graph

Given a listing id, retrieve all the amenities it has.

In [None]:
query = """MATCH (l:Listing {listing_id: '8210932'}) -[:HAS]-> (r:Amenity) 
RETURN r.name as name
"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df.head()

Retrieve the 10 most popular neighborhoods, i.e., the neighborhoods that have the most listings.

In [None]:
query = """MATCH (l:Listing)-[r:IN_NEIGHBORHOOD]->(n:Neighborhood) 
WITH n, count(l) as listing_number
ORDER BY listing_number DESC LIMIT 10 
RETURN n.neighborhood_id as id ,n.name as name ,listing_number 
"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df

Repeat the query, but now consider the most popular neighborhoods to be the ones with the most reviews.

In [None]:
query = """MATCH (l:Listing)-[r_n:IN_NEIGHBORHOOD]->(n:Neighborhood) 
MATCH (r:Review) -[r_r:REVIEWS]-> (l:Listing)
WITH n, count(r) as review_number
ORDER BY review_number DESC LIMIT 10 
RETURN n.neighborhood_id,n.name,review_number
"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df

Retrieve the 10 users who wrote the most reviews.

In [None]:
query = """MATCH (u:User)-[:WROTE]->(r:Review)
WITH u, count(r) AS reviews_number
ORDER BY reviews_number DESC LIMIT 10
RETURN u.name as name , u.user_id as user_id, reviews_number"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df

## Insert

Insert a new Listing node in the graph 

In [None]:
def create_listing(tx,node):
    # Create a Listing node from the given dictionary and return its internal id
    query = """CREATE (l:Listing { 
    name: $name,
    bedrooms: $bedrooms , 
    price: $price, 
    accomodates:$accomodates,
    property_type:$property_type,
    availability: $availability}
    ) 
    RETURN id(l) AS node_id"""
    
    result = tx.run(query,name=node["name"],
                bedrooms=node["bedrooms"],
                price=node["price"],
                accomodates=node["accomodates"],
                property_type=node["property_type"],
                availability=node["availability"])

    record = result.single()
    return record["node_id"]

In [None]:
listing = {
    "bedrooms":1,
    "price":200,
    "accomodates":1,
    "name":"Simple and cozy apartment - Andrea",
    "property_type":"House",
    "availability":231
}

listing_id = db.write_transaction(create_listing,listing)

In [None]:
listing_id

Connect the Listing just inserted with the Host with id 377044 and the Neighborhood with id 78739.

In [None]:
def connect_listing(tx,l_id):
    query = """MATCH (h:Host {host_id:"377044"})
    MATCH (n:Neighborhood {neighborhood_id:"78739"})
    MATCH (l:Listing) WHERE id(l)=$id
    CREATE (l)-[rel_n:IN_NEIGHBORHOOD]->(n)
    CREATE (h)-[rel_h:HOSTS]->(l)
    RETURN id(rel_n),id(rel_h)"""
    
    result = tx.run(query,id=l_id)
    
    # Read the record inside the transaction: the result cannot be consumed after it ends
    return result.single()

In [None]:
db.write_transaction(connect_listing,listing_id)

In [None]:
query = """MATCH (h:Host) -[:HOSTS]-> (l:Listing) -[:IN_NEIGHBORHOOD]-> (n:Neighborhood)
WHERE id(l) = $l_id
RETURN id(h),id(n)
"""

results = db.run(query,l_id=listing_id)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df

We could also have created the node and its relationships together, in a single transaction.
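
A minimal sketch of such a single-transaction version follows. The function name `create_and_connect_listing` and its parameters are assumptions made for illustration, not part of the original notebook. It passes the whole dictionary as a `$props` map to `CREATE` instead of listing each property. The two stub classes only mimic the `run` and `single` calls used by the function, so the block can be dry-run without a database; the id `42` is made up.

```python
def create_and_connect_listing(tx, node, host_id, neighborhood_id):
    """Create a Listing and link it to its Host and Neighborhood in one query."""
    query = """MATCH (h:Host {host_id: $host_id})
    MATCH (n:Neighborhood {neighborhood_id: $neighborhood_id})
    CREATE (l:Listing $props)
    CREATE (h)-[:HOSTS]->(l)
    CREATE (l)-[:IN_NEIGHBORHOOD]->(n)
    RETURN id(l) AS node_id"""
    result = tx.run(query, host_id=host_id, neighborhood_id=neighborhood_id, props=node)
    record = result.single()
    # No record means the Host or the Neighborhood was not found, so nothing was created.
    return None if record is None else record["node_id"]


# Offline dry run with hypothetical stubs (not part of the neo4j package).
class _StubResult:
    def single(self):
        return {"node_id": 42}  # made-up id


class _StubTx:
    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))
        return _StubResult()


stub_tx = _StubTx()
new_id = create_and_connect_listing(stub_tx, {"bedrooms": 1, "price": 200}, "377044", "78739")
print(new_id)
```

With the session defined earlier, the call would presumably look like `db.write_transaction(create_and_connect_listing, listing, "377044", "78739")`. Because everything runs in one transaction, either the node and both relationships are created, or nothing is.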

## Update

Updates in Neo4j work by first **matching** the nodes or relationships, and then using the **SET** keyword to modify an existing property or create a new one.

Change the type of the Listing you just inserted to Apartment

In [None]:
def change_type(tx,l_id):
    query = """MATCH (l:Listing) WHERE id(l)=$id
               SET l.property_type = "Apartment"
               RETURN l
            """
    result = tx.run(query,id=l_id)
    return result.single()

In [None]:
listing = db.write_transaction(change_type,listing_id)
df = pd.DataFrame([listing[0].values()], columns=listing[0].keys())
df

You can add a field by using the **SET** command

In [None]:
def add_field(tx,l_id):
    query = """MATCH (l:Listing) WHERE id(l)=$id
               SET l.new_field = "value"
               RETURN l
            """
    result = tx.run(query,id=l_id)
    return result.single()

In [None]:
listing = db.write_transaction(add_field,listing_id)
df = pd.DataFrame([listing[0].values()], columns=listing[0].keys())
df

You can remove a field by either using the **REMOVE** command or setting it to null

In [None]:
def remove_field(tx,l_id):
    query = """MATCH (l:Listing) WHERE id(l)=$id
               SET l.new_field = null
               RETURN l
            """
    
    # Uncomment this to try it
    #query = """MATCH (l:Listing) WHERE id(l)=$id
    #           REMOVE l.new_field
    #           RETURN l
    #        """
    
    result = tx.run(query,id=l_id)
    return result.single()

In [None]:
listing = db.write_transaction(remove_field,listing_id)
df = pd.DataFrame([listing[0].values()], columns=listing[0].keys())
df

## Delete

A delete starts with a **MATCH** that selects the nodes to remove. If the node(s) have relationships, you also need the **DETACH** keyword, which deletes those relationships together with the node.

Delete the node you inserted

In [None]:
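def remove_nodes(tx,l_id):
    query = """MATCH (l:Listing) WHERE id(l)=$id
               DETACH DELETE l
            """
    result = tx.run(query,id=l_id)
    # Consume the result inside the transaction and report how many nodes were deleted
    return result.consume().counters.nodes_deleted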

In [None]:
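db.write_transaction(remove_nodes,listing_id)

# Check that the node is gone: this query should now return no rows
results = db.run("MATCH (l:Listing) WHERE id(l)=$id RETURN l", id=listing_id)
df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df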

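The **MERGE** keyword helps you avoid duplicates (of both nodes and relationships): it matches an existing pattern and creates it only when no match is found. It does not remove duplicates that already exist; those have to be deleted explicitly.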

The following cells create a duplicate node

In [None]:
def create_duplicate_listing(tx,node):
    # Same as create_listing: each call creates a new node, even if an identical one exists
    query = """CREATE (l:Listing { 
    name: $name,
    bedrooms: $bedrooms , 
    price: $price, 
    accomodates:$accomodates,
    property_type:$property_type,
    availability: $availability}
    ) 
    RETURN id(l) AS node_id"""
    
    result = tx.run(query,name=node["name"],
                bedrooms=node["bedrooms"],
                price=node["price"],
                accomodates=node["accomodates"],
                property_type=node["property_type"],
                availability=node["availability"])

    record = result.single()
    return record["node_id"]

In [None]:
# Put as name something unique for you

listing = {
    "bedrooms":1,
    "price":200,
    "accomodates":1,
    "name":"Duplicate house",
    "property_type":"House",
    "availability":231
}

listing_id_1 = db.write_transaction(create_duplicate_listing,listing)
listing_id_2 = db.write_transaction(create_duplicate_listing,listing)

Now we run **MERGE** on the name "Duplicate house". MERGE returns the nodes that already match the pattern and creates a new one only if none matches.

In [None]:
def merge(tx):
    query = """MERGE (l:Listing {name:"Duplicate house"})
    RETURN l
    """
    
    result = tx.run(query)

    record = result.single()
    return record

In [None]:
results = db.write_transaction(merge)

df = pd.DataFrame([results[0].values()], columns=results[0].keys())
df
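
As a rough mental model, MERGE behaves like a "get or create" operation. The sketch below is a plain Python analogy, not Neo4j code: a list of dictionaries stands in for the graph, and the `merge` helper is hypothetical. It illustrates that when duplicates already exist, all of them match and none is removed.

```python
def merge(nodes, pattern):
    """Return every node matching `pattern`; create one only if none matches."""
    matches = [n for n in nodes if all(n.get(k) == v for k, v in pattern.items())]
    if not matches:
        new_node = dict(pattern)
        nodes.append(new_node)
        matches = [new_node]
    return matches


# Toy "graph" containing two duplicate nodes (made-up data).
nodes = [
    {"name": "Duplicate house", "price": 200},
    {"name": "Duplicate house", "price": 200},
]

found = merge(nodes, {"name": "Duplicate house"})    # both duplicates match
created = merge(nodes, {"name": "Brand new house"})  # nothing matches, so one is created

print(len(found), len(nodes), created)
```

In Cypher, removing duplicates that already exist still requires an explicit `DELETE` (or `DETACH DELETE`) on the nodes you want to drop.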

## Interesting Queries

Given a user (Christopher, with user_id '26763569'), use his past reviews to select the listings he is most likely to like (i.e., those that share the most amenities with the listings he reviewed).

In [None]:
query = """MATCH (u:User {user_id: "26763569"})-[:WROTE]->(r:Review)-[:REVIEWS]->(l:Listing)-[:HAS]->(a:Amenity)
WITH COLLECT(DISTINCT l) AS reviewed, COLLECT(DISTINCT a) AS liked
UNWIND liked AS a
MATCH (a)<-[:HAS]-(rec:Listing)
WHERE NOT rec IN reviewed
RETURN rec.listing_id, rec.name, COUNT(DISTINCT a) AS score ORDER BY score DESC LIMIT 10
"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df
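
The scoring idea behind this recommendation (count how many amenities a candidate listing shares with the listings the user already reviewed) can be sketched in plain Python. The listing ids and amenities below are made-up toy data, and the code is only an illustration of the logic, not a replacement for the Cypher query.

```python
# Made-up toy data: listing id -> set of amenities.
amenities = {
    "A": {"Wifi", "Kitchen", "Heating"},
    "B": {"Wifi", "Kitchen", "Pool"},
    "C": {"Pool", "Gym"},
    "D": {"Wifi", "Heating", "Kitchen", "Gym"},
}
reviewed = {"A"}  # listings the user has already reviewed

# Amenities found in at least one reviewed listing.
liked = set().union(*(amenities[lid] for lid in reviewed))

# Score every listing not yet reviewed by the number of shared amenities.
scores = {
    lid: len(items & liked)
    for lid, items in amenities.items()
    if lid not in reviewed
}

# Keep only listings sharing at least one amenity, best score first.
ranking = sorted(
    (lid for lid in scores if scores[lid] > 0),
    key=lambda lid: (-scores[lid], lid),
)

print(ranking, scores)
```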

Repeat the query, also considering the location of the listing as a feature (i.e., limit the recommendations to listings located in the neighborhoods where Christopher stayed before).

In [None]:
query = """MATCH (u:User {user_id: "26763569"})-[:WROTE]->(:Review)-[:REVIEWS]->(l:Listing)-[:IN_NEIGHBORHOOD]->(n:Neighborhood)
WITH COLLECT(DISTINCT l) AS reviewed, COLLECT(DISTINCT n) AS neighborhoods
UNWIND reviewed AS l
MATCH (l)-[:HAS]->(a:Amenity)<-[:HAS]-(rec:Listing)-[:IN_NEIGHBORHOOD]->(n:Neighborhood)
WHERE NOT rec IN reviewed AND n IN neighborhoods
RETURN rec.listing_id, rec.name, COUNT(DISTINCT a) AS score ORDER BY score DESC LIMIT 10
"""

results = db.run(query)

df = pd.DataFrame([r.values() for r in results], columns=results.keys())
df

##### ![Quantia Tiny Logo](https://www.quantiaconsulting.com/logos/quantia_logo_tiny.png) 2020 Quantia Consulting, srl. All rights reserved.