# Recommendations

In this section we'll learn how to make listing recommendations using Neo4j. As usual, let's first import some libraries:

In [1]:
from py2neo import Graph
import pandas as pd

# None shows full column contents (-1 is deprecated since pandas 1.0)
pd.set_option('display.max_colwidth', None)

In [2]:
graph = Graph("bolt://localhost", auth=("neo4j", "neo"))

Since we're going to make listing suggestions, let's find the users who have written the most reviews so that we have some data to work with.

In [3]:
popular_users_query = """
MATCH (u:User)
RETURN u.id AS id, u.name AS user, size((u)-[:WROTE]->()) AS reviews
ORDER BY reviews DESC
LIMIT 10
"""

graph.run(popular_users_query).to_data_frame()

Unnamed: 0,id,reviews,user
0,15355355,59,Karen
1,141581986,58,Salvador
2,111293458,54,Elizabeth
3,39274139,54,Van
4,3973614,53,Christian
5,86126627,52,James
6,17387960,51,Obawole
7,563572,51,Daniel
8,197711,50,J. B.
9,16609485,45,Cliff


We could pick anyone from this list, but Salvador has reviewed places that other people have also reviewed, so we'll use him for our example.

The following query finds the listings that Salvador has reviewed the most:

In [36]:
user_query = """
MATCH (u:User {id: $userId})-[:WROTE]->(review)-[:REVIEWS]->(listing:Listing)-[:IN_NEIGHBORHOOD]->(nh)
RETURN listing.id, listing.name, listing.propertyType, nh.name, count(*) AS times
ORDER BY times DESC
"""

user_id = "141581986"

graph.run(user_query, {"userId": user_id}).to_data_frame()

Unnamed: 0,listing.id,listing.name,listing.propertyType,nh.name,times
0,14133414,Space to rest near LaGuardia Airport,House,Jackson Heights,22
1,21134697,"Pilots, FA only, 15 min. walk LGA. 3 locations",House,Jackson Heights,9
2,17665781,Female Pilots & Female FA only 15 min walk to LGA.,House,Jackson Heights,7
3,21248963,Female Pilots & Female FA only 15 min walk to LGA.,House,Jackson Heights,7
4,17754072,Bed in Family Home Near LGA Airport,Townhouse,Jackson Heights,4
5,17222454,Sun Room Family Home LGA Airport NO CLEANING FEE,Townhouse,Jackson Heights,3
6,11912865,CrashPadsUSA for Airline Crew. Nightly HOTBEDS,House,Richmond Hill,2
7,16276632,Cozy Room Family Home LGA Airport NO CLEANING FEE,Townhouse,Jackson Heights,2
8,21461663,"SMALL CLEAN ROOM, Easy to Time Square, Airport",House,Rego Park,1
9,16553353,❤️❤️❤️ COZY Place by the Park for ONE ❤️❤️❤️,House,Middle Village,1


Based on the names of the listings, it looks like Salvador stays in places near LaGuardia Airport.

The following query finds users who have reviewed the same places as Salvador:

In [None]:
similar_users_query = """
MATCH (u:User {id: $userId})-[:WROTE]->()-[:REVIEWS]->(listing:Listing),
      (other)-[:WROTE]->()-[:REVIEWS]->(listing)
WHERE u <> other      
WITH other, count(distinct listing) AS commonListings      
RETURN other.id, other.name, commonListings
ORDER BY commonListings DESC
LIMIT 10
"""

user_id = "141581986"

graph.run(similar_users_query, {"userId": user_id}).to_data_frame()

We'll return to these users in the next section.

## Collaborative Filtering

[Collaborative filtering](https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0) is based on the assumption that people like things similar to other things they like, and things that are liked by other people with similar taste.

<img src="https://cdn-images-1.medium.com/max/1600/1*6_NlX6CJYhtxzRM-t6ywkQ.png" width="500px" />
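
To make the idea concrete, here is a minimal sketch in plain Python. The users, listing names, and the `recommend` helper are made up for illustration and are not part of the dataset or of any library. It ranks other users by how many listings they share with the target user, then suggests listings those users reviewed that the target has not.

```python
from collections import Counter

# Hypothetical toy data: user -> set of listings they reviewed
reviews = {
    "Salvador": {"A", "B", "C"},
    "Mary": {"A", "B", "D"},
    "Luke": {"A", "D", "E"},
    "Dawn": {"F"},
}

def recommend(target, reviews, top_users=2):
    """Suggest listings reviewed by the users who overlap most with `target`."""
    seen = reviews[target]
    # Number of listings each other user has in common with the target
    overlap = {
        user: len(seen & listings)
        for user, listings in reviews.items()
        if user != target and seen & listings
    }
    # Keep the most similar users (ties broken alphabetically)
    similar = sorted(overlap, key=lambda user: (-overlap[user], user))[:top_users]
    # Count how many similar users reviewed each listing the target has not seen
    votes = Counter(
        listing for user in similar for listing in reviews[user] - seen
    )
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

print(recommend("Salvador", reviews))  # [('D', 2), ('E', 1)]
```

Listing D comes first because both similar users reviewed it, which is the same ranking idea as counting the distinct reviewers per listing.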

We can use a simple variant of this approach to find listings that Salvador hasn't reviewed, but that were reviewed by people who also reviewed places Salvador stayed in:

In [6]:
collaborative_filtering_query = """
MATCH (u:User {id: $userId})-[:WROTE]->()-[:REVIEWS]->(listing:Listing),
      (other)-[:WROTE]->()-[:REVIEWS]->(listing)
WHERE u <> other
WITH u, other, count(distinct listing) AS commonListings
ORDER BY commonListings DESC
LIMIT 10
MATCH (other)-[:WROTE]->(review)-[:REVIEWS]->(listing)
WHERE not((u)-[:WROTE]->()-[:REVIEWS]->(listing))
RETURN listing, [user in collect(DISTINCT other) | user.name] AS users
ORDER BY size(users) DESC
LIMIT 10
"""

user_id = "141581986"
graph.run(collaborative_filtering_query, {"userId": user_id}).to_data_frame()

Unnamed: 0,listing,users
0,"{'bedrooms': 1, 'availability365': 343, 'price': 48.0, 'propertyType': 'Townhouse', 'accommodates': 1, 'name': 'Cute Tiny Room Family Home by LGA NO CLEANING FEE', 'id': '18173787', 'bathrooms': 2}","[Renee, Mary, Hikaru, Donald]"
1,"{'bedrooms': 1, 'availability365': 340, 'price': 50.0, 'propertyType': 'Townhouse', 'accommodates': 2, 'name': 'Comfy Room Family Home LGA Airport NO CLEANING FEE', 'id': '5115372', 'bathrooms': 2}","[Luke, Renee, Mary, Sharon]"
2,"{'bedrooms': 1, 'availability365': 164, 'price': 45.0, 'propertyType': 'House', 'accommodates': 2, 'name': 'Walking distance to LaGuardia pvt room', 'cleaningFee': 10.0, 'id': '11618854', 'bathrooms': 1}","[Ellie, Dawn, Hikaru]"
3,"{'bedrooms': 1, 'availability365': 58, 'price': 40.0, 'weeklyPrice': 250.0, 'propertyType': 'House', 'accommodates': 1, 'name': 'JFK 10 & LGA 15 MINUTES A/C PRIVATE BEDROOM', 'id': '7670562', 'bathrooms': 1}","[Luke, Renee, Hikaru]"
4,"{'bedrooms': 1, 'availability365': 134, 'price': 33.0, 'accommodates': 2, 'propertyType': 'House', 'name': 'Private cozy room near LGA airport', 'cleaningFee': 10.0, 'id': '16324410', 'bathrooms': 1}","[Mary, Dawn]"
5,"{'bedrooms': 1, 'availability365': 168, 'weeklyPrice': 240.0, 'price': 45.0, 'propertyType': 'House', 'accommodates': 3, 'name': 'Only Steps away from LaGuardia arpt', 'cleaningFee': 10.0, 'id': '10186192', 'bathrooms': 1}","[Katie, Ellie]"
6,"{'bedrooms': 1, 'availability365': 132, 'price': 50.0, 'accommodates': 2, 'propertyType': 'House', 'name': 'Close by La Guardia airport', 'cleaningFee': 10.0, 'id': '18855980', 'bathrooms': 1}","[Renee, Ellie]"
7,"{'bedrooms': 1, 'availability365': 125, 'price': 45.0, 'propertyType': 'House', 'accommodates': 2, 'name': 'Private room near LGA Airport with queen bed', 'cleaningFee': 10.0, 'id': '16325899', 'bathrooms': 1}","[Mary, Donald]"
8,"{'bedrooms': 1, 'availability365': 365, 'price': 59.0, 'accommodates': 3, 'propertyType': 'House', 'name': 'PRIVATE BED ROOM 12 MINS FROM JFK', 'cleaningFee': 0.0, 'id': '15328242', 'bathrooms': 1}",[Luke]
9,"{'bedrooms': 1, 'availability365': 145, 'price': 40.0, 'propertyType': 'House', 'accommodates': 3, 'name': 'Spacious private room near LGA airport', 'cleaningFee': 10.0, 'id': '16475570', 'bathrooms': 1}",[Mary]


The previous query considered users to be similar to each other if they've reviewed the same listings, but we could do something more sophisticated.

We want to find the most similar users for each user using a similarity measure (e.g. Jaccard, Cosine, Pearson). Comparing every user with every other user has a complexity of O(n^2), so let's check how many users we have before we do this:

In [7]:
user_query = """
MATCH (u:User)
RETURN count(*)
"""

graph.run(user_query).to_data_frame()

Unnamed: 0,count(*)
0,877779
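
As a quick back-of-the-envelope check on what quadratic cost means at this scale, the number of comparisons can be computed directly from the user count above. The variable names are only for illustration.

```python
user_count = 877_779  # count(*) returned by the query above

# Every user compared with every user (ordered pairs, including self)
ordered_comparisons = user_count ** 2

# Each unordered pair of distinct users counted once
unique_pairs = user_count * (user_count - 1) // 2

print(f"{ordered_comparisons:,}")  # 770,495,972,841
print(f"{unique_pairs:,}")         # 385,247,547,531
```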


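We have more than 800,000 users, so comparing every user with every other user would mean roughly 385 billion unique pairs (about 770 billion comparisons if each ordered pair is counted). That is impractical to compute in one go, so we'll first use label propagation to split the users into clusters of people who have reviewed the same listings, and then compute similarities only within a cluster: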

In [8]:
clustering_query = """
CALL algo.labelPropagation(
  "MATCH (u:User) WITH u SKIP {skip} LIMIT {limit} RETURN id(u) AS id",
  "MATCH (u1:User) WITH u1 SKIP {skip} LIMIT {limit} MATCH (u1:User)-[:WROTE]->()-[:REVIEWS]->()<-[:REVIEWS]-()<-[:WROTE]-(u2)
   return id(u1) AS source, id(u2) AS target, count(*) AS weight", "BOTH",
  {graph: "cypher", batchSize: 100}
)
"""

Each user now has a `partition` property identifying their cluster. Let's look at the ten largest partitions before calculating the similarity of users within a partition:

In [7]:
cluster_query = """
MATCH (u:User)
WHERE exists(u.partition)
RETURN u.partition AS partition, count(*) AS count
ORDER BY count DESC
LIMIT 10
"""

clusters = graph.run(cluster_query).to_table()
clusters

partition,count
89838,17651
68512,16787
159338,12001
44413,10433
143153,9571
97898,6954
54910,6836
71702,5320
126025,5091
139794,4643


In [69]:
user_query = """
MATCH (u:User {id: $userId})
WITH u.partition AS partition, id(u) AS userId
MATCH (u:User {partition: partition})
RETURN partition, count(*), userId
"""

result = graph.run(user_query, {"userId": user_id}).to_table()
partition, _, user_node_id = result[0]
result

partition,count(*),userId
329748,2539,1264506


In [70]:
similarity_query = """
MATCH (u:User {partition: $cluster})
MATCH (u)-[:WROTE]->()-[:REVIEWS]->(l)
WITH {source:id(u), targets: collect(distinct id(l))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard.stream(data, {similarityCutoff: 0.0})
YIELD source1, source2, count1, count2, intersection, similarity
WHERE source1 = $userNodeId or source2 = $userNodeId
RETURN source1, source2, count1, count2, intersection, similarity
ORDER BY similarity DESC
LIMIT 10
"""

result = graph.run(similarity_query, {"cluster": partition, "userNodeId": user_node_id}).to_table()
result

source1,source2,count1,count2,intersection,similarity
1264506,1585260,10,4,3,0.2727272727272727
1247875,1264506,5,10,3,0.25
1188415,1264506,6,10,3,0.2307692307692307
1247895,1264506,3,10,2,0.1818181818181818
1247711,1264506,3,10,2,0.1818181818181818
1188391,1264506,4,10,2,0.1666666666666666
1264506,1406555,10,4,2,0.1666666666666666
1188381,1264506,4,10,2,0.1666666666666666
1188184,1264506,5,10,2,0.1538461538461538
1264506,1390769,10,5,2,0.1538461538461538
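
The `similarity` column can be checked by hand. Jaccard similarity is the size of the intersection divided by the size of the union, so for the first row it is 3 / (10 + 4 - 3) = 3/11, or about 0.27. A small sketch, with made-up listing ids standing in for the real ones:

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical listing ids: 10 for one user, 4 for the other, 3 in common
first_user = set(range(10))
second_user = {7, 8, 9, 10}

similarity = jaccard(first_user, second_user)
print(round(similarity, 4))  # 0.2727
```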


In [71]:
similarity_query = """
MATCH (u:User {partition: $cluster})
MATCH (u)-[:WROTE]->()-[:REVIEWS]->(l)
WITH {source:id(u), targets: collect(distinct id(l))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard(data, {similarityCutoff: 0.2, write: true})
YIELD nodes, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100, similarityPairs 
RETURN nodes, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100, similarityPairs
"""

result = graph.run(similarity_query, {"cluster": partition}).to_table()
result

nodes,min,max,mean,stdDev,p25,p50,p75,p90,p95,p99,p999,p100,similarityPairs
2539,0.1999998092651367,1.0000066757202148,0.7702628198275977,0.2959236800044217,0.5000028610229492,1.0000066757202148,1.0000066757202148,1.0000066757202148,1.0000066757202148,1.0000066757202148,1.0000066757202148,1.0000066757202148,221999


Now we can make some suggestions to Salvador based on the similar people that we've found:

In [78]:
recommendations_query = """
MATCH (u:User {id: $userId})-[:SIMILAR]-(other),
      (other)-[:WROTE]->(review)-[:REVIEWS]->(listing)
WHERE not((u)-[:WROTE]->()-[:REVIEWS]->(listing))
RETURN listing.id, listing.name, listing.propertyType, count(*), collect(DISTINCT other.name) AS people
"""

graph.run(recommendations_query, {"userId": user_id}).to_table()

listing.id,listing.name,listing.propertyType,count(*),people
11618854,Walking distance to LaGuardia pvt room,House,8,"['Dawn', 'Ellie']"
10186192,Only Steps away from LaGuardia arpt,House,4,['Ellie']
16324410,Private cozy room near LGA airport,House,1,['Dawn']
16601841,Great for La Guardia airport guests.,House,1,['Scott']
18855980,Close by La Guardia airport,House,1,['Ellie']
