## Graph Based Recommendations With Neo4j, NLP, and Python Data Science Tools

In [1]:
!pip install py2neo
!pip install python-igraph
!pip install textblob


## Import data

We'll use [py2neo](http://py2neo.org/v3/), one of the Python drivers for Neo4j.

To connect to our Neo4j server we'll need to make note of the host, port, username and password.

If we're using [Neo4j Sandbox](http://neo4j.com/sandbox) we can grab these details from the "Details" tab of our running sandbox:

![](img/sandboxcredentials.png)

In [1]:
# Import py2neo and connect to Neo4j
from py2neo import Graph

# just an example, replace with credentials for your own Neo4j instance
graph = Graph(bolt=False, host="54.164.111.140", http_port=32894, user='neo4j', password='subprogram-sidewalk-flame')
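
Hardcoding credentials in a notebook makes them easy to leak. As a minimal sketch, the same keyword arguments used in the cell above could be collected from environment variables instead. The variable names (`NEO4J_HOST`, `NEO4J_HTTP_PORT`, `NEO4J_USER`, `NEO4J_PASSWORD`) and the helper `neo4j_settings` are assumptions made for illustration, not part of py2neo.

```python
import os

def neo4j_settings(env=os.environ):
    """Build the keyword arguments for py2neo's Graph(...) from a mapping.

    The environment variable names are hypothetical; pick whatever
    convention suits your deployment.
    """
    return {
        "bolt": False,  # same HTTP transport as the cell above
        "host": env.get("NEO4J_HOST", "localhost"),
        "http_port": int(env.get("NEO4J_HTTP_PORT", "7474")),  # 7474 is Neo4j's default HTTP port
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env["NEO4J_PASSWORD"],  # no default: raise KeyError if unset
    }

# Usage, once NEO4J_PASSWORD is set in the environment:
# graph = Graph(**neo4j_settings())
```

Passing the mapping as a parameter keeps the helper easy to try out with a plain dict before pointing it at the real environment.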

In [None]:
# Hello world, sanity check
graph.run("MATCH (a) RETURN COUNT(a) AS numberOfNodes").evaluate()

### Import Groups and Topics
![](http://guides.neo4j.com/bostonmeetup/img/group_has_topic.png)

In [None]:
graph.run("CREATE CONSTRAINT ON (g:Group) ASSERT g.id IS UNIQUE;")

In [None]:
graph.run("CREATE CONSTRAINT ON (t:Topic) ASSERT t.id IS UNIQUE;")

In [None]:
graph.run("CREATE INDEX ON :Group(name)")

In [None]:
graph.run("CREATE INDEX ON :Topic(name)")

In [None]:
graph.run('''
LOAD CSV WITH HEADERS
FROM "https://raw.githubusercontent.com/johnymontana/harvard-bar/master/data/groups.csv"
AS row
MERGE (group:Group { id:row.id })
ON CREATE SET
  group.name = row.name,
  group.urlname = row.urlname,
  group.rating = toInt(row.rating),
  group.created = toInt(row.created)
''')

In [None]:
graph.run('''
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/johnymontana/harvard-bar/master/data/groups_topics.csv"  AS row
MERGE (topic:Topic {id: row.id})
ON CREATE SET topic.name = row.name, topic.urlkey = row.urlkey
''')

In [None]:
graph.run('''
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/johnymontana/harvard-bar/master/data/groups_topics.csv"  AS row
MATCH (topic:Topic {id: row.id})
MATCH (group:Group {id: row.groupId})
MERGE (group)-[:HAS_TOPIC]->(topic)
''')

### Find similar groups to Graph Database - Austin
By looking at topics, can we find groups that share topics with Graph Database - Austin?

In [None]:
result = graph.run('''
MATCH (group:Group)-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE group.name CONTAINS "Graph Database"
RETURN otherGroup.name, COUNT(topic) AS topicsInCommon,
       COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
''')

for row in result:
    print(row)
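
To make the logic of the query above concrete, here is a rough pure-Python equivalent running on a small in-memory mapping. The group and topic names are made-up sample data, and `similar_groups` is a hypothetical helper. It matches the target group by exact name, whereas the Cypher query uses `CONTAINS`.

```python
# Hypothetical toy data standing in for (:Group)-[:HAS_TOPIC]->(:Topic)
group_topics = {
    "Graph Database - Austin": {"Graph Databases", "Neo4j", "NoSQL"},
    "Austin Data Geeks": {"NoSQL", "Big Data", "Graph Databases"},
    "Austin Python Meetup": {"Python", "Big Data"},
    "NoSQL Austin": {"NoSQL"},
}

def similar_groups(target, group_topics, limit=10):
    """Rank other groups by the number of topics they share with `target`."""
    rows = []
    for other, topics in group_topics.items():
        if other == target:
            continue
        common = sorted(group_topics[target] & topics)
        if common:  # groups with no shared topic never match the pattern
            rows.append((other, len(common), common))
    # Mirror ORDER BY topicsInCommon DESC, otherGroup.name
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:limit]

for row in similar_groups("Graph Database - Austin", group_topics):
    print(row)
```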

## Topic Similarity

Using community detection to find similar topics

![](https://camo.githubusercontent.com/0054c52996fdd9bb82456406cb867bdb3985d14c/687474703a2f2f7777772e6c796f6e776a2e636f6d2f7075626c69632f696d672f636f6d6d756e6974792d312e706e67)

In [None]:
from igraph import Graph as IGraph

Find all pairs of topics and count the groups that have both topics in each pair. We'll use this count as the weight to build a "virtual graph" of the form `(Topic)-[:OCCURS_WITH {weight}]-(Topic)`

In [None]:

query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
ORDER BY weight DESC
LIMIT 10
"""

result = graph.run(query)
for row in result:
    print(row)
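
The weight computed by the query above is a plain co-occurrence count. As a sketch on made-up data (the group ids and the helper name `cooccurrence_weights` are illustrative assumptions), the same numbers can be produced in pure Python:

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: group id -> topics attached to that group
group_topics = {
    "g1": ["Neo4j", "NoSQL", "Graph Databases"],
    "g2": ["NoSQL", "Graph Databases"],
    "g3": ["Python", "NoSQL"],
}

def cooccurrence_weights(group_topics):
    """Count, for every unordered topic pair, the groups that carry both topics."""
    weights = Counter()
    for topics in group_topics.values():
        # Sorting before combinations() yields each unordered pair exactly once,
        # which plays the role of WHERE ID(topic) < ID(other) in the Cypher query.
        for a, b in combinations(sorted(set(topics)), 2):
            weights[(a, b)] += 1
    return weights

weights = cooccurrence_weights(group_topics)
for (a, b), w in weights.most_common(10):
    print(a, b, w)
```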


Now let's run this query again and build an igraph instance from the results:

In [None]:
query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
"""

ig = IGraph.TupleList(graph.run(query), weights=True)
ig

Now we'll run the Walktrap community detection algorithm to find clusters / communities:

In [None]:
clusters = ig.community_walktrap(weights="weight")
clusters = clusters.as_clustering()
len(clusters)
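
When `as_clustering()` is called without a cluster count, igraph's documentation describes the dendrogram as being cut at the level that maximizes modularity. To give a feel for what that score measures, here is a small pure-Python sketch of weighted modularity for a given partition. The function name, the edge-list format and the toy graph are assumptions for illustration; this is not igraph's implementation.

```python
def modularity(edges, membership):
    """Weighted modularity of a partition of an undirected graph.

    edges: list of (u, v, weight) tuples, no self-loops
    membership: dict mapping each vertex to its cluster id
    """
    total = sum(w for _, _, w in edges)  # total edge weight m
    internal = {}  # cluster -> weight of edges with both ends inside it
    strength = {}  # cluster -> summed weighted degree of its vertices
    for u, v, w in edges:
        cu, cv = membership[u], membership[v]
        strength[cu] = strength.get(cu, 0) + w
        strength[cv] = strength.get(cv, 0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + w
    # Q = sum over clusters of (internal / m) - (strength / 2m)^2
    return sum(
        internal.get(c, 0) / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )

# Two triangles joined by a single edge (hypothetical toy graph)
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {v: 0 for v in split}

print(modularity(edges, split))   # higher: the two triangles are real communities
print(modularity(edges, merged))  # 0.0: one big cluster is no better than chance
```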

Let's inspect the results:

In [None]:

nodes = [node["name"] for node in ig.vs]
nodes = [{"id": x, "label": x} for x in nodes]
nodes[:5]

for node in nodes:
    idx = ig.vs.find(name=node["id"]).index
    node["group"] = clusters.membership[idx]
    
nodes[:5]
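
A flat list of nodes is hard to eyeball. One way to inspect the result, sketched here on made-up records in the same shape as `nodes`, is to group topic labels by cluster id. The sample data and the helper name `topics_by_cluster` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sample with the same keys as the `nodes` dictionaries above
sample_nodes = [
    {"id": "Neo4j", "label": "Neo4j", "group": 0},
    {"id": "Python", "label": "Python", "group": 1},
    {"id": "Graph Databases", "label": "Graph Databases", "group": 0},
    {"id": "Data Science", "label": "Data Science", "group": 1},
]

def topics_by_cluster(nodes):
    """Return {cluster id: sorted list of topic labels}."""
    grouped = defaultdict(list)
    for node in nodes:
        grouped[node["group"]].append(node["label"])
    return {cluster: sorted(labels) for cluster, labels in grouped.items()}

for cluster, labels in sorted(topics_by_cluster(sample_nodes).items()):
    print(cluster, labels)
```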

Now we'll write the results back to Neo4j, extending our graph model:
![](http://guides.neo4j.com/bostonmeetup/img/cluster_datamodel.png)

In [None]:
query = """
UNWIND {params} AS p 
MATCH (t:Topic {name: p.id}) 
MERGE (cluster:Cluster {name: p.group})
MERGE (t)-[:IN_CLUSTER]->(cluster)
"""

graph.run(query, params = nodes)

We can see which clusters the Python-related topics end up in:
![](http://guides.neo4j.com/bostonmeetup/img/python_cluster.png)

In [None]:
graph.run('''
MATCH (cluster:Cluster)<-[inCluster:IN_CLUSTER]-(topic)
WHERE topic.name CONTAINS "Python"
RETURN *
''')

## My Similar Groups

We need to add Member data in order to build more relevant
recommendations:
![](http://guides.neo4j.com/bostonmeetup/img/group_has_topic_member_of.png)

In [None]:
graph.run('''
CREATE CONSTRAINT ON (m:Member)
ASSERT m.id IS UNIQUE''')

In [None]:
graph.run('''
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS
FROM "https://raw.githubusercontent.com/johnymontana/harvard-bar/master/data/members.csv" AS row
WITH DISTINCT row.id AS id, row.name AS name
MERGE (member:Member {id: id})
ON CREATE SET member.name = name
''')

In [None]:
graph.run('''
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS
FROM "https://raw.githubusercontent.com/johnymontana/harvard-bar/master/data/members.csv" AS row
WITH row WHERE NOT row.joined is null
MATCH (member:Member {id: row.id})
MATCH (group:Group {id: row.groupId})
MERGE (member)-[membership:MEMBER_OF]->(group)
ON CREATE SET membership.joined=toInt(row.joined);
''')

In [None]:
graph.run('''
MATCH (member:Member)-[membership:MEMBER_OF]->(group)
RETURN member, group, membership
LIMIT 10
''')

![](http://guides.neo4j.com/bostonmeetup/img/group_members.png)

In [None]:
graph.run("CREATE INDEX ON :Member(name)")

### Find my similar groups

In [None]:
results = graph.run('''MATCH (member:Member {name: "Will Lyon"})-[:MEMBER_OF]->()-[:HAS_TOPIC]->()<-[:HAS_TOPIC]-(otherGroup:Group)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name,
       COUNT(*) AS topicsInCommon
ORDER BY topicsInCommon DESC
LIMIT 10''')

for row in results:
    print(row)
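
The query above walks from a member to their groups, on to those groups' topics, and back out to other groups. A rough pure-Python rendering of the same traversal, on made-up data, might look like this. The sample memberships, group names and the helper `recommend_groups` are illustrative assumptions. As with `COUNT(*)` in Cypher, a topic reached through two of the member's groups is counted twice.

```python
from collections import Counter

# Hypothetical sample data
memberships = {"Will Lyon": {"Graph Database - Austin", "Austin Python Meetup"}}
group_topics = {
    "Graph Database - Austin": {"Neo4j", "NoSQL"},
    "Austin Python Meetup": {"Python", "Big Data"},
    "Austin Data Geeks": {"NoSQL", "Big Data", "Python"},
    "NoSQL Austin": {"NoSQL"},
}

def recommend_groups(member, memberships, group_topics, limit=10):
    """Rank groups the member has not joined by topic paths from their own groups."""
    mine = memberships[member]
    counts = Counter()
    for my_group in mine:
        for topic in group_topics[my_group]:
            for other, topics in group_topics.items():
                # WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
                if other not in mine and topic in topics:
                    counts[other] += 1
    return counts.most_common(limit)

for row in recommend_groups("Will Lyon", memberships, group_topics):
    print(row)
```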

## Events
![](http://guides.neo4j.com/bostonmeetup/img/event_datamodel.png)

In [None]:
graph.run("CREATE CONSTRAINT ON (e:Event) ASSERT e.id IS UNIQUE")

In [None]:
graph.run("CREATE INDEX ON :Event(time)")

In [None]:
graph.run('''USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/johnymontana/harvard-bar/master/data/events.csv" AS row
MERGE (event:Event {id: row.id})
ON CREATE SET event.name = row.name,
              event.description = row.description,
              event.time = toInt(row.time),
              event.utcOffset = toInt(row.utc_offset)
''')
              

In [None]:
graph.run('''
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/johnymontana/harvard-bar/master/data/events.csv" AS row

WITH distinct row.group_id as groupId, row.id as eventId
MATCH (group:Group {id: groupId})
MATCH (event:Event {id: eventId})
MERGE (group)-[:HOSTED_EVENT]->(event)
''')

In [None]:
graph.run('''
MATCH (group:Group)-[hosted:HOSTED_EVENT]->(event)
WHERE group.name CONTAINS "Graph Database" AND event.time < timestamp()
RETURN event, group, hosted
ORDER BY event.time DESC
LIMIT 10
''')

### Adding Member RSVPs
TODO: add datamodel image

In [None]:
graph.run('''
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///rsvps.csv" AS row
WITH row WHERE row.response = "yes"

MATCH (member:Member {id: row.member_id})
MATCH (event:Event {id: row.event_id})
MERGE (member)-[rsvp:RSVPD {id: row.rsvp_id}]->(event)
ON CREATE SET rsvp.created = toInt(row.created),
              rsvp.lastModified = toInt(row.mtime),
              rsvp.guests = toInt(row.guests)
''')

![](http://guides.neo4j.com/bostonmeetup/img/graph_database_events.png)

### Extracting keywords from event descriptions
Note that we have topics for groups, but not for Events. We can use some NLP techniques to extract keywords from event descriptions and extend our datamodel to take those keywords into account in our recommendation queries.

![](http://guides.neo4j.com/bostonmeetup/img/keyword_datamodel.png)

In [None]:
from textblob import TextBlob

In [None]:
# fetch one event
desc = graph.run("MATCH (e:Event) WHERE e.description IS NOT null WITH e, rand() AS r ORDER BY r RETURN e.description LIMIT 1").evaluate()
desc

In [None]:
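# Helper function for stripping HTML
from html.parser import HTMLParser  # Python 3 module; Python 2 used `from HTMLParser import HTMLParser`
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()  # sets up parser state; reset() alone is not enough on Python 3
        self.fed = []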
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)
    
def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

In [None]:
# extract keywords with TextBlob
blob = TextBlob(strip_tags(desc))

# sentiment analysis
print("Sentiment: ")
print(blob.sentiment.polarity)
print()

# keyword extraction (using noun phrases)
print("Keywords: ")
print(blob.noun_phrases)

In [None]:
graph.run("CREATE CONSTRAINT ON (k:Keyword) ASSERT k.name IS UNIQUE")

In [None]:
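def addKeywords(query):
    # `query` must return e_id and desc columns; each event is tagged with its noun phrases
    result = graph.run(query)
    for row in result:
        if not row['desc']:
            # events without a description have nothing to extract
            continue
        blob = TextBlob(strip_tags(row['desc']))
        kws = blob.noun_phrases
        if kws:
            
            p = {
                'kws': kws,
                'e_id': str(row['e_id'])
            }
            print(p)
            
            # separate name so the `query` argument is not shadowed
            merge_query = '''
            WITH {kws} AS kws
                MATCH (e:Event) WHERE e.id = {e_id}
                UNWIND kws AS kw
                MERGE (k:Keyword {name: kw})
                MERGE (e)-[:HAS_TAG]->(k)
            '''
            
            graph.run(merge_query, parameters = p)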


In [None]:
addKeywords('''
        MATCH (e:Event)<-[:HOSTED_EVENT]-(g:Group {name: "Graph Database - Austin"})
        RETURN e.id AS e_id, e.description AS desc
        ''')

In [None]:
addKeywords('''
        MATCH (e:Event) WHERE e.description IS NOT NULL AND NOT exists((e)-[:HAS_TAG]->(:Keyword))
        WITH e, rand() AS r ORDER BY r SKIP 0 LIMIT 100 
        RETURN e.id AS e_id, e.description AS desc
        ''')
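
Once events carry `HAS_TAG` keywords, those keywords can feed into recommendations in the same way topics did for groups. The following is only one possible approach, sketched in pure Python on made-up data. The event ids, keyword sets and the helper `similar_events` are assumptions for illustration, not part of the notebook's data model.

```python
# Hypothetical sample: event id -> keywords attached via HAS_TAG
event_keywords = {
    "e1": {"graph databases", "neo4j", "cypher"},
    "e2": {"neo4j", "cypher", "python"},
    "e3": {"pizza", "networking"},
    "e4": {"python", "pandas"},
}

def similar_events(event_id, event_keywords, limit=10):
    """Rank other events by the number of keywords shared with `event_id`."""
    target = event_keywords[event_id]
    scored = []
    for other, kws in event_keywords.items():
        shared = target & kws
        if other != event_id and shared:
            scored.append((other, len(shared)))
    # Most shared keywords first, ties broken by event id
    scored.sort(key=lambda r: (-r[1], r[0]))
    return scored[:limit]

for row in similar_events("e2", event_keywords):
    print(row)
```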