# Twitter Graph

En este ejercicio vamos a modelar la red social de Twitter en Neo4j.

* Un usuario genera tweets, por lo tanto es su autor.
* Los tweets contienen un texto y este texto puede tener hashtags.
* Un usuario puede mencionar a otro usuario en un tweet.
* Un usuario puede retweetear un tweet de otro usuario en un nuevo tweet.

El grafo que quermos generar es el siguiente:

![png](../images/neo4j/twitter1.png)

Antes de empezar con el ejercico, vamos a importar las librerías necesarias para trabajar sobre Neo4j

In [4]:
%load_ext cypher

The cypher extension is already loaded. To reload it, use:
  %reload_ext cypher


Como hacemos siempre, borramos todos los nodos y relaciones que existen en la base de datos para partir de un entorno limpio.

In [6]:
%%cypher  http://neo4j:1234@127.0.0.1:7474/db/data
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

17 nodes deleted.
21 relationship deleted.


Antes de empezar a insertar nodos y relaciones, queremos crear una serie de ídices y constraints

### Ejercicio1: La propiedad id de los nodos etiquetados como Tweet debe ser único:

In [8]:
%%cypher http://neo4j:1234@127.0.0.1:7474/db/data
CREATE CONSTRAINT ON (t:Tweet) ASSERT t.id IS UNIQUE

1 constraints added.


### Ejercicio2: La propiedad username de los nodos etiquetados como User debe ser único.

In [9]:
%%cypher http://neo4j:1234@127.0.0.1:7474/db/data
CREATE CONSTRAINT ON (u:User) ASSERT u.username IS UNIQUE

1 constraints added.


### Ejercicio3: La propiedad hashtag de los nodos etiquetados como HashTag debe ser único.

In [24]:
%%cypher http://neo4j:1234@127.0.0.1:7474/db/data
CREATE CONSTRAINT ON (h:HashTag) ASSERT h.hashtag IS UNIQUE

1 constraints added.


Antes de realizar las búsquedas, vamos a isnertar unos cuantos datos en el grafo con la estructura que hemos definido.

Para trabajar con neo4j desde python vamos a utilizar la librerúa py2neo. Puedes encontrar la documentación en su página web: https://py2neo.org/2.0/index.html

In [32]:
# Como instalar py2neo
#!pip install pprintpp
#!pip install py2neo

In [11]:
from pprintpp import pprint as pp


In [38]:
from py2neo import Graph, Relationship, Node
import json

# Crea una conexión a la base de datos. Le pasamos la URI en formato JDBC con usuario y contraseña.
graph = Graph("http://neo4j:1234@127.0.0.1:7474/db/data")

In [39]:
def add_property(obj, json, name):
    if name in json:
        obj[name] = json[name]

In [44]:
def parse_user(user_json):
    #
    
    #user = graph.create(Node("User", username = user_json['screen_name']))
    user = graph.merge("User", username = user_json['screen_name'])
    
    add_property(user, user_json, 'created_at')
    add_property(user, user_json, 'description')
    add_property(user, user_json, 'favourites_count')
    add_property(user, user_json, 'followers_count')
    add_property(user, user_json, 'friends_count')
    add_property(user, user_json, 'statuses_count')
    add_property(user, user_json, 'time_zone')
    add_property(user, user_json, 'name')
    add_property(user, user_json, 'profile_image_url')
    
    user.push()
    return user

In [45]:
def parse_tweet(tweet_json):
    user = parse_user(tweet_json['user'])
    
    tweet = graph.merge("Tweet", "id", tweet_json['id'])
    add_property(tweet, tweet_json, 'created_at')
    add_property(tweet, tweet_json, 'lang')
    add_property(tweet, tweet_json, 'retweet_count')
    add_property(tweet, tweet_json, 'source')
    add_property(tweet, tweet_json, 'text')
    
    tweet.push()
    
    user_tweeted_tweet = Relationship(user, "TWEETED", tweet)
    graph.create_unique(user_tweeted_tweet)
    
    if 'user_mentions' in tweet_json:
        for user_mention_json in tweet_json['user_mentions']:
            user_mencioned = parse_user(user_mention_json)
            tweet_mencioned_user = Relationship(tweet, "MENCIONED", user_mencioned)
            graph.create_unique(tweet_mencioned_user)
 
    if 'entities' in tweet_json:
        for entity in tweet_json['entities']:
            hashtag = graph.merge_one("HashTag", "hashtag", entity)
            tweet_HashTag_hashtag = Relationship(tweet, "HASHTAG", hashtag)
            graph.create_unique(tweet_HashTag_hashtag)

    if 'retweeted_status' in tweet_json:
        user_retweeted = parse_user(tweet_json['retweeted_status']['user'])
        tweet_retweetOf_user = Relationship(tweet, "RETWEET_OF", user_retweeted)
        graph.create_unique(tweet_retweetOf_user)
        
        parse_tweet(tweet_json['retweeted_status'])

In [46]:
def load_file(tweets_data_path):
    tweets_file = open(tweets_data_path, "r")
    for tweet in tweets_file:
        parse_tweet(json.loads(tweet))

In [47]:
load_file('../data/mongoDB/tweets.json')

TypeError: merge() got an unexpected keyword argument 'username'

In [None]:
%%cypher http://neo4j:1234@127.0.0.1:7474/db/data
MATCH p = ((u:User {username : 'couchbase'})-[r:TWEETED]->t)
RETURN u.username, t.text, type(r)
LIMIT 10

In [None]:
%%cypher http://neo4j:1234@127.0.0.1:7474/db/data
match (n:HashTag)-[r]-() 
return n.hashtag, count(r) as degree 
order by degree desc
limit 10

In [None]:
%matplotlib inline

In [None]:
results = %%cypher match (n:HashTag)-[r]-()  return n.hashtag as HashTag, count(r) as Degree order by Degree desc limit 10

In [None]:
results.get_dataframe()

In [None]:
results.pie()

In [None]:
results.plot()

In [None]:
results.bar()

In [None]:
results = %cypher match (n)-[r]-() return n, r limit 10
results.draw()

In [None]:
from py2neo import Graph
graph = Graph()
cypher = graph.cypher

In [None]:
query = """
    MATCH (h:HashTag)<-[:HASHTAG]-(:Tweet)-[:HASHTAG]->(HashTag {hashtag:"neo4j"}) 
    WHERE h.hashtag <> "neo4j"
    RETURN h.hashtag AS hashtag, COUNT(*) AS count
    ORDER BY count DESC
    LIMIT 10
"""

results = cypher.execute(query )
print results



In [None]:
results = cypher.execute(
"""
MATCH (u:User)
WHERE exists(u.followers_count)
return distinct u.username, u.followers_count
order by u.followers_count DESC LIMIT 10
""")

print results

In [None]:
type(results)

In [None]:
%%cypher http://neo4j:1234@127.0.0.1:7474/db/data
MATCH (u:User)
WHERE exists(u.followers_count)
return distinct u.username, u.followers_count
order by u.followers_count DESC LIMIT 10

In [None]:
%%cypher http://neo4j:1234@127.0.0.1:7474/db/data
MATCH n
return distinct labels(n)

In [None]:

result = %cypher MATCH (hashtag:HashTag)<-[:HASHTAG]-(tweet:Tweet) \
                 RETURN hashtag.name AS hashtag, COUNT(tweet) AS tweets \
                 ORDER BY tweets DESC LIMIT 5
        
df = result.get_dataframe()
df.head()