## This notebook contains queries for the previously created graph data model

### Problem 1: Find the top five most used sources (app or site) to post or share a tweet. For each source return the source name and the number of tweets sent via that source.
MATCH (s:Source)<-[:USING]-(t:Tweet)  
RETURN s.sourceName, count(t) AS amount  
ORDER BY amount DESC LIMIT 5  

In [2]:

import pandas as pd
p1 = pd.read_csv("Problem1.csv")
p1

| Unnamed: 0 | s.sourceName | amount |
|---|---|---|
| 0 | Twitter for iPhone | 3155 |
| 1 | Twitter Web Client | 1998 |
| 2 | Twitter for Android | 1414 |
| 3 | TweetDeck | 557 |
| 4 | Twitter for iPad | 379 |
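
As a cross-check of what the query above aggregates, here is a minimal Python sketch of the same "count per source, sort descending, keep five" logic. The `using` list is hypothetical toy data standing in for the `(:Source)<-[:USING]-(:Tweet)` matches; it is not taken from the dataset.

```python
from collections import Counter

# Hypothetical (source name, tweet id) pairs, one per USING relationship.
using = [
    ("Twitter for iPhone", 1), ("Twitter Web Client", 2),
    ("Twitter for iPhone", 3), ("TweetDeck", 4),
    ("Twitter for iPhone", 5), ("Twitter Web Client", 6),
]

# One count per tweet, grouped by source name (the count(...) aggregation).
tweets_per_source = Counter(source for source, _tweet in using)

# Equivalent of ORDER BY amount DESC LIMIT 5.
top_sources = tweets_per_source.most_common(5)
print(top_sources)
```

With fewer than five distinct sources, as in this toy data, `most_common(5)` simply returns all of them.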


### Problem 2: Find the top five most used hashtags across all tweets on each day between 26th and 31st March 2016 (inclusive). For each day, return the date and a list of the five top hashtags in order of popularity on that day.
CALL {  
        MATCH (rt:Retweet)-[:TAGS]->(h:Hashtag)  
        WHERE datetime('2016-03-26T00:00:00Z') <= rt.postedTime < datetime('2016-04-01T00:00:00Z')  
        RETURN date(rt.postedTime) AS day,  
        h.hashtag AS tags,  
        count(rt) AS tweets  
        ORDER BY tweets DESC  
}  
RETURN day, collect(tags)[0..5] AS topHashtags  
ORDER BY day

In [3]:
p2 = pd.read_csv("Problem2.csv")
p2

| Unnamed: 0 | day | topHashtags |
|---|---|---|
| 0 | "2016-03-26" | [auspol,timemanagement,socialmedia,Brands,Thre... |
| 1 | "2016-03-27" | [Leo,WHM,STEMfem,BrusselsAttacks,isis] |
| 2 | "2016-03-28" | [fighthunger,lfc,VibeWithUs,AgentsofSHIELD,CTU... |
| 3 | "2016-03-29" | [WomenOfCourage,ISIL,GreatBarrierReef,HighLife... |
| 4 | "2016-03-30" | [5SOSFam,BestFanArmy,iHeartAwards,auspol,Clean... |
| 5 | "2016-03-31" | [BestFanArmy,iHeartAwards,5SOSFam,auspol,ALDUB... |
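
The query's shape (count per day and hashtag, sort by count, then keep the first five per day) can be sketched in plain Python as follows. The `rows` list and the function name `top_hashtags_per_day` are assumptions made for illustration; the half-open date range mirrors the `<=` / `<` bounds used in the `WHERE` clause.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (day, hashtag) rows, one per tagged post.
rows = [
    (date(2016, 3, 26), "auspol"), (date(2016, 3, 26), "auspol"),
    (date(2016, 3, 26), "socialmedia"),
    (date(2016, 3, 27), "Leo"), (date(2016, 3, 27), "WHM"),
    (date(2016, 3, 27), "Leo"),
    (date(2016, 4, 1), "outOfRange"),  # excluded: on the upper bound
]

def top_hashtags_per_day(rows, start, end, n=5):
    """Return {day: [hashtags by descending count]} for start <= day < end."""
    counts = defaultdict(Counter)
    for day, tag in rows:
        if start <= day < end:
            counts[day][tag] += 1
    # Days in ascending order; within a day, the n most frequent hashtags.
    return {
        day: [tag for tag, _ in counter.most_common(n)]
        for day, counter in sorted(counts.items())
    }

result = top_hashtags_per_day(rows, date(2016, 3, 26), date(2016, 4, 1))
print(result)
```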


### Problem 3: Find all users that use any of the same hashtags as user "m_mrezamm". This query must exclude any retweets since these posts would automatically contain common tags. The query must return the user name and the number of hashtags that were used in their tweets that are also used by "m_mrezamm". Order results by the number of hashtags used in common.
MATCH (u1:User {twitterName: 'm_mrezamm'})-[:POSTS]->(t:Tweet)-[:TAGS]->(h:Hashtag)  
MATCH (h)<-[:TAGS]-(t1:Tweet)<-[:POSTS]-(other:User)  
WHERE other.twitterName <> 'm_mrezamm'  
WITH other, collect(h) AS tags  
RETURN other.twitterName AS otherUser, size(tags) AS commonTags  
ORDER BY commonTags DESC

In [4]:
p3 = pd.read_csv("Problem3.csv")
p3

| Unnamed: 0 | otherUser | commonTags |
|---|---|---|
| 0 | bhardost | 8 |
| 1 | Hadi_IraniAsl | 4 |
| 2 | chiniejamaican | 1 |
| 3 | Protest_Safely | 1 |
| 4 | rejivohedofe | 1 |
| 5 | Shakor_Kakavand | 1 |
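
One point worth checking in the query above: `collect(h)` keeps one entry per matched path, so a hashtag shared across several tweets may be counted more than once. If the intent is the number of distinct hashtags in common, `collect(DISTINCT h)` or `count(DISTINCT h)` would give that instead. The sketch below shows the distinct-count version in Python; the rows, user names, and tag names are hypothetical and only illustrate the logic.

```python
# Hypothetical (user, tweet id, hashtag) rows for original tweets only.
rows = [
    ("target_user", 1, "tagX"), ("target_user", 2, "tagX"),
    ("target_user", 2, "tagY"),
    ("user_a", 10, "tagX"), ("user_a", 11, "tagX"), ("user_a", 11, "tagY"),
    ("user_b", 20, "tagY"), ("user_b", 20, "tagZ"),
]

def common_hashtags(rows, target):
    """Count distinct hashtags each other user shares with `target`."""
    target_tags = {tag for user, _tid, tag in rows if user == target}
    shared = {}
    for user, _tid, tag in rows:
        if user != target and tag in target_tags:
            # A set removes repeats of the same hashtag across tweets.
            shared.setdefault(user, set()).add(tag)
    counts = {user: len(tags) for user, tags in shared.items()}
    # Most hashtags in common first; ties broken by user name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

result = common_hashtags(rows, "target_user")
print(result)
```

Here `user_a` uses `tagX` in two tweets but is still credited with two common hashtags, not three.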


### Problem 4: The original dataset does not contain information about which users follow each other. For this exercise we will infer that any user that MENTIONS another user in a tweet FOLLOWS that user. Write a Cypher expression which creates FOLLOWS relationships based on this assumption. For example, if UserA mentions UserB in a tweet, then it is assumed that UserA FOLLOWS UserB. Each FOLLOWS relationship added to the graph should also have a ‘weight’ property with a value of 1.
MATCH (u:User)-[:POSTS]->(t:Tweet)-[:MENTIONS]->(m:mentionedUser)  
MERGE (u)-[:FOLLOWS {weight: 1}]->(m)
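
Because `MERGE` only creates the relationship when no matching one exists, a user who mentions the same account in several tweets still ends up with a single FOLLOWS relationship of weight 1. A small Python sketch of that behaviour, using hypothetical mention pairs:

```python
# Hypothetical (author, mentioned user) pairs, one per mention in a tweet.
mentions = [
    ("UserA", "UserB"),
    ("UserA", "UserB"),  # repeated mention, must not create a second edge
    ("UserC", "UserB"),
    ("UserB", "UserA"),
]

follows = {}
for author, mentioned in mentions:
    # setdefault mirrors MERGE here: create the edge once,
    # and leave it untouched if it already exists.
    follows.setdefault((author, mentioned), {"weight": 1})

print(follows)
```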

### Problem 5: Using the FOLLOWS relationship derived in Problem 4, use the Neo4j Graph Data Science library to calculate the most popular nodes using Degree Centrality from the FOLLOWS subgraph. Then, find the top five users with the highest Degree Centrality score. (Consider using NATURAL orientation and the weight property for the graph projection.)
#### Graph Projection
CALL gds.graph.project(  
  'centralityGraph',  
  ['User','mentionedUser'],  
  {  
    FOLLOWS: {  
      orientation: 'NATURAL',  
      properties: ['weight']  
    }})  
#### Degree Centrality
CALL gds.degree.stream('centralityGraph',  
    {relationshipWeightProperty: 'weight',  
    orientation: 'REVERSE'}  
    )  
YIELD nodeId, score  
RETURN gds.util.asNode(nodeId).twitterName AS user, score AS weightedScore  
ORDER BY weightedScore DESC, user LIMIT 5

In [5]:
p5 = pd.read_csv("Problem5.csv")
p5

| Unnamed: 0 | user | weightedScore |
|---|---|---|
| 0 | YouTube | 91.0 |
| 1 | BernieSanders | 84.0 |
| 2 | 5SOS | 53.0 |
| 3 | Tha5SOSFamily | 39.0 |
| 4 | HillaryClinton | 37.0 |
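
With the graph projected in NATURAL orientation, running degree centrality with `orientation: 'REVERSE'` scores each node by its incoming FOLLOWS relationships, so the score reflects how many users follow it. Since every weight is 1, the weighted score equals the follower count. A rough Python equivalent on hypothetical edges (the function name `weighted_in_degree` is illustrative, not part of GDS):

```python
from collections import defaultdict

# Hypothetical FOLLOWS edges: (follower, followed, weight).
follows = [
    ("UserA", "UserB", 1),
    ("UserC", "UserB", 1),
    ("UserB", "UserA", 1),
]

def weighted_in_degree(edges):
    """Sum the weights of incoming edges per node (nodes with none are omitted)."""
    scores = defaultdict(float)
    for _follower, followed, weight in edges:
        scores[followed] += weight
    return dict(scores)

scores = weighted_in_degree(follows)

# Equivalent of ORDER BY weightedScore DESC, user LIMIT 5.
top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:5]
print(top)
```

Unlike this sketch, the GDS procedure also returns nodes with no incoming relationships, with a score of 0.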
