# Twitter networks: retweet, reply and hashtag

In this notebook we introduce some basic networkx while working with
different types of Twitter networks. The datasets you have access to are smaller subsets of the Danish Twitter dataset. You will all be able to get more data later, but for now we simply want to build different types of networks and demonstrate some very basic ways in which they can help you in your issue analysis.

In [1]:
# import libraries
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline 

# Retweet network

We will import a dataframe of retweets and turn it into a list of edges that networkx can transform into a network. We will then calculate indegree and inspect the most retweeted actors in the corona issue.

We will likewise calculate the outdegree to see who is spreading the word, spamming or sharing.

In [2]:
# Import the retweet edge list. 'user' is the one
# tweeting and 'retweeted_user' is the one being retweeted.
df_retweet_edges = pd.read_csv('retweet_edges_test.csv') 

In [3]:
# pandas to_records transforms a dataframe into a NumPy record array whose rows behave like tuples, which networkx can work with
edges_retweet = df_retweet_edges.to_records(index=False)
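
`to_records` returns a NumPy record array rather than a plain Python list. A minimal sketch of two alternatives, using a small made-up dataframe (the column names `user` and `retweeted_user` follow the comment above; the rows and the `_toy` names are invented for illustration):

```python
import pandas as pd
import networkx as nx

# Toy stand-in for the retweet dataframe (invented rows)
df_toy = pd.DataFrame({
    "user": ["anna", "bo", "carl"],
    "retweeted_user": ["dr2deadline", "dr2deadline", "anna"],
})

# Option 1: plain Python tuples, one per row
edges_toy = list(df_toy.itertuples(index=False, name=None))

# Option 2: let networkx read the dataframe directly
digraph_toy = nx.from_pandas_edgelist(
    df_toy,
    source="user",
    target="retweeted_user",
    create_using=nx.DiGraph,
)
```

Both options should give the same graph as the `to_records` route; which one to use is mostly a matter of taste.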

In [4]:
# Construct a directed graph
digraph_retweet = nx.DiGraph()

In [5]:
# add_edges_from takes a list of (source, target) tuples.
digraph_retweet.add_edges_from(edges_retweet)

In [6]:
# return number of nodes and edges
digraph_retweet.number_of_nodes(), digraph_retweet.number_of_edges()

(257, 197)
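
A `DiGraph` keeps at most one edge per ordered pair of nodes, so if the same account retweets another account several times, `add_edges_from` stores that as a single edge. Whether the file contains such repeats is not shown here. A minimal sketch, on invented rows, of keeping the number of retweets as an edge weight instead:

```python
import pandas as pd
import networkx as nx

# Toy data: anna retweets dr2deadline twice, bo once (invented rows)
df_toy = pd.DataFrame({
    "user": ["anna", "anna", "bo"],
    "retweeted_user": ["dr2deadline", "dr2deadline", "dr2deadline"],
})

# Count how many times each (user, retweeted_user) pair occurs
weights = (
    df_toy.groupby(["user", "retweeted_user"])
    .size()
    .reset_index(name="weight")
)

digraph_weighted = nx.DiGraph()
digraph_weighted.add_weighted_edges_from(
    weights.itertuples(index=False, name=None)
)

# Unweighted indegree counts distinct retweeters,
# weighted indegree counts retweets
distinct_retweeters = digraph_weighted.in_degree("dr2deadline")
retweet_count = digraph_weighted.in_degree("dr2deadline", weight="weight")
```

The two numbers answer different questions: how many accounts retweeted someone, versus how often that person was retweeted in total.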

In [7]:
# Calculate indegree and turn into a dictionary
indegree_retweet = digraph_retweet.in_degree()
indegree_retweet_d = dict(indegree_retweet)

# Turn in_degree into a series
indegree_retweet_s = pd.Series(indegree_retweet_d)

In [8]:
# View most retweeted users for qualitative inspection.
indegree_retweet_s.nlargest(20)

Knutti_ETH         24
juergenzimmerer    20
cekicozlem          9
karolinenoehr       5
RosaLundEl          5
jonasholmdk         4
dr2deadline         4
TaniaGroth          4
jwcph               3
finn_skou           3
kaaretraberg        3
uffeelbaek          3
Susanne_Zimmer_     3
Joedisksamfund      3
pomaEB              3
tv2politik          3
rasmuskongshoej     3
TorbenLl1           3
larskohler          2
OjylPoliti          2
dtype: int64

In [9]:
# Quick description of the indegree distribution.
indegree_retweet_s.describe()

count    257.000000
mean       0.766537
std        2.186600
min        0.000000
25%        0.000000
50%        0.000000
75%        1.000000
max       24.000000
dtype: float64

In [13]:
# Calculate outdegree and turn into a dictionary
outdegree_retweet = digraph_retweet.out_degree()
outdegree_retweet_d = dict(outdegree_retweet)

In [14]:
# Turn out_degree into a series
outdegree_retweet_s = pd.Series(outdegree_retweet_d)

In [15]:
outdegree_retweet_s.nlargest(20)

Politik Papegøjen🦜                          5
marianne stockmarr                          4
Bibsen 🍀🌿🍃🌱                                 4
Tɯitterhjerne                               3
Peter Petersen                              3
Fungo                                       3
E. Jensen 🇩🇰 (https://gab.com/BulgariaDK    3
Peter Barkentin                             2
Mie Kjaer                                   2
S                                           2
turkan balci                                2
Alexander Kjærsgaard                        2
Jonatan Schloss                             2
Annett Rios                                 2
Bríd Conneely                               2
Christian Niepoort                          2
Mads Brandsen                               2
Lise Vestergaard                            2
Pip fra politiker                           2
Kjeld Gaard-Frederiksen                     2
dtype: int64
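
Indegree and outdegree are easier to compare when they sit side by side. A minimal sketch on an invented toy graph (the node names and the `degree_table` name are illustrative, not taken from the dataset):

```python
import pandas as pd
import networkx as nx

# Toy retweet graph: edges point from the retweeter to the retweeted account
digraph_toy = nx.DiGraph()
digraph_toy.add_edges_from([
    ("anna", "dr2deadline"),
    ("bo", "dr2deadline"),
    ("bo", "anna"),
])

# One row per account, one column per degree type
degree_table = pd.DataFrame({
    "indegree": dict(digraph_toy.in_degree()),
    "outdegree": dict(digraph_toy.out_degree()),
}).sort_values("indegree", ascending=False)
```

A table like this makes it quick to spot accounts that are retweeted a lot but rarely retweet others, and the reverse.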

# Reply network

We will import a csv of replies into a dataframe and turn it into a list of edges that networkx can transform into a network. We will then calculate indegree and inspect the most talked-to actors in the corona issue.

We will likewise calculate the outdegree to see who the most dialogical actors are in the Twitter corona issue public.

In [16]:
# Import the reply edge list: one row per reply,
# pointing from the replying user to the user being replied to.
df_reply_edges = pd.read_csv('reply_edges_test.csv')

In [18]:
# Turn the dataframe into tuple-like rows and build a directed reply graph
reply_edges = df_reply_edges.to_records(index=False)
digraph_reply = nx.DiGraph()
digraph_reply.add_edges_from(reply_edges)

In [19]:
# Calculate indegree and turn into a dictionary
indegree_reply = digraph_reply.in_degree()
indegree_reply_d = dict(indegree_reply)
# from dict to series
indegree_reply_s = pd.Series(indegree_reply_d) 
indegree_reply_s.nlargest(100)

cekicozlem         6
JanEJoergensen     5
emilholmcph        3
RunestenConsult    3
KatrineVillar      3
                  ..
MFVMin             1
OleHauris          1
Janus18752879      1
04b11dk            1
pitbulltroldma2    1
Length: 100, dtype: int64
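
Degree alone does not show whether a reply was answered. One way to look at dialogue more directly is to check which replies are reciprocated. A minimal sketch on an invented toy graph (names are illustrative; `nx.reciprocity` is the networkx function for the share of reciprocated edges):

```python
import networkx as nx

digraph_toy = nx.DiGraph()
digraph_toy.add_edges_from([
    ("anna", "bo"),
    ("bo", "anna"),    # anna and bo reply to each other
    ("carl", "anna"),  # carl replies but gets no reply back
])

# Pairs where both accounts have replied to the other
# (u < v keeps each pair only once)
mutual_pairs = sorted(
    (u, v)
    for u, v in digraph_toy.edges()
    if u < v and digraph_toy.has_edge(v, u)
)

# Share of all edges that have an edge going back the other way
share_reciprocated = nx.reciprocity(digraph_toy)
```

In this toy graph two of the three edges belong to a mutual pair, so the reciprocity is two thirds.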

# Hashtag network

We will import a csv of hashtag-user relations into a dataframe and turn it into a list of edges that networkx can transform into a network. Here two hashtags are connected whenever they occur in the same tweet, so the network is undirected and shows which hashtags are used together in the corona issue.

In [27]:
df_hashtags = pd.read_csv('hashtags_test.csv')

In [29]:
# Because each list of hashtags is stored as a string, we use the ast library
# to get Python to recognize that it is in fact a Python list.
import ast
hashtag_list_of_list = [ast.literal_eval(hashtags) for hashtags in df_hashtags['hashtags']]
df_hashtags['hashtags'] = hashtag_list_of_list
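
To see what `ast.literal_eval` does here, a minimal sketch on an invented column (the strings below only imitate how a list column typically looks after a round trip through csv):

```python
import ast
import pandas as pd

# Each cell is a string that looks like a list (invented rows)
df_toy = pd.DataFrame({
    "hashtags": ["['dkpol', 'dkmedier']", "['coronavirus']", "[]"],
})

type_before = type(df_toy["hashtags"][0])  # a string

# literal_eval safely parses the string into a real Python list
df_toy["hashtags"] = df_toy["hashtags"].apply(ast.literal_eval)

type_after = type(df_toy["hashtags"][0])  # a list
```

Unlike `eval`, `ast.literal_eval` only accepts Python literals, so it will not execute arbitrary code that happens to be in the file.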

In [30]:
# We run through our list of co-occurring hashtags and build an edgelist
# where every hashtag is connected to every other within the same tweet.
# We then add each tweet's edgelist to an edgelist for the whole collection of tweets.
 
edge_list = [] # input for hashtag graph

for hashtags in df_hashtags['hashtags']:
    G = nx.complete_graph(hashtags) # makes a fully connected graph out of co-occurring hashtags
    G_edgelist = G.edges() # gives us the edgelist
    edge_list.extend(G_edgelist) # add the edgelist from tweet to our edgelist for the whole collection of tweets
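
The same pairing can be done with `itertools.combinations`, which also makes it easy to count how often two hashtags appear together. A minimal sketch on invented tweets (this is an alternative to the `complete_graph` approach above, not a description of it; all names are illustrative):

```python
from collections import Counter
from itertools import combinations

import networkx as nx

tweets_toy = [
    ["dkpol", "dkmedier", "coronavirus"],
    ["dkpol", "coronavirus"],
    ["dkgreen"],  # a single hashtag yields no pair, so no edge
]

pair_counts = Counter()
for hashtags in tweets_toy:
    # sorted(set(...)) drops repeats within a tweet and
    # gives every pair the same order each time it occurs
    for pair in combinations(sorted(set(hashtags)), 2):
        pair_counts[pair] += 1

# Undirected graph with the co-occurrence count as edge weight
G_toy = nx.Graph()
G_toy.add_weighted_edges_from(
    (u, v, count) for (u, v), count in pair_counts.items()
)
```

Note that a hashtag that only ever appears alone in a tweet never enters an edge list, and therefore never becomes a node when the graph is built from edges only.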

In [31]:
G_hashtag = nx.Graph() # Build non-directed graph
G_hashtag.add_edges_from(edge_list) # add edges from edge_list
G_hashtag.nodes() # view nodes

NodeView(('dkpol', 'dkcivil', 'spejder', 'dkmedier', 'dkgreen', 'dkenergi', 'dkklima', 'IErJoHeltVæk', 'DKpol', 'DKMedier', 'B34', 'Jugement', 'etik', 'coronavirus', 'dkpolitik', 'tætpåsandheden', 'partistøtte', 'STEM', 'dkbiz', 'dkforsk', 'china', 'China_kills_Muslims', 'sundfornift', 'overvågning', 'ansigtsgenkendelse', 'LGBT', 'begravelse', 'Pallywood', 'BesøgAsturias', 'VisitAsturias', 'MadamBlå', 'tRump', 'StopRacisme', 'kæmpforytringsfrihedmensviharden', 'oprørenmulighedmenkunudenvold', 'skolechat', 'whiterussian', '3554borgere', 'moral', '3554BorgereDØDEpgaJobcenterne', 'forretningsmodel'))
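
The node list contains both `dkpol` and `DKpol`, and both `dkmedier` and `DKMedier`, because node names are case-sensitive strings. If these should count as the same hashtag, one option is to lowercase every hashtag before building the graph. A minimal sketch on an invented edge list (whether merging is appropriate depends on the analysis):

```python
import networkx as nx

# Toy edge list with the same hashtags in different casing
edge_list_toy = [
    ("dkpol", "dkmedier"),
    ("DKpol", "DKMedier"),
    ("DKpol", "coronavirus"),
]

# Lowercase both ends of every edge before adding it
G_lower = nx.Graph()
G_lower.add_edges_from((u.lower(), v.lower()) for u, v in edge_list_toy)

# Number of distinct hashtags each hashtag co-occurs with
degree_toy = dict(G_lower.degree())
```

After lowercasing, the two spellings collapse into one node, and its degree reflects all of its co-occurring hashtags.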