# Network measures in networkx

In this notebook we will calculate some relevant network measures on a Twitter retweet network, especially centrality measures. We will focus on the actor level; on Tuesday (28/4) we will concentrate on calculating features of whole networks. You can find more about how networkx calculates centrality [here](https://networkx.github.io/documentation/networkx-1.10/reference/algorithms.centrality.html).  

 
We will:

0. Introduce the Twitter network data and the choices made 
1. Import edgelist and construct a directed retweet network
2. Calculate and inspect patterns of outdegree centrality
3. Calculate and inspect patterns of indegree centrality
4. Calculate and inspect patterns of betweenness centrality
5. Calculate and inspect patterns of closeness centrality
6. Calculate and inspect patterns of ego-network density
7. Export directed graph to Gephi


## 0. Introduce the Twitter network data and the choices made

In the Digital Methods OneDrive folder there is a retweet network, a reply network and soon a hashtag network (not ready).

**Reply and retweet network construction**
1. The network boundaries are drawn with reference to activity, namely taking part in the Twitter public around the corona issue. To ensure this, every tweet was matched against a list of corona issue terms, and only tweets that contained one or more of these terms became part of the dataset.  

2. For a user to be part of the network, the user had to tweet about the corona issue more than 7 times.

3. All actors in both the reply and retweet networks have to have either outgoing or incoming edges. In other words, they have to either be retweeted/replied to or retweet/reply. 
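
The selection rules above can be sketched roughly as follows. This is only an illustration on invented data: the column names `user` and `text`, the term list, and the lowered threshold are assumptions, not the actual pipeline used to build the course dataset.

```python
import pandas as pd
import networkx as nx

# Toy tweet table; the column names and contents are illustrative assumptions.
tweets = pd.DataFrame({
    "user": ["a", "a", "a", "b", "c", "c"],
    "text": ["corona update", "covid19 news", "lunch",
             "corona", "weather", "covid19"],
})
issue_terms = ["corona", "covid19"]  # assumed stand-in for the real term list
threshold = 1                        # the notebook uses 7; lowered for the toy data

# Rule 1: keep only tweets that contain at least one issue term.
pattern = "|".join(issue_terms)
on_issue = tweets[tweets["text"].str.contains(pattern, case=False)]

# Rule 2: keep users who tweeted about the issue more than `threshold` times.
counts = on_issue["user"].value_counts()
active_users = set(counts[counts > threshold].index)

# Rule 3: drop nodes that have neither incoming nor outgoing edges.
toy_network = nx.DiGraph()
toy_network.add_nodes_from(["a", "b", "c"])
toy_network.add_edge("a", "b")
toy_network.remove_nodes_from(list(nx.isolates(toy_network)))
```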

**Hashtag networks** 

1. All hashtags had to come from tweets about the corona issue, judged by looking at both text and hashtags. 

2. The 1000 most used hashtags, besides the ones used to select the tweets, constitute the nodes of the hashtag network. An edge between two hashtags means that they co-occur in one or more tweets. NOT READY
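
Since the hashtag network is not ready yet, the following is only a sketch of how such a co-occurrence network could be built, not the construction that will actually be used. The hashtag lists, the selection term and the small `top_n` are invented for illustration.

```python
from collections import Counter
from itertools import combinations
import networkx as nx

# One list of hashtags per tweet; invented toy data.
tweet_hashtags = [
    ["corona", "lockdown", "dkpol"],
    ["dkpol", "sundhed", "lockdown"],
    ["dkpol"],
]
selection_terms = {"corona"}  # hashtags used to select tweets are excluded
top_n = 2                     # the notebook describes 1000; lowered for the toy data

# Count in how many tweets each remaining hashtag is used.
usage = Counter(
    tag
    for tags in tweet_hashtags
    for tag in set(tags)
    if tag not in selection_terms
)
top_tags = {tag for tag, _ in usage.most_common(top_n)}

# Co-occurrence is symmetric, so an undirected graph is used here.
hashtag_graph = nx.Graph()
hashtag_graph.add_nodes_from(top_tags)
for tags in tweet_hashtags:
    kept = sorted(set(tags) & top_tags)
    hashtag_graph.add_edges_from(combinations(kept, 2))
```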

In [7]:
# import libraries
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline 

## 1. Import edgelist and construct a directed retweet network

In [8]:
# Import the retweet edge list. 'user' is the one
# tweeting and 'retweeted_user' is the one being retweeted.
directory = '/Users/hjalmarbangcarlsen/OneDrive - Københavns Universitet/digital_methods2020spring/' # remember to change directory
df_retweet_edges = pd.read_csv(directory+'DM2020_corona_twitter_data/retweet_edges_small.csv') 

# pandas to_records converts a dataframe into a NumPy record array, whose rows behave like tuples that networkx can work with
edges_retweet = df_retweet_edges.to_records(index=False)

# Construct a directed graph
digraph_retweet = nx.DiGraph()

# add_edges_from takes an iterable of (source, target) tuples.
digraph_retweet.add_edges_from(edges_retweet)

In [9]:
# Let's check the number of nodes and edges
digraph_retweet.number_of_nodes(), digraph_retweet.number_of_edges()

(3050, 9044)
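
As a small illustration of the construction step, the sketch below builds a directed graph from a toy edge list. The data frame contents are made up; only the column names `user` and `retweeted_user` follow the comment in the cell above. It also shows that a `DiGraph` stores each ordered pair once, so repeated retweets between the same two accounts collapse into a single edge.

```python
import pandas as pd
import networkx as nx

# Invented edge list; each row reads "user retweeted retweeted_user".
toy_edges = pd.DataFrame({
    "user": ["anna", "anna", "bo", "anna"],
    "retweeted_user": ["regering", "politi", "regering", "regering"],
})

# itertuples(index=False, name=None) yields plain (source, target) tuples.
edge_tuples = list(toy_edges.itertuples(index=False, name=None))

toy_graph = nx.DiGraph()
toy_graph.add_edges_from(edge_tuples)

# The duplicate anna -> regering row does not create a second edge.
print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
```

If the number of retweets between two accounts matters, one option would be to count duplicates first and store the count as an edge weight.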

## 2. Calculate and inspect patterns of outdegree centrality


In [10]:
# Calculating out_degree centrality
retweet_out_degree_centrality = nx.out_degree_centrality(digraph_retweet)
pd.Series(retweet_out_degree_centrality).nlargest(10)

Michaela Rousing ❄    0.072483
Politik Papegøjen🦜    0.048213
Tɯitterhjerne         0.037389
Pip fra journalist    0.034438
Rasmus Kongshøj       0.033782
Bibsen 🍀🌿🍃🌱           0.025582
Stine Charlotte       0.023286
Henrik Brinels        0.022302
Gitte K. Persson      0.022302
Brian Martini         0.021974
dtype: float64
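
networkx normalises out-degree centrality by dividing each node's out-degree by n - 1, the largest possible number of neighbours. With the 3050 nodes reported earlier, the top value 0.072483 therefore corresponds to roughly 0.072483 × 3049 ≈ 221 outgoing edges, which, assuming edges run from `user` to `retweeted_user`, means about 221 distinct retweeted accounts. The toy graph below is invented and only checks that relationship.

```python
import networkx as nx

# Invented toy retweet graph: edges point from retweeter to retweeted account.
toy = nx.DiGraph([("anna", "regering"), ("anna", "politi"), ("bo", "regering")])

n = toy.number_of_nodes()
out_centrality = nx.out_degree_centrality(toy)

# Multiplying by (n - 1) recovers the raw out-degree.
raw_out_degree = {node: round(value * (n - 1))
                  for node, value in out_centrality.items()}
```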

## 3. Calculate and inspect patterns of indegree centrality

In [11]:
# Calculating indegree centrality
retweet_in_degree_centrality = nx.in_degree_centrality(digraph_retweet)
pd.Series(retweet_in_degree_centrality).nlargest(10)

regeringDK         0.074779
M_B_Petersen       0.060676
Rigspoliti         0.058052
SSTSundhed         0.056740
mortensode         0.036405
AndersLadekarl     0.036077
LMSTSenderovitz    0.035093
noedhjaelp         0.030830
OleRyborg          0.025910
Heunicke           0.025254
dtype: float64

Indegree centrality only counts a node's direct incoming edges; it ignores global reach and can therefore be driven by local patterns.
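
To make the point about local patterns concrete, the invented toy graph below contains two accounts with identical indegree centrality but different reach. Reach is measured here with `nx.ancestors`, i.e. every account with a directed path to the node; using that as a reach measure is an illustrative choice, not something the notebook itself computes.

```python
import networkx as nx

toy = nx.DiGraph([
    # "local" is retweeted by two accounts that nobody retweets.
    ("l1", "local"), ("l2", "local"),
    # "global" is retweeted by two accounts that are themselves retweeted.
    ("m1", "global"), ("m2", "global"),
    ("f1", "m1"), ("f2", "m2"),
])

in_centrality = nx.in_degree_centrality(toy)

# Accounts with any directed path to the node (direct or indirect retweeters).
reach = {node: len(nx.ancestors(toy, node)) for node in ("local", "global")}
```

Both accounts have the same indegree centrality, yet `global` is reached by twice as many accounts once indirect paths are counted.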

## 4. Calculate and inspect patterns of betweenness centrality

In [12]:
# Calculating betweenness centrality
retweet_betweenness_centrality = nx.betweenness_centrality(digraph_retweet)
pd.Series(retweet_betweenness_centrality).nlargest(10)

nunanoskar         2.248919e-05
Carlsbergfondet    3.443322e-06
SMVdanmark         2.044472e-06
Hunrejen           4.304152e-07
Tokeroed           4.304152e-07
Westegnen          1.076038e-07
TV 2 NYHEDERNE     0.000000e+00
tv2newsdk          0.000000e+00
Chi Hon 韓志         0.000000e+00
john burmeister    0.000000e+00
dtype: float64
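
The very small betweenness values above may reflect the structure of a retweet network: if most accounts either only retweet or are only retweeted, hardly any node sits on a directed path between two others. The invented toy graph below sketches that mechanism; it is not a claim about the actual data.

```python
import networkx as nx

# Pure retweeters pointing at one retweeted account: no path has length >= 2.
toy = nx.DiGraph([("a", "hub"), ("b", "hub")])
betweenness_before = nx.betweenness_centrality(toy)

# Once the hub itself retweets someone, it lies on the paths a -> c and b -> c.
toy.add_edge("hub", "c")
betweenness_after = nx.betweenness_centrality(toy)

# For directed graphs networkx normalises by (n - 1)(n - 2); here 3 * 2 = 6,
# so the hub's raw count of 2 shortest paths becomes 2 / 6.
```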

## 5. Calculate and inspect patterns of closeness centrality


In [13]:
# Calculating closeness centrality
retweet_closeness_centrality = nx.closeness_centrality(digraph_retweet)
pd.Series(retweet_closeness_centrality).nlargest(10)

regeringDK         0.075224
M_B_Petersen       0.060691
Rigspoliti         0.058052
SSTSundhed         0.056740
AndersLadekarl     0.036543
mortensode         0.036405
LMSTSenderovitz    0.035093
noedhjaelp         0.030843
OleRyborg          0.025910
Heunicke           0.025866
dtype: float64
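
The closeness ranking above is almost identical to the indegree ranking. A possible explanation, assuming networkx 2.2 or later (where `closeness_centrality` uses incoming distances on directed graphs and applies the Wasserman-Faust scaling by default): when every account that can reach a node does so in a single step, the scaled closeness reduces to r / (n - 1), with r the number of such accounts, which is exactly indegree centrality. The invented toy graph below checks this and shows how one longer path breaks the equality.

```python
import networkx as nx

# Every path into a node has length 1: closeness equals indegree centrality.
star = nx.DiGraph([("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "e")])
in_centrality = nx.in_degree_centrality(star)
closeness = nx.closeness_centrality(star)

# Adding x -> a gives the hub one retweeter at distance 2.
chain = star.copy()
chain.add_edge("x", "a")
chain_in_centrality = nx.in_degree_centrality(chain)
chain_closeness = nx.closeness_centrality(chain)

# Hub in `chain`: 4 reachable accounts, distances sum to 5, n = 7,
# so closeness = (4 / 5) * (4 / 6) = 8 / 15.
```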

## 6. Calculate and inspect patterns of ego-network density


In [14]:
# calculate density for ego networks
density = []
network_size = []
nodes = []

for node in digraph_retweet.nodes():
    
    nodes.append(node)
    ego_G = nx.ego_graph(digraph_retweet,node,undirected=False)
    
    ego_density = nx.density(ego_G)
    density.append(ego_density)
    
    size = ego_G.number_of_nodes()
    network_size.append(size)
    
df_ego_density = pd.DataFrame(columns=["density",'network_size','nodes'])

df_ego_density['density'] = density
df_ego_density['network_size'] = network_size
df_ego_density['nodes'] = nodes

In [15]:
df_ego_density.sort_values(by='density',ascending=False)[:10]

      density  network_size  nodes
2682      0.5             2  FB
1394      0.5             2  Kristian M. Puggaard
1361      0.5             2  Morten Bove
1366      0.5             2  Dänische Botschaft
2492      0.5             2  Aktuel Kommentar
299       0.5             2  Klaus Egelund
298       0.5             2  Jesper Olsen
1372      0.5             2  Line Brisson
2489      0.5             2  Sophie H. Andersen
295       0.5             2  Kristoffer 🇩🇰🇰🇼
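
For a directed graph, `nx.density` is m / (n (n - 1)), where m is the number of edges and n the number of nodes. With `undirected=False`, `nx.ego_graph` keeps the node and the nodes it points to, so an account that retweeted exactly one other account (without being retweeted back) has an ego network with n = 2, m = 1 and density 1 / 2 = 0.5. That would be consistent with the rows above, where every top entry has size 2. The toy graphs below are invented to check the arithmetic.

```python
import networkx as nx

# One-way retweet: a -> b.
one_way = nx.DiGraph([("a", "b")])
ego_a = nx.ego_graph(one_way, "a", undirected=False)
ego_b = nx.ego_graph(one_way, "b", undirected=False)

density_a = nx.density(ego_a)  # 1 edge / (2 * 1) possible directed edges
density_b = nx.density(ego_b)  # b points to nobody, so its ego network is just b

# Mutual retweets fill both possible directed edges.
mutual = nx.DiGraph([("a", "b"), ("b", "a")])
density_mutual = nx.density(nx.ego_graph(mutual, "a", undirected=False))
```

Because tiny ego networks dominate the top of the ranking, it may be more informative to filter on `network_size` before sorting by density.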


## 7. Export directed graph to Gephi

In [16]:
# Export to GEXF format for Gephi; remember to change directory
nx.write_gexf(digraph_retweet, directory+"DM2020_corona_twitter_data/Hjalmar/retweet_network.gexf")
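
If the centrality scores should be available in Gephi, one option is to attach them as node attributes before exporting. The sketch below does this on an invented toy graph and writes to a temporary file instead of the course folder; it assumes networkx 2.0 or later for the argument order of `nx.set_node_attributes`.

```python
import os
import tempfile
import networkx as nx

# Invented toy retweet graph.
toy = nx.DiGraph([("anna", "regering"), ("anna", "politi"), ("bo", "regering")])

# Store a centrality score on each node so it travels with the GEXF file.
nx.set_node_attributes(toy, nx.in_degree_centrality(toy), "in_degree_centrality")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_retweet_network.gexf")
    nx.write_gexf(toy, path)
    # Read the file back to confirm that structure and attributes survived.
    reloaded = nx.read_gexf(path)
```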