### Network centrality

In this notebook we go through the centrality measures introduced earlier. For the analysis, we use two networks on the same set of nodes: families in 15th-century Italy. The two networks describe their business and marriage connections.

In [None]:
# Start with the important libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Importing the new library for network analysis
import networkx as nx

In [None]:
# The files are in GraphML format, which is also frequently used in practice

family_business = nx.read_graphml("PADGB.GraphML")

# To draw the graph with the family names, we extract them from the 'name' node attribute
names_b = nx.get_node_attributes(family_business, 'name')

# There are different drawing layout styles, here we try one we have not seen before
# You can check the documentation of layout styles and experiment here:
# https://networkx.org/documentation/stable/reference/drawing.html

nx.draw_kamada_kawai(family_business, labels = names_b)

In [None]:
# We can also check what extra information we have about the families
# If you check, we do not have any extra data on the edges

family_business.nodes(data= True)

In [None]:
# We can import the network depicting marriage relationships similarly

family_marriage = nx.read_graphml("PADGM.GraphML")

# To draw the graph with the family names, we extract them from the 'name' node attribute
names_m = nx.get_node_attributes(family_marriage, 'name')

# We use the same Kamada-Kawai layout as for the business network

nx.draw_kamada_kawai(family_marriage, labels = names_m)

In [None]:
# We can start with the simplest and most intuitive centrality, the degree
# As introduced, degree centrality is calculated as the fraction of the other nodes that a node is connected to
# The higher the value of degree centrality, the more central/important the node is

# We calculate first for the business network

deg_b = nx.degree_centrality(family_business)

# The result is a dictionary
deg_b
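To make the definition concrete, here is a minimal sketch that recomputes degree centrality by hand on a small toy graph and compares it with `nx.degree_centrality`. The star graph and its node labels are hypothetical and only serve as an illustration; they are not part of the family data.

```python
import networkx as nx

# Toy graph (hypothetical, not the family data): one centre "A" with three leaves
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D")])

n = G.number_of_nodes()

# Degree centrality by hand: number of neighbours divided by the n - 1 other nodes
manual = {node: degree / (n - 1) for node, degree in G.degree()}

# The networkx result for comparison
builtin = nx.degree_centrality(G)

print(manual)
print(builtin)
```

The centre is connected to every other node, so it gets the maximum value, while each leaf is connected to one of the three other nodes.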

In [None]:
# In order to collect the values for each centrality for a final comparison, we can create a dataframe
# The index will be the name of the families

centrality_b = pd.DataFrame(index = names_b.values())

# We can add degree centrality as the first column
# We can get the values from the dictionary

centrality_b['degree_centrality'] = deg_b.values()

centrality_b
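Assigning `deg_b.values()` directly relies on the centrality dictionary and the name dictionary listing the nodes in the same order. A possible alternative, sketched here on a hypothetical toy graph, keys the values by node ID first and only then replaces the IDs with the names, so the alignment does not depend on ordering. The node IDs and names are invented for illustration.

```python
import networkx as nx
import pandas as pd

# Toy graph with a 'name' node attribute (hypothetical data)
G = nx.Graph()
G.add_node("n0", name="Alpha")
G.add_node("n1", name="Beta")
G.add_node("n2", name="Gamma")
G.add_edges_from([("n0", "n1"), ("n0", "n2")])

names = nx.get_node_attributes(G, "name")
deg = nx.degree_centrality(G)

# Build the column as a Series indexed by node ID, then map the IDs to names
table = pd.DataFrame({"degree_centrality": pd.Series(deg)}).rename(index=names)

print(table)
```

Further measures can be added the same way, for example by building one Series per measure before renaming the index.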

In [None]:
# We can do the same for the marriage network
# We calculate the degree centrality, create the dataframe and add the values as the first column

# Calculate centrality

deg_m = nx.degree_centrality(family_marriage)

# Create the dataframe

centrality_m = pd.DataFrame(index = names_m.values())

# Add degree centrality as the first column

centrality_m['degree_centrality'] = deg_m.values()

centrality_m

In [None]:
# The next measure is closeness centrality
# As introduced, it is based on the average shortest-path distance to all the other nodes
# The smaller the average distance, the more central the node is
# However, the implementation in networkx returns the reciprocal of this average distance
# (scaled by the fraction of nodes that can be reached)
# So in this case too, the higher the calculated value, the more central the node is
# We can also note that isolated nodes (nodes with no connections) automatically get closeness centrality 0
# We start again with the business network

closeness_b = nx.closeness_centrality(family_business)

# Add closeness centrality as the second column

centrality_b['closeness_centrality'] = closeness_b.values()

centrality_b
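The transformation used by networkx can be checked on a small example. The sketch below assumes the default setting `wf_improved=True`, under which the reciprocal of the average distance to the reachable nodes is multiplied by the fraction of nodes that are reachable. The toy graph and the helper `manual_closeness` are hypothetical and written only for this illustration.

```python
import networkx as nx

# Toy graph (hypothetical): a path a - b - c plus an isolated node d
G = nx.Graph([("a", "b"), ("b", "c")])
G.add_node("d")

def manual_closeness(graph, node):
    """Closeness as reciprocal average distance, scaled by the reachable fraction."""
    # Shortest-path lengths from `node` to every reachable node (including itself)
    dist = nx.single_source_shortest_path_length(graph, node)
    reachable = len(dist) - 1          # exclude the node itself
    total = sum(dist.values())         # sum of distances to reachable nodes
    if total == 0:
        return 0.0                     # isolated node
    n = graph.number_of_nodes()
    return (reachable / total) * (reachable / (n - 1))

manual = {node: manual_closeness(G, node) for node in G}
builtin = nx.closeness_centrality(G)

print(manual)
print(builtin)
```

The middle node `b` gets the highest value and the isolated node `d` gets 0, which matches the behaviour described in the comments of the cell above.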

In [None]:
# The same for the marriage network

closeness_m = nx.closeness_centrality(family_marriage)

# Add closeness centrality as the second column

centrality_m['closeness_centrality'] = closeness_m.values()

centrality_m

In [None]:
# The next one is betweenness centrality
# As introduced, it quantifies the number of times a node acts as a
# bridge along the shortest path between two other nodes
# We start with the business network

betweenness_b = nx.betweenness_centrality(family_business)

# Add betweenness centrality as the third column

centrality_b['betweenness_centrality'] = betweenness_b.values()

centrality_b
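As an illustration of the "bridge" idea, the following sketch counts shortest paths directly on a small toy graph and compares the result with `nx.betweenness_centrality`. It is a brute-force version meant for tiny graphs only, not the algorithm networkx uses internally; the helper `manual_betweenness` is hypothetical.

```python
import itertools
import networkx as nx

# Toy graph (hypothetical): a path 0 - 1 - 2 - 3
G = nx.path_graph(4)

def manual_betweenness(graph):
    """Brute-force normalised betweenness for a small undirected graph."""
    n = graph.number_of_nodes()
    score = dict.fromkeys(graph, 0.0)
    # Look at every unordered pair of distinct nodes
    for s, t in itertools.combinations(graph, 2):
        if not nx.has_path(graph, s, t):
            continue
        paths = list(nx.all_shortest_paths(graph, s, t))
        for path in paths:
            # Interior nodes share the credit if several shortest paths exist
            for v in path[1:-1]:
                score[v] += 1 / len(paths)
    # Number of node pairs that do not include the node itself
    norm = (n - 1) * (n - 2) / 2
    return {v: b / norm for v, b in score.items()}

manual = manual_betweenness(G)
builtin = nx.betweenness_centrality(G)

print(manual)
print(builtin)
```

The two end nodes never lie between other nodes and get 0, while the two inner nodes each lie on two of the three possible pairs.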

In [None]:
# The same for the marriage network

betweenness_m = nx.betweenness_centrality(family_marriage)

# Add betweenness centrality as the third column

centrality_m['betweenness_centrality'] = betweenness_m.values()

centrality_m

In [None]:
# The next one is eigenvector centrality 
# While not discussed in detail, intuitively a node is important if it is connected to other important nodes
# We can start with the business network

eigenvector_b = nx.eigenvector_centrality(family_business)

# Add eigenvector centrality as the fourth column

centrality_b['eigenvector_centrality'] = eigenvector_b.values()

centrality_b
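The intuition "important if connected to important nodes" corresponds to the leading eigenvector of the adjacency matrix. The sketch below computes that eigenvector with NumPy on a hypothetical toy graph and compares it with `nx.eigenvector_centrality`, which approximates it iteratively, so the two only agree up to a small tolerance.

```python
import networkx as nx
import numpy as np

# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to node 2
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])
nodes = list(G)

# Adjacency matrix with rows and columns in a fixed node order
A = nx.to_numpy_array(G, nodelist=nodes)

# The adjacency matrix is symmetric, so eigh applies
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Take the eigenvector of the largest eigenvalue, make it non-negative, scale to unit length
leading = np.abs(eigenvectors[:, np.argmax(eigenvalues)])
leading = leading / np.linalg.norm(leading)
manual = dict(zip(nodes, leading))

builtin = nx.eigenvector_centrality(G)

print(manual)
print(builtin)
```

Node 2 has the most neighbours and ranks first, while the pendant node 3 ranks last even though it is attached to the most central node.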

In [None]:
# The same for the marriage network

eigenvector_m = nx.eigenvector_centrality(family_marriage)

# Add eigenvector centrality as the fourth column

centrality_m['eigenvector_centrality'] = eigenvector_m.values()

centrality_m

In [None]:
# Finally, PageRank, which we discussed in detail
# It is related to eigenvector centrality
# We can start with the business network

pagerank_b = nx.pagerank(family_business)

# Add PageRank as the fifth column

centrality_b['pagerank'] = pagerank_b.values()

centrality_b
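For reference, the idea behind PageRank can be sketched as a simple power iteration in plain Python. This is a simplified illustration under stated assumptions, not the networkx implementation: the function `pagerank_sketch`, the fixed number of iterations, and the toy star graph are all hypothetical. The damping factor 0.85 is also the default `alpha` of `nx.pagerank`.

```python
def pagerank_sketch(adjacency, alpha=0.85, iterations=100):
    """Power-iteration PageRank on an adjacency dict {node: [neighbours]}."""
    nodes = list(adjacency)
    n = len(nodes)
    # Start from a uniform distribution over the nodes
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same "teleport" share
        new = {v: (1 - alpha) / n for v in nodes}
        for v, neighbours in adjacency.items():
            if neighbours:
                # A node passes its damped rank in equal parts to its neighbours
                share = alpha * rank[v] / len(neighbours)
                for u in neighbours:
                    new[u] += share
            else:
                # A node without neighbours spreads its rank over all nodes
                for u in nodes:
                    new[u] += alpha * rank[v] / n
        rank = new
    return rank

# Toy undirected star (hypothetical): each edge is listed in both directions
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
ranks = pagerank_sketch(star)

print(ranks)
print(sum(ranks.values()))
```

Unlike the other measures in this notebook, the PageRank values form a probability distribution, so they sum to 1 over all nodes.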

In [None]:
# The same for the marriage network

pagerank_m = nx.pagerank(family_marriage)

# Add PageRank as the fifth column

centrality_m['pagerank'] = pagerank_m.values()

centrality_m

In [None]:
# We can compare the top 5 most important families
# Business first

for col in centrality_b.columns:
    result = list(centrality_b.sort_values(by = col, ascending = False).index[:5])
    print('The top 5 families based on', col, 'in the business network are', result)

In [None]:
# Marriage network
for col in centrality_m.columns:
    result = list(centrality_m.sort_values(by = col, ascending = False).index[:5])
    print('The top 5 families based on', col, 'in the marriage network are', result)
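The printed lists can be hard to compare by eye. One possible alternative, sketched here on a hypothetical toy table, collects the top entries per measure with `Series.nlargest` and shows them side by side as a dataframe. The family names and values are invented for illustration.

```python
import pandas as pd

# Toy centrality table (hypothetical values, not the family data)
toy = pd.DataFrame(
    {"degree_centrality": [0.4, 0.1, 0.3], "pagerank": [0.2, 0.5, 0.3]},
    index=["Alpha", "Beta", "Gamma"],
)

# For each measure, keep the index labels of the 2 largest values
top = {col: toy[col].nlargest(2).index.tolist() for col in toy.columns}

# One column per measure, one row per rank position
print(pd.DataFrame(top))
```

For the family networks, the same pattern would apply with `centrality_b` or `centrality_m` in place of `toy` and 5 in place of 2.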

In [None]:
# To compare how similar the rankings based on the different centrality measures are,
# we can calculate the correlation between the values
# The higher the correlation, the more similar two centrality measures are

# For the business network

centrality_b.corr()

In [None]:
# For the marriage network

centrality_m.corr()
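By default, `corr()` computes the Pearson correlation of the raw values. Since the question here is about rankings, a rank-based (Spearman) correlation may be a useful complement. The sketch below obtains it by ranking the values first and then correlating the ranks; the toy table and its column names are hypothetical.

```python
import pandas as pd

# Toy table (hypothetical): both measures rank the rows identically,
# but the second measure grows much faster than the first
toy = pd.DataFrame(
    {"measure_a": [1.0, 2.0, 3.0, 4.0], "measure_b": [1.0, 4.0, 9.0, 100.0]}
)

# Pearson correlation of the raw values
pearson = toy.corr()

# Spearman correlation: Pearson correlation of the ranks
spearman = toy.rank().corr()

print(pearson)
print(spearman)
```

Here the rank correlation is exactly 1 because the two orderings agree, while the Pearson correlation is lower because the relationship is not linear.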