# Network Analysis — Workbook

In this series of lessons, we're going to learn about network analysis. Network analysis will help us better understand the complex relationships between groups of people, fictional characters, and other kinds of things.

## Install NetworkX

In [None]:
!pip install --upgrade networkx

## Import Libraries

In [None]:
import networkx

import pandas as pd
pd.options.display.max_rows = 400
import matplotlib.pyplot as plt

## *Game of Thrones* Network

The network data that we're going to use in this lesson is taken from Andrew Beveridge and Jie Shan's paper, ["Network of Thrones."](https://www.maa.org/sites/default/files/pdf/Mathhorizons/NetworkofThrones%20%281%29.pdf)

These researchers calculated how many times each Game of Thrones character appeared within 15 words of another character in *A Storm of Swords*, the third book in the series.

| Network Element | GOT |
| :-------------: | :-------------: |
| Node | GOT character |
| Edge | Mutually mentioned within 15 words |


For example, the following passage counts toward an "edge," or connection, between each pair of characters it names, such as Arya and Jon Snow:

> "**Arya** gave **Gendry** a sideways look. *He said it with me, like **Jon** used to do, back in Winterfell.* She missed **Jon Snow** the most of all her brothers."
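To make the 15-word rule concrete, here is a minimal sketch of how such co-occurrences could be counted. It is not the procedure from the paper: the function name `count_cooccurrences`, the whitespace tokenization, and the shortened sample text are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(words, names, window=15):
    """Count pairs of distinct names that appear within `window` words of each other."""
    # Record the position of every word that is a character name
    positions = [(i, word) for i, word in enumerate(words) if word in names]
    counts = Counter()
    for (i, first), (j, second) in combinations(positions, 2):
        # Skip repeated mentions of the same name and pairs that are too far apart
        if first != second and j - i <= window:
            counts[tuple(sorted((first, second)))] += 1
    return counts


# Simplified, punctuation-free version of the passage (an assumption for this sketch)
tokens = "Arya gave Gendry a sideways look She missed Jon the most of all her brothers".split()
names = {'Arya', 'Gendry', 'Jon'}

edge_counts = count_cooccurrences(tokens, names)
print(edge_counts)
```

With a window of 15 words, all three pairs of names are counted once. A smaller window, such as `window=5`, would keep only the Arya and Gendry pair.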

In [None]:
got_df = pd.read_csv('../data/got-edges.csv', encoding='utf-8')

In [None]:
got_df

## Create a Network From a Pandas DataFrame

In [None]:
G = networkx.from_pandas_edgelist(got_df,
                                  source='Source',
                                  target='Target',
                                  edge_attr='Weight')
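To see what `from_pandas_edgelist()` does without loading the full dataset, the sketch below builds a graph from a tiny edge list. The DataFrame `toy_df` and its node names and weights are invented for illustration; only the column names match the ones used above.

```python
import networkx
import pandas as pd

# Hypothetical toy edge list (not taken from the GOT data)
toy_df = pd.DataFrame({
    'Source': ['A', 'A', 'B'],
    'Target': ['B', 'C', 'C'],
    'Weight': [5, 2, 1],
})

# Each row becomes one edge; the 'Weight' column is stored as an edge attribute
toy_G = networkx.from_pandas_edgelist(toy_df,
                                      source='Source',
                                      target='Target',
                                      edge_attr='Weight')

print(sorted(toy_G.nodes()))
print(toy_G['A']['B']['Weight'])
```

Every distinct value in the `Source` and `Target` columns becomes a node, so three rows here yield three nodes and three edges.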

## Draw a Simple Network

In [None]:
networkx.draw(G)

In [None]:
plt.figure(figsize=(10,15))
networkx.draw(G, with_labels=True, node_color='skyblue', width=.3, font_size=8)

## Calculate Degree

Who has the largest number of connections in the network?

In [None]:
networkx.degree(G)

Convert the degree values to a dictionary, then add them as a node "attribute" with `networkx.set_node_attributes()`

In [None]:
degrees = dict(networkx.degree(G))
networkx.set_node_attributes(G, name='degree', values=degrees)
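The same two steps can be tried on a small graph where the degrees are easy to check by hand. This is a sketch on an invented four-node graph (`toy_G` and its edges are assumptions, not part of the GOT data).

```python
import networkx

# Hypothetical toy graph: A is connected to everyone, D only to A
toy_G = networkx.Graph()
toy_G.add_edges_from([('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C')])

# Degree = number of edges attached to each node
toy_degrees = dict(networkx.degree(toy_G))
print(toy_degrees)  # {'A': 3, 'B': 2, 'C': 2, 'D': 1}

# Store the values on the nodes under the attribute name 'degree'
networkx.set_node_attributes(toy_G, name='degree', values=toy_degrees)

# Asking for data='degree' yields (node, degree) pairs, ready for a DataFrame
toy_pairs = list(toy_G.nodes(data='degree'))
print(toy_pairs)  # [('A', 3), ('B', 2), ('C', 2), ('D', 1)]
```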

Make a Pandas DataFrame from the degree data `G.nodes(data='degree')`. Sorting it from highest to lowest is the exercise that follows.

In [None]:
degree_df = pd.DataFrame(G.nodes(data='degree'), columns=['node', 'degree'])
degree_df

**Your turn!** Sort the DataFrame from highest degree to lowest degree. Then re-assign this sorted DataFrame to the variable `degree_df`.

In [None]:
degree_df = ... # Your code here


Plot the nodes with the highest degree values

In [None]:
num_nodes_to_inspect = 10
degree_df[:num_nodes_to_inspect].plot(x='node', y='degree', kind='barh').invert_yaxis()

## Calculate Betweenness Centrality Scores

Who connects the most other nodes in the network?

In [None]:
networkx.betweenness_centrality(G)

In [None]:
betweenness_centrality = networkx.betweenness_centrality(G)
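Betweenness centrality rewards nodes that sit on the shortest paths between other nodes, which is not the same as having many connections. The sketch below assumes an invented network: two tightly knit groups of four, joined only through a hypothetical go-between named `X`.

```python
from itertools import combinations

import networkx

# Hypothetical toy graph: two fully connected groups linked only through X
toy_G = networkx.Graph()
toy_G.add_edges_from(combinations(['A', 'B', 'C', 'D'], 2))
toy_G.add_edges_from(combinations(['E', 'F', 'G', 'H'], 2))
toy_G.add_edges_from([('A', 'X'), ('X', 'E')])

toy_degrees = dict(networkx.degree(toy_G))
toy_betweenness = networkx.betweenness_centrality(toy_G)

# The node with the highest betweenness score
top_broker = max(toy_betweenness, key=toy_betweenness.get)
print(top_broker, toy_degrees[top_broker])
```

Here `X` has only two connections, the fewest in the graph, yet every shortest path between the two groups passes through it, so it receives the highest betweenness score.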

Add `betweenness_centrality` (which is already a dictionary) as a network "attribute" with `networkx.set_node_attributes()`

In [None]:
networkx.set_node_attributes(G, name='betweenness', values=betweenness_centrality)

Make a Pandas DataFrame from the betweenness data `G.nodes(data='betweenness')`. Sorting it from highest to lowest is the exercise that follows.

In [None]:
betweenness_df = pd.DataFrame(G.nodes(data='betweenness'), columns=['node', 'betweenness'])
betweenness_df

**Your turn!** Sort the DataFrame from highest betweenness centrality to lowest betweenness centrality. Then re-assign this sorted DataFrame to the variable `betweenness_df`.

In [None]:
betweenness_df = ... #Your code here


Plot the nodes with the highest betweenness centrality scores

In [None]:
num_nodes_to_inspect = 10
betweenness_df[:num_nodes_to_inspect].plot(x='node', y='betweenness', color='green', kind='barh').invert_yaxis()

## Discussion

- Which characters have the highest degree centrality scores?
- Which characters have the highest betweenness centrality scores?
- Why do these metrics differ? What can betweenness centrality tell us about the roles that the characters play in the GOT universe?


## Calculate Weighted Degree

Who has the largest number of connections in the network once edge weight is factored in?

In [None]:
networkx.degree(G, weight='Weight')

Make the weighted degree values a `dict`ionary, then add it as a network "attribute" with `networkx.set_node_attributes()`

In [None]:
weighted_degrees = dict(networkx.degree(G, weight='Weight'))
networkx.set_node_attributes(G, name='weighted_degree', values=weighted_degrees)
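Weighted degree sums the weights of a node's edges instead of counting them, so the ranking can change. The sketch below uses an invented graph (`toy_G` and its weights are assumptions) in which the node with the most edges does not have the highest weighted degree.

```python
import networkx

# Hypothetical toy graph: A and B share one heavy edge, C has three light edges
toy_G = networkx.Graph()
toy_G.add_weighted_edges_from(
    [('A', 'B', 10), ('C', 'A', 1), ('C', 'B', 1), ('C', 'D', 1)],
    weight='Weight',  # store the third item of each tuple under the 'Weight' attribute
)

toy_degrees = dict(networkx.degree(toy_G))
toy_weighted_degrees = dict(networkx.degree(toy_G, weight='Weight'))

print(toy_degrees)           # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
print(toy_weighted_degrees)  # {'A': 11, 'B': 11, 'C': 3, 'D': 1}
```

By plain degree, `C` ranks first. By weighted degree, `A` and `B` do, because their single shared edge carries most of the weight.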

Make a Pandas DataFrame from the weighted degree data `G.nodes(data='weighted_degree')`, then sort from highest to lowest

In [None]:
weighted_degree_df = pd.DataFrame(G.nodes(data='weighted_degree'), columns=['node', 'weighted_degree'])
weighted_degree_df = weighted_degree_df.sort_values(by='weighted_degree', ascending=False)
weighted_degree_df

Plot the nodes with the highest weighted degree values

In [None]:
num_nodes_to_inspect = 10
weighted_degree_df[:num_nodes_to_inspect].plot(x='node', y='weighted_degree', color='orange', kind='barh').invert_yaxis()

## Communities

Who forms distinct communities within this network?

In [None]:
from networkx.algorithms import community

Calculate communities with `community.greedy_modularity_communities()`

In [None]:
communities = community.greedy_modularity_communities(G)

In [None]:
communities
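To get a feel for the output, the sketch below runs the same function on an invented graph with an obvious two-group structure. The graph `toy_G` is an assumption for illustration; on real data the groups are rarely this clean.

```python
from itertools import combinations

import networkx
from networkx.algorithms import community

# Hypothetical toy graph: two fully connected groups joined by a single edge
toy_G = networkx.Graph()
toy_G.add_edges_from(combinations(['A', 'B', 'C', 'D'], 2))
toy_G.add_edges_from(combinations(['E', 'F', 'G', 'H'], 2))
toy_G.add_edge('D', 'E')

# The result is a list of node sets, one set per detected community
toy_communities = community.greedy_modularity_communities(toy_G)

for number, members in enumerate(toy_communities):
    print(number, sorted(members))
```

Each community is a set of node names, and the position of a set in the list serves as its community number.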

Make a `dict`ionary by looping through the communities and, for each member of the community, adding their community number

In [None]:
# Create empty dictionary
modularity_class = {}
# Loop through each community in the network
# (the loop variable is named `members` so it does not overwrite the imported `community` module)
for community_number, members in enumerate(communities):
    # For each member of the community, add their community number
    for name in members:
        modularity_class[name] = community_number
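The same mapping can also be written as a dictionary comprehension. This is a sketch on invented communities (`toy_communities` is an assumption), shown only as an alternative way to express the nested loop.

```python
# Hypothetical communities, in the same list-of-sets shape that
# greedy_modularity_communities() returns
toy_communities = [frozenset({'A', 'B', 'C'}), frozenset({'D', 'E'})]

# Outer loop numbers the communities, inner loop visits each member
toy_modularity_class = {
    name: community_number
    for community_number, members in enumerate(toy_communities)
    for name in members
}

print(sorted(toy_modularity_class.items()))
# [('A', 0), ('B', 0), ('C', 0), ('D', 1), ('E', 1)]
```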

Add modularity class to the network as an attribute

In [None]:
networkx.set_node_attributes(G, name='modularity_class', values=modularity_class)

Make a Pandas dataframe from modularity class network data `G.nodes(data='modularity_class')`

In [None]:
communities_df = pd.DataFrame(G.nodes(data='modularity_class'), columns=['node', 'modularity_class'])
communities_df = communities_df.sort_values(by='modularity_class', ascending=False)

In [None]:
communities_df

Inspect nodes in the DataFrame with the modularity class 2

In [None]:
communities_df[communities_df['modularity_class'] ...] # Your code here

## Visualization with nx_altair

Install nx_altair for visualization

In [None]:
!pip install nx_altair

Import nx_altair and altair

In [None]:
import nx_altair as nxa
import altair as alt

Add node name as an attribute

In [None]:
for node in G.nodes():
    G.nodes[node]['name'] = node

Create a network layout

In [None]:
pos = networkx.spring_layout(G)
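A layout is a dictionary that maps each node to an (x, y) position. Because `spring_layout()` starts from random positions, the picture changes from run to run unless a `seed` is passed. The sketch below shows this on an invented three-node graph; the seed value 42 is arbitrary.

```python
import networkx

# Hypothetical toy graph: a simple chain A - B - C
toy_G = networkx.Graph()
toy_G.add_edges_from([('A', 'B'), ('B', 'C')])

# Passing the same seed twice gives the same positions both times
pos_a = networkx.spring_layout(toy_G, seed=42)
pos_b = networkx.spring_layout(toy_G, seed=42)

for node, xy in pos_a.items():
    print(node, xy)
```

Fixing the seed can help when you want to compare two drawings of the same network side by side.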

Draw the graph with nx_altair

In [None]:
viz = nxa.draw_networkx(
    G,
    pos=pos,
    node_color='modularity_class',
    cmap='viridis',
    width=1,
    edge_color='black',
    node_tooltip=['name', 'modularity_class']
)
alt.vconcat(viz)

## Discussion

- What do you think the communities detected by the modularity algorithm can tell us about the GOT universe?
- Which communities make sense, and which, if any, don't seem to make sense?