# Class 15 Exercises — Network Analysis

In this series of lessons, we're going to learn about network analysis. Network analysis will help us better understand the complex relationships between groups of people, fictional characters, and other kinds of entities.

## Install NetworkX

In [None]:
!pip install networkx

## Import Libraries

In [1]:
import networkx

import pandas as pd
pd.options.display.max_rows = 400
import matplotlib.pyplot as plt

## *Game of Thrones* Network

The network data that we're going to use in this lesson is taken from Andrew Beveridge and Jie Shan's paper, ["Network of Thrones."](https://www.maa.org/sites/default/files/pdf/Mathhorizons/NetworkofThrones%20%281%29.pdf)

These researchers calculated how many times each *Game of Thrones* character appeared within 15 words of another character in *A Storm of Swords*, the third book in the series.

| Network Element | GOT |
| :-------------: | :-------------: |
| Node | GOT character |
| Edge | Mutually mentioned within 15 words |


For example, the following sentence counts toward the "edges" or connections among Arya, Gendry, and Jon Snow, because their names appear within 15 words of one another:

> "**Arya** gave **Gendry** a sideways look. *He said it with me, like **Jon** used to do, back in Winterfell.* She missed **Jon Snow** the most of all her brothers."

## Create a Network From a Pandas DataFrame

Read in the CSV file "got-edges.csv" and assign to the variable `got_df`

In [None]:
got_df = # Your code here

Examine 15 random rows

In [None]:
# Your code here

Fill in the code to correctly identify the source, target, and edge weight columns from the DataFrame

In [None]:
G = networkx.from_pandas_edgelist(got_df,
                                  source=# Your code here,
                                  target=# Your code here,
                                  edge_attr=# Your code here)
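
To see how `from_pandas_edgelist` turns rows into edges, here is a minimal sketch on a tiny made-up DataFrame rather than "got-edges.csv". The column names `Source`, `Target`, and `Weight`, the node labels, and the counts are all assumptions for illustration; check the real column names with `got_df.columns` before filling in the cell above.

```python
import networkx
import pandas as pd

# Hypothetical toy edge list: labels and counts are invented for illustration
toy_df = pd.DataFrame({
    "Source": ["A", "A", "B"],
    "Target": ["B", "C", "C"],
    "Weight": [5, 2, 1],
})

# Each row becomes one edge; the edge_attr column is stored on that edge
toy_G = networkx.from_pandas_edgelist(toy_df,
                                      source="Source",
                                      target="Target",
                                      edge_attr="Weight")

print(toy_G.number_of_nodes(), toy_G.number_of_edges())
print(toy_G["A"]["B"]["Weight"])  # weight stored on the A-B edge
```

Passing the column name to `edge_attr` is what makes the weight available later to functions that accept a `weight=` argument.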

## Draw a Simple Network

In [None]:
networkx.draw(G)

In [None]:
# Set figure size
plt.figure(figsize=(10,15))

networkx.draw(G, with_labels=True, node_color='skyblue', width=.3, font_size=8)

## Calculate Degree

Who has the greatest number of connections in the network?

In [None]:
networkx.degree(G)

We can transform this special NetworkX object into a Python dictionary with `dict()`

In [None]:
dict(networkx.degree(G))
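
As a rough illustration of what the conversion does, the sketch below builds a small hypothetical graph (the node labels are made up) and turns its degree view into a dictionary that maps each node to its number of connections.

```python
import networkx

# Hypothetical toy graph, not the Game of Thrones data
toy_G = networkx.Graph()
toy_G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

# networkx.degree() returns a view of (node, degree) pairs
toy_degrees = dict(networkx.degree(toy_G))
print(toy_degrees)

# With a plain dictionary, finding the best-connected node is one line
most_connected = max(toy_degrees, key=toy_degrees.get)
print(most_connected)
```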

Make the degree values a dictionary, then add it as a network "attribute"

In [27]:
degrees = dict(networkx.degree(G))

# Add the degrees dictionary as an attribute
networkx.set_node_attributes(G, name='degree', values=degrees)
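
One way to check that an attribute was stored is to read it back from the graph. The sketch below does this on a small hypothetical graph; the node labels are invented for illustration.

```python
import networkx

# Hypothetical toy graph with two edges
toy_G = networkx.Graph([("A", "B"), ("A", "C")])
toy_degrees = dict(networkx.degree(toy_G))

# Store each node's degree on the node itself
networkx.set_node_attributes(toy_G, name="degree", values=toy_degrees)

# Read the attribute back for one node, then for all nodes
print(toy_G.nodes["A"]["degree"])
print(networkx.get_node_attributes(toy_G, "degree"))
```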

Make a Pandas dataframe from `networkx.degree(G)` and give it the column names "node" and "degree"

In [113]:
pd.DataFrame(#Your code here)

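| | node | degree |
| :-------------: | :-------------: | :-------------: |
| 0 | Aemon | 5 |
| 1 | Grenn | 4 |
| 2 | Samwell | 15 |
| 3 | Aerys | 4 |
| 4 | Jaime | 24 |
| 5 | Robert | 18 |
| 6 | Tyrion | 36 |
| 7 | Tywin | 22 |
| 8 | Alliser | 3 |
| 9 | Mance | 12 |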


Assign this DataFrame to the variable `degree_df`

In [None]:
degree_df = #Your code here

Sort the DataFrame from highest degree centrality to lowest degree centrality

In [None]:
degree_df... #Your code here

Re-assign this sorted DataFrame to the variable `degree_df`

In [None]:
degree_df = ... #Your code here

Examine the top 10 nodes with the highest degree values

In [None]:
degree_df... #Your code here

Plot the 10 nodes with the highest degree values as a horizontal bar chart `"barh"`. If you want to flip the Y axis so that the highest values are at the top, you can use `.invert_yaxis()`. 

In [None]:
degree_df... #Your code here
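
The effect of `.invert_yaxis()` is easier to see on a small example. This is only a sketch on a made-up DataFrame (the name `toy_degree_df` and its values are hypothetical): a horizontal bar chart draws the first row at the bottom by default, so inverting the Y axis puts the highest value at the top.

```python
import pandas as pd

# Hypothetical toy data, not the Game of Thrones degrees
toy_degree_df = pd.DataFrame({"node": ["A", "B", "C", "D"],
                              "degree": [3, 2, 2, 1]})

# Keep the 3 highest-degree rows
top_nodes = toy_degree_df.sort_values(by="degree", ascending=False).head(3)

# .plot() returns a Matplotlib Axes object, which has .invert_yaxis()
ax = top_nodes.plot(kind="barh", x="node", y="degree", legend=False)
ax.invert_yaxis()
```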

## Calculate Weighted Degree

Who has the greatest number of connections in the network (if you factor in edge weight)?

In [None]:
networkx.degree(G, weight='Weight')
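
To make the difference between degree and weighted degree concrete, here is a minimal sketch on a hypothetical graph with invented weights. Plain degree counts a node's edges, while weighted degree sums the `Weight` values on those edges, so the two rankings can disagree.

```python
import networkx

# Hypothetical toy graph: one very heavy edge (A-B) and three light ones
toy_G = networkx.Graph()
toy_G.add_weighted_edges_from(
    [("A", "B", 10), ("A", "C", 1), ("C", "D", 1), ("C", "B", 1)],
    weight="Weight",  # store the third value under the 'Weight' attribute
)

plain = dict(networkx.degree(toy_G))
weighted = dict(networkx.degree(toy_G, weight="Weight"))

print(plain)     # C has the most edges
print(weighted)  # A and B have the largest summed edge weight
```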

Make the weighted degree values a `dict`ionary, then add it as a network "attribute" with `networkx.set_node_attributes()`

In [73]:
weighted_degrees = dict(networkx.degree(G, weight='Weight'))

# Add the weighted degrees dictionary as an attribute
networkx.set_node_attributes(G, name='weighted_degree', values=weighted_degrees)

Make a Pandas dataframe from the weighted degree data `networkx.degree(G, weight='Weight')`, then sort from highest to lowest

In [None]:
weighted_degree_df = pd.DataFrame(networkx.degree(G, weight='Weight'), columns=['node', 'weighted_degree'])
weighted_degree_df = weighted_degree_df.sort_values(by='weighted_degree', ascending=False)
weighted_degree_df

Plot the 10 nodes with the highest weighted degree values as a horizontal bar chart `"barh"` and make it orange. If you want to flip the Y axis so that the highest values are at the top, you can use `.invert_yaxis()`. 

In [None]:
weighted_degree_df...#Your code here

## Calculate Betweenness Centrality Scores

Who connects the most other nodes in the network?

In [None]:
networkx.betweenness_centrality(G)

In [None]:
betweenness_centrality = networkx.betweenness_centrality(G)
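
Betweenness centrality rewards nodes that sit on the shortest paths between other nodes. The sketch below uses a hypothetical graph (two triangles joined through a single bridge node `D`) to show that a node with few connections can still score highest.

```python
import networkx

# Hypothetical toy graph: triangle A-B-C and triangle E-F-G, bridged by D
toy_G = networkx.Graph()
toy_G.add_edges_from([
    ("A", "B"), ("A", "C"), ("B", "C"),   # first triangle
    ("C", "D"), ("D", "E"),               # D is the only bridge
    ("E", "F"), ("E", "G"), ("F", "G"),   # second triangle
])

toy_degrees = dict(networkx.degree(toy_G))
toy_betweenness = networkx.betweenness_centrality(toy_G)

# D has only 2 edges, but every path between the triangles runs through it
print(toy_degrees["D"], toy_degrees["C"])
print(max(toy_betweenness, key=toy_betweenness.get))
```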

Add `betweenness_centrality` (which is already a dictionary) as a network "attribute" with `networkx.set_node_attributes()`

In [None]:
networkx.set_node_attributes(G, name='betweenness', values=betweenness_centrality)

Make a Pandas dataframe from the betweenness data `networkx.betweenness_centrality(G)` and give it the column names "node" and "betweenness"

In [116]:
betweenness_df = pd.DataFrame(#Your code here)
betweenness_df

Sort the DataFrame from highest betweenness centrality to lowest betweenness centrality

In [None]:
betweenness_df... #Your code here

Re-assign this sorted DataFrame to the variable `betweenness_df`

In [None]:
betweenness_df = ... #Your code here

Examine the top 10 nodes with the highest betweenness centrality scores

In [None]:
betweenness_df... #Your code here

Plot the 10 nodes with the highest betweenness centrality scores as a horizontal bar chart `"barh"` and make it green. If you want to flip the Y axis so that the highest values are at the top, you can use `.invert_yaxis()`. 

In [None]:
betweenness_df... #Your code here

## Discussion

- Which characters have the highest degree centrality scores?
- Which characters have the highest betweenness centrality scores?
- Why do these metrics differ? What can betweenness centrality tell us about the roles that the characters play in the GOT universe?


## Communities

Who forms distinct communities within this network?

In [83]:
from networkx.algorithms import community

Calculate communities with `community.greedy_modularity_communities()`

In [84]:
communities = community.greedy_modularity_communities(G)
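
As a small illustration of what the function returns, the sketch below runs it on a hypothetical graph made of two tightly knit groups joined by a single edge. The node labels are invented; the result is a list of `frozenset` objects, one per detected community.

```python
from itertools import combinations

import networkx
from networkx.algorithms import community

# Hypothetical toy graph: two fully connected groups of four nodes
toy_G = networkx.Graph()
toy_G.add_edges_from(combinations("ABCD", 2))
toy_G.add_edges_from(combinations("WXYZ", 2))
toy_G.add_edge("D", "W")  # a single edge linking the two groups

toy_communities = community.greedy_modularity_communities(toy_G)

# Each community is a frozenset of node names
for members in toy_communities:
    print(sorted(members))
```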

Fill in the code below so that you loop through `communities` and print out the following statement for each community: "Community *N*: [list of characters in community]".  

*Hint: there's an important [built-in Python function](https://docs.python.org/3/library/functions.html) missing!*

In [92]:
# Fill in the code below
for your_code_here, community in communities:
    print(f"Community {number}: {community}\n")

Community 0: frozenset({'Bronn', 'Qyburn', 'Myrcella', 'Renly', 'Mace', 'Margaery', 'Elia', 'Pycelle', 'Chataya', 'Ilyn', 'Joffrey', 'Balon', 'Kevan', 'Jaime', 'Ellaria', 'Walton', 'Aerys', 'Tommen', 'Meryn', 'Oberyn', 'Sandor', 'Doran', 'Shae', 'Varys', 'Gregor', 'Lancel', 'Tyrion', 'Loras', 'Olenna', 'Tywin', 'Amory', 'Podrick'})

Community 1: frozenset({'Craster', 'Janos', 'Davos', 'Gilly', 'Jojen', 'Karl', 'Styr', 'Ygritte', 'Aemon', 'Cressen', 'Mance', 'Hodor', 'Jon', 'Shireen', 'Alliser', 'Stannis', 'Qhorin', 'Salladhor', 'Val', 'Melisandre', 'Eddison', 'Samwell', 'Rattleshirt', 'Meera', 'Dalla', 'Bowen', 'Grenn', 'Orell'})

Community 2: frozenset({'Rickard', 'Lysa', 'Theon', 'Marillion', 'Roose', 'Hoster', 'Roslin', 'Eddard', 'Brynden', 'Luwin', 'Rickon', 'Catelyn', 'Lothar', 'Jeyne', 'Nan', 'Robert Arryn', 'Sansa', 'Robb', 'Ramsay', 'Edmure', 'Bran', 'Cersei', 'Arya', 'Petyr', 'Brienne', 'Walder'})

Community 3: frozenset({'Jorah', 'Drogo', 'Rakharo', 'Illyrio', 'Daario', 'Jon 

Make a `dict`ionary by looping through the communities and, for each member of the community, adding their community number

In [93]:
# Create empty dictionary
modularity_class = {}
#Loop through each community in the network
for community_number, community in enumerate(communities):
    #For each member of the community, add their community number
    for name in community:
        modularity_class[name] = community_number

In [None]:
modularity_class

Add modularity class to the network as an attribute

In [95]:
networkx.set_node_attributes(G, modularity_class, 'modularity_class')

Make a Pandas dataframe from modularity class network data `G.nodes(data='modularity_class')`

In [96]:
communities_df = pd.DataFrame(G.nodes(data='modularity_class'), columns=['node', 'modularity_class'])
communities_df = communities_df.sort_values(by='modularity_class', ascending=False)

In [None]:
communities_df

Inspect nodes in the DataFrame with the modularity class 2

In [None]:
community_filter = # Your code here
communities_df[community_filter]

Inspect nodes in the DataFrame with the modularity class 3

In [None]:
community_filter = # Your code here
communities_df[community_filter]
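
Filtering of this kind relies on a Boolean mask: a comparison on a column yields a Series of True/False values, and indexing the DataFrame with it keeps only the True rows. The sketch below shows the pattern on a made-up DataFrame (names and class numbers are hypothetical).

```python
import pandas as pd

# Hypothetical toy data, not the real community assignments
toy_communities_df = pd.DataFrame({
    "node": ["A", "B", "C", "D"],
    "modularity_class": [0, 1, 0, 1],
})

# The comparison produces one True/False value per row
toy_filter = toy_communities_df["modularity_class"] == 1
print(toy_filter)

# Indexing with the mask keeps only the rows where it is True
print(toy_communities_df[toy_filter])
```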

## Visualization with nx_altair

Install nx_altair for visualization

In [None]:
!pip install nx_altair

Import nx_altair and altair

In [99]:
import nx_altair as nxa
import altair as alt

Add node name as an attribute

In [100]:
for node in G.nodes():
    G.nodes[node]['name'] = node

Create a network layout

In [101]:
pos = networkx.spring_layout(G)
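
`spring_layout` returns a dictionary that maps each node to an (x, y) position, and its starting positions are random, so the picture can change from run to run. If a repeatable layout is useful, one option is the `seed` argument, sketched here on a small hypothetical graph (the seed value 42 is arbitrary).

```python
import networkx

# Hypothetical toy graph: a path of four nodes labeled 0 to 3
toy_G = networkx.path_graph(4)

# The same seed gives the same layout each time
pos_a = networkx.spring_layout(toy_G, seed=42)
pos_b = networkx.spring_layout(toy_G, seed=42)

# Each value is a pair of coordinates for one node
print(pos_a[0])
```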

Draw the graph with nx_altair

In [None]:
viz = nxa.draw_networkx(
    G,
    pos=pos,
    node_color= 'modularity_class',
    cmap='viridis',
    width=1,
    edge_color='black',
    node_tooltip = ['name', 'modularity_class']
)
alt.vconcat(viz)

## Visualization with Bokeh

In [None]:
!pip install bokeh

Import Bokeh modules and special Bokeh function

In [105]:
from bokeh.io import output_notebook, show, save
from bokeh_network import make_interactive_network

Load Bokeh in notebook

In [106]:
output_notebook()

In [None]:
make_interactive_network(G,
                         labels=False,
                         title='Game of Thrones Network',
                         node_color='modularity_color')

## Discussion

- What do you think the communities detected by the modularity algorithm can tell us about the GOT universe?
- Which communities make sense, and which, if any, don't seem to make sense?

# Make Character Network Data

Here's a Colab notebook for making character network data with BookNLP:

https://colab.research.google.com/drive/1D_85ACEj9gN-wTrd8cHIZGvlm2zwE3ub?usp=sharing