# Network Playground using GoT Dataset (1): Basic Metrics

A network, or graph, is a representation of entities and the relationships among them. It is made up of two generic kinds of objects: (1) nodes, which represent entities, and (2) edges, which represent the connection between two nodes. In a more complex network, nodes and edges can also carry attributes or features.
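To make the vocabulary concrete, here is a minimal networkx sketch of a graph whose nodes and edges carry attributes. The `house` attribute and the edge weights are made-up values for illustration only, not taken from the dataset.

```python
import networkx as nx

# An undirected graph: nodes are entities, edges are relationships.
G = nx.Graph()

# Nodes can carry attributes (the "house" values here are illustrative)...
G.add_node("Jon-Snow", house="Stark")
G.add_node("Daenerys-Targaryen", house="Targaryen")
G.add_node("Tyrion-Lannister", house="Lannister")

# ...and so can edges, e.g. a made-up count of how often two characters co-occur.
G.add_edge("Jon-Snow", "Daenerys-Targaryen", weight=3)
G.add_edge("Daenerys-Targaryen", "Tyrion-Lannister", weight=5)

print(G.number_of_nodes(), G.number_of_edges())                     # 3 2
print(G.nodes["Jon-Snow"]["house"])                                 # Stark
print(G.edges["Daenerys-Targaryen", "Tyrion-Lannister"]["weight"])  # 5
```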

## Winter is Coming. Let's load the dataset ASAP!

If you haven't heard of Game of Thrones, then you must be really good at hiding. Game of Thrones is the hugely popular television series by HBO based on the (also) hugely popular book series A Song of Ice and Fire by George R.R. Martin. In this notebook, we will analyze the co-occurrence network of the characters in the Game of Thrones books. Here, two characters are considered to co-occur if their names appear within 15 words of one another in the books.

Note that we don't care what exactly happened to a character. A dead character who is mentioned often is, in some way, still powerful.
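The co-occurrence rule can be sketched in a few lines. How the dataset authors actually tokenized the books and counted co-occurrences is not described here, so `cooccurrence_counts` below is a hypothetical helper that assumes each character name is a single token and that "within 15 words" means the two mentions are at most 15 token positions apart. It also shows why a dead character still gets edges: only mentions matter.

```python
from collections import Counter


def cooccurrence_counts(tokens, characters, window=15):
    """Count how often two different characters are mentioned within `window` tokens.

    Hypothetical helper for illustration: assumes single-token names and
    treats "within 15 words" as "at most `window` positions apart".
    """
    # Positions of every character mention, in reading order.
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in characters]
    counts = Counter()
    for a, (i, name_a) in enumerate(mentions):
        for j, name_b in mentions[a + 1:]:
            if j - i > window:
                break  # later mentions are even farther away
            if name_a != name_b:
                # Undirected edge: store each pair in a canonical (sorted) order.
                counts[tuple(sorted((name_a, name_b)))] += 1
    return counts


tokens = "Jon rode north while Daenerys sailed east".split()
characters = {"Jon", "Daenerys", "Tyrion"}
print(cooccurrence_counts(tokens, characters))            # Counter({('Daenerys', 'Jon'): 1})
print(cooccurrence_counts(tokens, characters, window=3))  # Counter()
```

Each resulting pair count could then serve as an edge weight, similar in spirit to the `weight` column in the CSV files.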

This notebook tries to answer the following questions by applying graph theory. We try to give intuitive explanations of graph metrics and techniques, without complex mathematical derivations.
- Who is the mightiest person?
- Who is most influential over Westeros, in the sense that the most information passes through them?

By answering these questions, we will also learn three important graph metrics:
- Degree Centrality
- Eigenvector Centrality
- Betweenness Centrality

## Install some Python Packages

In [1]:
!pip install pandas
!pip install python-dateutil
!pip install networkx
!pip install pyvis



## Load the Dataset

<a href="https://www.kaggle.com/code/mmmarchetti/game-of-thrones-network-analysis">GoT Dataset</a>
Please download the CSV files from the link above and save them in the same directory as this notebook.

Let's jump straight to book 5, the last dataset, since we want to guess who comes out on top in the end. Below is the network extracted from book 5.

In [2]:
import networkx as nx
import pandas as pd
from pyvis.network import Network
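# Read the book 5 edge list (book5.csv, saved next to this notebook)
book5 = pd.read_csv('book5.csv')

# Print the first five rows of the dataset
print(book5.head())

# Build an undirected graph from the Source/Target columns
# (the weight column is not attached to the edges here)
G5 = nx.from_pandas_edgelist(book5, source='Source', target="Target")

# Render the graph as an interactive HTML page with pyvis
net5 = Network(notebook=True)
net5.from_nx(G5)
net5.show("test5.html")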

                             Source              Target        Type  weight  book
0                 Aegon-I-Targaryen  Daenerys-Targaryen  undirected       4     5
1  Aegon-Targaryen-(son-of-Rhaegar)  Daenerys-Targaryen  undirected      11     5
2  Aegon-Targaryen-(son-of-Rhaegar)        Elia-Martell  undirected       4     5
3  Aegon-Targaryen-(son-of-Rhaegar)    Franklyn-Flowers  undirected       3     5
4  Aegon-Targaryen-(son-of-Rhaegar)              Haldon  undirected      14     5
Local cdn resources have problems on chrome/safari when used in jupyter-notebook. 
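The cell above passes only `source` and `target` to `nx.from_pandas_edgelist`, so the `weight` column is dropped: with the default `edge_attr=None`, only the endpoints are kept. A minimal sketch of keeping it, using the five rows printed above as a small stand-in for the full file:

```python
import networkx as nx
import pandas as pd

# The first five rows of book5.csv, copied from the printed head above.
edges = pd.DataFrame({
    "Source": ["Aegon-I-Targaryen", "Aegon-Targaryen-(son-of-Rhaegar)",
               "Aegon-Targaryen-(son-of-Rhaegar)", "Aegon-Targaryen-(son-of-Rhaegar)",
               "Aegon-Targaryen-(son-of-Rhaegar)"],
    "Target": ["Daenerys-Targaryen", "Daenerys-Targaryen", "Elia-Martell",
               "Franklyn-Flowers", "Haldon"],
    "weight": [4, 11, 4, 3, 14],
})

# Default (edge_attr=None): only the endpoints are stored, edge data is empty.
G_plain = nx.from_pandas_edgelist(edges, source="Source", target="Target")
print(G_plain["Aegon-Targaryen-(son-of-Rhaegar)"]["Haldon"])  # {}

# edge_attr="weight" copies the weight column onto each edge.
G_small = nx.from_pandas_edgelist(edges, source="Source", target="Target", edge_attr="weight")
print(G_small.number_of_nodes(), G_small.number_of_edges())           # 6 5
print(G_small["Aegon-Targaryen-(son-of-Rhaegar)"]["Haldon"]["weight"])  # 14
```

Dropping the weights does not affect the results in this notebook: `degree_centrality` has no weight option, and `eigenvector_centrality` and `betweenness_centrality` default to `weight=None`. If you do pass weights to `betweenness_centrality`, keep in mind that it interprets them as distances, so larger co-occurrence counts would make paths look longer.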


## Who is the mightiest person?

From the graph above, we can see that some characters (Jon, Daenerys, etc.) have more connections than others. Calculating the degree centrality, i.e. the fraction of the other nodes that a node is directly connected to, confirms this.

In [3]:
centrality5 = nx.degree_centrality(G5)
centrality_list5 = sorted(centrality5.items(), key=lambda x: x[1], reverse=True)

# Print the five characters with the highest degree centrality
print(centrality_list5[:5])

[('Jon-Snow', 0.1962025316455696), ('Daenerys-Targaryen', 0.18354430379746836), ('Stannis-Baratheon', 0.14873417721518986), ('Tyrion-Lannister', 0.10443037974683544), ('Theon-Greyjoy', 0.10443037974683544)]
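Degree centrality is simple enough to compute by hand: a node's degree divided by $n - 1$, the largest number of neighbors it could have. Read this way, Jon's score of about 0.196 means he is directly linked to roughly 19.6% of the other characters in book 5. The helper `degree_centrality_by_hand` is a hypothetical name for this sketch; it is checked against `nx.degree_centrality` on networkx's built-in karate club graph.

```python
import networkx as nx


def degree_centrality_by_hand(G):
    """Degree of each node divided by n - 1, the most neighbors it could have."""
    n = G.number_of_nodes()
    if n <= 1:
        # Mirrors networkx's convention for graphs with 0 or 1 nodes.
        return {v: 1.0 for v in G}
    return {v: G.degree(v) / (n - 1) for v in G}


# A star: node 0 is linked to the four leaves 1..4.
star = nx.star_graph(4)
print(degree_centrality_by_hand(star))  # {0: 1.0, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}

# Agrees with networkx on a larger example graph.
karate = nx.karate_club_graph()
ours, ref = degree_centrality_by_hand(karate), nx.degree_centrality(karate)
print(all(abs(ours[v] - ref[v]) < 1e-12 for v in karate))  # True
```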


We can also calculate the eigenvector centrality, which scores a node based on the centrality of its neighbors. The eigenvector centrality of node $i$ is the $i$-th element of the vector $x$ defined by the equation

$$Ax = \lambda x$$

where $A$ is the adjacency matrix of the graph and $\lambda$ is its largest eigenvalue.

Intuitively, a node is important if it is connected to other important nodes, that is, to people who themselves have lots of connections. This is why the eigenvector ranking differs from the degree centrality ranking. Here, Daenerys has by far the greatest eigenvector centrality, which suggests she is the most powerful character.

In [4]:
eigen_centrality5 = nx.eigenvector_centrality(G5, max_iter=200)
eigen_centrality_list5 = sorted(eigen_centrality5.items(), key=lambda x: x[1], reverse=True)

# Print the five characters with the highest eigenvector centrality
print(eigen_centrality_list5[:5])

[('Daenerys-Targaryen', 0.4026451144974876), ('Barristan-Selmy', 0.23153906683464165), ('Stannis-Baratheon', 0.22788340078917774), ('Tyrion-Lannister', 0.22067143120644875), ('Hizdahr-zo-Loraq', 0.2014542130740377)]
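networkx documents `eigenvector_centrality` as a power-iteration method. A minimal sketch of the idea: repeatedly let every node add up its neighbors' scores and rescale, until the scores stop changing. This sketch iterates with $A + I$ instead of $A$; both have the same leading eigenvector, but the shift avoids oscillation on bipartite graphs such as a simple path. The function name `eigenvector_centrality_power` and its `tol` are assumptions of this sketch. Power iteration can need many steps when the top eigenvalues are close, which is likely why In [4] raises `max_iter` above its default of 100.

```python
import math
import networkx as nx


def eigenvector_centrality_power(G, max_iter=1000, tol=1e-10):
    """Approximate the solution of A x = lambda x by power iteration on (A + I).

    The result is scaled to unit Euclidean norm.
    """
    nodes = list(G)
    if not nodes:
        return {}
    x = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        # (A + I) x: each node keeps its own score and adds its neighbors' scores.
        new = {v: x[v] + sum(x[u] for u in G[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# The middle of a 3-node path is connected to both ends, so it scores highest.
path_scores = eigenvector_centrality_power(nx.path_graph(3))
print({v: round(s, 4) for v, s in path_scores.items()})  # {0: 0.5, 1: 0.7071, 2: 0.5}

# Agrees with networkx on the karate club graph.
karate = nx.karate_club_graph()
ours = eigenvector_centrality_power(karate)
ref = nx.eigenvector_centrality(karate, max_iter=1000, tol=1e-10)
print(all(abs(ours[v] - ref[v]) < 1e-6 for v in karate))  # True
```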


## Who is influential over Westeros?

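We can calculate the betweenness centrality. Betweenness centrality, which is widely used in network theory, measures the degree to which a node stands between other nodes: how often it lies on the shortest paths connecting other pairs of nodes. A node with high betweenness acts as a bridge, so information traveling between different parts of the network is likely to pass through it.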

The result shows that Stannis has the greatest betweenness centrality. He may not rule the kingdom in the end, but he certainly plays an essential role in the story!

In [5]:
betweenness_centrality5 = nx.betweenness_centrality(G5)
betweenness_centrality_list5 = sorted(betweenness_centrality5.items(), key=lambda x: x[1], reverse=True)

# Print the five characters with the highest betweenness centrality
print(betweenness_centrality_list5[:5])

[('Stannis-Baratheon', 0.45283060689247934), ('Daenerys-Targaryen', 0.2959459062106149), ('Jon-Snow', 0.24484873673158666), ('Tyrion-Lannister', 0.20961613179551256), ('Robert-Baratheon', 0.17716906651536968)]
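Formally, the betweenness of node $v$ sums $\sigma(s,t \mid v) / \sigma(s,t)$ over all pairs $s \neq t$ that exclude $v$, where $\sigma(s,t)$ is the number of shortest $s$-$t$ paths and $\sigma(s,t \mid v)$ is the number of those passing through $v$. With the default `normalized=True`, networkx divides this sum for an undirected graph by $(n-1)(n-2)/2$, the number of such pairs. So Stannis's 0.453 means that, averaged over all pairs of other characters, he lies on about 45% of their shortest paths. networkx uses Brandes' algorithm; the brute-force sketch below (the name `betweenness_by_hand` is hypothetical) follows the definition directly and is only meant for intuition on small graphs.

```python
from itertools import combinations

import networkx as nx


def betweenness_by_hand(G):
    """Normalized betweenness of an undirected graph by enumerating shortest paths.

    For every unordered pair (s, t), each node strictly inside a shortest
    s-t path earns 1 / (number of shortest s-t paths).
    """
    n = G.number_of_nodes()
    bc = {v: 0.0 for v in G}
    for s, t in combinations(G, 2):
        if not nx.has_path(G, s, t):
            continue  # pairs in different components contribute nothing
        paths = list(nx.all_shortest_paths(G, s, t))
        for path in paths:
            for v in path[1:-1]:  # interior nodes only; endpoints get no credit
                bc[v] += 1.0 / len(paths)
    if n > 2:
        pairs_without_v = (n - 1) * (n - 2) / 2
        bc = {v: val / pairs_without_v for v, val in bc.items()}
    return bc


# In a star, every leaf-to-leaf shortest path runs through the hub.
print(betweenness_by_hand(nx.star_graph(4)))  # {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}

# Agrees with networkx on the karate club graph.
karate = nx.karate_club_graph()
ours, ref = betweenness_by_hand(karate), nx.betweenness_centrality(karate)
print(all(abs(ours[v] - ref[v]) < 1e-9 for v in karate))  # True
```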


## Conclusion
In this notebook, we have learned what a graph is and three of its centrality metrics:
- Degree Centrality
- Eigenvector Centrality
- Betweenness Centrality