# Seminar 1 - A Song of Graphs and Search

---

**Course**: Graphs and Network Analysis

**Degree**: Artificial Intelligence Degree (UAB)

**Topic**: Practical seminar that includes exercises from units 1 to 6

**Activity description**: Most of us are familiar with the *Game of Thrones* books or series. For those who are not, it is a fantasy television series produced by HBO, based on the series of novels "A Song of Ice and Fire". It follows characters from different noble houses on the fictional continent of *Westeros* as they fight for control of the Iron Throne and for rule over the seven kingdoms that make up the territory. The series' success has spawned many blogs and other sources with additional resources about it. The graphs that we propose to use in this exercise represent the characters of the series (or books) as nodes and their co-appearances in a scene as edges (the weight of an edge is higher the more often the two characters appear together). We therefore have a social network of characters. We will use these graphs to work on some of the concepts seen in the first units of the course (graph and node metrics, search and routes). Finally, we will generate synthetic graphs that simulate a realistic network.

## Qualification

**Submission**: Each group must deliver an '.ipynb' file exported from its Colab notebook (this same file, with the code blocks and explanations for each activity added). To get the file, go to File --> Download. Remember that you must answer and analyze the different problems. Code alone will NOT be evaluated: explaining and reasoning about the solution of the problem is essential. **You should explain the results obtained for at least the exercises marked with the 💬 symbol**.
The outcome of this seminar will thus be an analysis of the network at different levels: global metrics, node importance, shortest paths, random graphs, and visualization.

**Delivery form**: The work must be done in **groups of two people** and delivered through the virtual campus (in the section corresponding to Seminar 1).

**Questions**: For any questions outside class sessions, you can contact cristina.perez@uab.cat.

**Deadline**: March 13th (until the end of the day).

**Marks**: The grade of the seminars (seminar 1 + seminar 2) has a weight of 10% on the final grade of the subject.


# Authors

**Lab group:** GrupLab-9

**Student 1 - Name (NIU): 1668936**

**Student 2 - Name (NIU):**

## 1. Environment setup
----

The main libraries that will be used in this seminar are the following:

* [NetworkX](https://networkx.github.io/)
* [Pandas](https://pandas.pydata.org/)
* [Matplotlib](https://matplotlib.org/)
* [NumPy](https://numpy.org/)



In [1]:
!pip install --upgrade scipy networkx

Collecting scipy
  Using cached scipy-1.12.0-cp310-cp310-win_amd64.whl.metadata (60 kB)
Using cached scipy-1.12.0-cp310-cp310-win_amd64.whl (46.2 MB)
Installing collected packages: scipy
  Attempting uninstall: scipy
    Found existing installation: scipy 1.11.4
    Uninstalling scipy-1.11.4:
      Successfully uninstalled scipy-1.11.4
Successfully installed scipy-1.12.0




In [2]:
!pip install pygraphviz

Collecting pygraphviz
  Using cached pygraphviz-1.12.tar.gz (104 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: pygraphviz
  Building wheel for pygraphviz (pyproject.toml): started
  Building wheel for pygraphviz (pyproject.toml): finished with status 'error'
Failed to build pygraphviz


  error: subprocess-exited-with-error
  
  × Building wheel for pygraphviz (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [49 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-310
      creating build\lib.win-amd64-cpython-310\pygraphviz
      copying pygraphviz\agraph.py -> build\lib.win-amd64-cpython-310\pygraphviz
      copying pygraphviz\graphviz.py -> build\lib.win-amd64-cpython-310\pygraphviz
      copying pygraphviz\scraper.py -> build\lib.win-amd64-cpython-310\pygraphviz
      copying pygraphviz\testing.py -> build\lib.win-amd64-cpython-310\pygraphviz
      copying pygraphviz\__init__.py -> build\lib.win-amd64-cpython-310\pygraphviz
      creating build\lib.win-amd64-cpython-310\pygraphviz\tests
      copying pygraphviz\tests\test_attribute_defaults.py -> build\lib.win-amd64-cpython-310\pygraphviz\tests
      copying pygraphviz\tests\test_clear.py -> build\lib.

In [3]:
import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from collections import Counter

## 2. Data collection

---

This seminar is based on data from *Game of Thrones* and "A Song of Ice and Fire" curated by Andrew Beveridge. The data is available from two different GitHub repositories:

* [Book to Network](https://github.com/mathbeveridge/asoiaf)
* [Script to Network](https://github.com/mathbeveridge/gameofthrones)

In each of them, there is a *data* folder with several *.csv* files that encode nodes and edges of different networks.

To download the data in the *colab* environment you can run the following command:

```
$ !wget https://raw.githubusercontent.com/mathbeveridge/repo_name/master/data/file_id-nodes.csv
$ !wget https://raw.githubusercontent.com/mathbeveridge/repo_name/master/data/file_id-edges.csv
```


where,

* **repo_name** is the name of the repository, *asoiaf* for the Books and *gameofthrones* for the Script.
* **file_id** is the identifier of the file, which you can find by browsing the *data* folder of the repository. It indicates the book or season number.

For example, to download the graph of the first season of the series, we would run:

```
$ !wget https://raw.githubusercontent.com/mathbeveridge/gameofthrones/master/data/got-s1-nodes.csv
$ !wget https://raw.githubusercontent.com/mathbeveridge/gameofthrones/master/data/got-s1-edges.csv
```

The downloaded files can be found in */content/file_name*.
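Outside Colab, `wget` may not be available (for instance on Windows). The following is a minimal sketch of the same download using only the Python standard library; it assumes the URL pattern described above, and the helper names `data_url` and `download_graph_files` are hypothetical.

```python
import os
from urllib.request import urlretrieve

# URL pattern described above; repo_name is 'asoiaf' or 'gameofthrones'.
URL_TEMPLATE = ("https://raw.githubusercontent.com/mathbeveridge/"
                "{repo_name}/master/data/{file_id}-{kind}.csv")


def data_url(repo_name: str, file_id: str, kind: str) -> str:
    """Build the raw GitHub URL of a nodes or edges file."""
    if kind not in ("nodes", "edges"):
        raise ValueError(f"kind must be 'nodes' or 'edges', got {kind!r}")
    return URL_TEMPLATE.format(repo_name=repo_name, file_id=file_id, kind=kind)


def download_graph_files(repo_name: str, file_id: str, target_dir: str = "."):
    """Download both CSV files and return their local paths (nodes, edges)."""
    paths = []
    for kind in ("nodes", "edges"):
        path = os.path.join(target_dir, f"{file_id}-{kind}.csv")
        urlretrieve(data_url(repo_name, file_id, kind), path)
        paths.append(path)
    return tuple(paths)
```

For example, `download_graph_files('gameofthrones', 'got-s1')` would fetch the two files of the first season into the current directory.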

For this activity, we will work with the graph generated from all the books.


*  **Download the two .csv files corresponding to the graph generated from all the books (asoiaf-all)**.

In [8]:
!wget https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-all-nodes.csv
!wget https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-all-edges.csv

'wget' is not recognized as an internal or external command,
operable program or batch file.
'wget' is not recognized as an internal or external command,
operable program or batch file.


## 3. Data load

---

The function *csv_to_graph()* creates a NetworkX graph from the *.csv* files encoding edges and nodes.

In [5]:
def csv_to_graph(file_id_nodes: str, file_id_edges: str, origin: str = 'book') \
                    -> nx.Graph:
    """Return a nx.Graph

    Build a graph given csv files for nodes and edges.
    origin controls the source of the graph to adapt the node features.
    """

    # Column names differ between the book and the script datasets.
    if origin == 'book':
        key1, key2 = 'weight', 'book'
    elif origin == 'script':
        key1, key2 = 'Weight', 'Season'
    else:
        raise NameError('Unknown origin {}'.format(origin))

    nodes = pd.read_csv(file_id_nodes)
    edges = pd.read_csv(file_id_edges)

    # Fall back to the 'id' column when key2 is not among the edge columns.
    if key2 not in edges:
        key2 = 'id'

    g = nx.Graph()
    for _, row in nodes.iterrows():
        g.add_node(row['Id'], name=row['Label'])

    # The edge weight is the inverse of the value in column key1, so a larger
    # value in the file yields a smaller weight (a shorter distance).
    for _, row in edges.iterrows():
        g.add_edge(row['Source'], row['Target'],
                   weight=1/row[key1], id=row[key2])

    return g


* **Create a NetworkX graph from the downloaded files using the `csv_to_graph` function.** [Optionally, you can repeat the process with the graph generated from the series]

In [6]:
g_book = csv_to_graph('asoiaf-all-nodes.csv', 'asoiaf-all-edges.csv', origin='book')

* **Generate a first exploratory visualization of the graph.**

In [None]:
plt.rcParams['figure.figsize'] = [12, 12]
plt.rcParams['figure.dpi'] = 100
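A first exploratory drawing could look like the sketch below. It assumes that nodes carry a `name` attribute, as set by `csv_to_graph()`; the helper names `node_labels` and `exploratory_draw`, the layout choice and the styling values are illustrative assumptions, not part of the seminar template.

```python
import networkx as nx


def node_labels(g):
    """Map each node to its 'name' attribute, falling back to the node id."""
    return {v: data.get('name', v) for v, data in g.nodes(data=True)}


def exploratory_draw(g, seed=42):
    """Draw g with a force-directed layout and return the node positions."""
    # weight=None ignores edge weights in the layout; seed makes it reproducible.
    pos = nx.spring_layout(g, weight=None, seed=seed)
    nx.draw_networkx_nodes(g, pos, node_size=40)
    nx.draw_networkx_edges(g, pos, alpha=0.2)
    nx.draw_networkx_labels(g, pos, labels=node_labels(g), font_size=6)
    return pos
```

Calling `exploratory_draw(g_book)` in a notebook cell would render the figure with the `rcParams` set above.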




## 4. General graph metrics
---

Perform a general summary of the Network properties.

* **💬  Obtain the order, size and density of the graph, as well as the average degree of its nodes.**


💬 :
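As a rough illustration of these four quantities, the sketch below computes them on NetworkX's built-in karate club graph, used here only as a stand-in for the book graph. The helper name `basic_summary` is hypothetical.

```python
import networkx as nx


def basic_summary(g):
    """Return order, size, density and average degree of an undirected graph."""
    n = g.number_of_nodes()  # order
    m = g.number_of_edges()  # size
    return {
        'order': n,
        'size': m,
        'density': nx.density(g),     # 2m / (n(n-1)) for undirected graphs
        'average_degree': 2 * m / n,  # every edge adds one to two degrees
    }


toy = nx.karate_club_graph()  # stand-in graph, not the seminar data
summary = basic_summary(toy)
print(summary)
```

The same function can be applied to `g_book` once the graph has been loaded.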

* **Check that it is a connected undirected graph.**

* **💬 Make a small report on the metrics of the given graph (diameter, radius, average network distance, clustering coefficient).**

💬 :
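The connectivity check and the distance-based metrics could be gathered as in the following sketch. It measures distances in hops (edge weights are ignored), which is an assumption; the helper name `distance_report` is hypothetical and the karate club graph is only a stand-in.

```python
import networkx as nx


def distance_report(g):
    """Collect distance and clustering metrics of a connected undirected graph."""
    if g.is_directed():
        raise ValueError('expected an undirected graph')
    if not nx.is_connected(g):
        raise ValueError('diameter and radius require a connected graph')
    return {
        'directed': g.is_directed(),
        'diameter': nx.diameter(g),  # largest eccentricity
        'radius': nx.radius(g),      # smallest eccentricity
        'average_distance': nx.average_shortest_path_length(g),
        'average_clustering': nx.average_clustering(g),
    }


report = distance_report(nx.karate_club_graph())  # stand-in graph
print(report)
```

In any connected graph the radius is at most the diameter, and the diameter is at most twice the radius, which gives a quick sanity check on the output.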

## 5. Centrality metrics: Characters' importance
---


In this section, we will study the importance of the characters according to their centrality in the graph.

* **Compute the 10 most central nodes in the network taking into account the different types of centrality (degree, betweenness, closeness and eigenvector centrality). Moreover, use PageRank to assess the importance of the characters.**

  * *centrality_bar_plot()*: Given the corresponding centrality draw a bar graph.
  * 💬 Try to reason about the changes that occur with the different types of centrality.

In [None]:
def centrality_bar_plot(centrality, name='betweenness', n=10):
    values = ... # Centrality values
    label = ... # Nodes' names

    df = pd.DataFrame({'Name': label, name: values})
    ax = df.plot.bar(x='Name', y=name, rot=90)
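One possible way to obtain the two lists that `centrality_bar_plot()` needs is to sort the centrality dictionary by value and keep the first `n` entries. The sketch below assumes the centrality is a dict mapping node to score, as NetworkX centrality functions return; the helper name `top_n` and the sample scores are made up for illustration.

```python
def top_n(centrality, n=10):
    """Return the n (node, value) pairs with the highest centrality."""
    ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


scores = {'A': 0.2, 'B': 0.9, 'C': 0.5}  # made-up values for illustration
top = top_n(scores, n=2)
label = [node for node, _ in top]      # names for the x axis
values = [value for _, value in top]   # bar heights
print(label, values)
```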

In [None]:
plt.rcParams['figure.figsize'] = [10, 4]

degree_centrality = ... # Degree Centrality
betweenness_centrality = ... # Betweenness Centrality
closeness_centrality = ... # Closeness Centrality
eigen_centrality = ... # Eigenvector Centrality


centrality_bar_plot(degree_centrality, name='degree')
centrality_bar_plot(betweenness_centrality, name='betweenness')
centrality_bar_plot(closeness_centrality, name='closeness')
centrality_bar_plot(eigen_centrality, name='eigen')

plt.rcParams['figure.figsize'] = [12, 12]
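The four centralities are available as NetworkX functions. The sketch below calls them on the karate club graph as a stand-in and ignores edge weights, which is an assumption: in the seminar graph the weight is an inverse co-appearance count, so whether and how to use it is a modelling decision left to the analysis.

```python
import networkx as nx

toy = nx.karate_club_graph()  # stand-in graph, not the seminar data

centralities = {
    'degree': nx.degree_centrality(toy),
    # betweenness_centrality accepts weight='weight' to use weighted paths
    'betweenness': nx.betweenness_centrality(toy),
    # closeness_centrality accepts distance='weight' for weighted distances
    'closeness': nx.closeness_centrality(toy),
    'eigen': nx.eigenvector_centrality(toy, max_iter=1000),
}

# Node with the highest score under each definition of centrality.
most_central = {name: max(c, key=c.get) for name, c in centralities.items()}
print(most_central)
```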

In [None]:
# Page rank:
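A minimal PageRank sketch, again on a stand-in graph. It uses `nx.pagerank` with the usual damping factor and `weight=None`, so every edge counts equally; both choices are assumptions for illustration.

```python
import networkx as nx

toy = nx.karate_club_graph()  # stand-in graph, not the seminar data

# alpha is the damping factor; weight=None treats all edges as equal.
page_rank = nx.pagerank(toy, alpha=0.85, weight=None)

# Ten nodes with the highest PageRank score, best first.
top10 = sorted(page_rank, key=page_rank.get, reverse=True)[:10]
print(top10)
```

The scores form a probability distribution over the nodes, so they sum to 1 up to the tolerance of the iteration.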

💬 :

* **What is the subgraph generated by the best connected characters?**
  * Use closeness centrality to generate the graph of the 25 most central characters.

In [None]:
def centrality_subgraph(g, centrality, name='closeness', n=25):
    pass

In [None]:
g_subgraph = centrality_subgraph(...)
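One way to build such a subgraph is to keep the `n` highest-scoring nodes and take the subgraph they induce. This is only a sketch under that assumption; the function name `centrality_subgraph_sketch` is hypothetical and the karate club graph is a stand-in.

```python
import networkx as nx


def centrality_subgraph_sketch(g, centrality, n=25):
    """Return the subgraph induced by the n nodes with the highest centrality."""
    top_nodes = sorted(centrality, key=centrality.get, reverse=True)[:n]
    # copy() returns an independent graph instead of a read-only view
    return g.subgraph(top_nodes).copy()


toy = nx.karate_club_graph()
closeness = nx.closeness_centrality(toy)
sub = centrality_subgraph_sketch(toy, closeness, n=5)
print(sub.nodes())
```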

* **Draw this subgraph with node sizes proportional to their centrality. Mark the most central and the least central node in the graph (for instance, use the color of the node to highlight it).**
  * Use *closeness centrality* and scale it appropriately to emphasize the importance of different nodes.
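A possible sketch for the drawing step: centralities are rescaled linearly to a range of node sizes, and the two extreme nodes get their own colors. The helper names, the size range and the colors are illustrative assumptions.

```python
import networkx as nx


def node_style(g, centrality, min_size=100, max_size=2000):
    """Return node sizes scaled by centrality and colors marking the extremes."""
    values = [centrality[v] for v in g]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    sizes = [min_size + (max_size - min_size) * (c - lo) / span for c in values]
    most = max(g, key=centrality.get)
    least = min(g, key=centrality.get)
    colors = ['tab:red' if v == most else
              'tab:green' if v == least else 'tab:blue' for v in g]
    return sizes, colors


def draw_by_centrality(g, centrality, seed=42):
    """Draw g with node size and color driven by the given centrality."""
    sizes, colors = node_style(g, centrality)
    pos = nx.spring_layout(g, weight=None, seed=seed)
    nx.draw_networkx(g, pos, node_size=sizes, node_color=colors,
                     edge_color='lightgray', font_size=8)
```

With the subgraph from the previous step, a call such as `draw_by_centrality(g_subgraph, nx.closeness_centrality(g_subgraph))` would highlight the most central node in red and the least central one in green.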

* **Draw the trees that the BFS and DFS algorithms would generate to traverse the graph starting from the least central node of the network according to *closeness centrality*.**
  * Use *closeness centrality* and scale it appropriately to emphasize the importance of different nodes.
  * To get the positions of the nodes, you can use the `graphviz_layout(tree, prog='dot')` command.
  * 💬 Comment on the obtained result.


In [None]:
tree = ...

pos = graphviz_layout(..., prog='dot')

In [None]:
tree = ...

pos = graphviz_layout(tree, prog='dot')
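The two search trees can be obtained with `nx.bfs_tree` and `nx.dfs_tree`, as in the sketch below. Since `graphviz_layout` needs a working pygraphviz installation, the sketch also includes a Graphviz-free fallback that places each node on a layer given by its depth; this fallback and the helper names are assumptions, not part of the seminar template.

```python
import networkx as nx


def search_trees(g, centrality):
    """Return the least central node and its BFS and DFS trees."""
    root = min(centrality, key=centrality.get)
    return root, nx.bfs_tree(g, root), nx.dfs_tree(g, root)


def layered_positions(tree, root):
    """Place every node on a layer given by its depth in the tree."""
    depth = nx.shortest_path_length(tree, source=root)
    nx.set_node_attributes(tree, depth, 'subset')  # stored on the tree itself
    return nx.multipartite_layout(tree, subset_key='subset', align='horizontal')


toy = nx.karate_club_graph()  # stand-in graph, not the seminar data
root, bfs, dfs = search_trees(toy, nx.closeness_centrality(toy))
bfs_depth = max(nx.shortest_path_length(bfs, source=root).values())
dfs_depth = max(nx.shortest_path_length(dfs, source=root).values())
print(root, bfs_depth, dfs_depth)
```

Both trees span the whole connected graph, but the BFS tree is never deeper than the DFS tree, because BFS reaches every node along a shortest path from the root.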

💬 :

* **💬 Compute the shortest path between the least and the most central nodes in the complete graph.**

💬 :
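Because `csv_to_graph()` stores the inverse of the co-appearance count as the edge weight, the weighted shortest path can differ from the path with the fewest hops. The toy example below, with made-up characters and counts, illustrates the difference.

```python
import networkx as nx

g = nx.Graph()
# (source, target, co-appearance count): made-up values for illustration
for u, v, count in [('A', 'B', 10), ('B', 'C', 10), ('A', 'C', 1)]:
    g.add_edge(u, v, weight=1 / count)  # same inversion as csv_to_graph()

hop_path = nx.shortest_path(g, 'A', 'C')                        # fewest edges
weighted_path = nx.shortest_path(g, 'A', 'C', weight='weight')  # lowest total weight
cost = nx.shortest_path_length(g, 'A', 'C', weight='weight')
print(hop_path, weighted_path, cost)
```

Here the direct edge A-C is the shortest in hops, while the route through B is shorter in weight because A and C rarely appear together.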

## 6. Random graph models
----
Up to this point, we have worked with a graph generated from data extracted from the *A Song of Ice and Fire* books. In the real world, however, obtaining the data needed to construct such a graph can be very complex and expensive. This is one of the reasons why the synthetic generation of graphs has been studied over time.

In this section we will work on the different models described in class. We will generate random graphs and study their properties.

* **Generate random graphs with the Uniform, Gilbert and Barabási-Albert models. Fix the number of nodes to the order of the studied graph. Adjust the rest of the parameters of the graph generation functions to obtain graphs with a similar number of edges.**

### Erdős-Rényi: Uniform Model (gnm)

In [None]:
g_uniform = ...

### Erdős-Rényi: Gilbert Model (gnp)


In [None]:
g_gilbert = ...

### Barabási-Albert Model



In [None]:
g_barbasi = ...

In [None]:
g_dict = {'Book': g_book, 'Uniform': g_uniform, 'Gilbert': g_gilbert, 'Barabasi': g_barbasi}
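A possible way to choose the parameters from a reference graph with `n` nodes and `m` edges is sketched below. The uniform model takes `m` directly; the Gilbert model uses `p = 2m / (n(n-1))`, so the expected number of edges is `m`; and the Barabási-Albert model attaches each new node with `k ≈ m / n` edges, which yields `k(n - k)` edges in the NetworkX implementation. The helper name `matched_random_graphs` and the rounding rule for `k` are assumptions.

```python
import networkx as nx


def matched_random_graphs(g_ref, seed=0):
    """Generate three random graphs with the order of g_ref and a similar size."""
    n = g_ref.number_of_nodes()
    m = g_ref.number_of_edges()
    p = 2 * m / (n * (n - 1))  # expected number of edges equals m
    k = max(1, round(m / n))   # Barabási-Albert then yields k * (n - k) edges
    return {
        'Uniform': nx.gnm_random_graph(n, m, seed=seed),
        'Gilbert': nx.gnp_random_graph(n, p, seed=seed),
        'Barabasi-Albert': nx.barabasi_albert_graph(n, k, seed=seed),
    }


reference = nx.gnm_random_graph(100, 300, seed=1)  # stand-in for the book graph
graphs = matched_random_graphs(reference)
for name, g in graphs.items():
    print(name, g.number_of_nodes(), g.number_of_edges())
```

Fixing `seed` makes two runs produce exactly the same graphs.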

* **💬 Show the order and size of each graph as well as the average degree and clustering coefficient of its nodes. Also compute the intervals between the maximum and minimum centralities for each family of synthetic graphs. Make a small report of the main metrics. Which random graph most closely resembles the graph from the books?**
     * You can set the graph generation using a random seed. This way, two different runs will generate exactly the same graph.

In [None]:
for k, g in g_dict.items():
    pass
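The body of the loop could collect the metrics in a dictionary per graph, as in this sketch. The helper name `summarize` is hypothetical, and only degree and closeness centrality are included to keep the example short.

```python
import networkx as nx


def summarize(g):
    """Return size metrics and centrality ranges (min, max) of a graph."""
    n, m = g.number_of_nodes(), g.number_of_edges()
    degree_c = nx.degree_centrality(g)
    closeness_c = nx.closeness_centrality(g)
    return {
        'order': n,
        'size': m,
        'avg_degree': 2 * m / n,
        'avg_clustering': nx.average_clustering(g),
        'degree_range': (min(degree_c.values()), max(degree_c.values())),
        'closeness_range': (min(closeness_c.values()), max(closeness_c.values())),
    }


k4 = summarize(nx.complete_graph(4))  # tiny example with known values
print(k4)
```

Inside the loop, `print(k, summarize(g))` would then produce one line per graph in `g_dict`.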

💬 :

* **💬 Check whether the networks (the three randomly generated ones and the network extracted from the books) follow a Power Law.**

In [None]:
plt.rcParams['figure.figsize'] = [13, 5]

for k, g in g_dict.items():
  pass
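A common visual check is to plot the degree distribution on log-log axes: under a power law the points fall roughly on a straight line. The sketch below computes the empirical distribution with `Counter` and draws it on a Matplotlib axis passed by the caller; the helper names are hypothetical, and a visual check of this kind is only indicative, not a statistical test.

```python
from collections import Counter

import networkx as nx


def degree_distribution(g):
    """Return the sorted degrees and the fraction of nodes with each degree."""
    counts = Counter(d for _, d in g.degree())
    n = g.number_of_nodes()
    degrees = sorted(counts)
    return degrees, [counts[k] / n for k in degrees]


def plot_degree_distribution(g, title, ax):
    """Scatter P(k) against k on log-log axes of the given Matplotlib axis."""
    degrees, probs = degree_distribution(g)
    # Degree 0 cannot be shown on a logarithmic axis, so it is dropped.
    points = [(k, p) for k, p in zip(degrees, probs) if k > 0]
    ax.loglog([k for k, _ in points], [p for _, p in points], 'o')
    ax.set_xlabel('degree k')
    ax.set_ylabel('P(k)')
    ax.set_title(title)


degrees, probs = degree_distribution(nx.star_graph(4))  # 1 hub and 4 leaves
print(degrees, probs)
```

With `fig, axes = plt.subplots(1, len(g_dict))`, each graph can be drawn on its own axis by calling `plot_degree_distribution(g, k, ax)` inside the loop.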

💬 :