<div class="frontmatter text-center">
<h1> Introduction to Data Science and Programming</h1>
<h2>Exercise 19: Introduction to network science</h2>
<h3>IT University of Copenhagen, Fall 2023</h3>
</div>

Using the networkx `Graph` class and its associated methods introduced in class, solve the following exercises.

In [None]:
import networkx as nx
import matplotlib.pyplot as plt
import random
import numpy as np

### 19.1
Inspect the "Zachary Karate Club" dataset in the `karate.txt` file (in the `files` folder). What type of dataset is it?
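If the `!head` shell command is unavailable (for example on Windows), the first few lines can also be read in plain Python, which works on any platform. The helper name `head_lines` below is hypothetical, a minimal sketch rather than part of the exercise:

```python
from itertools import islice


def head_lines(filepath, n=10):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(filepath) as infile:
        # islice stops after n lines, so a large file is not read in full
        return [line.rstrip("\n") for line in islice(infile, n)]


# Usage with the exercise's file:
# for line in head_lines("files/karate.txt"):
#     print(line)
```

Printing the returned lines shows how each record in the file is laid out.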

In [None]:
!head files/karate.txt
# If `head` is not available (e.g. on Windows), use `!type files\karate.txt` instead.
# Note that `type` prints the whole file, not just the first lines.

### 19.2
- Write a function that loads an edge-list file and returns a `Graph` object, **without using `nx.from_edgelist()`**. Node names should be integers, not strings.
- Load `karate.txt` into a `Graph` named `G1`.

Hint: `G.add_edge(v1, v2)` adds the nodes `v1` and `v2` to the graph if they do not already exist.
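To see what "adds the nodes if they do not already exist" means in practice, here is a pure-Python sketch that stores an undirected graph as a dict mapping each node to a set of neighbours. This representation and the name `load_adjacency` are assumptions for illustration, a simplified stand-in for `nx.Graph` rather than how networkx works internally:

```python
def load_adjacency(lines):
    """Build an undirected graph {node: set_of_neighbours} from 'u v' lines.

    Assumption: a plain dict of sets stands in for nx.Graph.
    """
    adj = {}
    for line in lines:
        parts = line.split()  # any whitespace separates the two node IDs
        if len(parts) < 2:    # skip blank or malformed lines
            continue
        u, v = int(parts[0]), int(parts[1])  # integer node names
        # setdefault creates a node on first sight, like G.add_edge does
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


adj = load_adjacency(["0 1", "1 2", "", "2 0"])
print(adj)  # {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```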

In [None]:
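def from_edgelist(filepath):
    """Build an undirected nx.Graph from a whitespace-separated edge-list file."""
    G = nx.Graph()
    with open(filepath) as infile:
        for line in infile:
            parts = line.split()  # split on any whitespace (tabs, repeated spaces)
            if len(parts) < 2:    # skip blank or malformed lines
                continue
            v1, v2 = int(parts[0]), int(parts[1])  # integer node names
            G.add_edge(v1, v2)    # also adds v1 and v2 if they are new
    return G

G1 = from_edgelist("files/karate.txt")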

The following line should return `True` if your code is correct:

In [None]:
sorted(G1.nodes()) == list(range(34))

### 19.3
Similarly, load the following five graphs and store them as separate graphs: `G3`, `G_AB`, `G_D1`, `G_D2` and `G_ER`.

```
["graph3.txt", "graphAB.txt", "graphDense1.txt", "graphDense2.txt", "graphER.txt"]
```

(You may use a one-line list comprehension for this if you like.)

In [None]:
G3, G_AB, G_D1, G_D2, G_ER = (from_edgelist("files/"+filename) for filename in ["graph3.txt", "graphAB.txt", "graphDense1.txt", "graphDense2.txt", "graphER.txt"])

### 19.4
Calculate the following for each of the 5 graphs:
- Number of nodes
- Number of edges
- Average degree (rounded to 2 decimals)
- Diameter (longest shortest path)
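The first three quantities follow directly from the adjacency structure. Each edge adds 1 to the degree of both of its endpoints, so the degree sum is 2m for m edges and the average degree is 2m/n for n nodes. A pure-Python sketch on a dict-of-sets graph (the representation, the name `basic_metrics`, and the absence of self-loops are assumptions):

```python
def basic_metrics(adj):
    """Return (#nodes, #edges, average degree rounded to 2 decimals).

    Assumption: adj is {node: set_of_neighbours} with no self-loops.
    """
    n = len(adj)
    degree_sum = sum(len(neighbours) for neighbours in adj.values())
    m = degree_sum // 2  # each undirected edge is counted from both endpoints
    avg_degree = round(degree_sum / n, 2) if n else 0.0
    return n, m, avg_degree


# A 4-node path 0-1-2-3: 3 edges, average degree 2*3/4 = 1.5
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(basic_metrics(path))  # (4, 3, 1.5)
```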

Then plot the following for each of the 5 graphs:
- Degree distribution
- Shortest path distribution
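The degree distribution is the fraction P(k) of nodes that have degree k. A minimal sketch with `collections.Counter`, again assuming a dict-of-sets graph and a hypothetical helper name:

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: fraction of nodes with degree k}, sorted by k.

    Assumption: adj is {node: set_of_neighbours}.
    """
    counts = Counter(len(neighbours) for neighbours in adj.values())
    n = len(adj)
    return {k: counts[k] / n for k in sorted(counts)}


# Star with centre 0 and three leaves: three nodes of degree 1, one of degree 3
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(degree_distribution(star))  # {1: 0.75, 3: 0.25}
```

These fractions match the bar heights `plt.hist(..., density=True)` draws when every bin spans exactly one integer degree.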

Note: To reduce your workload, wrap this in a function, as you will use it again in **19.5**.

*Hints: `nx.diameter`, `nx.shortest_path_length`, `nx.has_path`*
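Calling `nx.has_path` and `nx.shortest_path_length` separately for every pair of nodes repeats a graph search per pair. On an unweighted graph, a single breadth-first search (BFS) from a node gives its distance to every reachable node at once; networkx exposes this as `nx.all_pairs_shortest_path_length`. The pure-Python sketch below (dict-of-sets graph and helper names are assumptions) counts shortest-path lengths over ordered pairs of distinct nodes, using -1 for "no path":

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:  # first visit happens along a shortest path
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def shortest_path_distribution(adj):
    """Counter of shortest-path lengths over ordered pairs of distinct nodes; -1 = no path."""
    counts = Counter()
    for source in adj:
        dist = bfs_distances(adj, source)  # one BFS per source node
        for target in adj:
            if target != source:
                counts[dist.get(target, -1)] += 1
    return counts


# Path 0-1-2 plus an isolated node 3
g = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print(sorted(shortest_path_distribution(g).items()))  # [(-1, 6), (1, 4), (2, 2)]
```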

In [None]:
def graph_properties(graph_names, graphs):
    """Print basic metrics and plot degree / shortest-path distributions for each graph."""
    print("Graph\t#Nodes\t#Edges\tAvg.Deg.\tDiameter")
    for name, graph in zip(graph_names, graphs):
        degrees = [d for _, d in graph.degree()]
        try:
            diam = nx.diameter(graph)
        except nx.NetworkXError:  # raised when the graph is disconnected
            diam = "Infinite"
        print(
            name,
            graph.number_of_nodes(),
            graph.number_of_edges(),
            round(sum(degrees)/len(degrees), 2),
            "",
            diam,
            sep="\t")

    # squeeze=False keeps `ax` two-dimensional even if only one graph is passed
    fig, ax = plt.subplots(2, len(graphs), figsize=(len(graphs)*4, 2*3), squeeze=False)
    for graph_num, (graph_name, graph) in enumerate(zip(graph_names, graphs)):
        # Degree distribution: one unit-width bin per integer degree
        degrees = [d for _, d in graph.degree()]
        ax[0, graph_num].hist(degrees, density=True, bins=np.arange(max(degrees)+2)-0.5)
        ax[0, graph_num].set_title(graph_name+" - Deg. Dist.")

        # One BFS per source node instead of has_path + shortest_path_length for every pair;
        # pairs with no path are recorded as -1
        lengths = dict(nx.all_pairs_shortest_path_length(graph))
        shortest_paths = [lengths[v1].get(v2, -1) for v1 in graph.nodes() for v2 in graph.nodes()]
        # Start the bins at -1.5 so the -1 ("no path") entries are not silently dropped
        bins = np.arange(-1, max(shortest_paths)+2) - 0.5
        ax[1, graph_num].hist(shortest_paths, density=True, bins=bins)
        ax[1, graph_num].set_xticks(range(-1, max(shortest_paths)+1))
        ax[1, graph_num].set_title(graph_name+" - Shortest Path Dist.")
    plt.tight_layout()
    plt.show()


graph_names = ["G3", "G_AB", "G_D1", "G_D2", "G_ER"]
graphs = [G3, G_AB, G_D1, G_D2, G_ER]
graph_properties(graph_names, graphs)

### 19.5
- Randomly delete 5 edges from each of the 5 graphs above and recalculate the metrics from **19.4**.
- Do you see any major changes in any of the plots?
- What changes would you expect when edges are removed from a graph, and what are the consequences?

Note that deleting edges might disconnect a graph. A path then no longer exists between every pair of nodes, so the graph has no (finite) diameter. As a bonus exercise, see if you can handle this case; you may use try-except statements in this part.
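One common workaround is to report the diameter of the largest connected component instead; with networkx this can be written, for instance, as `G.subgraph(max(nx.connected_components(G), key=len))`. Finding the components only needs a graph traversal. A pure-Python sketch using an iterative depth-first search (the dict-of-sets representation and function name are assumptions):

```python
def connected_components(adj):
    """Return a list of node sets, one per connected component (iterative DFS).

    Assumption: adj is {node: set_of_neighbours}.
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        seen |= component
        components.append(component)
    return components


g = {0: {1}, 1: {0}, 2: {3}, 3: {2, 4}, 4: {3}}
largest = max(connected_components(g), key=len)
print(largest)  # {2, 3, 4}
```

The graph is connected exactly when this returns a single component.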

Hint: `random.sample`
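Since Python 3.11, `random.sample` only accepts a sequence as its population, so a set-like view such as `G.edges()` has to be converted to a list first. The deletion step can be sketched in pure Python on a dict-of-sets graph (helper names are hypothetical; orderable node names and no self-loops are assumed), with a seeded `random.Random` for reproducible runs:

```python
import random


def edge_list(adj):
    """Each undirected edge once, as a (smaller, larger) tuple, in sorted order.

    Assumption: node names are orderable and there are no self-loops.
    """
    return sorted((u, v) for u in adj for v in adj[u] if u < v)


def delete_random_edges(adj, n, rng):
    """Return a copy of adj with n distinct, randomly chosen edges removed."""
    new_adj = {node: set(neighbours) for node, neighbours in adj.items()}  # original stays intact
    for u, v in rng.sample(edge_list(adj), k=n):  # a list is a valid population
        new_adj[u].discard(v)
        new_adj[v].discard(u)
    return new_adj


rng = random.Random(42)
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
smaller = delete_random_edges(square, 2, rng)
print(len(edge_list(smaller)))  # 2
```

Each removed edge lowers the degree sum by 2, so deleting k edges from a graph with n nodes lowers the average degree by exactly 2k/n. No nodes are removed, but the graph may split into several components.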

In [None]:
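def del_n_edges(G, n):
    """Return a copy of G with n randomly chosen edges removed."""
    G2 = G.copy()  # leave the original graph untouched
    # random.sample needs a sequence (Python 3.11+ rejects set-like views), so convert to a list
    to_remove = random.sample(list(G2.edges()), k=n)
    G2.remove_edges_from(to_remove)
    return G2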

random.seed(42)
graph_names_b = [graph_name+"_b" for graph_name in graph_names]
graphs_b = [del_n_edges(graph, 5) for graph in graphs]

graph_properties(graph_names_b, graphs_b)

### 19.6a
- Create a function that, given a graph, returns the number of "triangles" (cliques of size 3) in the graph. **Do not use the triangles method from networkx**.
- Count the triangles in each of the 5 original graphs (before any edges were deleted).
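An alternative to checking every pair of neighbours of each node is to work edge by edge: the common neighbours of the endpoints u and v are exactly the third corners of the triangles that contain the edge (u, v). Summing over all edges counts each triangle three times, once per edge. A pure-Python sketch on a dict-of-sets graph (representation and function name are assumptions):

```python
def count_triangles_by_edges(adj):
    """Count triangles via common neighbours of each edge's endpoints.

    Assumption: adj is {node: set_of_neighbours}, orderable nodes, no self-loops.
    """
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                total += len(adj[u] & adj[v])  # each common neighbour closes a triangle
    return total // 3  # every triangle has three edges


# Complete graph on 4 nodes: C(4, 3) = 4 triangles
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(count_triangles_by_edges(k4))  # 4
```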

In [None]:
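def triangles(G):
    """Count triangles (3-cliques) in an undirected graph without nx.triangles."""
    num_triangles = 0
    for node in G.nodes():
        neighbors = list(G.neighbors(node))  # build the neighbor list once per node
        # Check every unordered pair of neighbors for the edge that closes a triangle
        for i, neighbor1 in enumerate(neighbors):
            for neighbor2 in neighbors[i+1:]:
                if G.has_edge(neighbor1, neighbor2):
                    num_triangles += 1
    # Each triangle is found once from each of its three corners
    return num_triangles // 3

[triangles(graph) for graph in graphs]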

### 19.6b
For testing, compare your output with that of `nx.triangles`.
- Note: Divide the sum of its values by 3, because each triangle is counted three times (once for each of its nodes).

Example code and its output are given below for your convenience. Feel free to use them (or write your own using `nx.triangles`):
```python
graphs = [G3, G_AB, G_D1, G_D2, G_ER]
[sum(nx.triangles(graph).values())//3 for graph in graphs]
```

Output: `[0, 88, 19, 13, 5984]`
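`nx.triangles` returns a dict mapping each node to the number of triangles it belongs to, which is why the sum is divided by 3. To test a hand-written counter without networkx, a per-node count with the same shape can be sketched in pure Python (dict-of-sets graph; the function name is hypothetical):

```python
from itertools import combinations


def triangles_per_node(adj):
    """Return {node: number of triangles through node}, shaped like nx.triangles' output.

    Assumption: adj is {node: set_of_neighbours} with no self-loops.
    """
    counts = {}
    for node, neighbours in adj.items():
        # A triangle through `node` is a pair of its neighbours that are linked to each other
        counts[node] = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return counts


k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
per_node = triangles_per_node(k4)
print(per_node)                     # {0: 3, 1: 3, 2: 3, 3: 3}
print(sum(per_node.values()) // 3)  # 4
```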

# Bonus Questions

### 19.B1 (Advanced)
Create a function to calculate the diameter of a graph.
- The function should return -1 if the graph is disconnected.
- Do not use `nx.diameter`.
- For an extra challenge, use as few of the following as possible:
  - `nx.shortest_path` (or `nx.shortest_path_length` etc.)
  - `nx.connected_components` (or variants thereof)
  - other methods that might make this seem too easy *(can you do it with just the basic Graph object and iteration/recursion?)*
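One way to approach the extra challenge is to compute each node's eccentricity (its largest shortest-path distance) by growing a "frontier" of newly reached nodes one hop at a time. The diameter is the largest eccentricity, and if any search fails to reach every node, the graph is disconnected. A sketch using only a dict-of-sets adjacency and plain iteration (the representation, the function name, and returning -1 for an empty graph are assumptions):

```python
def diameter(adj):
    """Longest shortest path in an undirected graph {node: set_of_neighbours}; -1 if disconnected."""
    if not adj:
        return -1  # assumption: treat the empty graph like a disconnected one
    best = 0
    for source in adj:
        reached = {source}
        frontier = {source}
        eccentricity = 0
        while True:
            # All unvisited neighbours of the current frontier are one hop further away
            next_frontier = {nb for node in frontier for nb in adj[node]} - reached
            if not next_frontier:
                break
            reached |= next_frontier
            frontier = next_frontier
            eccentricity += 1
        if len(reached) < len(adj):
            return -1  # some node is unreachable from source
        best = max(best, eccentricity)
    return best


print(diameter({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}))  # 3
print(diameter({0: {1}, 1: {0}, 2: set()}))              # -1
```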