# Colaboratory Assignment 6

**Instructions**. Below you will find several text cells with short programming problems. You can create as many code cells as you need to answer them.

There are four problems, but you will only need to solve two. You **must** choose at least one of the problems with the title in <font color='#006633'>green</font>.


**BEFORE YOU START**

Make sure to run the code cell below to fix the adjacency matrix problem. Also, remember that the next code cell should be the first thing you evaluate. Otherwise, you will need to restart your runtime and reimport `networkx`.

In [None]:
!pip uninstall -y scipy networkx
!pip install scipy==1.8
!pip install networkx==2.7

In [None]:
import networkx as nx
nx.__version__

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import matplotlib.pyplot as plt
import sys
sys.path.append('/content/drive/MyDrive/ColabNotebooks')
from readlist import readlist

## <font color='#006633'>1. Paths in lines and rings</font>

In previous colaboratory assignments, you have created ring and line networks. Besides all the differences you can find in terms of the quantities that characterize a network (e.g. $n$, $m$, $\langle k \rangle$, $\rho$), the average shortest paths between the nodes in these networks are also different.

Before doing anything, think for a moment. Comparing a line with a ring network, which one do you think has a lower value for $\langle s \rangle$?

Let's check this by creating the networks.

1. Create a line network with $n = 200$
2. Create a ring network with $n = 200$
3. Obtain $\langle s \rangle_{\text{line}}$ and $\langle s \rangle_{\text{ring}}$. Use a type 3 search.

Explain the difference. This is a particularly interesting result if you consider both networks: except for one link, they are the same network!
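
As a cross-check on whatever the search returns, both averages can also be worked out by hand. The following is a sketch, assuming $\langle s \rangle$ averages over all pairs $i \neq j$ and that $n$ is even for the ring. In a line there are $n - d$ pairs at distance $d$; in a ring every node sees two nodes at each distance $d < n/2$ and one node at distance $n/2$.

$$
\langle s \rangle_{\text{line}} = \frac{2}{n(n-1)} \sum_{d=1}^{n-1} (n-d)\, d = \frac{n+1}{3},
\qquad
\langle s \rangle_{\text{ring}} = \frac{1}{n-1} \left[ 2 \sum_{d=1}^{n/2-1} d + \frac{n}{2} \right] = \frac{n^2}{4(n-1)}
$$

For $n = 200$ these expressions give $\langle s \rangle_{\text{line}} = 67$ and $\langle s \rangle_{\text{ring}} \approx 50.25$.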

In [None]:
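def BFSall(G):
    """Type 3 search: shortest path lengths from every origin to every reachable node."""
    d = {}
    for origin in G.nodes():
        # BFS with no destination returns {node: distance} for this origin
        d[origin] = dict(BFS(G, origin))
    return d

def BFS(G, origin, dest=None):
    """Breadth-first search from origin.

    With dest given, return the distance to dest (None if it cannot be reached);
    otherwise return {node: distance} for every node reachable from origin.
    """
    if origin == dest:
        return 0  # the origin is at distance 0 from itself
    d = {}
    t = 0
    w = set(G.nodes())  # nodes not visited yet; a set keeps lookups and removals fast

    d[origin] = t
    w.remove(origin)
    ActiveShell = [origin]

    while len(ActiveShell) > 0:
        t += 1  # every new shell is one step farther from the origin
        NewActiveShell = []
        for node in ActiveShell:
            for neigh in G.neighbors(node):
                if neigh == dest:
                    d[neigh] = t
                    return t
                if neigh in w:
                    NewActiveShell.append(neigh)
                    w.remove(neigh)
                    d[neigh] = t
        ActiveShell = NewActiveShell
    if dest is not None:
        print("Destination not found")
        return None
    return d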

In [None]:
#Before doing anything, think for a moment. Comparing a line with a ring network, which one do you think has a lower value for ⟨s⟩?
#I think the ring network will have a lower <s>

#Create a line network with n = 200

LN = nx.Graph()
for i in range(1,200):
  LN.add_edge(i,i+1)

#Create a ring network with n = 200

RN = nx.Graph()
n = 200
RN.add_edge(1,n)
for i in range(1,n):
  RN.add_edge(i,i+1)

#Obtain ⟨s⟩ for the line and ⟨s⟩ for the ring. Use a type 3 search
savg = 0
s = BFSall(LN)
for i in s.keys():
    for q in s[i].keys():
        if i != q:
            savg += s[i][q]

n = LN.number_of_nodes()
savg = savg / (n * (n - 1))
print(f'The <s> of the line network is {savg}')

savg = 0
s = BFSall(RN)
for i in s.keys():
    for q in s[i].keys():
        if i != q:
            savg += s[i][q]

n = RN.number_of_nodes()
savg = savg / (n * (n - 1))
print(f'The <s> of the ring network is {savg}')

#Explain the difference. This is a particularly interesting result if you consider both networks: except for one link, they are the same network!
#The s avg for the ring network is likely lower because for any given node you can go in either direction to get to the destination, meaning most of the time, there is a shorter path, rather than going all the way across in a line network.
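
The two averages can be cross-checked independently of the notebook's `BFS`. Below is a minimal sketch that uses plain adjacency dictionaries and a queue-based breadth-first search. The helper names `bfs_distances` and `average_shortest_path` are illustrative and not part of the assignment. It compares the results with the closed forms $(n+1)/3$ for a line and $n^2 / (4(n-1))$ for a ring with even $n$.

```python
from collections import deque

def bfs_distances(adj, origin):
    """Return {node: hops from origin} for every node reachable from origin."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_shortest_path(adj):
    """Average of s over all ordered pairs (i, j), i != j, of a connected network."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total / (n * (n - 1))

n = 200
# Line: node i is linked to i - 1 and i + 1 whenever those labels exist (1..n).
line = {i: [j for j in (i - 1, i + 1) if 1 <= j <= n] for i in range(1, n + 1)}
# Ring: the same network plus the single extra link between nodes 1 and n.
ring = {i: list(nbrs) for i, nbrs in line.items()}
ring[1].append(n)
ring[n].append(1)

s_line = average_shortest_path(line)
s_ring = average_shortest_path(ring)
print(f"line: {s_line:.4f}   closed form (n + 1) / 3       = {(n + 1) / 3:.4f}")
print(f"ring: {s_ring:.4f}   closed form n^2 / (4 (n - 1)) = {n**2 / (4 * (n - 1)):.4f}")
```

Because the helpers only iterate over `adj` and index `adj[u]`, they should also accept a `networkx` graph in place of the dictionary.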

## <font color='#006633'>2. How expensive is it to use a type 3 search?</font>

A potential problem with the algorithms presented is that they could take too much time to run. This time should be a function of the number of nodes, but also of the particular type of network.

Use a plot to show the relationship between $n$ and the time spent in a type 3 search in a **ring** network.

Some useful pieces of information:

- Systematically increase the number of nodes. For each value of $n$, take the time spent to run a type 3 search. Use $n \in \{50, 100, 150, \ldots, 1000\}$
- To take the time spent running code, you can use several tools. My suggestion is to use the `datetime` module. A small example:
    ``` python
    import datetime as dt

    t0 = dt.datetime.now()
    SOME CODE
    tf = dt.datetime.now()
    print("Time spent executing SOME CODE: ", tf - t0)
    ```
- Note that the above block of code will give you the result in `timedelta` format. To use it more easily in a plot, you can convert the result to seconds with `timedelta.total_seconds()`, as explained [here](https://docs.python.org/3/library/datetime.html#timedelta-objects).
- This exercise could take longer than usual to run, so I strongly suggest that you start working with **enough time** to answer this problem.
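
The measurement loop described in the hints above could look like the sketch below. It is one possible layout rather than the expected solution. The ring is stored as a plain adjacency dictionary, and the helper names (`ring_adjacency`, `type3_search`, `time_ring_searches`) are hypothetical. The search is a queue-based BFS, so timings for the notebook's own `BFS` may differ.

```python
import datetime as dt
from collections import deque

def ring_adjacency(n):
    """Ring with nodes 0..n-1; node i is linked to (i - 1) % n and (i + 1) % n."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def type3_search(adj):
    """All origins, all destinations: returns {origin: {node: distance}}."""
    all_dist = {}
    for origin in adj:
        dist = {origin: 0}
        queue = deque([origin])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        all_dist[origin] = dist
    return all_dist

def time_ring_searches(sizes):
    """Return the seconds spent in type3_search for each ring size."""
    seconds = []
    for n in sizes:
        ring = ring_adjacency(n)    # build the network outside the timed region
        t0 = dt.datetime.now()
        type3_search(ring)
        tf = dt.datetime.now()
        seconds.append((tf - t0).total_seconds())  # timedelta -> float seconds
    return seconds

# Shortened range to keep the demo quick; the assignment asks for range(50, 1001, 50).
sizes = list(range(50, 501, 50))
seconds = time_ring_searches(sizes)
for n, t in zip(sizes, seconds):
    print(f"n = {n:4d}   time = {t:.4f} s")
```

In the notebook, `plt.plot(sizes, seconds, marker='o')` followed by `plt.xlabel('n')` and `plt.ylabel('seconds')` would turn the two lists into the requested plot. A ring has as many links as nodes, so each single-origin search visits on the order of $n$ nodes, and roughly quadratic growth of the total time would be expected for this queue-based version.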

## 3. $\langle s \rangle$ when clusters are involved

For this problem, let's simulate clusters. Start by creating an empty network.

1. Create $c_1$, a cluster with 17 nodes. Remember that all of them must be structurally connected to form a cluster. You choose the labels and the way nodes are connected, just make sure not to repeat any labels.
2. Create $c_2$, a cluster with 71 nodes. The nodes must be different from those found in $c_1$.
3. Calculate $\langle s \rangle$ for **the whole network**. Use a type 3 `BFS`.
4. Connect the two clusters. To do this, simply choose one node in $c_1$ and one node in $c_2$ and create a link between them.
5. Calculate $\langle s \rangle$. Comment on the result.

In [None]:
cluster_net = nx.Graph()

#create cluster 1
nodes = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q']
#connect the nodes
cluster_net.add_nodes_from(nodes)
cluster_net.add_edge('a','q')
for i in range(0,len(nodes)-1):
  cluster_net.add_edge(nodes[i],nodes[i+1])
  if i != 1:
    cluster_net.add_edge(nodes[1],nodes[i])

#create cluster 2
n = 71
for i in range(100,n+100-1):
  cluster_net.add_edge(i,i+1)

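#Calculate ⟨s⟩ for the whole network. Use a type 3 BFS.
#Note: the two clusters are not connected yet, so BFSall only records pairs inside the same cluster. The sum below covers those pairs, while n * (n - 1) counts every ordered pair.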
savg = 0
s = BFSall(cluster_net)
for i in s.keys():
    for q in s[i].keys():
        if i != q:
            savg += s[i][q]

n = cluster_net.number_of_nodes()
savg = savg / (n * (n - 1))
print(f'The <s> of the network is {savg}')

#Connect the two clusters. To do this, simply choose one node in c1 and one node in c2 and create a link between them.
cluster_net.add_edge('a',120)

#Calculate ⟨s⟩. Comment on the result.
savg = 0
s = BFSall(cluster_net)
for i in s.keys():
    for q in s[i].keys():
        if i != q:
            savg += s[i][q]

n = cluster_net.number_of_nodes()
savg = savg / (n * (n - 1))
print(f'The <s> of the network is {savg}')
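
When interpreting the two values, it may help to see how the choice of denominator matters while the clusters are still disconnected. The sketch below uses a hypothetical toy network (a 3-node and a 2-node cluster stored as adjacency dictionaries) and illustrative helper names. It reports $\langle s \rangle$ divided by all $n(n-1)$ ordered pairs and, separately, divided only by the pairs that are actually joined by a path.

```python
from collections import deque

def bfs_distances(adj, origin):
    """Return {node: hops from origin} for every node reachable from origin."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_lengths(adj):
    """Return (<s> over all ordered pairs, <s> over reachable ordered pairs)."""
    n = len(adj)
    total = 0       # sum of distances over reachable pairs
    reachable = 0   # number of ordered pairs (i, j), i != j, joined by a path
    for u in adj:
        dist = bfs_distances(adj, u)
        total += sum(dist.values())
        reachable += len(dist) - 1      # exclude the origin itself
    return total / (n * (n - 1)), total / reachable

# Two toy clusters (hypothetical labels): the path 0-1-2 and the path 10-11.
adj = {0: [1], 1: [0, 2], 2: [1], 10: [11], 11: [10]}
before = average_path_lengths(adj)

# Connect the clusters with a single link between node 2 and node 10.
adj[2].append(10)
adj[10].append(2)
after = average_path_lengths(adj)

print("before connecting:", before)  # (0.5, 1.25)
print("after connecting: ", after)   # (2.0, 2.0)
```

In this toy case both conventions agree once the network is connected, and $\langle s \rangle$ grows after the link is added because the new pairs that become reachable are far apart.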

## 4. Dolphin network

[Lusseau et al. (2003)](https://link.springer.com/article/10.1007/s00265-003-0651-y) studied a small group of dolphins. The results are in the `dolphins.txt` file (in Blackboard).

Using the data provided:

1. Using a type 3 `BFS`, save all shortest path lengths between all origins and destinations.
2. Plot the results as a histogram of shortest paths. Note that this follows the same procedure to obtain the histogram of degrees. After all, you are just counting.
3. Plot $\langle s \rangle$ as a vertical reference line. Use the `axvline()` function from `matplotlib.pyplot`.
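
The counting step behind the histogram can be sketched as follows. Since `dolphins.txt` is not reproduced here, the example uses a hypothetical 4-node path stored as an adjacency dictionary. The helper names are illustrative, and loading the real data is left to the notebook.

```python
from collections import Counter, deque

def bfs_distances(adj, origin):
    """Return {node: hops from origin} for every node reachable from origin."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_counts(adj):
    """Count how many ordered pairs (i, j), i != j, sit at each shortest path length."""
    counts = Counter()
    for u in adj:
        dist = bfs_distances(adj, u)
        counts.update(d for node, d in dist.items() if node != u)
    return counts

# Hypothetical toy network (the path 0-1-2-3); the real data would come from dolphins.txt.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
counts = path_length_counts(adj)
mean_s = sum(d * c for d, c in counts.items()) / sum(counts.values())

for d in sorted(counts):
    print(f"s = {d}: {counts[d]} pairs")
print(f"<s> = {mean_s:.4f}")
```

With these two results, `plt.bar(list(counts.keys()), list(counts.values()))` draws the histogram and `plt.axvline(mean_s, color='red', linestyle='--')` adds the vertical reference line.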