# Colaboratory Assignment 5.2

**Instructions**. Below you will find several text cells with short programming problems. You can create as many code cells as you need to answer them.

There are four problems, but you will only need to solve two. You **must** choose at least one of the problems with the title in <font color='#006633'>green</font>.


**BEFORE YOU START**

Make sure to run the code cell below to fix the adjacency matrix problem. Also, remember that this code cell should be the first thing you evaluate. Otherwise, you will need to restart your runtime and reimport `networkx`.

In [None]:
!pip uninstall -y scipy networkx
!pip install scipy==1.8
!pip install networkx==2.7
import networkx as nx
nx.__version__

## 1. Creating clusters

In the slides and videos for this lesson, we reviewed how to count the clusters in the Watergate network. This problem requires you to **add** a new cluster to the network.

The steps to achieve this are as follows:

1. Import the network and count the number of clusters, just like we showed you in the videos for this lesson.
2. Add 20 new nodes, using whatever labels you want. Just make sure not to repeat an existing label.
3. Connect those new nodes so that each of them is linked to every other new node, i.e. each new node should have degree 19.
4. Count the number of clusters in this new version of the Watergate network. You should have one more than in step 1.
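The steps above can be checked without the course data or `networkx`. The dependency-free sketch below assumes a small hypothetical stand-in graph (a dictionary mapping each node to its set of neighbors) in place of the Watergate network. `count_components` is a hand-written breadth-first search that counts the same thing as `nx.number_connected_components`, and `add_clique` is a hypothetical helper that links every pair of new nodes with `itertools.combinations`.

```python
from collections import deque
from itertools import combinations


def count_components(adj):
    """Count connected components of an undirected graph given as {node: set_of_neighbors}."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1  # an unvisited node starts a new component
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search marks everything reachable from start
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components


def add_clique(adj, labels):
    """Hypothetical helper: add `labels` as new nodes and link every pair of them."""
    clashes = [label for label in labels if label in adj]
    if clashes:
        raise ValueError(f"labels already in use: {clashes}")
    for label in labels:
        adj[label] = set()
    for u, v in combinations(labels, 2):  # 20 * 19 / 2 = 190 pairs for 20 nodes
        adj[u].add(v)
        adj[v].add(u)


# Hypothetical stand-in for the Watergate network, with two components.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
before = count_components(graph)

new_nodes = [f"new_{i}" for i in range(1, 21)]
add_clique(graph, new_nodes)
after = count_components(graph)

print(before, after)                                # 2 3
print(all(len(graph[n]) == 19 for n in new_nodes))  # True
```

Checking for label clashes before adding anything keeps the graph unchanged if a label is already taken.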

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import itertools
import sys

import matplotlib.pyplot as plt

sys.path.append('/content/drive/MyDrive/ColabNotebooks')
from readlist import readlist  # course helper; returns the network as a networkx graph

W = readlist("/content/drive/MyDrive/ColabNotebooks/watergate-testimony-links.dat", 0)

# Optional: draw the original network
# plt.figure(figsize=(10, 10))
# nx.draw_networkx(W)

#1 Import the network and count the number of clusters, just like we showed you in the videos for this lesson.
print(nx.number_connected_components(W))

#2 Add 20 new nodes, using whatever labels you want. Just make sure not to repeat an existing label.
new_node_list = [f"new_{i}" for i in range(1, 20 + 1)]
assert not any(W.has_node(n) for n in new_node_list), "a new label clashes with an existing node"
W.add_nodes_from(new_node_list)

#3 Connect every pair of new nodes (a complete subgraph), so each new node has degree 19
print(new_node_list)
W.add_edges_from(itertools.combinations(new_node_list, 2))
print(all(W.degree(n) == 19 for n in new_node_list))

#4 Count the number of clusters in this new version of the Watergate network. You should have one more than in step 1.
print(nx.number_connected_components(W))

nx.draw_networkx(W)

## 2. Clusters don't have to be completely connected

In the previous problem we created a new cluster where all nodes are connected with each other. This does not have to be the case when creating a new cluster. For this problem, create a hub-and-spoke network of 50 nodes. You get to choose the hub node.

If you count the number of clusters in this network, you should obtain `1`. Now, add 20 new nodes and connect them using the **minimal number of connections** so that the number of clusters increases from `1` to **exactly** `2`.
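Why 19 connections is the minimum: any connected group of $n$ nodes needs at least $n - 1$ links, and a simple path uses exactly that many. The sketch below assumes the hub is node `1` and the new nodes are labeled `100`..`119` (arbitrary choices), and it uses a hand-written component counter instead of `networkx` so that it runs on its own.

```python
from collections import deque


def count_components(adj):
    """Count connected components of an undirected graph given as {node: set_of_neighbors}."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from start
            for nbr in adj[queue.popleft()]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components


def add_edge(adj, u, v):
    """Add an undirected link, creating either endpoint if needed."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


# Hub-and-spoke network: hub 1 linked to spokes 2..50 (50 nodes).
hs = {}
for spoke in range(2, 51):
    add_edge(hs, 1, spoke)

# 20 new nodes 100..119 joined in a path: 19 links.
new_nodes = list(range(100, 120))
path_edges = list(zip(new_nodes, new_nodes[1:]))
for u, v in path_edges:
    add_edge(hs, u, v)
print(len(hs), len(path_edges), count_components(hs))  # 70 19 2

# Removing one path link (here the first) splits the new nodes, giving 3 clusters.
fewer = {node: set(nbrs) for node, nbrs in hs.items()}
u, v = path_edges[0]
fewer[u].discard(v)
fewer[v].discard(u)
print(count_components(fewer))  # 3
```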

In [None]:
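HS = nx.Graph()

# Hub-and-spoke network: node 1 is the hub, linked to spokes 2..50 (50 nodes in total)
for i in range(2, 50 + 1):
  HS.add_edge(1, i)
print(nx.number_connected_components(HS))

# 20 new nodes (100..119) linked in a path: 19 edges, the minimum that keeps them in one cluster
for i in range(100, 119):
  HS.add_edge(i, i + 1)
print(nx.number_connected_components(HS))

nx.draw_networkx(HS)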

## <font color='#006633'>3. Inside clusters</font>

Go back to problems 1 and 2. In both of them, you created new clusters using the same number of nodes. However, they are very different clusters because their links are placed very differently. To show this difference:

1. Identify the new cluster in the Watergate network, save it as `W_newcluster`
2. Do the same for the new cluster in the hub and spoke network. Save it as `HS_newcluster` (Hint: in both networks, these new clusters have fewer nodes than the _original_ cluster)
3. Show the differences in terms of the average degree and density for both new clusters.
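The two measures in step 3 follow the usual definitions for an undirected cluster with $N$ nodes and $E$ links. Plugging in the clusters the problems describe (a complete graph on 20 nodes for Problem 1, and a 20-node path with 19 links for Problem 2) gives:

$$
\langle k \rangle = \frac{2E}{N}, \qquad \rho = \frac{2E}{N(N-1)}
$$

$$
\begin{aligned}
\text{Problem 1 (complete, } N = 20,\ E = \tbinom{20}{2} = 190\text{)}: &\quad \langle k \rangle = \tfrac{2 \cdot 190}{20} = 19, \quad \rho = \tfrac{2 \cdot 190}{20 \cdot 19} = 1 \\
\text{Problem 2 (path, } N = 20,\ E = 19\text{)}: &\quad \langle k \rangle = \tfrac{2 \cdot 19}{20} = 1.9, \quad \rho = \tfrac{2 \cdot 19}{20 \cdot 19} = 0.1
\end{aligned}
$$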

In [None]:
#1 Identify the new cluster in the watergate network, save it as W_newcluster
for c in nx.connected_components(W):
  subW = W.subgraph(c)
  if len(subW.nodes()) == 20:
    W_newcluster = subW

#2 Do the same for the new cluster in the hub and spoke network. Save it as HS_newcluster (Hint: in both networks, these new clusters have fewer nodes than the original cluster)
for c in nx.connected_components(HS):
  subHS = HS.subgraph(c)
  if len(subHS.nodes()) == 20:
    HS_newcluster = subHS

#3 Show the differences in terms of the average degree and density for both new clusters.
# Average degree <k> = 2E / N
kavgW = (2 * W_newcluster.number_of_edges()) / W_newcluster.number_of_nodes()
kavgHS = (2 * HS_newcluster.number_of_edges()) / HS_newcluster.number_of_nodes()
print(f'Average degree: W_newcluster = {kavgW}, HS_newcluster = {kavgHS}, difference = {kavgW - kavgHS}')

# Density rho = 2E / (N (N - 1))
rhoW = (2 * W_newcluster.number_of_edges()) / (W_newcluster.number_of_nodes() * (W_newcluster.number_of_nodes() - 1))
rhoHS = (2 * HS_newcluster.number_of_edges()) / (HS_newcluster.number_of_nodes() * (HS_newcluster.number_of_nodes() - 1))
print(f'Density: W_newcluster = {rhoW}, HS_newcluster = {rhoHS}, difference = {rhoW - rhoHS}')

## <font color='#006633'>4. Maximizing the number of new components</font>

In Problem 2, we incorporated 20 new nodes into the hub-and-spoke network, resulting in a modified network comprising two clusters. Given this context, if we were to introduce an additional 14 nodes, what would be the **maximum number of clusters** that could be formed? (Note: Since the network from Problem 2 already contains 2 clusters, the answer should exceed 2.)
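A minimal sketch of the reasoning, assuming the 14 extra nodes are added without any links: a node with no neighbors is a connected component on its own, so each isolated node adds one cluster. The labels `200`..`213` are arbitrary, and the component counter is a hand-written breadth-first search used in place of `networkx`.

```python
from collections import deque


def count_components(adj):
    """Count connected components of an undirected graph given as {node: set_of_neighbors}."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1  # isolated nodes land here too, as components of size 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for nbr in adj[queue.popleft()]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components


# Rebuild the Problem 2 network: hub 1 with spokes 2..50, plus a path on 100..119.
net = {1: set(range(2, 51))}
net.update({spoke: {1} for spoke in range(2, 51)})
net.update({node: set() for node in range(100, 120)})
for u in range(100, 119):
    net[u].add(u + 1)
    net[u + 1].add(u)
print(count_components(net))  # 2

# Add 14 extra nodes (labels 200..213) with no links at all.
net.update({node: set() for node in range(200, 214)})
print(count_components(net))  # 16
```

Any link touching a new node would either merge it into an existing cluster or pair it with another new node, so leaving all 14 isolated gives the largest count, 2 + 14 = 16.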