# Assignment 8
#### Due November 11, 2020, 23:59

In this week’s assignment, we are going to dive back into graph theory and expand on the subject of network science.  
Graphs are powerful constructs with even more powerful mathematical properties that we can take advantage of when we can formulate our problems as a graph. This time around, we are interested in one network property in particular: the **local clustering coefficient** of a node.

## Submission
Edit and turn in this Jupyter notebook file containing your solutions to each task.  
Implement your solution to each of the exercises in the code field below the exercise description.  

The libraries you may need are already given; any extra imports are not allowed.

___

### Local clustering coefficient
In this assignment, we want to calculate the local clustering coefficient of a node in an undirected graph. 

Recall that an undirected graph consists of a set of nodes, some of which are connected by edges, where every edge is bidirectional. 
Imagine, for example, a graph where the nodes represent people at a party pre-corona and there is an edge between two people if they shook hands. This example graph is undirected because any person, A, can shake hands with another person, B, only if B also shakes hands with A. This means that if A is connected to B, then B is by definition also connected to A.
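
As a small illustration of this symmetry, the sketch below stores each handshake in both directions. The `handshakes` data and the helper name `build_neighbours` are hypothetical and are not part of the assignment files.

```python
# Hypothetical handshake data: each pair is one undirected edge.
handshakes = [("A", "B"), ("B", "C")]

def build_neighbours(edges):
    """Return a dict mapping each node to the set of nodes it is connected to."""
    neighbours = {}
    for a, b in edges:
        # Record the edge in both directions, since A-B implies B-A.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours

neighbours = build_neighbours(handshakes)
print("B" in neighbours["A"], "A" in neighbours["B"])  # True True
```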

The intuition behind the **local clustering coefficient** metric is that it describes the connectivity of the neighbourhood of a node. That is, it is the proportion of connections among the node's neighbours that are actually realised, out of all possible connections between those neighbours.

Imagine a person, A, that has three friends: B, C, and D. These friends are person A’s neighbourhood. They all have in common that they are friends with A. However, they might not be friends with each other. The local clustering coefficient expresses what fraction of the possible pairs among A’s friends are in fact also friends with each other. 

Different scenarios for the local clustering coefficient of A:
- $LCC_A = \frac{0}{3}$ -- no one is friends in the neighbourhood: no nodes are connected
- $LCC_A = \frac{1}{3}$ -- only B and C are friends (or only C and D, or only D and B)
- $LCC_A = \frac{2}{3}$ -- we have two pairs of friends in the neighbourhood
- $LCC_A = \frac{3}{3}$ -- everybody is friends in the neighbourhood: all nodes are connected


<img src="img/clustering_coeff.png" align="center">
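
The four scenarios can be checked with a short sketch. It assumes the usual formulation $LCC_A = e / \binom{k}{2}$, where $k$ is the number of neighbours of A and $e$ is the number of edges among them (this notation is introduced here, not taken from the assignment). The helper `local_clustering` is hypothetical, and returning `0.0` for fewer than two neighbours is an assumed convention.

```python
def local_clustering(friends, friendships):
    """friends: list of A's neighbours; friendships: set of frozenset pairs among them."""
    k = len(friends)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs means coefficient 0
    possible = k * (k - 1) // 2  # number of distinct pairs of neighbours
    realised = 0
    for i in range(k):
        for j in range(i + 1, k):
            if frozenset((friends[i], friends[j])) in friendships:
                realised += 1
    return round(realised / possible, 3)

friends_of_a = ["B", "C", "D"]
scenarios = [
    [],                                    # no one is friends
    [("B", "C")],                          # one pair
    [("B", "C"), ("C", "D")],              # two pairs
    [("B", "C"), ("C", "D"), ("D", "B")],  # everybody is friends
]
results = [
    local_clustering(friends_of_a, {frozenset(p) for p in pairs})
    for pairs in scenarios
]
print(results)  # [0.0, 0.333, 0.667, 1.0]
```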

___

## Assignment
Your task in the following exercises is to compute the local clustering coefficient from various representations of the same undirected graph, `tiny`, consisting of 5 nodes and 7 edges.


In [2]:
import numpy as np

### Exercise 1
As we know, one way of representing a graph is with an edge list. 
This representation can be found in the file `tiny_edgelist.txt`. The file contains one edge per line, shown as an edge pair of 2 integers separated by whitespace. Investigate the file further by yourself to see the formatting of the edge pairs. 
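
As a rough sketch of how such a format could be read, the example below parses a hypothetical edge list (not the contents of `tiny_edgelist.txt`) into neighbour sets. It relies on `np.loadtxt` accepting a list of lines in place of a filename and splitting on any whitespace by default.

```python
import numpy as np

# Hypothetical edge list in the described format (NOT the contents of tiny_edgelist.txt).
sample_lines = ["0 1", "0 2", "1 2", "2 3"]

# np.loadtxt accepts an iterable of lines as well as a filename;
# by default it splits each line on any whitespace.
edges = np.loadtxt(sample_lines, dtype=int, ndmin=2)

# Build node -> set of neighbours, adding every edge in both directions.
neighbours = {}
for a, b in edges:
    neighbours.setdefault(int(a), set()).add(int(b))
    neighbours.setdefault(int(b), set()).add(int(a))

print(edges.shape)            # (4, 2)
print(sorted(neighbours[2]))  # [0, 1, 3]
```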

Write a function called `coefficient_from_edgelist(edgefile, node_id)` that takes an edge list file formatted like so, and a node, and returns the local clustering coefficient for that node, rounded to 3 decimals.
___
`coefficient_from_edgelist('tiny_edgelist.txt', 2)`  
\>\> `0.667`

In [127]:
# your solution to exercise 1 here
def coefficient_from_edgelist(edgefile, node_id):
    # Read edges as integer pairs; the default delimiter splits on any whitespace
    edge_list = np.loadtxt(edgefile, dtype=int, ndmin=2)

    # Build an adjacency map: node -> set of neighbouring nodes
    neighbours = {}
    for a, b in edge_list:
        a, b = int(a), int(b)
        if a == b:
            continue  # ignore self-loops so a node is never its own neighbour
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    id_neighbours = neighbours.get(node_id, set())
    k = len(id_neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs exist, so return 0 (assumed convention)

    # Count each edge between two neighbours of node_id exactly once (u < v)
    edge = 0
    for u in id_neighbours:
        for v in id_neighbours:
            if u < v and v in neighbours[u]:
                edge += 1

    # Divide by the number of possible neighbour pairs, k choose 2
    return round(edge / (k * (k - 1) / 2), 3)


coefficient_from_edgelist('tiny_edgelist.txt', 2)

0.667

### Exercise 2
Another common way to represent a graph is with an adjacency matrix. 
This representation can be found in the file `tiny_adjmatrix.txt`. Investigate the file by yourself to see the formatting of the adjacency matrix. 
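
For orientation, the sketch below loads a hypothetical adjacency matrix (not the contents of `tiny_adjmatrix.txt`) and reads off one node's neighbours. It assumes rows of whitespace-separated 0/1 entries, which may differ from the actual file.

```python
import numpy as np

# Hypothetical 4-node adjacency matrix (NOT the contents of tiny_adjmatrix.txt).
sample_rows = ["0 1 1 0", "1 0 1 0", "1 1 0 1", "0 0 1 0"]
adj = np.loadtxt(sample_rows, dtype=int, ndmin=2)

# For an undirected graph the matrix equals its own transpose.
is_symmetric = bool((adj == adj.T).all())

# The neighbours of a node are the column indices holding a 1 in its row.
neighbours_of_2 = np.flatnonzero(adj[2] == 1).tolist()

print(is_symmetric)     # True
print(neighbours_of_2)  # [0, 1, 3]
```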

Write a function called `coefficient_from_adjmatrix(matrixfile, node_id)` that takes an adjacency matrix file formatted like so, and a node, and returns the local clustering coefficient for that node, rounded to 3 decimals.
___
`coefficient_from_adjmatrix('tiny_adjmatrix.txt', 0)`  
\>\> `1.0`

In [129]:
# your solution to exercise 2 here
def coefficient_from_adjmatrix(matrixfile, node_id):
    # Load the adjacency matrix; the default delimiter splits on any whitespace
    matrix = np.loadtxt(matrixfile, dtype=int, ndmin=2)

    # Neighbours of node_id are the columns with a 1 in its row (excluding itself)
    id_neighbours = [int(j) for j in np.where(matrix[node_id] == 1)[0] if j != node_id]
    k = len(id_neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs exist, so return 0 (assumed convention)

    # Count each edge between two neighbours of node_id exactly once (u < v)
    edge = 0
    for u in id_neighbours:
        for v in id_neighbours:
            if u < v and matrix[u][v] == 1:
                edge += 1

    # Divide by the number of possible neighbour pairs, k choose 2
    return round(edge / (k * (k - 1) / 2), 3)

coefficient_from_adjmatrix('tiny_adjmatrix.txt', 0)

1.0