# Assignment 8
#### Due November 11, 2020, 23:59

In this week’s assignment, we are going to dive back into graph theory and expand on the subject of network science.  
Graphs are powerful constructs with even more powerful mathematical properties that we can take advantage of when we can formulate our problems as a graph. This time around, we are interested in one network property in particular: the **local clustering coefficient** of a node.

## Submission
Edit and turn in this Jupyter notebook file containing your solutions to each task.  
Implement your solution to each of the exercises in the code field below the exercise description.  

The libraries you may need are already given; any extra imports are not allowed.

___

### Local clustering coefficient
In this assignment, we want to calculate the local clustering coefficient of a node in an undirected graph. 

Recall that an undirected graph consists of a set of nodes connected by edges, where every edge is bidirectional. 
Imagine, for example, a graph where the nodes represent people at a party pre-corona and there is an edge between two people if they shook hands. This graph is undirected because a person A can shake hands with another person B only if B also shakes hands with A. In other words, if A is connected to B, then B is by definition also connected to A.

The intuition behind the **local clustering coefficient** is that it describes how well connected the neighbourhood of a node is: the number of connections actually realised among its neighbours, divided by the number of all possible connections among them.
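
This ratio can be written as a formula. The symbols are introduced here for illustration only: $k_v$ is assumed to denote the number of neighbours of node $v$, and $e_v$ the number of edges that exist between those neighbours.

$$LCC_v = \frac{e_v}{\binom{k_v}{2}} = \frac{2\,e_v}{k_v\,(k_v - 1)}, \qquad k_v \geq 2$$

For $k_v < 2$ the denominator is zero, so the coefficient is not defined by this formula; it is often set to 0 by convention.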

Imagine a person, A, who has three friends: B, C, and D. These friends are person A’s neighbourhood. They all have in common that they are friends with A. However, they might not be friends with each other. The local clustering coefficient expresses what fraction of the possible friendships among A’s friends actually exist. 

Different scenarios for the local clustering coefficient of A:
- $LCC_A = \frac{0}{3}$ -- no one is friends in the neighbourhood: no nodes are connected
- $LCC_A = \frac{1}{3}$ -- only B and C are friends (or only C and D, or only D and B)
- $LCC_A = \frac{2}{3}$ -- we have two pairs of friends in the neighbourhood
- $LCC_A = \frac{3}{3}$ -- everybody is friends in the neighbourhood: all nodes are connected
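
The four scenarios above can be reproduced with a few lines of Python. This is a minimal sketch, not the reference solution: the helper name `lcc_of_a` and the representation of friendships as a list of pairs are assumptions made for illustration.

```python
def lcc_of_a(friend_pairs):
    """Fraction of pairs among A's friends B, C, D that are friends themselves."""
    friends_of_a = ["B", "C", "D"]
    # store each friendship as an unordered pair, since the graph is undirected
    friendships = {frozenset(pair) for pair in friend_pairs}

    possible = 0
    realised = 0
    # visit every unordered pair of A's friends exactly once
    for i in range(len(friends_of_a)):
        for j in range(i + 1, len(friends_of_a)):
            possible += 1
            if frozenset((friends_of_a[i], friends_of_a[j])) in friendships:
                realised += 1
    return round(realised / possible, 3)


scenarios = [
    [],                                    # no one is friends
    [("B", "C")],                          # one pair of friends
    [("B", "C"), ("C", "D")],              # two pairs of friends
    [("B", "C"), ("C", "D"), ("D", "B")],  # everybody is friends
]
results = [lcc_of_a(pairs) for pairs in scenarios]
print(results)  # [0.0, 0.333, 0.667, 1.0]
```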


<img src="img/clustering_coeff.png" align="center">

___

## Assignment
Your task in the following exercises is to compute the local clustering coefficient from various representations of the same undirected graph, `tiny`, consisting of 5 nodes and 7 edges.


In [1]:
import numpy as np

In [2]:
# helper function for calc_lcc() method inside MyGraph class
def max_connection(n):
    """
    Recursive function to calculate the maximum number of possible connections between n nodes
    """
    if n == 0:
        return n
    return n-1 + max_connection(n-1)
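
The recursion adds up 0 + 1 + ... + (n-1), which is the number of unordered pairs that can be formed from n nodes. As a sanity check, the sketch below compares it with the closed form n(n-1)/2. The name `max_connection_closed_form` is hypothetical, and `max_connection` is repeated only so that the snippet runs on its own.

```python
def max_connection(n):
    """Copy of the recursive helper, repeated to keep the snippet self-contained."""
    if n == 0:
        return n
    return n - 1 + max_connection(n - 1)


def max_connection_closed_form(n):
    """Number of unordered pairs among n nodes: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# both versions agree for small, non-negative n
for n in range(10):
    assert max_connection(n) == max_connection_closed_form(n)

print(max_connection_closed_form(3))  # 3 possible connections among three neighbours
```

The closed form also avoids Python's recursion limit for nodes with very many neighbours.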

In [3]:
class MyGraph:
    """
    Class representing an undirected Graph including methods to answer the specific assignments.

    Attributes:
    - nodes                       : Sorted list of unique nodes
    - edges                       : List of tuples representing undirected edges
    - neighbours                  : Dictionary mapping each node to the list of its neighbouring nodes

    Methods:
    - load_from_edgelist          : Reads in a ".txt" file representing an edge list (path given as argument)
    - load_from_adjacency_matrix  : Reads in a ".txt" file representing an adjacency matrix (path given as argument)
    - calc_lcc                    : Returns the Local Clustering Coefficient for a given node in the graph instance
    """

    def __init__(self):
        """
        Constructs an empty MyGraph object and initializes the three attributes nodes, edges and neighbours.
        """
        self.nodes = []
        self.edges = []
        self.neighbours = {}
    
    def get_nodes(self):
        """
        Returns the nodes as a list of a MyGraph object. Returns None if empty.
        """
        return None if self.nodes == [] else self.nodes

    def get_edges(self):
        """
        Returns the edge list of a MyGraph object as a list of tuples. Returns None if empty.
        """
        # named get_edges because the instance attribute self.edges would shadow a method called edges
        return None if self.edges == [] else self.edges

    def get_neighbours(self):
        """
        Returns the neighbours of a MyGraph object as a dictionary mapping each node to the list of its neighbouring nodes. Returns None if empty.
        """
        return None if self.neighbours == {} else self.neighbours

    def load_from_edgelist(self, filepath):
        """
        Load the attributes of a MyGraph object (nodes, edges and neighbour dict) from an edgelist, whose filepath is provided as an input argument.
        """
        with open(filepath, 'r') as f:
            for line in f:
                # load edges; skip blank lines, split() handles any run of whitespace
                if line.strip():
                    self.edges.append(tuple(int(i) for i in line.split()))
    
        # load nodes
        for edge in self.edges:
            for node in edge:
                self.nodes.append(node)    
        self.nodes = sorted(list(set(self.nodes)))

        # load neighbours: every undirected edge is recorded in both directions
        for a, b in self.edges:
            self.neighbours.setdefault(a, []).append(b)
            self.neighbours.setdefault(b, []).append(a)

    def load_from_adjacency_matrix(self, filepath):
        """
        Load the attributes of a MyGraph object (nodes, edges and neighbour dict) from an adjacency matrix, whose filepath is provided as an input argument.
        """
        with open(filepath, 'r') as f:
            # parse every non-empty row of the matrix into a list of ints
            matrix = [[int(i) for i in line.split()] for line in f if line.strip()]

        # load nodes: one node per matrix row
        self.nodes = list(range(len(matrix)))

        # load edges: only the upper triangle (including the diagonal) is read,
        # because the matrix of an undirected graph is expected to be symmetric;
        # entries below the diagonal are ignored
        for x, row in enumerate(matrix):
            for y in range(x, len(row)):
                if row[y] == 1:
                    self.edges.append((y, x))


        # load neighbours: every undirected edge is recorded in both directions
        for a, b in self.edges:
            self.neighbours.setdefault(a, []).append(b)
            self.neighbours.setdefault(b, []).append(a)

    def calc_lcc(self, nodeid):
        """
        Returns the Local Clustering Coefficient (LCC) for any node (provided as input argument nodeid) in the MyGraph object.
        """
        # get the neighbours of the nodeid (a node without edges has none)
        neighbours = self.neighbours.get(nodeid, [])  # list of neighbours
        num_neighbours = len(neighbours)

        # calculate the max possible connections depending on len(neighbours)
        max_connections = max_connection(num_neighbours)

        # with fewer than two neighbours no pair exists;
        # return 0.0 by convention instead of dividing by zero
        if max_connections == 0:
            return 0.0

        # calculate the number of actual connections by iterating through edges
        connections = 0
        for edge in self.edges:
            if edge[0] in neighbours and edge[1] in neighbours:
                connections += 1

        return round(connections/max_connections, 3)

    def print_summary(self, node_id, lcc):
        """
        Prints out a summary for Exercises 1 and 2.
        """
        # print report
        print('SUMMARY')
        print('Filename\tGraph')
        print('---------------------------------------------------------------------------')
        print(f'Nodes\t\t{self.nodes}')
        print(f'#Nodes\t\t{len(self.nodes)}')
        print(f'Edges\t\t{self.edges}')
        print(f'#Edges\t\t{len(self.edges)}')
        print('---------------------------------------------------------------------------')
        print(f'#NodeID\t\t{node_id}')
        print(f'Neighbours\t{self.neighbours[node_id]}')
        print(f'LCC\t\t{lcc}')

### Exercise 1
As we know, one way of representing a graph is with an edge list. 
This representation can be found in the file `tiny_edgelist.txt`. The file contains one edge per line, given as a pair of 2 integers separated by whitespace. Investigate the file further by yourself to see the formatting of the edge pairs. 

Write a function called `coefficient_from_edgelist(edgefile, node_id)` that takes an edge list file in this format and a node, and returns the local clustering coefficient for that node, rounded to 3 decimals.
___
`coefficient_from_edgelist("tiny_edgelist.txt", 2)`  
\>\> `0.667`
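
As a rough illustration of the parsing step, the sketch below builds a neighbour dictionary from edge-list text. To stay runnable without the data file, it reads from an in-memory string whose seven edges are copied from the summary printed for this exercise. The names `EDGELIST_TEXT` and `neighbours_from_edgelist` are assumptions, and the real `tiny_edgelist.txt` may be laid out differently.

```python
# Assumed content: the seven edges printed in the Exercise 1 summary, one pair per line
EDGELIST_TEXT = """0 1
0 3
1 2
1 3
1 4
2 3
2 4
"""


def neighbours_from_edgelist(text):
    """Map every node to the set of nodes it shares an edge with."""
    neighbours = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        a, b = (int(token) for token in line.split())
        # an undirected edge is recorded in both directions
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours


neighbours = neighbours_from_edgelist(EDGELIST_TEXT)
print(sorted(neighbours[2]))  # [1, 3, 4]
```

Storing neighbours in sets rather than lists makes the membership checks needed for the coefficient cheap and removes duplicate edges silently; whether that is desirable depends on how strictly the input should be validated.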

In [4]:
def coefficient_from_edgelist(edgefile, node_id):
    """
    Inputs: Edge List File and a node
    Return: LCC for specific node
    """ 
    G = MyGraph()
    G.load_from_edgelist(edgefile)

    lcc = G.calc_lcc(node_id)

    G.print_summary(node_id, lcc)

    return lcc

coefficient_from_edgelist("tiny_edgelist.txt", 2)

SUMMARY
Filename	Graph
---------------------------------------------------------------------------
Nodes		[0, 1, 2, 3, 4]
#Nodes		5
Edges		[(0, 1), (0, 3), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
#Edges		7
---------------------------------------------------------------------------
#NodeID		2
Neighbours	[1, 3, 4]
LCC		0.667


0.667

### Exercise 2
Another common way to represent a graph is with an adjacency matrix. 
This representation can be found in the file `tiny_adjmatrix.txt`. Investigate the file by yourself to see the formatting of the adjacency matrix. 

Write a function called `coefficient_from_adjmatrix(matrixfile, node_id)` that takes an adjacency matrix file in this format and a node, and returns the local clustering coefficient for that node, rounded to 3 decimals.
___
`coefficient_from_adjmatrix("tiny_adjmatrix.txt", 0)`  
\>\> `1.0`
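
In matrix form, the neighbours of node `i` are the columns holding a 1 in row `i`, and two neighbours `u` and `v` are connected when entry `(u, v)` is 1. The sketch below applies this to a hand-built matrix that encodes the seven edges listed in the Exercise 1 summary. It is an assumption about what a matrix for that graph looks like, not the content of `tiny_adjmatrix.txt`, and `lcc_from_matrix` is a hypothetical name.

```python
# Assumed symmetric 0/1 matrix for the edges
# (0,1), (0,3), (1,2), (1,3), (1,4), (2,3), (2,4)
MATRIX = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]


def lcc_from_matrix(matrix, node):
    """Local clustering coefficient of `node`, read directly from the matrix."""
    # neighbours are the columns with a 1 in the node's row (self-loops ignored)
    nbrs = [j for j, entry in enumerate(matrix[node]) if entry == 1 and j != node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no pair of neighbours exists; 0.0 is a convention chosen here
    # count each unordered pair of neighbours once
    links = sum(
        matrix[nbrs[a]][nbrs[b]]
        for a in range(k)
        for b in range(a + 1, k)
    )
    return round(2 * links / (k * (k - 1)), 3)


print(lcc_from_matrix(MATRIX, 0))  # 1.0
print(lcc_from_matrix(MATRIX, 2))  # 0.667
```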

In [5]:
def coefficient_from_adjmatrix(matrixfile, node_id):
    """
    Inputs: Adj Matrix File and a node
    Return: LCC for specific node
    """ 
    G = MyGraph()
    G.load_from_adjacency_matrix(matrixfile)

    lcc = G.calc_lcc(node_id)

    G.print_summary(node_id, lcc)

    return lcc

coefficient_from_adjmatrix("tiny_adjmatrix.txt", 0)

SUMMARY
Filename	Graph
---------------------------------------------------------------------------
Nodes		[0, 1, 2, 3, 4]
#Nodes		5
Edges		[(1, 0), (3, 0), (2, 1), (3, 1), (4, 1), (4, 2)]
#Edges		6
---------------------------------------------------------------------------
#NodeID		0
Neighbours	[1, 3]
LCC		1.0


1.0