# NTDS'18 milestone 1: network collection and properties
[Effrosyni Simou](https://lts4.epfl.ch/simou), [EPFL LTS4](https://lts4.epfl.ch)

## Students

* Team: `<your team number>`
* Students: `<the name of all students in the team>`
* Dataset: `<the dataset you used to complete the milestone>`

## Rules

* Milestones have to be completed by teams. No collaboration between teams is allowed.
* Textual answers shall be short. Typically one to three sentences.
* Code has to be clean.
* You cannot import any library other than the ones we imported.
* Submit the notebook in its executed state, with the results stored. That is, when we open it, it should show numerical results and plots. We won't be able to execute your notebooks.
* Re-execute the notebook from a blank state before submission to make sure it is reproducible. You can click "Kernel" then "Restart & Run All" in Jupyter.

## Objective 

The purpose of this milestone is to start getting acquainted with the network that you will use for this class. In the first part of the milestone you will import your data using [Pandas](http://pandas.pydata.org) and create the adjacency matrix using [NumPy](http://www.numpy.org). This part is project specific. In the second part you will compute some basic properties of your network. **For the computation of the properties you are only allowed to use the packages that have been imported in the cell below.** You are not allowed to use any graph-specific toolboxes for this milestone (such as networkx and PyGSP). Furthermore, the aim is not to blindly compute the network properties, but also to start thinking about what kind of network you will be working with this semester.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Part 1

Import your data and manipulate them. 

In the case where the dataset you are using does not provide a graph, you will have to build a feature graph yourself. You can find out how to do that in Section 3 of last year's [assignment](https://github.com/mdeff/ntds_2017/blob/master/assignments/03_solution.ipynb). 

Please provide below a pandas DataFrame where each row corresponds to a node and its features, and a NumPy array with the adjacency matrix of your graph.
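If your dataset has no graph, one common way to build a feature graph is to turn pairwise distances between feature vectors into similarities and drop the weak ones. The sketch below is only an illustration and not necessarily the construction used in the linked assignment: the toy data, the Euclidean distance, the Gaussian kernel, `kernel_width`, and `threshold` are all assumptions to adapt to your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy features: 4 nodes with 2 features each.
toy_features = pd.DataFrame({'x': [0.0, 0.1, 1.0, 1.1],
                             'y': [0.0, 0.0, 1.0, 1.0]})

X = toy_features.values
# Pairwise Euclidean distances, computed with broadcasting.
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
distances = np.sqrt((diff ** 2).sum(axis=2))

# Gaussian kernel: close nodes get weights near 1, distant nodes near 0.
kernel_width = distances.mean()  # assumption: a simple heuristic
toy_adjacency = np.exp(-distances ** 2 / (2 * kernel_width ** 2))

# Remove self-loops and sparsify by discarding weak similarities.
np.fill_diagonal(toy_adjacency, 0)
threshold = 0.5  # assumption: tune this for your dataset
toy_adjacency[toy_adjacency < threshold] = 0
```

On this toy data, only the two pairs of nearby points (nodes 0 and 1, nodes 2 and 3) remain connected.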

In [None]:
# Your code here.
 
features = # the pandas dataframe with the features
adjacency = # the adjacency matrix
n_nodes = # the number of nodes in the network

## Part 2

Execute the cell below to plot the (weighted) adjacency matrix of your network.

In [None]:
plt.spy(adjacency, markersize=1)
plt.title('adjacency matrix')

### Question 1

What is the maximum number of links $L_{max}$ in a network with $N$ nodes (where $N$ is the number of nodes in your network)? How many links $L$ are there in your collected network? Comment on the sparsity of your network.
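As a small worked example on a toy graph (not your network), the sketch below assumes an undirected graph without self-loops, for which $L_{max} = N(N-1)/2$. For a directed graph the maximum would be $N(N-1)$, and links would be counted over the whole matrix rather than its upper triangle. The names `toy_adjacency`, `max_links`, `n_links`, and `density` are illustrative.

```python
import numpy as np

# Toy undirected graph: a path 0-1-2 plus an isolated node 3.
toy_adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

n = toy_adjacency.shape[0]
max_links = n * (n - 1) // 2

# Each undirected link appears twice in the matrix, so count
# only the nonzero entries strictly above the diagonal.
n_links = np.count_nonzero(np.triu(toy_adjacency, k=1))

# Fraction of possible links that are present; a small value means sparse.
density = n_links / max_links
```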

In [None]:
# Your code here.

**Your answer here.**

### Question 2

Is your graph directed or undirected?
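One way to check this numerically is to test whether the adjacency matrix is symmetric, since an undirected graph has a symmetric adjacency matrix. The helper name `is_undirected` and the toy matrices below are hypothetical.

```python
import numpy as np

def is_undirected(adjacency):
    """Return True if the adjacency matrix is symmetric."""
    return bool(np.allclose(adjacency, adjacency.T))

# Directed chain 0 -> 1 -> 2.
directed_example = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])
# Symmetrizing it yields the undirected chain 0 - 1 - 2.
undirected_example = directed_example + directed_example.T
```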

**Your answer here.**

### Question 3

Are the edges of your graph weighted?
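A simple numerical check, sketched below under the assumption that an unweighted graph stores every existing edge as exactly 1, is to look for nonzero entries that differ from 1. The helper name `is_weighted` is hypothetical.

```python
import numpy as np

def is_weighted(adjacency):
    """Return True if at least one existing edge has a weight other than 1."""
    nonzero = adjacency[adjacency != 0]
    return bool(np.any(nonzero != 1))

binary_example = np.array([[0, 1], [1, 0]])
weighted_example = np.array([[0.0, 0.5], [0.5, 0.0]])
```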

**Your answer here.**

### Question 4

What is the (weighted) degree distribution of your network?

In [None]:
degree =  # Your code here. It should be a numpy array.

assert len(degree) == n_nodes

Compute the average degree $\langle k \rangle = \frac{2L}{N}$, and verify that it agrees with the degree distribution.
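The identity can be checked on a toy graph. The sketch below assumes an undirected, unweighted graph without self-loops, so that summing a row of the adjacency matrix gives the degree of that node. For a weighted graph the row sums are weighted degrees, and their mean matches $\frac{2L}{N}$ only if all weights equal 1.

```python
import numpy as np

# Toy undirected graph with links 0-1, 0-2, 1-2 and 2-3.
toy_adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
n = toy_adjacency.shape[0]

# Degree of each node: sum of its row.
degree = toy_adjacency.sum(axis=1)

# Number of links: nonzero entries above the diagonal.
n_links = np.count_nonzero(np.triu(toy_adjacency, k=1))

# Each link contributes to the degree of two nodes, hence 2L/N.
average_degree = 2 * n_links / n
```

Here the degrees are 2, 2, 3 and 1, so both $\frac{2L}{N}$ and the mean of the degrees equal 2.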

In [None]:
average_degree =  # Your code here.

# Compare with a tolerance: exact equality can fail because of floating-point rounding.
assert np.isclose(average_degree, degree.sum() / n_nodes)

Execute the cell below to see the histogram of the degree distribution.

In [None]:
weights = np.ones_like(degree) / float(n_nodes)
plt.hist(degree, weights=weights);

### Question 5

Comment on the degree distribution of your network.

**Your answer here.**

### Question 6

Write a function that takes as input the adjacency matrix of a graph and determines whether the graph is connected or not.
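One possible approach, sketched here with NumPy only, is to traverse the graph from an arbitrary node and check whether every node was reached. The sketch uses a depth-first traversal, ignores edge directions (so for a directed graph it tests weak connectivity), and assumes at least one node. The name `is_connected_sketch` is hypothetical.

```python
import numpy as np

def is_connected_sketch(adjacency):
    """Return True if every node is reachable from node 0.

    Assumes the graph has at least one node. Edge directions are ignored.
    """
    n = adjacency.shape[0]
    # Boolean neighbourhood matrix, symmetrized to ignore directions.
    neighbours = (adjacency != 0) | (adjacency.T != 0)

    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    stack = [0]
    while stack:
        node = stack.pop()
        # Unvisited neighbours of the current node.
        for neighbour in np.flatnonzero(neighbours[node] & ~visited):
            visited[neighbour] = True
            stack.append(neighbour)
    return bool(visited.all())

# Toy graphs: a path 0-1-2, and the same path plus an isolated node.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
path_plus_isolated = np.zeros((4, 4))
path_plus_isolated[:3, :3] = path
```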

In [None]:
def connected_graph(adjacency):
    """Determines whether a graph is connected.
    
    Parameters
    ----------
    adjacency: numpy array
        The (weighted) adjacency matrix of a graph.
    
    Returns
    -------
    bool
        True if the graph is connected, False otherwise.
    """
    
    # Your code here.
    
    return connected

Is your graph connected? Run the ``connected_graph`` function to determine your answer.

In [None]:
# Your code here.

### Question 7

Write a function that extracts the connected components of a graph.
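A possible sketch, again with NumPy only: label the nodes by repeated traversals, each started from a node that has no label yet, then slice one sub-matrix per label. Edge directions are ignored, as an assumption, and the name `find_components_sketch` is hypothetical.

```python
import numpy as np

def find_components_sketch(adjacency):
    """Return a list with the adjacency matrix of each connected component."""
    n = adjacency.shape[0]
    neighbours = (adjacency != 0) | (adjacency.T != 0)

    labels = np.full(n, -1)  # -1 marks nodes not yet assigned
    n_components = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Traverse everything reachable from this unlabelled node.
        labels[start] = n_components
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in np.flatnonzero(neighbours[node] & (labels == -1)):
                labels[neighbour] = n_components
                stack.append(neighbour)
        n_components += 1

    components = []
    for c in range(n_components):
        idx = np.flatnonzero(labels == c)
        # Keep only the rows and columns of the nodes in component c.
        components.append(adjacency[np.ix_(idx, idx)])
    return components

# Toy graph with two components: a path 0-1-2 and a single link 3-4.
toy_adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
toy_components = find_components_sketch(toy_adjacency)
```

The number of components is then `len(toy_components)`, and the size of the largest one is the maximum of `c.shape[0]` over the list.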

In [None]:
def find_components(adjacency):
    """Find the connected components of a graph.
    
    Parameters
    ----------
    adjacency: numpy array
        The (weighted) adjacency matrix of a graph.
    
    Returns
    -------
    list of numpy arrays
        A list of adjacency matrices, one per connected component.
    """
    
    # Your code here.
    
    return components

How many connected components is your network composed of? What is the size of the largest connected component? Run the ``find_components`` function to determine your answer. If your graph is connected, i.e., it has only one component, break it in two for the purpose of testing this function.

In [None]:
# Your code here.

### Question 8

Write a function that takes as input the adjacency matrix and two nodes (`source` and `target`) and returns the length of the shortest path between them using Dijkstra's algorithm.
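A minimal sketch of Dijkstra's algorithm using only NumPy is given below. It selects the closest unvisited node with `np.argmin` instead of a priority queue, which costs $O(N^2)$ but needs no extra import. It assumes that edge weights are non-negative and represent lengths. If your weights are similarities, you may want to pass a binarized adjacency matrix to count hops instead. The name `dijkstra_sketch` is hypothetical.

```python
import numpy as np

def dijkstra_sketch(adjacency, source, target):
    """Length of the shortest path from source to target.

    Returns np.inf if the target cannot be reached.
    Assumes non-negative edge weights interpreted as lengths.
    """
    n = adjacency.shape[0]
    distance = np.full(n, np.inf)
    distance[source] = 0
    visited = np.zeros(n, dtype=bool)

    while not visited[target]:
        # Closest node among those not yet visited.
        candidates = np.where(visited, np.inf, distance)
        node = int(np.argmin(candidates))
        if np.isinf(candidates[node]):
            break  # the remaining nodes are unreachable
        visited[node] = True
        # Relax the edges leaving the selected node.
        for neighbour in np.flatnonzero(adjacency[node]):
            new_distance = distance[node] + adjacency[node, neighbour]
            if new_distance < distance[neighbour]:
                distance[neighbour] = new_distance
    return distance[target]

# Toy weighted graph: going 0-1-2 (length 2) beats the direct link 0-2 (length 5).
toy_adjacency = np.array([
    [0, 1, 5, 0],
    [1, 0, 1, 0],
    [5, 1, 0, 0],
    [0, 0, 0, 0],
])
```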

In [None]:
def compute_shortest_path(adjacency, source, target):
    """Compute the shortest path between a source and target node.
    
    Parameters
    ----------
    adjacency: numpy array
        The (weighted) adjacency matrix of a graph.
    source: int
        The source node. A number between 0 and n_nodes-1.
    target: int
        The target node. A number between 0 and n_nodes-1.
    
    Returns
    -------
    int
        The length of the shortest path.
    """
    
    # Your code here.
    
    return length

### Question 9

The diameter of a graph is the length of the longest shortest path between any pair of nodes. Use the function developed above to compute the diameter of the graph, or of its largest connected component if the graph is not connected.

Together with the diameter, compute the average distance, i.e., the average of the shortest paths between all pairs of nodes.
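To illustrate the two definitions on a toy graph, the sketch below computes hop distances with a breadth-first search from every node. Note that this is a substitute for calling a per-pair Dijkstra function, chosen to keep the example short, and that it assumes a connected, undirected, unweighted graph. The names `hop_distances` and `toy_adjacency` are hypothetical.

```python
import numpy as np

def hop_distances(adjacency, source):
    """Number of hops from source to every node (np.inf if unreachable)."""
    n = adjacency.shape[0]
    neighbours = adjacency != 0
    distance = np.full(n, np.inf)
    distance[source] = 0

    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    hops = 0
    while frontier.any():
        hops += 1
        # Nodes adjacent to the frontier that have no distance yet.
        reached = neighbours[frontier].any(axis=0) & np.isinf(distance)
        distance[reached] = hops
        frontier = reached
    return distance

# Toy graph: the path 0-1-2-3.
toy_adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
n = toy_adjacency.shape[0]
all_distances = np.array([hop_distances(toy_adjacency, s) for s in range(n)])

# Consider each unordered pair of distinct nodes once.
pairs = np.triu_indices(n, k=1)
diameter = all_distances[pairs].max()
average_distance = all_distances[pairs].mean()
```

For this path the six pairwise distances are 1, 2, 3, 1, 2 and 1, so the diameter is 3 and the average distance is $10/6 \approx 1.67$.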

In [None]:
# Your code here.