# Degree correlations and assortativity

In this problem, we consider degree correlations and assortativity of a real-world network: a network of collaborations between network scientists (from M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices." Phys. Rev. E 74, 036104 (2006)).

In [None]:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import matplotlib as mpl

In [None]:
# NO NEED TO MODIFY THESE FUNCTIONS
def create_scatter(x_degrees, y_degrees, network_title):
    """
    For an x_degrees, y_degrees pair, creates a scatter plot
    of the degrees. Saving the figure is left to the caller.

    Parameters
    ----------
    x_degrees: list or array
    y_degrees: list or array
    network_title: str
        a network-referring title (string) for figures

    Returns
    -------
    fig: figure object
    """
    fig, ax = plt.subplots(figsize=(6, 6), layout='constrained')
    alpha = 0.5
    ax.plot(x_degrees, y_degrees, 'r', ls='', marker='o', ms=3, alpha=alpha)
    ax.set_xlabel(r'Degree $k$')
    ax.set_ylabel(r'Degree $k$')
    ax.set_title(network_title)

    return fig

def create_heatmap(x_degrees, y_degrees, network_title):
    """
    For an x_degrees, y_degrees pair, creates a heatmap
    of the degrees. Saving the figure is left to the caller.

    Parameters
    ----------
    x_degrees: list or array
    y_degrees: list or array
    network_title: str
        a network-referring title (string) for figures

    Returns
    -------
    fig: figure object
    """
    k_min = np.min((x_degrees, y_degrees))
    k_max = np.max((x_degrees, y_degrees))

    n_bins = k_max - k_min + 1
    statistic = stats.binned_statistic_2d(x_degrees, y_degrees, None, 
                                          statistic='count', bins=n_bins)[0]

    fig, ax = plt.subplots(figsize=(6, 6), layout='constrained')
    cmap = plt.get_cmap('jet')
    ax.imshow(statistic, extent=(k_min-0.5, k_max+0.5, k_min-0.5, k_max+0.5),
              origin='lower', cmap=cmap, interpolation='nearest')
    ax.set_title(network_title)
    ax.set_xlabel(r'Degree $k$')
    ax.set_ylabel(r'Degree $k$')
    norm = mpl.colors.Normalize(vmin=np.min(statistic), vmax=np.max(statistic))
    scm = mpl.cm.ScalarMappable(norm=norm, cmap=cmap)
    fig.colorbar(scm, ax=ax)
    return fig

## Loading the data
Let us load the data from the course data directory. If you run this notebook on your own machine, set `course_data_dir` to the folder that contains `netscience.gml`.

In [None]:
# Select data directory
import os
import pickle
if os.path.isdir('/coursedata'):
    course_data_dir = '/coursedata'
elif os.path.isdir('../data'):
    course_data_dir = '../data'
else:
    # Specify course_data_dir on your machine
    course_data_dir = '.'
    # YOUR CODE HERE
    #raise NotImplementedError()

print('The data directory is %s' % course_data_dir)

network_fname = os.path.join(course_data_dir, 'netscience.gml')
network = nx.read_gml(network_fname)

## a,b) Generate a degree-degree scatterplot
**Generate a scatter plot** of the degrees of pairs of connected nodes. That is, take each connected pair of nodes $(i,j)$, take their degrees $k_i$ and $k_j$, plot the point $(k_i,k_j)$ on two axes with degrees as their units, and repeat for all pairs of connected nodes. Because the network is undirected, the plot should be symmetrical, containing points $(k_i,k_j)$ and $(k_j,k_i)$ for all connected pairs $(i,j)$. 

First, extract the degrees of the end nodes of each link with a function, and then use the lists returned by the function for the scatter plot. Have a look at the plot, and answer the questions in MyCourses.
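
To make the pairing concrete, here is a minimal sketch on a small toy graph. The graph and the names `toy`, `toy_x`, and `toy_y` are assumptions for illustration only; they are not part of the course data or the required solution.

```python
import networkx as nx

# Hypothetical toy graph (not the course data): the path 0-1-2-3
toy = nx.path_graph(4)

toy_x, toy_y = [], []
for i, j in toy.edges():
    # Record each edge in both directions so the point cloud is symmetric
    toy_x.extend([toy.degree(i), toy.degree(j)])
    toy_y.extend([toy.degree(j), toy.degree(i)])

print(list(zip(toy_x, toy_y)))
# → [(1, 2), (2, 1), (2, 2), (2, 2), (2, 1), (1, 2)]
```

Each of the three edges contributes two points, so the lists have twice as many entries as the graph has edges.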


In [None]:
def get_x_and_y_degrees(network):
    """
    For the given network, creates two lists (x_degrees and y_degrees)
    of degrees such that the degrees of the "start" and "end" nodes of
    each edge are in x_degrees and y_degrees, respectively.
    That is, the degree of the start node of the n-th edge is x_degrees[n]
    and the degree of the end node of the n-th edge is y_degrees[n].
    For undirected networks, each edge is considered twice because
    each attached node is counted once as a start node and once as an end node.

    Parameters
    ----------
    network: a NetworkX graph object

    Returns
    -------
    x_degrees: list
    y_degrees: list
    """
    edges = network.edges()
    x_degrees = []
    y_degrees = []
    # TODO: write the correct definition for x_degrees and y_degrees
    # YOUR CODE HERE
    raise NotImplementedError()
    return x_degrees, y_degrees

In [None]:
# use the above-defined function to generate and save the scatterplot
x_degrees, y_degrees = get_x_and_y_degrees(network)
fig = create_scatter(x_degrees, y_degrees, 'Collaboration network')
fig.savefig('scatterplot_degrees_collnet.pdf')

## c) Heatmap
Next, **produce a heat map** of the degrees of connected node pairs (see http://en.wikipedia.org/wiki/Heat_map). The heat map uses the same information as the scatter plot in parts a) and b), that is, the degrees of pairs of connected nodes. However, no points are plotted: rather, the two degree axes are **binned** and the number of degree pairs $(k_i,k_j)$ in each bin is computed. Then, the bin is colored according to this number (e.g., red = many connected pairs of nodes with degrees falling in the bin). Again, the heat map ought to be symmetrical. **What extra information** do you gain by using a heatmap instead of just a scatter plot (if any)?
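
The binning step can be sketched on toy data as follows. This is an illustration under assumptions: the degree pairs are made up (they correspond to a 4-node path graph), and `np.histogram2d` is used here for brevity, whereas the provided `create_heatmap` helper relies on `scipy.stats.binned_statistic_2d`.

```python
import numpy as np

# Hypothetical symmetric degree pairs (those of a 4-node path graph)
toy_x = np.array([1, 2, 2, 2, 2, 1])
toy_y = np.array([2, 1, 2, 2, 1, 2])

k_min = min(toy_x.min(), toy_y.min())
k_max = max(toy_x.max(), toy_y.max())

# Bin edges at half-integers so that each bin holds exactly one degree value
edges = np.arange(k_min - 0.5, k_max + 1.5)
counts, _, _ = np.histogram2d(toy_x, toy_y, bins=[edges, edges])

# counts[a, b] is the number of pairs with degrees (k_min + a, k_min + b)
print(counts)
```

In this toy example the two $(2,2)$ pairs would fall on top of each other in a scatter plot, while the count matrix records them as 2.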

In [None]:
fig = create_heatmap(x_degrees, y_degrees, "collaboration_network")

In [None]:
# Save figure 
path="./"
fig.savefig(path+'collaboration_network_heatmap'+'.pdf');

## d) Assortativity
The assortativity coefficient is defined as the Pearson correlation coefficient of the degrees of pairs of connected nodes.
**Calculate and report the assortativity coefficient for the collaboration network** using both `scipy.stats.pearsonr` and the NetworkX function `degree_assortativity_coefficient`. Check that both methods return the same value.
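
As a sanity check of this definition, the sketch below compares the two approaches on a small toy graph. The barbell graph and the variable names are assumptions chosen for illustration; the degree lists are built symmetrically, with each edge counted in both directions.

```python
import networkx as nx
import scipy.stats as stats

# Hypothetical toy graph: two triangles joined by a short path
toy = nx.barbell_graph(3, 2)

toy_x, toy_y = [], []
for i, j in toy.edges():
    # Each undirected edge contributes (k_i, k_j) and (k_j, k_i)
    toy_x.extend([toy.degree(i), toy.degree(j)])
    toy_y.extend([toy.degree(j), toy.degree(i)])

# pearsonr returns the correlation coefficient first, then the p-value
r_pearson = stats.pearsonr(toy_x, toy_y)[0]
r_nx = nx.degree_assortativity_coefficient(toy)
print(r_pearson, r_nx)
```

The two values should agree up to floating-point error, provided the degree lists contain every edge in both directions.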

In [None]:
def assortativity(x_degrees, y_degrees):
    """
    Calculates assortativity for a network, i.e. Pearson correlation
    coefficient between x_degrees and y_degrees in the network.

    Parameters
    ----------
    x_degrees: np.array
    y_degrees: np.array

    Returns
    -------
    assortativity: float
        the assortativity value of the network as a number
    """
    # assortativity = ?  # to be replaced
    # TODO: write code for calculating assortativity
    # YOUR CODE HERE
    raise NotImplementedError()
    return assortativity

In [None]:
# Compare the assortativity from the own implementation and from NetworkX
assortativity_own = assortativity(x_degrees, y_degrees)
assortativity_nx = nx.degree_assortativity_coefficient(network)
print("Own assortativity: {:.5g}".format(assortativity_own))
print("NetworkX assortativity: {:.5g}".format(assortativity_nx))

## e) Average Nearest Neighbor Degree

For the collaboration network, **compute** the average nearest neighbour degree $k_{nn}$ of each node and **make a scatter plot** of $k_{nn}$ as a function of $k$. In the same plot, also **draw** the curve of $\langle k_{nn} \rangle(k)$ as a function of $k$, i.e. the average of $k_{nn}$ over all nodes with the same degree $k$. What do you see?
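
The two quantities can be illustrated on a toy graph. The star graph and the names `k`, `knn`, `unique_k`, and `mean_knn` below are assumptions for this sketch; it shows one possible approach and is not necessarily the intended solution.

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph: hub node 0 connected to leaves 1, 2, 3
toy = nx.star_graph(3)
nodes = list(toy.nodes())
knn_by_node = nx.average_neighbor_degree(toy)

# Index both quantities by the same node list so the ordering matches
k = np.array([toy.degree(n) for n in nodes])
knn = np.array([knn_by_node[n] for n in nodes])

# Average k_nn over all nodes that share the same degree
unique_k = np.unique(k)
mean_knn = np.array([knn[k == value].mean() for value in unique_k])
print(unique_k, mean_knn)
```

Here the leaves (degree 1) have $k_{nn} = 3$ and the hub (degree 3) has $k_{nn} = 1$, so $\langle k_{nn} \rangle(k)$ decreases with $k$ for this toy graph.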

In [None]:
def get_nearest_neighbor_degree(network):
    """
    Calculates the average nearest neighbor degree for each node.

    Parameters
    ----------
    network: NetworkX graph

    Returns
    -------
    degrees: list-like
        array of degree of nodes
    avg_nearest_neighbor_degrees: list-like
        an array of average nearest neighbor degree of nodes 
        in the same order as degrees
    """
    degrees = [] #to be replaced
    avg_nearest_neighbor_degrees = [] #to be replaced
    # TODO: Calculate degrees and average nearest neighbor degrees of network
    # Hint: if using nx.degree() and nx.average_neighbor_degree(), remember
    # that key-value pairs of dictionaries are not automatically in a fixed order!

    # YOUR CODE HERE
    raise NotImplementedError()
    return degrees, avg_nearest_neighbor_degrees

In [None]:
def calculate_average_y_for_each_unique_x_value(x_values, y_values):
    """
    Calculates average of y values for each unique x value.

    Parameters
    ----------
    x_values: an array of x values
    y_values: an array of corresponding y values

    Returns
    -------
    unique_x_values: an array of unique x values
    y_averages: an array of average y values for each unique x
    """
    # TODO: make an array of unique x-values. 
    # This array will be used to compute average of y-values corresponding to 
    # each unique x-value.
    unique_x_values = np.array([]) # replace
    # YOUR CODE HERE
    raise NotImplementedError()

    # Now we compute the average of y-values corresponding to each unique x-value 
    # and save them in a np.array which is ordered based on unique_x_values 
    grouped_y_values = {x:[] for x in unique_x_values}
    for i, x in enumerate(x_values):
        y = y_values[i]
        grouped_y_values[x].append(y)
    y_averages = np.array([np.mean(grouped_y_values[unique_x]) 
                           for unique_x in unique_x_values])
    return unique_x_values, y_averages

In [None]:
# You do not need to change anything in this function
def visualize_nearest_neighbor_degree(degrees, nearest_neighbor_degrees, 
                                      unique_x_value, y_averages,
                                      network_title):
    """
    Visualizes the nearest neighbor degree for each degree as a scatter and
    the mean nearest neighbor degree per degree as a line.

    Parameters
    ----------
    degrees: list-like
        an array of node degrees
    nearest_neighbor_degrees: list-like
        an array of average nearest neighbor degree of nodes 
        in the same order as degrees
    unique_x_value: list-like
        unique degree values
    y_averages: list-like
        the mean nearest neighbor degree per unique degree value
    network_title: str
        network-referring title (string) for figure

    Returns
    -------
    fig : figure object
    """

    fig, ax = plt.subplots(figsize=(6, 4), layout='constrained')
    ax.scatter(degrees, nearest_neighbor_degrees, marker='.', 
               label=r'$k_{nn}(k)$')
    ax.plot(unique_x_value, y_averages, color='r', lw=2, 
            label=r'$\langle k_{nn} \rangle (k)$')
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.set_title(network_title)
    ax.set_xlabel(r'Degree $k$')
    ax.set_ylabel(r'Average nearest neighbor degree $k_{nn}$')
    ax.legend(loc=0)
    return fig

In [None]:
network_title = 'Scientific collaboration network'

degrees, nearest_neighbor_degrees = get_nearest_neighbor_degree(network)
unique_degrees, mean_nearest_neighbor_degrees = \
    calculate_average_y_for_each_unique_x_value(degrees, nearest_neighbor_degrees)
fig = visualize_nearest_neighbor_degree(degrees,
                                        nearest_neighbor_degrees,
                                        unique_degrees,
                                        mean_nearest_neighbor_degrees,
                                        network_title)

In [None]:
# Save figure
path='./'
fig.savefig(path+'collaborations_knn'+'.pdf')