# Networks and their Structure: Network Science

## Preferential Attachment

We concluded that random graphs are not necessarily a good model for real networks, and certainly not for the citation network we looked at.

Here we look at another model, which we call PA Graphs (PA stands for Preferential Attachment); see the introduction in the last few slides.

PA Graphs are defined algorithmically. First, a complete directed graph on $m$ nodes is created (in a complete directed graph, every node is joined to every other node in both directions). Then new nodes are added one at a time, and each makes $m$ random choices among the existing nodes. In each choice, the probability of picking an old node is proportional to that node's in-degree plus one, so that nodes with in-degree zero can still be chosen. The same node may be chosen more than once; duplicate links are eliminated, so a new node might be connected to fewer than $m$ nodes. Nodes are added until the total number of nodes is $n$, where $n$ is generally much larger than $m$.
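Reading the selection rule off the code in this notebook (the notation here is an assumption introduced for illustration, not taken from the slides): if $V$ is the current set of nodes and $\mathrm{indeg}(j)$ is the in-degree of node $j$, then each of the $m$ choices made for a new node picks node $j$ with probability

$$P(j) = \frac{\mathrm{indeg}(j) + 1}{\sum_{k \in V} \mathrm{indeg}(k) + |V|}.$$

For example, in the initial complete graph with $m = 3$, every node has in-degree $2$, so each node is picked with probability $(2 + 1)/(6 + 3) = 1/3$ in each choice.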

The code below can be used to generate such graphs.

In [1]:
#based on code from http://www.codeskulptor.org/#alg_dpa_trial.py

import random

#first we need a helper class that performs the random trials for each new node
class PATrial:
    """
    Used when each new node is added in creation of a PA graph.
    Maintains a list of node numbers with multiple instances of each number.
    The number of instances of each node number are in proportion to the
    probability that it is linked to.
    Uses random.choice() to select a node number from this list for each trial.
    """

    def __init__(self, num_nodes):
        """
        Initialize a PATrial object corresponding to a 
        complete graph with num_nodes nodes
        
        Note the initial list of node numbers has num_nodes copies of
        each node number
        """
        # note that the vertices are labelled from 0, so self._num_nodes
        # is also the label of the next vertex to be added
        self._num_nodes = num_nodes
        self._node_numbers = [node for node in range(num_nodes) for dummy_idx in range(num_nodes)]


    def run_trial(self, num_nodes):
        """
        Conduct num_nodes trials by applying random.choice()
        to the list of node numbers
        
        Updates the list of node numbers so that the number of instances of
        each node number is in the same ratio as the desired probabilities
        
        Returns:
        Set of nodes
        """       
        #compute the neighbors for the newly-created node
        new_node_neighbors = set()
        for dummy_idx in range(num_nodes):
            new_node_neighbors.add(random.choice(self._node_numbers))
        # update the list of node numbers so that each node number 
        # appears in the correct ratio
        self._node_numbers.extend(list(new_node_neighbors))
        # also add to the list of node numbers the id of the current node
        # since each node must appear once in the list else no future node will link to it
        # note that self._num_nodes will be incremented next
        self._node_numbers.append(self._num_nodes)
        # update the number of nodes
        self._num_nodes += 1
        return new_node_neighbors
    
def make_complete_graph(num_nodes):
    """Takes the number of nodes num_nodes and returns a dictionary
    corresponding to a complete directed graph with the specified number of
    nodes. A complete graph contains all possible edges subject to the
    restriction that self-loops are not allowed. The nodes of the graph should
    be numbered 0 to num_nodes - 1 when num_nodes is positive. Otherwise, the
    function returns a dictionary corresponding to the empty graph."""
    #initialize empty graph
    complete_graph = {}
    #consider each vertex
    for vertex in range(num_nodes):
        #add vertex with its set of neighbours
        complete_graph[vertex] = set([j for j in range(num_nodes) if j != vertex])
    return complete_graph
    
def make_PA_Graph(total_nodes, out_degree):
    """Creates a PA graph with total_nodes nodes, where each new vertex is
    connected to at most out_degree existing nodes (fewer if the random
    trials pick the same node more than once)."""
    #initialize graph by creating complete graph and trial object
    PA_graph = make_complete_graph(out_degree)
    trial = PATrial(out_degree)
    for vertex in range(out_degree, total_nodes):
        PA_graph[vertex] = trial.run_trial(out_degree)
    return PA_graph

We can use the code above to create a PA graph by writing

```python
EX_GRAPH_PA1 = make_PA_Graph(n, m)
```

where $n$ and $m$ are positive integers for us to choose.  

If we aim to construct a graph similar to the citation graph, what values should we choose for $n$ and $m$?  
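One plausible starting point, offered as a sketch rather than the intended answer: take $n$ to be the number of nodes in the citation graph and $m$ to be roughly its average out-degree, since each new node in a PA graph contributes at most $m$ edges. The helper `suggest_pa_parameters` and the toy graph below are hypothetical illustrations; they assume the graph is stored as a dictionary mapping each node to a set of out-neighbours, as in the code above.

```python
def suggest_pa_parameters(digraph):
    """Return (n, m) for a graph given as {node: set of out-neighbours}.

    n is the number of nodes; m is the average out-degree rounded to the
    nearest integer (and at least 1). This is a heuristic, not a fitted value.
    """
    num_nodes = len(digraph)
    if num_nodes == 0:
        raise ValueError("graph has no nodes")
    num_edges = sum(len(neighbours) for neighbours in digraph.values())
    avg_out_degree = num_edges / num_nodes
    return num_nodes, max(1, round(avg_out_degree))


# Toy stand-in for the citation graph (hypothetical data: 4 nodes, 7 edges).
TOY_CITATIONS = {0: {1, 2}, 1: {2}, 2: {0}, 3: {0, 1, 2}}
n, m = suggest_pa_parameters(TOY_CITATIONS)
print(n, m)
```

With the real citation graph loaded in place of `TOY_CITATIONS`, the returned pair could be passed to `make_PA_Graph(n, m)`.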

Once you have determined reasonable values, construct the graph, plot its in-degree distribution, and compare it to the plot for the citation graph.  Is it a better model?  (Actually, the values of $n$ and $m$ do not much influence the "shape" of the distribution.)
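A minimal sketch of how the in-degree distribution might be computed, assuming the same dictionary-of-sets representation. The function names are illustrative only; `NatSfunctions.ipynb` may already provide equivalents under other names.

```python
from collections import Counter


def compute_in_degrees(digraph):
    """Map each node to its in-degree for a graph {node: set of out-neighbours}."""
    in_degrees = {node: 0 for node in digraph}
    for neighbours in digraph.values():
        for head in neighbours:
            # .get() also covers nodes that appear only as edge targets
            in_degrees[head] = in_degrees.get(head, 0) + 1
    return in_degrees


def normalized_in_degree_distribution(digraph):
    """Map each in-degree value to the fraction of nodes with that in-degree."""
    in_degrees = compute_in_degrees(digraph)
    counts = Counter(in_degrees.values())
    num_nodes = len(in_degrees)
    return {degree: count / num_nodes for degree, count in counts.items()}


# Toy graph (hypothetical data): edges point from each key to its set members.
TOY_GRAPH = {0: {1, 2}, 1: {2}, 2: set(), 3: {2}}
distribution = normalized_in_degree_distribution(TOY_GRAPH)
print(sorted(distribution.items()))
```

When plotting the result on log-log axes, nodes with in-degree zero have to be left out, since zero cannot be shown on a logarithmic axis.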

Use the next cell to import functions from the other notebooks.

In [2]:
%run NatSfunctions.ipynb