Author: Fariba Karimi

Website: frbkrm.com

Version: 18 July 2019

Generating networks with Python and NetworkX

# A tutorial on generating networks


The goal of this module is to familiarize students with generative models of networks using Python.

Among the most famous network models are those that generate scale-free networks.

### Barabasi-Albert (BA) preferential attachment network

This preferential attachment model generates networks with scale-free degree distributions, which have been observed in many large-scale real social networks.

In [1]:
## let's import libraries that we need
import networkx as nx
import random

In [2]:
BA_graph = nx.Graph() #generate an empty undirected graph

This is a growth model: at each time step, a new node arrives and connects to existing nodes. The probability of connecting to an existing node follows preferential attachment: nodes with higher degree have a higher probability of being selected.
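Written as a formula, the standard BA attachment rule reads as follows. The symbols are introduced here for illustration only: $k_i$ is the current degree of existing node $i$, and $\Pi(i)$ is the probability that the arriving node links to it.

$$\Pi(i) = \frac{k_i}{\sum_j k_j}$$

The sum in the denominator runs over all nodes currently in the network, so the probabilities add up to 1.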

Each arriving node connects to m existing nodes, where m is an arbitrary number. We fix m to 2.

In [3]:
m = 2

That means each arriving node has to connect to 2 existing nodes. Therefore, we need to initialize the graph based on m: it must already contain at least m nodes.

In [4]:
BA_graph.add_edge(0,1)

#### Roulette Wheel selection to choose a random node to connect to

Each arriving node has to connect to m "random" nodes that are chosen in proportion to their degree. For example, if node A and node B both have degree 10, we need to make sure that both of them have the same chance of being picked. To do so, we use Roulette Wheel selection: every node gets a slice of the wheel whose size is proportional to its degree.

<img src="roulette_wheel.png">

For example, let's assume we have a star graph.

In [5]:
star_g = nx.Graph()

In [6]:
star_g.add_edge('A','B')
star_g.add_edge('A','C')
star_g.add_edge('A','D')
star_g.add_edge('A','E')
star_g.add_edge('A','F')
star_g.add_edge('A','G')

Q: What is the degree of A? What is the degree of the other nodes?

In the star_g graph, all nodes except A should have the same probability of being picked.

Let's make a Roulette Wheel algorithm:

In [7]:
degree_dict = star_g.degree() # degree view: iterating yields (node, degree) pairs
sum_degree = sum(deg for n,deg in star_g.degree())
cumulative_sum = 0
probability_dict = {}
for n,deg in degree_dict:
    node_prob = deg/sum_degree
    cumulative_sum += node_prob
    probability_dict[n] = cumulative_sum
    print(n , node_prob, cumulative_sum)
    

B 0.08333333333333333 0.08333333333333333
E 0.08333333333333333 0.16666666666666666
F 0.08333333333333333 0.25
C 0.08333333333333333 0.3333333333333333
A 0.5 0.8333333333333333
D 0.08333333333333333 0.9166666666666666
G 0.08333333333333333 1.0


In [8]:
from collections import defaultdict
count_node = defaultdict(int)
for i in range(1000):
    r = random.random() # a random number between 0 and 1
    for node,cum_sum in probability_dict.items():
        if r <= cum_sum:
            count_node[node] += 1
            break

In [9]:
print(count_node)

defaultdict(<class 'int'>, {'B': 87, 'E': 81, 'F': 98, 'C': 90, 'A': 480, 'D': 87, 'G': 77})
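As a cross-check on the hand-written wheel, the same degree-proportional draw can be sketched with the standard library's `random.choices`, which accepts a list of weights. The helper name `sample_by_degree` and the hard-coded degree table are assumptions made for this illustration; the degrees are those of the star graph above (6 for A, 1 for each leaf).

```python
import random
from collections import Counter

# Degrees of the star graph above: hub A has 6 links, each leaf has 1.
star_degrees = {'A': 6, 'B': 1, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 1}

def sample_by_degree(degrees, k, rng=random):
    """Draw k nodes (with replacement), each with probability proportional to its degree."""
    nodes = list(degrees)
    weights = [degrees[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=k)

rng = random.Random(42)  # seeded only so that the run is repeatable
counts = Counter(sample_by_degree(star_degrees, 1000, rng))
print(counts)
```

With 1000 draws, A should be picked roughly half of the time and each leaf roughly one time in twelve, in line with the counts printed above.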


In [10]:
def pick_targets(G, m):
    # first build the roulette wheel: cumulative degree-proportional probabilities
    degree_dict = G.degree() # degree view: iterating yields (node, degree) pairs
    sum_degree = sum(deg for n,deg in G.degree())
    cumulative_sum = 0
    probability_dict = {}
    for n,deg in degree_dict:
        node_prob = deg/sum_degree
        cumulative_sum += node_prob
        probability_dict[n] = cumulative_sum

    # now spin the wheel until m distinct targets are chosen
    # (assumes G has at least m nodes; a repeated pick is simply redrawn)
    chosen = []
    while len(chosen) < m:
        r = random.random() # a random number between 0 and 1
        for n,cum_sum in probability_dict.items():
            if r <= cum_sum:
                if n not in chosen:
                    chosen.append(n)
                break
    return chosen
    

Now, let's generate a preferential attachment network model.

In [11]:
N = 200
source = m
while source < N:
    targets = pick_targets(BA_graph , m)
    for target in targets:
        BA_graph.add_edge(source,target)
    source += 1
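The loop above rebuilds the cumulative probability table on every call to `pick_targets`. A common alternative, shown here only as a sketch, keeps a list in which every node appears once per edge endpoint, so that a uniform draw from the list is already a degree-proportional draw. The function name `ba_edges` and the list name `stubs` are made up for this example, and the sketch assumes the same single-edge starting graph as the text, which limits it to m of 1 or 2.

```python
import random

def ba_edges(n, m, rng=random):
    """Return the edge list of a BA-style network with n nodes (illustrative sketch)."""
    if not 1 <= m <= 2:
        raise ValueError("this sketch starts from one edge, so m must be 1 or 2")
    edges = [(0, 1)]
    # Each node appears in `stubs` once per edge endpoint, i.e. `degree` times,
    # so a uniform draw from `stubs` picks a node in proportion to its degree.
    stubs = [0, 1]
    for source in range(2, n):
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((source, target))
            stubs.extend((source, target))
    return edges

edges = ba_edges(200, 2, random.Random(1))
print(len(edges))  # 1 initial edge + 2 edges for each of the 198 added nodes = 397
```

The resulting edge list can be loaded into an empty graph with `add_edges_from` if a networkx object is needed.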

TODO: plot the degree distribution of the graph
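As a possible starting point for this exercise, the sketch below turns a sequence of (node, degree) pairs into a degree distribution, i.e. the fraction of nodes having each degree. The function name `degree_distribution` is an assumption of this example, and the toy input is the star graph from earlier rather than the BA network.

```python
from collections import Counter

def degree_distribution(degree_pairs):
    """Map each degree k to the fraction of nodes that have degree k."""
    degrees = [deg for _, deg in degree_pairs]
    counts = Counter(degrees)
    n = len(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

# Toy input: the (node, degree) pairs of the star graph used earlier.
star_degrees = [('A', 6), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1)]
dist = degree_distribution(star_degrees)
print(dist)
```

Since iterating over `BA_graph.degree()` yields (node, degree) pairs, passing it to the same function should give the distribution to plot. Scale-free distributions are usually inspected on log-log axes.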

### Scale-free networks with triadic closure

Many real social networks contain significantly more triangles (closed triads) than would be expected by chance. Let's incorporate that into the BA model.

In [12]:
G = nx.Graph()
G.add_edge(0,1)  # initial graph, as before
m = 2
N = 200
p = 0.1 # probability of making triangles
source = m  # label of the next node to add
while source < N:        # add the remaining nodes one by one
    possible_targets = pick_targets(G, m)
    # do one preferential attachment for new node
    target = possible_targets.pop()
    G.add_edge(source, target)
    count = 1
    while count < m:  # add m-1 more new links
        if random.random() < p:  # clustering step: add triangle
            neighborhood = [nbr for nbr in G.neighbors(target)
                                if not G.has_edge(source, nbr)
                                and nbr != source]
            if neighborhood:  # if the target has a neighbor not yet linked to source
                nbr = random.choice(neighborhood)
                G.add_edge(source, nbr)  # add triangle
                count = count + 1
                continue  # go to top of while loop
        # else do preferential attachment step if above fails
        target = possible_targets.pop()
        G.add_edge(source, target)
        count = count + 1

    source += 1


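Let's count triangles for the simple BA network and for BA with triads (also known as the Holme-Kim algorithm). Note that `nx.triangles` reports, for each node, the number of triangles that node belongs to, so the sums below count every triangle three times: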

In [13]:
sum(nx.triangles(G).values())

132

In [14]:
sum(nx.triangles(BA_graph).values())

99
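To make the relation between per-node triangle counts and the number of distinct triangles concrete, here is a small standard-library sketch on a toy graph. The function name `triangles_per_node` and the adjacency-dictionary representation are assumptions of this example, not part of networkx.

```python
from itertools import combinations

def triangles_per_node(adj):
    """For each node, count the triangles it is part of.

    `adj` maps every node to the set of its neighbours (undirected graph).
    """
    counts = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbours that are linked.
        counts[node] = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return counts

# Toy graph: one triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
per_node = triangles_per_node(adj)
total = sum(per_node.values())
print(per_node, total, total // 3)
```

Each corner of the single triangle reports it once, so the per-node sum is 3 while the graph holds 1 triangle. By the same reasoning, the sums 132 and 99 above correspond to 44 and 33 distinct triangles.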

### Simple random graph model

In [15]:
import itertools

N = 100
p = 0.2 # probability of edge connection in a random graph
random_graph = nx.Graph()
random_graph.add_nodes_from(range(N)) # add all nodes first so that isolated nodes are kept

possible_edges = itertools.combinations(range(N), 2)
for i,j in possible_edges:
    if random.random() < p:
        random_graph.add_edge(i,j)
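This construction is commonly known as the Erdős-Rényi (or Gilbert) G(N, p) model. Each of the N(N-1)/2 node pairs is linked independently with probability p, so the expected number of edges is p·N(N-1)/2, which is 990 for N = 100 and p = 0.2. The standard-library sketch below compares one sample against that expectation; the function name `gnp_edges` is made up for this example.

```python
import itertools
import random

def gnp_edges(n, p, rng=random):
    """Return the edges of one G(n, p) sample: each pair is kept with probability p."""
    return [pair for pair in itertools.combinations(range(n), 2) if rng.random() < p]

N, p = 100, 0.2
expected_edges = p * N * (N - 1) / 2  # 0.2 * 4950 = 990
rng = random.Random(0)  # seeded only so that the run is repeatable
edges = gnp_edges(N, p, rng)
print(len(edges), expected_edges)
```

The sampled edge count should fall close to 990, typically within a few dozen edges, because the count is a sum of 4950 independent coin flips.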
        
