# Networks: structure, evolution & processes
**Internet Analytics - Lab 2**

---

**Group:** B

**Names:**

* Vincenzo Bazzucchi
* Amaury Combes
* Alexis Montavon

---

#### Instructions

*This is a template for part 4 of the lab. Clearly write your answers, comments and interpretations in Markodown cells. Don't forget that you can add $\LaTeX$ equations in these cells. Feel free to add or remove any cell.*

*Please properly comment your code. Code readability will be considered for grading. To avoid long cells of codes in the notebook, you can also embed long python functions and classes in a separate module. Don’t forget to hand in your module if that is the case. In multiple exercises, you are required to come up with your own method to solve various problems. Be creative and clearly motivate and explain your methods. Creativity and clarity will be considered for grading.*

In [14]:
import networkx as nx
import numpy as np
from random import choice
from collections import Counter

In [15]:
def directed_graph_from_tsv(pathname):
    file = open(pathname, 'r');
    G = nx.DiGraph()
    for line in file:
        content = line.split('\t')
        u = int(content[0])
        adjacent_nodes = map(lambda v: int(v), content[1].split())
        for v in adjacent_nodes:
            G.add_edge(u, v)
    file.close()
    return G

In [27]:
# Question for these two functions: is having a maximum number of hopes the right stop criteria?
def random_surfer(G, max_hops=100):
    counter = Counter()
    u = choice(G.nodes())
    for i in range(max_hops):
        links = list(G[u].keys())
        if len(links) == 0: # Dangling node found!
            break
        v = choice(links)
        counter[v] += 1
        u = v
    return {page: counter[page] / sum(counter.values()) for page in counter.keys()}

In [25]:
components = directed_graph_from_tsv('../data/components.graph')
absorbing = directed_graph_from_tsv('../data/absorbing.graph')

In [52]:
N_SIMUL = 10000
visited_nodes_p = 0
for i in range(N_SIMUL):
    visited_nodes_p += len(random_surfer(components))
print(visited_nodes_p / N_SIMUL, '% of nodes was visited')

4.0 % of nodes was visited


In [53]:
N_SIMUL = 10000
visited_nodes_p = 0
for i in range(N_SIMUL):
    visited_nodes_p += len(random_surfer(absorbing))
print(visited_nodes_p / N_SIMUL, '% of nodes was visited')

1.591 % of nodes was visited


We observe that even if the num ber of maximal hops is quite large, the number of visited nodes stays small.
This behavior is probably caused by dangling nodes.

#### Exercise 2.13

In [54]:
def random_surfer_improved(G, max_hops=100, damping_factor=0.15):
    counter = Counter()
    u = choice(G.nodes())
    for i in range(max_hops):
        links = list(G[u].keys())
        # if restart or dangling node start at random
        if np.random.binomial(1, damping_factor) or len(links) == 0:
            v = choice(G.nodes())
        else: # choose link at random
            v = choice(links)
        counter[v] += 1
        u = v
    return counter

In [55]:
N_SIMUL = 10000
visited_nodes_p = 0
for i in range(N_SIMUL):
    visited_nodes_p += len(random_surfer_improved(components))
print(visited_nodes_p / N_SIMUL, '% of nodes was visited')

7.9898 % of nodes was visited


In [56]:
N_SIMUL = 10000
visited_nodes_p = 0
for i in range(N_SIMUL):
    visited_nodes_p += len(random_surfer(components))
print(visited_nodes_p / N_SIMUL, '% of nodes was visited')

4.0 % of nodes was visited


In [57]:
#Do you think that the PageRank scores make intuitive sense?

---

### 2.4.2 Power Iteration Method

#### Exercise 2.14: Power Iteration method

In [67]:
g = directed_graph_from_tsv('../data/components.graph')

In [143]:
def transition_matrix(graph):
    H = np.zeros((len(graph), len(graph)))
    for u in graph:
        out_degree_u = nx.degree(g, u)
        for v in graph:
            H[u, v] = 1 / out_degree_u if graph.has_edge(u, v) else 0
    return H
        
def GoogleMatrix(graph, theta):
    N = len(graph)
    H = transition_matrix(graph)
    w = np.array([1 if nx.degree(g, node) > 0 else 0 for node in graph])
    w = np.reshape(w, (N, 1))
    onesT = np.ones((1, N))
    Hhat = H + (1 / N) * (w @ onesT)
    return theta * Hhat + (1-theta)* (onesT.T @ onesT) / N

def PageRank(google_matrix, pi_0):
    prev = np.zeros(pi_0.shape)
    curr = pi_0
    while not np.allclose(curr, prev, atol=0.02):
        prev = curr
        curr = curr @ google_matrix 
        print("CURR diff", abs(np.min(curr- prev)))
    return curr

In [144]:
nx.degree(g)

{0: 2, 1: 3, 2: 3, 3: 2, 4: 3, 5: 2, 6: 3, 7: 2}

In [145]:
GM = GoogleMatrix(g, 0.15)
r = PageRank(GM, np.array([1 / len(g) for el in g]))

CURR diff 0.00625


In [146]:
r

array([ 0.13125 ,  0.134375,  0.140625,  0.13125 ,  0.134375,  0.13125 ,
        0.140625,  0.13125 ])

---

### 2.4.3 Gaming the system *(Bonus)*

#### Exercise 2.15 *(Bonus)*