# Networks: structure, evolution & processes
**Internet Analytics - Lab 2**

---

**Group:** *R*

**Names:**

* *Raphael Barman*
* *Thierry Bossy*
* *Raphael Strebel*

---

#### Instructions

*This is a template for part 2 of the lab. Clearly write your answers, comments and interpretations in Markdown cells. Don't forget that you can add $\LaTeX$ equations in these cells. Feel free to add or remove any cell.*

*Please properly comment your code. Code readability will be considered for grading. To avoid long cells of codes in the notebook, you can also embed long python functions and classes in a separate module. Don’t forget to hand in your module if that is the case. In multiple exercises, you are required to come up with your own method to solve various problems. Be creative and clearly motivate and explain your methods. Creativity and clarity will be considered for grading.*

---

## 2.2 Network sampling

#### Exercise 2.7: Random walk on the Facebook network

In [2]:
import requests
import random

In [6]:
def randomWalker(s, N):
    """Simple random walk of N steps starting from node s; returns the ages seen."""
    u = s
    data = dataFromNode(u)
    age_sequence = []  # ages of the visited nodes, in visiting order
    for _ in range(N):
        age_sequence.append(data['age'])
        u = random.choice(data['friends'])  # choose the next node uniformly among the friends
        data = dataFromNode(u)
    return age_sequence

def dataFromNode(s):
    # Base url of the API
    URL_TEMPLATE = 'http://iccluster118.iccluster.epfl.ch:5050/v1.0/facebook?user={user_id}'
    # The actual url to call 
    url = URL_TEMPLATE.format(user_id=s)
    # Execute the HTTP Get request
    response = requests.get(url)
    # Format the json response as a Python dict
    data = response.json()
    return data

In [8]:
# Target user id
user_id = 'f30ff3966f16ed62f5165a229a19b319'
data = dataFromNode(user_id)
nb_nodes = 1000
ages = randomWalker(user_id, nb_nodes)
average = sum(ages) / len(ages)
print("Estimation of the average age of a Facebook user:", average)
print("We are at %.2f percent of the average"% (average/44.3*100))
print("We visited", nb_nodes, "nodes")

Estimation of the average age of a Facebook user: 24.629
We are at 55.60 percent of the average
We visited 1000 nodes


In [4]:
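def randomWalkerImproved(s, N):
    """Metropolis-Hastings random walk from node s; returns the average age over N counted nodes."""
    curr = s
    prev = ''
    i = 0
    sum_age = 0
    while i < N:
        data = dataFromNode(curr)
        # Count the age only when the walker actually moved. Rejected proposals are
        # skipped here, whereas a textbook Metropolis-Hastings estimator would count
        # the current node again.
        if curr != prev:
            sum_age += data['age']
            i += 1
        candidate = random.choice(data['friends'])  # propose a uniformly random friend
        k_curr = len(data['friends'])
        k_candidate = len(dataFromNode(candidate)['friends'])
        p = random.uniform(0.0, 1.0)
        prev = curr
        # Accept the move with probability min(1, k_curr / k_candidate)
        if p < (k_curr / k_candidate):
            curr = candidate
    return sum_age / N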

In [7]:
# Target user id
user_id = 'f30ff3966f16ed62f5165a229a19b319'
data = dataFromNode(user_id)
nb_nodes = 1000

average = randomWalkerImproved(user_id, nb_nodes)
print("Estimation of the average age of a Facebook user:", average)
print("We are at %.2f percent of the average"% (average/44.3*100))
print("We visited", nb_nodes, "nodes")

Estimation of the average age of a Facebook user: 39.092
We are at 88.24 percent of the average
We visited 1000 nodes


#### Exercise 2.8

##### Exercise 2.8.1

With the simple random walk, our estimate of the average age usually falls between 20 and 25 years.

##### Exercise 2.8.2

The estimate varies from run to run because we do not have the computational power to run more than a few thousand iterations, which is too few to reliably estimate the average age of a billion users.

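Our estimate is also far from the true average because the simple random walk is biased. The probability of being at a particular node $u$ converges to $\pi_u = \frac{d_u}{2\cdot|E|}$, where $d_u$ is the number of neighbours of $u$. Users with many friends are therefore over-represented in the sample, and since our estimate is too low, this suggests that such users tend to be younger than average.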
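
The degree bias can be checked on a small example. The sketch below is not part of the lab code: it uses a hypothetical four-node graph (`toy_graph`) and a helper we call `simple_walk_step`, and verifies with exact fractions that the distribution $\pi_u = \frac{d_u}{2\cdot|E|}$, with $d_u$ the number of neighbours of $u$, is left unchanged by one step of the simple random walk.

```python
from fractions import Fraction

# Hypothetical toy graph (adjacency lists), not taken from the Facebook data.
toy_graph = {
    'a': ['b', 'c', 'd'],
    'b': ['a', 'c'],
    'c': ['a', 'b'],
    'd': ['a'],
}

def simple_walk_step(graph, dist):
    """Apply one step of the simple random walk to a distribution over nodes."""
    new_dist = {u: Fraction(0) for u in graph}
    for u, neighbours in graph.items():
        for v in neighbours:
            # From u, each neighbour is chosen with probability 1 / d_u.
            new_dist[v] += dist[u] / len(neighbours)
    return new_dist

# The sum of all degrees equals 2|E| in an undirected graph.
two_E = sum(len(neighbours) for neighbours in toy_graph.values())
pi_degree = {u: Fraction(len(neighbours), two_E) for u, neighbours in toy_graph.items()}

# One step of the walk maps pi_degree onto itself, so it is stationary.
is_stationary = simple_walk_step(toy_graph, pi_degree) == pi_degree
print(pi_degree, is_stationary)
```

In this toy graph, node `a` has three of the eight edge endpoints, so the walk spends three times as long there as at node `d`, which is the same effect that skews our age estimate.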

##### Exercise 2.8.3

It would be best to change the algorithm to explore the graph in a breadth-first fashion instead of going in depth, as it does now.

To get a better result, we could use the [Metropolis-Hastings Random Walk](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm), which has the advantage of making the probability of being at node $u$ converge to the uniform distribution $\pi_u = \frac{1}{|V|}$, so that every user is sampled with the same probability.
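
As a sanity check of this claim, here is a minimal sketch on a hypothetical four-node graph (`toy_graph`, not Facebook data). The helper `mh_walk_step` is our own illustration: it proposes a uniformly random neighbour $v$ of $u$ and accepts it with probability $\min(1, d_u/d_v)$, where $d_u$ is the number of neighbours of $u$. It then verifies with exact fractions that the uniform distribution is unchanged by one step.

```python
from fractions import Fraction

# Hypothetical toy graph (adjacency lists), not taken from the Facebook data.
toy_graph = {
    'a': ['b', 'c', 'd'],
    'b': ['a', 'c'],
    'c': ['a', 'b'],
    'd': ['a'],
}

def mh_walk_step(graph, dist):
    """Apply one Metropolis-Hastings step to a distribution over nodes."""
    new_dist = {u: Fraction(0) for u in graph}
    for u, neighbours in graph.items():
        d_u = len(neighbours)
        stay = Fraction(1)  # probability mass that remains at u
        for v in neighbours:
            d_v = len(graph[v])
            # Propose v with probability 1/d_u, accept with min(1, d_u/d_v).
            move = Fraction(1, d_u) * min(Fraction(1), Fraction(d_u, d_v))
            new_dist[v] += dist[u] * move
            stay -= move
        # Rejected proposals leave the walker at u.
        new_dist[u] += dist[u] * stay
    return new_dist

uniform = {u: Fraction(1, len(toy_graph)) for u in toy_graph}

# One step maps the uniform distribution onto itself, so it is stationary.
is_uniform_stationary = mh_walk_step(toy_graph, uniform) == uniform
print(is_uniform_stationary)
```

The uniform distribution is stationary only because rejected proposals keep their probability mass at the current node. An estimator built on this walk should therefore count the current node again whenever a proposal is rejected.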