##### MY470 Computer Programming

### Problem Set 3

#### \*\*\* Example Answers \*\*\*

---
### Simulating contagion on a network

In this problem set, you are asked to write a program that simulates the contagion of disease or information on an empirical network. In academic research, contagion models have been used to study the properties of different types of networks. In practice, contagion models are valuable for predicting the spread of contagious diseases such as the flu or STDs.

We will use the simplest of contagion models: the SI model. SI stands for susceptible and infected. The SI model assumes that once a susceptible individual is infected, there is no recovery. This is a good representation of the spread of incurable but non-deadly infectious diseases such as herpes simplex, or of the spread of new technologies and knowledge.

In the SI model we will implement, we will start with a population where everyone is susceptible. Then we will randomly pick a small number of individuals ("Patients 0") and infect them. In the next period, all the contacts of the infected individuals will get infected (thus, we will assume that the probability of transmission is 1). And so on. We will repeat the process until everyone in the network is infected or until a certain number of periods have passed (the latter is necessary for networks that are not connected and have separate components; in such situations, it is possible that the contagion never reaches some individuals). 
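
The spreading rule and the need for a cap on the number of periods can be illustrated on a toy network. The following is a minimal sketch, not the required solution: the adjacency dictionary `toy_network`, the helper `spread_once`, and the fixed "Patient 0" are all hypothetical, and for simplicity the sketch infects all contacts at once in each period.

```python
# Toy network as an adjacency dictionary (hypothetical data, not the course dataset).
# Nodes 1-4 form one component; nodes 5 and 6 form a separate one.
toy_network = {
    1: [2],
    2: [1, 3],
    3: [2, 4],
    4: [3],
    5: [6],
    6: [5],
}

def spread_once(network, infected):
    """Return the infected set after one period with transmission probability 1."""
    newly_infected = set()
    for node in infected:
        for neighbor in network[node]:
            if neighbor not in infected:
                newly_infected.add(neighbor)
    return infected | newly_infected

infected = {1}          # "Patient 0", fixed here instead of drawn at random
history = [len(infected)]
max_periods = 10        # cap needed because nodes 5 and 6 can never be reached

for period in range(max_periods):
    if len(infected) == len(toy_network):
        break
    infected = spread_once(toy_network, infected)
    history.append(len(infected))

print(history)  # [1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4]
```

The count stalls at 4 because the contagion cannot cross into the second component, so only the period cap ends the loop.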

We will run the model on a real network. For simplicity, we will reuse the co-authorship network we analyzed in Problem Set 1. So think about the contagion in this case as learning about a new research technique, empirical finding, or theoretical result.

#### Hints

Use docstrings to describe your methods. We will subtract points from your mark if your code is not appropriately documented.


### Problem 1: Working in a team

Work with your assigned partner to complete and submit the problem set. You can meet in person to discuss the division of labor but we expect you to use GitHub to communicate when coding your part and merging your contributions. We will review the Issues, Pull requests, and Wiki stats for your repository. You will get the full points for this problem if we find sufficient evidence that you have made use of GitHub as a collaboration tool.

#### Hints

One reasonable way to divide the work is to assign Problems 2 and 3 to Student A and Problems 4 and 5 to Student B.

### Problem 2: Class for network

Create a class called `UndirectedNetwork`. The class should have the following data attributes:

* `nodes` — a dictionary where the node id is a key and the value is a list with the ids of the node's neighbors (coauthors for our data); initially empty

and the following methods:

* `add_node` — takes `node_id` and initializes it as a key to `nodes` if it is not already there
* `add_neighbors` — takes two arguments: `ego_id` and `alter_id` and adds `alter_id` to `ego_id`'s list of neighbors and `ego_id` to `alter_id`'s list of neighbors, if they are not already there
* `get_node_ids` — generator method that gives the ids of the nodes in the network
* `get_node_neighbors` — generator method that takes `node_id` and gives its neighbors

Calling the `print()` function on an `UndirectedNetwork` object should print the number of nodes in the network, e.g. "Undirected network with 455 nodes".


In [1]:
# For our network example, we know that the data file contains 
# both the i-j and the j-i edges so all the checks 
# in add_neighbors() are unnecessary. However, this may not be 
# the case in another dataset and the power of classes is that
# they can cover many different situations and circumstances.

class UndirectedNetwork(object):
    """A class used to represent a network."""
    
    def __init__(self):
        """Create a new empty network."""        
        self.nodes = {}
    
    def add_node(self, node_id):
        """Take node_id and add it to the network if it is not there."""
        if node_id not in self.nodes:
            self.nodes[node_id] = []
    
    def add_neighbors(self, ego_id, alter_id):
        """Take ego_id and alter_id and add each to the other's list of neighbors."""
        
        # Make sure nodes are added to the network
        self.add_node(ego_id)
        self.add_node(alter_id)  
        
        # Add the neighbors if they are not duplicates        
        if alter_id not in self.nodes[ego_id]:
            self.nodes[ego_id].append(alter_id)
        if ego_id not in self.nodes[alter_id]:
            self.nodes[alter_id].append(ego_id)
         
    def get_node_ids(self):
        """Return the network node ids one at a time."""        
        for i in self.nodes:
            yield i
    
    def get_node_neighbors(self, node_id):
        """Take node_id and return its neighbors one at a time."""        
        for i in self.nodes[node_id]:
            yield i

    def __str__(self):
        """Return a string with the number of nodes in the network."""
        return "Undirected network with " + str(len(self.nodes)) + " nodes"
        

### Problem 3: Create an instance of the network class

Read the data from the file `ca-GrQc.txt` in the `data` repository (use the same relative path as in the previous problem sets). Save the data in an instance of the `UndirectedNetwork` class you created. Call `print()` on the instance.


In [2]:
net = UndirectedNetwork()

# Open the file in a with-block so it is closed after reading
with open('../data/ca-GrQc.txt', 'r') as f:
    for line in f:
        # Ignore the comment lines at the beginning of the file
        if line[0] != '#':
            strlst = line.strip().split('\t')
            if strlst[0] != strlst[1]:  # Remove self-loops
                net.add_neighbors(int(strlst[0]), int(strlst[1]))

print(net)


Undirected network with 5241 nodes
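
The parsing logic can be checked without the data file. The sketch below is only an illustration under assumptions: the sample lines are hypothetical stand-ins for the layout the loop expects (`#` comment lines followed by tab-separated pairs of node ids), and a plain dictionary `neighbors` stands in for the class.

```python
import io

# Hypothetical sample in the layout assumed above:
# '#' comment lines followed by tab-separated pairs of node ids.
sample = io.StringIO(
    "# FromNodeId\tToNodeId\n"
    "1\t2\n"
    "2\t1\n"
    "2\t3\n"
    "3\t2\n"
    "4\t4\n"
)

neighbors = {}
for line in sample:
    if line.startswith('#'):
        continue                      # skip comment lines
    ego, alter = (int(x) for x in line.split())
    if ego == alter:
        continue                      # skip self-loops
    # Record the edge in both directions, avoiding duplicates
    for a, b in ((ego, alter), (alter, ego)):
        neighbors.setdefault(a, [])
        if b not in neighbors[a]:
            neighbors[a].append(b)

print(neighbors)  # {1: [2], 2: [1, 3], 3: [2]}
```

Note that node 4 never enters the dictionary: a node whose only edge is a self-loop is dropped by this kind of filter.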


---
### Problem 4: Class for SI model


Create a class called `SIModel` that has the following data attributes:

* `network` — an instance of class UndirectedNetwork taken at instantiation
* `susceptible_nodes` — a list of ids for nodes that are not yet infected; initially includes all nodes from `network`
* `infected_nodes` — a list of ids for nodes that are infected; initially empty
* `num_infected` — keeps track of the number of infected nodes; initially `0`

and the following methods:

* `initialize` — takes an integer `n` to randomly select `n` number of nodes and infect them; then prints the number of infected nodes
* `update` — iterates over the susceptible nodes in random order and infects those who have at least one infected neighbor; then prints the number of infected nodes. The process should be asynchronous, in the sense that a node immediately becomes infected and will then infect any susceptible neighbors who are yet to be iterated over.
* `run` — repeats `update` until all nodes are infected or until `update` has been run 30 times

Calling the `print()` function on a `SIModel` object should print `num_infected`.
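
The asynchronous rule described for `update` can be contrasted with a synchronous one on a small path network. This is a rough sketch under assumptions: the toy network `path`, the two helper functions, and the fixed visiting orders are hypothetical and only serve to show how the order of iteration matters.

```python
# Path network 1 - 2 - 3 - 4 (hypothetical toy data); node 1 starts infected.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}

def asynchronous_update(network, infected, order):
    """Visit nodes in the given order; new infections count immediately."""
    infected = set(infected)
    for node in order:
        if node not in infected and any(n in infected for n in network[node]):
            infected.add(node)
    return infected

def synchronous_update(network, infected, order):
    """Infect nodes based only on who was infected at the start of the period."""
    start = set(infected)
    infected = set(infected)
    for node in order:
        if node not in start and any(n in start for n in network[node]):
            infected.add(node)
    return infected

# Fixed visiting orders for illustration; the model itself shuffles the order
print(sorted(asynchronous_update(path, {1}, [2, 3, 4])))  # [1, 2, 3, 4]
print(sorted(synchronous_update(path, {1}, [2, 3, 4])))   # [1, 2]
print(sorted(asynchronous_update(path, {1}, [4, 3, 2])))  # [1, 2]
```

With asynchronous updating, a single call can carry the infection several steps along the path if the nodes happen to be visited in a favorable order, which is why the random order affects the result.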

#### Hints

In this problem you will need to use functions from the `random` module. You can read more about it [here](https://docs.python.org/3/library/random.html).

Make sure the methods update all the relevant data attributes when called.
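
As a brief illustration of the `random` functions that are likely to be useful here, the sketch below uses `random.sample`, `random.shuffle`, and `random.seed`. The node ids and the seed value are arbitrary assumptions chosen for the example.

```python
import random

node_ids = list(range(10))   # hypothetical node ids

random.seed(470)             # arbitrary seed, chosen only for repeatability
first_draw = random.sample(node_ids, 3)    # 3 distinct ids, drawn without replacement

random.seed(470)
second_draw = random.sample(node_ids, 3)   # same seed, so the same draw

order = node_ids[:]          # copy first, because shuffle works in place
random.shuffle(order)

print(first_draw == second_draw)   # True
print(sorted(order) == node_ids)   # True
```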

In [3]:
# Typically, we should import modules before any code starts
# but we will accept import here since it only comes up for
# this problem and after
import random as ran

class SIModel(object):
    """A class used to simulate susceptible-infected contagion on a network."""
    
    def __init__(self, net):
        """Assume net is an object of type UndirectedNetwork.
        Create a new SI model using net.
        """        
        self.network = net
        self.susceptible_nodes = [i for i in net.get_node_ids()]
        self.infected_nodes = []
        self.num_infected = 0
    
    
    def initialize(self, n):
        """Assume n is an integer.
        Randomly select n nodes and infect them.
        Print the number of infected nodes.
        """        
        patients0 = ran.sample(self.susceptible_nodes, n)
        self.infected_nodes.extend(patients0)
        for i in patients0:
            self.susceptible_nodes.remove(i)
        self.num_infected = n
        print(self)
        
        
    def update(self):
        """Iterate over all susceptible nodes in random order and 
        infect those who have at least one infected neighbor.
        Implement asynchronous updating.
        Print the number of infected nodes.
        """        
        # Remember not to iterate over a list you are changing
        temp = self.susceptible_nodes[:]
        ran.shuffle(temp)
        for i in temp:
            
            # Get an iterator over i's neighbors
            nbrs = self.network.get_node_neighbors(i)
            
            # Infect if at least one neighbor is infected
            # Here, I am summing bools, where False = 0, True = 1
            if sum([(j in self.infected_nodes) for j in nbrs]) > 0:
                self.infected_nodes.append(i)
                self.susceptible_nodes.remove(i)
                self.num_infected += 1
        print(self)
        
        
    def run(self):
        """Run update and print the number of infected nodes 
        until all nodes are infected or until update has been run 30 times.
        """        
        p = 0
        # While there are any susceptible nodes 
        # and for not more than 30 iterations
        while self.susceptible_nodes and p < 30:
            self.update()
            p += 1
    
    
    def __str__(self):
        """Return the number of infected nodes as a string."""
        return str(self.num_infected)
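
The neighbor check in `update` tests membership in a list, which scans the list on every lookup. One possible alternative, sketched here under the assumption that keeping an extra set of infected ids is acceptable (the problem statement itself asks for lists), combines `any()` with a set. All values below are hypothetical.

```python
infected_list = [3, 7, 9]            # hypothetical infected ids, stored as in the class
infected_set = set(infected_list)    # assumed extra structure for fast lookups
neighbors = [1, 7, 8]                # hypothetical neighbors of one susceptible node

# Check as written in update(): builds the full list of bools, then sums it
by_sum = sum([(j in infected_list) for j in neighbors]) > 0

# Alternative: any() stops at the first True; set lookups take constant time on average
by_any = any(j in infected_set for j in neighbors)

print(by_sum, by_any)  # True True
```

Both expressions give the same answer; the difference only shows up in running time on large networks.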
    

---
### Problem 5: Run the model

Run `SIModel` using the network from Problem 3. You should initialize the simulation with 3 seeds (the number of "Patients 0").


In [4]:
# The output will vary because the simulation is initialized 
# with a random process. For replication purposes, we will
# fix the random seed.
ran.seed(2)
si = SIModel(net) 
si.initialize(3)
si.run()


3
416
2616
3818
4118
4157
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
4158
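
The count stops growing after a few periods and `run` continues until the 30-period cap, which is what one would expect if the remaining nodes are not connected to any seed. A hedged way to check such a ceiling is to compute the set of nodes reachable from the seeds with a breadth-first search. The function `reachable_from` and the toy network below are hypothetical and are not part of the required solution.

```python
from collections import deque

def reachable_from(network, seeds):
    """Return the set of nodes connected to at least one seed (breadth-first search)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbor in network[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical toy network with three components: {1, 2, 3}, {4, 5}, and {6}
toy_network = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}

print(len(reachable_from(toy_network, [1])))     # 3
print(len(reachable_from(toy_network, [1, 4])))  # 5
```

In an SI model with transmission probability 1, the number of infected nodes can never exceed the size of this reachable set.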


---

### Evaluation

| Problem | Mark     | Comment   
|:-------:|:--------:|:----------------------
| 1       |   /2    |   
| 2       |   /4    |      
| 3       |   /1    | 
| 4       |   /5    | 
| 5       |   /1    |
| Legibility   |   /2    | 
| Modularity   |   /2    | 
| Efficiency   |   /3    | 
|**Total**|**/20**  | 
