##### MY470 Computer Programming

### Problem Set 3, AT 2023

#### \*\*\* Due 12:00 noon on Monday, November 6 \*\*\*

---
### Simulating contagion on a network

In this problem set, you are asked to write a program that simulates the contagion of disease or information on an empirical network. In academic research, contagion models have been used to study the properties of different types of networks. In practice, contagion models are extremely valuable to predict the spread of contagious disease such as the flu or STDs.

We will use the simplest of contagion models — the SI model. SI stands for susceptible and infected. The SI model assumes that once a susceptible individual is infected, there is no recovery. This is a good representation for the spread of non-curable but non-deadly infectious disease such as Herpes simplex or for the spread of new technologies and knowledge.

In the SI model we will implement, we will start with a population where everyone is susceptible. Then we will randomly pick a small number of individuals ("Patients 0") and infect them. In the next period, all the contacts of the infected individuals will get infected (thus, we will assume that the probability of transmission is 1). And so on. We will repeat the process until everyone in the network is infected or until a certain number of periods have passed (the latter is necessary for networks that are not connected and have separate components; in such situations, it is possible that the contagion never reaches some individuals). 

We will run the model on a real network. For simplicity, we will reuse the co-authorship network we analyzed in Problem Set 1. So think about the contagion in this case as learning about a new research technique, empirical finding, or theoretical result.

#### Hints

Use docstrings to describe your methods. We will subtract points from your mark if you do not use appropriate description of your code.


### Problem 1: Working in a team

Work with your assigned partner to complete and submit the problem set. You can meet in person to discuss the division of labor but we expect you to use GitHub to communicate when coding your part and merging your contributions. We will  review the Issues, Pull request, and Wiki stats for your repository. You will get the full points for this problem if we find sufficient evidence that you have made use of GitHub as a collaboration tool. 

#### Hints

One reasonable way to divide the work is to assign Problems 2 and 3 to Student A and Problems 4 and 5 to Student B.


### Problem 2: Class for network

Create a class called `UndirectedNetwork`. The class should have the following data attributes:

* `nodes` — a dictionary where the node id is a key and the value is a list with the ids of the node's neighbors (coauthors for our data); initially empty

and the following methods:

* `add_node` — takes `node_id` and initializes it as a key to `nodes` if it is not already there
* `add_neighbors` — takes two arguments: `ego_id` and `alter_id` and adds `alter_id` to `ego_id`'s list of neighbors and `ego_id` to `alter_id`'s list of neighbors, if they are not already there
* `get_node_ids` — generator method that gives the ids of the nodes in the network
* `get_node_neighbors` — generator method that takes `node_id` and gives its neighbors

Calling the `print()` function on a `UndirectedNetwork` object should print the number of nodes in the network, e.g. "Undirected network with 455 nodes".


In [3]:
# Enter your answer to Problem 2 below. 
class UndirectedNetwork():
    """Represents an undirected network of nodes and neighbours.

    This class provides methods to add nodes and links between nodes in
    an undirected network, and generates ids and its respective neighbour ids.

    Attributes:
    - nodes (dict): A dictionary where keys are node IDs, and the value is a list of
      neighbor node IDs.
    """
    def __init__(self, nodes = None):
        self.nodes = {}

    def add_node(self, node_id):
        """Adds new node as a dictionary key to nodes 
        if it doesn't already exist.
        """
        if node_id not in self.nodes:
            self.nodes[node_id] = []

    def add_neighbours(self, ego_id, alter_id):
        """Appends each node's ID to the other's list of neighbors 
        if the nodes do not already exist in the network. alter_id is added
        to ego_id's list of neighbours and vice versa. 
        """
        if ego_id not in self.nodes:
            self.add_node(ego_id)
        if alter_id not in self.nodes:
            self.add_node(alter_id)

        self.nodes[ego_id].append(alter_id)
        self.nodes[alter_id].append(ego_id)

    def get_node_ids(self):
        """Generates ids of the nodes in the network.
        """
        for ids in self.nodes:
            yield ids

    def get_node_neighbours(self, node_id):
        """Generates neighbour ids of each node.
        """
        for neighbor_id in self.nodes[node_id]:
            yield neighbor_id
        
    def __str__(self):
        return f'Undirected network with {len(list(self.get_node_ids()))} nodes'

### Problem 3: Create an instance of the network class

Read the data from the file `ca-GrQc.txt` in the `data` repository (use the same relative path as in the previous problem sets). Save the data in an instance of the UndirectedNetwork class you created. Call print on the instance.


In [4]:
# Enter your answer to Problem 3 below. 
# Create an instance of the UndirectedNetwork class
network = UndirectedNetwork()
for line in open('../data/ca-GrQc.txt', 'r'):
    # Ignores comment lines at the beginning of the file
    if line[0] != '#':    
        line = line.strip().split('\t')
        network.add_node(line[0])
        network.add_neighbours(line[0], line[1])

print(network)

Undirected network with 5242 nodes


---
### Problem 4: Class for SI model


Create a class called `SIModel` that has the following data attributes:

* `network` — an instance of class UndirectedNetwork taken at instantiation
* `susceptible_nodes` — a list of ids for nodes that are not yet infected; initially includes all nodes from `network`
* `infected_nodes` — a list of ids for nodes that are infected; initially empty
* `num_infected` — keeps track of the number of infected nodes; initially `0`

and the following methods:

* `initialize` — takes an integer `n` to randomly select `n` number of nodes and infect them; then prints the number of infected nodes
* `update` — iterates over the susceptible nodes in random order and infects those who have at least one infected neighbor; then prints the number of infected nodes. The process should be asynchronous, in the sense that a node immediately becomes infected and will then infect any susceptible neighbors who are yet to be iterated over.
* `run` — repeats `update` until all nodes are infected or until `update` has been run 30 times

Calling the `print()` function on a `SIModel` object should print `num_infected`.

#### Hints

In this problem you will need to use functions from the `random` module. You can read more about it [here](https://docs.python.org/3/library/random.html).

Make sure the methods update all the relevant data attributes when called.

In [14]:
# Enter your answer to Problem 4 below. 
import random

class SIModel():
    """  This class creates an SI model with the data attributes:
    network, susceptible nodes, infected noded and number of infected nodes.
    It takes an instance of the class UndirectedNetwork as an argument and uses the nodes 
    from that instance for the simulation of the infection process.
    It uses methods to randomly infect the first nodes, then infect the neighbours of such 
    nodes and then continues this process for 30 times or until every node is infected.
    """
    def __init__(self, network):
        self.network = network         
        self.susceptible_nodes = list(self.network.get_node_ids())
        self.infected_nodes = []
        self.num_infected = 0


    def initialize(self, n):
        """ This method randomly infects n nodes. N is the number of
        nodes that should be infected and is passed as an argument in the method.
        Then it prints the number of infected nodes. 
        """
        nodes = random.sample(self.susceptible_nodes, n)
        for id in nodes: 
            self.susceptible_nodes.remove(id)
            self.infected_nodes.append(id)
            self.num_infected = len(self.infected_nodes)
        print("Number of infected nodes after initialization:" + " " + str(self.num_infected))
    
    def update(self):
        """ This method randomly iterates over the susceptible nodes and
        infects all nodes who have at least on infected neighbor. Then it
        prints the number of infected nodes. 
        """
        random.shuffle(self.susceptible_nodes)
        infected_set = set(self.infected_nodes)
        for node_id in self.susceptible_nodes:
            neighbours = list(self.network.get_node_neighbours(node_id))
            if any(neighbour in infected_set for neighbour in neighbours):
                self.susceptible_nodes.remove(node_id)
                self.infected_nodes.append(node_id)
        self.num_infected = len(self.infected_nodes)
        print("Number of infected nodes:", self.num_infected)
    
    def run(self):
        """ This method repeats the update method until every node 
        is infected or until the number of iteration is 30. 
        """
        num_iterations = 0
        while len(self.susceptible_nodes) > 0 and num_iterations < 30:
            self.update() 
            num_iterations += 1 
        print("Infection process completed.")
    
    def str(self):
        return "Number of infected nodes after infection process has ended:" + " " + str(self.num_infected)


---
### Problem 5: Run the model

Run `SIModel` using the network from Problem 2. You should initialize the simulation with 3 seeds (the number of "patients 0").


In [13]:
# Enter your answer to Problem 5 below. 

simodel = SIModel(network)
simodel.initialize(3)
simodel.run()
print(simodel.str())

Number of infected nodes after initialization: 3
Number of infected nodes: 13
Number of infected nodes: 67
Number of infected nodes: 337
Number of infected nodes: 1094
Number of infected nodes: 2132
Number of infected nodes: 3047
Number of infected nodes: 3634
Number of infected nodes: 3938
Number of infected nodes: 4070
Number of infected nodes: 4139
Number of infected nodes: 4159
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Number of infected nodes: 4163
Infection process complete

---

### Evaluation

| Problem | Mark     | Comment   
|:-------:|:--------:|:----------------------
| 1       |   /2    |   
| 2       |   /4    |      
| 3       |   /1    | 
| 4       |   /5    | 
| 5       |   /1    |
| Legibility   |   /2    | 
| Modularity   |   /2    | 
| Efficiency   |   /3    | 
|**Total**|**/20**  | 
