### Impact of community structure on dynamics

In this exercise, we'll build on the previous exercise and keep probing how network structure affects spreading dynamics. We'll use the same SIR spreading dynamics, again picking random seed nodes and seeing what happens. The difference is that the network is now artificial, with a very pronounced community structure, and is generated using the stochastic block model.

In the stochastic block model, nodes are divided into blocks so that the connection probabilities between nodes in the same block and nodes in different blocks are generally different. For a strong community structure, the within-block probabilities should be fairly large, while those between blocks should be smaller. 
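To make the definition concrete, here is a minimal hand-rolled sketch of the sampling rule. It is only an illustration, not how networkx implements the model: the helper name `sample_sbm_edges` and the toy probabilities (1.0 within blocks, 0.0 between blocks) are assumptions chosen so that the outcome is easy to predict.

```python
import random
from itertools import combinations

def sample_sbm_edges(block_sizes, p, rng):
    """Hypothetical helper: draw each node pair (u, v) independently
    with probability p[block of u][block of v]."""
    # block_of[node] is the index of the block that the node belongs to
    block_of = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    edges = []
    for u, v in combinations(range(len(block_of)), 2):
        if rng.random() < p[block_of[u]][block_of[v]]:
            edges.append((u, v))
    return block_of, edges

# Toy case: two blocks of 4 nodes, certain links inside blocks, none between them
rng = random.Random(0)
block_of, edges = sample_sbm_edges([4, 4], [[1.0, 0.0], [0.0, 1.0]], rng)
print(len(edges), "edges, all within blocks:",
      all(block_of[u] == block_of[v] for u, v in edges))
```

With these extreme probabilities, every within-block pair is linked and no between-block pair is, so the result is two disconnected cliques. Intermediate values interpolate between this and a fully random graph.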


In [None]:
import networkx as nx
import matplotlib.pyplot as plt
import random
from math import log2

### a. Generating a stochastic block model network

We'll first generate a network with five equal-sized blocks (100 nodes each) so that the connection probability of nodes within the blocks is much larger than between the blocks ($p_{in}=0.15$ vs $p_{out}=0.00075$). Complete the block of code below to generate and visualize the network, then answer the question in the MyCourses quiz.
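As a rough sanity check for the generated network, we can compute how many links to expect from these parameters. This is a back-of-the-envelope sketch: the variable names are ours, and it only uses the fact that each node pair is linked independently with the given probability.

```python
from math import comb

n_blocks = 5
block_size = 100
p_in = 0.15
p_out = 0.00075

# Number of node pairs inside the same block, summed over all blocks
within_pairs = n_blocks * comb(block_size, 2)
# Number of node pairs whose endpoints lie in two different blocks
between_pairs = comb(n_blocks, 2) * block_size ** 2

# Expected link counts: number of pairs times the connection probability
expected_within = within_pairs * p_in
expected_between = between_pairs * p_out

print(f"expected within-block links:  {expected_within:.1f}")
print(f"expected between-block links: {expected_between:.1f}")
```

The expectation is about 3712.5 links inside the blocks and about 75 links between them, so the number of edges in the generated network should be somewhere near the sum of the two.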

In [None]:
block_size=100
p_in=0.15
p_out=0.00075

# TODO: using nx.stochastic_block_model, generate a network that has 5 blocks of size block_size, so that the connection probabilities
# within the blocks are p_in and between blocks p_out (same probabilities for all blocks and pairs of blocks)
# Check the help for nx.stochastic_block_model at https://networkx.org/documentation/stable/reference/generated/networkx.generators.community.stochastic_block_model.html
# note that the parameter p is a list of lists and it is equivalent to an array representing a 5x5 matrix, with values p_in at the diagonal and
# p_out elsewhere

# YOUR CODE HERE
p=[[p_in if i==j else p_out for i in range(5)] for j in range(5)]
network=nx.stochastic_block_model([block_size]*5,p)

fig,ax=plt.subplots(figsize=(8,8))

pos=nx.spring_layout(network) # get the coordinates for reuse
nx.draw(network,ax=ax,pos=pos,node_size=20)

In [None]:
filename="./SBM.pdf"
fig.savefig(filename)

### b. SIR spreading on the stochastic block model network

Let's next recycle the SIR code from the previous exercise and run viral spreading starting from random seed nodes. No need to touch the code, just run the visualization code block several times and observe what is happening. Then answer the MyCourses quiz question. 
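If you want to convince yourself of how the dynamics behave before running them on the large network, the two limiting cases are easy to check on a small path graph. The function `sir_sketch` below is a condensed stand-in written for this illustration (the name is an assumption); it follows the same rules as the exercise code, with nodes infectious for one time step only.

```python
import random
import networkx as nx

def sir_sketch(G, seed_node, transmission_prob):
    """Condensed SIR illustration: returns the set of nodes that were ever infected."""
    susceptible = set(G.nodes()) - {seed_node}
    infected = {seed_node}
    recovered = set()
    while infected:
        # every infectious node tries each susceptible neighbour independently
        new_infected = {
            neighbor
            for node in infected
            for neighbor in G.neighbors(node)
            if neighbor in susceptible and random.random() < transmission_prob
        }
        recovered |= infected
        susceptible -= new_infected
        infected = new_infected
    return recovered

path = nx.path_graph(6)
always = sir_sketch(path, 0, 1.0)  # every transmission attempt succeeds
never = sir_sketch(path, 0, 0.0)   # no transmission attempt succeeds
print(sorted(always), sorted(never))
```

With transmission probability 1 the outbreak reaches every node connected to the seed, and with probability 0 only the seed is ever infected. The interesting behaviour lies between these extremes.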

In [None]:
# NO NEED TO MODIFY THIS

def sir_spread(G, seed_node, transmission_prob):
    """
    Simulate the SIR spreading process on a NetworkX graph. First infect seed_node.
    At each time step, every infectious node infects each of its susceptible
    neighbours independently with probability transmission_prob. Nodes are
    infectious for one time step and then turn recovered. The process stops
    when there are no infectious nodes left.
    
    Parameters:
    - G: NetworkX Graph object
    - seed_node: The initial node from which the infection starts
    - transmission_prob: Probability of transmission across a link
    
    Returns:
    - The set of nodes that were infected during the process
    """
    # Initialize node states
    susceptible = set(G.nodes())
    infected = set()
    recovered = set()
    
    # Start the infection from the seed node
    if seed_node not in G.nodes():
        raise ValueError("Seed node must be in the graph.")
    
    susceptible.remove(seed_node)
    infected.add(seed_node)
        
    while infected:
        new_infected = set()
        
        # For each infected node, attempt to spread the infection
        for node in infected:
            neighbors = set(G.neighbors(node))
            susceptible_neighbors = neighbors.intersection(susceptible)
            
            for neighbor in susceptible_neighbors:
                if random.random() < transmission_prob:
                    new_infected.add(neighbor)
        
        # Update the states of the nodes
        recovered.update(infected)
        susceptible.difference_update(new_infected)
        infected = new_infected
    
    return recovered

In [None]:
# This piece of code runs one outbreak and visualizes the outcome on the network
# RUN THIS SEVERAL TIMES TO GET AN IDEA OF THE VARIETY OF OUTCOMES (and to answer the MyCourses question)

transmission_probability=0.1 # the probability of transmission from an infected to a susceptible node

seed=random.choice(list(network.nodes()))

outbreak=sir_spread(network,seed,transmission_probability) 

# different node colours for those who got infected and those who didn't
# different node sizes for the seed and the rest

node_colors=[]
node_sizes=[]

for node in network:
    if node in outbreak:
        node_colors.append('red')
    else:
        node_colors.append('gray')
    if node==seed:
        node_sizes.append(100)
    else:
        node_sizes.append(30)

fig, ax = plt.subplots(figsize=(8,6))

# draw the network: gray nodes weren't infected but red nodes were, and the infection started from the large red node.

nx.draw(network, pos=pos,ax=ax, 
        node_size=node_sizes, 
        node_color=node_colors, 
        edgecolors='black', linewidths=0.8, 
        edge_color='lightgray', width=1, 
        alpha=0.8
       )


In [None]:
filename="./SBM_outbreak.pdf"
fig.savefig(filename)

### c. Detecting the communities with label propagation

The five communities are clearly visible in the above visualizations, but we don't yet know which node ids belong to which community. As we'll soon need this information, let's run the label propagation algorithm of networkx to get the communities (see https://networkx.org/documentation). Do this and check that you've identified the communities correctly by displaying the network with nodes coloured according to their community membership.
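To see what kind of output to expect, the call can first be tried on a toy graph. The sketch below assumes a recent networkx where `nx.community.label_propagation_communities` is available. The toy graph (two 5-node cliques joined by one link) and the name `node_to_community` are made up for this illustration.

```python
import networkx as nx

# Toy graph: two 5-node cliques (nodes 0-4 and 5-9) joined by a single link
toy = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))
toy.add_edge(0, 5)

# The function returns an iterable of node sets; turn it into a list of sets
toy_communities = [set(c) for c in nx.community.label_propagation_communities(toy)]

# Invert the result into a node -> community index lookup
node_to_community = {
    node: index
    for index, community in enumerate(toy_communities)
    for node in community
}

print(toy_communities)
```

Whatever the algorithm finds, the result is a partition: every node ends up in exactly one community, which is what makes the node-to-community lookup well defined.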

In [None]:

## TODO: use networkx subpackage community (usage: nx.community.somefunction) to get label propagation communities

communities=None # this variable should contain a list of lists, where each list contains nodes of one community

# YOUR CODE HERE
communities=list(nx.community.label_propagation_communities(network))

## TODO: the variable communities should now contain lists of nodes, with each list corresponding to one community.
## TODO: now create a dictionary node_color_dict whose keys are nodes and values are colours, so that each community
## TODO: has its own color. You can use the list of colours below. 

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

node_color_dict={}

# YOUR CODE HERE
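for i, community in enumerate(communities):
    for node in community:
        # cycle through the palette in case label propagation finds more than five communities
        node_color_dict[node]=colors[i % len(colors)]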

# Then we'll have to transform the dictionary into a list because nx.draw is weird and wants a list. 

node_color_list=[] 
for node in network:
    node_color_list.append(node_color_dict[node])

fig, ax = plt.subplots(figsize=(8,6))

# draw the network: nodes in each community should be drawn in the colour of that community. 

nx.draw(network, pos=pos,ax=ax, 
        node_size=30, 
        node_color=node_color_list, 
        edgecolors='black', linewidths=0.8, 
        edge_color='lightgray', width=1, 
        alpha=0.8
       )

    

In [None]:
filename="./SBM_communities.pdf"
fig.savefig(filename)

### d. Entropy of outbreaks

Let's now formalize the result of b. We want to quantitatively check whether the outbreaks tend to localize within communities or not. To this end, we'll again run the SIR spreading process 1000 times, and compute the following entropy measure for each run: $$S=-\sum_{i=1}^5 p(i)\log_2 p(i),$$ where we define $p(i)$ as the probability that a randomly picked infected node belongs to community $i$. As a reference, if the outbreak is uniformly spread across the network without following the community structure, then $p(i)=1/5 \ \forall i$, and the reference entropy is $S_{ref}=\log_2 5\approx 2.32$. At the other extreme, an outbreak confined to a single community has $S=0$.
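To get a feeling for the scale of $S$, here is a small worked sketch that evaluates the formula for three made-up outbreaks. The helper name `outbreak_entropy` and the infected counts are assumptions for illustration only, not results from the simulation.

```python
from math import log2

def outbreak_entropy(infected_counts):
    """Entropy (in bits) of the community distribution of infected nodes.

    infected_counts[i] is the number of infected nodes in community i.
    Terms with p(i) = 0 are skipped, i.e. treated as the limit 0 * log(0) = 0.
    """
    total = sum(infected_counts)
    S = 0.0
    for count in infected_counts:
        if count > 0:
            p_i = count / total
            S -= p_i * log2(p_i)
    return S

uniform = outbreak_entropy([20, 20, 20, 20, 20])  # spread evenly over five communities
single = outbreak_entropy([50, 0, 0, 0, 0])       # confined to one community
two = outbreak_entropy([10, 10, 0, 0, 0])         # split evenly between two communities

print(uniform, single, two)
```

The even spread gives a value close to $\log_2 5\approx 2.32$, the confined outbreak gives 0, and the even split between two communities gives exactly 1 bit. Values in the histogram can be read against these landmarks.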

Modify the block of code below to compute the entropies for 1000 runs (more precisely, those runs out of 1000 runs where the outbreak doesn't cease immediately). Then plot a histogram that shows the entropies and answer the MyCourses questions.


In [None]:
community_id={} 
entropy_values=[]

for run in range(1000): # named 'run' so that it doesn't clash with the index i used in the inner loops

    transmission_probability=0.1 # the probability of transmission from an infected to a susceptible node
    seed=random.choice(list(network.nodes()))
    outbreak=sir_spread(network,seed,transmission_probability) 
    outbreak_size=len(outbreak)

    # dictionary for how many infected nodes there are in each community

    community_infected={}

    ## TODO: write a loop that iterates over communities in the variable communities (from the above block)
    ## TODO: and fills in the dictionary community_infected={community index : number of infected}

    # YOUR CODE HERE
    for i in range(len(communities)):
        community_infected[i]=0
        for node in communities[i]:
            if node in outbreak:
                community_infected[i]+=1

    if outbreak_size>1: # We'll censor out those outbreaks that do not take off at all with trivially zero entropy

        S=0
    
        for i in community_infected:

            ## TODO: compute the probability p_i that a randomly picked infected node is in community i
            ## TODO: and use it to compute this run's entropy S
            ## TODO: note: if p(i)=0, one uses in the entropy sum the limiting value p(i)log(p(i))=0 (log of 0 doesn't exist)
            ## TODO: You can use `log2` function to compute the base-2 logarithm

            # YOUR CODE HERE
            p_i=community_infected[i]/outbreak_size
            if p_i>0:
                S+=-p_i*log2(p_i)

        entropy_values.append(S)

fig, ax = plt.subplots(figsize=(8,6))

ax.hist(entropy_values,bins=20);

ymax=ax.get_ylim()[1]

max_entropy=log2(5) # entropy of a uniform spread over five communities, approx. 2.322
ax.plot([max_entropy, max_entropy],[0,ymax],'r--') # the maximum entropy line for reference
ax.text(max_entropy-0.1,int(ymax/2),'Value for uniform spread',color='red',rotation=90)

ax.set_xlabel('Entropy S')
ax.set_ylabel('Runs')
        

In [None]:
fname='SBM_entropy.pdf'
fig.savefig('./'+fname)