### K-shells and spreading

In this exercise, we will investigate the k-shells of a social network: the largest connected component of a network of mobile phone calls between students in Copenhagen (source: https://www.nature.com/articles/s41597-019-0325-x). We'll first visualize the network and its k-shells, and then run a spreading process on top of it; think of this as, say, word-of-mouth viral marketing. We'll choose different seed nodes for the process and see how many other nodes it reaches, depending on the k-shell index of the seed.

In [None]:
import networkx as nx
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.ticker as ticker
import random
import seaborn as sns

## Data
Let's first load the data into a networkx graph. As usual, specify your path below.

In [None]:
# Select data directory
import os
if os.path.isdir('/coursedata'):
    course_data_dir = '/coursedata'
elif os.path.isdir('../data'):
    course_data_dir = '../data'
else:
    # Specify course_data_dir on your machine
    course_data_dir = '.'

print('The data directory is %s' % course_data_dir)

network_fname = os.path.join(course_data_dir, 'CH_call_largest_edges.csv')

### a. Visualization of k-shell indices
Next, read in the edge list and compute the k-shell indices of nodes. Use the template to visualize the network, with nodes coloured according to their k-shell index. 
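
To get a feel for what a k-shell index means before using `nx.core_number`, here is a minimal pure-Python sketch of the usual peeling idea: repeatedly remove nodes whose remaining degree is at most k, and give index k to the nodes removed at that stage. The function name `k_shell_indices` and the toy adjacency dictionary are made up for illustration, the graph is assumed to be undirected without self-loops, and this is not necessarily how networkX computes it internally.

```python
def k_shell_indices(adjacency):
    """Return {node: k-shell index} for a graph given as {node: set of neighbours}."""
    # Work on a copy so that the caller's adjacency dictionary is left untouched
    remaining = {node: set(nbrs) for node, nbrs in adjacency.items()}
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k are peeled off into shell k
        to_remove = [n for n, nbrs in remaining.items() if len(nbrs) <= k]
        if not to_remove:
            k += 1  # nothing left to peel at this level, move to the next shell
            continue
        for n in to_remove:
            shell[n] = k
            for nbr in list(remaining[n]):
                remaining[nbr].discard(n)  # removing n lowers its neighbours' degrees
            del remaining[n]
    return shell


# Toy graph (an assumption for illustration): a triangle 0-1-2 with a pendant node 3 attached to node 0
toy_adjacency = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
toy_shells = k_shell_indices(toy_adjacency)
print(toy_shells)
```

In the toy graph, the pendant node 3 is peeled off first and lands in shell 1, while the three triangle nodes end up in shell 2. This simple version rescans all nodes on every pass, so it is meant for understanding rather than for large networks.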

In [None]:
# NO NEED TO MODIFY THIS CELL. Returns one color per value in value_list (values scaled between the list's min and max).
def get_colors(value_list, cmap_name='CMRmap'):
    # Normalize the values to [0, 1] using the list's minimum and maximum
    norm = mcolors.Normalize(vmin=min(value_list), vmax=max(value_list))
    
    # Get the colormap
    cmap = plt.get_cmap(cmap_name)
    
    # Map each normalized value to a color
    colors = [cmap(norm(x)) for x in value_list]
    
    return colors

In [None]:
# Read the edge list into an undirected graph (node labels are parsed as integers)
network = nx.read_edgelist(network_fname, delimiter=',', nodetype=int)

## TODO: using the ready-made networkX function for k-shells (nx.core_number), create a list node_shells of nodes' k-shell indices

node_shells=None

# YOUR CODE HERE
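core_numbers = nx.core_number(network)  # dict: node -> k-shell index
node_shells = [core_numbers[node] for node in network.nodes()]  # list in the same order as network.nodes()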


# turn node shell indices directly into colours, as some versions of matplotlib cannot handle colormaps directly in nx.draw...

node_colors=get_colors(node_shells)

figure_title = 'Copenhagen student call network'
fig_vis, ax_vis = plt.subplots(figsize=(8,6))
pos=nx.spring_layout(network) # get the coordinates for reuse
nx.draw(network, pos=pos,ax=ax_vis, 
        node_size=30, 
        node_color=node_colors, # Nodes colored according to shell indices
        edgecolors='black', linewidths=0.8, # Controls the color and width of the node borders
        edge_color='gray', width=1, # Controls the color and width of the edges
        alpha=0.8
)
ax_vis.set_title(figure_title) # Sets the title of the figure
sm = plt.cm.ScalarMappable(cmap=plt.cm.CMRmap, norm=plt.Normalize(vmin=min(node_shells), vmax=max(node_shells)))
sm.set_array([])
cbar=plt.colorbar(sm, ax=ax_vis,label="k-shell Index")
cbar.ax.yaxis.set_major_locator(ticker.MaxNLocator(integer=True))

In [None]:
figure_fname = 'Facebook_call_friendships_kshells.pdf'
# This saves the file in the same directory as this notebook.
# To save it in a different directory, specify the path:
# figure_fname = 'some_path/Facebook_call_friendships_kshells.pdf'
fig_vis.savefig(figure_fname)

### b. Viral spreading from different seed nodes
Next, we'll do some simulations! We will run a so-called SIR (Susceptible-Infectious-Recovered) process, where initially every node is Susceptible except for a single seed node that is Infectious. Then, on every time step, each Infectious node infects each of its Susceptible neighbours with a given transmission probability and then Recovers. Recovered nodes are considered immune and cannot become Infectious again.
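
Because every Infectious neighbour makes its own independent transmission attempt in this model, a Susceptible node with several Infectious neighbours is more likely to be infected on a given time step than a node with only one. As a rough sketch (the helper name `infection_probability` is hypothetical and not part of the exercise template), the chance that at least one of m attempts with probability p succeeds is 1 - (1 - p)^m:

```python
def infection_probability(transmission_prob, n_infectious_neighbours):
    """Probability that a susceptible node is infected during one time step."""
    # The node escapes infection only if every independent attempt fails
    prob_all_fail = (1 - transmission_prob) ** n_infectious_neighbours
    return 1 - prob_all_fail


# Compare one, two and three infectious neighbours at transmission probability 0.4
for m in (1, 2, 3):
    print(m, round(infection_probability(0.4, m), 3))
```

This is one intuition for why seeds in densely connected parts of the network might start larger outbreaks: their neighbours tend to be exposed through several links at once.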

Use the code below to run the SIR spreading process from a random seed node and to visualize the outcome. Use a transmission probability of 0.4. Run it more than 10 times to get an idea of the variety of outcomes, and then answer the MyCourses question.


In [None]:
# NO NEED TO MODIFY THIS

def sir_spread(G, seed_node, transmission_prob):
    """
    Simulate the SIR spreading process on a NetworkX graph. The seed_node is
    infected first. On each time step, every infectious node infects each of its
    susceptible neighbours with probability transmission_prob and then recovers,
    so nodes are infectious for exactly one time step. The process stops when
    there are no infectious nodes left.
    
    Parameters:
    - G: NetworkX Graph object
    - seed_node: The initial node from which the infection starts
    - transmission_prob: Probability of transmission across a link
    
    Returns:
    - The set of nodes that were infected during the process
    """
    # Initialize node states
    susceptible = set(G.nodes())
    infected = set()
    recovered = set()
    
    # Start the infection from the seed node
    if seed_node not in G.nodes():
        raise ValueError("Seed node must be in the graph.")
    
    susceptible.remove(seed_node)
    infected.add(seed_node)
        
    while infected:
        new_infected = set()
        
        # For each infected node, attempt to spread the infection
        for node in infected:
            neighbors = set(G.neighbors(node))
            susceptible_neighbors = neighbors.intersection(susceptible)
            
            for neighbor in susceptible_neighbors:
                if random.random() < transmission_prob:
                    new_infected.add(neighbor)
        
        # Update the states of the nodes
        recovered.update(infected)
        susceptible.difference_update(new_infected)
        infected = new_infected
    
    return recovered

In [None]:
# This piece of code runs one outbreak and visualizes the outcome on the network
# RUN THIS MORE THAN 10 TIMES TO GET AN IDEA OF THE VARIETY OF OUTCOMES (and to answer the MyCourses question)

transmission_probability=0.4 # the probability of transmission from an infected to a susceptible node

## TODO: pick a random node from the network to the variable "seed"
seed=None

# YOUR CODE HERE
seed = random.choice(list(network.nodes()))

## TODO: run the function sir_spread with the correct parameters, and store the returned set in variable "outbreak"
outbreak=None
# YOUR CODE HERE
outbreak = sir_spread(network, seed, transmission_probability)

# different node colours for those who got infected and those who didn't
# different node sizes for the seed and the rest

node_colors=[]
node_sizes=[]

for node in network:
    if node in outbreak:
        node_colors.append('red')
    else:
        node_colors.append('gray')
    if node==seed:
        node_sizes.append(100)
    else:
        node_sizes.append(30)

fig, ax = plt.subplots(figsize=(8,6))

# draw the network: gray nodes weren't infected but red nodes were, and the infection started from the large red node.

nx.draw(network, pos=pos,ax=ax, 
        node_size=node_sizes, 
        node_color=node_colors, 
        edgecolors='black', linewidths=0.8, 
        edge_color='lightgray', width=1, 
        alpha=0.8
       )


### c. Dependence of the outbreak size on the seed node's shell index.

Next, run the SIR simulation 1000 times from random seed nodes using the code below. This time, use a transmission probability of 0.2. Then draw a boxplot of the outbreak sizes grouped by the k-shell index of the seed node. How does the expected outbreak size depend on the seed node's shell index?
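
A boxplot shows the spread of outcomes; to read off the expected outbreak size more directly, one option is to average the outbreak sizes per shell index. The sketch below assumes two equally long lists, like the `seed_shells` and `outbreak_size` lists built in the next cell. The helper name `mean_outbreak_by_shell` and the toy numbers are invented for illustration and are not simulation results.

```python
from collections import defaultdict
from statistics import mean


def mean_outbreak_by_shell(seed_shells, outbreak_sizes):
    """Return {shell index: mean outbreak size} for paired lists of runs."""
    sizes_by_shell = defaultdict(list)
    # Group the outbreak size of each run under the shell index of its seed
    for shell, size in zip(seed_shells, outbreak_sizes):
        sizes_by_shell[shell].append(size)
    # Average within each group, listing the shells in increasing order
    return {shell: mean(sizes) for shell, sizes in sorted(sizes_by_shell.items())}


# Toy data (made up): five runs, two seeded in shell 1 and three in shell 2
toy_shells = [1, 1, 2, 2, 2]
toy_sizes = [1, 3, 10, 20, 30]
toy_means = mean_outbreak_by_shell(toy_shells, toy_sizes)
print(toy_means)
```

With real simulation output, shells that were picked as seeds only a few times will have noisy averages, so it may help to check how many runs each shell received before comparing the means.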


In [None]:
# now let's repeat the above 1000 times and record the seed's shell index and the outbreak size for each run

transmission_probability=0.2 # use this value here

seed_shells=[]
outbreak_size=[]

## TODO: run the sir_spread function 1000 times and for each run, append the shell index of the randomly chosen seed node to seed_shells and the outbreak size to outbreak_size

# YOUR CODE HERE
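shell_index = nx.core_number(network)  # computed once: dict of node -> k-shell index
nodes = list(network.nodes())  # build the node list once instead of on every run
for _ in range(1000):
    seed = random.choice(nodes)
    outbreak = sir_spread(network, seed, transmission_probability)
    seed_shells.append(shell_index[seed])
    outbreak_size.append(len(outbreak))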

fig, ax = plt.subplots(figsize=(8,6))
sns.boxplot(x=seed_shells,y=outbreak_size,ax=ax)
ax.set_xlabel('Shell index of seed')
ax.set_ylabel('Outbreak size')


In [None]:
figure_fname = 'Outbreak_size_vs_kshell.pdf'
# This saves the file in the same directory as this notebook.
# To save it in a different directory, specify the path:
# figure_fname = 'some_path/outbreak_size_vs_kshell.pdf'
fig.savefig(figure_fname)