# Generate infection network

Notebook to produce the infection network as a function of time and save it as a set of `.gml` files.

Currently assumes `.csv` inputs at the paths given by the variables `HUMANS_PATH` and `INFECTIONS_PATH` below.

Saves a version of the `HUMANS_PATH` `.csv` that also contains the infection events.

Also creates a plot of the network using `networkx` and `matplotlib`.

In [57]:
import pandas as pd
import numpy as np
import matplotlib.colors
import matplotlib.pyplot as plt
%matplotlib inline
import networkx as nx
import pickle as pkl
import gzip

## 0. Global variables

Paths:

In [54]:
# HUMANS_PATH        = '../outputs/output-humans_time_course.csv'
# INFECTIONS_PATH    = '../outputs/InfectionNetwork.csv'
# EXPORT_GML_PATH    = '../outputs/network_static'
# EXPORT_HUMANS_PATH = '../outputs/humans_infected_time_course.csv'
HUMANS_PATH        = '../outputs/basicsenario-humans_time_course.csv'
INFECTIONS_PATH    = '../outputs/basicscenario-infection_network.csv'
EXPORT_GML_PATH    = '../outputs/network_static_basic_scenario'
EXPORT_HUMANS_PATH = '../outputs/humans_infected_time_course_basic_scenario.csv'
EXPORT_NX_PATH     = '../outputs/network_dynamic_nx_basic_scenario_'

Colors:

In [3]:
SUSCEPTIBLE_COLOR = 'silver'
INFECTED_COLOR    = 'red'
RECOVERED_COLOR   = 'limegreen'
DEAD_COLOR        = 'black'

status_colors = {'S': SUSCEPTIBLE_COLOR,
                 'I': INFECTED_COLOR,
                 'R': RECOVERED_COLOR,
                 'D': DEAD_COLOR}

Helpers to save and load gzip-compressed pickles (see https://stackoverflow.com/questions/18474791/decreasing-the-size-of-cpickle-objects):

In [58]:
def save_zipped_pickle(obj, filename, protocol=-1):
    with gzip.open(filename, 'wb') as f:
        pkl.dump(obj, f, protocol)

In [59]:
def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        loaded_object = pkl.load(f)
        return loaded_object
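
A quick round trip can confirm that the two helpers are inverses of each other. The sketch below repeats both helpers so that it runs on its own; the payload `toy_edges` and the temporary file name are made up for illustration and are not part of the notebook's data.

```python
import gzip
import os
import pickle as pkl
import tempfile


def save_zipped_pickle(obj, filename, protocol=-1):
    # protocol=-1 selects the highest pickle protocol available
    with gzip.open(filename, 'wb') as f:
        pkl.dump(obj, f, protocol)


def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        return pkl.load(f)


# Toy payload standing in for a network object (illustrative data only)
toy_edges = {0: [], 49: [(978, 977)], 52: [(10715, 10719)]}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy_edges.pkl.gz')
    save_zipped_pickle(toy_edges, toy_path)
    restored = load_zipped_pickle(toy_path)

print(restored == toy_edges)
```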

## 1. Read outputs

Human time courses:

In [4]:
%%time
humans_tc = pd.read_csv(HUMANS_PATH)

Wall time: 13.6 s


Drop the `Unnamed: 0` column by keeping only the named columns:

In [5]:
humans_tc = humans_tc[['h_ID', 'loc', 'status', 'WasInfected', 'Diagnosed', 'Hospitalized', 'ICUed', 'time']]

In [6]:
humans_tc.head()

Unnamed: 0,h_ID,loc,status,WasInfected,Diagnosed,Hospitalized,ICUed,time
0,14298,4779,S,0,0,0,0,0
1,12027,4031,S,0,0,0,0,0
2,4033,1335,S,0,0,0,0,0
3,14300,4779,S,0,0,0,0,0
4,12032,4033,S,0,0,0,0,0


In [7]:
humans_tc.shape

(30050000, 8)

Infection network:

In [8]:
infections = pd.read_csv(INFECTIONS_PATH)

Drop the `Unnamed: 0` column by keeping only the named columns:

In [9]:
infections = infections[['h_ID', 'place_of_infection', 'infection_time', 'infected_by']]

In [10]:
infections.head()

Unnamed: 0,h_ID,place_of_infection,infection_time,infected_by
0,10715,3603,0,
1,10230,3448,0,
2,978,321,0,
3,9429,3173,0,
4,8229,2774,0,


Convert `infected_by` to integer:

The nullable `pd.Int32Dtype()` is used so that missing values (the initial infections, which have no infector) are kept.

In [11]:
infections['infected_by'] = infections['infected_by'].astype(pd.Int32Dtype())
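
As a small illustration of why the nullable dtype is used, the sketch below converts a toy float column with missing entries. The values are made up for the example; only `pd.Int32Dtype()` is taken from the notebook.

```python
import pandas as pd

# Toy column: two initial infections (no infector) and two transmissions
infected_by_float = pd.Series([None, 978.0, None, 10715.0])

# The nullable integer dtype stores whole numbers and still allows missing values
infected_by_int = infected_by_float.astype(pd.Int32Dtype())

print(infected_by_float.dtype)
print(infected_by_int.dtype)
print(infected_by_int.isna().tolist())
```

A plain `astype(int)` would fail on such a column, because NumPy integers cannot represent missing values.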

In [12]:
infections.head()

Unnamed: 0,h_ID,place_of_infection,infection_time,infected_by
0,10715,3603,0,
1,10230,3448,0,
2,978,321,0,
3,9429,3173,0,
4,8229,2774,0,


In [13]:
infections.shape

(13653, 4)

## 2. Generate humans table

Network up to a certain time:

In [26]:
cutoff_time = 5 * 24 + 12

All infections up to that time except initial (`NaN`) infections:

In [27]:
infections_trimmed = infections[(infections['infection_time'] <= cutoff_time) & (infections['infected_by'].notna())]

In [28]:
infections_trimmed

Unnamed: 0,h_ID,place_of_infection,infection_time,infected_by
10,977,321,49,978
11,10719,3603,52,10715
12,36,1263,61,5148
13,1261,3347,61,4703
14,10109,3202,63,5844
15,12400,1443,64,978
16,8228,2774,70,8229
17,10717,3603,77,10715
18,10718,3603,78,10715
19,3878,1263,82,5148


Spreaders ranked by cumulative number of infections:

In [29]:
infections_trimmed.infected_by.value_counts(dropna = False)

10230    6
5844     5
10715    5
10719    2
978      2
5148     2
4703     2
36       1
8229     1
3878     1
3120     1
9429     1
10109    1
NaN      0
Name: infected_by, dtype: Int64

In [30]:
%%time
humans_tc_with_infected = pd.merge(left = humans_tc,
                                   right = infections, 
                                   how = 'left',
                                   left_on = ['h_ID', 'time'],
                                   right_on = ['h_ID', 'infection_time'])

Wall time: 13.9 s
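
The effect of this left merge on two key pairs can be seen on a toy example. The frames below are made up for illustration; only the column names and the `pd.merge` arguments follow the cell above.

```python
import pandas as pd

# Toy time course: two humans observed at two time steps
toy_humans_tc = pd.DataFrame({
    'h_ID':   [977, 978, 977, 978],
    'status': ['S', 'I', 'I', 'I'],
    'time':   [48, 48, 49, 49],
})

# Toy infection table: human 977 is infected by 978 at time 49
toy_infections = pd.DataFrame({
    'h_ID':           [977],
    'infection_time': [49],
    'infected_by':    [978],
})

# Every time-course row is kept; infection columns are filled only where
# both the human ID and the time match an infection event
merged = pd.merge(left=toy_humans_tc,
                  right=toy_infections,
                  how='left',
                  left_on=['h_ID', 'time'],
                  right_on=['h_ID', 'infection_time'])

print(merged)
```

Rows without a matching event get `NaN` in the infection columns, which turns those integer columns into floats. That is consistent with the later conversion back to a nullable integer dtype.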


In [31]:
humans_tc_with_infected.head()

Unnamed: 0,h_ID,loc,status,WasInfected,Diagnosed,Hospitalized,ICUed,time,place_of_infection,infection_time,infected_by
0,14298,4779,S,0,0,0,0,0,,,
1,12027,4031,S,0,0,0,0,0,,,
2,4033,1335,S,0,0,0,0,0,,,
3,14300,4779,S,0,0,0,0,0,,,
4,12032,4033,S,0,0,0,0,0,,,


Only humans at infection times:

In [32]:
humans_tc_with_infected[humans_tc_with_infected.infected_by.notna()]

Unnamed: 0,h_ID,loc,status,WasInfected,Diagnosed,Hospitalized,ICUed,time,place_of_infection,infection_time,infected_by
743985,977,321,I,1,0,0,0,49,321.0,49.0,978
782010,10719,3603,I,1,0,0,0,52,3603.0,52.0,10715
917208,36,1263,I,1,0,0,0,61,1263.0,61.0,5148
926592,1261,3347,I,1,0,0,0,61,3347.0,61.0,4703
949050,10109,3202,I,1,0,0,0,63,3202.0,63.0,5844
...,...,...,...,...,...,...,...,...,...,...,...
17333041,5247,1737,I,1,0,0,0,1153,1737.0,1153.0,5249
17510339,8805,1734,I,1,0,0,0,1165,1734.0,1165.0,5118
17534980,4107,2586,I,1,0,0,0,1167,2586.0,1167.0,12414
17756404,11496,3855,I,1,0,0,0,1181,3855.0,1181.0,11498


Convert `infection_time` and `place_of_infection` to `int`:

Using `pd.Int32Dtype()` in order to keep `NaN`s.

In [34]:
humans_tc_with_infected['infection_time'] = humans_tc_with_infected['infection_time'].astype(pd.Int32Dtype())
humans_tc_with_infected['place_of_infection'] = humans_tc_with_infected['place_of_infection'].astype(pd.Int32Dtype())

In [35]:
humans_tc_with_infected.shape

(30050000, 11)

In [36]:
humans_tc_with_infected.head()

Unnamed: 0,h_ID,loc,status,WasInfected,Diagnosed,Hospitalized,ICUed,time,place_of_infection,infection_time,infected_by
0,14298,4779,S,0,0,0,0,0,,,
1,12027,4031,S,0,0,0,0,0,,,
2,4033,1335,S,0,0,0,0,0,,,
3,14300,4779,S,0,0,0,0,0,,,
4,12032,4033,S,0,0,0,0,0,,,


Status statistics at cutoff time:

In [37]:
humans_tc_with_infected[humans_tc_with_infected["time"] == cutoff_time].status.value_counts()

S    14985
I       39
R        1
Name: status, dtype: int64

In [33]:
%%time
humans_tc_with_infected.to_csv(EXPORT_HUMANS_PATH, sep = ";", index = False)

Wall time: 1min 57s


## 3. Generate `networkx` objects

Generate edge list:

In [41]:
infection_events = list(zip(infections_trimmed['infected_by'], infections_trimmed['h_ID']))

In [42]:
infection_events

[(978, 977),
 (10715, 10719),
 (5148, 36),
 (4703, 1261),
 (5844, 10109),
 (978, 12400),
 (8229, 8228),
 (10715, 10717),
 (10715, 10718),
 (5148, 3878),
 (5844, 13496),
 (10719, 2653),
 (10715, 7180),
 (10230, 9321),
 (5844, 5704),
 (10230, 10226),
 (3120, 3122),
 (10230, 10231),
 (10230, 10229),
 (5844, 5842),
 (10230, 3130),
 (9429, 6788),
 (10715, 197),
 (10230, 5650),
 (10109, 5228),
 (5844, 5843),
 (3878, 3879),
 (10719, 10716),
 (36, 37),
 (4703, 4702)]

Alternatively, build a time-resolved edge list from `humans_tc_with_infected`:

In [43]:
infection_edges = dict()
times = np.unique(humans_tc_with_infected.time.values)

Generate `dict` of infection events at discrete times:

In [46]:
for time in times:
    infections_at_time = humans_tc_with_infected[(humans_tc_with_infected.time == time) & (humans_tc_with_infected.infected_by.notna())]
    if time % 100 == 0:
        print(time, infections_at_time.shape, end = "   ")
    infection_edges[time] = list(zip(infections_at_time['infected_by'], infections_at_time['h_ID']))

0 (0, 11)   100 (1, 11)   200 (1, 11)   300 (3, 11)   400 (20, 11)   500 (38, 11)   600 (30, 11)   700 (13, 11)   800 (9, 11)   900 (4, 11)   1000 (1, 11)   1100 (0, 11)   1200 (0, 11)   1300 (0, 11)   1400 (0, 11)   1500 (0, 11)   1600 (0, 11)   1700 (0, 11)   1800 (0, 11)   1900 (0, 11)   
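
The loop above filters the full table once per time step. A possible alternative, sketched here on made-up toy data, filters the infection rows once and then groups them by time. This is an assumption about how the step could be restructured, not the notebook's own method.

```python
import pandas as pd

# Toy merged table: rows with a non-missing infected_by are infection events
toy = pd.DataFrame({
    'h_ID':        [977, 978, 978, 10719, 977, 10717],
    'time':        [49, 49, 50, 52, 52, 52],
    'infected_by': [978, None, None, 10715, None, 10715],
})

# Start with an empty edge list for every time step
times = sorted(toy['time'].unique())
infection_edges = {int(t): [] for t in times}

# Filter once, then fill in the time steps that actually have events
events = toy[toy['infected_by'].notna()]
for t, rows in events.groupby('time'):
    infection_edges[int(t)] = list(zip(rows['infected_by'].astype(int), rows['h_ID']))

print(infection_edges)
```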

In [47]:
infection_edges[800]

[(3117, 11506),
 (9346, 9345),
 (7346, 9674),
 (11937, 5286),
 (14153, 14155),
 (11599, 11601),
 (6798, 6799),
 (12521, 12522),
 (10406, 10407)]

Generate graphs:

In [48]:
infection_network_static = {time: nx.DiGraph(infection_edges[time]) for time in times}

Add all nodes so that not just the infected but also the susceptible are in the graph:

In [67]:
%%time
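# Only a slice of the time steps is processed here
for time in times[700:711]:
    print(time, end = "  ")
    # Add every human ID so that uninfected humans appear as isolated nodes
    infection_network_static[time].add_nodes_from(humans_tc_with_infected['h_ID'])
    # The file is a gzip-compressed pickle despite the ".zip" suffix
    save_zipped_pickle(obj = infection_network_static[time], filename = EXPORT_NX_PATH + str(time).zfill(3) + ".zip")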

700  701  702  703  704  705  706  707  708  709  710  Wall time: 2min 37s
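
To illustrate what adding the nodes does, the sketch below builds a small directed graph from an edge list and then adds the full set of human IDs. The edge list and the ID list are toy values chosen for the example.

```python
import networkx as nx

# Toy infection events as (infector, infected) pairs
toy_edges = [(978, 977), (978, 12400), (10715, 10719)]

# Toy list of all human IDs, including one (36) that is in no infection event
all_humans = [36, 977, 978, 10715, 10719, 12400]

# Building the graph from edges only creates the nodes that appear in an edge
graph = nx.DiGraph(toy_edges)
nodes_before = graph.number_of_nodes()

# Adding all IDs leaves existing nodes untouched and adds the rest as isolated nodes
graph.add_nodes_from(all_humans)
nodes_after = graph.number_of_nodes()

print(nodes_before, nodes_after)
print(graph.out_degree(978), graph.out_degree(36))
```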


Draw graph at one time:

In [68]:
example_time = 709

In [69]:
edges = infection_network_static[example_time].edges()

In [70]:
len(edges)

43

In [71]:
out_degrees = [infection_network_static[example_time].out_degree(source) for source, _ in edges]

In [72]:
max(out_degrees)

2

Define a colormap for the edges based on the out-degree of each edge's source node (red = super spreader, green = low spreader):

In [73]:
cdict = {'red':   [(0.0, 0.0, 0.0),  # red increases
                   (1.0, 1.0, 1.0)],

         'green': [(0.0, 1.0, 1.0),  # green decreases
                   (1.0, 0.0, 0.0)],

         'blue':  [(0.0, 0.0, 0.0),  # no blue at all
                   (1.0, 0.0, 0.0)]}

red_green_cm = matplotlib.colors.LinearSegmentedColormap('RedGreen', cdict, max(out_degrees))
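
To check what such a colormap returns, the sketch below evaluates it at a few normalized out-degrees. It uses the same `cdict` but the default number of quantization levels rather than `max(out_degrees)`, and the degree list and the division by the maximum are assumptions made for the example.

```python
import matplotlib.colors

cdict = {'red':   [(0.0, 0.0, 0.0),   # red increases
                   (1.0, 1.0, 1.0)],
         'green': [(0.0, 1.0, 1.0),   # green decreases
                   (1.0, 0.0, 0.0)],
         'blue':  [(0.0, 0.0, 0.0),   # no blue at all
                   (1.0, 0.0, 0.0)]}

# Default number of quantization levels (an assumption, see the text above)
toy_cm = matplotlib.colors.LinearSegmentedColormap('RedGreen', cdict)

# Toy out-degrees, scaled to the interval (0, 1] by the largest value
toy_out_degrees = [1, 2, 6]
max_degree = max(toy_out_degrees)
toy_colors = [toy_cm(degree / max_degree) for degree in toy_out_degrees]

print(toy_cm(0.0))   # RGBA tuple at the low end
print(toy_cm(1.0))   # RGBA tuple at the high end
```

The largest spreader maps to pure red, and values near zero map to green.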

Get the status of a node at a given time. This is fragile: it silently takes the first match if several rows fit, and it raises an `IndexError` if no row matches both the ID and the time:

In [74]:
def get_status(human_ID, time):
    if pd.isnull(human_ID):
        status = 'S'
    elif human_ID not in humans_tc_with_infected.h_ID.values:
        status = 'S'
    else:
        status = humans_tc_with_infected[(humans_tc_with_infected.h_ID == human_ID) & (humans_tc_with_infected.time == time)].status.values[0]

    return status

Test the function:

In [75]:
get_status(214, cutoff_time)

'S'
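
A more defensive lookup could make both failure modes explicit. The function below is a sketch, not the notebook's implementation: the name `get_status_checked`, the `default` parameter and the toy table are all assumptions made for illustration.

```python
import pandas as pd


def get_status_checked(table, human_id, time, default='S'):
    """Return the status of one human at one time, or `default` if unknown."""
    if pd.isnull(human_id):
        return default
    matches = table.loc[(table['h_ID'] == human_id) & (table['time'] == time), 'status']
    if matches.empty:
        # No row for this ID and time: fall back instead of raising IndexError
        return default
    if len(matches) > 1:
        # Ambiguous data should be reported rather than silently resolved
        raise ValueError(f'{len(matches)} rows for h_ID={human_id}, time={time}')
    return matches.iloc[0]


# Toy table; human 5 has a deliberately duplicated row at time 1
toy_table = pd.DataFrame({
    'h_ID':   [977, 977, 978, 5, 5],
    'time':   [48, 49, 49, 1, 1],
    'status': ['S', 'I', 'I', 'S', 'I'],
})

print(get_status_checked(toy_table, 977, 49))
print(get_status_checked(toy_table, 214, 49))
```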

The following cells take forever for large networks:

Save networks to `gml` (e.g. for visualization in Cytoscape):

Careful, `gml` does not save the line widths or colors from the plot above.
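
A minimal sketch of such an export, assuming the node status is stored as a node attribute so that it ends up in the file. The attribute name `status`, the toy graph and the temporary path are illustrative choices, not taken from the notebook.

```python
import os
import tempfile

import networkx as nx

# Toy network: one infection event plus one uninfected human
toy_graph = nx.DiGraph([(978, 977)])
toy_graph.add_node(36)

# Store the status on the nodes; plot styling itself is not part of the graph
toy_status = {978: 'I', 977: 'I', 36: 'S'}
nx.set_node_attributes(toy_graph, toy_status, name='status')

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy_network.gml')
    nx.write_gml(toy_graph, toy_path)
    # Read the file back to check what was actually saved
    reloaded = nx.read_gml(toy_path)

print(reloaded.number_of_nodes(), reloaded.number_of_edges())
print(sorted(nx.get_node_attributes(reloaded, 'status').values()))
```

Any styling that should survive the export has to be derived from saved attributes like this one in the visualization tool.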