
# Dynamic Networks

This demo will focus on dynamic network analysis, in the context of healthcare contact data from Demo 3. We will learn to apply a time window strategy to convert continuous data into a discrete network representation. This representation will be characterised, and regions of interest in the network will be explored in more detail.

In [None]:
from pathlib import Path
from collections import Counter
import networkx as nx
import pandas as pd
pd.set_option('display.precision', 3)

## Data Loading

Load the participant metadata from the file *hospital-metadata.csv*:

In [None]:
meta_path = Path("../Data") / "hospital-metadata.csv"
df_metadata = pd.read_csv(meta_path, index_col=0)
print("Metadata - %d rows" % len(df_metadata))
df_metadata.head(5)

Load the contact records from the file *hospital-contacts.csv*:

In [None]:
contact_path = Path("../Data") / "hospital-contacts.csv"
df_contacts = pd.read_csv(contact_path)
print("Contact records - %d rows" % len(df_contacts))
df_contacts.head(10)

Based on the `day` column, we can split the DataFrame into 4 daily time windows, each represented by a separate DataFrame.

In [None]:
# handle the data for each day
window_frames = {}
for day in range(1,5):
    # create a smaller data frame by filtering by the day column
    window_frames[day] = df_contacts[df_contacts["day"]==day]
    print("Day %d: %d rows" % (day, len(window_frames[day])))
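
The same split can also be expressed with `groupby`, which avoids hard-coding the range of days. The following is a minimal sketch on an invented toy table; the column names mirror those used above, but `toy_contacts` and `toy_frames` are hypothetical names and the values are made up for illustration.

```python
import pandas as pd

# Toy contact records (invented values, same column names as above)
toy_contacts = pd.DataFrame({
    "participant1": [1, 2, 1, 3, 2],
    "participant2": [2, 3, 3, 4, 4],
    "day": [1, 1, 2, 2, 2],
})

# groupby yields one (key, sub-frame) pair per distinct value of "day"
toy_frames = {day: frame for day, frame in toy_contacts.groupby("day")}

for day, frame in toy_frames.items():
    print("Day %d: %d rows" % (day, len(frame)))
```

With this approach the dictionary keys come from the data itself, so a day with no records simply has no entry.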

## Dynamic Network Creation

For each time window, we can use its DataFrame to construct a corresponding undirected, weighted contact network, as we saw in the last demo. Each edge weight counts the contact records between that pair of participants during the window.

In [None]:
# create a network for each day
window_networks = {}
for day in range(1,5):
    g = nx.Graph()
    # get the counts
    frequencies = Counter()
    relevant_nodes = set()
    for i, row in window_frames[day].iterrows():
        relevant_nodes.add(row["participant1"])
        relevant_nodes.add(row["participant2"])
        pair = frozenset([row["participant1"], row["participant2"]])
        frequencies[pair] += 1
    # only add the relevant nodes that appear in this time window
    for node_id in relevant_nodes:
        name = df_metadata["name"][node_id]
        role = df_metadata["role"][node_id]
        g.add_node(node_id, name=name, role=role)
    # now add the edges
    for pair in frequencies:
        node_pair = list(pair)
        g.add_edge(node_pair[0], node_pair[1], weight=frequencies[pair])
    print("Day %d: Network has %d nodes and %d edges" % (day, g.number_of_nodes(), g.number_of_edges()))
    window_networks[day] = g
print("Created %d networks" % len(window_networks))
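
To see why the pairs are stored as `frozenset` objects, consider the small sketch below. It uses invented pairs (`toy_pairs` is a hypothetical name) to show that a contact recorded as (1, 2) and one recorded as (2, 1) end up on the same undirected edge.

```python
from collections import Counter
import networkx as nx

# Invented contact pairs; (1, 2) and (2, 1) describe the same undirected contact
toy_pairs = [(1, 2), (2, 1), (1, 2), (2, 3)]

# A frozenset ignores order, so both directions share one counter key
toy_frequencies = Counter(frozenset(p) for p in toy_pairs)

toy_g = nx.Graph()
for pair, count in toy_frequencies.items():
    u, v = tuple(pair)
    toy_g.add_edge(u, v, weight=count)

print("Weight of edge 1-2:", toy_g[1][2]["weight"])
print("Number of edges:", toy_g.number_of_edges())
```

Note that a record in which both participants are the same would give a one-element `frozenset`, so unpacking it into two nodes would fail; this sketch assumes no such records exist.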

## Characterising Dynamic Networks

From the time window networks above, we can generate a time series plot showing the average number of unique contacts for each node per day.

In [None]:
# Convenience function for plotting the time series using Pandas
def gen_ts_plot(values, measure_name, color="red"):
    s_values = pd.Series(values)
    ax = s_values.plot(figsize=(12, 5.5), fontsize=13, color=color, style='.-', ms=15, zorder=3)
    ax.set_title("%s per day" % measure_name, fontsize=14)
    ax.set_xlabel("Time Window (day)", fontsize=14)
    ax.set_ylabel(measure_name, fontsize=14)
    ax.xaxis.grid(True)
    ax.set_xlim(1,4)
    ax.set_ylim((0,s_values.max()*1.1))
    ax.set_xticks([1,2,3,4])

In [None]:
# calculate average unweighted degree for each window - i.e. number of unique contacts
values = {}
for day in range(1,5):
    # get the unweighted degree values for all nodes in this day's network
    degree_values = dict(window_networks[day].degree()).values()
    # get the mean value
    values[day] = sum(degree_values)/len(degree_values)
gen_ts_plot(values, "Mean unique contacts", "blue")
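
As a sanity check on the calculation above: in an undirected network every edge contributes to the degree of two nodes, so the mean unweighted degree equals twice the number of edges divided by the number of nodes. The sketch below confirms this on an invented toy graph (`toy_g` is a hypothetical name).

```python
import networkx as nx

# Invented toy network with 4 nodes and 4 edges
toy_g = nx.Graph()
toy_g.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 4)])

# Mean degree computed node by node, as in the cell above
degree_values = dict(toy_g.degree()).values()
mean_degree = sum(degree_values) / len(degree_values)

# Shortcut: each edge is counted once at each of its two endpoints
shortcut = 2 * toy_g.number_of_edges() / toy_g.number_of_nodes()

print(mean_degree, shortcut)
```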

Similarly, we can generate a time series plot showing the average number of total contacts for each node per day, this time using the weighted degree.

In [None]:
# calculate average weighted degree for each window - i.e. number of total contacts
values = {}
for day in range(1,5):
    # get the weighted degree values for all nodes in this day's network
    wdegree_values = dict(window_networks[day].degree(weight="weight")).values()
    # get the mean value
    values[day] = sum(wdegree_values)/len(wdegree_values)
gen_ts_plot(values, "Mean total contacts", "blue")
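
The difference between the two measures can be illustrated on a toy network. In this sketch (invented weights, hypothetical name `toy_g`), node 2 has two unique contacts but four contacts in total, because the weighted degree sums the weights of the edges attached to a node.

```python
import networkx as nx

# Invented toy network: the weight is the number of contacts between a pair
toy_g = nx.Graph()
toy_g.add_edge(1, 2, weight=3)
toy_g.add_edge(2, 3, weight=1)

unique_contacts = dict(toy_g.degree())
total_contacts = dict(toy_g.degree(weight="weight"))

print("Unique contacts for node 2:", unique_contacts[2])
print("Total contacts for node 2:", total_contacts[2])
```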

For each time window network, we can identify the node with the highest weighted degree (i.e. the highest number of contacts during that day):

In [None]:
# get the node with the highest weighted degree in each window
max_node_ids = {}
for day in range(1,5):
    wdegree = dict(window_networks[day].degree(weight="weight"))
    # convert to a Pandas series
    s_wdeg = pd.Series(wdegree)
    # get the node with the maximum value 
    max_node = s_wdeg.idxmax()
    max_node_ids[day] = max_node
    print("Day %d: Node %d has highest weighted degree of %d" % ( day, max_node, s_wdeg[max_node] ) )    
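
One caveat with `idxmax` is that it returns only the first label at which the maximum occurs, so ties are not reported. The sketch below uses an invented series (`s_toy` is a hypothetical name) to show one possible way of listing every node that shares the maximum value.

```python
import pandas as pd

# Invented weighted degree values; nodes 20 and 30 are tied for the maximum
s_toy = pd.Series({10: 5, 20: 7, 30: 7})

# idxmax reports only the first label holding the maximum
first_max = s_toy.idxmax()

# Filtering on the maximum value recovers all tied nodes
all_max = s_toy[s_toy == s_toy.max()].index.tolist()

print(first_max, all_max)
```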

We could look at the ego network for each of these nodes, i.e. the node itself, its direct neighbours, and all edges among them:

In [None]:
# generate the ego network for each one
ego_networks = {}
rows = []
for day in range(1,5):
    ego_node_id = max_node_ids[day]
    print("Day %d: Creating ego network for node %s" % (day, ego_node_id))
    eg = nx.ego_graph(window_networks[day], ego_node_id)
    ego_networks[day] = eg
    # calculate some network statistics
    den = nx.density(eg)
    rows.append( {"day" : day, 
        "ego_node_id" : ego_node_id, 
        "ego_name" : df_metadata["name"][ego_node_id],
        "ego_role" : df_metadata["role"][ego_node_id],
        "nodes" : eg.number_of_nodes(),
        "edges" : eg.number_of_edges(),
        "density" : den} )

In [None]:
# display the details as a Data Frame
pd.DataFrame(rows).set_index("day")
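
To make the density values easier to interpret, here is a minimal sketch on an invented toy graph (`toy_g` and `toy_ego` are hypothetical names). With the default radius of 1, `nx.ego_graph` keeps the ego, its direct neighbours, and the edges among them; the density is then the fraction of possible node pairs that are actually connected.

```python
import networkx as nx

# Invented toy network: node 0 is the ego, node 4 is two steps away from it
toy_g = nx.Graph()
toy_g.add_edges_from([(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)])

# The default radius of 1 keeps node 0, its neighbours, and edges among them
toy_ego = nx.ego_graph(toy_g, 0)

n = toy_ego.number_of_nodes()
m = toy_ego.number_of_edges()

# Density of an undirected graph: actual edges over possible node pairs
manual_density = 2 * m / (n * (n - 1))

print(sorted(toy_ego.nodes()), m, nx.density(toy_ego), manual_density)
```

Here the ego network has 4 nodes and 4 of the 6 possible edges, and node 4 is excluded because it is not a direct neighbour of the ego.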

For further exploration, we could export each of these ego networks as a GEXF file and visualise them using Gephi:

In [None]:
# save each ego network to its own GEXF file
for day in ego_networks:
    nx.write_gexf(ego_networks[day], "contacts-ego-day%d.gexf" % day)
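
Before opening the files in Gephi, it can be useful to check that an export reads back as expected. The following sketch writes an invented two-node network to a temporary directory and reloads it with `nx.read_gexf`. The attribute values and the file name `toy-ego.gexf` are hypothetical; `node_type=int` is passed on the assumption that the node identifiers are integers.

```python
import tempfile
from pathlib import Path
import networkx as nx

# Invented toy network with the same attribute names as used above
toy_g = nx.Graph()
toy_g.add_node(1, name="A", role="nurse")
toy_g.add_node(2, name="B", role="patient")
toy_g.add_edge(1, 2, weight=3)

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = str(Path(tmp_dir) / "toy-ego.gexf")
    nx.write_gexf(toy_g, toy_path)
    # node_type=int converts the node ids stored in the file back to integers
    loaded = nx.read_gexf(toy_path, node_type=int)

print(sorted(loaded.nodes()), loaded.nodes[1]["role"], loaded[1][2]["weight"])
```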