# Content


In this notebook we take the SocioPatterns datasets from the `SPData` folder and convert them to a common format `i,j,t`, where `i` and `j` are integer node labels between $0$ and $n-1$ (with $n$ the number of nodes) and `t` is the number of $20$-second slices elapsed since the first measurement. The output is saved in the folder `Graphs`.
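
As a minimal sketch of this conversion, the toy example below applies the same relabelling and time rescaling to a hand-made contact list. The data and the names `raw`, `labels` and `toy` are illustrative assumptions, not part of the datasets; the sketch also uses integer division so that `t` stays an integer, which is a choice made here for readability.

```python
import numpy as np
import pandas as pd

# toy contact list: raw timestamps in seconds and arbitrary node labels (made-up values)
raw = pd.DataFrame({
    't': [1000, 1000, 1020, 1060],
    'i': [507, 12, 12, 88],
    'j': [12, 88, 507, 507],
})

# relabel the nodes as 0, ..., n-1 following the sorted order of the original labels
labels = np.unique(raw[['i', 'j']].values)
mapper = {label: k for k, label in enumerate(labels)}
a = raw['i'].map(mapper)
b = raw['j'].map(mapper)

toy = pd.DataFrame({
    'i': np.minimum(a, b),                    # smaller endpoint first
    'j': np.maximum(a, b),
    't': (raw['t'] - raw['t'].min()) // 20,   # number of 20-second slices since the first contact
})
print(toy)
```

Each row of `toy` is one contact between nodes `i` and `j` (with `i <= j`) observed in slice `t`.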

In [1]:
# set library directory
import sys
sys.path += ['dir_to_package/Package'] 

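from Utilities import *

# np and pd are used below: import them explicitly rather than relying on the star import
import numpy as np
import pandas as pd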

import warnings
warnings.filterwarnings("ignore")

ROOT = 'SPData/'

In [2]:
def ProcessAndSave(dft, name):
    '''Function to pre-process and save the data. Given the temporal network dft, it maps its nodes to integers
    between 0 and n-1 (where n is the number of nodes), shifts the smallest time to 0 and sets the temporal
    resolution to 20 seconds. It then creates both the (i,j,t) and (i,j,t,τ) graph representations and saves
    them in the folder Graphs'''

    # work on a copy: dft may be a slice of another DataFrame and should not be modified in place
    dft = dft.copy()

    # map the node names to integers between 0 and n-1
    all_nodes = np.unique(dft[['i', 'j']].values)
    n = len(all_nodes)
    mapper = dict(zip(all_nodes, np.arange(n)))
    idx1 = dft['i'].map(mapper)
    idx2 = dft['j'].map(mapper)

    # store each edge with i <= j
    dft['i'] = np.minimum(idx1, idx2)
    dft['j'] = np.maximum(idx1, idx2)

    # shift time and change the temporal resolution
    dft['t'] = (dft['t'] - dft['t'].min()) / 20

    # save the (i,j,t) format
    dft[['i', 'j', 't']].to_csv('Graphs/' + name + '-dft.csv', index = False)

    # create and save the (i,j,t,τ) format
    dfttau = tij2tijtau(dft)
    dfttau.to_csv('Graphs/' + name + '-dfttau.csv', index = False)

    return
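
The function `tij2tijtau` is not defined in this notebook, so its exact behaviour is not documented here. As a rough illustration only, the sketch below assumes that τ denotes the number of consecutive time slices during which a pair stays in contact, and shows one way such an `(i,j,t,τ)` table could be built with pandas. The helper name `collapse_contacts` and the toy data are hypothetical, and the real function may differ.

```python
import pandas as pd

def collapse_contacts(df):
    """Hypothetical helper (not the Utilities implementation): merge runs of
    consecutive time slices of a pair (i, j) into one row (i, j, t, tau),
    where t is the first slice of the run and tau its length in slices."""
    df = df.sort_values(['i', 'j', 't']).reset_index(drop=True)
    # a new run starts whenever the gap to the previous slice of the same pair is not 1
    gap = df.groupby(['i', 'j'])['t'].diff()
    df['run'] = (gap != 1).cumsum()
    out = (df.groupby(['i', 'j', 'run'], as_index=False)
             .agg(t=('t', 'min'), tau=('t', 'count'))
             .drop(columns='run'))
    return out

# toy (i,j,t) table: pair (0,1) is in contact during slices 0-2 and again at slice 5
toy_tij = pd.DataFrame({'i': [0, 0, 0, 0, 1],
                        'j': [1, 1, 1, 1, 2],
                        't': [0, 1, 2, 5, 3]})
toy_tijtau = collapse_contacts(toy_tij)
print(toy_tijtau)
```

Under this assumption, the three consecutive slices of pair `(0, 1)` collapse into a single row with `tau = 3`, while isolated contacts keep `tau = 1`.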

In [3]:
name = 'malawi_pilot'

df = pd.read_csv(ROOT + name + '.csv.gz')[['contact_time', 'id1', 'id2']]
df = df.rename(columns = {'contact_time': 't', 'id1': 'i', 'id2': 'j'})
ProcessAndSave(df, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=8)]: Done 160 tasks      | elapsed:    1.6s
[Parallel(n_jobs=8)]: Done 347 out of 347 | elapsed:    2.0s finished


In [4]:
name = 'baboons'

df = pd.read_csv(ROOT + name + '.txt', sep = '\t')
dft = df[['t', 'i', 'j']]
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.0s
[Parallel(n_jobs=8)]: Done  78 out of  78 | elapsed:    0.2s finished


In [5]:
name = 'SFHH'

dft = pd.read_csv(ROOT + name + '.dat', sep = ' ', header = None, names = ['t', 'i', 'j'])
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.0s
[Parallel(n_jobs=8)]: Done 528 tasks      | elapsed:    0.9s
[Parallel(n_jobs=8)]: Done 2496 tasks      | elapsed:    3.8s
[Parallel(n_jobs=8)]: Done 5232 tasks      | elapsed:    7.7s
[Parallel(n_jobs=8)]: Done 8768 tasks      | elapsed:   12.5s
[Parallel(n_jobs=8)]: Done 9565 out of 9565 | elapsed:   13.7s finished


In [6]:
name = 'InVS'

dft = pd.read_csv(ROOT + name + '.dat', sep = ' ', header = None, names = ['t', 'i', 'j'])
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.0s
[Parallel(n_jobs=8)]: Done 528 tasks      | elapsed:    0.8s
[Parallel(n_jobs=8)]: Done 755 out of 755 | elapsed:    1.0s finished


In [7]:
name = 'primaryschool'

df = pd.read_csv(ROOT + name + '.csv', sep = '\t', header = None, names = ['t', 'i', 'j', 'C1', 'C2'])
dft = df[['t', 'i', 'j']]
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.1s
[Parallel(n_jobs=8)]: Done 528 tasks      | elapsed:    1.2s
[Parallel(n_jobs=8)]: Done 2496 tasks      | elapsed:    4.3s
[Parallel(n_jobs=8)]: Done 5232 tasks      | elapsed:    8.5s
[Parallel(n_jobs=8)]: Done 8317 out of 8317 | elapsed:   13.3s finished


In [8]:
name = 'highschool_2013'
df = pd.read_csv(ROOT + name + '.csv', sep = ' ', header = None, names = ['t', 'i', 'j', 'C1', 'C2'])
dft = df[['t', 'i', 'j']]
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.1s
[Parallel(n_jobs=8)]: Done 528 tasks      | elapsed:    1.1s
[Parallel(n_jobs=8)]: Done 2496 tasks      | elapsed:    4.1s
[Parallel(n_jobs=8)]: Done 5232 tasks      | elapsed:    8.6s
[Parallel(n_jobs=8)]: Done 5818 out of 5818 | elapsed:    9.7s finished


In [9]:
name = 'highschool_2011'
df = pd.read_csv(ROOT + name + '.csv', sep = '\t', header = None, names = ['t', 'i', 'j', 'C1', 'C2'])
dft = df[['t', 'i', 'j']]
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.0s
[Parallel(n_jobs=8)]: Done 528 tasks      | elapsed:    0.8s
[Parallel(n_jobs=8)]: Done 1710 out of 1710 | elapsed:    2.4s finished


In [10]:
name = 'highschool_2012'
df = pd.read_csv(ROOT + name + '.csv', sep = '\t', header = None, names = ['t', 'i', 'j', 'C1', 'C2'])
dft = df[['t', 'i', 'j']]
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.0s
[Parallel(n_jobs=8)]: Done 560 tasks      | elapsed:    1.1s
[Parallel(n_jobs=8)]: Done 2186 tasks      | elapsed:    3.0s
[Parallel(n_jobs=8)]: Done 2220 out of 2220 | elapsed:    3.0s finished


In [11]:
name = 'hospital'
df = pd.read_csv(ROOT + name + '.dat', sep = '\t', header = None, names = ['t', 'i', 'j', 'C1', 'C2'])
dft = df[['t', 'i', 'j']]
ProcessAndSave(dft, name)

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.0s
[Parallel(n_jobs=8)]: Done 528 tasks      | elapsed:    0.8s
[Parallel(n_jobs=8)]: Done 1139 out of 1139 | elapsed:    1.6s finished
