# 08-calculate-effective-distance

This notebook calculates the effective distance between zones in RMSP using `effective_distance_py3.py`.

It calculates the `Dominant Path Effective Distance` and the `Random Walk Effective Distance`, assuming `R0 = 2.9`, `Infection period = 9.2` (the values used in the code cells below), and assuming `10%` of the population is circulating.
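
These parameters enter the calculation through a single number, which the baseline cell further down computes as `log((alpha - beta) / kappa) - euler_gamma` and passes to the distance routine as `parameter`. The following is a minimal sketch of that arithmetic, using the values hard-coded in the code cells (`2.9/9.2` and `1.0/9.2`). The function names `delta_parameter` and `kappa_upper_bound` are hypothetical and not part of the effective distance module.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (same value as np.euler_gamma)

def delta_parameter(r0, infectious_period, kappa_0, traffic_reduction=0.0):
    """Return log((alpha - beta) / kappa) - euler_gamma, as in the baseline cell."""
    alpha = r0 / infectious_period   # infection rate
    beta = 1.0 / infectious_period   # recovery rate
    # traffic_reduction is given in percent and scales the mobility rate
    kappa = (1.0 - traffic_reduction / 100.0) * kappa_0
    return math.log((alpha - beta) / kappa) - EULER_GAMMA

def kappa_upper_bound(r0, infectious_period):
    """Largest kappa for which the parameter above stays positive."""
    alpha = r0 / infectious_period
    beta = 1.0 / infectious_period
    return (alpha - beta) / math.exp(EULER_GAMMA)

delta = delta_parameter(2.9, 9.2, kappa_0=0.10)
bound = kappa_upper_bound(2.9, 9.2)
print(f"delta = {delta:.4f}, kappa must stay below {bound:.4f}")
```

With `kappa_0 = 0.10` the parameter is positive but small, which is why the choice of `kappa_0` sits close to the bound. Reducing traffic lowers `kappa` and therefore increases the parameter.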

___________________

## References:

**`Dominant Path Effective Distance`: only the most dominant path is considered**
- Gautreau, A., Barrat, A. and Barthelemy, M., Global disease spread: Statistics and estimation of arrival times, J. Theor. Biol. 251, 3, 509 (2008)
- Brockmann, D. and Helbing, D., The hidden geometry of complex, network-driven contagion phenomena, Science 342, 6164, 1337 (2013)

**`Random Walk Effective Distance`: random-walk-based approach**
- Iannelli, F., Koher, A., Hoevel, P. and Sokolov, I.M. Effective Distances for Epidemics spreading on Complex Networks, arXiv:1608.06201 (2016)
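
For orientation, the dominant path distance in the papers cited above can be read as a shortest-path problem: each edge from `m` to `n` gets the length `delta - log(P_mn)`, where `P_mn` is the share of the flux leaving `m` that goes to `n`, and the distance is the length of the shortest path (with `delta = 1` this is the Brockmann and Helbing form). The sketch below follows that reading with a plain Dijkstra search. It is an illustration under those assumptions, not the code in `effective_distance_py3.py`; the function name and the toy fluxes are made up.

```python
import heapq
import math

def dominant_path_distances(flux, source, delta=1.0):
    """Shortest-path distances from `source` with edge length delta - log(P).

    `flux` maps (origin, destination) -> positive flux. P is the share of the
    origin's total outgoing flux carried by that edge; self loops are skipped.
    """
    out_total = {}
    for (u, v), f in flux.items():
        if u != v:
            out_total[u] = out_total.get(u, 0.0) + f

    neighbours = {}
    for (u, v), f in flux.items():
        if u != v:
            length = delta - math.log(f / out_total[u])
            neighbours.setdefault(u, []).append((v, length))

    # Standard Dijkstra search; edge lengths are non-negative for delta >= 0.
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, length in neighbours.get(u, []):
            candidate = d + length
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(queue, (candidate, v))
    return dist

# Toy network (made-up fluxes): the direct edge 0 -> 2 is weak, the detour via 1 is strong.
toy_flux = {(0, 1): 99.0, (0, 2): 1.0, (1, 2): 100.0, (2, 0): 10.0}
dist = dominant_path_distances(toy_flux, source=0)
print(dist)
```

In the toy network the two-step route through node 1 is shorter than the direct edge, which illustrates why a weak direct connection can be effectively "farther" than a strong indirect one.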

In [132]:
import sys
import pandas as pd
import numpy as np


In [None]:
data_path = '/Users/shivyucel/Documents/SDS_2021.nosync/SDS_2020-2021/SDS_Thesis/Data/'

# Preprocessing

These steps take the output of the SIR preprocessing steps (the regions that meet its criteria) and ensure that every region has associated mobility data. Regions without mobility data are removed from the data set. If the set of regions changes, we return to the SIR preprocessing file and re-run it with the updated list of commuting zones. This is repeated until no more modifications are made, leaving 2599 hexagon regions.
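
The loop described above is a fixed-point iteration: filter the zones, drop commuting edges that touch a removed zone, re-apply the SIR criteria, and stop once a pass changes nothing. A dependency-free sketch of that control flow follows. Here `meets_sir_criteria` stands in for the SIR preprocessing notebook, whose actual criteria are not shown in this file, and all names and the toy data are hypothetical.

```python
def consistent_zones(edges, zones_with_mobility, meets_sir_criteria):
    """Repeat both filters until the set of zones stops changing.

    edges: list of (source, target, flux) tuples.
    zones_with_mobility: set of zone IDs that have mobility data.
    meets_sir_criteria: callable(zone, edges) -> bool, a stand-in for SIR preprocessing.
    """
    zones = {z for s, t, _ in edges for z in (s, t)}
    while True:
        kept = {z for z in zones
                if z in zones_with_mobility and meets_sir_criteria(z, edges)}
        # Keep only edges whose two endpoints both survived this pass.
        edges = [(s, t, f) for s, t, f in edges if s in kept and t in kept]
        if kept == zones:
            return kept, edges
        zones = kept

def has_outgoing_edge(zone, edges):
    """Toy criterion: the zone sends commuters to at least one other zone."""
    return any(s == zone and t != zone for s, t, _ in edges)

toy_edges = [("a", "b", 5.0), ("b", "a", 3.0), ("b", "c", 2.0),
             ("c", "d", 1.0), ("d", "c", 1.0)]
zones, edges = consistent_zones(toy_edges, {"a", "b", "c"}, has_outgoing_edge)
print(sorted(zones), edges)
```

In the toy data, zone `d` is removed for lacking mobility data, which in turn leaves `c` without an outgoing edge on the next pass. This is the kind of knock-on change that makes the repeated re-run necessary.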

In [136]:
hex_ids = pd.read_csv(data_path + 'h3/h3_IDs.csv')

#correct indices generated in SIR_preprocess
df = pd.read_csv(data_path + 'SIR_preprocessed_commuting.csv')


valid = df['SOURCE'].values
valid= pd.DataFrame(valid)


mob_change = pd.read_csv(data_path + 'h3/commuting/new_mob_redux.csv')
mob_change = mob_change.merge(hex_ids, left_on='h3', right_on='Unnamed: 0')
mob_change = mob_change.sort_values(by='0')

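# NOTE: `commute` is not defined in this cell; it is assumed to be the commuting
# table (SOURCE, TARGET, FLUX) already loaded earlier in the session.
# pd.concat replaces Series.append, which was removed in pandas 2.0.
all_commute = pd.concat([commute['SOURCE'], commute['TARGET']]).unique()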
mob_change = mob_change[mob_change['0'].isin(all_commute)]

# fill NaN with the mean mobility reduction, and cap mobility reductions of 1 or more at 0.999
mob_change.reset_index(inplace=True)
mob_change['marginal_change'] = mob_change['marginal_change'].fillna(value=mob_change['marginal_change'].mean())
mob_change['marginal_change'] = [0.999 if x >= 1 else x for x in mob_change['marginal_change'] ]

#mob_change.to_csv(data_path + 'h3/real_commute_mob_change_ED.csv')

commute = commute[commute['SOURCE'].isin(mob_change['0'])]
commute = commute[commute['TARGET'].isin(mob_change['0'])]

values = commute.values

#np.savetxt(data_path + f"real_commuting_for_ED.csv", values, delimiter=",")

#if regions change during this step because they don't have mobility data, rerun SIR_preprocess on the updated paper_clean_commute.csv
#this ensures SIR model and effective distance are calculated for the same regions
#return here afterwards and ensure consistency

#commute.to_csv(data_path + 'paper_clean_commute.csv')


In [2]:
import effective_distance_py_baseline as ed
import numpy as np

## Run baseline ED  

In [None]:
baseline_output_file = ""

In [5]:
# Gravity and Radiation are two models for human mobility
# Do gravity later
for model in ['original']:
   
    #------------------------------------#
    #
    #       Step 2: Load the network
    #
    #------------------------------------#

    # We assume that the mobility network is stored in a comma separated file (.csv)
    # following the convention SOURCE, TARGET, FLUX.
    # The node IDs are interpreted as integers and have to run from 0 to number_of_nodes-1.
    # The fluxes have to be positive and will be saved as float numbers.

    infile = data_path + f"h3/commuting/real_commuting_for_ED.csv"

    myEffectiveDistance = ed.EffectiveDistances(infile)
    # The graph is now stored in the attribute myEffectiveDistance.graph as a NetworkX type DiGraph.
    # In order to avoid singularities in later calculations,
    # only the giant strongly connected component has been stored.
    
    print('Loaded', infile)

    for traffic_reduction in [0]:  # traffic reduction in percent; only the 0% baseline is run here (other scenarios: 30, 50, 80)

        #------------------------------------#
        #
        #      Step 3: Effective Distances
        #
        #------------------------------------#

        # We provide three methods to calculate the effective distance.
        # If both the source and the target are specified, a single number is returned.
        # Otherwise, if one or both are set to None, an array is returned.
        # https://www.nature.com/articles/s41562-020-0928-4
        alpha   = 2.9/9.2      # R0/Infection period

        # Recovery rate
        beta    = 1.0/9.2      # 1.0/Infection period

        #pop_moving = sum([ myEffectiveDistance.graph[u][v]['weight'] for u,v in myEffectiveDistance.graph.edges() ])
        #pop_total  = 15018089.928
        #kappa_0 = pop_moving / pop_total

        # kappa_0 is defined as the ratio between
        # the total daily passenger flux and the total population,
        # i.e. the rate to leave a node for a randomly chosen individual.
        # https://arxiv.org/pdf/1608.06201.pdf

        # But since the delta parameter needs to be positive,
        # this implies that we need (alpha-beta)/kappa > exp(euler_mascheroni).
        # Assuming alpha and beta are fixed,
        # we need kappa < (alpha-beta)/exp(euler_mascheroni),
        # which in our case means kappa < 0.116.

        # So we'll take kappa_0 = 0.10, i.e. 10% of the population is circulating.

        kappa_0 = 0.10

        kappa   = (1.0-traffic_reduction/100.0)*kappa_0

        euler_mascheroni = np.euler_gamma

        parameter = float( np.log( (alpha-beta)/kappa ) - euler_mascheroni )

        print("alpha = {}, beta = {}, kappa = {:.3f}, parameter = {:.5f}".format(alpha,beta,kappa,parameter))

        #-------------------------------------#
        #
        #   Dominant Path Effective Distance
        #
        #-------------------------------------#
        
        source, target = None, None
        DPED = myEffectiveDistance.get_dominant_path_distance(source, target, parameter=parameter)

        outfile = data_path + f'{baseline_output_file}.csv'
        np.savetxt(outfile, DPED)
        print('Saved',outfile)


The node IDs have to run continuously from 0 to Number_of_nodes-1.
Node IDs have been changed according to the requirement.
-----------------------------------

Lines:  1487795 , Nodes:  2599
-----------------------------------
Data Structure:

source,    target,    weight 

0,       0,       4.37e+02
0,       3,       2.82e+00
0,       12,       2.33e-01
0,       19,       2.06e+00
0,       20,       1.74e+00
0,       21,       2.52e-02
0,       27,       2.82e+00
-----------------------------------

ignore self loop at node 0
ignore self loop at node 1
ignore self loop at node 2
ignore self loop at node 3
ignore self loop at node 4
ignore self loop at node 5
ignore self loop at node 6
ignore self loop at node 7
ignore self loop at node 8
ignore self loop at node 10
ignore self loop at node 11
ignore self loop at node 12
ignore self loop at node 13
ignore self loop at node 14
ignore self loop at node 15
ignore self loop at node 16
ignore self loop at node 17
ignore self loop at node 
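
The log above shows two things the loader does before building the graph: it relabels node IDs so that they run from 0 to `Number_of_nodes-1`, and it ignores self loops. A rough sketch of that kind of normalization is given below. How the module actually does it is not shown in this notebook, so the function name and the choice of sorted order for the new labels are assumptions.

```python
def relabel_and_drop_self_loops(edges):
    """Map arbitrary integer node IDs to 0..N-1 and skip self loops.

    edges: iterable of (source, target, flux) tuples.
    Returns (new_edges, mapping) where mapping[old_id] = new_id.
    """
    edges = list(edges)
    # Assumption: new labels follow the sorted order of the old IDs.
    node_ids = sorted({n for s, t, _ in edges for n in (s, t)})
    mapping = {old: new for new, old in enumerate(node_ids)}
    new_edges = [(mapping[s], mapping[t], float(f))
                 for s, t, f in edges if s != t]
    return new_edges, mapping

# Made-up edge list with non-contiguous IDs and one self loop.
raw = [(10, 10, 437.0), (10, 42, 2.82), (42, 7, 0.233)]
new_edges, mapping = relabel_and_drop_self_loops(raw)
print(mapping)
print(new_edges)
```

Keeping the `mapping` around matters downstream: the rows and columns of the saved distance matrix are indexed by the new labels, so it is needed to translate results back to the original zone IDs.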

In [8]:
import effective_distance_real_change as ed



In [None]:
real_reduction_output_file = ""

In [118]:

#------------------------------------#
#
#       Step 1: Load the network
#
#------------------------------------#

# We assume that the mobility network is stored in a comma separated file (.csv)
# following the convention SOURCE, TARGET, FLUX.
# The node IDs are interpreted as integers and have to run from 0 to number_of_nodes-1.
# The fluxes have to be positive and will be saved as float numbers.

infile = data_path + f"real_commuting_for_ED.csv"

myEffectiveDistance = ed.EffectiveDistances(infile)
# The graph is now stored in the attribute myEffectiveDistance.graph as a NetworkX type DiGraph.
# In order to avoid singularities in later calculations,
# only the giant strongly connected component has been stored.

print('Loaded', infile)

#------------------------------------#
#
#      Step 2: Effective Distances
#
#------------------------------------#

# https://www.nature.com/articles/s41562-020-0928-4
alpha   = 2.9/9.2      # R0/Infection period

# Recovery rate
beta    = 1.0/9.2      # 1.0/Infection period

source, target = None, None
DPED = myEffectiveDistance.get_dominant_path_distance(source, target, parameter=1)

outfile = data_path + f'{real_reduction_output_file}.csv'
np.savetxt(outfile, DPED)
print('Saved',outfile)


The node IDs have to run continuously from 0 to Number_of_nodes-1.
Node IDs have been changed according to the requirement.
-----------------------------------

Lines:  1487795 , Nodes:  2599
-----------------------------------
Data Structure:

source,    target,    weight 

0,       0,       4.37e+02
0,       3,       2.82e+00
0,       12,       2.33e-01
0,       19,       2.06e+00
0,       20,       1.74e+00
0,       21,       2.52e-02
0,       27,       2.82e+00
-----------------------------------

ignore self loop at node 0
ignore self loop at node 1
ignore self loop at node 2
ignore self loop at node 3
ignore self loop at node 4
ignore self loop at node 5
ignore self loop at node 6
ignore self loop at node 7
ignore self loop at node 8
ignore self loop at node 10
ignore self loop at node 11
ignore self loop at node 12
ignore self loop at node 13
ignore self loop at node 14
ignore self loop at node 15
ignore self loop at node 16
ignore self loop at node 17
ignore self loop at node 