## Exposure Analysis
For each census tract $j$ in Illinois, we define the exposure vector $v^{(j)}$, whose entry $v_i^{(j)}$ encodes the contribution of census tract $i$ in Illinois to the importation risk $R_j$:

$$v_i^{(j)} = \frac{r_{i,j}}{R_j}$$

By construction these entries sum to one, $\sum_i v_i^{(j)} = 1$. We can therefore use entropy-related metrics to quantify the similarity between the exposure patterns of two different
destination census tracts, $\alpha$ and $\beta$. Specifically, having defined the **entropy** of $v^{(\alpha)}$ as

$$S(v^{(\alpha)}) = -\sum_i v_i^{(\alpha)} \log v_i^{(\alpha)},$$

we use the **Jensen-Shannon divergence** between the two vectors $v^{(\alpha)}$ and $v^{(\beta)}$, defined as

$$\Delta_{\alpha\beta} = S\left(\frac{v^{(\alpha)}+v^{(\beta)}}{2}\right) - \frac{S(v^{(\alpha)})+S(v^{(\beta)})}{2}$$
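
As a quick sanity check on this definition, the sketch below computes the divergence as the entropy of the mixture minus the mean of the two entropies, on two toy pairs of vectors. The helper names `entropy` and `js_divergence` and the toy inputs are illustrative assumptions, not part of the notebook; the convention $0 \log 0 = 0$ is also assumed here.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log), assuming the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p > 0                      # skip zero entries instead of taking log(0)
    return -np.sum(p[nonzero] * np.log(p[nonzero]))

def js_divergence(p, q):
    """Entropy of the mixture minus the mean entropy of p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return entropy((p + q) / 2) - (entropy(p) + entropy(q)) / 2

# Hypothetical toy exposure vectors (each sums to one)
same = js_divergence([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])   # identical patterns
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])         # no shared origins
```

With natural logarithms, identical vectors give a divergence of zero and vectors with disjoint support give $\log 2$, which is the upper bound of the measure.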

We then apply **agglomerative clustering (complete linkage)** to identify clusters of census tracts with similar exposure patterns.
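
The clustering step itself is not shown in this notebook. One possible sketch, assuming SciPy is available, feeds a symmetric divergence matrix to `scipy.cluster.hierarchy.linkage` with `method="complete"`. The 4 x 4 matrix `toy_delta` and the choice of two clusters are hypothetical values used only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical symmetric divergence matrix with a zero diagonal
toy_delta = np.array([
    [0.00, 0.01, 0.60, 0.65],
    [0.01, 0.00, 0.62, 0.63],
    [0.60, 0.62, 0.00, 0.02],
    [0.65, 0.63, 0.02, 0.00],
])

# linkage expects the condensed (upper-triangle) form of a distance matrix
condensed = squareform(toy_delta, checks=True)

# Complete linkage: the distance between two clusters is the largest pairwise distance
Z = linkage(condensed, method="complete")

# Cut the tree into at most two flat clusters (an arbitrary choice for this toy case)
labels = fcluster(Z, t=2, criterion="maxclust")
```

Here the first two rows end up in one cluster and the last two in the other. On real data the number of clusters, or the cut height, would need to be chosen from the dendrogram.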


In [2]:
import pandas as pd
import numpy as np

In [7]:
risk_flow_df = pd.read_csv("risk_flow_matrix.csv").set_index("GEOID")
risk_flow_df

GEOID,17091011700,17091011800,17119400951,17119400952,17135957500,17119401100,17119401500,17119401722,17189950200,17189950400,...,17037000900,17037001600,17037000500,17037001700,17037001900,17037000100,17037001500,17037000400,17037000300,17037000200
17091011700,0.000003,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
17091011800,0.000003,0.000026,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
17119400951,0.000000,0.000000,0.000150,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
17119400952,0.000000,0.000000,0.000022,0.000002,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
17135957500,0.000000,0.000000,0.000000,0.000000,0.000019,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17037000100,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000010,0.000000,0.0,0.0,0.000126,0.000002,0.000000,0.000000,0.000005
17037001500,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000002,0.000002,0.0,0.0,0.000000,0.000036,0.000000,0.000002,0.000000
17037000400,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000005,0.000007,0.0,0.0,0.000000,0.000013,0.000006,0.000007,0.000000
17037000300,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000002,0.0,0.0,0.000000,0.000000,0.000000,0.000009,0.000000


In [82]:
# df -> matrix(2d array)
risk_flow_matrix = risk_flow_df.values
risk_flow_matrix

array([[3.18887880e-06, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [3.18887880e-06, 2.55110304e-05, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 1.50270008e-04, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       ...,
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        5.77848849e-06, 6.93418619e-06, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 9.24558159e-06, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 2.31139540e-06, 2.08025586e-05]])

In [111]:
# set all 0s in the matrix to an extremely small value so that np.log stays finite
risk_flow_matrix[risk_flow_matrix == 0] = 10**-10

# divide each element of the risk flow matrix by its row sum
# to obtain the exposure matrix
v_matrix = risk_flow_matrix/risk_flow_matrix.sum(axis = 1, keepdims = True)
v_matrix

array([[0.00097104, 0.00030451, 0.00030451, ..., 0.00030451, 0.00030451,
        0.00030451],
       [0.00095767, 0.00766139, 0.00030032, ..., 0.00030032, 0.00030032,
        0.00030032],
       [0.00028979, 0.00028979, 0.04354661, ..., 0.00028979, 0.00028979,
        0.00028979],
       ...,
       [0.00027191, 0.00027191, 0.00027191, ..., 0.00157123, 0.00188547,
        0.00027191],
       [0.00031464, 0.00031464, 0.00031464, ..., 0.00031464, 0.00290907,
        0.00031464],
       [0.00031017, 0.00031017, 0.00031017, ..., 0.00031017, 0.00071693,
        0.00645233]])
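
A variant of the normalization step, sketched under assumptions: it works on an explicit copy of the data so the source DataFrame is left unchanged, and it keeps true zeros rather than replacing them with a small constant. `toy_df` is a hypothetical stand-in for `risk_flow_df`, and whether keeping zeros is appropriate depends on how the entropy is computed afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for risk_flow_df
toy_df = pd.DataFrame(
    [[2.0, 0.0, 2.0], [0.0, 1.0, 3.0]],
    index=["a", "b"],
    columns=["x", "y", "z"],
)

# Explicit copy: later in-place edits cannot leak back into toy_df
flows = toy_df.to_numpy(dtype=float, copy=True)

# Divide each row by its sum; rows that sum to zero are left as all zeros
row_sums = flows.sum(axis=1, keepdims=True)
v = np.divide(flows, row_sums, out=np.zeros_like(flows), where=row_sums > 0)
```

Each row of `v` then sums to one, matching the definition of the exposure vector.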

In [None]:
# v_matrix * np.log(v_matrix)

In [None]:
# log_v_matrix = np.log(exposure_v_matrix)
# log_v_matrix[np.isinf(log_v_matrix)] = -10
# log_v_matrix

In [None]:
# S = (-v_matrix * log_v_matrix).sum(axis = 1, keepdims = True)
# S

In [113]:
def S(vctor):
    '''
    Calculate the entropy of a vector.
    input: a vector with strictly positive entries
    output: a number
    '''
    log_vctor = np.log(vctor)
#     log_vctor[np.isinf(log_vctor)] = -10
    s = -np.sum(log_vctor * vctor)
#     if np.isnan(s) == True:
#         s = 10**-10
    return s

In [115]:
# s_i = []
# for j in range(3123):
#     s_i.append(S(v_matrix[j]))


In [116]:
def Delta(i, j):
    '''
    Calculate the Jensen-Shannon divergence between two exposure vectors.
    input: the indices i and j,
           which refer to the i-th and j-th rows of v_matrix (the exposure matrix)
    output: a number
    '''
    return S((v_matrix[i]+v_matrix[j])/2) - (S(v_matrix[i]) + S(v_matrix[j]))/2

In [117]:
# initialize the Delta_matrix with 0s
Delta_matrix = np.zeros([3123,3123])
Delta_matrix.shape

(3123, 3123)

In [124]:
# calculate each element using the Delta function
# (only the first 300 x 300 block is filled here; the rest stays 0)
for i in range(300):
    for j in range(300):
        Delta_matrix[i][j] = Delta(i, j)
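
The double loop above calls `Delta` once per pair and recomputes each row entropy many times. A possible faster sketch, using only NumPy, computes the row entropies once, vectorizes over the second index, and fills the lower triangle by symmetry. The names `row_entropy`, `js_matrix`, and `toy_v` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def row_entropy(p):
    """Entropy of each row of p, assuming the convention 0 * log(0) = 0."""
    logp = np.log(p, out=np.zeros_like(p), where=p > 0)  # log only where p > 0
    return -(p * logp).sum(axis=1)

def js_matrix(v):
    """Pairwise Jensen-Shannon divergence between the rows of v."""
    n = v.shape[0]
    h = row_entropy(v)                   # one entropy per row, computed once
    upper = np.zeros((n, n))
    for i in range(n):
        mix = (v[i] + v[i:]) / 2         # mixtures of row i with rows i..n-1
        upper[i, i:] = row_entropy(mix) - (h[i] + h[i:]) / 2
    upper = np.triu(upper, k=1)          # keep strict upper triangle, zero diagonal
    return upper + upper.T               # mirror, since the divergence is symmetric

# Hypothetical toy exposure matrix: each row sums to one
toy_v = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
toy_delta = js_matrix(toy_v)
```

For an exposure matrix with `n` rows this still needs an `n x n` result, but it replaces the inner Python loop with array operations and does roughly half the work.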

In [125]:
Delta_matrix

array([[0.        , 0.0063137 , 0.04093334, ..., 0.0612859 , 0.01671932,
        0.02010185],
       [0.0063137 , 0.        , 0.04482662, ..., 0.06501512, 0.02080158,
        0.02416113],
       [0.04093334, 0.04482662, 0.        , ..., 0.07547502, 0.03229838,
        0.03558486],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])