# Summary
This page contains two sections of code. The first section implements equation 5 of the original paper in two different ways, in an attempt to replicate Mr. Jackson's results. This section also contains the code that reads the demographic, network, and microfinance participation data.

The second section runs simulations to find the parameters of a contagion model that most closely reproduces the behavior observed in the Indian villages. Mr. Jackson does not go into detail about how he designed his model or simulation, but on page 8 of his paper he mentions that his model only had to go through 3 to 6 iterations before returning results close to real life. He also estimated that the probability of a household transmitting information about microfinancing to another household depends heavily on whether the transmitting household itself participated in microfinancing. His calculated probabilities were 0.55 per period for participating households and 0.05 per period for non-participating households. The model and simulation I built take many more possible factors into account; details can be found below.

## Equation 5 Reimplementation

In [1]:
import numpy as np
import pandas as pd

In [2]:
listofadj = []
listofpeerpressure = []
listofparticipation = []
for i in [1,2,3,4,6,9,10,12,15,19,20,21,23,24,25,28,29,31,32,33,36,37,39,41,42,43,45,46,47,48,50,51,52,55,57,59,60,62,64,65,66,67,68,70,71,72,73,75,77]:
    HHId = pd.read_csv('1. Network Data/Adjacency Matrix Keys/key_HH_vilno_' + str(i) + '.csv', header=None)
    adj = pd.read_csv('1. Network Data/Adjacency Matrices/adj_allVillageRelationships_HH_vilno_' + str(i) + '.csv', header=None)
    MF = pd.read_csv('3. Microfinance Participation/MF' + str(i) + '.csv', header=None)
    
    HHId[0] = HHId[0]+(i*1000)
    adj.columns=HHId[0].tolist()
    adj.index=HHId[0].tolist()
    MF.index=HHId[0].tolist()
    MF.index.name = 'hhid'
    MF.columns = ['Got Microfinance']
    
    subset = MF[MF['Got Microfinance'] == 1].index.tolist()
    subset = adj[subset]
    peerpressure = pd.DataFrame(subset.sum(axis=1), columns=['Friends Borrowed'])
    
    listofadj.append(adj)
    listofpeerpressure.append(peerpressure)
    listofparticipation.append(MF)
    
peerpressureall = pd.concat(listofpeerpressure)
testy = pd.concat(listofparticipation)
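To make the `Friends Borrowed` column concrete, here is a minimal sketch on a made-up four-household village. The matrix and participation values are illustrative assumptions, not taken from the dataset. It mirrors the column-subset-and-sum approach used in the loop above and shows that the result equals the adjacency matrix multiplied by the 0/1 participation vector.

```python
import numpy as np

# Hypothetical 4-household village (values invented for illustration).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])
got_mf = np.array([1, 0, 0, 1])  # households 0 and 3 took a loan

# Same idea as the loop above: keep only the borrowers' columns, then sum each row.
friends_borrowed = adj[:, got_mf == 1].sum(axis=1)

# Equivalent matrix form: adjacency matrix times the participation vector.
friends_borrowed_matmul = adj @ got_mf

print(friends_borrowed)         # number of borrowing neighbours per household
print(friends_borrowed_matmul)  # same values
```

Households 1 and 2 each have two borrowing neighbours, while the borrowers themselves have none in this toy graph.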

In [3]:
Demographics = pd.read_csv('2. Demographics and Outcomes/household_characteristics.csv')
Demographics.set_index('hhid', inplace=True)

hohreligion = pd.get_dummies(Demographics['hohreligion'])
castesubcaste = pd.get_dummies(Demographics['castesubcaste'])
electricity = pd.get_dummies(Demographics['electricity'])
latrine = pd.get_dummies(Demographics['latrine'])
ownrent = pd.get_dummies(Demographics['ownrent'])

Demographics.drop(columns=['Unnamed: 0','village', 'adjmatrix_key', 'HHnum_in_village', 'rooftypeoth', 'hohreligion', 'castesubcaste', 'electricity', 'latrine', 'ownrent'], inplace=True)
Demographics = Demographics.join(hohreligion).join(castesubcaste).join(electricity).join(latrine).join(ownrent)

In [4]:
testx = pd.merge(peerpressureall, Demographics, left_index=True, right_index=True, how='inner')
testx

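| hhid | Friends Borrowed | rooftype1 | rooftype2 | rooftype3 | rooftype4 | rooftype5 | room_no | bed_no | hhSurveyed | leader | ... | Common | None | Owned | 0.0 | 6.0 | GIVEN BY GOVERNMENT | LEASED | OWNED | OWNED BUT SHARED | RENTED |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001 | 3 | 0 | 1 | 0 | 0 | 0 | 3 | 4 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1002 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1003 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 4 | 0 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1004 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 6 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1005 | 4 | 0 | 1 | 0 | 0 | 0 | 3 | 4 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1006 | 4 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1007 | 4 | 0 | 0 | 1 | 0 | 0 | 3 | 5 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1008 | 4 | 0 | 0 | 0 | 1 | 0 | 2 | 1 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1009 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 7 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1010 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |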


In [5]:
from sklearn import linear_model as bestfit
model = bestfit.Lasso(alpha=0.1)
model.fit(testx,testy)
print(model.coef_)

[ 0.02170964  0.          0.         -0.          0.         -0.
 -0.         -0.         -0.          0.          0.         -0.
  0.         -0.          0.         -0.          0.          0.
  0.          0.         -0.         -0.          0.         -0.
  0.         -0.          0.          0.         -0.          0.
  0.        ]


In [6]:
listofvillagex = []
listofvillagey = []
for i in [1,2,3,4,6,9,10,12,15,19,20,21,23,24,25,28,29,31,32,33,36,37,39,41,42,43,45,46,47,48,50,51,52,55,57,59,60,62,64,65,66,67,68,70,71,72,73,75,77]:
    villagex = testx.loc[((i*1000) < testx.index) & (testx.index < (((i+1)*1000)-1))]
    villagey = testy.loc[((i*1000) < testy.index) & (testy.index < (((i+1)*1000)-1))]
    meanvillagex = pd.DataFrame(np.mean(villagex, axis=0),columns = [i])
    meanvillagey = pd.DataFrame(np.mean(villagey, axis=0),columns = [i])
    listofvillagex.append(meanvillagex)
    listofvillagey.append(meanvillagey)
testx2 = pd.concat(listofvillagex, axis=1)
testy2 = pd.concat(listofvillagey, axis=1)

In [7]:
testy2 = np.log(testy2 / (1-testy2))
testy2

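| | 1 | 2 | 3 | 4 | 6 | 9 | 10 | 12 | 15 | 19 | ... | 65 | 66 | 67 | 68 | 70 | 71 | 72 | 73 | 75 | 77 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Got Microfinance | -1.203973 | -1.704748 | -1.899748 | -2.569464 | -1.269761 | -1.524881 | -2.154665 | -0.806806 | -2.021548 | -1.349927 | ... | -1.597365 | -1.93968 | -0.869604 | -1.784155 | -1.356081 | -1.554088 | -1.715386 | -1.568616 | -1.193922 | -2.233592 |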
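The transform in cell 7 is the log-odds (logit) of each village's mean participation rate. As a small sketch, the first three values from the output above can be mapped back to rates with the inverse transform. The helper names `logit` and `inv_logit` are mine, and the clipping bound `eps` is an added assumption that guards against rates of exactly 0 or 1, which would otherwise produce infinite log-odds.

```python
import numpy as np

def logit(p, eps=1e-9):
    """Log-odds of a rate; eps is an assumed guard against p == 0 or p == 1."""
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def inv_logit(z):
    """Inverse of logit: maps log-odds back to a rate in (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Log-odds for villages 1, 2 and 3, copied from the output above.
log_odds = np.array([-1.203973, -1.704748, -1.899748])
rates = inv_logit(log_odds)
print(rates.round(3))
```

The recovered rates come out at roughly 23%, 15%, and 13% for villages 1, 2, and 3.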


In [8]:
model2 = bestfit.Lasso(alpha=0.1)
model2.fit(testx2.T,testy2.T)
print(model2.coef_)

[ 0.47468105  0.         -0.          0.         -0.         -0.
 -0.         -0.         -0.         -0.          0.         -0.
  0.         -0.          0.         -0.         -0.         -0.
  0.          0.         -0.         -0.         -0.          0.
  0.          0.         -0.         -0.          0.          0.
  0.        ]


## Contagion Model and Simulation

Here is the mathematical overview of how the model and simulation are designed: <br>
[A hxh] : Adjacency Matrix of Households, with dimensions hxh where h = # of households in village. Note that this matrix is of the format sender x receiver. It contains only 0 and 1 and is supplied by the dataset. <br>
[K 1xh]i : Knowledge Matrix, showing which households have knowledge of microfinancing at iteration i. At the first iteration, this is the leader matrix. The initial matrix contains only 1 or 0, but during the iterations it can contain any value between 0 and 1, representing the probability that a household has knowledge of microfinancing. Values must never decrease from one iteration to the next. <br>
[P 1xh]i : Participation Matrix, showing the probability of a specific household participating in microfinancing at iteration i. Initialized as all zeros. <br>
[D hx31]i : Demographics Matrix, showing the demographics of each household. Note that "Friends that Borrowed" and "Participation" change after each iteration. These are, in that order, the last two columns of the matrix.<br>
[A' hxh]i : Weighted Adjacency Matrix of Household at iteration i. Represents the strength of connection.<br>
[W1 31x1] : Parameter Matrix for Connections [information transfer]<br>
b1 : Parameter for Connections Constant<br>
[W2 31x1] : Parameter Matrix for Nodes [conversion]<br>
b2 : Parameter for Nodes Constant<br>
<br>
Each round goes through the following steps. <br>
<br>
1) [D hx31]i built using [P 1xh]i-- <br>
<br>
2) [A' hxh]i is calculated from [A hxh] and [D hx31]i. For each 1 in the matrix [A hxh], the weighted value is calculated by:<br>
[step1 1x31] = [Di-vector Of Sender - Di-vector Of Receiver] <br>
A'(sender,receiver) = sigmoid([step1 1x31] * [W1 31x1] + b1) #This is the weight of each connection based on demographic factors. This calculation is done on the fully connected graph.<br>
A' = A' *elementmultiplication A #This takes into account the overall graph structure. If there is no connection, the weight is set to zero. Thus the A' matrix is no longer a fully connected graph.<br>
<br>
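3) Calculate [K 1xh]i : <br>
step1(receiver) = 1 - product over all senders of ( 1 - A'(sender,receiver)i * K(sender)i-- ) #[step1 1xh] is the probability that each receiver hears about microfinancing from at least one sender during this round. This matches the loop in the code.<br>
[K 1xh]i = 1 - ( ( 1 - [K 1xh]i-- ) *elementmultiplication ( 1 - [step1 1xh] ) )<br>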
<br>
4) [P 1xh]i is calculated: <br>
[step1 hx1] = sigmoid([D hx31]i * [W2 31x1] + b2) #This is the probability of a household taking out a loan given that they have the information.<br>
[step1 1xh] = transpose([step1 hx1])<br>
[step2 1xh] = [K 1xh]i *elementmultiplication [step1 1xh]<br>
[P 1xh]i = 1 - ( ( 1 - [P 1xh]i-- ) *elementmultiplication ( 1 - [step2 1xh] ) )<br>
<br>
After each round, sum the values of [P 1xh]. As soon as the sum reaches the real number of participating households in the village, stop the iterations and compare the final [P 1xh] matrix with the true participation values. The code also stops after a 10 second timeout.
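Steps 3 and 4 can be traced by hand on a tiny example. The sketch below uses a made-up three-household graph with an already-weighted `Aprime` and an assumed constant conversion probability of 0.2 in place of the Step 4 sigmoid output. All numbers are illustrative and not estimated from the data.

```python
import numpy as np

# Hypothetical weighted adjacency (sender x receiver): 0 -> 1, 0 -> 2, 1 -> 2.
Aprime = np.array([[0.0, 0.5, 0.5],
                   [0.0, 0.0, 0.5],
                   [0.0, 0.0, 0.0]])
K = np.array([[1.0, 0.5, 0.0]])        # household 0 is a leader, 1 is half-informed
P = np.zeros((1, 3))                   # nobody participates yet
convert = np.array([[0.2, 0.2, 0.2]])  # assumed stand-in for sigmoid(D * W2 + b2)

# Step 3: probability that each receiver hears from at least one sender,
# then combine with what the receiver already knew.
heard = 1 - np.prod(1 - Aprime * K.T, axis=0, keepdims=True)
K = 1 - (1 - K) * (1 - heard)

# Step 4: participation requires knowledge and conversion.
P = 1 - (1 - P) * (1 - K * convert)

print(K)  # knowledge never decreases: [[1.0, 0.75, 0.625]]
print(P)  # [[0.2, 0.15, 0.125]]
```

Household 2 starts with no knowledge but can hear from both households 0 and 1, so its knowledge after one round is 1 - (1 - 0.5)(1 - 0.25) = 0.625.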

In [154]:
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

#The Values of The Parameters According to the Paper. If you wish to set your own, comment these out.
#W1 = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.2]]).T
#b1 = -3
#W2 = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2, 0]]).T
#b2 = -1

#Experiment with your own parameters. If you wish to use the parameters as reported in the paper, comment these out.
W1 = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]).T
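b1 = -3  # NOTE: left blank in the original; the paper-derived value from above is used only as a placeholder so the cell runs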
W2 = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2, 0]]).T
b2 = -1

#Pick a Village, Simulation Can Run on Only One at A Time
village = 1
villagex = testx.loc[((village*1000) < testx.index) & (testx.index < (((village+1)*1000)-1))]
villagey = testy.loc[((village*1000) < testy.index) & (testy.index < (((village+1)*1000)-1))]

A = pd.read_csv('1. Network Data/Adjacency Matrices/adj_allVillageRelationships_HH_vilno_' + str(village) + '.csv', header=None).values
K = np.array([villagex['leader'].values])
P = np.zeros(K.shape)
villagex = villagex.drop(['leader', 'Friends Borrowed'], axis=1) #all the constant values of D matrix
totalMF = villagey.sum().values[0]
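As a quick check, the commented-out paper parameters can be compared with the probabilities quoted in the summary (0.55 and 0.05). The sketch assumes, as the overview states, that the last column of D is "Participation", so the last entry of W1 multiplies the sender-minus-receiver participation difference. The variable names below are mine.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

b1 = -3.0
w_participation = 3.2  # last entry of the commented-out W1

# Participation difference 0: neither household (or both) participates.
p_base = sigmoid(w_participation * 0 + b1)
# Participation difference 1: the sender participates and the receiver does not.
p_sender_participates = sigmoid(w_participation * 1 + b1)

print(round(p_base, 3), round(p_sender_participates, 3))  # 0.047 0.55
```

Under these assumptions the two values land close to the paper's 0.05 and 0.55. Because the weight depends on the difference, two participating households also fall back to the lower value.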

In [155]:
import time
timeout = time.time() + 10   # 10 second time out
rounds = 0
while (P.sum() < totalMF and time.time() < timeout):
    rounds = rounds + 1
    #Step 1: [D hx31]i built using [P 1xh]i--
    D = villagex.copy()
    D['Friends that Borrowed'] = np.matmul(P,A).T
    D['Participation'] = P.T
    D = D.values
    
    #Step 2: [A' hxh]i is calculated from [A hxh] and [D hx31]i
    Aprime = np.zeros(A.shape)
    for i in range(len(Aprime)):
        for j in range(len(Aprime[i])):
            step1 = D[i] - D[j]
            # .item() turns the 1-element matmul result into a plain float for math.exp
            Aprime[i][j] = sigmoid((np.matmul(step1, W1) + b1).item())
    Aprime = Aprime * A
    
    #Step 3: Calculate [K 1xh]i
    step1 = np.zeros(K.shape)
    for i in range(len(Aprime)):
        for j in range(len(Aprime[i])):
            temp = Aprime[i][j] * K[0][i]
            step1[0][j] = (1 - ((1-temp)*(1-step1[0][j])))
    K = (1 - ((1-K) * (1-step1)))
    
    #Step 4: Calculate [P 1xh]i
    # ravel() flattens the h x 1 result so sigmoid receives plain scalars
    step1[0] = np.fromiter(map(sigmoid, (np.matmul(D, W2) + b2).ravel()), dtype=np.double)
    step2 = step1 * K
    P = (1 - ((1-P) * (1-step2)))
delta = villagey.values.T - P
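The double loop in Step 2 of the cell above is the slowest part of each round. Because the weight depends on the sender-minus-receiver difference only through a linear score, it can be computed with broadcasting instead. The sketch below checks this on random toy data: the sizes, the random seed, and the values of `W1` and `b1` are arbitrary assumptions, and the NumPy-based `sigmoid` is a stand-in for the `math.exp` version defined earlier.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
h, d = 5, 31                                   # toy village size; d matches the 31 columns of D
D = rng.random((h, d))
A = (rng.random((h, h)) < 0.4).astype(float)   # random 0/1 adjacency
W1 = rng.normal(size=(d, 1))
b1 = -3.0

# Loop version, following the structure of the cell above.
loop = np.zeros((h, h))
for i in range(h):
    for j in range(h):
        loop[i, j] = sigmoid(((D[i] - D[j]) @ W1).item() + b1)
loop = loop * A

# Broadcast version: (D[i] - D[j]) @ W1 equals score[i] - score[j].
score = D @ W1                                 # h x 1
vectorized = sigmoid(score - score.T + b1) * A

print(np.allclose(loop, vectorized))
```

This computes the h scores once rather than h*h difference vectors, which should matter for the larger villages.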

In [156]:
def selection_sort(x):
    for i in range(len(x)):
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
    return x

#selection_sort(delta[0])
np.sqrt((delta[0]**2).mean())

0.4258407494906782
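One yardstick for reading this RMSE is a constant predictor that assigns every household the village's mean participation rate. The sketch below derives that baseline for village 1 from the log-odds printed in cell 7 (-1.203973). It assumes that value is computed over the same set of households as `villagey` here.

```python
import math

log_odds_village_1 = -1.203973  # village 1 entry of the testy2 output
rate = 1 / (1 + math.exp(-log_odds_village_1))

# For 0/1 outcomes with mean `rate`, predicting `rate` everywhere gives
# MSE = rate*(1-rate)**2 + (1-rate)*rate**2 = rate*(1-rate).
baseline_rmse = math.sqrt(rate * (1 - rate))

print(round(rate, 4), round(baseline_rmse, 4))  # 0.2308 0.4213
```

Under that assumption, the simulated RMSE of about 0.426 is in the same range as the constant baseline of about 0.421, which suggests the parameters used for this run add little household-level information.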

## Map Clustering

In [158]:
import sklearn.cluster as cluster

In [198]:
def count_list(arr):
    result = {}
    for i in set(arr):
        result[i] = arr.count(i)
    return result

participation_rate = []
leadership_rate = []

for village in [1,2,3,4,6,9,10,12,15,19,20,21,23,24,25,28,29,31,32,33,36,37,39,41,42,43,45,46,47,48,50,51,52,55,57,59,60,62,64,65,66,67,68,70,71,72,73,75,77]:    
    A = pd.read_csv('1. Network Data/Adjacency Matrices/adj_allVillageRelationships_HH_vilno_' + str(village) + '.csv', header=None).values
    villagex = testx.loc[((village*1000) < testx.index) & (testx.index < (((village+1)*1000)-1))]
    villagey = testy.loc[((village*1000) < testy.index) & (testy.index < (((village+1)*1000)-1))]
    leaders = np.array([villagex['leader'].values])

    clusterid = cluster.spectral_clustering(A)
    cluster_of_borrowers = ((clusterid+1)*villagey.values.T)-1
    cluster_of_leaders = ((clusterid+1)*leaders)-1

    denominator = count_list(clusterid.tolist())
    numerator = count_list(cluster_of_borrowers.tolist()[0])
    count_of_leaders = count_list(cluster_of_leaders.tolist()[0])
    for i in range (0, len(denominator)):
        participation_rate.append(numerator.get(i,0) / denominator[i])
        leadership_rate.append(count_of_leaders.get(i,0) / denominator[i])
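The per-cluster rates in the loop above rely on shifting the cluster ids by one so that non-borrowers and non-leaders map to -1. As an alternative sketch, `np.bincount` with `weights` gives the same per-cluster counts directly. The labels and flags below are made up for illustration, and the division assumes every cluster id from 0 to the maximum has at least one member.

```python
import numpy as np

clusterid = np.array([0, 0, 1, 1, 1, 2])  # hypothetical cluster labels
borrowed = np.array([1, 0, 0, 1, 1, 0])   # hypothetical 0/1 participation
leaders = np.array([0, 1, 0, 0, 1, 0])    # hypothetical 0/1 leader flags

n_clusters = clusterid.max() + 1
size = np.bincount(clusterid, minlength=n_clusters)

# Weighted bincount sums the 0/1 flags within each cluster.
participation = np.bincount(clusterid, weights=borrowed, minlength=n_clusters) / size
leadership = np.bincount(clusterid, weights=leaders, minlength=n_clusters) / size

print(participation)  # share of borrowers per cluster
print(leadership)     # share of leaders per cluster
```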





In [203]:
regression_dataframe = pd.DataFrame([participation_rate,leadership_rate]).T

In [206]:
regression_dataframe.corr(method="pearson")

| | 0 | 1 |
|---|---|---|
| 0 | 1.0 | 0.055047 |
| 1 | 0.055047 | 1.0 |
