# TWIA Lab Exercise 8

## Submitted by: Rahul Bachhish (17CSU145)

Case Study:
In this case study, you will investigate homophily of several characteristics of individuals connected in social networks in rural India.

You will calculate the chance homophily for an arbitrary characteristic. Homophily is the proportion of edges in the network whose constituent nodes share that characteristic. How much homophily do we expect by chance? If characteristics are distributed completely randomly, the probability that two nodes x and y both have characteristic a is the frequency of a squared. The total probability that nodes x and y share a characteristic is therefore the sum of the squared frequencies of each characteristic in the network.
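Written as a formula, with $f_a$ denoting the relative frequency of characteristic $a$ in the network (notation introduced here only for illustration), the chance homophily is the sum of the squared frequencies. The worked case on the right assumes one third of the nodes have one value and two thirds have another:

```latex
P(\text{x and y share a characteristic}) = \sum_{a} f_a^{2},
\qquad
\text{e.g. } \left(\tfrac{1}{3}\right)^{2} + \left(\tfrac{2}{3}\right)^{2} = \tfrac{5}{9} \approx 0.556
```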

In [1]:
import pandas as pd
import numpy as np
from collections import Counter
import networkx as nx

In [2]:
df  = pd.read_stata("individual_characteristics.dta")
df1 = df.loc[df.village == 1]
df2 = df.loc[df.village == 2]
df1.head()

| Unnamed: 0 | village | adjmatrix_key | pid | hhid | resp_id | resp_gend | resp_status | age | religion | caste | ... | privategovt | work_outside | work_outside_freq | shgparticipate | shg_no | savings | savings_no | electioncard | rationcard | rationcard_colour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 100201 | 1002 | 1 | 1 | Head of Household | 38 | HINDUISM | OBC | ... | PRIVATE BUSINESS | Yes | 0.0 | No | | No | | Yes | Yes | GREEN |
| 1 | 1 | 6 | 100202 | 1002 | 2 | 2 | Spouse of Head of Household | 27 | HINDUISM | OBC | ... | | | | No | | No | | Yes | Yes | GREEN |
| 2 | 1 | 23 | 100601 | 1006 | 1 | 1 | Head of Household | 29 | HINDUISM | OBC | ... | OTHER LAND | No | | No | | No | | Yes | Yes | GREEN |
| 3 | 1 | 24 | 100602 | 1006 | 2 | 2 | Spouse of Head of Household | 24 | HINDUISM | OBC | ... | PRIVATE BUSINESS | No | | Yes | 1.0 | Yes | 1.0 | Yes | No | |
| 4 | 1 | 27 | 100701 | 1007 | 1 | 1 | Head of Household | 58 | HINDUISM | OBC | ... | OTHER LAND | No | | No | | No | | Yes | Yes | GREEN |


In [3]:
pid1 = pd.read_csv("village1_pid.csv",dtype=int, header = None)
pid2 = pd.read_csv("village2_pid.csv",dtype=int, header = None)

In [4]:
sex1      = df1.set_index("pid")["resp_gend"].to_dict()
caste1    = df1.set_index("pid")["caste"].to_dict()
religion1 = df1.set_index("pid")["religion"].to_dict()

sex2      = df2.set_index("pid")["resp_gend"].to_dict()
caste2    = df2.set_index("pid")["caste"].to_dict()
religion2 = df2.set_index("pid")["religion"].to_dict()

In [5]:
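def chance_homophily(chars):
    """Return the chance homophily: the sum of squared frequencies of each value in chars."""
    # Count how many individuals hold each distinct characteristic value.
    chars_counts = np.array(list(Counter(chars.values()).values()))
    # Convert counts to relative frequencies.
    chars_frequency = chars_counts / sum(chars_counts)
    return sum(chars_frequency**2)

# Small example: one person prefers red, two prefer blue.
favorite_colors = {
    "ankit":  "red",
    "xiaoyu": "blue",
    "mary":   "blue"
}

color_homophily = chance_homophily(favorite_colors)
print(color_homophily)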

0.5555555555555556
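The printed value can be sanity-checked by simulation. The sketch below is not part of the original exercise: the function name `simulated_chance_homophily`, the number of draws, and the seed are all illustrative assumptions. It draws two individuals independently (with replacement), which is the assumption behind the sum-of-squared-frequencies formula, and compares the simulated share of matching pairs with the exact value.

```python
import random
from collections import Counter

def simulated_chance_homophily(chars, num_pairs=100_000, seed=0):
    """Estimate the chance that two independently drawn individuals share a value."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = list(chars.values())
    same = 0
    for _ in range(num_pairs):
        # Two independent draws with replacement, as the formula assumes.
        if rng.choice(values) == rng.choice(values):
            same += 1
    return same / num_pairs

favorite_colors = {"ankit": "red", "xiaoyu": "blue", "mary": "blue"}

# Exact value: sum of squared relative frequencies.
counts = Counter(favorite_colors.values())
total = sum(counts.values())
exact = sum((c / total) ** 2 for c in counts.values())

estimate = simulated_chance_homophily(favorite_colors)
print("exact:", exact)
print("simulated:", estimate)
```

With enough draws the simulated share should settle close to the exact value of 5/9.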


In [6]:
print("Village 1 chance of same sex:", chance_homophily(sex1))
print("Village 2 chance of same sex:", chance_homophily(sex2))
print("Village 1 chance of same caste:", chance_homophily(caste1))
print("Village 2 chance of same caste:", chance_homophily(caste2))
print("Village 1 chance of same religion:", chance_homophily(religion1))
print("Village 2 chance of same religion:", chance_homophily(religion2))

Village 1 chance of same sex: 0.5027299861680701
Village 2 chance of same sex: 0.5005945303210464
Village 1 chance of same caste: 0.6741488509791551
Village 2 chance of same caste: 0.425368244800893
Village 1 chance of same religion: 0.9804896988521925
Village 2 chance of same religion: 1.0


In [7]:
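# Load each village's adjacency matrix (rows and columns are individuals).
A1 = np.loadtxt("village1_relationships.csv", delimiter=",")
A2 = np.loadtxt("village2_relationships.csv", delimiter=",")

# Build an undirected graph from each adjacency matrix.
G1 = nx.to_networkx_graph(A1)
G2 = nx.to_networkx_graph(A2)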

In [8]:
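def homophily(G, chars, IDs):
    """Return the proportion of edges in G whose endpoints share a characteristic.

    chars maps a person ID to a characteristic; IDs[0][n] is the person ID of node n.
    """
    num_same_ties, num_ties = 0, 0
    # Each undirected edge is visited once; self-loops are skipped.
    for n1, n2 in G.edges():
        if n1 == n2:
            continue
        id1, id2 = IDs[0][n1], IDs[0][n2]
        # Only count ties where both individuals have a recorded characteristic.
        if id1 in chars and id2 in chars:
            num_ties += 1
            if chars[id1] == chars[id2]:
                num_same_ties += 1
    return num_same_ties / num_ties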
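To see the same counting on a case small enough to check by hand, here is a minimal sketch that works on a plain edge list instead of a networkx graph and an ID table. The function `edge_list_homophily` and the `friendships` list are hypothetical and introduced only for illustration; the color data reuses the toy example from the chance-homophily cell.

```python
def edge_list_homophily(edges, chars):
    """Proportion of edges whose two endpoints share a characteristic."""
    num_same_ties, num_ties = 0, 0
    for a, b in edges:
        # Skip self-loops and ties where either endpoint has no recorded value.
        if a == b or a not in chars or b not in chars:
            continue
        num_ties += 1
        if chars[a] == chars[b]:
            num_same_ties += 1
    return num_same_ties / num_ties

favorite_colors = {"ankit": "red", "xiaoyu": "blue", "mary": "blue"}
# Hypothetical friendships: every pair of the three people is connected.
friendships = [("ankit", "xiaoyu"), ("xiaoyu", "mary"), ("ankit", "mary")]

toy_homophily = edge_list_homophily(friendships, favorite_colors)
print(toy_homophily)
```

Only one of the three ties (xiaoyu and mary) joins two people with the same color, so the observed proportion here is one third.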

In [9]:
print("Village 1 observed proportion of same sex:", homophily(G1, sex1, pid1))
print("Village 2 observed proportion of same sex:", homophily(G2, sex2, pid2))
print("Village 1 observed proportion of same caste:", homophily(G1, caste1, pid1))
print("Village 2 observed proportion of same caste:", homophily(G2, caste2, pid2))
print("Village 1 observed proportion of same religion:", homophily(G1, religion1, pid1))
print("Village 2 observed proportion of same religion:", homophily(G2, religion2, pid2))

Village 1 observed proportion of same sex: 0.5879345603271984
Village 2 observed proportion of same sex: 0.5622435020519836
Village 1 observed proportion of same caste: 0.7944785276073619
Village 2 observed proportion of same caste: 0.826265389876881
Village 1 observed proportion of same religion: 0.99079754601227
Village 2 observed proportion of same religion: 1.0
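
The observed proportions can be set against the chance values printed earlier. The following is a minimal sketch, not part of the original notebook: the dictionaries `chance` and `observed` hold the printed values rounded to four decimals, and using a simple difference as the comparison is an illustrative choice.

```python
# Values copied from the printed outputs above, rounded to four decimals.
chance = {
    ("village 1", "sex"): 0.5027,
    ("village 2", "sex"): 0.5006,
    ("village 1", "caste"): 0.6741,
    ("village 2", "caste"): 0.4254,
    ("village 1", "religion"): 0.9805,
    ("village 2", "religion"): 1.0,
}
observed = {
    ("village 1", "sex"): 0.5879,
    ("village 2", "sex"): 0.5622,
    ("village 1", "caste"): 0.7945,
    ("village 2", "caste"): 0.8263,
    ("village 1", "religion"): 0.9908,
    ("village 2", "religion"): 1.0,
}

# Positive differences mean more same-characteristic ties than chance predicts.
for village, characteristic in chance:
    diff = observed[(village, characteristic)] - chance[(village, characteristic)]
    print(f"{village} {characteristic}: observed - chance = {diff:+.4f}")
```

In these numbers the observed proportion is at least as large as the chance value for every characteristic, with the largest gap for caste in village 2.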
