# Tutorial 28 February 2024

## Personal networks, ego networks

#### We use the data collected by R. Vacca: the personal networks of 102 Sri Lankan immigrants living in Milan, Italy. The original data are available at https://github.com/raffaelevacca/egocentric-r-book and a rich tutorial is available at https://raffaelevacca.github.io/egocentric-r-book/

In [None]:
########### Preparation ##############
# import packages
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [None]:
# use pandas to import list of egos with attributes
Egos = pd.read_csv('Ego.attr.csv', sep=';')
print(Egos)
# we focus on Ego n. 28 (the first in the list)

In [None]:
# use pandas to import list of alters of Ego 28, with their attributes
Alters28 = pd.read_csv('Alter.attr.28.csv', sep=';')
print(Alters28)

In [None]:
# use pandas to import edgelist (of Ego 28) as a table
EgoEdges28 = pd.read_table('EgoNet.28.alone.csv', sep=';')
EgoEdges28

In [None]:
# generate graph from pandas edgelist
# it is an undirected graph

EgoNet28 = nx.from_pandas_edgelist(EgoEdges28, 'V1', 'V2', create_using=nx.Graph()) 
print(EgoNet28)
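To see what `nx.from_pandas_edgelist` does with each row, here is a minimal sketch on an invented edgelist that uses the same column names (`V1`, `V2`) as the original file. The alter IDs and the `toy_edges`/`toy_net` names are hypothetical. In particular, in an undirected `nx.Graph` a row listing the same pair in reverse order does not create a second tie.

```python
import pandas as pd
import networkx as nx

# Toy edgelist with the same column names as the original file (V1, V2);
# the alter IDs are invented. Row (2, 1) repeats the tie in row (1, 2).
toy_edges = pd.DataFrame({'V1': [1, 2, 1, 2], 'V2': [2, 1, 3, 3]})
toy_net = nx.from_pandas_edgelist(toy_edges, 'V1', 'V2', create_using=nx.Graph())

# In an undirected nx.Graph the repeated tie is stored only once
print(toy_net.number_of_nodes(), toy_net.number_of_edges())  # 3 3
print(sorted(toy_net.edges()))
```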

In [None]:
# match nodes of graph with attributes of nodes
NodeData = Alters28.set_index('alter_ID').to_dict('index').items()
EgoNet28.add_nodes_from(NodeData)

# view results
print(EgoNet28.nodes(data=True))
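A small sketch of the matching step, on invented data: `to_dict('index')` gives a dictionary `{alter_ID: {column: value}}`, and `.items()` turns it into `(node, attribute_dict)` pairs, which `add_nodes_from` accepts. Existing nodes get their attributes updated. Alters that appear in the attribute table but in no edge are added as nodes without ties, which is one way isolates can show up later. This only works if the IDs have the same type in both files: NetworkX treats `1` and `'1'` as different nodes. The toy values for `alter.sex` are made up.

```python
import pandas as pd
import networkx as nx

# Toy alter table; 'alter_ID' and 'alter.sex' mirror the original column names,
# the values are invented
toy_alters = pd.DataFrame({'alter_ID': [1, 2, 3, 4],
                           'alter.sex': ['Male', 'Female', 'Male', 'Male']})
toy_net = nx.Graph([(1, 2), (2, 3)])  # alter 4 has no ties to other alters

# (node, attribute_dict) pairs: updates nodes 1-3, adds node 4 as a new node
toy_net.add_nodes_from(toy_alters.set_index('alter_ID').to_dict('index').items())

print(toy_net.nodes[4])            # {'alter.sex': 'Male'}
print(list(nx.isolates(toy_net)))  # [4]
```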

In [None]:
# view edges
print(EgoNet28.edges(data=True))

In [None]:
# Draw graph
nx.draw(EgoNet28)
plt.show()

## Personal network composition indicators

In [None]:
# Is the network of Ego 28 homogeneous by age?
# let's first look at the age of Ego 28 (the first in the table):
Egos['ego.age'].iloc[0]
# Ego 28 turns out to be 61 years old

In [None]:
# Let's now look at the age of the alters
# For a numerical variable such as age, we can compute the mean, variance, standard deviation, quartiles, median...
print(Alters28['alter.age'].mean())
print(Alters28['alter.age'].var())
print(Alters28['alter.age'].std())
print(Alters28['alter.age'].quantile([0.25, 0.5, 0.75]))
# Ego 28's contacts are a bit younger, though with some variation
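The same summary statistics can be obtained in one call with `describe()`. Note that pandas' `var()` and `std()` use the sample formula (`ddof=1`) by default. The sketch below uses invented ages, and the last line (share of alters younger than ego) is just one possible way to quantify "a bit younger", not a measure from the original tutorial.

```python
import pandas as pd

# Invented ages standing in for Alters28['alter.age']
ages = pd.Series([30, 45, 52, 61, 38])
print(ages.describe())               # count, mean, std, min, quartiles, max
print(ages.var(), ages.var(ddof=0))  # sample (pandas default) vs population variance

# One possible way to relate alters to ego: share of alters younger than ego
ego_age = 61  # Ego 28's age as reported above
print((ages < ego_age).mean())       # 0.8
```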

In [None]:
# Is the network of Ego 28 homogeneous by gender?
# Let's first look at ego's gender:
Egos['ego.sex'].iloc[0]
# Ego turns out to be a man

In [None]:
# Let's now look at his alters
# For a binary variable like sex (well, at least in this case!),
# we can take proportion/percentage of women in network:
Alters28['alter.sex'].value_counts(normalize=True)*100
# Ego 28, a man, is mostly surrounded by men
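With `normalize=True`, `value_counts` divides by the number of non-missing values, so missing answers silently drop out of the percentages. A minimal sketch on invented values shows the difference made by `dropna=False`, which counts missing values as their own category.

```python
import pandas as pd

# Invented values standing in for Alters28['alter.sex']; None marks a missing answer
sex = pd.Series(['Male', 'Male', 'Female', 'Male', None])
print(sex.value_counts(normalize=True) * 100)                # Male 75.0, Female 25.0
print(sex.value_counts(normalize=True, dropna=False) * 100)  # missing values form a category
```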

#### Do the same for one of the other binary variables, for example alter.nat (country of birth/nationality) or alter.fam (whether the alter is a family member).


In [None]:
## Blau index (for a categorical variable with more than 2 categories)

# recall that it equals 1 - p1^2 - p2^2 - ... - pk^2, where pi is the proportion of alters in category i

# first create a function
def blau(df, col):
    return (1- ((df[col].value_counts() / df[col].count()) ** 2).sum())

# then apply it to the 'alters' table of ego 28, for example with attribute 'alter.age.cat' (= age category)
blau(Alters28, 'alter.age.cat')
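To check the formula by hand, here is a worked example with the same `blau` function on four invented alters: two in one age category and one in each of two others, so the proportions are 1/2, 1/4 and 1/4. The category labels are made up. A network where all alters share one category gets a Blau index of 0.

```python
import pandas as pd

def blau(df, col):
    # same function as above: 1 minus the sum of squared category proportions
    return 1 - ((df[col].value_counts() / df[col].count()) ** 2).sum()

# Invented age categories for 4 alters: proportions 2/4, 1/4, 1/4
toy = pd.DataFrame({'alter.age.cat': ['26-35', '26-35', '36-45', '46-55']})
print(blau(toy, 'alter.age.cat'))  # 1 - (0.25 + 0.0625 + 0.0625) = 0.625

# No diversity at all: every alter in the same category
print(blau(pd.DataFrame({'c': ['x', 'x', 'x']}), 'c'))  # 0.0
```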

In [None]:
# Now, Index of Qualitative Variation
# which is a normalized version of Blau
# It is equal to Blau * k/(k-1), where k is the number of categories
# To see how many and which categories are present in a categorical variable:
print (Alters28['alter.age.cat'].unique())

In [None]:
# There are 7 categories! (if nan appears in the list above, it marks missing values, not a category)
# We can now calculate the IQV

blau(Alters28, 'alter.age.cat')*7/6
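Hard-coding `7/6` ties the cell to this particular output. A minimal sketch of an `iqv` helper (a hypothetical name, not part of the original tutorial) that takes k from the data with `nunique()`, which, unlike `unique()`, does not count missing values. Whether k should be the number of categories *observed* in this network or the number of *possible* categories is a choice; the optional `k` argument allows the latter. Returning NaN for a single category is an assumed convention, since k/(k-1) is undefined there.

```python
import numpy as np
import pandas as pd

def blau(df, col):
    # 1 minus the sum of squared category proportions (missing values ignored)
    return 1 - (df[col].value_counts(normalize=True) ** 2).sum()

def iqv(df, col, k=None):
    # k defaults to the number of distinct non-missing values (nunique drops NaN,
    # unlike unique()); pass k explicitly to count categories absent from this network
    if k is None:
        k = df[col].nunique()
    if k < 2:
        return np.nan  # assumed convention: k/(k-1) is undefined with one category
    return blau(df, col) * k / (k - 1)

toy = pd.DataFrame({'alter.age.cat': ['26-35', '26-35', '36-45', '46-55', None]})
print(iqv(toy, 'alter.age.cat'))       # k = 3: 0.625 * 3/2 = 0.9375
print(iqv(toy, 'alter.age.cat', k=7))  # k fixed at 7, as in the cell above
```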

In [None]:
# Herfindahl-Hirschman index (HHI)
# i.e. the sum of squared proportions p1^2 + ... + pk^2, which is equal to 1 - Blau

1 - blau(Alters28, 'alter.age.cat')
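The HHI can also be computed directly as the sum of squared proportions, which makes the identity HHI = 1 - Blau easy to verify. A short sketch on the same invented categories used for the Blau example; `hhi` is a hypothetical helper name. Higher values mean a more concentrated (less diverse) network, with 1.0 when all alters share one category.

```python
import pandas as pd

def hhi(df, col):
    # Herfindahl-Hirschman index: sum of squared category proportions
    return (df[col].value_counts(normalize=True) ** 2).sum()

toy = pd.DataFrame({'alter.age.cat': ['26-35', '26-35', '36-45', '46-55']})
print(hhi(toy, 'alter.age.cat'))  # 0.25 + 0.0625 + 0.0625 = 0.375, i.e. 1 - 0.625 (Blau)
```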

#### Exercise: calculate these indices for one of the other categorical variables, for example alter.rel (type of relationship) or alter.res (country of residence).

## Structural measures

In [None]:
# find isolates (components consisting of a single node)
list(nx.isolates(EgoNet28)) 
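Isolates are only the smallest kind of component. A minimal sketch on an invented graph that also counts all connected components and their sizes, and builds a copy without isolates. The `list(...)` around `nx.isolates` is needed because the nodes are removed while the generator would still be iterating over the graph.

```python
import networkx as nx

toy_net = nx.Graph([(1, 2), (2, 3), (4, 5)])
toy_net.add_node(6)  # an alter with no ties to other alters

print(list(nx.isolates(toy_net)))                          # [6]
print(nx.number_connected_components(toy_net))             # 3: {1, 2, 3}, {4, 5}, {6}
print([len(c) for c in nx.connected_components(toy_net)])  # component sizes

# Copy of the network without isolates (materialize the list before removing)
no_iso = toy_net.copy()
no_iso.remove_nodes_from(list(nx.isolates(no_iso)))
print(no_iso.number_of_nodes())                            # 5
```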

In [None]:
# density
nx.density(EgoNet28) 
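For an undirected graph, density is the share of possible ties that are present: 2m / (n(n-1)), with n nodes and m edges. A worked sketch on an invented graph; the last lines show that adding an isolate lowers density, since it adds a node but no tie.

```python
import networkx as nx

toy_net = nx.Graph([(1, 2), (2, 3), (3, 4)])  # path: n = 4 nodes, m = 3 edges
n, m = toy_net.number_of_nodes(), toy_net.number_of_edges()
manual = 2 * m / (n * (n - 1))  # 3 ties out of n(n-1)/2 = 6 possible
print(manual, nx.density(toy_net))  # 0.5 0.5

toy_net.add_node(5)  # an isolate: one more node, no new tie
print(nx.density(toy_net))  # 2*3 / (5*4) = 0.3
```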

In [None]:
# Transitivity
print(nx.transitivity(EgoNet28)) 
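Transitivity is the share of connected triples (two ties sharing a node) that are closed into triangles: 3 × triangles / connected triples. A worked sketch on an invented graph with one triangle (1-2-3) and one pendant tie (3-4): the node degrees 2, 2, 3, 1 give 1 + 1 + 3 + 0 = 5 connected triples.

```python
import networkx as nx

toy_net = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])

# nx.triangles counts, for each node, the triangles it belongs to;
# each triangle is counted once per corner, hence the division by 3
n_triangles = sum(nx.triangles(toy_net).values()) // 3
# connected triples centred on each node: d choose 2
n_triples = sum(d * (d - 1) // 2 for _, d in toy_net.degree())

print(n_triangles, n_triples)                               # 1 5
print(3 * n_triangles / n_triples, nx.transitivity(toy_net))  # 0.6 0.6
```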

In [None]:
# Diameter (NetworkX raises an error if the graph is not connected)
nx.diameter(EgoNet28)

In [None]:
# Average shortest path length (APL); like the diameter, it requires a connected graph
nx.average_shortest_path_length(EgoNet28)
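Personal networks without ego are often disconnected, and then `nx.diameter` and `nx.average_shortest_path_length` raise a `NetworkXError`. A sketch of one common workaround, computing both measures on the largest connected component, on an invented graph. This is a choice, not the only option: the measures then describe only part of the network, so it is worth reporting the component size alongside them.

```python
import networkx as nx

toy_net = nx.Graph([(1, 2), (2, 3), (4, 5)])  # two components

print(nx.is_connected(toy_net))  # False: nx.diameter(toy_net) would raise an error

# Restrict the measures to the largest connected component
largest = toy_net.subgraph(max(nx.connected_components(toy_net), key=len))
print(largest.number_of_nodes())                  # 3
print(nx.diameter(largest))                       # 2
print(nx.average_shortest_path_length(largest))   # (1 + 1 + 2) / 3 pairs = 4/3
```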

In [None]:
# degree centrality
nx.degree_centrality(EgoNet28)
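`nx.degree_centrality` divides each node's degree by n - 1, the largest degree possible in a graph of n nodes. A sketch on an invented star-shaped graph that recomputes it by hand and sorts the alters from most to least connected.

```python
import networkx as nx

toy_net = nx.Graph([(1, 2), (1, 3), (1, 4)])  # star centred on alter 1, n = 4
n = toy_net.number_of_nodes()

manual = {v: d / (n - 1) for v, d in toy_net.degree()}  # raw degree / (n - 1)
print(manual)
print(nx.degree_centrality(toy_net))  # same values

# Alters sorted from most to least central
ranking = sorted(nx.degree_centrality(toy_net).items(), key=lambda kv: kv[1], reverse=True)
print(ranking[0])  # (1, 1.0)
```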

##### In principle, all structural measures can be calculated on a personal/ego network. Just be aware of whether each measure is meaningful: for example, reciprocity is not meaningful in an undirected graph, and the diameter cannot be computed on a disconnected graph.

## Exercise for next week

##### Describe the ego-network of another actor in the database and compare it with the ego-network of Ego 28. In particular, look at their age and year of arrival in the destination country (Italy). Do egos who have spent more time in Italy have more diverse networks (for example, with fewer family members, with more residents of Italy than of native Sri Lanka, or with more Italian-born contacts)? Are more diverse networks also less cohesive (lower density or transitivity, higher APL, etc.)?
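One way to organize the comparison is to collect a few measures per ego into one row of a table. Below is a sketch of a hypothetical helper, `summarize_ego_net`, run on two invented ego networks; the `alter.rel` values are made up. For real egos, it assumes their files follow the same naming pattern as Ego 28's (`Alter.attr.28.csv`, `EgoNet.28.alone.csv`), which should be checked in the dataset. The helper also assumes the graph has at least one node.

```python
import pandas as pd
import networkx as nx

def summarize_ego_net(G, alters, cat_col):
    """Hypothetical helper: a few composition and structure measures for one ego network.
    G: alter-alter graph; alters: alter attribute table; cat_col: a categorical column."""
    p = alters[cat_col].value_counts(normalize=True)
    largest = G.subgraph(max(nx.connected_components(G), key=len))
    return {
        'n_alters': G.number_of_nodes(),
        'blau': 1 - (p ** 2).sum(),
        'density': nx.density(G),
        'transitivity': nx.transitivity(G),
        'n_isolates': nx.number_of_isolates(G),
        'apl_largest_component': nx.average_shortest_path_length(largest),
    }

# Two invented ego networks
toy_alters_a = pd.DataFrame({'alter.rel': ['Family', 'Family', 'Friend']})
toy_net_a = nx.Graph([(1, 2), (2, 3), (1, 3)])
toy_alters_b = pd.DataFrame({'alter.rel': ['Family', 'Friend', 'Work', 'Other']})
toy_net_b = nx.Graph([(1, 2), (3, 4)])

comparison = pd.DataFrame([summarize_ego_net(toy_net_a, toy_alters_a, 'alter.rel'),
                           summarize_ego_net(toy_net_b, toy_alters_b, 'alter.rel')],
                          index=['ego A', 'ego B'])
print(comparison)
```

In this toy case, ego B's network is more diverse (higher Blau index) and less cohesive (lower density and transitivity) than ego A's; with the real data, that pattern is exactly what the exercise asks you to check.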