### Bishoy Sokkar
### Project: Network Analysis of Davis Southern Women Dataset

#### Dataset Selection and Description
For this project, I selected the Davis Southern Women dataset, a classic bipartite social network dataset. It represents the observed attendance of 18 Southern women at 14 informal social events over a nine-month period in the 1930s, as documented in the ethnographic study "Deep South" by Allison Davis, Burleigh B. Gardner, and Mary R. Gardner. The network is undirected and unweighted, with nodes divided into two categorical groups based on the 'bipartite' attribute: women (bipartite=0) and events (bipartite=1). This categorical information allows for comparison of centrality measures across the two groups, similar to comparing sexes in the high school romantic relationships example.

- Number of Nodes: 32 (18 women, 14 events)
- Number of Edges: 89
- Categorical Information: Node type ('women' vs. 'events'), derived from the 'bipartite' attribute.

The dataset is publicly available and built into the NetworkX library, making it easy to load without external downloads. Original source: Davis, A., Gardner, B. B., & Gardner, M. R. (1941). Deep South: A Social Anthropological Study of Caste and Class. University of Chicago Press.
Links:

- NetworkX documentation and generator: https://networkx.org/documentation/stable/reference/generated/networkx.generators.bipartite.davis_southern_women_graph.html
- Wikipedia overview of the dataset: https://en.wikipedia.org/wiki/Southern_women_data_set

#### Methodology
The analysis was performed in Python using NetworkX for network operations, NumPy for numerical computations, and SciPy for statistical testing. The steps are as follows:

- Load the dataset: used nx.davis_southern_women_graph() to load the graph.
- Calculate centrality measures:
  - Degree centrality: the number of direct connections (attendances), normalized by dividing by (n-1), where n=32.
  - Eigenvector centrality: measures influence based on connections to other influential nodes, computed with nx.eigenvector_centrality and its default parameters.
- Group by category: divided nodes into 'women' and 'events' based on the 'bipartite' node attribute.
- Compare measures: computed the mean centrality of each group and ran an independent two-sample t-test with scipy.stats.ttest_ind. The code calls it with default arguments, which is Student's t-test assuming equal variances; the unequal-variance (Welch) version requires passing equal_var=False.

This approach aligns with SNA concepts from Chapters 1-4, including bipartite graph representation, centrality measures, and subgroup analysis.
#### Python Code

```python
import networkx as nx
import numpy as np
from scipy.stats import ttest_ind
import pandas as pd  # For displaying results in a table

# Load the dataset
G = nx.davis_southern_women_graph()

# Calculate centrality measures
degree = nx.degree_centrality(G)
eigen = nx.eigenvector_centrality(G)

# Group nodes by category (bipartite=0 for women, bipartite=1 for events)
women = [n for n in G.nodes if G.nodes[n]['bipartite'] == 0]
events = [n for n in G.nodes if G.nodes[n]['bipartite'] == 1]

# Extract centrality values for each group
deg_w = [degree[n] for n in women]
deg_e = [degree[n] for n in events]
eig_w = [eigen[n] for n in women]
eig_e = [eigen[n] for n in events]

# Compute means
mean_deg_w = np.mean(deg_w)
mean_deg_e = np.mean(deg_e)
mean_eig_w = np.mean(eig_w)
mean_eig_e = np.mean(eig_e)

# Perform t-tests (ttest_ind defaults to equal_var=True, i.e. pooled variance)
t_deg, p_deg = ttest_ind(deg_w, deg_e)
t_eig, p_eig = ttest_ind(eig_w, eig_e)

# Display results in a table
data = {
    'Group': ['Women', 'Events'],
    'Number of Nodes': [len(women), len(events)],
    'Mean Degree Centrality': [mean_deg_w, mean_deg_e],
    'Mean Eigenvector Centrality': [mean_eig_w, mean_eig_e]
}
df = pd.DataFrame(data)
print(df)

# Print t-test results
print(f"\nDegree Centrality t-test: t = {t_deg:.3f}, p-value = {p_deg:.3f}")
print(f"Eigenvector Centrality t-test: t = {t_eig:.3f}, p-value = {p_eig:.3f}")
```

```
    Group  Number of Nodes  Mean Degree Centrality  \
0   Women               18                0.159498
1  Events               14                0.205069

   Mean Eigenvector Centrality
0                     0.156621
1                     0.168133

Degree Centrality t-test: t = -1.390, p-value = 0.175
Eigenvector Centrality t-test: t = -0.439, p-value = 0.664
```


#### Degree Centrality Comparison:

- Events have a higher mean degree centrality (0.205) than women (0.159): the average event has more attendees than the average woman has attendances. This follows from the bipartite structure, because every one of the 89 edges joins one woman to one event, and those edges are shared among 14 events but 18 women.
- t-test results: t = -1.390, p-value = 0.175 (not statistically significant at α=0.05). There is no strong evidence of a difference in degree centrality between women and events.
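
The two group means can be checked by hand, since each edge contributes exactly one unit of degree to the women's side and one to the events' side. The sketch below is a minimal illustration of that arithmetic; the variable names (`expected_w`, `observed_w`, and so on) are chosen here for the example and are not part of the original analysis.

```python
import networkx as nx

G = nx.davis_southern_women_graph()
n = G.number_of_nodes()   # total nodes, used in the (n - 1) normalization
m = G.number_of_edges()   # every edge has one woman and one event endpoint

women = [v for v, d in G.nodes(data=True) if d["bipartite"] == 0]
events = [v for v, d in G.nodes(data=True) if d["bipartite"] == 1]

# Mean raw degree of a group is m / group size; dividing by (n - 1)
# gives the mean normalized degree centrality.
expected_w = m / (len(women) * (n - 1))
expected_e = m / (len(events) * (n - 1))

# Compare against the values NetworkX computes node by node.
degree = nx.degree_centrality(G)
observed_w = sum(degree[v] for v in women) / len(women)
observed_e = sum(degree[v] for v in events) / len(events)

print(f"Women:  expected {expected_w:.6f}, observed {observed_w:.6f}")
print(f"Events: expected {expected_e:.6f}, observed {observed_e:.6f}")
```

Because the gap between the two means is fixed by the group sizes alone, it says more about the 18-to-14 split than about the behavior of any individual woman or event.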


#### Eigenvector Centrality Comparison:

- Events have a slightly higher mean eigenvector centrality (0.168) than women (0.157), suggesting events may be marginally more influential in the network structure, possibly due to connecting prominent women.
- t-test results: t = -0.439, p-value = 0.664 (not statistically significant at α=0.05). No evidence of meaningful differences between groups.
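
Both comparisons can also be rerun with the unequal-variance (Welch) form of the test by passing `equal_var=False` to `scipy.stats.ttest_ind`. The following is a minimal sketch that runs the pooled and Welch versions side by side; the `results` dictionary is a name introduced here for illustration, and no particular outcome is assumed.

```python
import networkx as nx
from scipy.stats import ttest_ind

G = nx.davis_southern_women_graph()
women = [v for v, d in G.nodes(data=True) if d["bipartite"] == 0]
events = [v for v, d in G.nodes(data=True) if d["bipartite"] == 1]

measures = {
    "Degree": nx.degree_centrality(G),
    "Eigenvector": nx.eigenvector_centrality(G),
}

results = {}
for name, scores in measures.items():
    w = [scores[v] for v in women]
    e = [scores[v] for v in events]
    results[name] = {
        "pooled": ttest_ind(w, e),                  # default: equal_var=True
        "welch": ttest_ind(w, e, equal_var=False),  # Welch's t-test
    }

for name, tests in results.items():
    for kind, res in tests.items():
        print(f"{name} ({kind}): t = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```

If the two versions give similar p-values, the conclusion does not hinge on the equal-variance assumption.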


#### Discussion
The results show modest differences in centrality, with events appearing more central. This reflects the bipartite structure: every tie joins a woman to an event, so events act as hubs connecting multiple women. Because the graph contains no direct woman-to-woman ties, it does not by itself demonstrate homophily among the women. The lack of statistical significance (high p-values) suggests the observed differences could be due to chance, given the small sample sizes. This analysis demonstrates how SNA can reveal structural roles in social groups. For future extensions, one could project the bipartite graph onto a unipartite network of women and analyze cliques or communities. The code can be run in a Jupyter Notebook, where adding nx.draw(G) visualizes the graph.
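
The projection mentioned as a future extension could look roughly like the sketch below. It assumes NetworkX's `bipartite.weighted_projected_graph`, where the weight of an edge between two women is the number of events they both attended; the choice of `find_cliques` and `greedy_modularity_communities` for the follow-up analysis is an illustrative assumption, not something the analysis above performed.

```python
import networkx as nx
from networkx.algorithms import bipartite
from networkx.algorithms.community import greedy_modularity_communities

G = nx.davis_southern_women_graph()
women = [v for v, d in G.nodes(data=True) if d["bipartite"] == 0]

# Project onto the women: edge weight = number of shared events.
W = bipartite.weighted_projected_graph(G, women)

# The five pairs of women with the most co-attended events.
strongest = sorted(W.edges(data="weight"), key=lambda e: e[2], reverse=True)[:5]
for u, v, weight in strongest:
    print(f"{u} and {v}: {weight} shared events")

# Maximal cliques and modularity-based communities in the projection.
cliques = list(nx.find_cliques(W))
communities = greedy_modularity_communities(W, weight="weight")
print(f"{len(cliques)} maximal cliques, {len(communities)} communities")
```

Projection discards which specific events were shared, so any findings on the women-only network should be read alongside the original two-mode data.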