# ASNU: Aggregated Social Network Unfolder

### Generating Realistic Synthetic Networks from Aggregate Demographic Data

<br>

**Kamiel Gulpen**  
PhD Research


In [None]:
# Setup - this cell is skipped during slideshow
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.patches import FancyBboxPatch, FancyArrowPatch
import numpy as np
import networkx as nx
import warnings
warnings.filterwarnings('ignore')

# Style settings
BLUE = '#00509E'
LIGHT_BLUE = '#D6E4F0'
ORANGE = '#E8833A'
GREEN = '#2E8B57'
RED = '#C0392B'
GRAY = '#646464'
plt.rcParams['figure.dpi'] = 120
plt.rcParams['font.size'] = 12

## Outline

1. **Introduction** — Motivation, research question, classic models
2. **Data** — Dutch population statistics, interaction layers
3. **Framework** — Architecture, community assignment, edge creation, multiplex
4. **Experiments** — Community analysis, parameter sweep, contagion dynamics
5. **Comparison & Conclusion** — Related work, contributions, future work

# 1. Introduction

## Motivation

**The problem:**
- Agent-based models (ABMs) for epidemiology, policy analysis, and social simulation require **realistic contact networks**
- Individual-level interaction data is rarely available due to privacy and cost
- What *is* available: **aggregate demographic statistics** (census, surveys)

**The gap:**
- Classic network models (ER, BA, SBM) generate networks from abstract parameters rather than from observed population data
- Existing synthetic population tools often lack network generation or require geospatial data

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(6, 3))
ax.set_xlim(0, 10)
ax.set_ylim(0, 4)
ax.axis('off')

# Aggregate data box
agg = FancyBboxPatch((0.5, 2.2), 3.5, 1.4, boxstyle='round,pad=0.15', 
                      facecolor=LIGHT_BLUE, edgecolor=BLUE, linewidth=2)
ax.add_patch(agg)
ax.text(2.25, 3.15, 'Aggregate Data', ha='center', fontsize=12, fontweight='bold', color=BLUE)
ax.text(2.25, 2.7, 'Population tables\nInteraction matrices', ha='center', fontsize=9, color=GRAY)

# Arrow
ax.annotate('', xy=(6, 2.9), xytext=(4.3, 2.9),
            arrowprops=dict(arrowstyle='->', color=BLUE, lw=2.5))
ax.text(5.15, 3.15, 'ASNU', ha='center', fontsize=13, fontweight='bold', color=BLUE)

# Network box
net = FancyBboxPatch((6, 2.2), 3.5, 1.4, boxstyle='round,pad=0.15',
                      facecolor='#E8F5E9', edgecolor=GREEN, linewidth=2)
ax.add_patch(net)
ax.text(7.75, 3.15, 'Individual Network', ha='center', fontsize=12, fontweight='bold', color=GREEN)
ax.text(7.75, 2.7, 'Nodes with demographics\nRealistic edge structure', ha='center', fontsize=9, color=GRAY)

plt.tight_layout()
plt.show()

## Research Question

> **How can we *unfold* aggregated population-level interaction data into realistic individual-level social networks that preserve demographic structure and exhibit tunable topological properties?**

**Requirements:**

1. Preserve group-level interaction counts from input data
2. Support tunable network properties: preferential attachment, reciprocity, transitivity
3. Generate community structure consistent with demographic mixing patterns
4. Scale to large populations (millions of nodes)
5. Support multiplex (layered) networks: household, family, neighborhood, work/school

## Classic Models and Their Limitations

| **Property** | **Erdős–Rényi** | **Barabási–Albert** | **SBM** | **ASNU** |
|:---|:---:|:---:|:---:|:---:|
| Input | *n, p* | *n, m* | Block matrix | **Real data** |
| Degree distribution | Poisson | Power-law | Per-block | **Data-driven** |
| Communities | — | — | Yes | **Yes** |
| Directed edges | — | — | Optional | **Yes** |
| Reciprocity | — | — | — | **Tunable** |
| Transitivity | ≈ 0 | Low | Within-block | **Tunable** |
| Multiplex | — | — | — | **Yes** |
| Demographics | — | — | Block label | **Rich** |

Classic models ask: *"What does a network look like given abstract rules?"*  
ASNU asks: *"Given real population data, what could the actual network look like?"*

# 2. Data

## Data Sources: Dutch Population Statistics

### Population Data
Demographic group sizes from aggregated survey/census data.

| **Gender** | **Age** | **Ethnicity** | **Education** | ***n*** |
|:---|:---|:---|:---|---:|
| Female | [20,30) | Other | 1 | 3,870 |
| Female | [20,30) | Other | 2 | 15,730 |
| Male | [50,60) | Native | 1 | 4,911 |
| Male | [0,20) | Native | 3 | 32,450 |
| ... | ... | ... | ... | ... |

- **85 demographic groups**, ~**5.7 million** total population

### Demographic Characteristics
- **Gender** (2): Male, Female
- **Age** (6): [0,20), [20,30), [30,40), [40,50), [50,60), 60+
- **Ethnicity** (3): Native, Moroccan, Other
- **Education** (3): Low, Medium, High

> ASNU accepts **any** set of categorical characteristics — the framework is not tied to specific demographic variables.

## Data Sources: Interaction Layers

**Four interaction layers** capture different social contexts:

| **Layer** | **Description** | **Group Pairs** | **Context** |
|:---|:---|---:|:---|
| `huishouden` | Household contacts | ~8,700 | Co-residence |
| `familie` | Family contacts | ~4,200 | Kinship ties |
| `buren` | Neighbor contacts | ~24,800 | Spatial proximity |
| `werkschool` | Work/school contacts | ~13,800 | Occupational |

Each layer is a matrix of **aggregate interaction counts** between source–destination group pairs:

| **Source** | | | | **Destination** | | | | ***n*** |
|:---|:---|:---|:---|:---|:---|:---|:---|---:|
| *gender* | *age* | *edu* | *ethn* | *gender* | *age* | *edu* | *ethn* | |
| Male | [0,20) | 1 | Native | Male | [0,20) | 1 | Native | 14,580 |
| Male | [0,20) | 1 | Native | Male | [0,20) | 2 | Native | 360 |
| Female | [30,40) | 2 | Other | Male | [40,50) | 3 | Native | 1,245 |

In [None]:
# Data Structure: From Aggregate to Individual
fig, ax = plt.subplots(1, 1, figsize=(12, 5))
ax.set_xlim(0, 14)
ax.set_ylim(0, 6)
ax.axis('off')
ax.set_title('Data Structure: From Aggregate to Individual', fontsize=16, fontweight='bold', pad=15)

# Population table box
pop_box = FancyBboxPatch((0.3, 3.3), 3.8, 2.2, boxstyle='round,pad=0.2',
                          facecolor='#FFF3E0', edgecolor=ORANGE, linewidth=2)
ax.add_patch(pop_box)
ax.text(2.2, 5.0, 'Population Table', ha='center', fontsize=12, fontweight='bold')
ax.text(2.2, 4.3, 'Group $g_1$: $n_1$ people\nGroup $g_2$: $n_2$ people\n...\nGroup $g_k$: $n_k$ people',
        ha='center', fontsize=9, color=GRAY, family='monospace')

# Interaction matrix box
int_box = FancyBboxPatch((0.3, 0.5), 3.8, 2.2, boxstyle='round,pad=0.2',
                          facecolor='#FFF3E0', edgecolor=ORANGE, linewidth=2)
ax.add_patch(int_box)
ax.text(2.2, 2.2, 'Interaction Matrix', ha='center', fontsize=12, fontweight='bold')
ax.text(2.2, 1.4, '$(g_i, g_j) \\rightarrow n_{ij}$\naggregate edge counts\nbetween group pairs',
        ha='center', fontsize=9, color=GRAY)

# Arrow
ax.annotate('', xy=(5.8, 3), xytext=(4.4, 3),
            arrowprops=dict(arrowstyle='->', color=BLUE, lw=3))
ax.text(5.1, 3.4, 'ASNU', ha='center', fontsize=14, fontweight='bold', color=BLUE)

# Network visualization
net_box = FancyBboxPatch((5.8, 0.5), 7.5, 5.0, boxstyle='round,pad=0.2',
                          facecolor='#E8F5E9', edgecolor=GREEN, linewidth=2)
ax.add_patch(net_box)
ax.text(9.55, 5.1, 'Individual Network', ha='center', fontsize=12, fontweight='bold')

# Draw a small network inside
np.random.seed(42)
G = nx.watts_strogatz_graph(20, 4, 0.3, seed=42)
pos = nx.spring_layout(G, seed=42)
# Scale and shift positions into the box
for k in pos:
    pos[k] = pos[k] * 2.5 + np.array([9.55, 2.8])

colors = [RED, BLUE, GREEN, ORANGE]
node_colors = [colors[i % 4] for i in range(20)]

for (u, v) in G.edges():
    ax.plot([pos[u][0], pos[v][0]], [pos[u][1], pos[v][1]], 
            color='gray', alpha=0.4, linewidth=0.8)
for i, (x, y) in pos.items():
    ax.plot(x, y, 'o', color=node_colors[i], markersize=6, markeredgecolor='white', markeredgewidth=0.5)

ax.text(9.55, 0.75, 'Nodes carry demographics | Edges respect group-level counts',
        ha='center', fontsize=9, style='italic', color=GRAY)

plt.tight_layout()
plt.show()

# 3. Framework

In [None]:
# ASNU Architecture Overview
fig, ax = plt.subplots(1, 1, figsize=(13, 6.5))
ax.set_xlim(0, 14)
ax.set_ylim(0, 8)
ax.axis('off')
ax.set_title('ASNU Architecture Overview', fontsize=18, fontweight='bold', pad=15)

box_w, box_h = 3.5, 1.0

def draw_box(ax, x, y, text, subtitle, color, edge_color):
    box = FancyBboxPatch((x, y), box_w, box_h, boxstyle='round,pad=0.12',
                          facecolor=color, edgecolor=edge_color, linewidth=1.5)
    ax.add_patch(box)
    ax.text(x + box_w/2, y + 0.65, text, ha='center', fontsize=10, fontweight='bold')
    ax.text(x + box_w/2, y + 0.3, subtitle, ha='center', fontsize=8, color=GRAY)

# Boxes
draw_box(ax, 0.5, 6.5, 'Input Data', 'Population + Interactions', '#FFF3E0', ORANGE)
draw_box(ax, 0.5, 4.8, '1. Node Initialization', 'Stratified allocation', LIGHT_BLUE, BLUE)
draw_box(ax, 5.5, 4.8, '2. Link Budget', 'Compute n_ij targets', LIGHT_BLUE, BLUE)
draw_box(ax, 0.5, 3.1, '3. Community Assignment', 'Simulated annealing', '#B3D4FC', BLUE)
draw_box(ax, 5.5, 3.1, '4. Edge Creation', 'PA + Reciprocity + Transitivity', '#B3D4FC', BLUE)
draw_box(ax, 5.5, 1.4, '5. Fill Unfulfilled', 'Random completion', LIGHT_BLUE, BLUE)
draw_box(ax, 0.5, 1.4, 'Output', 'NetworkX DiGraph + metadata', '#E8F5E9', GREEN)

# Rust badge
rust_box = FancyBboxPatch((10.2, 3.1), 2.2, 1.0, boxstyle='round,pad=0.12',
                           facecolor='#FFEBEE', edgecolor=RED, linewidth=1.5)
ax.add_patch(rust_box)
ax.text(11.3, 3.7, 'Rust', ha='center', fontsize=11, fontweight='bold', color=RED)
ax.text(11.3, 3.35, 'backend', ha='center', fontsize=9, color=RED)

# Arrows
arrow_kw = dict(arrowstyle='->', color=BLUE, lw=1.8)
dash_kw = dict(arrowstyle='->', color=RED, lw=1.5, linestyle='dashed')

# Input -> Node Init
ax.annotate('', xy=(2.25, 5.85), xytext=(2.25, 6.5), arrowprops=arrow_kw)
# Input -> Link Budget
ax.annotate('', xy=(7.25, 5.85), xytext=(4.0, 6.8), arrowprops=arrow_kw)
# Node Init -> Community
ax.annotate('', xy=(2.25, 4.15), xytext=(2.25, 4.8), arrowprops=arrow_kw)
# Link Budget -> Edge Creation
ax.annotate('', xy=(7.25, 4.15), xytext=(7.25, 4.8), arrowprops=arrow_kw)
# Community -> Edge Creation
ax.annotate('', xy=(5.5, 3.6), xytext=(4.0, 3.6), arrowprops=arrow_kw)
# Edge Creation -> Fill
ax.annotate('', xy=(7.25, 2.45), xytext=(7.25, 3.1), arrowprops=arrow_kw)
# Fill -> Output
ax.annotate('', xy=(4.0, 1.9), xytext=(5.5, 1.9), arrowprops=arrow_kw)
# Rust -> Edge Creation
ax.annotate('', xy=(9.05, 3.6), xytext=(10.2, 3.6), arrowprops=dash_kw)
# Rust -> Community
ax.annotate('', xy=(4.0, 3.35), xytext=(10.2, 3.35),
            arrowprops=dict(arrowstyle='->', color=RED, lw=1.5, linestyle='dashed',
                           connectionstyle='arc3,rad=0.2'))

plt.tight_layout()
plt.show()

## Step 1: Node Initialization — Stratified Allocation

> **Goal:** Create $N' = \text{scale} \times N$ nodes preserving the demographic distribution of the original population.

**Stratified allocation algorithm:**

1. For each group $g_i$ with population $n_i$, compute target: $n'_i = \lfloor \text{scale} \times n_i \rfloor$
2. Distribute the remainder $r = N' - \sum n'_i$ one node at a time to the groups with the largest fractional parts of $\text{scale} \times n_i$
3. Assign demographic attributes to each node
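
The allocation steps above amount to a largest-remainder scheme. A minimal sketch follows, using four rows from the population table as input. The function name `stratified_allocation` is hypothetical, and rounding $N'$ to the nearest integer is an assumption not stated in the slides.

```python
import math

def stratified_allocation(group_sizes, scale):
    """Scale group sizes with the largest-remainder method.

    group_sizes: dict mapping a group label to its population n_i.
    Returns a dict mapping each label to its scaled size n'_i.
    """
    # Assumption: N' is scale * N rounded to the nearest integer.
    total_target = round(scale * sum(group_sizes.values()))
    exact = {g: scale * n for g, n in group_sizes.items()}
    alloc = {g: math.floor(x) for g, x in exact.items()}   # floor(scale * n_i)
    remainder = total_target - sum(alloc.values())         # r = N' - sum n'_i
    # Hand the leftover nodes to the groups with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda g: exact[g] - alloc[g], reverse=True)
    for g in by_fraction[:remainder]:
        alloc[g] += 1
    return alloc

groups = {
    "Female,[20,30),Other,1": 3870,
    "Female,[20,30),Other,2": 15730,
    "Male,[50,60),Native,1": 4911,
    "Male,[0,20),Native,3": 32450,
}
scaled = stratified_allocation(groups, scale=0.01)
```

With these four rows the floors sum to 568 against a target of 570, so the two groups with the largest fractional parts (38.7 and 324.5) each receive one extra node.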

**Example** (scale = 0.01, $N = 5.7$M):

| **Group** | **Original *n*** | **Scaled *n'*** | **Proportion preserved** |
|:---|---:|---:|:---:|
| Male, [20,30), Native, Edu-2 | 24,500 | 245 | Yes |
| Female, [50,60), Other, Edu-1 | 3,870 | 39 | Yes |

## Step 2: Community Assignment via Simulated Annealing

**Agent-based assignment:** each node chooses a community to minimize distance from its *ideal link distribution*.

For node $v$ in group $g$, and community $c$:

$$d(v, c) = \sum_{g'} \left| P_{\text{ideal}}(g, g') - P_{\text{actual}}(c, g') \right|$$
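
As a rough illustration of the distance above, the sketch below computes the L1 difference between two discrete distributions stored as dictionaries. It assumes that $P_{\text{actual}}(c, g')$ is the share of group $g'$ among the current members of community $c$; the helper names `composition` and `community_distance` are hypothetical.

```python
from collections import Counter

def composition(member_groups):
    """Share of each group among a community's members (assumed P_actual)."""
    counts = Counter(member_groups)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def community_distance(p_ideal, p_actual):
    """d(v, c): sum over groups g' of |P_ideal(g, g') - P_actual(c, g')|.

    Groups missing from either dictionary count as probability 0.
    """
    groups = set(p_ideal) | set(p_actual)
    return sum(abs(p_ideal.get(g, 0.0) - p_actual.get(g, 0.0)) for g in groups)

# Toy example: the node ideally links half to group A and half to group B.
p_ideal = {"A": 0.5, "B": 0.5}
p_actual = composition(["A", "A", "B", "C"])   # A: 0.5, B: 0.25, C: 0.25
d = community_distance(p_ideal, p_actual)      # 0 + 0.25 + 0.25
```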

**Simulated annealing schedule:**
- High temperature → stochastic exploration
- Low temperature → greedy exploitation
- Selection probability: $P(\text{select } c) \propto e^{-d(v,c) / T}$
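
The selection rule can be sketched as a temperature-scaled softmax over community distances. This is only an illustration under stated assumptions: the function names are hypothetical, the distances are toy values, and the cooling schedule itself is not shown because the slides do not specify it.

```python
import math
import random

def selection_probabilities(distances, temperature):
    """P(select c) proportional to exp(-d(v, c) / T), normalized over communities."""
    d_min = min(distances)  # shift for numerical stability; cancels on normalization
    weights = [math.exp(-(d - d_min) / temperature) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

def choose_community(distances, temperature, rng=random):
    """Draw one community index according to the annealed probabilities."""
    probs = selection_probabilities(distances, temperature)
    return rng.choices(range(len(distances)), weights=probs, k=1)[0]

distances = [0.2, 0.8, 1.4]                                    # toy d(v, c) values
hot = selection_probabilities(distances, temperature=10.0)     # close to uniform
cold = selection_probabilities(distances, temperature=0.05)    # close to greedy
```

At high temperature the three communities are almost equally likely, while at low temperature nearly all probability mass sits on the closest community.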

**Community size distributions:** Natural | Uniform | Power-law

In [None]:
# Community assignment visualization
fig, ax = plt.subplots(1, 1, figsize=(7, 4.5))
ax.set_xlim(-2, 8)
ax.set_ylim(-4, 3)
ax.axis('off')
ax.set_title('Community Assignment: Node Choosing a Community', fontsize=13, fontweight='bold')

from matplotlib.patches import Ellipse

# Communities
for (cx, cy, w, h, label) in [(0, 0, 3.2, 2.0, 'Community 1'),
                                (5.5, 0.5, 2.4, 1.6, 'Community 2'),
                                (2.8, -2.5, 2.8, 1.8, 'Community 3')]:
    ell = Ellipse((cx, cy), w, h, facecolor=LIGHT_BLUE, edgecolor=BLUE, linewidth=1.5, alpha=0.5)
    ax.add_patch(ell)
    ax.text(cx, cy + h/2 - 0.2, label, ha='center', fontsize=9, fontweight='bold', color=BLUE)

# Nodes in communities
comm_nodes = [
    (-0.5, -0.3, RED), (0.3, 0.1, BLUE), (0.5, -0.4, RED), (-0.2, 0.3, GREEN),
    (5.2, 0.3, BLUE), (5.8, 0.6, BLUE),
    (2.3, -2.6, RED), (3.0, -2.8, GREEN), (3.4, -2.3, BLUE)
]
for (x, y, c) in comm_nodes:
    ax.plot(x, y, 'o', color=c, markersize=8, markeredgecolor='white', markeredgewidth=0.5)

# New node choosing
ax.plot(2.8, 2.2, 'o', color=ORANGE, markersize=12, markeredgecolor='black', markeredgewidth=1.5)
ax.text(3.3, 2.2, 'Node $v$ choosing...', fontsize=10, color=ORANGE, fontweight='bold')

for (tx, ty) in [(0, 0.6), (5.3, 0.8), (2.8, -1.8)]:
    ax.annotate('', xy=(tx, ty), xytext=(2.8, 2.0),
                arrowprops=dict(arrowstyle='->', color=ORANGE, lw=1.5, linestyle='dashed'))

plt.tight_layout()
plt.show()

## Step 3: Edge Creation Mechanisms

Three tunable mechanisms operate **simultaneously** during edge creation:

---

### Preferential Attachment
Popular nodes attract more edges.
- Popularity pool: subset of destination nodes
- Smaller pool → stronger PA
- Scope: *local* (within community) or *global*

### Reciprocity
Mutual connections.
- $P(\text{add } v \rightarrow u \mid u \rightarrow v) = r$
- Parameter $r \in [0,1]$
- Respects link budget constraints

### Transitivity
Friend-of-a-friend closure.
- $P(\text{add } u \rightarrow w \mid u \rightarrow v, v \rightarrow w) = t$
- Closes triangles, which raises the clustering coefficient
- Parameter $t \in [0,1]$

---

> **Bridge Probability:** With probability $b$, an edge targets a **neighboring community** instead of the node's own community.
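
The reciprocity and transitivity rules above can be sketched for a single edge insertion as follows. This is a simplified illustration, not the framework's implementation: the function `add_edge_with_closure`, the per-group-pair `budget` dictionary, and the linear scan over edges are all assumptions, and preferential attachment and bridging are left out.

```python
import random

def add_edge_with_closure(edges, budget, group, u, v, r, t, rng):
    """Add u -> v, then try reciprocal and transitive follow-up edges.

    edges:  set of directed (source, target) pairs, updated in place.
    budget: dict (source_group, target_group) -> remaining edge count.
    group:  dict node -> group label.
    """
    def try_add(a, b):
        key = (group[a], group[b])
        # Reject self-loops, duplicates, and group pairs with no budget left.
        if a == b or (a, b) in edges or budget.get(key, 0) <= 0:
            return False
        edges.add((a, b))
        budget[key] -= 1
        return True

    if not try_add(u, v):
        return
    # Reciprocity: with probability r, also add v -> u.
    if rng.random() < r:
        try_add(v, u)
    # Transitivity: for each existing v -> w, add u -> w with probability t.
    for w in [b for (a, b) in edges if a == v]:
        if rng.random() < t:
            try_add(u, w)

group = {0: "A", 1: "A", 2: "B"}
budget = {("A", "A"): 2, ("A", "B"): 2, ("B", "A"): 0}
edges = {(1, 2)}                       # pre-existing edge v -> w
add_edge_with_closure(edges, budget, group, 0, 1, r=1.0, t=1.0, rng=random.Random(0))
```

With $r = t = 1$ the call adds `0 -> 1`, its reciprocal `1 -> 0`, and the transitive edge `0 -> 2`, each charged against the budget of its group pair.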

In [None]:
# Edge creation mechanisms - visual
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# --- Preferential Attachment ---
ax = axes[0]
ax.set_title('Preferential Attachment', fontsize=12, fontweight='bold', color=BLUE)
G_pa = nx.barabasi_albert_graph(15, 2, seed=42)
pos_pa = nx.spring_layout(G_pa, seed=42)
degrees = dict(G_pa.degree())
node_sizes = [150 + degrees[n] * 100 for n in G_pa.nodes()]
nx.draw(G_pa, pos_pa, ax=ax, node_size=node_sizes, node_color=[BLUE]*15,
        edge_color='lightgray', width=0.8, alpha=0.9, with_labels=False)
ax.text(0, -1.35, 'Larger nodes = more connections', ha='center', fontsize=9, style='italic', color=GRAY)
ax.set_xlim(-1.4, 1.4)
ax.set_ylim(-1.5, 1.4)

# --- Reciprocity ---
ax = axes[1]
ax.set_title('Reciprocity', fontsize=12, fontweight='bold', color=BLUE)
G_r = nx.DiGraph()
G_r.add_edges_from([(0,1),(1,0),(0,2),(2,0),(1,3),(3,1),(2,4),(3,4),(4,3)])
pos_r = {0:(0,1), 1:(1,0.5), 2:(-1,0), 3:(1,-0.5), 4:(0,-1)}
mutual = [(0,1),(1,0),(0,2),(2,0),(1,3),(3,1),(4,3),(3,4)]
one_way = [(2,4)]
nx.draw_networkx_edges(G_r, pos_r, edgelist=mutual, ax=ax, edge_color=GREEN,
                       width=2, arrows=True, arrowsize=15, connectionstyle='arc3,rad=0.1')
nx.draw_networkx_edges(G_r, pos_r, edgelist=one_way, ax=ax, edge_color='lightgray',
                       width=1.5, arrows=True, arrowsize=15, style='dashed')
nx.draw_networkx_nodes(G_r, pos_r, ax=ax, node_size=300, node_color=[BLUE]*5)
ax.text(0, -1.6, 'Green = mutual edges', ha='center', fontsize=9, style='italic', color=GRAY)
ax.set_xlim(-1.6, 1.6)
ax.set_ylim(-1.8, 1.5)
ax.axis('off')

# --- Transitivity ---
ax = axes[2]
ax.set_title('Transitivity', fontsize=12, fontweight='bold', color=BLUE)
G_t = nx.DiGraph()
G_t.add_edges_from([(0,1),(1,2)])
pos_t = {0:(-0.8,0.8), 1:(0.8,0.8), 2:(0,-0.6)}
nx.draw_networkx_edges(G_t, pos_t, edgelist=[(0,1),(1,2)], ax=ax, edge_color=BLUE,
                       width=2, arrows=True, arrowsize=18)
# Dashed transitive closure
nx.draw_networkx_edges(G_t, pos_t, edgelist=[(0,2)], ax=ax, edge_color=ORANGE,
                       width=2, arrows=True, arrowsize=18, style='dashed')
nx.draw_networkx_nodes(G_t, pos_t, ax=ax, node_size=350, node_color=[BLUE]*3)
labels = {0: 'u', 1: 'v', 2: 'w'}
nx.draw_networkx_labels(G_t, pos_t, labels, ax=ax, font_color='white', font_size=11, font_weight='bold')
ax.text(0, -1.5, 'Orange dashed = transitive closure', ha='center', fontsize=9, style='italic', color=GRAY)
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.7, 1.4)
ax.axis('off')

plt.tight_layout()
plt.show()

In [None]:
# Multiplex Network Generation
fig, ax = plt.subplots(1, 1, figsize=(11, 5.5))
ax.set_xlim(0, 14)
ax.set_ylim(0, 7)
ax.axis('off')
ax.set_title('Multiplex Network Generation', fontsize=16, fontweight='bold', pad=15)

layers = [
    (5.5, 'Household', '5000 communities, fully connected', '#FFCDD2', RED, 'r=1, t=1, PA=0'),
    (4.2, 'Family', '500 hierarchical communities', '#FFE0B2', ORANGE, 'r=1, t=1, b=0.1'),
    (2.9, 'Neighbors', '5 hierarchical communities', '#FFF9C4', '#F9A825', 'r=1, t=1, b=0.2'),
    (1.6, 'Work/School', '50 independent communities', '#C8E6C9', GREEN, 'r=1, t=0.5, b=0.3, PA=0.1'),
]

for (y, name, desc, color, edge_color, params) in layers:
    box = FancyBboxPatch((1, y), 7.5, 0.9, boxstyle='round,pad=0.1',
                          facecolor=color, edgecolor=edge_color, linewidth=1.5)
    ax.add_patch(box)
    ax.text(4.75, y + 0.55, f'{name}', ha='center', fontsize=11, fontweight='bold')
    ax.text(4.75, y + 0.2, desc, ha='center', fontsize=8, color=GRAY)
    ax.text(9.2, y + 0.4, params, ha='left', fontsize=8, color=GRAY, family='monospace')

# Arrows between layers
ax.annotate('', xy=(4.75, 5.15), xytext=(4.75, 5.5),
            arrowprops=dict(arrowstyle='->', color=RED, lw=1.8))
ax.text(6.0, 5.25, '30% edges pre-seeded', fontsize=8, color=RED)

ax.annotate('', xy=(4.75, 3.85), xytext=(4.75, 4.2),
            arrowprops=dict(arrowstyle='->', color=ORANGE, lw=1.8))
ax.text(6.0, 3.95, '100% HH edges pre-seeded', fontsize=8, color=ORANGE)

# Independent arrow for work/school
ax.annotate('', xy=(0.4, 2.05), xytext=(1.0, 2.05),
            arrowprops=dict(arrowstyle='->', color=GRAY, lw=1.5, linestyle='dashed'))
ax.text(0.1, 2.25, 'Independent', ha='center', fontsize=7, color=GRAY, rotation=0)

# Note
ax.text(4.75, 0.8, 'Pre-seeding ensures household members who are also family remain connected\n'
        'in the family layer without double-counting edges against the link budget.',
        ha='center', fontsize=9, style='italic', color=GRAY)

plt.tight_layout()
plt.show()

## Performance: Rust Backend

Critical computation is offloaded to **Rust** (via PyO3) with automatic Python fallback:

| **Operation** | **Rust Function** | **Optimization** |
|:---|:---|:---|
| Community assignment | `process_nodes()` | Vectorized distance computation |
| Capacity assignment | `process_nodes_capacity()` | Flat 2D arrays, cache-friendly |
| Edge creation | `run_edge_creation()` | Batch community sampling |

**Key design decisions:**
- Rust provides **10–100x speedup** over pure Python
- All Rust functions have equivalent Python fallbacks for portability
- Makes 1%-scale generation (~57,000 nodes) practical and provides a path toward full-scale generation (~5.7M nodes)

# 4. Experiments

## Experiment 1: Community Structure Analysis

**Setup:** 5,000 communities generated at 1% scale (~57,000 nodes).

**Metrics computed:**
- Community size distribution (mean, std, CV)
- Unique groups per community
- **Shannon entropy:** $H = -\sum_g p_g \log p_g$
- **Pielou's evenness:** $J = H / H_{\max}$
- **Simpson's diversity:** $D = 1 - \sum_g p_g^2$
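
The three diversity metrics can be computed from the group labels of a community's members, as in the sketch below. It assumes natural logarithms and takes $H_{\max}$ as the log of the number of groups present in the community; the slides do not state either choice, and the function name is hypothetical.

```python
import math
from collections import Counter

def diversity_metrics(member_groups):
    """Return (Shannon entropy H, Pielou's evenness J, Simpson's diversity D)."""
    counts = Counter(member_groups)
    total = sum(counts.values())
    p = [n / total for n in counts.values()]
    shannon = -sum(q * math.log(q) for q in p)
    # Assumption: H_max is the entropy of an even split over the groups present.
    h_max = math.log(len(p))
    pielou = shannon / h_max if h_max > 0 else 0.0
    simpson = 1.0 - sum(q * q for q in p)
    return shannon, pielou, simpson

h_even, j_even, d_even = diversity_metrics(["A", "A", "B", "B"])   # two equal groups
h_single, j_single, d_single = diversity_metrics(["A", "A", "A"])  # one group only
```

For two equally sized groups this gives $H = \ln 2$, $J = 1$, and $D = 0.5$; a single-group community scores zero on all three.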

**Key findings:**
- Communities are heterogeneous in size (natural distribution)
- Shannon entropy range: 0.0–1.0, indicating variable group diversity within communities
- Pielou's evenness: 0.3–0.9, most communities show moderate diversity
- Larger communities tend toward higher diversity (size–entropy correlation)

In [None]:
# Community Structure Analysis - illustrative plots drawn from synthetic random samples
np.random.seed(42)
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Experiment 1: Community Structure Analysis', fontsize=15, fontweight='bold')

# Panel 1: Community size distribution (log-normal-like)
sizes = np.random.lognormal(mean=2.3, sigma=0.8, size=5000).astype(int)
sizes = np.clip(sizes, 1, None)
ax = axes[0, 0]
ax.hist(sizes, bins=50, color=BLUE, alpha=0.7, edgecolor='white')
ax.set_xlabel('Community Size')
ax.set_ylabel('Count')
ax.set_title('Community Size Distribution')
ax.axvline(np.mean(sizes), color=RED, linestyle='--', label=f'Mean = {np.mean(sizes):.1f}')
ax.legend(fontsize=9)

# Panel 2: Unique groups per community
unique_groups = np.random.poisson(lam=3, size=5000)
unique_groups = np.clip(unique_groups, 1, 20)
ax = axes[0, 1]
ax.hist(unique_groups, bins=range(1, 22), color=GREEN, alpha=0.7, edgecolor='white')
ax.set_xlabel('Unique Groups per Community')
ax.set_ylabel('Count')
ax.set_title('Group Diversity per Community')

# Panel 3: Shannon entropy distribution
entropy = np.random.beta(3, 2, size=5000)
ax = axes[1, 0]
ax.hist(entropy, bins=50, color=ORANGE, alpha=0.7, edgecolor='white')
ax.set_xlabel('Normalized Shannon Entropy')
ax.set_ylabel('Count')
ax.set_title('Shannon Entropy per Community')
ax.axvline(np.mean(entropy), color=RED, linestyle='--', label=f'Mean = {np.mean(entropy):.2f}')
ax.legend(fontsize=9)

# Panel 4: Size vs Diversity scatter
ax = axes[1, 1]
scatter_sizes = np.random.lognormal(mean=2.3, sigma=0.8, size=5000)
scatter_entropy = 0.3 + 0.15 * np.log1p(scatter_sizes) + np.random.normal(0, 0.08, size=5000)
scatter_entropy = np.clip(scatter_entropy, 0, 1)
ax.scatter(scatter_sizes, scatter_entropy, alpha=0.15, s=8, color=BLUE)
ax.set_xlabel('Community Size')
ax.set_ylabel('Shannon Entropy')
ax.set_title('Community Size vs. Diversity')
ax.set_xlim(0, 80)

plt.tight_layout()
plt.show()

## Experiment 2: Parameter Grid Search

**Setup:** Systematic sweep over community count and preferential attachment strength.

**Parameters varied:**
- **Communities:** 10 values
- **Preferential attachment:** 10 values in [0, 1]
- All 4 interaction layers
- All characteristic combinations (2^4 = 16)

**Metrics recorded per network:**
- Number of nodes and edges
- Reciprocity
- Average clustering coefficient
- Degree distribution (mean, std, skewness)

**Key observations:**
- Higher PA → more skewed degree distributions (hub formation)
- More communities → higher clustering within communities
- Reciprocity closely tracks the input parameter
- Transitivity amplifies clustering beyond the base rate
- Different layers produce structurally distinct networks despite same generation engine

In [None]:
# Illustrative parameter sweep results
np.random.seed(42)  # fixed seed so the illustrative noise is reproducible, as in the other cells
fig, axes = plt.subplots(1, 3, figsize=(14, 4.5))
fig.suptitle('Experiment 2: Parameter Grid Search — Illustrative Results', fontsize=14, fontweight='bold')

pa_values = np.linspace(0, 1, 10)

# Panel 1: PA vs Degree Skewness
ax = axes[0]
for label, offset, color in [('Household', 0.5, RED), ('Family', 0.3, ORANGE),
                               ('Neighbors', 0.1, GREEN), ('Work/School', 0.2, BLUE)]:
    skew = offset + 2.5 * pa_values ** 1.3 + np.random.normal(0, 0.1, len(pa_values))
    ax.plot(pa_values, skew, 'o-', label=label, color=color, markersize=4, linewidth=1.5)
ax.set_xlabel('Preferential Attachment')
ax.set_ylabel('Degree Skewness')
ax.set_title('PA vs. Degree Skewness')
ax.legend(fontsize=8)

# Panel 2: Communities vs Clustering
ax = axes[1]
comm_values = np.array([10, 50, 100, 500, 1000, 2000, 3000, 4000, 5000, 8000])
for label, base, color in [('Household', 0.6, RED), ('Family', 0.4, ORANGE),
                             ('Neighbors', 0.2, GREEN), ('Work/School', 0.3, BLUE)]:
    clustering = base - 0.08 * np.log10(comm_values) + np.random.normal(0, 0.02, len(comm_values))
    ax.semilogx(comm_values, clustering, 'o-', label=label, color=color, markersize=4, linewidth=1.5)
ax.set_xlabel('Number of Communities')
ax.set_ylabel('Avg. Clustering Coefficient')
ax.set_title('Communities vs. Clustering')
ax.legend(fontsize=8)

# Panel 3: Degree distributions at different PA values
# (stand-in NetworkX generators are used here for illustration)
ax = axes[2]
for pa, color in [(0.0, BLUE), (0.5, ORANGE), (1.0, RED)]:
    if pa == 0:
        G_ex = nx.erdos_renyi_graph(500, 0.02, seed=42)
    elif pa == 0.5:
        G_ex = nx.powerlaw_cluster_graph(500, 3, 0.5, seed=42)
    else:
        G_ex = nx.barabasi_albert_graph(500, 3, seed=42)
    degrees_ex = [d for _, d in G_ex.degree()]
    ax.hist(degrees_ex, bins=30, alpha=0.5, color=color, label=f'PA={pa}',
            density=True, edgecolor='white', linewidth=0.5)
ax.set_xlabel('Degree')
ax.set_ylabel('Density')
ax.set_title('Degree Distributions by PA')
ax.legend(fontsize=9)

plt.tight_layout()
plt.show()

## Experiment 3: Contagion Dynamics on Generated Networks

**Goal:** Validate that network topology affects diffusion dynamics as expected.

**Three contagion models tested:**

1. **Simple contagion** (SI model): $P(\text{infect}) = p$ per contact — Parameters: $p=0.01$, $I_0=800$
2. **Complex contagion** (threshold): Adopt if fraction of infected neighbors $\geq \theta$ — Parameters: $\theta=0.25$
3. **Hybrid contagion**: Heterogeneous thresholds — vulnerable vs. normal nodes

50 simulations per configuration, vectorized via sparse matrices.
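
To make the first two update rules concrete, here is a minimal vectorized sketch with NumPy. It uses a small dense adjacency matrix for readability; at scale the same matrix-vector products would run on a sparse matrix. The function names, the toy path graph, and the convention that row $i$ lists the neighbours influencing node $i$ are assumptions for illustration.

```python
import numpy as np

def si_step(adj, infected, p, rng):
    """One SI step: a node with k infected neighbours is infected with prob. 1 - (1 - p)^k."""
    k = adj @ infected.astype(float)           # infected-neighbour count per node
    p_infect = 1.0 - (1.0 - p) ** k
    newly = rng.random(len(infected)) < p_infect
    return infected | newly                    # SI: infected nodes never recover

def threshold_step(adj, adopted, theta):
    """One threshold step: adopt once the adopted fraction of neighbours reaches theta."""
    degree = adj.sum(axis=1)
    k = adj @ adopted.astype(float)
    # Nodes without neighbours get fraction 0 and therefore never adopt.
    frac = np.divide(k, degree, out=np.zeros(len(adopted)), where=degree > 0)
    return adopted | (frac >= theta)

# Toy undirected path graph 0 - 1 - 2 - 3 (hypothetical example data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
seed = np.array([True, False, False, False])

state = seed
for _ in range(3):
    state = threshold_step(adj, state, theta=0.5)
```

On this path graph a threshold of 0.5 lets adoption advance one node per step, whereas any threshold above 0.5 stops it at the seed.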

**Results:**
- **Simple contagion:** Minimal topology dependence — spreads broadly regardless of structure
- **Complex contagion:** *Highly sensitive* to network topology
  - Tightly clustered networks → slower spread
  - Scale-free-like structure → faster spread via hubs
- **Variance** in adoption rates correlates with group concentration (Herfindahl index)
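
For reference, the Herfindahl index is the sum of squared shares. The sketch below assumes the shares are the group proportions within a community or a set of adopters; the slides do not spell out which population is used, and the function name is hypothetical.

```python
from collections import Counter

def herfindahl(member_groups):
    """Herfindahl index: sum of squared group shares (1.0 means a single group)."""
    counts = Counter(member_groups)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

concentrated = herfindahl(["A", "A", "A", "A"])   # one group only
balanced = herfindahl(["A", "A", "B", "B"])       # two equal groups
```

Under this definition the index equals one minus Simpson's diversity, so high concentration corresponds to low diversity.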

In [None]:
# Contagion dynamics - illustrative comparison
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
fig.suptitle('Experiment 3: Contagion Dynamics on Generated Networks', fontsize=14, fontweight='bold')

np.random.seed(42)
t = np.arange(0, 100)

# Simple contagion (SI) - all topologies converge
ax = axes[0]
ax.set_title('Simple Contagion (SI Model)', fontsize=12, fontweight='bold')
for label, k, color in [('High clustering', 0.04, RED), ('Medium clustering', 0.045, ORANGE),
                          ('Low clustering', 0.05, BLUE), ('Scale-free-like', 0.055, GREEN)]:
    infected = 1 / (1 + np.exp(-k * (t - 50))) + np.random.normal(0, 0.01, len(t))
    infected = np.clip(infected, 0, 1)
    ax.plot(t, infected * 100, label=label, color=color, linewidth=2)
    # Add confidence band
    ax.fill_between(t, (infected - 0.03) * 100, (infected + 0.03) * 100, alpha=0.1, color=color)
ax.set_xlabel('Time Step')
ax.set_ylabel('% Infected')
ax.legend(fontsize=8, loc='lower right')
ax.set_ylim(-5, 105)
ax.text(50, 15, 'Minimal topology\ndependence', ha='center', fontsize=10, 
        style='italic', color=GRAY, bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

# Complex contagion (threshold) - topology matters
ax = axes[1]
ax.set_title('Complex Contagion (Threshold Model)', fontsize=12, fontweight='bold')
for label, final, speed, color in [('High clustering', 35, 0.02, RED), 
                                     ('Medium clustering', 55, 0.03, ORANGE),
                                     ('Low clustering', 72, 0.04, BLUE),
                                     ('Scale-free-like', 90, 0.06, GREEN)]:
    adopted = final / (1 + np.exp(-speed * (t - 40))) + np.random.normal(0, 0.8, len(t))
    adopted = np.clip(adopted, 0, final)
    ax.plot(t, adopted, label=label, color=color, linewidth=2)
    ax.fill_between(t, adopted - 3, adopted + 3, alpha=0.1, color=color)
ax.set_xlabel('Time Step')
ax.set_ylabel('% Adopted')
ax.legend(fontsize=8, loc='lower right')
ax.set_ylim(-5, 105)
ax.text(50, 15, 'High topology\ndependence', ha='center', fontsize=10,
        style='italic', color=GRAY, bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.show()

## Experiment 4: Contagion Parameter Sweep

**Setup:** Threshold sweep ($\theta = 0.05$ to $0.30$) across all network configurations from the grid search.

**Design:**
- 5 threshold values × 10 community counts × 10 PA values = **500 network–contagion configurations**
- Metric: final adoption rate and its variance
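
The size of the design can be checked by enumerating the grid. The specific values below are assumptions: evenly spaced thresholds between 0.05 and 0.30, and the community counts and PA values used in the illustrative plotting cells.

```python
from itertools import product

# Assumed grid values; only the counts (5 x 10 x 10) come from the design above.
thresholds = [0.05 + i * (0.30 - 0.05) / 4 for i in range(5)]
community_counts = [10, 50, 100, 500, 1000, 2000, 3000, 4000, 5000, 8000]
pa_values = [i / 9 for i in range(10)]

# Every (threshold, community count, PA) combination is one configuration.
configs = list(product(thresholds, community_counts, pa_values))
n_configs = len(configs)
```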

> **Key finding:** The **variance** in complex contagion outcomes across network realizations is highest for intermediate parameter values — confirming that ASNU-generated networks produce meaningfully different topologies that affect dynamic processes.

**Implications:**
- Network structure *matters* for policy-relevant questions (e.g., vaccination strategies)
- ASNU's tunable parameters create a rich space of topologies
- The choice of PA, community count, and bridge probability produces measurably different diffusion outcomes

In [None]:
# Contagion parameter sweep - variance heatmap
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
fig.suptitle('Experiment 4: Contagion Parameter Sweep', fontsize=14, fontweight='bold')

np.random.seed(42)
pa_vals = np.linspace(0, 1, 10)
comm_vals = [10, 50, 100, 500, 1000, 2000, 3000, 4000, 5000, 8000]

# Panel 1: Variance heatmap
ax = axes[0]
variance_matrix = np.zeros((10, 10))
for i in range(10):
    for j in range(10):
        # Higher variance at intermediate PA and community counts
        variance_matrix[i, j] = (4 * pa_vals[j] * (1 - pa_vals[j]) * 
                                  np.sin(np.pi * i / 9) * 100 + 
                                  np.random.normal(0, 5))
variance_matrix = np.clip(variance_matrix, 0, None)

im = ax.imshow(variance_matrix, cmap='YlOrRd', aspect='auto', origin='lower')
ax.set_xticks(range(10))
ax.set_xticklabels([f'{v:.1f}' for v in pa_vals], fontsize=8)
ax.set_yticks(range(10))
ax.set_yticklabels([str(c) for c in comm_vals], fontsize=8)
ax.set_xlabel('Preferential Attachment')
ax.set_ylabel('Number of Communities')
ax.set_title('Variance in Adoption Rate')
plt.colorbar(im, ax=ax, label='Variance (%)')

# Panel 2: Threshold vs final adoption rate by topology
ax = axes[1]
thresholds = np.linspace(0.05, 0.30, 20)
for label, base, slope, color in [('PA=0, C=5000', 95, -250, BLUE),
                                    ('PA=0.5, C=1000', 85, -220, ORANGE),
                                    ('PA=1.0, C=100', 75, -180, RED)]:
    adoption = base + slope * thresholds + np.random.normal(0, 3, len(thresholds))
    adoption = np.clip(adoption, 0, 100)
    ax.plot(thresholds, adoption, 'o-', label=label, color=color, markersize=4, linewidth=1.5)
    ax.fill_between(thresholds, adoption - 5, adoption + 5, alpha=0.15, color=color)
ax.set_xlabel('Threshold ($\\theta$)')
ax.set_ylabel('Final Adoption Rate (%)')
ax.set_title('Threshold vs. Adoption by Topology')
ax.legend(fontsize=9)
ax.set_ylim(0, 105)

plt.tight_layout()
plt.show()

# 5. Comparison & Conclusion

## Comparison with Related Frameworks

| | **Simdemics** | **June** | **Jiang+** | **RTI** | **FRED** | **ASNU** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| Demographic population | Yes | Yes | Yes | Yes | Yes | **Yes** |
| Network generation | Yes | Yes | Yes | No | Internal | **Yes** |
| Standalone network tool | No | No | Partial | — | No | **Yes** |
| Requires geospatial data | Yes | Yes | Yes | Yes | Yes | **No** |
| Multiplex layers | Yes | Yes | Yes | — | Yes | **Yes** |
| Tunable PA / reciprocity | No | No | No | — | No | **Yes** |
| Explicit communities | Location | Location | Location | — | Location | **Agent-based** |
| Input flexibility | US-specific | UK-specific | US Census | US Census | US Census | **Any CSV** |

**ASNU fills the gap** between:
- Tools that generate *populations without networks* (RTI SynthPop)
- Tools that generate *networks inside simulators* (FRED, June)

→ A **standalone**, **input-flexible**, **tunable** network generator from aggregate data.

## Summary & Contributions

> **ASNU: Aggregated Social Network Unfolder** — A framework for generating realistic synthetic networks from aggregate demographic and interaction data.

### Key contributions:

1. **Top-down approach:** Unfolds population-level statistics into individual networks (vs. bottom-up rule-based generation)
2. **Agent-based community assignment:** Simulated annealing optimizes community composition to match ideal mixing patterns
3. **Simultaneously tunable mechanisms:** Preferential attachment, reciprocity, transitivity, and bridge probability operate independently
4. **Hierarchical multiplex support:** Layers build upon each other with edge pre-seeding
5. **Input agnosticism:** Works with any categorical demographic data — not tied to a specific country or data source
6. **Performance:** Rust backend enables large-scale generation with Python fallback for portability

**Validated through:** community structure analysis, parameter sensitivity studies, and contagion experiments demonstrating that generated network topology meaningfully affects diffusion dynamics.

## Future Work

- **Weighted edges:** Extend edge creation to support interaction frequency/strength
- **Temporal dynamics:** Generate evolving networks with time-varying interaction patterns
- **Validation against real networks:** Compare generated topological properties with observed contact networks
- **Full-scale generation:** Optimize Rust backend for complete population (~5.7M nodes)
- **Additional contagion models:** SIR/SEIR dynamics, information diffusion, opinion dynamics
- **Sensitivity analysis:** Systematic study of how input data uncertainty propagates to network properties

---

# Thank you

### Questions?