A learning prototype recommender and insights engine for Eduba House, a connection venture (book clubs, founder talks, workshops, gatherings) built on a 5-layer Connection Framework: Trust → Productive Friction → The Container → Repeat Architecture → Scale Paradox.
One model, three jobs:
- People matching — given a member, suggest other members they're likely to connect with.
- Event suggestions — given a member, suggest upcoming events they'd attend (respecting their relationship stage).
- Management insights — framework-grounded operational views: churn risk, empty rooms, clique traps, scale-collapse risk, topic supply/demand, segments.
```bash
# 1. Install
pip install -r requirements.txt

# 2. Generate the synthetic dataset (~4 seconds, seeded, reproducible)
python -m src.generate

# 3. Train and compare all four recommenders end-to-end
python -m src.evaluate

# 4. Run the framework-grounded operational report
python -m src.insights
```

That's the whole loop. The four CSVs in data/synthetic/ (members, events, attendance, connections) are regenerated from seed=42, so they're gitignored on purpose.
Precision@k / recall@k across four methods on the same chronological split:
| method | event P@5 | event R@5 | match P@5 | match R@5 |
|---|---|---|---|---|
| popularity (baseline) | 0.018 | 0.041 | 0.022 | 0.043 |
| content (interest cosine + stage gate) | 0.118 | 0.304 | 0.012 | 0.026 |
| collaborative (ALS + Adamic-Adar) | 0.026 | 0.047 | 0.246 | 0.700 |
| hybrid (late-fusion) | 0.118 | 0.304 | 0.245 | 0.699 |
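Content leads on event recs: its event P@5 (0.118) is about 6.5× the popularity baseline and about 4.5× collaborative. Collaborative leads on matching: its match P@5 (0.246) is about 20× content and about 11× popularity. Hybrid equals content on events and comes within 0.001 of collaborative on matching, so one model covers both jobs.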
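For readers who want to check how numbers like these are produced, here is a minimal sketch of per-member precision@k and recall@k, averaged over members. The function names and the `per_member` structure are illustrative assumptions, not the harness in src/evaluate.py, which may handle edge cases (for example, members with no held-out items) differently.

```python
from __future__ import annotations

from typing import Hashable, Iterable, Sequence


def precision_recall_at_k(
    recommended: Sequence[Hashable],
    relevant: Iterable[Hashable],
    k: int = 5,
) -> tuple[float, float]:
    """Return (precision@k, recall@k) for a single member."""
    relevant_set = set(relevant)
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant_set)
    precision = hits / k
    # Recall is undefined with nothing held out; report 0.0 in that case.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def mean_metrics(
    per_member: dict[Hashable, tuple[Sequence[Hashable], Iterable[Hashable]]],
    k: int = 5,
) -> tuple[float, float]:
    """Average P@k and R@k over members that have at least one held-out item.

    `per_member` maps member id -> (ranked recommendations, held-out items).
    Skipping members without held-out items is an assumption of this sketch.
    """
    scores = []
    for recs, held_out in per_member.values():
        held_out = set(held_out)
        if held_out:
            scores.append(precision_recall_at_k(recs, held_out, k))
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

Because precision divides by k and recall by the number of held-out items, a member with only one or two held-out events caps P@5 at 0.2 or 0.4, which is one reason the precision column sits well below the recall column.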
Eight named views, all derived from the Connection Framework:
- Churn risk — members active in [T-6, T-3] but silent in [T-3, T]; not Recognition-stage
- Empty Room — high-friction events (level ≥ 3) that didn't half-fill
- Shallow Pool — series with return rate below 40%
- Clique Trap — closed sub-communities (< 5% events shared with outsiders)
- Founder Bottleneck — any host hosting > 30% of attended events
- Scale Collapse — events over capacity, or series whose average attendance exceeds the 20-person small-group cap
- Topic supply vs. demand — under- and over-served interests
- Member segments — KMeans clusters on ALS user-factors, labeled by modal persona
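As an illustration of how one of these views can be derived from attendance rows, the sketch below implements the churn-risk definition. It assumes the windows are measured in months (approximated here as 30 days), treats the earlier window as half-open so that an event exactly at T-3 counts as recent, and uses the 0–2 event range to identify Recognition-stage members. The function name and input shape are hypothetical and not taken from src/insights.py.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, timedelta

MONTH = timedelta(days=30)   # assumption: windows are in months, ~30 days each
RECOGNITION_MAX_EVENTS = 2   # Recognition stage covers 0-2 events attended


def churn_risk(attendance: list[tuple[str, date]], today: date) -> list[str]:
    """Members active in [T-6, T-3) but silent in [T-3, T], past Recognition.

    `attendance` is a list of (member_id, event_date) rows.
    """
    earlier_start = today - 6 * MONTH
    recent_start = today - 3 * MONTH

    total_events: dict[str, int] = defaultdict(int)
    active_earlier: set[str] = set()
    active_recent: set[str] = set()

    for member_id, attended_on in attendance:
        if attended_on > today:
            continue  # ignore future rows
        total_events[member_id] += 1
        if attended_on >= recent_start:
            active_recent.add(member_id)
        elif attended_on >= earlier_start:
            active_earlier.add(member_id)

    return sorted(
        m
        for m in active_earlier
        if m not in active_recent and total_events[m] > RECOGNITION_MAX_EVENTS
    )
```

Excluding Recognition-stage members keeps the view focused on people who had started forming a habit; someone who attended once and never returned is a different problem from someone who came regularly and then stopped.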
The headline number: ALS recovers 66% of the hidden persona structure without ever seeing persona labels. The embeddings are learning real community structure, not memorizing.
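The text does not say which metric lies behind the 66% figure. One common way to score how well unsupervised clusters line up with hidden labels is cluster purity: label each cluster by its modal persona, then count the share of members whose persona equals their cluster's label. The sketch below shows that calculation under the assumption that cluster assignments (for example, from KMeans) are already available; the function names and persona names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def label_clusters(
    cluster_ids: Sequence[Hashable], personas: Sequence[str]
) -> Dict[Hashable, str]:
    """Map each cluster id to the most frequent persona among its members."""
    by_cluster = defaultdict(Counter)
    for cluster_id, persona in zip(cluster_ids, personas):
        by_cluster[cluster_id][persona] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}


def purity(cluster_ids: Sequence[Hashable], personas: Sequence[str]) -> float:
    """Share of members whose persona matches their cluster's modal persona."""
    if len(cluster_ids) != len(personas):
        raise ValueError("cluster_ids and personas must have the same length")
    if not cluster_ids:
        return 0.0
    labels = label_clusters(cluster_ids, personas)
    matches = sum(
        1 for cluster_id, persona in zip(cluster_ids, personas)
        if labels[cluster_id] == persona
    )
    return matches / len(cluster_ids)
```

Purity rises with the number of clusters regardless of quality, so it is only meaningful when the cluster count is fixed in advance, for instance at the number of personas.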
Eduba House isn't a generic events platform — it operates on a deliberate connection-architecture thesis. That thesis is baked into the system:
- Container intimacy is signal. Co-attendance in an 8-seat dinner is worth ~5× co-attendance in a 40-seat panel. The collaborative recommender weights ALS confidence and graph edges by 1/capacity, encoding this directly.
- Members have a stage, not just interests. Recognition (0–2 events) → Acquaintance (3–9) → Friendship (10–19) → Community (20+). The content recommender gates high-friction events against the member's current stage so it doesn't push a vulnerability retreat at someone who's done two coffee meetups.
- Insights map to named failure modes. Empty Room, Shallow Pool, Clique Trap, Founder Bottleneck, Scale Collapse — these are the framework's own taxonomy, not generic dashboard metrics.
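The first two points can be made concrete with a small sketch. The 1/capacity weight and the stage boundaries come from the bullets above; the `MAX_FRICTION_BY_STAGE` table is purely hypothetical (the actual gate lives in src/methods/content.py and its thresholds in src/config.py), as are the function names.

```python
# Stage boundaries as stated above: minimum events attended -> stage name.
STAGE_THRESHOLDS = [
    (20, "Community"),
    (10, "Friendship"),
    (3, "Acquaintance"),
    (0, "Recognition"),
]

# Hypothetical mapping: highest friction level each stage is offered.
# The real values are not given in this README.
MAX_FRICTION_BY_STAGE = {
    "Recognition": 1,
    "Acquaintance": 2,
    "Friendship": 3,
    "Community": 4,
}


def stage_for(events_attended: int) -> str:
    """Return the relationship stage for a member's attendance count."""
    if events_attended < 0:
        raise ValueError("events_attended must be non-negative")
    for minimum, name in STAGE_THRESHOLDS:
        if events_attended >= minimum:
            return name
    raise AssertionError("unreachable: thresholds cover all counts >= 0")


def passes_stage_gate(events_attended: int, friction_level: int) -> bool:
    """True if an event's friction level is within the member's stage limit."""
    return friction_level <= MAX_FRICTION_BY_STAGE[stage_for(events_attended)]


def intimacy_weight(capacity: int) -> float:
    """Weight a co-attendance by 1/capacity: smaller rooms count for more."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 1.0 / capacity
```

With this weighting, an 8-seat dinner scores 1/8 and a 40-seat panel 1/40, a ratio of 5, which is where the ~5× figure comes from.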
```
.
├── CLAUDE.md                 # full spec: schema, personas, formula, definitions, API contract
├── DEVELOPMENT_NOTES.md      # running decision log
├── README.md                 # this file
├── requirements.txt
├── data/synthetic/           # generated CSVs (gitignored, reproducible from seed)
└── src/
    ├── config.py             # single source of truth: sizes, weights, thresholds
    ├── generate.py           # personas → members → events → attendance → connections
    ├── evaluate.py           # chronological split, edge holdout, P@k / R@k harness
    ├── insights.py           # the eight framework-grounded views
    └── methods/
        ├── base.py           # Recommender API contract
        ├── popularity.py     # baseline
        ├── content.py        # interest cosine + stage-friction gate
        ├── collaborative.py  # implicit ALS + Adamic-Adar on intimacy-weighted graph
        └── embeddings.py     # late-fusion hybrid
```
For the full specification — schema types, persona definitions, attendance-probability formula, evaluation methodology — see CLAUDE.md. For the running narrative of decisions and what changed when, see DEVELOPMENT_NOTES.md.
Everything tunable lives in src/config.py: dataset size, persona definitions, generator weights, friction levels, insight thresholds. Change a value, re-run python -m src.generate, re-run python -m src.evaluate. The whole loop takes under 10 seconds.
The current settings (1500 members, 500 events, seed 42) produce a dataset where 77% of connection edges link same-persona members vs. 17% chance — the hidden signal recommenders are graded against without ever being shown.
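A sketch of how a same-persona edge rate and its chance baseline can be computed is given below. It assumes the chance level is the probability that two members drawn at random share a persona, i.e. the sum of squared persona shares; whether the generator defines chance exactly this way is not stated here. Under that assumption, six equally sized personas would give 1/6 ≈ 17%, though the persona count itself is also an assumption. Function names and the input shapes are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def same_persona_rate(
    edges: Iterable[Tuple[str, str]], persona_of: Dict[str, str]
) -> float:
    """Fraction of connection edges whose two endpoints share a persona."""
    edges = list(edges)
    if not edges:
        return 0.0
    same = sum(1 for a, b in edges if persona_of[a] == persona_of[b])
    return same / len(edges)


def chance_rate(persona_of: Dict[str, str]) -> float:
    """Probability that two randomly drawn members share a persona.

    Computed as the sum of squared persona shares (sampling with replacement),
    which is an assumption about how the baseline is defined.
    """
    counts = Counter(persona_of.values())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in counts.values())
```

The gap between the two numbers is what matters: the wider it is, the more persona signal is hidden in the connection graph for the recommenders to recover.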
Prototype. Synthetic data only; no real Eduba House data flows through this yet. The methods, eval harness, and insights work end-to-end and produce interpretable results. Next likely directions: a joint-trained PyTorch embedding model to exceed late-fusion, real-data integration once Eduba House has enough operational history, and a UI for the insights output.