remicaesar/recommendation-engine

Eduba House Recommendation Engine

A learning prototype recommender and insights engine for Eduba House, a connection venture (book clubs, founder talks, workshops, gatherings) built on a 5-layer Connection Framework: Trust → Productive Friction → The Container → Repeat Architecture → Scale Paradox.

One model, three jobs:

  1. People matching — given a member, suggest other members they're likely to connect with.
  2. Event suggestions — given a member, suggest upcoming events they'd attend (respecting their relationship stage).
  3. Management insights — framework-grounded operational views: churn risk, empty rooms, clique traps, scale-collapse risk, topic supply/demand, segments.

Quick start

# 1. Install
pip install -r requirements.txt

# 2. Generate the synthetic dataset (~4 seconds, seeded, reproducible)
python -m src.generate

# 3. Train and compare all four recommenders end-to-end
python -m src.evaluate

# 4. Run the framework-grounded operational report
python -m src.insights

That's the whole loop. The four CSVs in data/synthetic/ (members, events, attendance, connections) are regenerated from seed=42 — they're gitignored on purpose.

What you'll see

python -m src.evaluate

Precision@k / recall@k across four methods on the same chronological split:

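| method                                 | event P@5 | event R@5 | match P@5 | match R@5 |
|----------------------------------------|-----------|-----------|-----------|-----------|
| popularity (baseline)                  | 0.018     | 0.041     | 0.022     | 0.043     |
| content (interest cosine + stage gate) | 0.118     | 0.304     | 0.012     | 0.026     |
| collaborative (ALS + Adamic-Adar)      | 0.026     | 0.047     | 0.246     | 0.700     |
| hybrid (late-fusion)                   | 0.118     | 0.304     | 0.245     | 0.699     |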

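Content leads on event recs, with about 6.5× the popularity baseline's P@5 (0.118 vs. 0.018). Collaborative leads on matching, with about 20× content's P@5 (0.246 vs. 0.012) and about 11× the baseline's (0.246 vs. 0.022). Hybrid ties content on events and lands within 0.001 of collaborative on matching (0.245 vs. 0.246 P@5), so one model covers both jobs at essentially specialist quality.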
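For readers new to the metrics, here is a minimal sketch of precision@k and recall@k for a single member. The function name and the example IDs are illustrative assumptions, not the harness in src/evaluate.py, which may average or filter differently.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int = 5
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one member's ranked recommendations.

    ranked:   recommended item IDs, best first
    relevant: held-out items the member actually interacted with
    """
    top_k = list(ranked)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    # Recall is undefined with no relevant items; this sketch reports 0.0.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical member: 5 recommended events, 2 events attended in the test window.
p, r = precision_recall_at_k(["e1", "e2", "e3", "e4", "e5"], {"e2", "e9"}, k=5)
print(p, r)  # 0.2 0.5
```

One hit in the top 5 gives a precision of 1/5, and that hit covers one of the two held-out events, so recall is 1/2.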

python -m src.insights

Eight named views, all derived from the Connection Framework:

  • Churn risk — members active in [T-6, T-3] but silent in [T-3, T]; not Recognition-stage
  • Empty Room — high-friction events (level ≥ 3) that didn't half-fill
  • Shallow Pool — series with return rate below 40%
  • Clique Trap — closed sub-communities (< 5% events shared with outsiders)
  • Founder Bottleneck — any host hosting > 30% of attended events
  • Scale Collapse — events over capacity, or series whose average attendance exceeds the 20-person small-group cap
  • Topic supply vs. demand — under- and over-served interests
  • Member segments — KMeans clusters on ALS user-factors, labeled by modal persona
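To make two of these definitions concrete, the following is a rough sketch of Empty Room and Founder Bottleneck over a handful of made-up events. The Event fields, the reading of "half-fill" as attendees below 50% of capacity, and the reading of "attended events" as events with at least one attendee are all assumptions for illustration. The actual logic and thresholds live in src/insights.py and src/config.py.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    # Hypothetical schema; the real columns are specified in CLAUDE.md.
    event_id: str
    host_id: str
    friction_level: int
    capacity: int
    attendees: int


def empty_rooms(events: List[Event], min_friction: int = 3,
                fill_threshold: float = 0.5) -> List[str]:
    """High-friction events that filled less than half their capacity."""
    return [
        e.event_id for e in events
        if e.friction_level >= min_friction
        and e.attendees < fill_threshold * e.capacity
    ]


def founder_bottlenecks(events: List[Event], max_share: float = 0.30) -> Dict[str, float]:
    """Hosts whose share of attended events exceeds max_share."""
    attended = [e for e in events if e.attendees > 0]  # assumption: >= 1 attendee
    if not attended:
        return {}
    counts = Counter(e.host_id for e in attended)
    shares = {host: n / len(attended) for host, n in counts.items()}
    return {host: share for host, share in shares.items() if share > max_share}


events = [
    Event("e1", "h1", friction_level=4, capacity=10, attendees=3),
    Event("e2", "h1", friction_level=1, capacity=40, attendees=10),
    Event("e3", "h2", friction_level=3, capacity=8, attendees=6),
    Event("e4", "h3", friction_level=2, capacity=20, attendees=0),
    Event("e5", "h3", friction_level=1, capacity=20, attendees=5),
    Event("e6", "h4", friction_level=2, capacity=12, attendees=5),
]

print(empty_rooms(events))          # ['e1']
print(founder_bottlenecks(events))  # {'h1': 0.4}
```

Here e1 is the only high-friction event below half capacity, and h1 hosted 2 of the 5 attended events, which is over the 30% line.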

The headline number: ALS recovers 66% of the hidden persona structure without ever seeing persona labels, which suggests the embeddings are learning real community structure rather than memorizing.

Why this is interesting

Eduba House isn't a generic events platform — it operates on a deliberate connection-architecture thesis. That thesis is baked into the system:

  • Container intimacy is signal. Co-attendance in an 8-seat dinner is worth ~5× co-attendance in a 40-seat panel. The collaborative recommender weights ALS confidence and graph edges by 1/capacity, encoding this directly.
  • Members have a stage, not just interests. Recognition (0–2 events) → Acquaintance (3–9) → Friendship (10–19) → Community (20+). The content recommender gates high-friction events against the member's current stage so it doesn't push a vulnerability retreat at someone who's done two coffee meetups.
  • Insights map to named failure modes. Empty Room, Shallow Pool, Clique Trap, Founder Bottleneck, Scale Collapse — these are the framework's own taxonomy, not generic dashboard metrics.
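The first two ideas above can be sketched in a few lines. The 1/capacity weight and the stage boundaries come straight from the bullets. The MAX_FRICTION_BY_STAGE table and the function names are hypothetical: the real gate is defined in src/methods/content.py and src/config.py and may use different cutoffs.

```python
def intimacy_weight(capacity: int) -> float:
    """Weight a co-attendance by the inverse of the event's capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 1.0 / capacity


def stage_of(events_attended: int) -> str:
    """Map an attendance count to a relationship stage (boundaries from the README)."""
    if events_attended <= 2:
        return "Recognition"
    if events_attended <= 9:
        return "Acquaintance"
    if events_attended <= 19:
        return "Friendship"
    return "Community"


# ASSUMPTION: illustrative cutoffs only; the real mapping is not given in this README.
MAX_FRICTION_BY_STAGE = {
    "Recognition": 2,
    "Acquaintance": 3,
    "Friendship": 4,
    "Community": 5,
}


def passes_stage_gate(events_attended: int, friction_level: int) -> bool:
    """True if an event's friction level is allowed for the member's current stage."""
    return friction_level <= MAX_FRICTION_BY_STAGE[stage_of(events_attended)]


# An 8-seat dinner vs. a 40-seat panel: the ratio of weights is 40 / 8.
ratio = intimacy_weight(8) / intimacy_weight(40)
print(round(ratio, 6))           # 5.0

print(stage_of(2))               # Recognition
print(passes_stage_gate(2, 4))   # False
print(passes_stage_gate(25, 4))  # True
```

Under these assumed cutoffs, a member with two coffee meetups is still in Recognition, so a level-4 event is filtered out before ranking.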

Project layout

.
├── CLAUDE.md                # full spec — schema, personas, formula, definitions, API contract
├── DEVELOPMENT_NOTES.md     # running decision log
├── README.md                # this file
├── requirements.txt
├── data/synthetic/          # generated CSVs (gitignored — reproducible from seed)
└── src/
    ├── config.py            # single source of truth: sizes, weights, thresholds
    ├── generate.py          # personas → members → events → attendance → connections
    ├── evaluate.py          # chronological split, edge holdout, P@k / R@k harness
    ├── insights.py          # the eight framework-grounded views
    └── methods/
        ├── base.py          # Recommender API contract
        ├── popularity.py    # baseline
        ├── content.py       # interest cosine + stage-friction gate
        ├── collaborative.py # implicit ALS + Adamic-Adar on intimacy-weighted graph
        └── embeddings.py    # late-fusion hybrid
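As a rough illustration of what "late-fusion" means for the hybrid, the sketch below blends two separately produced score tables after putting them on a common scale. The min-max normalization, the alpha blend weight, and the function names are assumptions for illustration. The fusion actually used in src/methods/embeddings.py may differ.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so that two models become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def late_fusion(content: Dict[str, float], collab: Dict[str, float],
                alpha: float = 0.5, k: int = 5) -> List[str]:
    """Blend two models' scores and return the top-k item IDs.

    alpha is a hypothetical blend weight: 1.0 means content only,
    0.0 means collaborative only. Items missing from a model score 0.0 there.
    """
    c, f = min_max(content), min_max(collab)
    fused = {
        item: alpha * c.get(item, 0.0) + (1 - alpha) * f.get(item, 0.0)
        for item in set(c) | set(f)
    }
    # Sort by descending fused score, breaking ties by item ID for determinism.
    return sorted(fused, key=lambda item: (-fused[item], item))[:k]


content_scores = {"e1": 1.0, "e2": 0.0, "e3": 0.5}
collab_scores = {"e2": 10.0, "e3": 30.0, "e4": 20.0}
print(late_fusion(content_scores, collab_scores, alpha=0.5, k=3))  # ['e3', 'e1', 'e4']
```

Each specialist is trained and scored on its own, and only the final scores are combined, which is what distinguishes late fusion from a jointly trained model.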

For the full specification — schema types, persona definitions, attendance-probability formula, evaluation methodology — see CLAUDE.md. For the running narrative of decisions and what changed when, see DEVELOPMENT_NOTES.md.

Tweaking the dataset

Everything tunable lives in src/config.py: dataset size, persona definitions, generator weights, friction levels, insight thresholds. Change a value, re-run python -m src.generate, re-run python -m src.evaluate. The whole loop takes under 10 seconds.

The current settings (1500 members, 500 events, seed 42) produce a dataset where 77% of connection edges link same-persona members, against roughly 17% expected by chance. That persona structure is the hidden signal the recommenders are graded against without ever being shown.
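One way to check a same-persona figure like this on a regenerated dataset is sketched below. It compares the observed share of same-persona edges with the probability that two randomly drawn members share a persona. The function names, persona labels, and toy data are hypothetical, and the README's own 17% baseline may have been computed differently.

```python
from collections import Counter
from typing import Dict, List, Tuple


def same_persona_edge_rate(edges: List[Tuple[str, str]],
                           persona_of: Dict[str, str]) -> float:
    """Share of connection edges whose two endpoints have the same persona."""
    same = sum(1 for a, b in edges if persona_of[a] == persona_of[b])
    return same / len(edges)


def chance_same_persona_rate(persona_of: Dict[str, str]) -> float:
    """Probability that two distinct members drawn at random share a persona."""
    n = len(persona_of)
    counts = Counter(persona_of.values())
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return same_pairs / total_pairs


# Toy data: six members in three made-up personas, four connection edges.
persona_of = {
    "a": "founder", "b": "founder",
    "c": "reader", "d": "reader",
    "e": "maker", "f": "maker",
}
edges = [("a", "b"), ("c", "d"), ("a", "c"), ("e", "f")]

print(same_persona_edge_rate(edges, persona_of))  # 0.75
print(chance_same_persona_rate(persona_of))       # 0.2
```

In the toy data, 3 of 4 edges are same-persona, while only 3 of the 15 possible member pairs are, so the edges are far more persona-aligned than chance.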

Status

Prototype. Synthetic data only; no real Eduba House data flows through this yet. The methods, eval harness, and insights work end-to-end and produce interpretable results. Next likely directions: a joint-trained PyTorch embedding model to exceed late-fusion, real-data integration once Eduba House has enough operational history, and a UI for the insights output.
