v1.0.0 — Winter 2024 → Winter 2026

@sjmoran sjmoran released this 19 Apr 08:27
· 2 commits to main since this release

First public release. Reproducible analysis pipeline covering every YC company from Winter 2024 through Winter 2026.

Scope

  • 1,014 companies across 6 cohorts (W24, W25, Sp25, Su25, F25, W26)
  • 924 AI-focused companies classified on 5 axes: end market × product layer × AI pattern × buyer × wedge archetype
  • ~25 binary signals per company (mentions_agents, regulated_industry, services_angle, etc.)
  • Founder-background signals extracted from bios (ex-big-tech, research, operator, domain expert, repeat founder, enterprise, regulated)
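
As a rough illustration of the record shape this scope implies, the sketch below models one classified company with the five axes plus a set of binary signals, and computes a signal's share within a cohort. The class name `CompanyRecord`, the function `signal_share`, the axis values, and the toy companies are all hypothetical; the actual schema lives in src/ and config/ and may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CompanyRecord:
    """Hypothetical record: five classification axes plus binary signals."""
    name: str
    cohort: str
    end_market: str
    product_layer: str
    ai_pattern: str
    buyer: str
    wedge_archetype: str
    # Names of the binary signals that are true for this company.
    signals: frozenset = field(default_factory=frozenset)


def signal_share(records, cohort, signal):
    """Return the percentage of companies in `cohort` with `signal` set."""
    in_cohort = [r for r in records if r.cohort == cohort]
    if not in_cohort:
        return 0.0
    hits = sum(1 for r in in_cohort if signal in r.signals)
    return 100.0 * hits / len(in_cohort)


# Toy data, invented for illustration only (not taken from the dataset).
records = [
    CompanyRecord("Example A", "W24", "healthcare", "application",
                  "copilot", "enterprise", "workflow",
                  frozenset({"regulated_industry"})),
    CompanyRecord("Example B", "W24", "devtools", "infrastructure",
                  "autonomous_agent", "developer", "tooling",
                  frozenset({"mentions_agents"})),
    CompanyRecord("Example C", "W26", "legal", "application",
                  "autonomous_agent", "smb", "services",
                  frozenset({"mentions_agents", "services_angle"})),
]

w24_agent_share = signal_share(records, "W24", "mentions_agents")
w26_agent_share = signal_share(records, "W26", "mentions_agents")
```

Storing only the names of signals that fired keeps each record compact and makes per-cohort shares a simple count.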

Headline findings

  1. AI is the default at YC. The AI-focused share of each cohort rose 86% → 93%.
  2. Vertical positioning rose 59% → 71%; horizontal fell 15% → 9%.
  3. The RAG era ended quietly. rag -7.8pp, retrieval -6.3pp, fine-tuning -2.9pp, copilot -5.4pp.
  4. Agents are real but partly rhetorical. The term "agent" rose +16.2pp, while the classifier's Autonomous Agent pattern rose only +11.8pp; the ~4pp gap is re-labeled copilots.
  5. AI-native service firms are emerging as a credible wedge. Replaces-outsourced-labor archetype +3.5pp.
  6. Compliance and audit are quietly central. audit +7.9pp, EU AI Act / regulation +6.2pp.
  7. Robotics / Embodied AI +9.4pp — larger than the discourse suggests.
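
The figures above are percentage-point (pp) changes in share, not relative growth rates. A minimal sketch of that arithmetic, using only numbers quoted in the findings; the helper `pp_delta` is a hypothetical name, and it is assumed here that each "→" pair compares the earliest and latest cohort share.

```python
def pp_delta(first_share, last_share):
    """Percentage-point change between two shares expressed in percent."""
    return round(last_share - first_share, 1)


# (first, last) shares in percent, as quoted in findings 1 and 2.
headline = {
    "ai_share": (86.0, 93.0),
    "vertical": (59.0, 71.0),
    "horizontal": (15.0, 9.0),
}
deltas = {name: pp_delta(a, b) for name, (a, b) in headline.items()}
# deltas == {"ai_share": 7.0, "vertical": 12.0, "horizontal": -6.0}

# Finding 4: gap between the rise in the term "agent" and the rise in the
# classifier's Autonomous Agent pattern.
agent_gap = round(16.2 - 11.8, 1)  # 4.4, i.e. the "~4pp" in the text
```

Note that 86% → 93% is a +7pp change, which is about an 8% relative increase; the two measures should not be mixed.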

What's in the release

  • src/ — scraper, classifier, founder analysis, trend analysis, visualisation, report writer
  • config/ — taxonomy, keywords, cohorts (all YAML; tune by editing)
  • data/processed/raw_companies.json — the full merged scrape (9 MB); ships with the release for one-command reproduction
  • outputs/charts/ — 16 charts
  • outputs/tables/ — 25 CSVs (cohort metrics, axis shares, deltas, emerging, term freqs, founder signals, wedge analysis)
  • outputs/analysis_summary.md — full narrative
  • outputs/key_findings.md — one-page bullet summary
  • outputs/MEDIUM_ARTICLE.md — long-form article

Reproduce

git clone https://github.com/sjmoran/yc-ai-cohort-analysis.git
cd yc-ai-cohort-analysis
pip install -r requirements.txt
python main.py --skip-scrape    # ~20s using shipped dataset
# or
python main.py --no-cache       # ~30s, full fresh scrape

Every classification carries the keyword hits that fired it. Every number in the narrative is traceable to a CSV. The whole pipeline is deterministic and inspectable.
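
To make the "keyword hits" idea concrete, here is a minimal sketch of a deterministic keyword classifier that returns, for each signal, both the boolean value and the keywords that triggered it. This is not the repository's classifier: the `KEYWORDS` map, the `classify` function, and the whole-word matching rule are assumptions for illustration, and the real keyword lists live in config/.

```python
import re

# Hypothetical keyword lists; the project's real ones are in config/ (YAML).
KEYWORDS = {
    "mentions_agents": ["agent", "agents", "agentic"],
    "regulated_industry": ["hipaa", "compliance", "audit"],
}


def classify(description, keywords=KEYWORDS):
    """Map each signal to its boolean value and the keywords that fired it."""
    # Lowercase and split into alphanumeric tokens for whole-word matching.
    tokens = set(re.findall(r"[a-z0-9]+", description.lower()))
    result = {}
    # Sorted iteration keeps the output order stable across runs.
    for signal in sorted(keywords):
        hits = sorted(k for k in keywords[signal] if k in tokens)
        result[signal] = {"value": bool(hits), "hits": hits}
    return result


example = classify("AI agents for HIPAA audit workflows")
# example["mentions_agents"]    -> {"value": True, "hits": ["agents"]}
# example["regulated_industry"] -> {"value": True, "hits": ["audit", "hipaa"]}
```

Because the hits travel with the result, any surprising classification can be checked by reading which keywords matched, without re-running the pipeline.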