Buyer-facing database of franchise financial performance data, sourced from publicly filed Franchise Disclosure Documents (FDDs). Monetized via affiliate referrals (franchise consultants + SBA preferred lenders) and (later) display ads.
Live site: https://josh-max2.github.io/Parser/ (after Pages is enabled — see below)
The repo and the generated docs/ folder are already in place. To make the site live:
- Go to https://github.com/josh-max2/Parser/settings/pages
- Under "Build and deployment" → Source, pick "Deploy from a branch"
- Set Branch to main and folder to /docs, then click Save
GitHub will build and publish within 1–2 minutes. The live URL will be:
https://josh-max2.github.io/Parser/
Once live, every git push to main rebuilds the site automatically.
parser/
├── README.md # This file
├── HANDOFF.md # Full project state, decision log, validation matrix
├── fdd_tool_build_spec.md # Original build plan
├── fdd-tool/ # Python pipeline (extraction + scraper + DB + site gen)
│ ├── src/
│ │ ├── pdf_utils.py # PDF text + section finder + cover detector
│ │ ├── prompts.py # Claude extraction prompts
│ │ ├── claude_client.py # Anthropic SDK wrapper
│ │ ├── extract.py # PDF -> 6 JSON files
│ │ ├── db.py # SQLite schema
│ │ ├── site_gen.py # SQLite -> docs/ HTML
│ │ ├── scrapers/wisconsin.py # WI DFI Playwright scraper
│ │ └── templates/ # Jinja2 templates (brand, index, category, about)
│ ├── scripts/ # One-off validation + ingest scripts
│ ├── output/ # 25 FDD extractions as JSON (committed — facts, not text)
│ └── data/ # Source PDFs (gitignored — copyrighted)
└── docs/ # Generated static site (served by GitHub Pages)
├── index.html
├── about/
├── franchise/{slug}/
└── category/{slug}/
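To make the flow implied by the tree more concrete, here is a minimal sketch of the last two stages: loading extraction records into SQLite and writing one page per brand under docs/franchise/{slug}/. Everything in it is an assumption made for illustration: the one-table schema, the field names (brand, category), the slugify helper, and the string.Template stand-in for the Jinja2 templates. The real db.py and site_gen.py may be organized quite differently.

```python
import html
import re
import sqlite3
import tempfile
from pathlib import Path
from string import Template

# Hypothetical stand-in for the Jinja2 brand template.
PAGE = Template("<h1>$brand</h1><p>Category: $category</p>")


def slugify(name: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into '-', trim edge dashes."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def ingest(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Load extraction records into a hypothetical one-table schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS brands ("
        "slug TEXT PRIMARY KEY, brand TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO brands VALUES (?, ?, ?)",
        [(slugify(r["brand"]), r["brand"], r["category"]) for r in records],
    )
    conn.commit()


def generate(conn: sqlite3.Connection, docs: Path) -> list[Path]:
    """Write docs/franchise/{slug}/index.html for every brand row."""
    written = []
    rows = conn.execute("SELECT slug, brand, category FROM brands ORDER BY slug")
    for slug, brand, category in rows:
        page = docs / "franchise" / slug / "index.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        body = PAGE.substitute(
            brand=html.escape(brand), category=html.escape(category)
        )
        page.write_text(body, encoding="utf-8")
        written.append(page)
    return written


if __name__ == "__main__":
    # Made-up record, not taken from any real FDD.
    records = [{"brand": "Example Lawn Co.", "category": "Home Services"}]
    conn = sqlite3.connect(":memory:")
    ingest(conn, records)
    with tempfile.TemporaryDirectory() as tmp:
        for path in generate(conn, Path(tmp) / "docs"):
            print(path.relative_to(tmp))
```

Keying the table on the slug means re-ingesting the same brand overwrites its row instead of duplicating it, which is one simple way to keep regeneration repeatable.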
cd fdd-tool
# Install deps
uv sync
uv run playwright install chromium # only if running scraper
# Re-ingest existing JSONs into SQLite
uv run python scripts/ingest_outputs.py
# Re-generate the static site
uv run python -m src.site_gen
# Commit + push to rebuild the live site
git add docs/
git commit -m "Regenerate site"
git push
- Phase 0 (extraction validation) — DONE on 4 FDDs across diverse failure modes
- Phase 1 (WI scraper + 20-brand home-services pilot) — DONE, $6.50 total API spend
- Phase 2 (SQLite + static site) — DONE (this commit)
- Phase 3 (live, indexed, affiliate-monetized) — pending GitHub Pages enablement + affiliate program applications
See HANDOFF.md for the full state and decision log.
- $0.30/FDD avg API spend at current Sonnet 4.6 pricing
- $0 hosting (GitHub Pages)
- $0 domain (uses josh-max2.github.io/Parser/ until you wire up a custom domain)
- Estimated $565 to ingest the full 1,879-brand Wisconsin corpus (deferred until storefront is live)
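The corpus estimate follows from the per-FDD average. A quick check of the arithmetic, assuming one FDD per brand at a flat $0.30 each; the pilot rate is shown for comparison and assumes the $6.50 covered exactly the 20 pilot brands:

```python
from decimal import Decimal

AVG_COST_PER_FDD = Decimal("0.30")  # average API spend per FDD, from the list above
WI_CORPUS_BRANDS = 1879             # full Wisconsin corpus
PILOT_SPEND = Decimal("6.50")       # Phase 1 total API spend
PILOT_BRANDS = 20                   # Phase 1 pilot size


def corpus_estimate(brands: int, cost_per_fdd: Decimal) -> Decimal:
    """Projected API spend, assuming one FDD per brand at a flat average cost."""
    return brands * cost_per_fdd


if __name__ == "__main__":
    print(corpus_estimate(WI_CORPUS_BRANDS, AVG_COST_PER_FDD))  # 563.70
    print(PILOT_SPEND / PILOT_BRANDS)                           # 0.325
```

At $0.30 per FDD the projection comes to $563.70, which is consistent with the rounded $565 figure. The actual spend depends on document length and model pricing, so treat it as a rough budget, not a quote.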