Parser — FDD Database

Buyer-facing database of franchise financial performance data, sourced from publicly filed Franchise Disclosure Documents (FDDs). Monetized via affiliate referrals (franchise consultants + SBA preferred lenders) and (later) display ads.

Live site: https://josh-max2.github.io/Parser/ (after Pages is enabled — see below)

How to enable GitHub Pages (3-click setup)

The repo and the generated docs/ folder are already in place. To make the site live:

1. Go to https://github.com/josh-max2/Parser/settings/pages
2. Under "Build and deployment" → Source, pick "Deploy from a branch"
3. Set Branch to main and folder to /docs, then click Save

GitHub will build and publish within 1–2 minutes. The live URL will be:

https://josh-max2.github.io/Parser/

Once live, every git push to main rebuilds the site automatically.
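
Before enabling Pages, it can help to confirm that the /docs folder actually contains a landing page and to know which URL to expect. The sketch below is illustrative only: pages_url and docs_ready are hypothetical helper names, not part of this repo. It relies on the standard project-site URL pattern (https://OWNER.github.io/REPO/).

```python
from pathlib import Path


def pages_url(owner: str, repo: str) -> str:
    """Project-site URL that GitHub Pages serves for owner/repo."""
    return f"https://{owner}.github.io/{repo}/"


def docs_ready(repo_root: Path) -> bool:
    """True if docs/index.html exists, i.e. there is a landing page to publish."""
    return (repo_root / "docs" / "index.html").is_file()


# Hypothetical usage from the repo root:
print(pages_url("josh-max2", "Parser"))  # https://josh-max2.github.io/Parser/
print(docs_ready(Path(".")))
```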

What's in this repo

parser/
├── README.md                      # This file
├── HANDOFF.md                     # Full project state, decision log, validation matrix
├── fdd_tool_build_spec.md         # Original build plan
├── fdd-tool/                      # Python pipeline (extraction + scraper + DB + site gen)
│   ├── src/
│   │   ├── pdf_utils.py           # PDF text + section finder + cover detector
│   │   ├── prompts.py             # Claude extraction prompts
│   │   ├── claude_client.py       # Anthropic SDK wrapper
│   │   ├── extract.py             # PDF -> 6 JSON files
│   │   ├── db.py                  # SQLite schema
│   │   ├── site_gen.py            # SQLite -> docs/ HTML
│   │   ├── scrapers/wisconsin.py  # WI DFI Playwright scraper
│   │   └── templates/             # Jinja2 templates (brand, index, category, about)
│   ├── scripts/                   # One-off validation + ingest scripts
│   ├── output/                    # 25 FDD extractions as JSON (committed — facts, not text)
│   └── data/                      # Source PDFs (gitignored — copyrighted)
└── docs/                          # Generated static site (served by GitHub Pages)
    ├── index.html
    ├── about/
    ├── franchise/{slug}/
    └── category/{slug}/
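
To make the docs/ layout concrete, here is a minimal sketch of how a brand or category name could map to a page path. The slugify rule and the page_path helper are assumptions for illustration; the actual logic in src/site_gen.py may differ. It also assumes each {slug}/ folder holds an index.html, which is what GitHub Pages serves for a directory URL.

```python
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def page_path(kind: str, name: str, docs_root: Path = Path("docs")) -> Path:
    """Path of the generated page for a 'franchise' or 'category' entry."""
    if kind not in {"franchise", "category"}:
        raise ValueError(f"unknown page kind: {kind!r}")
    return docs_root / kind / slugify(name) / "index.html"


# "Example Brand & Co." is a made-up name, not a brand from the dataset.
print(page_path("franchise", "Example Brand & Co."))
```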

Local development

cd fdd-tool

# Install deps
uv sync
uv run playwright install chromium  # only if running scraper

# Re-ingest existing JSONs into SQLite
uv run python scripts/ingest_outputs.py

# Re-generate the static site
uv run python -m src.site_gen

# Commit + push to rebuild the live site (docs/ lives at the repo root, not in fdd-tool/)
cd ..
git add docs/
git commit -m "Regenerate site"
git push
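
The two regeneration commands above can also be chained from a small Python wrapper, so a failed ingest stops the run before the site is rebuilt. This is a sketch, not a script that exists in the repo: REBUILD_STEPS and rebuild are hypothetical names, and the commands are copied from the steps listed above.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Same commands as in the manual steps; both run inside fdd-tool/.
REBUILD_STEPS = [
    ["uv", "run", "python", "scripts/ingest_outputs.py"],  # JSON -> SQLite
    ["uv", "run", "python", "-m", "src.site_gen"],         # SQLite -> docs/
]


def rebuild(tool_dir: Path, run: Callable[..., object] = subprocess.run) -> None:
    """Run each step in order; check=True aborts on the first non-zero exit."""
    for cmd in REBUILD_STEPS:
        run(cmd, cwd=tool_dir, check=True)


# Hypothetical usage from the repo root:
#   rebuild(Path("fdd-tool"))
```

The run parameter is injectable only so the sequence can be exercised without invoking uv; committing and pushing are left as manual steps.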

Status (2026-05-16)

Phase 0 (extraction validation) — DONE on 4 FDDs across diverse failure modes
Phase 1 (WI scraper + 20-brand home-services pilot) — DONE, $6.50 total API spend
Phase 2 (SQLite + static site) — DONE (this commit)
Phase 3 (live, indexed, affiliate-monetized) — pending GitHub Pages enablement + affiliate program applications

See HANDOFF.md for the full state and decision log.

Costs

$0.30/FDD avg API spend at current Sonnet 4.6 pricing
$0 hosting (GitHub Pages)
$0 domain (uses josh-max2.github.io/Parser/ until you wire up a custom domain)
Estimated $565 to ingest the full 1,879-brand Wisconsin corpus (deferred until storefront is live)
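
The full-corpus estimate follows from the per-FDD average. A quick sketch of the arithmetic, using only the figures quoted in this README (estimate_cost is a hypothetical helper name):

```python
def estimate_cost(fdd_count: int, cost_per_fdd: float) -> float:
    """Projected API spend in dollars, rounded to cents."""
    return round(fdd_count * cost_per_fdd, 2)


WI_CORPUS = 1_879          # brands in the Wisconsin corpus
AVG_COST = 0.30            # stated average API spend per FDD
PILOT_COST = 6.50 / 20     # Phase 1 pilot: $6.50 across 20 brands

# About $564 at the stated average, in line with the ~$565 estimate.
print(estimate_cost(WI_CORPUS, AVG_COST))
# The pilot's slightly higher per-FDD rate gives a figure a little above $600.
print(estimate_cost(WI_CORPUS, PILOT_COST))
```

Actual spend will depend on FDD length and on model pricing at the time of the run.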
