leadforge

Opinionated framework for generating synthetic CRM and GTM datasets from simulated commercial worlds.

leadforge generates narrative-grounded synthetic revenue datasets — starting with lead scoring — designed for teaching, portfolio projects, and research. Rather than sampling rows from a distribution, it simulates a commercial world: a specific company, selling a specific product, to a specific kind of buyer, and renders realistic CRM-style outputs from that world.

Installation

Requires Python 3.11+.

pip install leadforge

Or install directly from GitHub:

pip install git+https://github.com/leadforge-dev/leadforge.git

For development:

git clone https://github.com/leadforge-dev/leadforge.git
cd leadforge
pip install -e ".[dev]"
pre-commit install

Quickstart

CLI

# List available recipes
leadforge list-recipes

# Generate a dataset bundle
leadforge generate \
  --recipe b2b_saas_procurement_v1 \
  --seed 42 \
  --mode student_public \
  --difficulty intermediate \
  --n-leads 5000 \
  --out ./out/demo_bundle

# Inspect bundle metadata
leadforge inspect ./out/demo_bundle

# Validate bundle integrity
leadforge validate ./out/demo_bundle
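
To generate several bundles that differ only by seed, the same command can be assembled from Python. This is a minimal sketch that only builds argument lists from the flags shown above. The helper name `build_generate_command` and the `./out/seed_N` directory naming are illustrative assumptions, not part of leadforge.

```python
import shlex


def build_generate_command(recipe, seed, mode, difficulty, n_leads, out_dir):
    """Return an argv list for `leadforge generate` using the flags shown above."""
    return [
        "leadforge", "generate",
        "--recipe", recipe,
        "--seed", str(seed),
        "--mode", mode,
        "--difficulty", difficulty,
        "--n-leads", str(n_leads),
        "--out", str(out_dir),
    ]


# One command per seed; the output directory naming is a hypothetical choice.
commands = [
    build_generate_command(
        "b2b_saas_procurement_v1", seed, "student_public",
        "intermediate", 5000, f"./out/seed_{seed}",
    )
    for seed in (1, 2, 3)
]

for argv in commands:
    print(shlex.join(argv))
    # Where leadforge is installed, the command could be run with:
    # subprocess.run(argv, check=True)
```

Building a list rather than a single string avoids shell-quoting problems if a path contains spaces.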

Python API

from leadforge.api import Generator

gen = Generator.from_recipe(
    "b2b_saas_procurement_v1",
    seed=42,
    exposure_mode="student_public",
)
bundle = gen.generate(n_leads=5000, difficulty="intermediate")
bundle.save("./out/demo_bundle")

Exposure Modes

Control how much of the simulation's ground truth is visible in the output bundle:

| Mode | Purpose | Includes |
| --- | --- | --- |
| `student_public` | Teaching / portfolio use | Tables, features, task splits, dataset card |
| `research_instructor` | Full truth for instructors / researchers | All of the above + hidden graph, world spec, latent registry, mechanism summary |

Set via --mode on the CLI or exposure_mode= in the Python API.
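
The relationship between the two modes can be written down as a small lookup. This is a sketch of the table above, not leadforge's internal code: the names `ARTIFACTS_BY_MODE` and `visible_artifacts` are hypothetical, and the artifact labels are the descriptive phrases from the table rather than real file names.

```python
# Artifact labels are taken from the table above; they are not file names.
PUBLIC_ARTIFACTS = ("tables", "features", "task splits", "dataset card")
INSTRUCTOR_EXTRAS = (
    "hidden graph", "world spec", "latent registry", "mechanism summary",
)

# research_instructor is a strict superset of student_public.
ARTIFACTS_BY_MODE = {
    "student_public": PUBLIC_ARTIFACTS,
    "research_instructor": PUBLIC_ARTIFACTS + INSTRUCTOR_EXTRAS,
}


def visible_artifacts(mode: str) -> tuple[str, ...]:
    """Return the artifact labels exposed in the given mode."""
    try:
        return ARTIFACTS_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown exposure mode: {mode!r}") from None


hidden_from_students = sorted(
    set(visible_artifacts("research_instructor"))
    - set(visible_artifacts("student_public"))
)
print(hidden_from_students)
```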

Difficulty Profiles

Each recipe ships with difficulty profiles that control signal-to-noise ratio:

| Profile | Description |
| --- | --- |
| `intro` | Strong signal, low noise; good for first-time learners |
| `intermediate` | Moderate signal, realistic noise |
| `advanced` | Weak signal, high noise; challenges experienced practitioners |

Set via --difficulty on the CLI or difficulty= in generate().
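
To build intuition for what a signal-to-noise setting does to a learning task, here is a toy sketch that is independent of leadforge. The `(signal, noise)` values per profile are invented for illustration, and the single-feature setup is an assumption. leadforge's actual profile parameters are not documented in this README.

```python
import random

# HYPOTHETICAL (signal, noise) scales; not leadforge's real profile values.
PROFILES = {
    "intro": (3.0, 0.5),
    "intermediate": (1.5, 1.5),
    "advanced": (0.5, 3.0),
}


def simulate(profile, n=5000, seed=42):
    """Draw (feature, label) pairs where the label mixes signal and noise."""
    signal, noise = PROFILES[profile]
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        score = signal * x + noise * rng.gauss(0, 1)
        rows.append((x, 1 if score > 0 else 0))
    return rows


def sign_rule_accuracy(rows):
    """Accuracy of predicting label 1 whenever the feature is positive."""
    hits = sum((x > 0) == (y == 1) for x, y in rows)
    return hits / len(rows)


accuracy = {name: sign_rule_accuracy(simulate(name)) for name in PROFILES}
for name, value in accuracy.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the same trivial rule becomes less accurate as the noise scale grows relative to the signal scale, which is the general effect a harder profile is meant to have.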

Output Bundle

bundle_root/
  manifest.json            # provenance, row counts, file hashes
  dataset_card.md          # human-readable dataset documentation
  feature_dictionary.csv   # feature names, types, descriptions
  tables/                  # 9 relational Parquet tables
  tasks/
    converted_within_90_days/
      train.parquet
      valid.parquet
      test.parquet
      task_manifest.json
  metadata/                # (research_instructor only) hidden graph, world spec, latents
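
Because `manifest.json` records file hashes, a bundle can in principle be checked by hand. The sketch below assumes a manifest layout in which a top-level `"files"` object maps relative paths to SHA-256 hex digests. That schema is a guess made for illustration, and the function names are hypothetical. `leadforge validate` is the documented way to check a bundle.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large Parquet files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_hash_mismatches(bundle_root: str) -> list[str]:
    """Return relative paths that are missing or whose hash differs."""
    root = Path(bundle_root)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    mismatches = []
    # ASSUMPTION: manifest["files"] maps relative path -> SHA-256 hex digest.
    for relative_path, expected in manifest["files"].items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(relative_path)
    return sorted(mismatches)
```

An empty list from `find_hash_mismatches("./out/demo_bundle")` would mean every listed file is present and unchanged, under the assumed schema.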

Key Design Principles

- Deterministic: same (recipe, seed, version) → identical output.
- Relational-first: 9 normalized tables; flat ML exports are derived.
- No external APIs: core generation never requires network access.
- Simulation-driven labels: converted_within_90_days emerges from simulated events, not sampled directly.
- Leakage-safe: no feature uses events after the snapshot anchor.
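
Three of these principles (determinism, simulation-driven labels, leakage safety) can be illustrated with a toy simulation. This is a standalone sketch, not leadforge's engine: the event types, probabilities, the snapshot at day 30, and the reading of "within 90 days" as 90 days after the snapshot anchor are all assumptions made for the example.

```python
import random

SNAPSHOT_DAY = 30        # HYPOTHETICAL snapshot anchor, in days since lead creation
LABEL_WINDOW_DAYS = 90   # ASSUMPTION: window is measured from the snapshot anchor


def simulate_lead(rng):
    """Simulate a toy timeline of (day, event_type) pairs in day order."""
    events = []
    day = 0
    engagement = rng.random()  # latent trait that raises conversion odds
    while True:
        day += rng.randint(1, 20)
        if day > 180:
            break
        # In this toy world a deal can only close after the snapshot.
        if rng.random() < 0.08 * (0.5 + engagement) and day > SNAPSHOT_DAY:
            events.append((day, "closed_won"))
            break
        events.append((day, rng.choice(["email_open", "demo_request", "pricing_view"])))
    return events


def build_row(events):
    """Derive features and label from one simulated timeline."""
    # Leakage-safe: features only see events on or before the snapshot anchor.
    past = [(day, kind) for day, kind in events if day <= SNAPSHOT_DAY]
    # Simulation-driven label: read off the timeline, never sampled directly.
    converted = any(
        kind == "closed_won"
        and SNAPSHOT_DAY < day <= SNAPSHOT_DAY + LABEL_WINDOW_DAYS
        for day, kind in events
    )
    return {
        "n_events_before_snapshot": len(past),
        "n_demo_requests": sum(kind == "demo_request" for _, kind in past),
        "converted_within_90_days": int(converted),
    }


def generate_toy_rows(seed, n_leads=1000):
    """Deterministic: one seeded RNG drives every random draw."""
    rng = random.Random(seed)
    return [build_row(simulate_lead(rng)) for _ in range(n_leads)]


assert generate_toy_rows(42) == generate_toy_rows(42)
```

Routing all randomness through a single seeded `random.Random` instance is what makes repeated runs identical here. Filtering events by the anchor before computing any feature is what keeps post-snapshot information out of the inputs.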

Documentation

Development

pip install -e ".[dev]"
pytest                        # run all tests (~800)
ruff check .                  # lint
ruff format .                 # format
mypy leadforge/               # type check
pre-commit run --all-files    # full pre-commit suite

License

MIT. See LICENSE.
