Merged
46 changes: 26 additions & 20 deletions .agent-plan.md
@@ -6,44 +6,50 @@

 ## Current System State

-**v0.2.0 in progress.** Typed `Recipe` model, `GenerationConfig` with full validation, config
-precedence system, `RNGRoot` with deterministic substreams, `Generator.from_recipe()` fully
-implemented, `core/hashing.py`, `core/serialization.py`, and recipe narrative/difficulty-profile
-assets for `b2b_saas_procurement_v1`. 59 tests passing.
+**v0.2.0 in progress — Milestone 2 complete (PR open).** Typed `NarrativeSpec` hierarchy, `WorldSpec`
+with narrative field, `Generator.from_recipe()` populates `world_spec`, dataset card renderer, and
+full test coverage. 110 tests passing.

 ---

-## Active Task Breakdown — Milestone 2: Narrative Layer (v0.2.0 cont.)
+## Active Task Breakdown — Milestone 3: Schema Layer (v0.2.0 cont.)

-Goal: Build the concrete company/product/market story objects that anchor all later simulation.
+Goal: Define the relational entity schema (accounts, contacts, leads, etc.) and feature dictionary.

-- [ ] **1. Narrative models**
-  - Implement typed dataclasses in `narrative/`: `CompanySpec`, `ProductSpec`, `MarketSpec`,
-    `PersonaSpec`, `FunnelSpec`
-  - Loader: parse `narrative.yaml` into these models with validation
+- [ ] **1. Entity schema**
+  - Implement `schema/entities.py`: typed dataclasses for `Account`, `Contact`, `Lead`
+  - Implement `schema/events.py`: `Touch`, `SalesActivity`, `Opportunity` etc.

-- [ ] **2. WorldSpec population**
-  - Flesh out `WorldSpec` to hold a resolved `NarrativeSpec`
-  - Wire into `Generator.from_recipe()` so `gen.world_spec` is populated after construction
+- [ ] **2. Feature dictionary**
+  - Implement `schema/features.py` + `schema/dictionaries.py`
+  - Generate `feature_dictionary.csv` stub

-- [ ] **3. Dataset card generation**
-  - Implement `narrative/dataset_card.py`: render a Markdown dataset card from `WorldSpec`
-  - Tests: round-trip model → YAML → model, dataset-card text contains expected fields
+- [ ] **3. Task schema**
+  - Implement `schema/tasks.py`: `converted_within_90_days` task manifest structure

 ---

 ## Context Pointers

-- Milestone 2 scope: `docs/leadforge_implementation_plan.md` §5 "Milestone 2"
+- Milestone 3 scope: `docs/leadforge_implementation_plan.md` §6 "Milestone 3"
 - Full milestone dependency graph: `docs/leadforge_implementation_plan.md` §6
-- Narrative spec: `docs/leadforge_architecture_spec.md` §7
-- Recipe assets: `leadforge/recipes/b2b_saas_procurement_v1/narrative.yaml`
+- Schema spec: `docs/leadforge_architecture_spec.md` §8
+- Recipe assets: `leadforge/recipes/b2b_saas_procurement_v1/`

 ---

 ## Completed Phases

-### Milestone 1 — Canonical Config, Recipe & Model Objects ✓ (v0.2.0 in PR)
+### Milestone 2 — Narrative Layer ✓ (v0.2.0 in PR)
+- `leadforge/narrative/spec.py`: frozen dataclasses `NarrativeSpec`, `CompanySpec`, `ProductSpec`,
+  `MarketSpec`, `GtmMotionSpec`, `PersonaSpec`, `FunnelStageSpec` — all with validated `from_dict()`
+- `leadforge/narrative/dataset_card.py`: `render_dataset_card(world_spec)` — Markdown card
+- `leadforge/core/models.py`: `WorldSpec` gets `narrative: NarrativeSpec | None` field
+- `leadforge/api/generator.py`: `world_spec` property; `from_recipe()` resolves narrative into
+  `WorldSpec`
+- 51 new tests (spec validation, dataset card, Generator integration); total 110 passing
+
+### Milestone 1 — Canonical Config, Recipe & Model Objects ✓ (v0.2.0 merged)
 - `leadforge/core/rng.py`: `RNGRoot` with SHA-256-derived named substreams
 - `leadforge/core/hashing.py`: `hash_config()` — stable SHA-256 digest of `GenerationConfig`
 - `leadforge/core/serialization.py`: `load_yaml`, `load_json`, `dump_json`
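
Note on the Milestone 1 bullets above: the plan mentions SHA-256-derived named substreams and a stable SHA-256 config digest, but the diff does not show how `RNGRoot` or `hash_config()` implement them. The sketch below is one plausible way to get both properties using only the standard library. The names `derive_substream_seed`, `substream`, and `stable_digest` are hypothetical and are not part of leadforge.

```python
import hashlib
import json
import random


def derive_substream_seed(root_seed: int, name: str) -> int:
    """Hypothetical helper: map (root seed, stream name) to a 64-bit child seed."""
    digest = hashlib.sha256(f"{root_seed}:{name}".encode("utf-8")).digest()
    # Take the first 8 bytes of the digest as an unsigned integer seed.
    return int.from_bytes(digest[:8], "big")


def substream(root_seed: int, name: str) -> random.Random:
    """Return an independent generator for the named stream (assumed design)."""
    return random.Random(derive_substream_seed(root_seed, name))


def stable_digest(config: dict) -> str:
    """Hash a plain-dict config so that key order does not change the digest."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because each child seed depends only on the root seed and the stream name, adding a new named stream later would not shift the draws of existing streams, which is the usual motivation for this design.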
31 changes: 21 additions & 10 deletions leadforge/api/generator.py
@@ -5,7 +5,7 @@
 from typing import Any

 from leadforge.core.enums import DifficultyProfile, ExposureMode
-from leadforge.core.models import GenerationConfig, WorldBundle
+from leadforge.core.models import GenerationConfig, WorldBundle, WorldSpec
 from leadforge.core.rng import RNGRoot
 from leadforge.core.sentinels import _MISSING

@@ -23,17 +23,22 @@ class Generator:
         bundle = gen.generate(n_leads=5000, difficulty="intermediate")
         bundle.save("./out/demo_bundle")

-    ``from_recipe`` is implemented in Milestone 1. Full generation
-    (``generate``) is implemented across Milestones 2–9.
+    ``from_recipe`` is implemented in Milestone 1–2. Full generation
+    (``generate``) is implemented across Milestones 3–9.
     """

-    def __init__(self, config: GenerationConfig) -> None:
-        self._config = config
-        self._rng = RNGRoot(config.seed)
+    def __init__(self, world_spec: WorldSpec) -> None:
+        self._world_spec = world_spec
+        self._rng = RNGRoot(world_spec.config.seed)

     @property
     def config(self) -> GenerationConfig:
-        return self._config
+        return self._world_spec.config
+
+    @property
+    def world_spec(self) -> WorldSpec:
+        """The resolved world specification, including narrative."""
+        return self._world_spec

     @classmethod
     def from_recipe(
@@ -69,15 +74,16 @@ def from_recipe(
                 Applied after recipe defaults but before explicit kwargs.

         Returns:
-            A configured :class:`Generator` instance ready to call
-            :meth:`generate` on.
+            A configured :class:`Generator` with a populated
+            :attr:`world_spec` (narrative resolved from the recipe).

         Raises:
             :class:`~leadforge.core.exceptions.InvalidRecipeError`: if the
                 recipe does not exist, is malformed, or the requested
                 exposure mode / difficulty is not supported.
         """
         from leadforge.api.recipes import Recipe
+        from leadforge.narrative.spec import NarrativeSpec
         from leadforge.recipes.registry import load_recipe

         raw = load_recipe(recipe_id)
@@ -93,7 +99,12 @@ def from_recipe(
             output_path=output_path,
             override=override,
         )
-        return cls(config)
+
+        narrative_data = recipe.load_narrative()
+        narrative = NarrativeSpec.from_dict(narrative_data) if narrative_data else None
+        world_spec = WorldSpec(config=config, narrative=narrative)
+
+        return cls(world_spec)

     def generate(
         self,
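
Note on the `generator.py` change above: the constructor now takes a `WorldSpec`, `config` becomes a read-through property, and `from_recipe()` builds the narrative only when the recipe returns truthy narrative data. The stand-alone sketch below mirrors that shape with stub classes so the behavior can be tried without leadforge installed. Every `*Stub` name, the `from_parts` constructor, and the `{"company": {"name": ...}}` layout are assumptions made for illustration, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ConfigStub:
    """Stand-in for GenerationConfig; only the seed is modeled."""
    seed: int = 0


@dataclass(frozen=True)
class NarrativeStub:
    """Stand-in for NarrativeSpec with a single validated field."""
    company_name: str

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "NarrativeStub":
        # Assumed layout: {"company": {"name": ...}}.
        return cls(company_name=str(data["company"]["name"]))


@dataclass
class WorldSpecStub:
    config: ConfigStub = field(default_factory=ConfigStub)
    narrative: Optional[NarrativeStub] = None


class GeneratorStub:
    def __init__(self, world_spec: WorldSpecStub) -> None:
        self._world_spec = world_spec

    @property
    def config(self) -> ConfigStub:
        # Config is no longer stored separately; it is read through the spec.
        return self._world_spec.config

    @property
    def world_spec(self) -> WorldSpecStub:
        return self._world_spec

    @classmethod
    def from_parts(
        cls, seed: int, narrative_data: Optional[Dict[str, Any]]
    ) -> "GeneratorStub":
        # Same truthiness test as the diff: None and {} both yield no narrative.
        narrative = NarrativeStub.from_dict(narrative_data) if narrative_data else None
        return cls(WorldSpecStub(config=ConfigStub(seed=seed), narrative=narrative))
```

One consequence of the truthiness check is worth keeping in mind: an empty narrative mapping is treated the same as a missing one, so it produces `narrative=None` instead of a validation error.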
10 changes: 8 additions & 2 deletions leadforge/core/models.py
@@ -3,12 +3,15 @@
 from __future__ import annotations

 from dataclasses import dataclass, field
-from typing import Any
+from typing import TYPE_CHECKING, Any

 from leadforge.core.enums import DifficultyProfile, ExposureMode
 from leadforge.core.exceptions import InvalidConfigError
 from leadforge.version import __version__

+if TYPE_CHECKING:
+    from leadforge.narrative.spec import NarrativeSpec
+

 def _require_positive_int(value: Any, name: str) -> None:
     """Raise ``InvalidConfigError`` unless *value* is a positive plain ``int``.
@@ -74,10 +77,13 @@ def __post_init__(self) -> None:
 class WorldSpec:
     """Fully instantiated hidden world specification (post-sampling, pre-simulation).

-    Populated in Milestone 2 (narrative/schema) through Milestone 6 (mechanisms).
+    Populated incrementally across milestones:
+    - M2: config + narrative
+    - M3–M6: schema, structure, mechanisms
     """

     config: GenerationConfig = field(default_factory=GenerationConfig)
+    narrative: NarrativeSpec | None = None


 @dataclass
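
Note on the `models.py` change above: `NarrativeSpec` is imported only under `TYPE_CHECKING`, so the annotation `NarrativeSpec | None` is never evaluated when the module loads. That works there because `from __future__ import annotations` keeps annotations as strings. The sketch below reproduces the effect with an explicitly quoted annotation. The module path `hypothetical_narrative.spec` and the class `WorldSpecSketch` are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime.
    # The module path is hypothetical.
    from hypothetical_narrative.spec import NarrativeSpec


@dataclass
class WorldSpecSketch:
    # Quoted annotation: stored as a string, not evaluated at class creation.
    narrative: "NarrativeSpec | None" = None


spec = WorldSpecSketch()

# The dataclass machinery keeps the raw annotation string.
annotation = fields(WorldSpecSketch)[0].type


def hints_resolve() -> bool:
    """Return True if the annotations can be evaluated at runtime."""
    try:
        get_type_hints(WorldSpecSketch)
    except NameError:
        # NarrativeSpec was never imported at runtime, so evaluation fails.
        return False
    return True
```

The trade-off is that anything resolving type hints at runtime (for example a validation or serialization library) would fail on such a field unless the name is imported for real. A guarded import like this is a common way to avoid an import cycle, though the diff does not state the motivation.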
157 changes: 157 additions & 0 deletions leadforge/narrative/dataset_card.py
@@ -0,0 +1,157 @@
+"""Dataset card renderer.
+
+Produces the ``dataset_card.md`` artifact from a :class:`WorldSpec`.
+The card follows the structure required by the architecture spec (§14.3).
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from leadforge.core.models import WorldSpec
+
+
+def render_dataset_card(world_spec: WorldSpec) -> str:
+    """Return a Markdown dataset card string for *world_spec*.
+
+    Sections present at all milestones:
+    - Header (recipe id, version, seed, exposure mode)
+    - Narrative summary (company, product, market, GTM)
+    - Primary task and label definition
+    - Suggested use cases
+    - Caveats
+
+    Sections populated in later milestones (rendered as stubs here):
+    - Table inventory
+    - Feature categories
+    """
+    cfg = world_spec.config
+    narrative = world_spec.narrative
+
+    lines: list[str] = []
+
+    # ------------------------------------------------------------------
+    # Header
+    # ------------------------------------------------------------------
+    lines += [
+        "# leadforge dataset card",
+        "",
+        "| Field | Value |",
+        "|---|---|",
+        f"| Recipe | `{cfg.recipe_id}` |",
+        f"| Package version | `{cfg.package_version}` |",
+        f"| Seed | `{cfg.seed}` |",
+        f"| Exposure mode | `{cfg.exposure_mode}` |",
+        f"| Difficulty | `{cfg.difficulty}` |",
+        f"| Horizon | {cfg.horizon_days} days |",
+        "",
+    ]
+
+    # ------------------------------------------------------------------
+    # Narrative summary
+    # ------------------------------------------------------------------
+    lines.append("## Narrative summary")
+    lines.append("")
+    if narrative is not None:
+        c = narrative.company
+        p = narrative.product
+        m = narrative.market
+        gtm = narrative.gtm_motion
+        lines += [
+            f"**Vendor:** {c.name} ({c.stage}, founded {c.founded_year},"
+            f" {c.hq_city}, {c.hq_country})",
+            "",
+            f"**Product:** {p.name} — {p.category}. "
+            f"Deployment: {p.deployment}. "
+            f"Pricing: {p.pricing_model}. "
+            f"ACV range: ${p.acv_range_usd[0]:,}–${p.acv_range_usd[1]:,}.",
+            "",
+            f"**Target market:** {m.icp_employee_range[0]}–{m.icp_employee_range[1]}-employee"
+            f" firms in {', '.join(m.geographies)}. "
+            f"Key industries: {', '.join(m.icp_industries)}. "
+            f"Average deal size: ${m.avg_deal_size_usd:,}. "
+            f"Average sales cycle: {m.avg_sales_cycle_days} days.",
+            "",
+            f"**GTM motion:** {', '.join(gtm.channels)} "
+            f"({gtm.inbound_share:.0%} inbound / "
+            f"{gtm.outbound_share:.0%} outbound / "
+            f"{gtm.partner_share:.0%} partner).",
+            "",
+            "**Buyer personas:**",
+            "",
+        ]
+        for persona in narrative.personas:
+            ellipsis = "…" if len(persona.title_variants) > 2 else ""
+            lines.append(
+                f"- **{persona.role}** ({persona.decision_authority}) — "
+                f"{', '.join(persona.title_variants[:2])}{ellipsis}"
+            )
+        lines.append("")
+    else:
+        lines += ["*Narrative unavailable for this dataset.*", ""]
+
+    # ------------------------------------------------------------------
+    # Primary task
+    # ------------------------------------------------------------------
+    lines += [
+        "## Primary task",
+        "",
+        "**Task:** `converted_within_90_days`",
+        "",
+        "**Label definition:** A lead is considered converted if a `closed_won` event "
+        "is recorded within 90 days of the lead's snapshot anchor date. "
+        "The label is derived from simulated events — it is never sampled directly.",
+        "",
+    ]
+
+    # ------------------------------------------------------------------
+    # Table inventory (stub — populated in later milestones)
+    # ------------------------------------------------------------------
+    lines += [
+        "## Table inventory",
+        "",
+        "*Table counts will appear here once the simulation layer is implemented (v0.3.0+).*",
+        "",
+    ]
+
+    # ------------------------------------------------------------------
+    # Feature categories (stub)
+    # ------------------------------------------------------------------
+    lines += [
+        "## Feature categories",
+        "",
+        "*Feature dictionary will appear here once the schema layer is implemented (v0.3.0+).*",
+        "",
+    ]
+
+    # ------------------------------------------------------------------
+    # Suggested use cases
+    # ------------------------------------------------------------------
+    lines += [
+        "## Suggested use cases",
+        "",
+        "- Teaching binary classification on realistic CRM data",
+        "- Portfolio projects demonstrating end-to-end ML pipelines",
+        "- Benchmarking lead-scoring models under controlled signal/noise conditions",
+        "- Research on causal structure in funnel conversion data",
+        "",
+    ]
+
+    # ------------------------------------------------------------------
+    # Caveats
+    # ------------------------------------------------------------------
+    lines += [
+        "## Caveats",
+        "",
+        "- This is **synthetic** data. It does not represent any real company, product, or market.",
+        "- The hidden world structure varies by motif family and stochastic rewiring; "
+        "no two seeds produce the same DGP.",
+        "- Features are anchored at the snapshot date. No post-anchor data is "
+        "included (leakage-free by construction).",
+        "- In `student_public` mode, the latent world graph, mechanism summary, "
+        "and full world spec are withheld.",
+        "",
+    ]
+
+    return "\n".join(lines)
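
Note on the label definition rendered by the card above: the text says the label is derived from simulated events and is never sampled directly. The simulation layer is not part of this PR, so the sketch below only illustrates what such a derivation could look like. The function name, the `(date, kind)` event tuples, the inclusive 90-day boundary, and the exclusion of events before the anchor are all assumptions.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

HORIZON_DAYS = 90  # Matches the task name converted_within_90_days.


def converted_within_horizon(
    anchor: date,
    events: Iterable[Tuple[date, str]],
    horizon_days: int = HORIZON_DAYS,
) -> bool:
    """Derive the label from an event log instead of sampling it.

    Assumptions: each event is a (date, kind) pair, the window is inclusive
    on both ends, and events dated before the anchor do not count.
    """
    deadline = anchor + timedelta(days=horizon_days)
    return any(
        kind == "closed_won" and anchor <= when <= deadline
        for when, kind in events
    )
```

Deriving the label this way keeps it consistent with the event tables by construction: any change to the simulated events changes the label with it, which is not guaranteed when the label is drawn separately.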