leadforge-dev · shaypal5 · Apr 22, 2026 · Apr 21, 2026 · Apr 21, 2026 · Apr 21, 2026
diff --git a/.agent-plan.md b/.agent-plan.md
@@ -6,41 +6,50 @@
 
 ## Current System State
 
-**v0.2.0 in progress — Milestone 2 complete (PR open).** Typed `NarrativeSpec` hierarchy, `WorldSpec`
-with narrative field, `Generator.from_recipe()` populates `world_spec`, dataset card renderer, and
-full test coverage. 110 tests passing.
+**v0.2.0 in progress — Milestone 3 complete (PR open).** All 9 relational table schemas defined as
+typed row dataclasses with Parquet round-trip support. FK constraints, ID generation, feature
+dictionary, and task manifest implemented. 192 tests passing.
 
 ---
 
-## Active Task Breakdown — Milestone 3: Schema Layer (v0.2.0 cont.)
+## Active Task Breakdown — Milestone 4: World Structure (v0.3.0)
 
-Goal: Define the relational entity schema (accounts, contacts, leads, etc.) and feature dictionary.
+Goal: Implement the hidden world graph — DAG of latent nodes, motif families, and stochastic rewiring.
 
-- [ ] **1. Entity schema**
-  - Implement `schema/entities.py`: typed dataclasses for `Account`, `Contact`, `Lead`
-  - Implement `schema/events.py`: `Touch`, `SalesActivity`, `Opportunity` etc.
-
-- [ ] **2. Feature dictionary**
-  - Implement `schema/features.py` + `schema/dictionaries.py`
-  - Generate `feature_dictionary.csv` stub
-
-- [ ] **3. Task schema**
-  - Implement `schema/tasks.py`: `converted_within_90_days` task manifest structure
+- [ ] **1. Node type system** (`structure/node_types.py`)
+- [ ] **2. World graph** (`structure/graph.py`) — `networkx.DiGraph`, DAG validation
+- [ ] **3. Motif families** (`structure/motifs.py`, `structure/templates.py`) — 5 v1 families
+- [ ] **4. Stochastic rewiring** (`structure/rewiring.py`) — seeded perturbation
+- [ ] **5. Sampler** (`structure/sampler.py`) — draw a world graph from a motif + config
 
 ---
 
 ## Context Pointers
 
-- Milestone 3 scope: `docs/leadforge_implementation_plan.md` §6 "Milestone 3"
+- Milestone 4 scope: `docs/leadforge_implementation_plan.md` §7 "Milestone 4"
 - Full milestone dependency graph: `docs/leadforge_implementation_plan.md` §6
-- Schema spec: `docs/leadforge_architecture_spec.md` §8
-- Recipe assets: `leadforge/recipes/b2b_saas_procurement_v1/`
+- Structure spec: `docs/leadforge_architecture_spec.md` §11
+- Motif families: `docs/leadforge_architecture_spec.md` §11.2
 
 ---
 
 ## Completed Phases
 
-### Milestone 2 — Narrative Layer ✓ (v0.2.0 in PR)
+### Milestone 3 — Schema Layer ✓ (v0.2.0 in PR)
+- `leadforge/core/ids.py`: `make_id(prefix, n)` + `ID_PREFIXES` registry
+- `leadforge/schema/entities.py`: typed row dataclasses for all 9 tables (accounts, contacts,
+  leads, touches, sessions, sales_activities, opportunities, customers, subscriptions) with
+  `DTYPE_MAP`, `to_dict()`, `empty_dataframe()`, Parquet round-trip via `schema/tables.py`
+- `leadforge/schema/relationships.py`: `FKConstraint`, `ALL_CONSTRAINTS` (10 FK edges),
+  `validate_fk()` helper raising `FKViolationError`
+- `leadforge/schema/features.py`: `FeatureSpec` frozen dataclass + `LEAD_SNAPSHOT_FEATURES`
+  (29 features, one target)
+- `leadforge/schema/dictionaries.py`: `feature_dictionary_df()` + `write_feature_dictionary()`
+- `leadforge/schema/tasks.py`: `SplitSpec`, `TaskManifest`, `CONVERTED_WITHIN_90_DAYS` constant
+- `pyproject.toml`: added pandas≥2.0 + pyarrow≥14.0 as core deps; mypy overrides for both
+- 82 new tests; total 192 passing
+
+### Milestone 2 — Narrative Layer ✓ (v0.2.0 merged)
 - `leadforge/narrative/spec.py`: frozen dataclasses `NarrativeSpec`, `CompanySpec`, `ProductSpec`,
   `MarketSpec`, `GtmMotionSpec`, `PersonaSpec`, `FunnelStageSpec` — all with validated `from_dict()`
 - `leadforge/narrative/dataset_card.py`: `render_dataset_card(world_spec)` — Markdown card

diff --git a/.github/workflows/pr-agent-context-refresh-dispatcher.yml b/.github/workflows/pr-agent-context-refresh-dispatcher.yml
@@ -0,0 +1,176 @@
+name: PR agent context refresh dispatcher
+
+# Runs on a schedule and dispatches pr-agent-context-refresh for any open
+# same-repo PR that had recent review activity but no corresponding in-flight
+# or recently-succeeded refresh run.
+#
+# WHY THIS EXISTS
+# ---------------
+# When a bot (e.g. Copilot / copilot-pull-request-reviewer[bot]) submits a
+# review, the pull_request_review / pull_request_review_comment events fire
+# and trigger pr-agent-context-refresh — but the triggered run is immediately
+# blocked by GitHub's approval gate for bot/external actors:
+#
+#   • conclusion=startup_failure → workflow could not start (counts as blocked)
+#   • conclusion=action_required → was approval-gated, later auto-cancelled
+#
+# Neither outcome produces a refresh comment.  This dispatcher fires
+# from the default branch (where it is active), bypasses the approval gate
+# by using the schedule's GITHUB_TOKEN, and dispatches a workflow_dispatch
+# run that executes with full repo permissions.
+#
+# DEDUPE CONTRACT
+# ---------------
+# A dispatch is suppressed only when there is already meaningful coverage for
+# the PR's current head SHA:
+#   • run is in_progress, queued, waiting, or requested  →  suppress
+#   • run completed with conclusion=success or =neutral recently  →  suppress
+#
+# Blocked / non-executed conclusions (startup_failure, action_required,
+# failure, cancelled, timed_out, skipped) do NOT count as coverage and do
+# NOT suppress a fallback dispatch.
+
+on:
+  schedule:
+    # Every 15 minutes, all day every day.
+    # Bot reviews can arrive at any hour; restrict to business hours only if
+    # cost is a concern (e.g. '*/15 7-23 * * 1-5' for Mon-Fri 07-23 UTC).
+    - cron: '*/15 * * * *'
+
+permissions:
+  actions: write
+  pull-requests: read
+
+jobs:
+  dispatch:
+    name: Dispatch stalled PR agent context refreshes
+    runs-on: ubuntu-latest
+    steps:
+      - name: Find and dispatch pending refreshes
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const REFRESH_WORKFLOW = 'pr-agent-context-refresh.yml';
+
+            // Only look at review activity in the last N minutes.
+            const LOOKBACK_MINUTES = 20;
+            // A successfully-completed run within this window suppresses redispatch.
+            const RECENT_SUCCESS_WINDOW_MINUTES = 10;
+
+            // Conclusions that mean the run was BLOCKED and never produced a
+            // refresh comment.  These must NOT suppress a fallback dispatch.
+            const BLOCKED_CONCLUSIONS = new Set([
+              'startup_failure',
+              'action_required',
+              'failure',
+              'cancelled',
+              'timed_out',
+              'skipped',
+            ]);
+
+            const now = Date.now();
+            const since = new Date(now - LOOKBACK_MINUTES * 60 * 1000).toISOString();
+            const recentSuccessSince = new Date(
+              now - RECENT_SUCCESS_WINDOW_MINUTES * 60 * 1000
+            ).toISOString();
+
+            const defaultBranch = context.payload.repository.default_branch;
+
+            // List ALL open PRs via pagination; fork PRs are filtered out in the loop below.
+            const pulls = await github.paginate(github.rest.pulls.list, {
+              ...context.repo,
+              state: 'open',
+              per_page: 100,
+            });
+
+            for (const pr of pulls) {
+              // Same-repo guard: skip forks (head.repo is null when the fork was deleted).
+              if (pr.head.repo?.full_name !== context.payload.repository.full_name) {
+                continue;
+              }
+
+              try {
+                // --- Bounded recent activity check (first page only; listReviews returns oldest first) ---
+                const [{ data: reviews }, { data: reviewComments }] = await Promise.all([
+                  github.rest.pulls.listReviews({
+                    ...context.repo,
+                    pull_number: pr.number,
+                    per_page: 10,
+                  }),
+                  github.rest.pulls.listReviewComments({
+                    ...context.repo,
+                    pull_number: pr.number,
+                    sort: 'updated', direction: 'desc', per_page: 10,
+                  }),
+                ]);
+
+                const hasRecentActivity =
+                  reviews.some((r) => r.submitted_at >= since) ||
+                  reviewComments.some(
+                    (c) => c.created_at >= since || c.updated_at >= since
+                  );
+
+                if (!hasRecentActivity) continue;
+
+                // --- In-flight / recent-success dedupe ---
+                // Fetch runs for this exact head SHA so stale runs from
+                // earlier commits don't suppress dispatch for the new SHA.
+                const { data: { workflow_runs: runs } } =
+                  await github.rest.actions.listWorkflowRunsForWorkflow({
+                    ...context.repo,
+                    workflow_id: REFRESH_WORKFLOW,
+                    head_sha: pr.head.sha,
+                    per_page: 10,
+                  });
+
+                const hasValidCoverage = runs.some((r) => {
+                  // Actively working toward a refresh — don't interrupt.
+                  if (
+                    r.status === 'in_progress' ||
+                    r.status === 'queued' ||
+                    r.status === 'waiting' ||
+                    r.status === 'requested'
+                  ) {
+                    return true;
+                  }
+
+                  // Completed: only suppress if the run actually succeeded
+                  // recently.  Blocked / failed conclusions are transparent.
+                  if (r.status === 'completed') {
+                    if (BLOCKED_CONCLUSIONS.has(r.conclusion)) return false;
+                    return (
+                      (r.conclusion === 'success' || r.conclusion === 'neutral') &&
+                      r.updated_at >= recentSuccessSince
+                    );
+                  }
+
+                  return false;
+                });
+
+                if (hasValidCoverage) {
+                  console.log(
+                    `PR #${pr.number}: valid refresh already running or recently succeeded — skipping.`
+                  );
+                  continue;
+                }
+
+                // --- Dispatch ---
+                await github.rest.actions.createWorkflowDispatch({
+                  ...context.repo,
+                  workflow_id: REFRESH_WORKFLOW,
+                  ref: defaultBranch,
+                  inputs: {
+                    pull_request_number: String(pr.number),
+                    pull_request_head_sha: pr.head.sha,
+                    pull_request_base_sha: pr.base.sha,
+                  },
+                });
+
+                console.log(
+                  `Dispatched refresh for PR #${pr.number} (head: ${pr.head.sha}).`
+                );
+              } catch (err) {
+                // Per-PR error isolation: log and continue to the next PR.
+                console.error(`PR #${pr.number}: dispatch failed — ${err.message}`);
+              }
+            }
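The dedupe contract in the dispatcher above can be restated as a small pure function, which makes its edge cases easy to check in isolation. The following is a minimal Python sketch, not the workflow's actual code: the name `has_valid_coverage`, the constant names, and the dict-shaped runs (keys `status`, `conclusion`, `updated_at`) are assumptions made for illustration. It also parses timestamps into `datetime` objects rather than comparing ISO strings, which is a choice of this sketch.

```python
from datetime import datetime, timedelta, timezone

# Statuses that mean a run is still working toward a refresh.
ACTIVE_STATUSES = {"in_progress", "queued", "waiting", "requested"}
# Conclusions that count as real coverage; everything else is treated as blocked.
SUCCESS_CONCLUSIONS = {"success", "neutral"}
# Mirrors RECENT_SUCCESS_WINDOW_MINUTES = 10 in the dispatcher script.
RECENT_SUCCESS_WINDOW = timedelta(minutes=10)


def _parse(ts: str) -> datetime:
    """Parse a timestamp of the form 2026-04-21T12:00:00Z into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def has_valid_coverage(runs: list[dict], now: datetime) -> bool:
    """Return True when a dispatch for this head SHA should be suppressed."""
    cutoff = now - RECENT_SUCCESS_WINDOW
    for run in runs:
        if run["status"] in ACTIVE_STATUSES:
            return True
        if (
            run["status"] == "completed"
            and run.get("conclusion") in SUCCESS_CONCLUSIONS
            and _parse(run["updated_at"]) >= cutoff
        ):
            return True
    return False


# Hypothetical run records for a single head SHA.
now = datetime(2026, 4, 21, 12, 0, tzinfo=timezone.utc)
blocked = [{"status": "completed", "conclusion": "action_required",
            "updated_at": "2026-04-21T11:58:00Z"}]
fresh = [{"status": "completed", "conclusion": "success",
          "updated_at": "2026-04-21T11:55:00Z"}]
stale = [{"status": "completed", "conclusion": "success",
          "updated_at": "2026-04-21T11:40:00Z"}]
queued = [{"status": "queued", "conclusion": None,
           "updated_at": "2026-04-21T11:59:00Z"}]

print(has_valid_coverage(blocked, now))  # False: approval-gated run is not coverage
print(has_valid_coverage(fresh, now))    # True: succeeded within the window
print(has_valid_coverage(stale, now))    # False: success is older than the window
print(has_valid_coverage(queued, now))   # True: a run is already in flight
```

Keeping the predicate free of API calls means the blocked, stale, and in-flight cases can each be exercised with plain data.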
diff --git a/.github/workflows/pr-agent-context-refresh.yml b/.github/workflows/pr-agent-context-refresh.yml
@@ -7,10 +7,28 @@ on:
     types: [created, edited, deleted]
   check_run:
     types: [completed]
+  workflow_dispatch:
+    inputs:
+      pull_request_number:
+        description: PR number to refresh
+        required: true
+        type: string
+      pull_request_head_sha:
+        description: Head SHA of the PR
+        required: true
+        type: string
+      pull_request_base_sha:
+        description: Base SHA of the PR
+        required: true
+        type: string
 
+# SHA-aware concurrency: workflow_dispatch runs key on PR+SHA so same-PR/different-SHA
+# dispatches are not cancelled, but duplicate dispatches for the same PR+SHA are.
 concurrency:
   group: >-
     pr-agent-context-refresh-${{
+      (github.event_name == 'workflow_dispatch' &&
+       format('{0}-{1}', github.event.inputs.pull_request_number, github.event.inputs.pull_request_head_sha)) ||
       github.event.pull_request.number ||
       github.event.check_run.pull_requests[0].number ||
       github.event.check_run.head_sha ||
@@ -27,6 +45,7 @@ jobs:
   pr-agent-context-refresh:
     name: PR agent context refresh
     if: >-
+      github.event_name == 'workflow_dispatch' ||
       (github.event_name == 'pull_request_review' &&
        github.event.pull_request.head.repo.full_name == github.repository) ||
       (github.event_name == 'pull_request_review_comment' &&
@@ -52,3 +71,6 @@ jobs:
       wait_for_reviews_to_settle: true
       publish_all_clear_comments_in_refresh: false
       debug_artifacts: true
+      pull_request_number: ${{ inputs.pull_request_number || '' }}
+      pull_request_head_sha: ${{ inputs.pull_request_head_sha || '' }}
+      pull_request_base_sha: ${{ inputs.pull_request_base_sha || '' }}
diff --git a/leadforge/core/hashing.py b/leadforge/core/hashing.py
@@ -18,7 +18,7 @@ def _canonical(obj: Any) -> Any:
     """Recursively convert to a JSON-stable form (sorted keys, enums → str)."""
     if isinstance(obj, dict):
         return {k: _canonical(v) for k, v in sorted(obj.items())}
-    if isinstance(obj, (list, tuple)):
+    if isinstance(obj, (list, tuple)):  # noqa: UP038
         return [_canonical(v) for v in obj]
     # StrEnum values are already strings; this handles plain Enum too
     if hasattr(obj, "value"):

diff --git a/leadforge/core/ids.py b/leadforge/core/ids.py
@@ -1,16 +1,63 @@
 """Entity ID generation.
 
-Implemented in Milestone 3. All IDs must be stable, opaque, namespace-unique,
-and deterministic for a given run.
-
-Canonical prefixes:
-    acct_   — Account
-    cnt_    — Contact
-    lead_   — Lead
-    touch_  — Touch
-    sess_   — Session
-    act_    — SalesActivity
-    opp_    — Opportunity
-    cust_   — Customer
-    sub_    — Subscription
+All IDs are stable, opaque, namespace-unique, and deterministic for a given
+(recipe, config, seed) triple.  Callers derive a dedicated RNG substream via
+``RNGRoot.child()`` and pass a monotonically increasing counter to
+:func:`make_id`.
+
+Canonical prefixes
+------------------
+The following nine prefixes correspond directly to the nine relational tables
+defined in ``schema/entities.py``:
+
+acct_   — Account
+cnt_    — Contact
+lead_   — Lead
+touch_  — Touch
+sess_   — Session
+act_    — SalesActivity
+opp_    — Opportunity
+cust_   — Customer
+sub_    — Subscription
+
+The ``rep_`` prefix is an internal-only namespace used for sales-rep entities
+that participate in simulation mechanics but do **not** have a corresponding
+standalone relational table in the v1 output bundle.
 """
+
+from __future__ import annotations
+
+# Canonical prefix registry — single source of truth used by tests and
+# simulation code alike.
+ID_PREFIXES: dict[str, str] = {
+    "account": "acct",
+    "contact": "cnt",
+    "lead": "lead",
+    "touch": "touch",
+    "session": "sess",
+    "sales_activity": "act",
+    "opportunity": "opp",
+    "customer": "cust",
+    "subscription": "sub",
+    "rep": "rep",
+}
+
+_PAD_WIDTH = 6  # e.g. acct_000001
+
+
+def make_id(prefix: str, n: int) -> str:
+    """Return a zero-padded entity ID string.
+
+    Args:
+        prefix: The namespace prefix (e.g. ``"acct"``).
+        n: A 1-based counter for this entity type within one generation run.
+
+    Returns:
+        A string of the form ``"<prefix>_<n:06d>"``; e.g. ``"acct_000001"``.
+
+    Raises:
+        ValueError: If *n* is not a positive integer.
+    """
+    if not isinstance(n, int) or isinstance(n, bool) or n < 1:
+        raise ValueError(f"n must be a positive int, got {n!r}")
+    return f"{prefix}_{n:0{_PAD_WIDTH}d}"
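A short usage sketch may help show how `make_id` behaves at its boundaries. The helper is copied here from the diff above so the snippet runs without the `leadforge` package installed; the calling code (the `account_ids` list and the `rejected` loop) is hypothetical and not part of the PR.

```python
# Standalone copy of the helper from leadforge/core/ids.py in this diff.
_PAD_WIDTH = 6


def make_id(prefix: str, n: int) -> str:
    if not isinstance(n, int) or isinstance(n, bool) or n < 1:
        raise ValueError(f"n must be a positive int, got {n!r}")
    return f"{prefix}_{n:0{_PAD_WIDTH}d}"


# Hypothetical caller: mint IDs for three accounts with a 1-based counter.
account_ids = [make_id("acct", n) for n in range(1, 4)]
print(account_ids)  # ['acct_000001', 'acct_000002', 'acct_000003']

# A counter wider than the pad is not truncated; the ID simply grows.
wide_id = make_id("lead", 1_234_567)
print(wide_id)  # lead_1234567

# Zero, negatives, bools, and floats are all rejected. bool needs its own
# check because it is a subclass of int.
rejected = []
for bad in (0, -1, True, 1.0):
    try:
        make_id("acct", bad)
    except ValueError:
        rejected.append(bad)
print(len(rejected))  # 4
```

Because the pad is a minimum width rather than a fixed width, IDs stay unique past 999,999 entities, though they stop sorting lexicographically at that point.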