Merged
47 changes: 28 additions & 19 deletions .agent-plan.md
@@ -6,41 +6,50 @@

## Current System State

**v0.2.0 in progress — Milestone 2 complete (PR open).** Typed `NarrativeSpec` hierarchy, `WorldSpec`
with narrative field, `Generator.from_recipe()` populates `world_spec`, dataset card renderer, and
full test coverage. 110 tests passing.
**v0.2.0 in progress — Milestone 3 complete (PR open).** All 9 relational table schemas defined as
typed row dataclasses with Parquet round-trip support. FK constraints, ID generation, feature
dictionary, and task manifest implemented. 192 tests passing.

---

## Active Task Breakdown — Milestone 3: Schema Layer (v0.2.0 cont.)
## Active Task Breakdown — Milestone 4: World Structure (v0.3.0)

Goal: Define the relational entity schema (accounts, contacts, leads, etc.) and feature dictionary.
Goal: Implement the hidden world graph — DAG of latent nodes, motif families, and stochastic rewiring.

- [ ] **1. Entity schema**
  - Implement `schema/entities.py`: typed dataclasses for `Account`, `Contact`, `Lead`
  - Implement `schema/events.py`: `Touch`, `SalesActivity`, `Opportunity`, etc.

- [ ] **2. Feature dictionary**
  - Implement `schema/features.py` + `schema/dictionaries.py`
  - Generate `feature_dictionary.csv` stub

- [ ] **3. Task schema**
  - Implement `schema/tasks.py`: `converted_within_90_days` task manifest structure
- [ ] **1. Node type system** (`structure/node_types.py`)
- [ ] **2. World graph** (`structure/graph.py`) — `networkx.DiGraph`, DAG validation
- [ ] **3. Motif families** (`structure/motifs.py`, `structure/templates.py`) — 5 v1 families
- [ ] **4. Stochastic rewiring** (`structure/rewiring.py`) — seeded perturbation
- [ ] **5. Sampler** (`structure/sampler.py`) — draw a world graph from a motif + config
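The DAG validation and seeded rewiring tasks above can be sketched without any graph library. This is a minimal sketch, assuming edges are `(src, dst)` string pairs and that "rewiring" means retargeting a random subset of edges while rejecting any move that would create a cycle. It swaps the planned `networkx.DiGraph` for a hand-rolled Kahn's-algorithm check to stay dependency-free; `is_dag` and `rewire` are hypothetical names, not the planned `structure/` API.

```python
import random
from collections import defaultdict, deque

Edge = tuple[str, str]


def is_dag(nodes: list[str], edges: set[Edge]) -> bool:
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    indegree = {n: 0 for n in nodes}
    children: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


def rewire(nodes: list[str], edges: set[Edge], fraction: float, seed: int) -> set[Edge]:
    """Retarget up to `fraction` of the edges while keeping the graph a DAG.

    A private random.Random(seed) makes the perturbation reproducible.
    """
    rng = random.Random(seed)
    result = set(edges)
    # Sort before sampling so the outcome does not depend on set iteration order.
    candidates = sorted(edges)
    for src, dst in rng.sample(candidates, round(fraction * len(candidates))):
        new_dst = rng.choice(nodes)
        # Skip self-loops and edges that already exist.
        if new_dst == src or (src, new_dst) in result:
            continue
        proposal = (result - {(src, dst)}) | {(src, new_dst)}
        # Keep the move only if it does not introduce a cycle.
        if is_dag(nodes, proposal):
            result = proposal
    return result
```

Every accepted move removes one edge and adds one new one, so the edge count is preserved, and the same seed always yields the same rewired graph.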

---

## Context Pointers

- Milestone 3 scope: `docs/leadforge_implementation_plan.md` §6 "Milestone 3"
- Milestone 4 scope: `docs/leadforge_implementation_plan.md` §7 "Milestone 4"
- Full milestone dependency graph: `docs/leadforge_implementation_plan.md` §6
- Schema spec: `docs/leadforge_architecture_spec.md` §8
- Recipe assets: `leadforge/recipes/b2b_saas_procurement_v1/`
- Structure spec: `docs/leadforge_architecture_spec.md` §11
- Motif families: `docs/leadforge_architecture_spec.md` §11.2

---

## Completed Phases

### Milestone 2 — Narrative Layer ✓ (v0.2.0 in PR)
### Milestone 3 — Schema Layer ✓ (v0.2.0 in PR)
- `leadforge/core/ids.py`: `make_id(prefix, n)` + `ID_PREFIXES` registry
- `leadforge/schema/entities.py`: typed row dataclasses for all 9 tables (accounts, contacts,
leads, touches, sessions, sales_activities, opportunities, customers, subscriptions) with
`DTYPE_MAP`, `to_dict()`, `empty_dataframe()`, Parquet round-trip via `schema/tables.py`
- `leadforge/schema/relationships.py`: `FKConstraint`, `ALL_CONSTRAINTS` (10 FK edges),
`validate_fk()` helper raising `FKViolationError`
- `leadforge/schema/features.py`: `FeatureSpec` frozen dataclass + `LEAD_SNAPSHOT_FEATURES`
(29 features, one target)
- `leadforge/schema/dictionaries.py`: `feature_dictionary_df()` + `write_feature_dictionary()`
- `leadforge/schema/tasks.py`: `SplitSpec`, `TaskManifest`, `CONVERTED_WITHIN_90_DAYS` constant
- `pyproject.toml`: added pandas≥2.0 + pyarrow≥14.0 as core deps; mypy overrides for both
- 82 new tests; total 192 passing
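The `validate_fk()` helper is listed above only by name. A minimal sketch of what such a check can look like, assuming a constraint names a child table/column and a parent table/column, and that tables are plain lists of row dicts. The field names, the `nullable` flag, and this `FKConstraint` shape are assumptions for illustration, not the actual `schema/relationships.py` definitions.

```python
from dataclasses import dataclass


class FKViolationError(ValueError):
    """Raised when child rows reference parent keys that do not exist."""


@dataclass(frozen=True)
class FKConstraint:
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str
    nullable: bool = False  # assumption: optional links may be left empty


def validate_fk(
    tables: dict[str, list[dict[str, object]]], constraint: FKConstraint
) -> None:
    """Raise FKViolationError if any child value is missing from the parent keys."""
    parent_keys = {
        row[constraint.parent_column] for row in tables[constraint.parent_table]
    }
    orphans = []
    for row in tables[constraint.child_table]:
        value = row.get(constraint.child_column)
        if value is None and constraint.nullable:
            continue  # an empty optional reference is not a violation
        if value not in parent_keys:
            orphans.append(value)
    if orphans:
        raise FKViolationError(
            f"{constraint.child_table}.{constraint.child_column} -> "
            f"{constraint.parent_table}.{constraint.parent_column}: "
            f"{len(orphans)} orphan value(s), e.g. {orphans[0]!r}"
        )
```

Building the parent key set once keeps the check linear in the size of both tables.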

### Milestone 2 — Narrative Layer ✓ (v0.2.0 merged)
- `leadforge/narrative/spec.py`: frozen dataclasses `NarrativeSpec`, `CompanySpec`, `ProductSpec`,
`MarketSpec`, `GtmMotionSpec`, `PersonaSpec`, `FunnelStageSpec` — all with validated `from_dict()`
- `leadforge/narrative/dataset_card.py`: `render_dataset_card(world_spec)` — Markdown card
176 changes: 176 additions & 0 deletions .github/workflows/pr-agent-context-refresh-dispatcher.yml
@@ -0,0 +1,176 @@
name: PR agent context refresh dispatcher

# Runs on a schedule and dispatches pr-agent-context-refresh for any open
# same-repo PR that had recent review activity but no corresponding in-flight
# or recently-succeeded refresh run.
#
# WHY THIS EXISTS
# ---------------
# When a bot (e.g. Copilot / copilot-pull-request-reviewer[bot]) submits a
# review, the pull_request_review / pull_request_review_comment events fire
# and trigger pr-agent-context-refresh — but the triggered run is immediately
# blocked by GitHub's approval gate for bot/external actors:
#
# • conclusion=startup_failure → workflow could not start (counts as blocked)
# • conclusion=action_required → was approval-gated, later auto-cancelled
#
# None of those outcomes produce a refresh comment. This dispatcher fires
# from the default branch (where it is active), bypasses the approval gate
# by using the schedule's GITHUB_TOKEN, and dispatches a workflow_dispatch
# run that executes with full repo permissions.
#
# DEDUPE CONTRACT
# ---------------
# A dispatch is suppressed only when there is already meaningful coverage for
# the PR's current head SHA:
# • run is in_progress, queued, waiting, or requested → suppress
# • run completed with conclusion=success or =neutral recently → suppress
#
# Blocked / non-executed conclusions (startup_failure, action_required,
# failure, cancelled, timed_out, skipped) do NOT count as coverage and do
# NOT suppress a fallback dispatch.

on:
  schedule:
    # Every 15 minutes, all day every day.
    # Bot reviews can arrive at any hour; restrict to business hours only if
    # cost is a concern (e.g. '*/15 7-23 * * 1-5' for Mon-Fri 07-23 UTC).
    - cron: '*/15 * * * *'

permissions:
  actions: write
  pull-requests: read

jobs:
  dispatch:
    name: Dispatch stalled PR agent context refreshes
    runs-on: ubuntu-latest
    steps:
      - name: Find and dispatch pending refreshes
        uses: actions/github-script@v7
        with:
          script: |
            const REFRESH_WORKFLOW = 'pr-agent-context-refresh.yml';

            // Only look at review activity in the last N minutes.
            const LOOKBACK_MINUTES = 20;
            // A successfully-completed run within this window suppresses redispatch.
            const RECENT_SUCCESS_WINDOW_MINUTES = 10;

            // Conclusions that mean the run was blocked or did not finish
            // successfully, so it cannot be trusted to have posted a refresh
            // comment. These must NOT suppress a fallback dispatch.
            const BLOCKED_CONCLUSIONS = new Set([
              'startup_failure',
              'action_required',
              'failure',
              'cancelled',
              'timed_out',
              'skipped',
            ]);

            const now = Date.now();
            const since = new Date(now - LOOKBACK_MINUTES * 60 * 1000).toISOString();
            const recentSuccessSince = new Date(
              now - RECENT_SUCCESS_WINDOW_MINUTES * 60 * 1000
            ).toISOString();

            // Read repository metadata from the API: schedule-triggered runs
            // may not have context.payload.repository populated.
            const { data: repo } = await github.rest.repos.get(context.repo);
            const defaultBranch = repo.default_branch;

            // List ALL open PRs in this repository via pagination.
            const pulls = await github.paginate(github.rest.pulls.list, {
              ...context.repo,
              state: 'open',
              per_page: 100,
            });

            for (const pr of pulls) {
              // Same-repo guard: skip forks. head.repo is null when the
              // source fork has been deleted, so compare defensively.
              if (pr.head.repo?.full_name !== repo.full_name) {
                continue;
              }

              try {
                // --- Recent activity check ---
                // listReviews returns reviews in chronological order, so all
                // pages are fetched; review comments are requested
                // most-recently-updated first and capped at 10.
                const [reviews, { data: reviewComments }] = await Promise.all([
                  github.paginate(github.rest.pulls.listReviews, {
                    ...context.repo,
                    pull_number: pr.number,
                    per_page: 100,
                  }),
                  github.rest.pulls.listReviewComments({
                    ...context.repo,
                    pull_number: pr.number,
                    sort: 'updated',
                    direction: 'desc',
                    per_page: 10,
                  }),
                ]);

                const hasRecentActivity =
                  reviews.some((r) => r.submitted_at >= since) ||
                  reviewComments.some(
                    (c) => c.created_at >= since || c.updated_at >= since
                  );

                if (!hasRecentActivity) continue;

                // --- In-flight / recent-success dedupe ---
                // Fetch runs for this exact head SHA so stale runs from
                // earlier commits don't suppress dispatch for the new SHA.
                // Runs created by the workflow_dispatch below execute on the
                // default branch, so their head_sha is that branch's tip and
                // they are not matched here; repeat dispatches for the same
                // PR+SHA are instead limited by the refresh workflow's
                // SHA-keyed concurrency group.
                const { data: { workflow_runs: runs } } =
                  await github.rest.actions.listWorkflowRunsForWorkflow({
                    ...context.repo,
                    workflow_id: REFRESH_WORKFLOW,
                    head_sha: pr.head.sha,
                    per_page: 10,
                  });

                const hasValidCoverage = runs.some((r) => {
                  // Actively working toward a refresh — don't interrupt.
                  if (
                    r.status === 'in_progress' ||
                    r.status === 'queued' ||
                    r.status === 'waiting' ||
                    r.status === 'requested'
                  ) {
                    return true;
                  }

                  // Completed: only suppress if the run actually succeeded
                  // recently. Blocked / failed conclusions are transparent.
                  if (r.status === 'completed') {
                    if (BLOCKED_CONCLUSIONS.has(r.conclusion)) return false;
                    return (
                      (r.conclusion === 'success' || r.conclusion === 'neutral') &&
                      r.updated_at >= recentSuccessSince
                    );
                  }

                  return false;
                });

                if (hasValidCoverage) {
                  console.log(
                    `PR #${pr.number}: valid refresh already running or recently succeeded — skipping.`
                  );
                  continue;
                }

                // --- Dispatch ---
                await github.rest.actions.createWorkflowDispatch({
                  ...context.repo,
                  workflow_id: REFRESH_WORKFLOW,
                  ref: defaultBranch,
                  inputs: {
                    pull_request_number: String(pr.number),
                    pull_request_head_sha: pr.head.sha,
                    pull_request_base_sha: pr.base.sha,
                  },
                });

                console.log(
                  `Dispatched refresh for PR #${pr.number} (head: ${pr.head.sha}).`
                );
              } catch (err) {
                // Per-PR error isolation: log and continue to the next PR.
                console.error(`PR #${pr.number}: dispatch failed — ${err.message}`);
              }
            }
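The dedupe contract in the header comment reduces to a pure predicate over the runs returned for a PR's head SHA. A minimal Python sketch mirroring the script's `hasValidCoverage` logic, assuming each run is a dict with `status`, `conclusion`, and ISO-8601 `updated_at` fields as in the workflow-runs REST response; `has_valid_coverage` is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta, timezone

IN_FLIGHT_STATUSES = {"in_progress", "queued", "waiting", "requested"}
SUCCESS_CONCLUSIONS = {"success", "neutral"}


def _parse(ts: str) -> datetime:
    # GitHub timestamps end in "Z"; fromisoformat rejects that before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def has_valid_coverage(
    runs: list[dict], now: datetime, recent_success_window: timedelta
) -> bool:
    """True if some run is in flight or succeeded within the window."""
    cutoff = now - recent_success_window
    for run in runs:
        if run["status"] in IN_FLIGHT_STATUSES:
            return True
        # Blocked or failed conclusions (startup_failure, action_required, ...)
        # never count as coverage, so only success/neutral is checked here.
        if (
            run["status"] == "completed"
            and run.get("conclusion") in SUCCESS_CONCLUSIONS
            and _parse(run["updated_at"]) >= cutoff
        ):
            return True
    return False
```

Parsing timestamps into timezone-aware datetimes avoids relying on lexicographic string comparison between GitHub's `...Z` format and JavaScript's `toISOString()` output.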
22 changes: 22 additions & 0 deletions .github/workflows/pr-agent-context-refresh.yml
@@ -7,10 +7,28 @@ on:
    types: [created, edited, deleted]
  check_run:
    types: [completed]
  workflow_dispatch:
    inputs:
      pull_request_number:
        description: PR number to refresh
        required: true
        type: string
      pull_request_head_sha:
        description: Head SHA of the PR
        required: true
        type: string
      pull_request_base_sha:
        description: Base SHA of the PR
        required: true
        type: string

# SHA-aware concurrency: workflow_dispatch runs key on PR+SHA so same-PR/different-SHA
# dispatches are not cancelled, but duplicate dispatches for the same PR+SHA are.
concurrency:
  group: >-
    pr-agent-context-refresh-${{
      (github.event_name == 'workflow_dispatch' &&
      format('{0}-{1}', github.event.inputs.pull_request_number, github.event.inputs.pull_request_head_sha)) ||
      github.event.pull_request.number ||
      github.event.check_run.pull_requests[0].number ||
      github.event.check_run.head_sha ||
@@ -27,6 +45,7 @@ jobs:
  pr-agent-context-refresh:
    name: PR agent context refresh
    if: >-
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'pull_request_review' &&
        github.event.pull_request.head.repo.full_name == github.repository) ||
      (github.event_name == 'pull_request_review_comment' &&
@@ -52,3 +71,6 @@ jobs:
wait_for_reviews_to_settle: true
publish_all_clear_comments_in_refresh: false
debug_artifacts: true
pull_request_number: ${{ inputs.pull_request_number || '' }}
pull_request_head_sha: ${{ inputs.pull_request_head_sha || '' }}
pull_request_base_sha: ${{ inputs.pull_request_base_sha || '' }}
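The `concurrency.group` expression relies on short-circuiting: in GitHub expressions `a && b` yields `b` when `a` is truthy, and `x || y` yields the first truthy operand, which is also how Python's `and`/`or` behave. A minimal sketch of how the visible part of the key resolves, assuming a flattened event dict; the fallbacks after `check_run.head_sha` are collapsed in this diff and are omitted, and `concurrency_group` is a hypothetical helper.

```python
from typing import Any


def concurrency_group(event_name: str, event: dict[str, Any]) -> str:
    """Resolve the visible part of the refresh workflow's concurrency key.

    Python's `and`/`or` return an operand (not a bool), matching how GitHub
    expressions evaluate `&&` and `||`.
    """
    inputs = event.get("inputs") or {}
    pull_request = event.get("pull_request") or {}
    check_run = event.get("check_run") or {}
    # pull_requests[0] on an empty list is null in GitHub expressions; emulate it.
    first_pr = (check_run.get("pull_requests") or [{}])[0]
    dispatch_key = f"{inputs.get('pull_request_number')}-{inputs.get('pull_request_head_sha')}"
    key = (
        # workflow_dispatch: PR number AND head SHA, so a new SHA gets a new group.
        (event_name == "workflow_dispatch" and dispatch_key)
        # pull_request_review / pull_request_review_comment: PR number only.
        or pull_request.get("number")
        # check_run: first associated PR, else the head SHA.
        or first_pr.get("number")
        or check_run.get("head_sha")
    )
    return f"pr-agent-context-refresh-{key}"
```

Because dispatched runs include the head SHA in the key while review-triggered runs use only the PR number, the two kinds of runs for the same PR land in different groups.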
2 changes: 1 addition & 1 deletion leadforge/core/hashing.py
@@ -18,7 +18,7 @@ def _canonical(obj: Any) -> Any:
    """Recursively convert to a JSON-stable form (sorted keys, enums → str)."""
    if isinstance(obj, dict):
        return {k: _canonical(v) for k, v in sorted(obj.items())}
    if isinstance(obj, (list, tuple)):
    if isinstance(obj, (list, tuple)): # noqa: UP038
        return [_canonical(v) for v in obj]
    # StrEnum values are already strings; this handles plain Enum too
    if hasattr(obj, "value"):
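Only part of `_canonical` is visible in this hunk. A minimal sketch of the canonicalize-then-hash idea its docstring describes, assuming the remaining branches pass scalars through unchanged and that the canonical form is hashed as compact JSON with SHA-256. `canonical`, `stable_hash`, and the JSON/SHA-256 choice are assumptions, not the actual `leadforge/core/hashing.py` code; the sketch also uses `isinstance(obj, Enum)` in place of the diff's `hasattr(obj, "value")` duck-typing check.

```python
import hashlib
import json
from enum import Enum
from typing import Any


def canonical(obj: Any) -> Any:
    """Recursively convert to a JSON-stable form (sorted keys, enums -> value)."""
    if isinstance(obj, dict):
        return {k: canonical(v) for k, v in sorted(obj.items())}
    if isinstance(obj, (list, tuple)):
        # Tuples and lists serialize identically once canonicalized.
        return [canonical(v) for v in obj]
    if isinstance(obj, Enum):
        return obj.value
    return obj


def stable_hash(obj: Any) -> str:
    """Hex SHA-256 of the canonical form, serialized without whitespace."""
    payload = json.dumps(canonical(obj), separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting keys during canonicalization means two configs that differ only in key order or in tuple-versus-list containers hash to the same digest.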
73 changes: 60 additions & 13 deletions leadforge/core/ids.py
@@ -1,16 +1,63 @@
"""Entity ID generation.

Implemented in Milestone 3. All IDs must be stable, opaque, namespace-unique,
and deterministic for a given run.

Canonical prefixes:
acct_ — Account
cnt_ — Contact
lead_ — Lead
touch_ — Touch
sess_ — Session
act_ — SalesActivity
opp_ — Opportunity
cust_ — Customer
sub_ — Subscription
All IDs are stable, opaque, namespace-unique, and deterministic for a given
(recipe, config, seed) triple. Callers derive a dedicated RNG substream via
``RNGRoot.child()`` and pass a monotonically increasing counter to
:func:`make_id`.

Canonical prefixes
------------------
The following nine prefixes correspond directly to the nine relational tables
defined in ``schema/entities.py``:

acct_ — Account
cnt_ — Contact
lead_ — Lead
touch_ — Touch
sess_ — Session
act_ — SalesActivity
opp_ — Opportunity
cust_ — Customer
sub_ — Subscription

The ``rep_`` prefix is an internal-only namespace used for sales-rep entities
that participate in simulation mechanics but do **not** have a corresponding
standalone relational table in the v1 output bundle.
"""

from __future__ import annotations

# Canonical prefix registry — single source of truth used by tests and
# simulation code alike.
ID_PREFIXES: dict[str, str] = {
    "account": "acct",
    "contact": "cnt",
    "lead": "lead",
    "touch": "touch",
    "session": "sess",
    "sales_activity": "act",
    "opportunity": "opp",
    "customer": "cust",
    "subscription": "sub",
    "rep": "rep",
}

_PAD_WIDTH = 6 # e.g. acct_000001


def make_id(prefix: str, n: int) -> str:
    """Return a zero-padded entity ID string.

    Args:
        prefix: The namespace prefix (e.g. ``"acct"``).
        n: A 1-based counter for this entity type within one generation run.

    Returns:
        A string of the form ``"<prefix>_<n:06d>"``; e.g. ``"acct_000001"``.

    Raises:
        ValueError: if *n* is not a positive integer.
    """
    if not isinstance(n, int) or isinstance(n, bool) or n < 1:
        raise ValueError(f"n must be a positive int, got {n!r}")
    return f"{prefix}_{n:0{_PAD_WIDTH}d}"
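The module docstring says callers pass a monotonically increasing counter to `make_id`. A minimal sketch of a per-prefix counter layered on top of it, assuming one allocator per generation run; `IdAllocator` is a hypothetical helper that does not appear in the diff, and the registry is trimmed to two entries for brevity.

```python
from collections import Counter

ID_PREFIXES: dict[str, str] = {"account": "acct", "contact": "cnt"}  # trimmed copy
_PAD_WIDTH = 6


def make_id(prefix: str, n: int) -> str:
    """Same contract as make_id in leadforge/core/ids.py in this diff."""
    if not isinstance(n, int) or isinstance(n, bool) or n < 1:
        raise ValueError(f"n must be a positive int, got {n!r}")
    return f"{prefix}_{n:0{_PAD_WIDTH}d}"


class IdAllocator:
    """Hypothetical helper: hands out 1-based, gap-free IDs per entity type."""

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    def next_id(self, entity: str) -> str:
        prefix = ID_PREFIXES[entity]  # KeyError for unknown entity types
        self._counts[prefix] += 1
        return make_id(prefix, self._counts[prefix])
```

Keeping one counter per prefix means IDs depend only on the order in which entities are created, so a deterministic simulation yields identical IDs on every run.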