Conversation
…res, tasks)

- `core/ids.py`: `make_id(prefix, n)` → zero-padded entity ID strings; `ID_PREFIXES` registry covering all 9 entity namespaces
- `schema/entities.py`: typed row dataclasses for all 9 v1 tables (AccountRow, ContactRow, LeadRow, TouchRow, SessionRow, SalesActivityRow, OpportunityRow, CustomerRow, SubscriptionRow), each with DTYPE_MAP, to_dict(), empty_dataframe(), and Parquet round-trip support via schema/tables.py
- `schema/relationships.py`: FKConstraint dataclass, ALL_CONSTRAINTS tuple (10 FK edges from §9.2), validate_fk() raising FKViolationError
- `schema/features.py`: FeatureSpec frozen dataclass; LEAD_SNAPSHOT_FEATURES — 29 pre-anchor features + 1 target for the converted_within_90_days task
- `schema/dictionaries.py`: feature_dictionary_df() and write_feature_dictionary() for the mandatory feature_dictionary.csv
- `schema/tasks.py`: SplitSpec + TaskManifest frozen dataclasses; CONVERTED_WITHIN_90_DAYS constant (70/15/15 split, 90-day window)
- `pyproject.toml`: pandas ≥ 2.0 and pyarrow ≥ 14.0 added as core deps; mypy overrides to ignore missing stubs for both
- 82 new tests (ids, entities, relationships, features, tasks); total 192 passing; ruff + mypy clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
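For readers skimming the thread, the `make_id(prefix, n)` contract above can be sketched as follows. This is a hypothetical reimplementation, not the real module: the 6-digit padding width is taken from the `acct_000001` example in the PR summary, and the registry is reduced to a single illustrative entry.

```python
# Hypothetical sketch of the make_id(prefix, n) contract: deterministic,
# zero-padded entity ID strings such as "acct_000001". The real
# ID_PREFIXES registry covers all 9 entity namespaces; one entry shown.

ID_PREFIXES: dict[str, str] = {
    "account": "acct",  # illustrative entry only
}

def make_id(prefix: str, n: int) -> str:
    """Return a zero-padded ID string like 'acct_000001' (width assumed)."""
    if n < 0:
        raise ValueError("sequence number must be non-negative")
    return f"{prefix}_{n:06d}"
```

Deterministic string IDs like this keep Parquet joins stable across regenerated worlds, since the same `(prefix, n)` pair always produces the same key.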
Pull request overview
Implements Milestone 3’s “schema layer” for Leadforge by introducing typed table row contracts, ID generation, FK validation utilities, feature/task metadata, and Parquet I/O helpers to support future simulation/render layers.
Changes:
- Added schema modules for entities (row dataclasses + empty DataFrames), relationships (FK constraints + validator), features/dictionary, tasks (task manifest + split spec), and Parquet read/write helpers.
- Added ID generation utilities (`make_id`, `ID_PREFIXES`) and expanded core dependencies to include `pandas` and `pyarrow`.
- Added a large suite of new schema tests covering the new functionality.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `leadforge/core/ids.py` | Adds canonical ID prefixes and deterministic zero-padded ID generation. |
| `leadforge/core/hashing.py` | Tweaks canonicalization logic used for config hashing. |
| `leadforge/narrative/spec.py` | Updates YAML validation type checks in narrative spec parsing. |
| `leadforge/schema/entities.py` | Introduces row dataclasses + dtype maps + empty DataFrame builders for 9 tables. |
| `leadforge/schema/relationships.py` | Adds FK constraint definitions and FK validation helper + error type. |
| `leadforge/schema/features.py` | Adds FeatureSpec and canonical lead snapshot feature list. |
| `leadforge/schema/dictionaries.py` | Builds/writes the required feature dictionary CSV artifact. |
| `leadforge/schema/tasks.py` | Adds TaskManifest/SplitSpec and v1 task constant. |
| `leadforge/schema/tables.py` | Adds Parquet read/write helpers (pyarrow engine). |
| `tests/schema/test_entities.py` | Tests entity schemas, empty DataFrames, and Parquet round-trips. |
| `tests/schema/test_relationships.py` | Tests FK constraints and FK validation error behavior. |
| `tests/schema/test_features.py` | Tests feature specs and feature dictionary DataFrame/CSV writing. |
| `tests/schema/test_tasks.py` | Tests task manifest serialization and split validation. |
| `tests/schema/test_ids.py` | Tests ID formatting and prefix registry coverage. |
| `pyproject.toml` | Adds pandas/pyarrow dependencies + mypy ignores for missing stubs. |
| `.agent-plan.md` | Updates project planning status to reflect Milestone 3 completion. |
| `tests/schema/__init__.py` | Initializes schema test package (empty). |
- narrative/spec.py, core/hashing.py: revert UP038 unsafe fix — replace all isinstance(x, A | B) with isinstance(x, (A, B)) + noqa: UP038 (Python's isinstance does not accept PEP 604 union types at runtime; the ruff auto-fix was incorrect) (COPILOT-1, 2, 4, 7, 8, 9, 10, 11)
- schema/tasks.py: SplitSpec.__post_init__ now validates each fraction is finite and in [0, 1] before checking the sum, preventing manifests with values like (1.2, -0.1, -0.1) that would sum to 1 but are invalid (COPILOT-3)
- schema/entities.py: TouchRow.DTYPE_MAP column order aligned to match dataclass field order (touch_direction before campaign_id) (COPILOT-5)
- schema/dictionaries.py: feature_dictionary_df() now casts string columns to pd.StringDtype; docstring corrected to reflect actual dtypes (COPILOT-6)
- tests/schema/test_tasks.py: two new tests covering per-component SplitSpec rejection (negative and >1 values)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
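The per-component `SplitSpec` validation described in this commit can be sketched as follows. This is a hypothetical reimplementation under assumed field names (`train`/`val`/`test`); the point is that each fraction is checked for finiteness and range *before* the sum check, so a tuple like `(1.2, -0.1, -0.1)` is rejected even though it sums to 1.0.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SplitSpec:
    train: float
    val: float
    test: float

    def __post_init__(self) -> None:
        # per-component guard runs first: finite and within [0, 1]
        for name in ("train", "val", "test"):
            value = getattr(self, name)
            if not math.isfinite(value) or not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be finite and in [0, 1], got {value!r}")
        # only then check that the components partition the dataset
        total = self.train + self.val + self.test
        if not math.isclose(total, 1.0):
            raise ValueError(f"fractions must sum to 1.0, got {total}")
```

Ordering matters: a sum-only check silently accepts out-of-range components that happen to cancel.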
Pull request overview
Implements the Milestone 3 “schema layer” for Leadforge by defining typed table-row contracts, Parquet IO helpers, FK relationship validation, a canonical feature dictionary for the primary task, and a task manifest—plus dependencies and a broad test suite to lock in these contracts.
Changes:
- Add schema primitives: entity row dataclasses (with dtype maps + empty DataFrames), Parquet read/write utilities, and FK constraints + validation.
- Introduce task/ML metadata: lead snapshot feature specs + CSV feature dictionary generation, and a task manifest for `converted_within_90_days`.
- Add core ID formatting utilities and expand tests to cover the new schema layer end-to-end.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `leadforge/core/ids.py` | Adds make_id() and ID_PREFIXES registry for entity ID formatting. |
| `leadforge/core/hashing.py` | Small lint-only change (adds # noqa: UP038). |
| `leadforge/narrative/spec.py` | Small lint-only changes (adds # noqa: UP038 in type checks). |
| `leadforge/schema/entities.py` | Defines typed row dataclasses for the 9 v1 tables + empty DataFrame constructors. |
| `leadforge/schema/tables.py` | Adds Parquet read/write helpers used by tests (and later render layer). |
| `leadforge/schema/relationships.py` | Adds FKConstraint graph + validate_fk() + error type. |
| `leadforge/schema/features.py` | Adds FeatureSpec + canonical lead snapshot feature list including target. |
| `leadforge/schema/dictionaries.py` | Builds/writes feature_dictionary.csv from feature specs. |
| `leadforge/schema/tasks.py` | Adds SplitSpec, TaskManifest, and the v1 task constant. |
| `pyproject.toml` | Adds pandas + pyarrow dependencies and mypy ignore overrides for them. |
| `tests/schema/test_ids.py` | Tests ID formatting and prefix registry. |
| `tests/schema/test_entities.py` | Tests entity row contracts, empty DataFrames, and Parquet round-trips. |
| `tests/schema/test_relationships.py` | Tests FK constraints and validate_fk() behavior/errors. |
| `tests/schema/test_features.py` | Tests feature specs + feature dictionary DataFrame/CSV generation. |
| `tests/schema/test_tasks.py` | Tests split validation, manifest fields, and JSON-serializable output. |
| `tests/schema/__init__.py` | Adds schema test package marker. |
| `.agent-plan.md` | Updates milestone tracking documentation. |
…ck pattern
Hardens the refresh flow so bot-authored / approval-gated review events
(e.g. from Copilot) do not leave pr-agent-context refresh stuck waiting
for a manual approval or re-run.
Changes to pr-agent-context-refresh.yml:
- Add workflow_dispatch trigger with three required inputs:
pull_request_number, pull_request_head_sha, pull_request_base_sha
(needed by dispatcher-initiated runs that carry no PR event payload)
- Pass all three inputs as explicit PR context overrides to the reusable
workflow call (empty string when event-triggered, so upstream default
resolution applies)
- SHA-aware concurrency group: workflow_dispatch runs key on
"{pr_number}-{head_sha}", so different-SHA dispatches for the same PR
run rather than cancel, while duplicate same-SHA dispatches are deduplicated
- Extend job `if` guard to always pass for workflow_dispatch
New pr-agent-context-refresh-dispatcher.yml:
- Triggers on schedule (every 15 min, Mon-Fri 07:00-23:00 UTC)
- For each open same-repo PR with recent review activity (last 20 min):
- Bounded lookup: only checks the 10 most recent reviews/comments
- In-flight dedupe: skips if any run for that head SHA is queued,
in_progress, waiting, or completed within the last 10 min
- Per-PR error isolation: failures are caught and logged; other PRs
are still processed
- Uses correct actions/github-script@v7 Octokit method names
(github.rest.pulls.*, github.rest.actions.*)
- Dispatches to the repo default branch so the workflow file is
always resolvable
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- schema/tasks.py: SplitSpec.__post_init__ now explicitly rejects bool values before the numeric checks (bool is an int subclass, so True/False previously passed the isinstance guard and range check silently) (COPILOT-1); added test_split_spec_rejects_bool_component
- schema/dictionaries.py: remove empty `if TYPE_CHECKING: pass` block (COPILOT-2)
- core/ids.py: clarify in module docstring that the 9 table-backed prefixes correspond to relational entities, and that rep_ is an internal-only namespace with no standalone output table (COPILOT-3)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
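The bool subtlety fixed in this commit is easy to miss, so here is a minimal sketch of the guard (function and message names are assumptions, not the real code): because `bool` subclasses `int`, `True` passes both an `isinstance(value, (int, float))` check and a `0 <= value <= 1` range check, so the bool rejection must come first.

```python
def check_fraction(value):
    # must precede the numeric isinstance check: bool is an int subclass
    if isinstance(value, bool):
        raise TypeError("fraction must be a number, not bool")
    if not isinstance(value, (int, float)):
        raise TypeError("fraction must be a number")
    if not 0 <= value <= 1:
        raise ValueError("fraction must be in [0, 1]")
    return float(value)

# the subtlety being guarded against:
assert isinstance(True, int)   # bool passes an int/float isinstance check
assert 0 <= True <= 1          # and a naive range check
```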
Pull request overview
Implements the Milestone 3 “schema layer” for Leadforge by adding a typed relational schema (row contracts + dtypes), ID generation, FK constraint validation, Parquet helpers, and task/feature metadata (feature dictionary + task manifest), with comprehensive tests and dependency updates.
Changes:
- Add schema modules for entity row dataclasses, Parquet I/O, FK constraints/validation, feature specs + feature dictionary CSV, and task manifest definitions.
- Add core ID helper (`make_id`) plus canonical prefix registry, and update mypy/ruff-related annotations.
- Add extensive test coverage for the new schema layer and update CI workflows for PR-agent context refresh dispatching.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `leadforge/core/ids.py` | Adds make_id() and ID_PREFIXES registry for ID formatting/namespaces. |
| `leadforge/core/hashing.py` | Adds a ruff suppression comment for isinstance(..., (list, tuple)). |
| `leadforge/narrative/spec.py` | Adds ruff suppression comments (no logic change). |
| `leadforge/schema/entities.py` | Introduces 9 typed row dataclasses + dtype maps and empty DataFrame builders. |
| `leadforge/schema/tables.py` | Adds write_parquet / read_parquet helpers using pyarrow. |
| `leadforge/schema/relationships.py` | Adds FK constraint definitions + validate_fk() raising FKViolationError. |
| `leadforge/schema/features.py` | Adds FeatureSpec and the canonical lead snapshot feature list. |
| `leadforge/schema/dictionaries.py` | Adds feature dictionary DataFrame builder + CSV writer. |
| `leadforge/schema/tasks.py` | Adds SplitSpec, TaskManifest, and CONVERTED_WITHIN_90_DAYS. |
| `tests/schema/test_ids.py` | Tests ID formatting, validation, and prefix registry coverage. |
| `tests/schema/test_relationships.py` | Tests FK constraints list and FK validation behavior/errors. |
| `tests/schema/test_tasks.py` | Tests split validation + task manifest immutability and serialization. |
| `tests/schema/test_features.py` | Tests feature specs and feature dictionary CSV/DataFrame output. |
| `tests/schema/test_entities.py` | Tests entity row to_dict, empty schema DataFrames, and Parquet round-trips. |
| `tests/schema/__init__.py` | Adds schema test package marker. |
| `pyproject.toml` | Adds pandas/pyarrow deps and mypy ignore overrides for them. |
| `.github/workflows/pr-agent-context-refresh.yml` | Adds workflow_dispatch support + SHA-aware concurrency inputs. |
| `.github/workflows/pr-agent-context-refresh-dispatcher.yml` | Adds scheduled dispatcher to trigger refresh runs when needed. |
| `.agent-plan.md` | Updates milestone tracking notes to reflect schema layer completion. |
…llback

Root cause of the stale refresh comment after Copilot's review on PR #7:

1. The dispatcher workflow only lives on this PR branch; scheduled workflows only run from the default branch, so it had never fired. This resolves the moment the PR merges.
2. The dedupe logic in the dispatcher was too broad:
   `(r.status === 'completed' && r.updated_at >= recentCompletedSince)`
   This treated ANY recently-completed run as valid coverage, including runs with conclusion=startup_failure (bot-triggered run failed to start) and conclusion=action_required (run was approval-gated and auto-cancelled). Both conclusions mean the run was blocked and never produced a refresh comment — yet they suppressed the fallback, leaving the PR stale. Additionally, runs with status=action_required (actively waiting for approval) were erroneously counted as "waiting" coverage.

Fix applied to the dispatcher:
- Extract BLOCKED_CONCLUSIONS set: startup_failure, action_required, failure, cancelled, timed_out, skipped — none of these suppress dispatch.
- A completed run only counts as coverage when its conclusion is 'success' or 'neutral' AND it completed within the recent-success window.
- status=action_required (approval-gated, will never run) explicitly returns false — not treated as active coverage.
- Active statuses (in_progress, queued, waiting, requested) still suppress dispatch — those will actually produce a refresh.
- Schedule changed from business-hours-only to '*/15 * * * *' so bot reviews arriving outside Mon–Fri 07–23 UTC are also caught.
- Comments in the YAML explain the dedupe contract explicitly.

Verification against the failing scenario (SHA e23aea4):
- run 24729455592 (startup_failure): BLOCKED_CONCLUSIONS → not coverage → dispatcher would redispatch ✓
- pull_request_review_comment runs (action_required): BLOCKED_CONCLUSIONS → not coverage → dispatcher would redispatch ✓
- A currently in_progress dispatch run: active → suppress ✓
- A recently succeeded dispatch run: success + within window → suppress ✓

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
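The dedupe contract in this commit can be condensed into a single predicate. The actual dispatcher is JavaScript inside `actions/github-script`; the sketch below is Python, with names mirroring the commit message rather than the real code.

```python
# Conclusions that mean the run was blocked and produced no refresh
# comment — none of these should suppress a fresh dispatch.
BLOCKED_CONCLUSIONS = {
    "startup_failure", "action_required", "failure",
    "cancelled", "timed_out", "skipped",
}
# Statuses that mean a refresh is actually on the way.
ACTIVE_STATUSES = {"in_progress", "queued", "waiting", "requested"}

def run_is_coverage(status, conclusion, completed_in_window):
    """Return True only if this run should suppress dispatching a new refresh."""
    if status in ACTIVE_STATUSES:
        return True  # an active run will produce the refresh itself
    if status == "completed":
        if conclusion in BLOCKED_CONCLUSIONS:
            return False  # blocked run did no work; redispatch
        return conclusion in {"success", "neutral"} and completed_in_window
    return False
```

Centralizing the decision in one predicate makes the failing scenario above directly checkable: `startup_failure` and `action_required` conclusions fall through to "not coverage", while active and recently-succeeded runs still suppress.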
### Why the Copilot refresh comment was missing — and what was fixed

**Root cause 1 — dispatcher not yet active (structural).** The fallback dispatcher workflow only exists on this PR branch, and scheduled workflows run only from the default branch, so it had never fired. This resolves the moment the PR merges.

**Root cause 2 — dedupe bug suppressed the fallback (logic).** Even once merged, the dispatcher had a bug that would have caused it to silently skip PRs in exactly the situation it was designed to cover. The old dedupe check:

`(r.status === 'completed' && r.updated_at >= recentCompletedSince)`

This treated any recently-completed run as valid coverage — including runs with conclusion `startup_failure` (bot-triggered run failed to start) and conclusion `action_required` (run was approval-gated and auto-cancelled). Both conclusions mean the run was blocked before doing any work and produced no refresh comment. Yet the old dedupe matched them and skipped dispatch — suppressing the fallback precisely when it was needed. The old code also did not handle runs actively waiting for approval correctly, counting them as coverage.

**Fix.** A completed run now only suppresses dispatch when its conclusion is `success` or `neutral` AND it completed within the recent-success window; conclusions in BLOCKED_CONCLUSIONS (startup_failure, action_required, failure, cancelled, timed_out, skipped) never count as coverage.

Active runs (`in_progress`, `queued`, `waiting`, `requested`) still suppress dispatch, since those will actually produce a refresh. The schedule also changed from business-hours-only to every 15 minutes, so bot reviews arriving outside Mon–Fri 07–23 UTC are also caught.

**Verification against the failing scenario:** blocked runs (startup_failure, action_required) no longer count as coverage and would be redispatched; active and recently-succeeded runs still suppress.
- relationships.py: replace set-difference orphan collection with an O(1) single-pass pattern (count + bounded sample list) to avoid building a full intermediate set; rename _MAX_SAMPLE → _max_sample for ruff N806
- test_entities.py: wrap nullable-boolean iloc result in bool() before identity-comparing to True, avoiding pandas BooleanDtype NA-coercion risk
- test_features.py: use direct boolean filter df[df["is_target"]] instead of df[df["is_target"] == True] (ruff E712)
- pr-agent-context-refresh-dispatcher.yml: switch from per_page:50 cap to github.paginate() so all open PRs are checked, not just the first page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
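The single-pass orphan pattern from the first bullet can be sketched as below. Function and argument names are assumptions, not the real `relationships.py` code: membership lookups against the parent-key set are O(1), and instead of materializing the full set of orphans we keep only a running count plus a bounded sample for the error message.

```python
def count_orphans(child_keys, parent_keys, max_sample=5):
    """Count FK orphans in one pass, keeping at most max_sample examples."""
    valid = set(parent_keys)  # O(1) membership lookups
    count = 0
    sample = []
    for key in child_keys:
        if key is not None and key not in valid:
            count += 1
            if len(sample) < max_sample:  # bounded sample, not a full set
                sample.append(key)
    return count, sample
```

For large child tables this bounds the extra memory of the error path to `max_sample` entries while still reporting the exact violation count.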
Pull request overview
Implements the Milestone 3 schema layer for Leadforge v1, defining typed table row contracts, ID generation, FK validation, task/feature metadata artifacts, and associated tests—plus updates to PR-agent context refresh workflows.
Changes:
- Added schema modules for entity rows (`DTYPE_MAP` + `empty_dataframe()`), Parquet IO helpers, FK constraints/validation, feature specs + feature dictionary CSV, and task manifest definitions.
- Added extensive pytest coverage for IDs, entities, FK validation, Parquet round-trips, feature dictionary output, and task manifest serialization.
- Updated dependencies/mypy config for pandas/pyarrow and introduced workflow_dispatch + a dispatcher workflow for PR-agent context refresh.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `leadforge/core/ids.py` | Adds ID prefix registry and make_id() formatting/validation. |
| `leadforge/core/hashing.py` | Adds # noqa: UP038 to preserve existing isinstance style. |
| `leadforge/narrative/spec.py` | Adds # noqa: UP038 annotations to existing validation checks. |
| `leadforge/schema/entities.py` | Introduces 9 typed row dataclasses with DTYPE_MAP, to_dict(), empty_dataframe(), and registries. |
| `leadforge/schema/features.py` | Defines FeatureSpec and canonical ordered LEAD_SNAPSHOT_FEATURES. |
| `leadforge/schema/dictionaries.py` | Builds/writes feature_dictionary.csv from feature specs. |
| `leadforge/schema/relationships.py` | Defines FK constraint graph and validate_fk() with FKViolationError. |
| `leadforge/schema/tables.py` | Adds Parquet read/write helpers using pyarrow. |
| `leadforge/schema/tasks.py` | Defines SplitSpec, TaskManifest, and CONVERTED_WITHIN_90_DAYS. |
| `tests/schema/test_ids.py` | Tests ID formatting and prefix registry coverage. |
| `tests/schema/test_entities.py` | Tests row to_dict(), empty dataframe schema, and Parquet round-trips. |
| `tests/schema/test_relationships.py` | Tests FK constraint definitions and validation error behavior. |
| `tests/schema/test_features.py` | Tests feature spec invariants and feature dictionary CSV output. |
| `tests/schema/test_tasks.py` | Tests split validation, task manifest invariants, and JSON serialization. |
| `tests/schema/__init__.py` | Creates schema test package marker. |
| `pyproject.toml` | Adds pandas/pyarrow deps and mypy overrides. |
| `.github/workflows/pr-agent-context-refresh.yml` | Adds workflow_dispatch inputs and SHA-aware concurrency/passing inputs. |
| `.github/workflows/pr-agent-context-refresh-dispatcher.yml` | Adds scheduled dispatcher to trigger refresh when approval-gated. |
| `.agent-plan.md` | Updates plan/status to reflect Milestone 3 completion and Milestone 4 next steps. |
Comments suppressed due to low confidence (2)
.github/workflows/pr-agent-context-refresh.yml:35
- The workflow uses the `inputs.*` context in the `concurrency.group` expression, but this workflow also runs on non-`workflow_dispatch` events. For those events, `inputs` is not a defined context and will cause expression evaluation failures. Use `github.event.inputs.<name>` (guarded by `github.event_name == 'workflow_dispatch'`) or otherwise avoid referencing `inputs` outside workflow_dispatch-only workflows.

```yaml
concurrency:
  group: >-
    pr-agent-context-refresh-${{
      (github.event_name == 'workflow_dispatch' &&
       format('{0}-{1}', inputs.pull_request_number, inputs.pull_request_head_sha)) ||
      github.event.pull_request.number ||
      github.event.check_run.pull_requests[0].number ||
      github.event.check_run.head_sha ||
      github.sha
    }}
```
.github/workflows/pr-agent-context-refresh.yml:35
- `concurrency.group` indexes `github.event.check_run.pull_requests[0]` unconditionally. This workflow is triggered by all `check_run: completed` events (including ones with no associated PRs), so `pull_requests[0]` can be out of bounds and fail the workflow before the job-level `if:` guard runs. Wrap the `pull_requests[0]` access in a conditional that checks the array is non-empty, or fall back to `github.event.check_run.head_sha` only when no PRs are present.

```yaml
(github.event_name == 'workflow_dispatch' &&
 format('{0}-{1}', inputs.pull_request_number, inputs.pull_request_head_sha)) ||
github.event.pull_request.number ||
github.event.check_run.pull_requests[0].number ||
github.event.check_run.head_sha ||
github.sha
```
- pr-agent-context-refresh.yml: use github.event.inputs.* instead of inputs.* in concurrency.group; the inputs context is only defined for workflow_dispatch/workflow_call events, github.event.inputs is always safe to reference
- pr-agent-context-refresh-dispatcher.yml: remove dead r.status === 'action_required' guard — action_required is a conclusion value, not a status value, so this branch never fired; the actual case (completed + conclusion=action_required) is already handled by BLOCKED_CONCLUSIONS; update top-of-file comment accordingly

Resolves COPILOT-1 and COPILOT-2. COPILOT-3 (make_id docstring) is resolved as already treated — the docstring correctly describes the prefix argument pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Implements the Milestone 3 “schema layer” by adding typed schema definitions (entities, relationships, tasks, features) plus Parquet I/O helpers and corresponding tests, and updates workflow automation to better refresh PR-agent context.
Changes:
- Add v1 schema modules: row dataclasses for 9 tables, FK constraint definitions + validation, task manifest + split spec, feature specs + feature dictionary export, and Parquet read/write helpers.
- Add extensive test coverage for IDs, schema row contracts, Parquet round-trips, FK validation, feature dictionary export, and task manifest serialization.
- Update GitHub Actions PR-agent context refresh workflows (new dispatcher + workflow_dispatch support + SHA-aware concurrency).
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `leadforge/core/ids.py` | Adds canonical ID prefix registry and make_id() for deterministic, zero-padded IDs. |
| `leadforge/schema/entities.py` | Introduces typed row dataclasses for all v1 tables with DTYPE_MAP, to_dict(), and empty_dataframe(). |
| `leadforge/schema/relationships.py` | Adds FK constraint model and validate_fk() raising FKViolationError on orphans. |
| `leadforge/schema/tables.py` | Adds Parquet read/write helpers using PyArrow. |
| `leadforge/schema/features.py` | Defines FeatureSpec and the canonical ordered lead-snapshot feature list. |
| `leadforge/schema/dictionaries.py` | Builds/writes feature_dictionary.csv from feature specs. |
| `leadforge/schema/tasks.py` | Adds SplitSpec, TaskManifest, and the CONVERTED_WITHIN_90_DAYS task definition. |
| `tests/schema/test_ids.py` | Tests ID format/validation and prefix registry coverage. |
| `tests/schema/test_entities.py` | Tests row contracts, empty DataFrame schema, and Parquet round-trips. |
| `tests/schema/test_relationships.py` | Tests FK constraint registry and FK validation behavior/error messages. |
| `tests/schema/test_features.py` | Tests feature spec invariants and feature dictionary CSV output. |
| `tests/schema/test_tasks.py` | Tests split validation, manifest immutability, and JSON-serializable dict output. |
| `pyproject.toml` | Adds pandas/pyarrow deps and mypy ignore overrides. |
| `.github/workflows/pr-agent-context-refresh.yml` | Adds workflow_dispatch inputs + SHA-aware concurrency updates for refresh workflow. |
| `.github/workflows/pr-agent-context-refresh-dispatcher.yml` | Adds scheduled dispatcher workflow to bypass approval-gated bot-triggered runs. |
| `leadforge/narrative/spec.py` | Adds Ruff UP038 suppressions for list/tuple checks. |
| `leadforge/core/hashing.py` | Adds Ruff UP038 suppression for list/tuple canonicalization. |
| `.agent-plan.md` | Updates milestone tracking text from M3 to M4. |
`github.rest.actions.listWorkflowRuns` is the repo-level endpoint and does not accept a `workflow_id` filter; the correct method is `listWorkflowRunsForWorkflow`, which scopes the query to the named workflow and supports the `head_sha` parameter used for SHA-aware dedupe. Without this fix the call throws at runtime, the try/catch swallows the error, and the dedupe step is silently skipped — causing the dispatcher to always fire without deduplication.

Resolves COPILOT-1 (round 5). FAIL-1 (startup_failure on bot-triggered refresh run) is expected approval-gate behaviour; no code change needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pr-agent-context report: This run includes an unresolved review comment on PR #7.
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: .github/workflows/pr-agent-context-refresh-dispatcher.yml:124
URL: https://github.com/leadforge-dev/leadforge/pull/7#discussion_r3123205024
Root author: copilot-pull-request-reviewer
Comment:
`actions/github-script` uses Octokit REST methods; `github.rest.actions.listWorkflowRuns` is not a valid method name for listing runs of a specific workflow. This will fail at runtime and prevent the dispatcher from deduping/dispatching. Use the workflow-specific endpoint method (e.g., `github.rest.actions.listWorkflowRunsForWorkflow`) with the same `workflow_id`/`head_sha` parameters instead.
Pull request overview
Implements Milestone 3’s schema layer for leadforge by adding typed entity row contracts, ID generation, FK validation, feature specs + dictionary export, and a task manifest, along with extensive tests and dependency updates.
Changes:
- Add core schema modules: entities (row dataclasses + empty DataFrames), Parquet IO helpers, FK constraints/validation, feature specs + feature dictionary CSV writer, and task manifest definitions.
- Add
core.ids.make_id()+ID_PREFIXESregistry and small lint-related tweaks in existing modules. - Add new GitHub Actions workflow dispatcher + workflow_dispatch support, plus a large test suite for the new schema layer.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `leadforge/core/ids.py` | Adds canonical ID prefix registry and make_id() zero-padded ID generator. |
| `leadforge/core/hashing.py` | Minor lint suppression for tuple/list isinstance check. |
| `leadforge/schema/entities.py` | Introduces row dataclasses for 9 v1 relational tables with DTYPE_MAP, to_dict(), and empty_dataframe(). |
| `leadforge/schema/tables.py` | Adds write_parquet / read_parquet helpers (pyarrow engine). |
| `leadforge/schema/relationships.py` | Defines FK constraints graph and validate_fk() raising FKViolationError. |
| `leadforge/schema/features.py` | Defines FeatureSpec and the canonical lead snapshot feature list. |
| `leadforge/schema/dictionaries.py` | Builds/writes feature_dictionary.csv from feature specs. |
| `leadforge/schema/tasks.py` | Defines SplitSpec, TaskManifest, and CONVERTED_WITHIN_90_DAYS. |
| `leadforge/narrative/spec.py` | Adds lint suppression comments for tuple/list isinstance checks. |
| `pyproject.toml` | Adds pandas and pyarrow as core deps; mypy ignore overrides for both. |
| `tests/schema/test_ids.py` | Tests for ID generation and prefix registry coverage. |
| `tests/schema/test_entities.py` | Tests for row to_dict(), empty DataFrame schemas, and Parquet round-trips. |
| `tests/schema/test_relationships.py` | Tests FK constraints list and FK validation error behavior. |
| `tests/schema/test_features.py` | Tests feature specs + feature dictionary DataFrame/CSV output. |
| `tests/schema/test_tasks.py` | Tests split validation and task manifest serialization/frozen-ness. |
| `tests/schema/__init__.py` | Initializes schema test package. |
| `.github/workflows/pr-agent-context-refresh.yml` | Adds workflow_dispatch inputs + SHA-aware concurrency and forwards PR/SHA inputs. |
| `.github/workflows/pr-agent-context-refresh-dispatcher.yml` | Adds scheduled dispatcher workflow to trigger refresh runs for recent PR review activity. |
| `.agent-plan.md` | Updates plan state to reflect Milestone 3 completion and Milestone 4 focus. |
```python
or math.isnan(value)
or math.isinf(value)
```
```yaml
pull_request_number: ${{ inputs.pull_request_number || '' }}
pull_request_head_sha: ${{ inputs.pull_request_head_sha || '' }}
pull_request_base_sha: ${{ inputs.pull_request_base_sha || '' }}
```
```python
Each class represents one row in a Parquet table. Fields map directly to
the column specifications in §16 of the architecture spec. Optional columns
(nullable in the output) use ``... | None`` typing.
```
Review feedback addressed:

- Remove primary_task/label_window_days as explicit kwargs from resolve_config() and Generator.from_recipe() — these fields are resolved from recipe YAML and override dict only, not casually overridable, since the generation pipeline doesn't yet support arbitrary task types (Copilot-1, Copilot-3, shaypal5 #1, #2)
- Add label_window_days <= horizon_days validation in GenerationConfig.__post_init__ (Copilot-2, shaypal5 #3)
- Add tests for invalid primary_task on GenerationConfig: empty string, non-string type (shaypal5 #6, pr-agent-context)
- Add tests for invalid label_window_days on Recipe.from_dict: bool, non-positive, float (shaypal5 #7, pr-agent-context)
- Add test for label_window_days > horizon_days rejection
- Fix existing test using horizon_days=30 (now conflicts with default label_window_days=90)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: carry primary_task and label_window_days into WorldSpec for dataset card

  Add `primary_task` and `label_window_days` fields to `GenerationConfig` (with defaults preserving current behavior). Propagate through `Recipe.from_dict()`, `resolve_config()`, and `Generator.from_recipe()` so recipe YAML can override them. Update `render_dataset_card()` to read from `world_spec.config` instead of hard-coded string literals.

  Closes #6

* docs: update .agent-plan.md for WorldSpec task fields (PR #36)

* fix: address review feedback — tighten scope, add validation + tests (full list of addressed items in the preceding commit message)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
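The `label_window_days <= horizon_days` validation from the review feedback can be sketched as follows. Field names come from the commits above, but the defaults and the minimal field set are assumptions; the real `GenerationConfig` has more fields.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenerationConfig:
    primary_task: str = "converted_within_90_days"
    horizon_days: int = 365          # default horizon is an assumption
    label_window_days: int = 90      # default per the commits above

    def __post_init__(self) -> None:
        if not isinstance(self.primary_task, str) or not self.primary_task:
            raise ValueError("primary_task must be a non-empty string")
        # the label window must fit inside the simulated horizon
        if self.label_window_days > self.horizon_days:
            raise ValueError(
                "label_window_days must not exceed horizon_days "
                f"({self.label_window_days} > {self.horizon_days})"
            )
```

This also illustrates why the existing `horizon_days=30` test had to change: with a default `label_window_days=90`, a 30-day horizon now fails validation unless the window is shortened too.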
Summary
- `leadforge/core/ids.py` — `make_id(prefix, n)` returning zero-padded IDs (`acct_000001`); `ID_PREFIXES` registry for all 9 entity namespaces.
- `leadforge/schema/entities.py` — typed row dataclasses for all 9 v1 tables (`AccountRow`, `ContactRow`, `LeadRow`, `TouchRow`, `SessionRow`, `SalesActivityRow`, `OpportunityRow`, `CustomerRow`, `SubscriptionRow`). Each has `TABLE_NAME`, `DTYPE_MAP`, `to_dict()`, and `empty_dataframe()`.
- `leadforge/schema/tables.py` — `write_parquet()` / `read_parquet()` helpers used in tests and the future render layer.
- `leadforge/schema/relationships.py` — `FKConstraint` frozen dataclass; `ALL_CONSTRAINTS` (10 FK edges from §9.2 of the architecture spec); `validate_fk()` raising `FKViolationError` on orphaned keys.
- `leadforge/schema/features.py` — `FeatureSpec` frozen dataclass with name, dtype, description, category, `is_target`, `leakage_risk`; `LEAD_SNAPSHOT_FEATURES` (29 pre-anchor features + 1 target).
- `leadforge/schema/dictionaries.py` — `feature_dictionary_df()` and `write_feature_dictionary()` for the mandatory `feature_dictionary.csv` bundle artifact.
- `leadforge/schema/tasks.py` — `SplitSpec` + `TaskManifest` frozen dataclasses; `CONVERTED_WITHIN_90_DAYS` constant (70/15/15 split, 90-day label window, binary classification).
- `pyproject.toml` — `pandas>=2.0` and `pyarrow>=14.0` added as core dependencies; mypy overrides added for both (no stubs required).

Test plan

- `pytest` — 192 tests pass
- `ruff check . && ruff format --check .` — clean
- `mypy leadforge/` — no errors
- `test_empty_dataframe_parquet_roundtrip[*]` — all 9 table types serialize and restore correctly
- `test_validate_fk_raises_on_orphan` — FK violation produces `FKViolationError`
- `test_write_feature_dictionary_creates_file` — CSV artifact written correctly
- `test_task_manifest_to_dict_is_json_serializable` — manifest serializes to JSON

🤖 Generated with Claude Code
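The frozen-manifest contract that `test_task_manifest_to_dict_is_json_serializable` exercises can be sketched with stdlib dataclasses. Field names beyond those stated in the summary (`name`, `task_type`, `label_window_days`, the 70/15/15 split) are assumptions, not the real `schema/tasks.py` API.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass(frozen=True)
class SplitSpec:
    train: float = 0.70
    val: float = 0.15
    test: float = 0.15

@dataclass(frozen=True)
class TaskManifest:
    name: str
    task_type: str
    label_window_days: int
    split: SplitSpec = field(default_factory=SplitSpec)

    def to_dict(self) -> dict:
        # asdict() recurses, so the nested SplitSpec becomes a plain dict
        return asdict(self)

CONVERTED_WITHIN_90_DAYS = TaskManifest(
    name="converted_within_90_days",
    task_type="binary_classification",
    label_window_days=90,
)

payload = json.dumps(CONVERTED_WITHIN_90_DAYS.to_dict())
```

Freezing both dataclasses makes the manifest a hashable, immutable constant, while `to_dict()` + `json.dumps` gives the JSON-serializable form the test plan asserts.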