Skip to content

feat: Milestone 3 — schema layer (entities, IDs, FK validation, feature dictionary, task manifest)#7

Merged
shaypal5 merged 8 commits intomainfrom
feat/milestone-3-schema
Apr 22, 2026
Merged

feat: Milestone 3 — schema layer (entities, IDs, FK validation, feature dictionary, task manifest)#7
shaypal5 merged 8 commits intomainfrom
feat/milestone-3-schema

Conversation

@shaypal5
Copy link
Copy Markdown
Contributor

Summary

  • leadforge/core/ids.pymake_id(prefix, n) returning zero-padded IDs (acct_000001); ID_PREFIXES registry for all 9 entity namespaces.
  • leadforge/schema/entities.py — typed row dataclasses for all 9 v1 tables (AccountRow, ContactRow, LeadRow, TouchRow, SessionRow, SalesActivityRow, OpportunityRow, CustomerRow, SubscriptionRow). Each has TABLE_NAME, DTYPE_MAP, to_dict(), and empty_dataframe().
  • leadforge/schema/tables.pywrite_parquet() / read_parquet() helpers used in tests and future render layer.
  • leadforge/schema/relationships.pyFKConstraint frozen dataclass; ALL_CONSTRAINTS (10 FK edges from §9.2 of architecture spec); validate_fk() raising FKViolationError on orphaned keys.
  • leadforge/schema/features.pyFeatureSpec frozen dataclass with name, dtype, description, category, is_target, leakage_risk; LEAD_SNAPSHOT_FEATURES (29 pre-anchor features + 1 target).
  • leadforge/schema/dictionaries.pyfeature_dictionary_df() and write_feature_dictionary() for the mandatory feature_dictionary.csv bundle artifact.
  • leadforge/schema/tasks.pySplitSpec + TaskManifest frozen dataclasses; CONVERTED_WITHIN_90_DAYS constant (70/15/15 split, 90-day label window, binary classification).
  • pyproject.tomlpandas≥2.0 and pyarrow≥14.0 added as core dependencies; mypy overrides added for both (no stubs required).
  • 82 new tests covering IDs, all 9 entity row types, Parquet round-trips, FK validation, feature dictionary, and task manifest. Total: 192 tests passing.

Test plan

  • pytest — 192 tests pass
  • ruff check . && ruff format --check . — clean
  • mypy leadforge/ — no errors
  • test_empty_dataframe_parquet_roundtrip[*] — all 9 table types serialize and restore correctly
  • test_validate_fk_raises_on_orphan — FK violation produces FKViolationError
  • test_write_feature_dictionary_creates_file — CSV artifact written correctly
  • test_task_manifest_to_dict_is_json_serializable — manifest serializes to JSON

🤖 Generated with Claude Code

…res, tasks)

- core/ids.py: make_id(prefix, n) → zero-padded entity ID strings;
  ID_PREFIXES registry covering all 9 entity namespaces
- schema/entities.py: typed row dataclasses for all 9 v1 tables
  (AccountRow, ContactRow, LeadRow, TouchRow, SessionRow,
  SalesActivityRow, OpportunityRow, CustomerRow, SubscriptionRow)
  each with DTYPE_MAP, to_dict(), empty_dataframe(), and Parquet
  round-trip support via schema/tables.py
- schema/relationships.py: FKConstraint dataclass, ALL_CONSTRAINTS
  tuple (10 FK edges from §9.2), validate_fk() raising FKViolationError
- schema/features.py: FeatureSpec frozen dataclass; LEAD_SNAPSHOT_FEATURES
  — 29 pre-anchor features + 1 target for converted_within_90_days task
- schema/dictionaries.py: feature_dictionary_df() and
  write_feature_dictionary() for the mandatory feature_dictionary.csv
- schema/tasks.py: SplitSpec + TaskManifest frozen dataclasses;
  CONVERTED_WITHIN_90_DAYS constant (70/15/15 split, 90-day window)
- pyproject.toml: pandas≥2.0 and pyarrow≥14.0 added as core deps;
  mypy overrides to ignore missing stubs for both
- 82 new tests (ids, entities, relationships, features, tasks);
  total 192 passing; ruff + mypy clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shaypal5 shaypal5 added type: feature New capability layer: core core/ primitives (RNG, IDs, models, exceptions) layer: schema schema/ entity/event contracts labels Apr 21, 2026
Copilot AI review requested due to automatic review settings April 21, 2026 11:23
@shaypal5 shaypal5 self-assigned this Apr 21, 2026
@github-actions

This comment has been minimized.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements Milestone 3’s “schema layer” for Leadforge by introducing typed table row contracts, ID generation, FK validation utilities, feature/task metadata, and Parquet I/O helpers to support future simulation/render layers.

Changes:

  • Added schema modules for entities (row dataclasses + empty DataFrames), relationships (FK constraints + validator), features/dictionary, tasks (task manifest + split spec), and Parquet read/write helpers.
  • Added ID generation utilities (make_id, ID_PREFIXES) and expanded core dependencies to include pandas and pyarrow.
  • Added a large suite of new schema tests covering the new functionality.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
leadforge/core/ids.py Adds canonical ID prefixes and deterministic zero-padded ID generation.
leadforge/core/hashing.py Tweaks canonicalization logic used for config hashing.
leadforge/narrative/spec.py Updates YAML validation type checks in narrative spec parsing.
leadforge/schema/entities.py Introduces row dataclasses + dtype maps + empty DataFrame builders for 9 tables.
leadforge/schema/relationships.py Adds FK constraint definitions and FK validation helper + error type.
leadforge/schema/features.py Adds FeatureSpec and canonical lead snapshot feature list.
leadforge/schema/dictionaries.py Builds/writes the required feature dictionary CSV artifact.
leadforge/schema/tasks.py Adds TaskManifest/SplitSpec and v1 task constant.
leadforge/schema/tables.py Adds Parquet read/write helpers (pyarrow engine).
tests/schema/test_entities.py Tests entity schemas, empty DataFrames, and Parquet round-trips.
tests/schema/test_relationships.py Tests FK constraints and FK validation error behavior.
tests/schema/test_features.py Tests feature specs and feature dictionary DataFrame/CSV writing.
tests/schema/test_tasks.py Tests task manifest serialization and split validation.
tests/schema/test_ids.py Tests ID formatting and prefix registry coverage.
pyproject.toml Adds pandas/pyarrow dependencies + mypy ignores for missing stubs.
.agent-plan.md Updates project planning status to reflect Milestone 3 completion.
tests/schema/__init__.py Initializes schema test package (empty).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/narrative/spec.py
Comment thread leadforge/narrative/spec.py
Comment thread leadforge/schema/tasks.py
Comment thread leadforge/narrative/spec.py
Comment thread leadforge/schema/entities.py
Comment thread leadforge/core/hashing.py Outdated
Comment thread leadforge/narrative/spec.py
Comment thread leadforge/narrative/spec.py
Comment thread leadforge/narrative/spec.py
Comment thread leadforge/narrative/spec.py
@github-actions

This comment has been minimized.

- narrative/spec.py, core/hashing.py: revert UP038 unsafe fix — replace
  all isinstance(x, A | B) with isinstance(x, (A, B)) + noqa: UP038
  (Python's isinstance does not accept PEP 604 union types at runtime;
  the ruff auto-fix was incorrect) (COPILOT-1, 2, 4, 7, 8, 9, 10, 11)
- schema/tasks.py: SplitSpec.__post_init__ now validates each fraction
  is finite and in [0, 1] before checking the sum, preventing manifests
  with values like (1.2, -0.1, -0.1) that would sum to 1 but are invalid
  (COPILOT-3)
- schema/entities.py: TouchRow.DTYPE_MAP column order aligned to match
  dataclass field order (touch_direction before campaign_id) (COPILOT-5)
- schema/dictionaries.py: feature_dictionary_df() now casts string
  columns to pd.StringDtype; docstring corrected to reflect actual dtypes
  (COPILOT-6)
- tests/schema/test_tasks.py: two new tests covering per-component
  SplitSpec rejection (negative and >1 values)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the Milestone 3 “schema layer” for Leadforge by defining typed table-row contracts, Parquet IO helpers, FK relationship validation, a canonical feature dictionary for the primary task, and a task manifest—plus dependencies and a broad test suite to lock in these contracts.

Changes:

  • Add schema primitives: entity row dataclasses (with dtype maps + empty DataFrames), Parquet read/write utilities, and FK constraints + validation.
  • Introduce task/ML metadata: lead snapshot feature specs + CSV feature dictionary generation, and a task manifest for converted_within_90_days.
  • Add core ID formatting utilities and expand tests to cover the new schema layer end-to-end.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
leadforge/core/ids.py Adds make_id() and ID_PREFIXES registry for entity ID formatting.
leadforge/core/hashing.py Small lint-only change (adds # noqa: UP038).
leadforge/narrative/spec.py Small lint-only changes (adds # noqa: UP038 in type checks).
leadforge/schema/entities.py Defines typed row dataclasses for the 9 v1 tables + empty DataFrame constructors.
leadforge/schema/tables.py Adds Parquet read/write helpers used by tests (and later render layer).
leadforge/schema/relationships.py Adds FKConstraint graph + validate_fk() + error type.
leadforge/schema/features.py Adds FeatureSpec + canonical lead snapshot feature list including target.
leadforge/schema/dictionaries.py Builds/writes feature_dictionary.csv from feature specs.
leadforge/schema/tasks.py Adds SplitSpec, TaskManifest, and the v1 task constant.
pyproject.toml Adds pandas + pyarrow dependencies and mypy ignore overrides for them.
tests/schema/test_ids.py Tests ID formatting and prefix registry.
tests/schema/test_entities.py Tests entity row contracts, empty DataFrames, and Parquet round-trips.
tests/schema/test_relationships.py Tests FK constraints and validate_fk() behavior/errors.
tests/schema/test_features.py Tests feature specs + feature dictionary DataFrame/CSV generation.
tests/schema/test_tasks.py Tests split validation, manifest fields, and JSON-serializable output.
tests/schema/__init__.py Adds schema test package marker.
.agent-plan.md Updates milestone tracking documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/schema/tasks.py
Comment thread leadforge/schema/dictionaries.py Outdated
Comment thread leadforge/core/ids.py
…ck pattern

Hardens the refresh flow so bot-authored / approval-gated review events
(e.g. from Copilot) do not leave pr-agent-context refresh stuck waiting
for a manual approval or re-run.

Changes to pr-agent-context-refresh.yml:
- Add workflow_dispatch trigger with three required inputs:
  pull_request_number, pull_request_head_sha, pull_request_base_sha
  (needed by dispatcher-initiated runs that carry no PR event payload)
- Pass all three inputs as explicit PR context overrides to the reusable
  workflow call (empty string when event-triggered, so upstream default
  resolution applies)
- SHA-aware concurrency group: workflow_dispatch runs key on
  "{pr_number}-{head_sha}", so different-SHA dispatches for the same PR
  run rather than cancel, while duplicate same-SHA dispatches are deduplicated
- Extend job `if` guard to always pass for workflow_dispatch

New pr-agent-context-refresh-dispatcher.yml:
- Triggers on schedule (every 15 min, Mon-Fri 07:00-23:00 UTC)
- For each open same-repo PR with recent review activity (last 20 min):
  - Bounded lookup: only checks the 10 most recent reviews/comments
  - In-flight dedupe: skips if any run for that head SHA is queued,
    in_progress, waiting, or completed within the last 10 min
  - Per-PR error isolation: failures are caught and logged; other PRs
    are still processed
  - Uses correct actions/github-script@v7 Octokit method names
    (github.rest.pulls.*, github.rest.actions.*)
  - Dispatches to the repo default branch so the workflow file is
    always resolvable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

- schema/tasks.py: SplitSpec.__post_init__ now explicitly rejects bool
  values before the numeric checks (bool is an int subclass, so True/False
  previously passed the isinstance guard and range check silently)
  (COPILOT-1); added test_split_spec_rejects_bool_component
- schema/dictionaries.py: remove empty `if TYPE_CHECKING: pass` block
  (COPILOT-2)
- core/ids.py: clarify in module docstring that the 9 table-backed
  prefixes correspond to relational entities, and that rep_ is an
  internal-only namespace with no standalone output table (COPILOT-3)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 21, 2026 14:50
@github-actions

This comment has been minimized.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the Milestone 3 “schema layer” for Leadforge by adding a typed relational schema (row contracts + dtypes), ID generation, FK constraint validation, Parquet helpers, and task/feature metadata (feature dictionary + task manifest), with comprehensive tests and dependency updates.

Changes:

  • Add schema modules for entity row dataclasses, Parquet I/O, FK constraints/validation, feature specs + feature dictionary CSV, and task manifest definitions.
  • Add core ID helper (make_id) plus canonical prefix registry, and update mypy/ruff-related annotations.
  • Add extensive test coverage for the new schema layer and update CI workflows for PR-agent context refresh dispatching.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
leadforge/core/ids.py Adds make_id() and ID_PREFIXES registry for ID formatting/namespaces.
leadforge/core/hashing.py Adds a ruff suppression comment for isinstance(..., (list, tuple)).
leadforge/narrative/spec.py Adds ruff suppression comments (no logic change).
leadforge/schema/entities.py Introduces 9 typed row dataclasses + dtype maps and empty DataFrame builders.
leadforge/schema/tables.py Adds write_parquet / read_parquet helpers using pyarrow.
leadforge/schema/relationships.py Adds FK constraint definitions + validate_fk() raising FKViolationError.
leadforge/schema/features.py Adds FeatureSpec and the canonical lead snapshot feature list.
leadforge/schema/dictionaries.py Adds feature dictionary DataFrame builder + CSV writer.
leadforge/schema/tasks.py Adds SplitSpec, TaskManifest, and CONVERTED_WITHIN_90_DAYS.
tests/schema/test_ids.py Tests ID formatting, validation, and prefix registry coverage.
tests/schema/test_relationships.py Tests FK constraints list and FK validation behavior/errors.
tests/schema/test_tasks.py Tests split validation + task manifest immutability and serialization.
tests/schema/test_features.py Tests feature specs and feature dictionary CSV/DataFrame output.
tests/schema/test_entities.py Tests entity row to_dict, empty schema DataFrames, and Parquet round-trips.
tests/schema/__init__.py Adds schema test package marker.
pyproject.toml Adds pandas/pyarrow deps and mypy ignore overrides for them.
.github/workflows/pr-agent-context-refresh.yml Adds workflow_dispatch support + SHA-aware concurrency inputs.
.github/workflows/pr-agent-context-refresh-dispatcher.yml Adds scheduled dispatcher to trigger refresh runs when needed.
.agent-plan.md Updates milestone tracking notes to reflect schema layer completion.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/schema/relationships.py Outdated
Comment thread leadforge/core/ids.py
Comment thread tests/schema/test_entities.py Outdated
Comment thread tests/schema/test_features.py Outdated
Comment thread .github/workflows/pr-agent-context-refresh-dispatcher.yml Outdated
…llback

Root cause of the stale refresh comment after Copilot's review on PR #7:

1. The dispatcher workflow only lives on this PR branch; scheduled workflows
   only run from the default branch, so it had never fired.  This resolves
   the moment the PR merges.

2. The dedupe logic in the dispatcher was too broad:

     (r.status === 'completed' && r.updated_at >= recentCompletedSince)

   This treated ANY recently-completed run as valid coverage, including runs
   with conclusion=startup_failure (bot-triggered run failed to start) and
   conclusion=action_required (run was approval-gated and auto-cancelled).
   Both conclusions mean the run was blocked and never produced a refresh
   comment — yet they suppressed the fallback, leaving the PR stale.

   Additionally, runs with status=action_required (actively waiting for
   approval) were erroneously counted as "waiting" coverage.

Fix applied to the dispatcher:
- Extract BLOCKED_CONCLUSIONS set: startup_failure, action_required,
  failure, cancelled, timed_out, skipped — none of these suppress dispatch.
- A completed run only counts as coverage when its conclusion is 'success'
  or 'neutral' AND it completed within the recent-success window.
- status=action_required (approval-gated, will never run) explicitly
  returns false — not treated as active coverage.
- Active statuses (in_progress, queued, waiting, requested) still suppress
  dispatch — those will actually produce a refresh.
- Schedule changed from business-hours-only to '*/15 * * * *' so bot
  reviews arriving outside Mon–Fri 07-23 UTC are also caught.
- Comments in the YAML explain the dedupe contract explicitly.

Verification against the failing scenario (SHA e23aea4):
- run 24729455592 (startup_failure): BLOCKED_CONCLUSIONS → not coverage
  → dispatcher would redispatch ✓
- pull_request_review_comment runs (action_required): BLOCKED_CONCLUSIONS
  → not coverage → dispatcher would redispatch ✓
- A currently in_progress dispatch run: active → suppress ✓
- A recently succeeded dispatch run: success + within window → suppress ✓

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shaypal5
Copy link
Copy Markdown
Contributor Author

Why the Copilot refresh comment was missing — and what was fixed

Root cause 1 — dispatcher not yet active (structural)

The fallback dispatcher workflow (pr-agent-context-refresh-dispatcher.yml) only existed on this PR branch. GitHub only runs scheduled workflows from the default branch (main). The dispatcher had therefore never fired. This resolves automatically when this PR merges.

Root cause 2 — dedupe bug suppressed the fallback (logic)

Even once merged, the dispatcher had a bug that would have caused it to silently skip PRs in exactly the situation it was designed to cover.

The old dedupe check:

(r.status === 'completed' && r.updated_at >= recentCompletedSince)

This treated any recently-completed run as valid coverage — including:

Run Status Conclusion Meaning
24729455592 completed startup_failure Bot-triggered; workflow could not start
Several others completed action_required Bot-triggered; approval-gated, auto-cancelled

Both conclusions mean the run was blocked before doing any work and produced no refresh comment. Yet the old dedupe matched them and skipped dispatch — suppressing the fallback precisely when it was needed.

The old code also did not handle status === 'action_required' (a run actively waiting for maintainer approval that will never execute on its own).

Fix

BLOCKED_CONCLUSIONS set introduced: startup_failure, action_required, failure, cancelled, timed_out, skipped. Runs with any of these conclusions are transparent to the dispatcher — they do not count as coverage.

A completed run now only suppresses dispatch when:

  • conclusion === 'success' or conclusion === 'neutral' AND
  • updated_at is within the recent-success window

Active runs (in_progress, queued, waiting, requested) still suppress dispatch — those will actually produce a refresh.

Schedule also changed from business-hours-only (*/15 7-23 * * 1-5) to */15 * * * * so bot reviews outside Mon–Fri 07–23 UTC are also caught.

Verification against the failing scenario

Signal Old behaviour New behaviour
Run 24729455592 (startup_failure) ❌ Matched dedupe → skip ✅ BLOCKED_CONCLUSIONS → not coverage → dispatch
action_required-concluded runs ❌ Matched dedupe → skip ✅ BLOCKED_CONCLUSIONS → not coverage → dispatch
Currently in_progress dispatch run ✅ Suppressed ✅ Still suppressed
Recently success-concluded dispatch run ✅ Suppressed ✅ Still suppressed

@github-actions

This comment has been minimized.

- relationships.py: replace set-difference orphan collection with an O(1)
  single-pass pattern (count + bounded sample list) to avoid building a
  full intermediate set; rename _MAX_SAMPLE → _max_sample for ruff N806
- test_entities.py: wrap nullable-boolean iloc result in bool() before
  identity-comparing to True, avoiding pandas BooleanDtype NA-coercion risk
- test_features.py: use direct boolean filter df[df["is_target"]] instead
  of df[df["is_target"] == True] (ruff E712)
- pr-agent-context-refresh-dispatcher.yml: switch from per_page:50 cap to
  github.paginate() so all open PRs are checked, not just the first page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 21, 2026 16:03
@github-actions

This comment has been minimized.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the Milestone 3 schema layer for Leadforge v1, defining typed table row contracts, ID generation, FK validation, task/feature metadata artifacts, and associated tests—plus updates to PR-agent context refresh workflows.

Changes:

  • Added schema modules for entity rows (DTYPE_MAP + empty_dataframe()), Parquet IO helpers, FK constraints/validation, feature specs + feature dictionary CSV, and task manifest definitions.
  • Added extensive pytest coverage for IDs, entities, FK validation, Parquet round-trips, feature dictionary output, and task manifest serialization.
  • Updated dependencies/mypy config for pandas/pyarrow and introduced workflow_dispatch + a dispatcher workflow for PR-agent context refresh.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
leadforge/core/ids.py Adds ID prefix registry and make_id() formatting/validation.
leadforge/core/hashing.py Adds # noqa: UP038 to preserve existing isinstance style.
leadforge/narrative/spec.py Adds # noqa: UP038 annotations to existing validation checks.
leadforge/schema/entities.py Introduces 9 typed row dataclasses with DTYPE_MAP, to_dict(), empty_dataframe(), and registries.
leadforge/schema/features.py Defines FeatureSpec and canonical ordered LEAD_SNAPSHOT_FEATURES.
leadforge/schema/dictionaries.py Builds/writes feature_dictionary.csv from feature specs.
leadforge/schema/relationships.py Defines FK constraint graph and validate_fk() with FKViolationError.
leadforge/schema/tables.py Adds Parquet read/write helpers using pyarrow.
leadforge/schema/tasks.py Defines SplitSpec, TaskManifest, and CONVERTED_WITHIN_90_DAYS.
tests/schema/test_ids.py Tests ID formatting and prefix registry coverage.
tests/schema/test_entities.py Tests row to_dict(), empty dataframe schema, and Parquet round-trips.
tests/schema/test_relationships.py Tests FK constraint definitions and validation error behavior.
tests/schema/test_features.py Tests feature spec invariants and feature dictionary CSV output.
tests/schema/test_tasks.py Tests split validation, task manifest invariants, and JSON serialization.
tests/schema/__init__.py Creates schema test package marker.
pyproject.toml Adds pandas/pyarrow deps and mypy overrides.
.github/workflows/pr-agent-context-refresh.yml Adds workflow_dispatch inputs and SHA-aware concurrency/passing inputs.
.github/workflows/pr-agent-context-refresh-dispatcher.yml Adds scheduled dispatcher to trigger refresh when approval-gated.
.agent-plan.md Updates plan/status to reflect Milestone 3 completion and Milestone 4 next steps.
Comments suppressed due to low confidence (2)

.github/workflows/pr-agent-context-refresh.yml:35

  • The workflow uses the inputs.* context in the concurrency.group expression, but this workflow also runs on non-workflow_dispatch events. For those events, inputs is not a defined context and will cause expression evaluation failures. Use github.event.inputs.<name> (guarded by github.event_name == 'workflow_dispatch') or otherwise avoid referencing inputs outside workflow_dispatch-only workflows.
concurrency:
  group: >-
    pr-agent-context-refresh-${{
      (github.event_name == 'workflow_dispatch' &&
       format('{0}-{1}', inputs.pull_request_number, inputs.pull_request_head_sha)) ||
      github.event.pull_request.number ||
      github.event.check_run.pull_requests[0].number ||
      github.event.check_run.head_sha ||
      github.sha

.github/workflows/pr-agent-context-refresh.yml:35

  • concurrency.group indexes github.event.check_run.pull_requests[0] unconditionally. This workflow is triggered by all check_run: completed events (including ones with no associated PRs), so pull_requests[0] can be out of bounds and fail the workflow before the job-level if: guard runs. Wrap the pull_requests[0] access in a conditional that checks the array is non-empty, or fall back to github.event.check_run.head_sha only when no PRs are present.
      (github.event_name == 'workflow_dispatch' &&
       format('{0}-{1}', inputs.pull_request_number, inputs.pull_request_head_sha)) ||
      github.event.pull_request.number ||
      github.event.check_run.pull_requests[0].number ||
      github.event.check_run.head_sha ||
      github.sha

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/pr-agent-context-refresh.yml
Comment thread .github/workflows/pr-agent-context-refresh-dispatcher.yml Outdated
Comment thread leadforge/core/ids.py
- pr-agent-context-refresh.yml: use github.event.inputs.* instead of
  inputs.* in concurrency.group; the inputs context is only defined for
  workflow_dispatch/workflow_call events, github.event.inputs is always
  safe to reference
- pr-agent-context-refresh-dispatcher.yml: remove dead
  r.status === 'action_required' guard — action_required is a conclusion
  value, not a status value, so this branch never fired; the actual case
  (completed+conclusion=action_required) is already handled by
  BLOCKED_CONCLUSIONS; update top-of-file comment accordingly

Resolves COPILOT-1 and COPILOT-2. COPILOT-3 (make_id docstring) is
resolved as already treated — the docstring correctly describes the prefix
argument pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the Milestone 3 “schema layer” by adding typed schema definitions (entities, relationships, tasks, features) plus Parquet I/O helpers and corresponding tests, and updates workflow automation to better refresh PR-agent context.

Changes:

  • Add v1 schema modules: row dataclasses for 9 tables, FK constraint definitions + validation, task manifest + split spec, feature specs + feature dictionary export, and Parquet read/write helpers.
  • Add extensive test coverage for IDs, schema row contracts, Parquet round-trips, FK validation, feature dictionary export, and task manifest serialization.
  • Update GitHub Actions PR-agent context refresh workflows (new dispatcher + workflow_dispatch support + SHA-aware concurrency).

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
leadforge/core/ids.py Adds canonical ID prefix registry and make_id() for deterministic, zero-padded IDs.
leadforge/schema/entities.py Introduces typed row dataclasses for all v1 tables with DTYPE_MAP, to_dict(), and empty_dataframe().
leadforge/schema/relationships.py Adds FK constraint model and validate_fk() raising FKViolationError on orphans.
leadforge/schema/tables.py Adds Parquet read/write helpers using PyArrow.
leadforge/schema/features.py Defines FeatureSpec and the canonical ordered lead-snapshot feature list.
leadforge/schema/dictionaries.py Builds/writes feature_dictionary.csv from feature specs.
leadforge/schema/tasks.py Adds SplitSpec, TaskManifest, and the CONVERTED_WITHIN_90_DAYS task definition.
tests/schema/test_ids.py Tests ID format/validation and prefix registry coverage.
tests/schema/test_entities.py Tests row contracts, empty DataFrame schema, and Parquet round-trips.
tests/schema/test_relationships.py Tests FK constraint registry and FK validation behavior/error messages.
tests/schema/test_features.py Tests feature spec invariants and feature dictionary CSV output.
tests/schema/test_tasks.py Tests split validation, manifest immutability, and JSON-serializable dict output.
pyproject.toml Adds pandas/pyarrow deps and mypy ignore overrides.
.github/workflows/pr-agent-context-refresh.yml Adds workflow_dispatch inputs + SHA-aware concurrency updates for refresh workflow.
.github/workflows/pr-agent-context-refresh-dispatcher.yml Adds scheduled dispatcher workflow to bypass approval-gated bot-triggered runs.
leadforge/narrative/spec.py Adds Ruff UP038 suppressions for list/tuple checks.
leadforge/core/hashing.py Adds Ruff UP038 suppression for list/tuple canonicalization.
.agent-plan.md Updates milestone tracking text from M3 to M4.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/pr-agent-context-refresh-dispatcher.yml
@github-actions

This comment has been minimized.

github.rest.actions.listWorkflowRuns is the repo-level endpoint and does
not accept a workflow_id filter; the correct method is
listWorkflowRunsForWorkflow, which scopes the query to the named workflow
and supports the head_sha parameter used for SHA-aware dedupe.

Without this fix the call throws at runtime, the try/catch swallows the
error, and the dedupe step is silently skipped — causing the dispatcher
to always fire without deduplication.

Resolves COPILOT-1 (round 5). FAIL-1 (startup_failure on bot-triggered
refresh run) is expected approval-gate behaviour; no code change needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

pr-agent-context report:

This run includes an unresolved review comment on PR #7.

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: .github/workflows/pr-agent-context-refresh-dispatcher.yml:124
URL: https://github.com/leadforge-dev/leadforge/pull/7#discussion_r3123205024
Root author: copilot-pull-request-reviewer

Comment:
    `actions/github-script` uses Octokit REST methods; `github.rest.actions.listWorkflowRuns` is not a valid method name for listing runs of a specific workflow. This will fail at runtime and prevent the dispatcher from deduping/dispatching. Use the workflow-specific endpoint method (e.g., `github.rest.actions.listWorkflowRunsForWorkflow`) with the same `workflow_id`/`head_sha` parameters instead.

Run metadata:

Tool ref: v4
Tool version: 4.0.19
Trigger: commit pushed
Workflow run: 24775534520 attempt 1
Comment timestamp: 2026-04-22T11:22:15.142662+00:00
PR head commit: 84562fec5b49c63c9b32090cd503a21276af8584

@shaypal5 shaypal5 requested a review from Copilot April 22, 2026 11:25
@shaypal5 shaypal5 merged commit d60338d into main Apr 22, 2026
7 checks passed
@shaypal5 shaypal5 deleted the feat/milestone-3-schema branch April 22, 2026 11:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements Milestone 3’s schema layer for leadforge by adding typed entity row contracts, ID generation, FK validation, feature specs + dictionary export, and a task manifest, along with extensive tests and dependency updates.

Changes:

  • Add core schema modules: entities (row dataclasses + empty DataFrames), Parquet IO helpers, FK constraints/validation, feature specs + feature dictionary CSV writer, and task manifest definitions.
  • Add core.ids.make_id() + ID_PREFIXES registry and small lint-related tweaks in existing modules.
  • Add new GitHub Actions workflow dispatcher + workflow_dispatch support, plus a large test suite for the new schema layer.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
leadforge/core/ids.py Adds canonical ID prefix registry and make_id() zero-padded ID generator.
leadforge/core/hashing.py Minor lint suppression for tuple/list isinstance check.
leadforge/schema/entities.py Introduces row dataclasses for 9 v1 relational tables with DTYPE_MAP, to_dict(), and empty_dataframe().
leadforge/schema/tables.py Adds write_parquet / read_parquet helpers (pyarrow engine).
leadforge/schema/relationships.py Defines FK constraints graph and validate_fk() raising FKViolationError.
leadforge/schema/features.py Defines FeatureSpec and the canonical lead snapshot feature list.
leadforge/schema/dictionaries.py Builds/writes feature_dictionary.csv from feature specs.
leadforge/schema/tasks.py Defines SplitSpec, TaskManifest, and CONVERTED_WITHIN_90_DAYS.
leadforge/narrative/spec.py Adds lint suppression comments for tuple/list isinstance checks.
pyproject.toml Adds pandas and pyarrow as core deps; mypy ignore overrides for both.
tests/schema/test_ids.py Tests for ID generation and prefix registry coverage.
tests/schema/test_entities.py Tests for row to_dict(), empty DataFrame schemas, and Parquet round-trips.
tests/schema/test_relationships.py Tests FK constraints list and FK validation error behavior.
tests/schema/test_features.py Tests feature specs + feature dictionary DataFrame/CSV output.
tests/schema/test_tasks.py Tests split validation and task manifest serialization/frozen-ness.
tests/schema/__init__.py Initializes schema test package.
.github/workflows/pr-agent-context-refresh.yml Adds workflow_dispatch inputs + SHA-aware concurrency and forwards PR/SHA inputs.
.github/workflows/pr-agent-context-refresh-dispatcher.yml Adds scheduled dispatcher workflow to trigger refresh runs for recent PR review activity.
.agent-plan.md Updates plan state to reflect Milestone 3 completion and Milestone 4 focus.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/schema/tasks.py
Comment on lines +37 to +38
or math.isnan(value)
or math.isinf(value)
Comment on lines +74 to +76
pull_request_number: ${{ inputs.pull_request_number || '' }}
pull_request_head_sha: ${{ inputs.pull_request_head_sha || '' }}
pull_request_base_sha: ${{ inputs.pull_request_base_sha || '' }}
Comment on lines +3 to +5
Each class represents one row in a Parquet table. Fields map directly to
the column specifications in §16 of the architecture spec. Optional columns
(nullable in the output) use ``... | None`` typing.
shaypal5 added a commit that referenced this pull request May 1, 2026
Review feedback addressed:

- Remove primary_task/label_window_days as explicit kwargs from
  resolve_config() and Generator.from_recipe() — these fields are
  resolved from recipe YAML and override dict only, not casually
  overridable, since the generation pipeline doesn't yet support
  arbitrary task types (Copilot-1, Copilot-3, shaypal5 #1, #2)
- Add label_window_days <= horizon_days validation in
  GenerationConfig.__post_init__ (Copilot-2, shaypal5 #3)
- Add tests for invalid primary_task on GenerationConfig: empty
  string, non-string type (shaypal5 #6, pr-agent-context)
- Add tests for invalid label_window_days on Recipe.from_dict: bool,
  non-positive, float (shaypal5 #7, pr-agent-context)
- Add test for label_window_days > horizon_days rejection
- Fix existing test using horizon_days=30 (now conflicts with default
  label_window_days=90)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 1, 2026
* feat: carry primary_task and label_window_days into WorldSpec for dataset card

Add `primary_task` and `label_window_days` fields to `GenerationConfig`
(with defaults preserving current behavior). Propagate through
`Recipe.from_dict()`, `resolve_config()`, and `Generator.from_recipe()`
so recipe YAML can override them. Update `render_dataset_card()` to read
from `world_spec.config` instead of hard-coded string literals.

Closes #6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update .agent-plan.md for WorldSpec task fields (PR #36)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback — tighten scope, add validation + tests

Review feedback addressed:

- Remove primary_task/label_window_days as explicit kwargs from
  resolve_config() and Generator.from_recipe() — these fields are
  resolved from recipe YAML and override dict only, not casually
  overridable, since the generation pipeline doesn't yet support
  arbitrary task types (Copilot-1, Copilot-3, shaypal5 #1, #2)
- Add label_window_days <= horizon_days validation in
  GenerationConfig.__post_init__ (Copilot-2, shaypal5 #3)
- Add tests for invalid primary_task on GenerationConfig: empty
  string, non-string type (shaypal5 #6, pr-agent-context)
- Add tests for invalid label_window_days on Recipe.from_dict: bool,
  non-positive, float (shaypal5 #7, pr-agent-context)
- Add test for label_window_days > horizon_days rejection
- Fix existing test using horizon_days=30 (now conflicts with default
  label_window_days=90)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: core core/ primitives (RNG, IDs, models, exceptions) layer: schema schema/ entity/event contracts type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants