feat: carry primary_task and label_window_days into WorldSpec#36
feat: carry primary_task and label_window_days into WorldSpec#36
Conversation
…aset card Add `primary_task` and `label_window_days` fields to `GenerationConfig` (with defaults preserving current behavior). Propagate through `Recipe.from_dict()`, `resolve_config()`, and `Generator.from_recipe()` so recipe YAML can override them. Update `render_dataset_card()` to read from `world_spec.config` instead of hard-coded string literals. Closes #6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Propagates task metadata (primary_task, label_window_days) through the generation configuration so narrative outputs (dataset card) can reflect resolved task settings instead of hard-coded literals.
Changes:
- Add
primary_taskandlabel_window_daystoGenerationConfigwith validation and defaults preserving current behavior. - Extend
Recipe.from_dict()/Recipe.resolve_config()andGenerator.from_recipe()to parse/resolve/override the new fields. - Update dataset card rendering and add targeted tests for dataset card + config precedence.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
leadforge/core/models.py |
Adds primary_task / label_window_days to GenerationConfig and validates them. |
leadforge/api/recipes.py |
Parses optional label_window_days from recipe dict and propagates both fields through config resolution layers. |
leadforge/api/generator.py |
Exposes primary_task / label_window_days as Generator.from_recipe() kwargs and forwards them to config resolution. |
leadforge/narrative/dataset_card.py |
Renders task name + label window from world_spec.config instead of literals. |
tests/api/test_recipes.py |
Adds tests for precedence/defaulting behavior of the new config fields. |
tests/narrative/test_dataset_card.py |
Adds tests asserting dataset card reflects custom primary_task / label_window_days. |
.agent-plan.md |
Updates internal plan checklist for the PR work. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
shaypal5
left a comment
There was a problem hiding this comment.
PR #36 Review — Senior Dev Perspective
The big problem: this PR creates a configurable lie
You've added primary_task and label_window_days to GenerationConfig and wired them into the dataset card renderer. But that's the only place they're actually used. The rest of the system completely ignores these fields:
1. The bundle still hardcodes the task everywhere (critical)
api/bundle.py:97—task_row_counts={CONVERTED_WITHIN_90_DAYS.task_id: task_row_counts}writes the manifest with the hardcoded constant regardless ofconfig.primary_taskrender/tasks.py:25— task splits are written totasks/converted_within_90_days/using theCONVERTED_WITHIN_90_DAYSconstant as defaultvalidation/realism.py:43,85— validates by readingtasks/converted_within_90_days/train.parquetliterallyvalidation/drift.py:33,37— same hardcoded path
So if someone sets primary_task="churned_within_60_days", the dataset card proudly says "Task: churned_within_60_days" but the actual directory is still tasks/converted_within_90_days/, the manifest still says converted_within_90_days, and validation still looks for that hardcoded path. The card becomes a misleading label on an unchanged box.
2. primary_task on Recipe vs GenerationConfig — asymmetric handling
Recipe already has primary_task: str (line 38), and resolve_config() does resolved["primary_task"] = self.primary_task at layer 3. But the PR doesn't acknowledge this existing field or explain the relationship. Meanwhile, label_window_days gets None-guarded treatment on Recipe (if self.label_window_days is not None) but primary_task always overwrites from the recipe — meaning the package default on GenerationConfig for primary_task is dead code that can never win over any loaded recipe.
3. label_window_days vs horizon_days — no enforcement of any relationship
These two fields are conceptually related (the label window can't exceed the simulation horizon), but there's zero validation tying them together. Someone can set horizon_days=30, label_window_days=90 — simulating 30 days but claiming the label covers 90. The dataset card will happily render "within 90 days" for a 30-day simulation. This is worse than the hardcoded version, which at least was consistent.
4. The dataset card template is still mostly hardcoded
The label definition text still says "A lead is considered converted if a closed_won event is recorded within {N} days..." — only the number is dynamic. If primary_task is "churned_within_60_days", it still says "converted" and "closed_won". The template is task-specific prose with two values punched out. That's not really configurable — it's a string interpolation that produces nonsense for any non-default value.
5. Config hashing side effect
core/hashing.py uses dataclasses.asdict(config) — so adding two new fields to GenerationConfig silently changes the config hash for every existing run, even when using the same defaults. This breaks the determinism invariant: same (recipe, config, seed, version) must produce identical output, but now the hash differs from pre-PR bundles. Not addressed or mentioned.
6. No test for invalid primary_task on GenerationConfig
You validate primary_task must be a non-empty string in __post_init__, but there's no test for GenerationConfig(primary_task="") or GenerationConfig(primary_task=42) raising InvalidConfigError.
7. No test for invalid label_window_days on Recipe.from_dict
You added validation in from_dict for label_window_days (bool check, positive int), but there are no tests exercising those error paths.
Verdict
The issue asks to "carry primary_task and label_window_days into WorldSpec for dataset card rendering" — and technically this PR does exactly that. But it does it in a way that creates semantic inconsistency: the config says one thing, the bundle does another. If the scope is truly just the dataset card, then these fields arguably don't belong on GenerationConfig at all — they should be derived from the TaskManifest / CONVERTED_WITHIN_90_DAYS constant that already exists in schema/tasks.py. Adding them to config suggests they're configurable across the system, which they aren't.
The cleaner approach: have render_dataset_card() read from schema.tasks.CONVERTED_WITHIN_90_DAYS (which already has task_id and label_window_days), or accept a TaskManifest parameter. No new config fields, no precedence plumbing, no pretense of configurability.
If the intent is full configurability, the PR is maybe 20% of the work — it needs to flow through bundle writing, task splits, validation, simulation, entity schema, and features before it's actually useful.
This comment has been minimized.
This comment has been minimized.
Review feedback addressed: - Remove primary_task/label_window_days as explicit kwargs from resolve_config() and Generator.from_recipe() — these fields are resolved from recipe YAML and override dict only, not casually overridable, since the generation pipeline doesn't yet support arbitrary task types (Copilot-1, Copilot-3, shaypal5 #1, #2) - Add label_window_days <= horizon_days validation in GenerationConfig.__post_init__ (Copilot-2, shaypal5 #3) - Add tests for invalid primary_task on GenerationConfig: empty string, non-string type (shaypal5 #6, pr-agent-context) - Add tests for invalid label_window_days on Recipe.from_dict: bool, non-positive, float (shaypal5 #7, pr-agent-context) - Add test for label_window_days > horizon_days rejection - Fix existing test using horizon_days=30 (now conflicts with default label_window_days=90) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
pr-agent-context report: This run includes unresolved review comments on PR #36 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: leadforge/narrative/dataset_card.py:104
URL: https://github.com/leadforge-dev/leadforge/pull/36#discussion_r3172443347
Root author: copilot-pull-request-reviewer
Comment:
`render_dataset_card()` now renders `cfg.primary_task` / `cfg.label_window_days`, but the actual generated bundle still hard-codes the task id/label column/window to `converted_within_90_days`/90 (e.g., bundle writing uses `CONVERTED_WITHIN_90_DAYS` and simulation outputs a `converted_within_90_days` column). This means callers can produce a dataset card that contradicts the exported task files and labels. Either (a) thread these config fields through task export + manifest + label column generation so they truly control the task, or (b) disallow/ignore overrides for now (e.g., validate they match the v1 invariant) so the dataset card can’t drift from the bundle contents.
## COPILOT-2
Location: leadforge/core/models.py:66
URL: https://github.com/leadforge-dev/leadforge/pull/36#discussion_r3172443359
Root author: copilot-pull-request-reviewer
Comment:
`label_window_days` is validated as positive, but there’s no validation tying it to `horizon_days`. Since the simulation only runs for `horizon_days` days, allowing `label_window_days > horizon_days` will make the config (and dataset card) claim a longer label window than the data can support. Consider enforcing `label_window_days <= horizon_days` (or `==` if that’s the current invariant) in `__post_init__`.
## COPILOT-3
Location: leadforge/api/generator.py
URL: https://github.com/leadforge-dev/leadforge/pull/36#discussion_r3172443369
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
`Generator.from_recipe()` exposes `primary_task` and `label_window_days` as overrides, but (as of now) the generation pipeline still exports `tasks/converted_within_90_days/` and a `converted_within_90_days` label column. This makes these kwargs look like they change the generated task when they currently only affect metadata (dataset card/config). Consider either removing these kwargs until task export supports them, or documenting very explicitly that they don’t affect task generation yet.Run metadata: |
Summary
primary_task: strandlabel_window_days: intfields toGenerationConfig(defaults preserve current behavior:"converted_within_90_days"/90)Recipe.from_dict(),resolve_config()(all 4 precedence layers), andGenerator.from_recipe()render_dataset_card()to read fromworld_spec.configinstead of hard-coded string literalsCloses #6
Test plan
primary_tasknamelabel_window_daysvalueprimary_taskdefaults from recipeprimary_taskvia explicit kwarglabel_window_daysdefaults (90)label_window_daysfrom recipe YAMLlabel_window_daysvia explicit kwargruff check .clean🤖 Generated with Claude Code