Thread primary_task through bundle writing, validation, and pipeline #37

@shaypal5

Context

PR #36 added primary_task and label_window_days to GenerationConfig for dataset card rendering. However, the rest of the generation pipeline still hardcodes converted_within_90_days:

  • api/bundle.py — manifest uses CONVERTED_WITHIN_90_DAYS.task_id
  • render/tasks.py — task splits written to tasks/converted_within_90_days/
  • validation/realism.py — reads hardcoded task directory path
  • validation/drift.py — reads hardcoded task directory path
  • schema/entities.py — LeadRow.converted_within_90_days field
  • simulation/engine.py — sets converted_within_90_days=state.converted
  • pipelines/build_v5.py, build_v6.py — hardcoded column rename mappings

What to do

Make the task name and label window configurable across the full pipeline so that config.primary_task and config.label_window_days control the actual generated output, not just the dataset card.
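
One possible shape for this, as a minimal sketch rather than the project's actual code: derive the task directory and the label column name from the config instead of from a constant. The `GenerationConfig` below is a stand-in containing only the two fields named in this issue, with assumed defaults. The helpers `task_dir`, `label_column`, and `rename_label`, and the internal key `converted`, are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GenerationConfig:
    """Stand-in with only the two fields named in this issue; defaults are assumed."""
    primary_task: str = "converted_within_90_days"
    label_window_days: int = 90


def task_dir(bundle_root: Path, config: GenerationConfig) -> Path:
    """Hypothetical helper: directory holding the splits for the configured task."""
    return bundle_root / "tasks" / config.primary_task


def label_column(config: GenerationConfig) -> str:
    """Hypothetical helper: label column name read from the config, not a constant."""
    return config.primary_task


def rename_label(row: dict, config: GenerationConfig, internal_name: str = "converted") -> dict:
    """Hypothetical helper: copy the row and rename the internal label key."""
    out = dict(row)
    out[label_column(config)] = out.pop(internal_name)
    return out


# Usage: a 30-day task flows through the path and the column name alike.
config = GenerationConfig(primary_task="converted_within_30_days", label_window_days=30)
print(task_dir(Path("bundle"), config).as_posix())
print(rename_label({"lead_id": "L1", "converted": True}, config))
```

Routing the render, validation, and pipeline layers through helpers like these would leave a single place that decides the task name, so the manifest, the directory layout, and the column rename mappings could not drift apart.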

This is a large refactor touching schema, simulation, render, validation, and pipeline layers.

References

Metadata

Assignees

No one assigned

    Labels

    layer: core - core/ primitives (RNG, IDs, models, exceptions)
    type: feature - New capability

