
Make label_window_days affect simulation and snapshot behavior #41

@shaypal5

Description


Context

PR #40 threaded primary_task and label_window_days through naming and metadata (directory names, manifest keys, validation paths). However, label_window_days currently has no effect on the actual generated data:

  • The simulation engine runs for config.horizon_days (default 90) regardless of label_window_days.
  • The LeadRow.converted_within_90_days field name is hardcoded; it does not change when the window changes.
  • The snapshot builder uses horizon_days for the observation window, not label_window_days.
  • The conversion label is derived from events within the full simulation horizon, not a configurable window.

What needs to change

For label_window_days to truly work, the conversion label should be derived from events within [0, label_window_days] rather than [0, horizon_days]. This likely requires:
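A minimal sketch of the label rule, assuming each conversion event is represented by its day offset from the lead's creation (day 0). The helper name converted_within_window is hypothetical and not part of the current codebase. It shows that the same event history can yield different labels depending on the window:

```python
from collections.abc import Iterable


def converted_within_window(conversion_days: Iterable[float], label_window_days: int) -> bool:
    """Hypothetical helper: True if any conversion falls inside [0, label_window_days].

    Assumes events are given as day offsets from lead creation (day 0).
    """
    if label_window_days < 0:
        raise ValueError("label_window_days must be non-negative")
    # Closed interval on both ends, matching [0, label_window_days].
    return any(0 <= day <= label_window_days for day in conversion_days)
```

For example, a lead whose only conversion happens on day 45 is a positive for a 90-day window but a negative for a 30-day window, even though its simulated event history is identical.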

  1. simulation/engine.py: either the engine or a post-processing step needs to compute the label based on label_window_days, not horizon_days. One approach: the simulation still runs for horizon_days (to generate realistic event histories), but the label is set based on whether conversion happened within label_window_days.
  2. schema/entities.py: consider whether LeadRow.converted_within_90_days should be renamed to a generic converted or parameterized. This is a large schema change with a wide blast radius.
  3. render/snapshots.py: the snapshot builder may need to use label_window_days for label derivation rather than horizon_days.
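A rough sketch of the post-processing approach from item 1, under stated assumptions: SimulatedLead is a hypothetical stand-in for LeadRow (not the real schema) that records the day of the first simulated conversion, or None if the lead never converted within the horizon. The sketch keeps converted_within_90_days as the internal label field name and rejects windows longer than horizon_days, on the assumption that the engine generates no events past the horizon (so a longer window would silently produce horizon-based labels):

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class SimulatedLead:
    # Hypothetical stand-in for LeadRow; field names are illustrative only.
    lead_id: str
    conversion_day: Optional[float]  # day offset of first conversion, None if none
    converted_within_90_days: bool = False  # internal label field, name kept as-is


def apply_label_window(
    leads: List[SimulatedLead], label_window_days: int, horizon_days: int
) -> List[SimulatedLead]:
    """Recompute the label from the full-horizon history using label_window_days."""
    if not 0 <= label_window_days <= horizon_days:
        # Events after horizon_days are assumed not to be simulated.
        raise ValueError("label_window_days must be within [0, horizon_days]")
    return [
        replace(
            lead,
            converted_within_90_days=(
                lead.conversion_day is not None
                and lead.conversion_day <= label_window_days
            ),
        )
        for lead in leads
    ]
```

The simulation itself is untouched here; only the label is derived from a shorter window, which keeps event histories rich while making the label configurable.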

Design considerations

  • The simulation should probably still run for horizon_days to produce rich event data. The label window is a separate concept from the simulation horizon.
  • LeadRow field renaming is a large refactor touching entities, engine, snapshots, features, pipeline rename maps, and all tests. A backward-compatible approach (e.g., keeping converted_within_90_days as the internal field name but documenting it as "conversion label") may be pragmatic.
  • This interacts with the snapshot_day parameter already used by the v4/v5/v6 build pipelines.
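One way to sketch the backward-compatible option: keep converted_within_90_days as the internal field name and rename it only at export time through a window-aware rename map. INTERNAL_LABEL_FIELD, label_column_name, export_rename_map and rename_row are all hypothetical names, not the project's existing pipeline rename maps:

```python
from typing import Dict

# Hypothetical: internal field name stays fixed regardless of the window.
INTERNAL_LABEL_FIELD = "converted_within_90_days"


def label_column_name(label_window_days: int) -> str:
    """Window-specific public column name, e.g. converted_within_30_days."""
    return f"converted_within_{label_window_days}_days"


def export_rename_map(label_window_days: int) -> Dict[str, str]:
    """Map the internal label field to its window-specific export name."""
    return {INTERNAL_LABEL_FIELD: label_column_name(label_window_days)}


def rename_row(row: Dict[str, object], rename_map: Dict[str, str]) -> Dict[str, object]:
    """Rename keys found in rename_map; leave all other keys unchanged."""
    return {rename_map.get(key, key): value for key, value in row.items()}
```

With label_window_days=30, a row {"lead_id": "L1", "converted_within_90_days": True} is exported as {"lead_id": "L1", "converted_within_30_days": True}; with the default 90-day window the map is effectively an identity, so existing outputs would be unchanged.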

Labels: enhancement (New feature or request); layer: core (core/ primitives: RNG, IDs, models, exceptions); type: feature (New capability)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions