
Bug: normalize_over_analysis_period never divides, so per-day normalization is a silent no-op #3279

Description

@BryanBaird

While testing the new dau_per_1000_clients metrics (mozilla/metric-hub#1501), I discovered that the absolute units were not scaling as expected over weekly and overall windows. Values regularly came out above 1000, suggesting that the pre_treatments = ['normalize_over_analysis_period'] option configured here was not actually being applied.

After some testing with Claude, I see no sign that the config itself is mis-specified. Rather, there appear to be issues in Jetstream that force the divisor to always be the hardcoded default of 1 instead of the relevant period length.
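To make the symptom concrete, here is a minimal sketch of the arithmetic, assuming the pre-treatment simply divides a window total by the period length. The function name `normalize` and the daily values are hypothetical illustrations, not Jetstream code or real data.

```python
def normalize(window_total: float, analysis_period_length: int = 1) -> float:
    """Hypothetical stand-in: divide a window total by the number of days."""
    return window_total / analysis_period_length


# Seven made-up daily values of a per-1000-clients metric.
daily_values = [950.0] * 7
weekly_total = sum(daily_values)

# Divisor stuck at the default of 1: the weekly figure is just the raw sum.
stuck = normalize(weekly_total)

# Divisor set to the period length: the weekly figure is a per-day average.
expected = normalize(weekly_total, analysis_period_length=7)

print(stuck, expected)
```

With a divisor of 1, any window longer than a day can exceed 1000 even though no single day does, which matches the values observed above.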

Possible contributing causes

Claude flagged two separate bugs, each of which would cause this issue on its own:

  1. In analysis.py, the parameter check is comparing literal string values to enums -- the comparisons should either use .value on both period and AnalysisPeriod, or on neither.
  2. In statistics.py, the analysis_period_length value is set on the class, but it is then superseded by the default on the new instance created in pre_treatment.from_dict.
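Item (1) can be reproduced in isolation. The sketch below assumes AnalysisPeriod is a plain `Enum` with string values. The class defined here is a hypothetical stand-in, and both helper functions are illustrative rather than the actual code in analysis.py.

```python
from enum import Enum


class AnalysisPeriod(Enum):
    """Hypothetical stand-in, assumed to be a plain Enum with string values."""
    DAY = "day"
    WEEK = "week"
    OVERALL = "overall"


def period_length_buggy(period: AnalysisPeriod, overall_length: int) -> int:
    # Compares a str (period.value) with an Enum member: never equal.
    if period.value == AnalysisPeriod.OVERALL:
        return overall_length
    elif period.value == AnalysisPeriod.WEEK:
        return 7
    return 1  # always reached, so the divisor stays at 1


def period_length_fixed(period: AnalysisPeriod, overall_length: int) -> int:
    # Compares Enum member with Enum member.
    if period == AnalysisPeriod.OVERALL:
        return overall_length
    elif period == AnalysisPeriod.WEEK:
        return 7
    return 1


print(period_length_buggy(AnalysisPeriod.WEEK, 28))
print(period_length_fixed(AnalysisPeriod.WEEK, 28))
```

Note that if the enum were declared with a `str` mixin (`class AnalysisPeriod(str, Enum)`), the string comparison would succeed, so this failure depends on the plain-`Enum` assumption.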

I have been able to independently validate issue (1), but (2) requires a deeper understanding of the nuances of class implementations here than I currently possess.
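For anyone trying to validate item (2), the general Python behavior can be sketched without Jetstream. This assumes the pre-treatment class gives analysis_period_length a default of 1 in its constructor. `NormalizeSketch` is a hypothetical stand-in built with a stdlib dataclass, and the real class may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class NormalizeSketch:
    """Hypothetical stand-in for a pre-treatment with a constructor default."""
    analysis_period_length: int = 1

    @classmethod
    def from_dict(cls, args: dict) -> "NormalizeSketch":
        # Builds a fresh instance; unspecified fields get constructor defaults.
        return cls(**args)


# Pattern described in item (2): assign on the class, then build an instance.
NormalizeSketch.analysis_period_length = 7
buggy = NormalizeSketch.from_dict({})

# The constructor default was fixed when the class was created, so __init__
# writes 1 onto the instance, shadowing the class attribute.
print(buggy.analysis_period_length)

# Proposed pattern: build the instance first, then assign on the instance.
fixed = NormalizeSketch.from_dict({})
fixed.analysis_period_length = 7
print(fixed.analysis_period_length)
```

If the real class behaves like this stand-in, the class-level assignment has no effect on instances returned by from_dict, which would be consistent with the divisor staying at 1.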

Possible fixes

For what it is worth, Claude suggests that the following small fixes should clear the bug. I cannot attest to how completely they would resolve it, or to what the testing and documentation implications would be.

# Fix 1 — jetstream/analysis.py ~L1237 (compare enum to enum, not string to enum)
- if period.value == AnalysisPeriod.OVERALL:
+ if period == AnalysisPeriod.OVERALL:
      analysis_length_dates = time_limits.analysis_length_dates
- elif period.value == AnalysisPeriod.WEEK:
+ elif period == AnalysisPeriod.WEEK:
      analysis_length_dates = 7

# Fix 2 — jetstream/statistics.py ~L82 (set the value on the instance, not the class)
- pre_treatment.analysis_period_length = analysis_period_length or 1
- pre_treatments.append(pre_treatment.from_dict(pre_treatment_conf.args))
+ pt = pre_treatment.from_dict(pre_treatment_conf.args)
+ pt.analysis_period_length = analysis_period_length or 1
+ pre_treatments.append(pt)

