Per-metric relativization for Summary with preference metrics #5218
Closed
ItsMrLin wants to merge 1 commit into
Summary:

## Summary

Supersedes D97533888. Addresses drfreund's review comment about deduping with `Data.relativize` by placing the per-metric scoping in the data layer rather than adding analysis-specific logic.

When an experiment has both preference metrics (e.g., `pairwise_pref_query`) and standard tracking metrics, the Summary should relativize tracking metrics normally while skipping the preference metric (whose binary 0/1 labels have SQ mean near zero, causing "mean_control too small" crash).

Previously D99037272 applied a blanket guard that skipped ALL relativization when any objective was a preference metric. This diff replaces that with per-metric scoping: non-preference metrics are relativized and %-formatted, while preference metrics are excluded from relativization and their columns are dropped from the summary table (binary 0/1 labels are not informative in a tabular summary). Labeling-only trial rows (with no tracking metric data) are also dropped.

Changes:
- `Data.relativize()` and `relativize_dataframe()` in `ax/core/data.py`: add `metric_names` parameter to scope which metrics get relativized. Unscoped metrics pass through with raw values. SEM zeroing for status quo rows is also scoped -- non-relativized metrics retain their original SEM.
- `Experiment.to_df()` in `ax/core/experiment.py`: add `metric_names_to_relativize` parameter, threaded to `Data.relativize()`. Percentage formatting also scoped to only relativized metrics.
- `Summary.compute()` in `ax/analysis/summary.py`: replace blanket `not has_preference_objective` guard with per-metric scoping. Builds a list of non-preference metric names, passes it as `metric_names_to_relativize`, then drops preference columns and labeling-only rows.

Differential Revision: D99149923
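For readers who want a concrete picture of the scoping described above, here is a minimal standalone sketch in pandas. It is not the code from this diff: `relativize_scoped` and `drop_preference_output` are hypothetical helpers, the long-format columns (`arm_name`, `metric_name`, `mean`, `sem`) and the wide summary layout are assumptions, and the percent-change and first-order delta-method formulas stand in for whatever `Data.relativize()` actually computes.

```python
import numpy as np
import pandas as pd


def relativize_scoped(df, status_quo_arm, metric_names=None):
    """Relativize only the metrics in `metric_names` (all metrics if None).

    Hypothetical helper. Assumes one row per (arm_name, metric_name).
    Metrics outside the scope pass through with raw mean and SEM.
    """
    out = df.astype({"mean": float, "sem": float})
    sq = out[out["arm_name"] == status_quo_arm].set_index("metric_name")
    scoped = list(sq.index) if metric_names is None else list(metric_names)
    for metric in scoped:
        if metric not in sq.index:
            continue  # no status quo data for this metric; leave it raw
        m_c, s_c = sq.loc[metric, "mean"], sq.loc[metric, "sem"]
        if m_c == 0:
            # Assumed guard: a zero control mean cannot be relativized.
            raise ValueError(f"control mean is zero for metric {metric!r}")
        rows = out["metric_name"] == metric
        m_t, s_t = out.loc[rows, "mean"], out.loc[rows, "sem"]
        # Percent change versus the status quo mean.
        out.loc[rows, "mean"] = (m_t - m_c) / abs(m_c) * 100.0
        # First-order delta-method SEM, assuming independent arms.
        rel_var = s_t**2 + (m_t / m_c) ** 2 * s_c**2
        out.loc[rows, "sem"] = np.sqrt(rel_var) / abs(m_c) * 100.0
        # The status quo row is exactly 0% with zero SEM, but only for
        # metrics that were actually relativized.
        is_sq = rows & (out["arm_name"] == status_quo_arm)
        out.loc[is_sq, ["mean", "sem"]] = 0.0
    return out


def drop_preference_output(summary, preference_metrics,
                           id_columns=("trial_index", "arm_name")):
    """Drop preference columns, then rows with no tracking metric data."""
    tracking = [
        c for c in summary.columns
        if c not in id_columns and c not in preference_metrics
    ]
    out = summary.drop(
        columns=[c for c in preference_metrics if c in summary.columns]
    )
    if not tracking:
        return out
    return out.dropna(subset=tracking, how="all").reset_index(drop=True)


# Usage: scope relativization to the non-preference metrics only.
data = pd.DataFrame({
    "arm_name": ["sq", "a1", "sq", "a1"],
    "metric_name": ["latency", "latency",
                    "pairwise_pref_query", "pairwise_pref_query"],
    "mean": [10.0, 12.0, 0.0, 1.0],
    "sem": [1.0, 1.0, 0.1, 0.1],
})
preference = {"pairwise_pref_query"}
to_relativize = [m for m in data["metric_name"].unique() if m not in preference]
relativized = relativize_scoped(data, "sq", metric_names=to_relativize)
```

Because `pairwise_pref_query` is never passed to the relativization step in this sketch, its zero status quo mean is never used as a divisor, which mirrors how the per-metric scoping is meant to avoid the crash without disabling relativization for every other metric.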
Codecov Report❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##             main    #5218   +/-   ##
=======================================
  Coverage   96.50%   96.50%
=======================================
  Files         617      617
  Lines       69776    69833   +57
=======================================
+ Hits        67339    67395   +56
- Misses       2437     2438    +1
This pull request has been merged in c40dfae.