Add MinTrialsWithLILOInputHashCheck transition criterion#4994

Open
ItsMrLin wants to merge 3 commits into facebook:main from ItsMrLin:export-D95284285

ItsMrLin commented Mar 6, 2026

Summary:
Add a hash-aware transition criterion for LILO GS loops. Unlike plain
MinTrials which counts all completed trials from a node,
MinTrialsWithLILOInputHashCheck only counts trials whose LILO input hash
matches the current experiment state. This ensures the GS correctly
transitions from LILO labeling → MBG only when enough fresh labels exist
(labels produced under the current experiment data + LLM messages).

Trials without a LILO input hash (non-LILO trials) are always counted,
preserving backward compatibility.

Changes:

  • Add MinTrialsWithLILOInputHashCheck class to transition_criterion.py
    that delegates hash computation to get_current_lilo_hash from hash_utils
    (replacing a private _compute_current_hash static method)
  • Remove redundant pass-through __init__ — the parent class handles all args
  • Register in JSON encoder/decoder registries for serialization support
  • Add tests verifying fresh/stale counting behavior

Reviewed By: saitcakmak

Differential Revision: D95284285
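
The counting rule described in this summary can be sketched in plain Python. This is a minimal illustration, not the code in `transition_criterion.py`: `SketchTrial`, `count_fresh_trials`, and `should_transition` are hypothetical names, and the case where no current hash is available is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchTrial:
    """Hypothetical stand-in for a trial; this is not the Ax Trial class."""

    index: int
    completed: bool
    # None means the trial carries no LILO input hash (a non-LILO trial).
    lilo_input_hash: Optional[str] = None


def count_fresh_trials(trials: List[SketchTrial], current_hash: str) -> int:
    """Count completed trials that are non-LILO or stamped with the current hash."""
    count = 0
    for trial in trials:
        if not trial.completed:
            continue
        if trial.lilo_input_hash is None or trial.lilo_input_hash == current_hash:
            count += 1
    return count


def should_transition(
    trials: List[SketchTrial], current_hash: str, threshold: int
) -> bool:
    """Transition only once enough fresh (or non-LILO) trials have completed."""
    return count_fresh_trials(trials, current_hash) >= threshold


trials = [
    SketchTrial(0, True, None),    # non-LILO trial: always counted
    SketchTrial(1, True, "abc"),   # stamped with the current hash: fresh
    SketchTrial(2, True, "old"),   # stamped with an outdated hash: stale
    SketchTrial(3, False, "abc"),  # not completed: never counted
]
print(count_fresh_trials(trials, "abc"))
print(should_transition(trials, "abc", threshold=3))
```

A plain count of completed trials would see three here; the hash check drops the stale one, so a threshold of three is not yet met.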

meta-cla bot added the CLA Signed label Mar 6, 2026

meta-codesync bot commented Mar 6, 2026

@ItsMrLin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95284285.

codecov-commenter commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 91.71598% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.82%. Comparing base (3e8e4e7) to head (9facfc4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ax/adapter/torch.py | 20.00% | 8 Missing ⚠️ |
| ax/generation_strategy/transition_criterion.py | 85.00% | 3 Missing ⚠️ |
| ax/utils/common/hash_utils.py | 92.30% | 2 Missing ⚠️ |
| ax/adapter/adapter_utils.py | 94.11% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4994      +/-   ##
==========================================
- Coverage   96.84%   96.82%   -0.02%     
==========================================
  Files         601      602       +1     
  Lines       64732    64901     +169     
==========================================
+ Hits        62687    62843     +156     
- Misses       2045     2058      +13     

ItsMrLin added 2 commits March 8, 2026 21:10
Summary:
Add hash-based data freshness tracking for LILO (Language-in-the-Loop)
pairwise preference labels.

When LILOPairwiseMetric produces labels, it now stamps a SHA-256 hash of the
experiment's LILO inputs (metric data for input_metric_names + LLM messages)
onto the trial's _properties. If any of these inputs change (new data arrives,
data is updated, or the user modifies LLM messages), the hash changes,
indicating that existing LILO labels are stale.

Changes:
- Add `LILO_INPUT_HASH` key to `Keys` enum in `constants.py`
- Create `ax/utils/common/hash_utils.py` with `compute_lilo_input_hash`
  (standalone hash function) and `get_current_lilo_hash` (convenience helper
  that looks up the pairwise `DerivedMetric` on an experiment, extracts
  `input_metric_names`, and computes the hash — returns `None` if no pairwise
  metric is registered)
- Stamp hash in `LILOPairwiseMetric._compute_derived_values` after producing labels
- Add tests for hash determinism, sensitivity to data/message changes, stamping,
  and `get_current_lilo_hash` helper
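
The hashing idea in this commit can be illustrated with the standard library. This is a sketch under stated assumptions, not `compute_lilo_input_hash` itself: the function name `sketch_lilo_input_hash`, the `(trial_index, metric_name, mean)` row layout, and the canonical JSON encoding are all hypothetical choices made for the example.

```python
import hashlib
import json
from typing import List, Sequence, Tuple

# Hypothetical row layout: (trial_index, metric_name, mean).
Row = Tuple[int, str, float]


def sketch_lilo_input_hash(metric_rows: Sequence[Row], llm_messages: List[str]) -> str:
    """Return a SHA-256 hex digest over a canonical encoding of the inputs."""
    payload = {
        # Sort rows so the digest does not depend on the order rows arrive in.
        "data": sorted(metric_rows),
        # Keep message order: reordering messages is treated as a change.
        "messages": list(llm_messages),
    }
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


rows = [(0, "latency", 1.5), (1, "latency", 1.2)]
messages = ["Prefer lower latency."]

h_base = sketch_lilo_input_hash(rows, messages)
h_reordered = sketch_lilo_input_hash(list(reversed(rows)), messages)
h_new_data = sketch_lilo_input_hash(rows + [(2, "latency", 0.9)], messages)
h_new_message = sketch_lilo_input_hash(rows, messages + ["Also weigh cost."])

print(h_base == h_reordered)    # same inputs in a different row order
print(h_base == h_new_data)     # new data arrived
print(h_base == h_new_message)  # the user modified the LLM messages
```

Any label stamped with `h_base` would be considered stale once the current hash becomes `h_new_data` or `h_new_message`.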

Differential Revision: D95284287

Summary:
When building the RankingDataset for PairwiseGP model fitting, exclude LILO
trial data whose input hash doesn't match the current experiment state. This
ensures PairwiseGP is only fitted on labels that are consistent with the
current metric data and LLM messages.

Changes:
- Add `_get_fresh_pairwise_trial_indices` helper to `adapter_utils.py`:
  uses `get_current_lilo_hash` from `hash_utils` to compute the current hash
  and returns trial indices whose stamped hash matches, or `None` if not a
  LILO experiment (preserving BOPE compatibility)
- Filter pairwise data in `TorchAdapter._convert_experiment_data` before
  calling `prep_pairwise_data`, ensuring stale rows are excluded
- Add tests for hash-based filtering logic
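
The filtering step can be sketched as follows. This is an illustration only, not `_get_fresh_pairwise_trial_indices` or the `TorchAdapter` code: `fresh_trial_indices`, `filter_pairwise_rows`, and the dict-based row layout are hypothetical, and treating an unstamped trial in a LILO experiment as not fresh is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Set


def fresh_trial_indices(
    stamped_hashes: Dict[int, Optional[str]], current_hash: Optional[str]
) -> Optional[Set[int]]:
    """Return indices whose stamped hash matches, or None if there is no current hash."""
    if current_hash is None:
        # Not a LILO experiment: signal the caller to skip filtering entirely.
        return None
    return {idx for idx, h in stamped_hashes.items() if h == current_hash}


def filter_pairwise_rows(rows: List[dict], fresh: Optional[Set[int]]) -> List[dict]:
    """Drop rows from stale trials; keep everything when no filtering applies."""
    if fresh is None:
        return list(rows)
    return [row for row in rows if row["trial_index"] in fresh]


stamped = {0: "abc", 1: "old", 2: "abc"}
rows = [
    {"trial_index": 0, "label": 1},
    {"trial_index": 1, "label": 0},  # produced under an outdated hash
    {"trial_index": 2, "label": 1},
]

kept = filter_pairwise_rows(rows, fresh_trial_indices(stamped, "abc"))
print([row["trial_index"] for row in kept])

# With no current hash, all rows pass through unchanged.
unfiltered = filter_pairwise_rows(rows, fresh_trial_indices(stamped, None))
print(len(unfiltered))
```

Returning `None` rather than an empty set keeps "no filtering applies" distinct from "every label is stale", which is what lets non-LILO pairwise experiments pass through untouched.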

Differential Revision: D95284286

ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 9, 2026
Summary:
Pull Request resolved: facebook#4994

Add a hash-aware transition criterion for LILO GS loops. Unlike plain
MinTrials which counts all completed trials from a node,
MinTrialsWithLILOInputHashCheck only counts trials whose LILO input hash
matches the current experiment state. This ensures the GS correctly
transitions from LILO labeling → MBG only when enough *fresh* labels exist
(labels produced under the current experiment data + LLM messages).

Trials without a LILO input hash (non-LILO trials) are always counted,
preserving backward compatibility.

Changes:
- Add `MinTrialsWithLILOInputHashCheck` class to `transition_criterion.py`
  that delegates hash computation to `get_current_lilo_hash` from `hash_utils`
  (replacing a private `_compute_current_hash` static method)
- Remove redundant pass-through `__init__` — the parent class handles all args
- Register in JSON encoder/decoder registries for serialization support
- Add tests verifying fresh/stale counting behavior

Reviewed By: saitcakmak

Differential Revision: D95284285

Labels: CLA Signed, fb-exported, meta-exported
