
[tune] Modernize AxSearch for Ax Platform 1.0.0+ #60522

Open
ans9868 wants to merge 6 commits into ray-project:master from ans9868:fix/ax-search-modern-api



ans9868 commented Jan 27, 2026

Update AxSearch to work with modern Ax versions (1.0.0+), which have:

  • Stricter error checking (AssertionError instead of ValueError when accessing an experiment that has not been created yet)
  • A new objectives API replacing the deprecated objective_name/minimize parameters

Changes:

  • Import ObjectiveProperties from ax.service.ax_client
  • Update create_experiment() to use objectives={metric: ObjectiveProperties(...)}
  • Handle AssertionError when checking for existing experiment (Ax 1.0+)
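A sketch of the objectives change, assuming Ray Tune's usual mode="min"/"max" convention. The helper name `objective_kwargs` is hypothetical. It uses the real `ObjectiveProperties` when Ax is importable from the path named in this PR, and otherwise falls back to a clearly labeled stand-in dataclass so the sketch still runs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

try:
    # Import path used by this PR.
    from ax.service.ax_client import ObjectiveProperties
except ImportError:
    # Stand-in so this sketch runs without Ax; NOT the real Ax class.
    @dataclass(frozen=True)
    class ObjectiveProperties:  # type: ignore[no-redef]
        minimize: bool
        threshold: Optional[float] = None


def objective_kwargs(metric: str, mode: str) -> Dict[str, Any]:
    """Build the objective arguments for AxClient.create_experiment().

    Ax 1.0+ expects an `objectives` dict mapping each metric name to
    ObjectiveProperties; older releases accepted objective_name=metric and
    minimize=... instead.
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    return {"objectives": {metric: ObjectiveProperties(minimize=(mode == "min"))}}
```

A call would then look roughly like `ax_client.create_experiment(parameters=space, **objective_kwargs(metric, mode))`, where `space` stands for whatever parameter list AxSearch already builds (the name is an assumption).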

Fixes part of #60512

ans9868 requested a review from a team as a code owner January 27, 2026 02:57
Contributor

gemini-code-assist bot left a comment


Code Review

This pull request effectively modernizes the AxSearch integration to support Ax platform versions 1.0.0 and newer. The changes correctly adopt the new objectives API, replacing deprecated parameters, and properly handle the new AssertionError for checking experiment existence, ensuring compatibility across different Ax versions. The addition of defensive error handling when accessing the experiment post-setup is a good improvement for robustness. Overall, the changes are clear, correct, and well-executed.

ray-gardener bot added the tune (Tune-related issues) and community-contribution (Contributed by the community) labels Jan 27, 2026
Collaborator

edoakes commented Feb 2, 2026

@justinvyu PTAL

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

github-actions bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Feb 17, 2026
Author

ans9868 commented Feb 17, 2026

Not stale.

Author

ans9868 commented Feb 18, 2026

To ensure stability across the modern Ax API, I have verified these fixes using Ray Tune’s smoke tests and a custom test suite across key stable releases in the Ax 1.x ecosystem:

  • Ax 1.0.0
  • Ax 1.1.2
  • Ax 1.2.1

The tests confirm that the create_experiment API modernization and the updated error handling behave consistently across all tested versions. Most importantly, without these changes Ray Tune currently fails or crashes when used with Ax 1.0+ because of legacy API calls and unhandled exceptions.

github-actions bot added the unstale label (A PR that has been marked unstale. It will not get marked stale again if this label is on it.) and removed the stale label Feb 18, 2026
Author


Thanks @matthewdeng. I updated tune-requirements.txt to pin ax-platform==1.2.1 (with a note on the >=1.0.0 compatibility requirement).

I also rebased the branch to fix the missing DCO sign-offs. Ready for review.

Author


Ah, the cursor-bot had a good catch. I am updating the test files to match the new API and removing the redundant error handling now. Will push a fix shortly.

Contributor


Sweet. I think one other thing you'll have to do is to get the compiled dependencies (which you can get from the Buildkite job) and update the requirements_compiled.txt file.

Author


Thanks for the tip. My local build environment is acting up, so I will grab the generated requirements_compiled.txt from the Buildkite artifacts once this run finishes and push that update tonight/tomorrow.

Author

ans9868 commented Feb 19, 2026


Ray Requirements Compilation Update

TLDR: If I can't get the requirements_compiled.txt from BuildKite artifacts, can I hack a version using pip-tools on ARM64, or do I need access to an AMD64/x86_64 system due to platform-specific GPU dependencies?


Hey @matthewdeng, thanks for the guidance! I'm still working on getting the dependencies sorted out properly.

Background:
I discovered that Ray uses a monolithic requirements_compiled.txt that includes dependencies from all modular requirements files (base, tune, serve, llm, test, cloud, etc.), rather than separate compiled files.

The core issue:
When BuildKite tried to build with just the updated tune-requirements.txt (ax-platform==1.2.1), it failed with a constraint conflict:

ERROR: Cannot install ax-platform==1.2.1 because these package versions have conflicting dependencies.
The conflict is caused by:
    The user requested ax-platform==1.2.1
    The user requested (constraint) ax-platform==0.3.2

This happened because requirements_compiled.txt still had the old constraint while the source file specified the new version.
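The conflict above is a stale exact pin: the source file says one version and the compiled constraints file still says another. A small check for that situation can be sketched as follows. This is a hypothetical helper, not part of Ray's tooling (Ray regenerates the compiled file with its own CI scripts, as discussed below), and it only looks at exact `==` pins, ignoring ranges, extras, and markers.

```python
import re
from typing import Dict, Tuple

# "name==version" at the start of a line; the version stops at whitespace,
# a ';' environment marker, a '#' comment, or a '\' line continuation.
_PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#\\]+)")


def normalize(name: str) -> str:
    """PEP 503 name normalization, so ax_platform and ax-platform compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> Dict[str, str]:
    """Collect exact `==` pins from requirements-style text (ranges are ignored)."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        match = _PIN_RE.match(line)
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def stale_pins(source: str, compiled: str) -> Dict[str, Tuple[str, str]]:
    """Packages pinned in both texts to different versions: {name: (source, compiled)}."""
    src, comp = parse_pins(source), parse_pins(compiled)
    return {
        name: (version, comp[name])
        for name, version in src.items()
        if name in comp and comp[name] != version
    }
```

Fed the two versions of ax-platform from the error above, it reports `{"ax-platform": ("1.2.1", "0.3.2")}`.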

Compilation challenges on ARM64:
Ray's official compilation script skips ARM64 platforms (I am currently using a Mac mini):

$ ./ci/ci.sh compile_pip_dependencies
+ [[ arm64 == \a\a\r\c\h\6\4 ]]
+ echo 'Skipping for aarch64'
Skipping for aarch64

Current approach (manual hack):
I'm using pip-tools locally to regenerate requirements, but this approach has limitations:

  • Platform differences: ARM64 vs x86_64 dependency resolution may differ
  • Missing GPU packages: Many packages like torch+cu121 aren't available for ARM64
  • Ray's custom logic: Official compilation may have specific handling I'm missing

Looking at dl-gpu-requirements.txt, Ray uses complex platform-specific dependencies:

torch==2.3.0+cu121
tensorflow-macos==2.15.1; sys_platform == 'darwin' and platform_machine == 'arm64'
cupy-cuda12x>=13.4.0; sys_platform != 'darwin'

These won't resolve correctly on my ARM64 Mac.

Backup plan:
If the manual approach fails, I can use NYU's HPC cluster (AMD64) to run Ray's official compilation process:

./ci/ci.sh compile_pip_dependencies

Question: Is the manual pip-tools approach acceptable as a quick fix, or should I prioritize using the official compilation on x86_64 hardware?

Thanks for your patience.

Author


Quick update: The manual pip-tools approach is creating version conflicts.

I'm going to compile properly on NYU's HPC (AMD64) using ./ci/ci.sh compile_pip_dependencies to avoid ARM64/platform issues. Should have the correct requirements_compiled.txt shortly.

ans9868 force-pushed the fix/ax-search-modern-api branch from 7579a7e to a7707e0 on February 21, 2026 01:01
Author

ans9868 commented Feb 21, 2026

[Update + Seeking Guidance] Python 3.10 resolution-too-deep CI Failure

TLDR: The dependency graph fails to resolve only on Python 3.10. I suspect botorch. I'm still investigating; the simplest fix, if it works, is to raise the resolution depth limit.

Intro:
The dependency graph compiles and installs successfully on Python 3.11 and 3.12, and compiled cleanly via bazel run ci/raydepsets:raydepsets -- build --all-configs on HPC. Only Python 3.10 in BuildKite fails with resolution-too-deep. I suspect ax-platform==1.2.1 pulling in botorch==0.16.1 pushes the graph past pip's default resolution depth limit for that version.

Builds:

  • ✅ HPC (bazel run ci/raydepsets:raydepsets -- build --all-configs)
  • ✅ Python 3.11 BuildKite
  • ✅ Python 3.12 BuildKite
  • ❌ Python 3.10 BuildKite — resolution-too-deep

Pending: I am running tests on HPC this weekend to find out if there is a minimum PIP_MAX_ROUNDS value needed for Python 3.10 to resolve the full ML dependency stack, which will give a concrete value for Option 1.


Options

Option 1 — Increase pip resolution depth limit (Recommended)
Ray is a large project whose dependency graph will only grow. Raising the limit addresses the root cause cleanly with a one-line change.

# ci/docker/base.ml.Dockerfile or pip.conf
export PIP_MAX_ROUNDS=<value from testing>

Option 2 — Use legacy pip resolver for Python 3.10 only
The legacy resolver handles complex graphs better. Quick targeted fix but uses a deprecated flag.

# .buildkite/pipeline.yml, py3.10 jobs only
env:
  PIP_USE_DEPRECATED: "legacy-resolver"

Option 3 — Use uv resolver for Python 3.10 only
uv (already in requirements_compiled.txt as uv==0.8.9) is what Ray's Bazel build system uses internally and successfully resolved this graph.

# ci/docker/base.ml.Dockerfile
if [[ $(python --version) == *"3.10"* ]]; then
    uv pip install -r python/requirements/ml/tune-requirements.txt
fi

Happy to implement whichever direction works best. Will follow up with the PIP_MAX_ROUNDS test results this weekend.

cc @elliot-barn This is a CI compilation issue on Python 3.10; I thought you might have insight given your knowledge of Ray's dependency compilation.
cc @matthewdeng I ran into this blocker while following up on your suggestion to update requirements_compiled.txt.

I would really appreciate some input here!

Author

ans9868 commented Feb 23, 2026

[Update] Root Cause Found + Fix Ready

Closing the loop on my previous comment.

Root Cause (not resolution depth)

The resolution-too-deep error on Python 3.10 is not a depth limit issue. There is simply no valid solution for Python 3.10 given the current dependency chain.

ax-platform==1.2.1 pulls in botorch==0.16.1, which pulls in gpytorch>=1.14, which pulls in jaxtyping (unpinned). The resolver on Python 3.10 picks jaxtyping>=0.3.8, which declares requires_python>=3.11. This sends pip into an impossible resolution, and it backtracks until it hits resolution-too-deep. The actual conflict only became clear after switching from pip to uv, which reports it directly.

ax-platform==1.2.1
  └── botorch==0.16.1
        └── gpytorch>=1.14
              └── jaxtyping>=0.3.8  ← requires Python>=3.11 ❌ on Python 3.10
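The reasoning behind the chain above can be expressed as a toy model: walk the chain and return the first requirement whose minimum Python is newer than the interpreter. This is an illustration only, not a model of pip's or uv's resolver. The (3, 11) minimum for jaxtyping>=0.3.8 comes from this comment; the (3, 10) entries for the other packages only encode the statement that they support Python 3.10 and are assumptions, not published metadata.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int]

# (requirement, minimum supported Python) along the chain described above.
UNPINNED: List[Tuple[str, Version]] = [
    ("ax-platform==1.2.1", (3, 10)),  # assumption: "supports Python 3.10"
    ("botorch==0.16.1", (3, 10)),     # assumption
    ("gpytorch>=1.14", (3, 10)),      # assumption
    ("jaxtyping>=0.3.8", (3, 11)),    # stated: requires_python>=3.11
]
# With the pin, the last link becomes jaxtyping 0.3.7, which supports >=3.10.
PINNED: List[Tuple[str, Version]] = UNPINNED[:-1] + [("jaxtyping<0.3.8", (3, 10))]


def first_blocker(chain: List[Tuple[str, Version]], python: Version) -> Optional[str]:
    """Return the first requirement whose minimum Python exceeds `python`, else None."""
    for requirement, min_python in chain:
        if python < min_python:
            return requirement
    return None
```

For an installed distribution, the real constraint can be read from `importlib.metadata.metadata("jaxtyping")["Requires-Python"]` instead of being hard-coded.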

Fix

Pin jaxtyping<0.3.8 in python/requirements/ml/tune-requirements.txt. ax-platform, botorch, and gpytorch all support Python 3.10; jaxtyping>=0.3.8 is the only package in the chain that does not.

ax-platform==1.2.1
jaxtyping<0.3.8   # jaxtyping>=0.3.8 requires Python>=3.11; 0.3.7 supports >=3.10

Verified locally on Python 3.10 with uv pip install --dry-run across the full ML stack. BuildKite confirmation tomorrow.

TODO

  • Pytest smoke tests for AxSearch
  • Push recompiled requirements_compiled.txt with jaxtyping==0.3.7
  • Successful build on Buildkite

ans9868 force-pushed the fix/ax-search-modern-api branch from 3ce036d to 74fce8a on February 23, 2026 19:36
Author

ans9868 commented Feb 24, 2026

Quick note: the root cause is confirmed and the fix is verified locally and on Buildkite (see my previous comment about jaxtyping>=0.3.8). I'm busy until Wednesday evening, but after that I will push clean commits and the recompiled requirements, and confirm that the smoke tests pass. Thanks for your patience.

Author

ans9868 commented Feb 25, 2026

Another quick update:

TLDR: There are some CI tune test failures (test_sample, test_tune_restore_warm_start).

The test failures are caused by ax.modelbridge being removed in ax-platform 1.0+. I've updated both test files to use the new ax.adapter.registry / ax.generation_strategy paths with a fallback for older ax versions. I pushed and am now waiting on BuildKite to confirm that the fixes work.
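The "new path with a fallback" pattern can be sketched with the standard library alone. The helper name `import_first` is hypothetical; the commented-out Ax module paths are the ones named in this PR and should be checked against the installed ax-platform version.

```python
import importlib
from types import ModuleType


def import_first(*module_paths: str) -> ModuleType:
    """Import and return the first module in `module_paths` that imports cleanly.

    ModuleNotFoundError is a subclass of ImportError, so a missing package or a
    missing submodule both fall through to the next candidate.
    """
    failures = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:
            failures.append(f"{path}: {exc}")
    raise ImportError("no candidate module could be imported:\n" + "\n".join(failures))


# Hypothetical use in the Ax test files (newest layout first):
# registry = import_first("ax.adapter.registry", "ax.modelbridge.registry")
```

One caveat with any import fallback: an ImportError raised inside the newer module for an unrelated reason also triggers the fallback. Collecting every failure into the final error message makes that case easier to diagnose. Names defined inside the modules may also differ between versions, so attribute access can need its own fallback.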

ans9868 force-pushed the fix/ax-search-modern-api branch 2 times, most recently from badf17d to 1292276 on February 25, 2026 23:53
Author

ans9868 commented Feb 26, 2026

TLDR: Code is ready. Two blockers are infrastructure issues, not my changes.

Ready for review. 3 commits, 6 files changed, pre-commit passes locally.

Two things to flag:

1. requirements_compiled.txt conflict

This file is a compiled artifact I generated on HPC. Master has moved forward since then, so there is now a conflict. I'm happy to rebase and recompile against the latest master, or you can regenerate it at merge time, whichever works for you.

2. Base Docker build failing (not from my changes)

The NodeSource Node 14.x apt repository has been deprecated and its GPG key is no longer valid. This causes base.build.Dockerfile to fail during the base image rebuild.

I believe the failure surfaces because my PR modifies requirements_compiled.txt, which invalidates the Wanda Docker cache and forces a full base image rebuild. That rebuild is what hits the Node 14.x deprecation error. PRs that don't touch the compiled requirements use the cached image and don't run the Wanda build jobs.

Error in ci/docker/base.build.Dockerfile:8:

E: The repository 'https://deb.nodesource.com/node_14.x focal InRelease' is not signed.

Previous passing build before deprecation hit: https://buildkite.com/ray-project/microcheck/builds/39247/steps/canvas
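The cache-invalidation explanation above can be illustrated with a generic content-addressed cache key: hash every input file's path and contents, so editing any input (here, requirements_compiled.txt) yields a new key and therefore a full rebuild. This is a simplified model under that assumption, not Wanda's actual implementation, and the file contents are illustrative.

```python
import hashlib
from typing import Mapping


def cache_key(inputs: Mapping[str, bytes]) -> str:
    """SHA-256 over every input's path and content hash, independent of dict order."""
    digest = hashlib.sha256()
    for path in sorted(inputs):
        digest.update(path.encode("utf-8") + b"\0")  # separator avoids path/content ambiguity
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


base = {
    "ci/docker/base.build.Dockerfile": b"FROM ubuntu:20.04\n",
    "requirements_compiled.txt": b"ax-platform==0.3.2\n",
}
bumped = {**base, "requirements_compiled.txt": b"ax-platform==1.2.1\n"}
```

Under this model, `cache_key(base)` and `cache_key(bumped)` differ, so a cached image keyed on the old inputs cannot be reused; PRs that leave every input untouched keep hitting the old key.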

Could a maintainer trigger premerge once the base image issue is resolved? Also let me know if you prefer I drop requirements_compiled.txt from this PR entirely to avoid the cache invalidation problem.

cc @elliot-barn @matthewdeng — would really appreciate some input here!

Author

ans9868 commented Feb 26, 2026

P.S.
The Node 14.x GPG failure is a known NodeSource issue that started Jan 20, 2026: nodesource/distributions#1912

ans9868 added 3 commits March 23, 2026 19:10
ax-platform 1.2.1 uses the modern objectives API introduced in 1.0.0. jaxtyping>=0.3.8 requires Python>=3.11. The transitive chain
ax-platform -> botorch -> gpytorch -> jaxtyping pulls in an
incompatible version on Python 3.10, causing pip to report
resolution-too-deep. Pin jaxtyping<0.3.8 (0.3.7 supports >=3.10)
until Python 3.10 support is dropped.

Recompiled requirements_compiled.txt to reflect these constraints.

Signed-off-by: Adel Nour <ans9868@nyu.edu>
ax-platform 1.0+ replaced the deprecated objective_name/minimize parameters with an objectives dict using ObjectiveProperties. Modernize AxSearch to use the new objectives API while retaining compatibility with older ax-platform versions. Handle both ValueError (ax 0.x) and AssertionError (ax 1.x) when accessing an uninitialized experiment. Also raises unexpected AssertionErrors to avoid masking real Ax failures.

Signed-off-by: Adel Nour <ans9868@nyu.edu>
ax-platform 1.0+ removed ax.modelbridge entirely. Update test imports to use ax.adapter.registry and ax.generation_strategy while leaving a fallback for older ax versions. Handle the GenerationStep parameter rename (model= -> generator=) with nested try/except for full compatibility.

Signed-off-by: Adel Nour <ans9868@nyu.edu>
ans9868 force-pushed the fix/ax-search-modern-api branch from 1292276 to e0e80b1 on March 23, 2026 23:11
Signed-off-by: Adel Sahuc <ans9868@torch-login-4.hpc-infra.svc.cluster.local>
ans9868 force-pushed the fix/ax-search-modern-api branch from 583222d to a25c64a on March 24, 2026 13:54
Signed-off-by: Adel Nour <ans9868@nyu.edu>
ans9868 force-pushed the fix/ax-search-modern-api branch from a25c64a to 511f046 on March 24, 2026 14:26

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Signed-off-by: Adel Nour <ans9868@nyu.edu>
Author

ans9868 commented Mar 24, 2026

Update: rebased on master, BuildKite green — ready for final review

TLDR: Final review (I promise) — all checks out.

Hey all, quick update.

I saw the Node 14 fix landed (#61533, Ubuntu 20.04 --> 22.04 upgrade) so I rebased onto the latest master and recompiled requirements_compiled.txt on HPC against the new base. BuildKite is passing now with no Node 14 failures.

Just to recap what this PR does for context since it has been a few weeks:

  • Pins ax-platform==1.2.1 and jaxtyping<0.3.8 in tune-requirements.txt to fix a resolution-too-deep pip error on Python 3.10 (root cause: jaxtyping>=0.3.8 requires Python>=3.11 and is pulled in transitively through ax-platform --> botorch --> gpytorch --> jaxtyping)
  • Updates AxSearch to use the modern ax-platform 1.0+ objectives API
  • Updates the ax test files (test_sample.py, test_tune_restore_warm_start.py, test_searchers.py) to use the new ax.adapter.registry / ax.generation_strategy paths since ax.modelbridge was removed in 1.0+, and updates create_experiment calls to use the new objectives dict API

Microcheck is green, pre-commit passes, 6 commits, 7 files changed. Would love another look when you get a chance.

One thing is that premerge is still showing "waiting for status to be reported". Not sure if that needs a manual trigger on your end or if it kicks off automatically after review. Happy to do anything on my side if needed.

cc @elliot-barn @matthewdeng


Labels

community-contribution (Contributed by the community), tune (Tune-related issues), unstale (A PR that has been marked unstale. It will not get marked stale again if this label is on it.)


4 participants