[tune] Modernize AxSearch for Ax Platform 1.0.0+ #60522
ans9868 wants to merge 6 commits into ray-project:master
Code Review
This pull request effectively modernizes the AxSearch integration to support Ax platform versions 1.0.0 and newer. The changes correctly adopt the new objectives API, replacing deprecated parameters, and properly handle the new AssertionError for checking experiment existence, ensuring compatibility across different Ax versions. The addition of defensive error handling when accessing the experiment post-setup is a good improvement for robustness. Overall, the changes are clear, correct, and well-executed.
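To make the API change mentioned in the review concrete, here is a minimal sketch of how a wrapper might build the objective arguments for either API generation. `objective_kwargs` and `FakeObjectiveProperties` are hypothetical names for illustration, not code from this PR; the stand-in mirrors only the `minimize` field of Ax's `ObjectiveProperties`, and the `"min"`/`"max"` mode strings are assumed to follow Tune's convention.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional


class FakeObjectiveProperties(NamedTuple):
    """Hypothetical stand-in for Ax's ObjectiveProperties (only `minimize`)."""

    minimize: bool


def objective_kwargs(
    metric: str,
    mode: str,
    properties_cls: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Build experiment-creation kwargs for the new or the legacy objective API.

    If `properties_cls` is given, assume the Ax 1.x style `objectives` dict.
    Otherwise fall back to the deprecated `objective_name`/`minimize` pair.
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    minimize = mode == "min"
    if properties_cls is not None:
        # New style: one entry per metric, each carrying its own direction.
        return {"objectives": {metric: properties_cls(minimize=minimize)}}
    # Legacy style: a single metric name plus a global minimize flag.
    return {"objective_name": metric, "minimize": minimize}
```

In a real integration the caller would pass the actual `ObjectiveProperties` class when it can be imported and `None` otherwise, so the version check lives in one place.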
@justinvyu PTAL
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Not stale.
To ensure stability across the modern Ax API, I have verified these fixes using Ray Tune’s smoke tests and a custom test suite across key stable releases in the Ax 1.x ecosystem:
Could you update the version of ax-platform in https://github.com/ray-project/ray/blob/master/python/requirements/ml/tune-requirements.txt?
Thanks @matthewdeng. I updated tune-requirements.txt to pin ax-platform==1.2.1 (with a note on the >=1.0.0 compatibility requirement).
I also rebased the branch to fix the missing DCO sign-offs. Ready for review.
Ah, the cursor-bot had a good catch. I am updating the test files to match the new API and removing the redundant error handling now. Will push a fix shortly.
Sweet. I think one other thing you'll have to do is to get the compiled dependencies (which you can get from the Buildkite job) and update the requirements_compiled.txt file.
Thanks for the tip. My local build environment is acting up, so I will grab the generated requirements_compiled.txt from the Buildkite artifacts once this run finishes and push that update tonight/tomorrow.
Ray Requirements Compilation Update
TLDR: If I can't get the requirements_compiled.txt from BuildKite artifacts, can I hack a version using pip-tools on ARM64, or do I need access to an AMD64/x86_64 system due to platform-specific GPU dependencies?
Update on Requirements Compilation
Hey @matthewdeng, thanks for the guidance! I'm still working on getting the dependencies sorted out properly.
Background:
I discovered that Ray uses a monolithic requirements_compiled.txt that includes dependencies from all modular requirements files (base, tune, serve, llm, test, cloud, etc.), rather than separate compiled files.
The core issue:
When BuildKite tried to build with just the updated tune-requirements.txt (ax-platform==1.2.1), it failed with a constraint conflict:
ERROR: Cannot install ax-platform==1.2.1 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested ax-platform==1.2.1
The user requested (constraint) ax-platform==0.3.2
This happened because requirements_compiled.txt still had the old constraint while the source file specified the new version.
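The conflict described above can be reproduced without pip. The following is a rough sketch, not Ray's tooling: `parse_pins` and `find_conflicts` are hypothetical helpers that only understand exact `name==version` pins and ignore every other specifier form.

```python
from typing import Dict, Iterable, List, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Collect exact `name==version` pins, dropping comments and markers."""
    pins: Dict[str, str] = {}
    for raw in lines:
        # Strip trailing comments and environment markers (after ';').
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # not an exact pin; out of scope for this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_conflicts(
    requirements: Iterable[str], constraints: Iterable[str]
) -> List[Tuple[str, str, str]]:
    """Return (name, requested, constrained) for pins that disagree."""
    requested = parse_pins(requirements)
    constrained = parse_pins(constraints)
    return [
        (name, version, constrained[name])
        for name, version in sorted(requested.items())
        if name in constrained and constrained[name] != version
    ]
```

Running it with the new pin from tune-requirements.txt against the stale pin in requirements_compiled.txt reports the same pair of versions that pip complained about.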
Compilation challenges on ARM64:
Ray's official compilation script skips ARM64 platforms (I am currently using a mac mini):
$ ./ci/ci.sh compile_pip_dependencies
+ [[ arm64 == \a\a\r\c\h\6\4 ]]
+ echo 'Skipping for aarch64'
Skipping for aarch64
Current approach (manual hack):
I'm using pip-tools locally to regenerate requirements, but this approach has limitations:
- Platform differences: ARM64 vs x86_64 dependency resolution may differ
- Missing GPU packages: Many packages like torch+cu121 aren't available for ARM64
- Ray's custom logic: Official compilation may have specific handling I'm missing
Looking at dl-gpu-requirements.txt, Ray uses complex platform-specific dependencies:
torch==2.3.0+cu121
tensorflow-macos==2.15.1; sys_platform == 'darwin' and platform_machine == 'arm64'
cupy-cuda12x>=13.4.0; sys_platform != 'darwin'
These won't resolve correctly on my ARM64 Mac.
Backup plan:
If the manual approach fails, I can use NYU's HPC cluster (AMD64) to run Ray's official compilation process:
./ci/ci.sh compile_pip_dependencies
Question: Is the manual pip-tools approach acceptable as a quick fix, or should I prioritize using the official compilation on x86_64 hardware?
Thanks for your patience.
Quick update: The manual pip-tools approach is creating version conflicts.
I'm going to compile properly on NYU's HPC (AMD64) using ./ci/ci.sh compile_pip_dependencies to avoid ARM64/platform issues. Should have the correct requirements_compiled.txt shortly.
e44b946 to 49e6bbc
7579a7e to a7707e0
[Update + Seeking Guidance] Python 3.10
[Update] Root Cause Found + Fix Ready
Closing the loop on my previous comment.
Root Cause (not resolution depth)
The
Fix
Pin Verified locally on Python 3.10 with
TODO
3ce036d to 74fce8a
Quick note: root cause confirmed and fix is verified locally and on buildkite (see previous comment about
Another quick update: TLDR: There are some CI tune test failures (test_sample, test_tune_restore_warm_start). The test failures are caused by ax.modelbridge being removed in ax-platform 1.0+. I've updated both test files to use the new ax.adapter.registry / ax.generation_strategy paths with a fallback for older ax versions. I pushed and am now waiting on BuildKite to confirm that the fixes work.
badf17d to 1292276
TLDR: Code is ready. Two blockers are infrastructure issues, not my changes. Ready for review.
3 commits, 6 files changed, pre-commit passes locally.
Two things to flag:
1. This file is a compiled artifact I generated on HPC. Master has moved forward since then, so there is a conflict now. Happy to rebase and recompile against the latest master if you want, or you can regenerate it at merge time. Whatever works for you.
2. Base Docker build failing (not from my changes)
The NodeSource Node 14.x apt repo got deprecated and its GPG key is no longer valid. This causes I believe the problem stems from my PR modifying Error in
Previous passing build before deprecation hit: https://buildkite.com/ray-project/microcheck/builds/39247/steps/canvas
Could a maintainer trigger premerge once the base image issue is resolved? Also let me know if you prefer I drop
cc @elliot-barn @matthewdeng, would really appreciate some input here!
P.S. |
ax-platform 1.2.1 uses the modern objectives API introduced in 1.0.0. jaxtyping>=0.3.8 requires Python>=3.11. The transitive chain ax-platform -> botorch -> gpytorch -> jaxtyping pulls in an incompatible version on Python 3.10, causing pip to report resolution-too-deep. Pin jaxtyping<0.3.8 (0.3.7 supports >=3.10) until Python 3.10 support is dropped. Recompiled requirements_compiled.txt to reflect these constraints. Signed-off-by: Adel Nour <ans9868@nyu.edu>
ax-platform 1.0+ replaced the deprecated objective_name/minimize parameters with an objectives dict using ObjectiveProperties. Modernize AxSearch to use the new objectives API while retaining compatibility with older ax-platform versions. Handle both ValueError (ax 0.x) and AssertionError (ax 1.x) when accessing an uninitialized experiment. Also raises unexpected AssertionErrors to avoid masking real Ax failures. Signed-off-by: Adel Nour <ans9868@nyu.edu>
ax-platform 1.0+ removed ax.modelbridge entirely. Update test imports to use ax.adapter.registry and ax.generation_strategy while leaving a fallback for older ax versions. Handle the GenerationStep parameter rename (model= -> generator=) with nested try/except for full compatibility. Signed-off-by: Adel Nour <ans9868@nyu.edu>
1292276 to e0e80b1
Signed-off-by: Adel Sahuc <ans9868@torch-login-4.hpc-infra.svc.cluster.local>
583222d to a25c64a
Signed-off-by: Adel Nour <ans9868@nyu.edu>
a25c64a to 511f046
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Update: rebased on master, BuildKite green, ready for final review
TLDR: Final review (I promise), all checks out.
Hey all, quick update. I saw the Node 14 fix landed (#61533, Ubuntu 20.04 --> 22.04 upgrade) so I rebased onto the latest master and recompiled
Just to recap what this PR does for context since it has been a few weeks:
Microcheck is green, pre-commit passes, 6 commits, 7 files changed. Would love another look when you get a chance. One thing is that premerge is still showing "waiting for status to be reported". Not sure if that needs a manual trigger on your end or if it kicks off automatically after review. Happy to do anything on my side if needed.
Update AxSearch to work with modern Ax versions (1.0.0+) which have:
Changes:
Fixes part of #60512