-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Train] [Tune] Refactor MLflow #20802
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome!
.buildkite/pipeline.yml
Outdated
@@ -715,7 +715,7 @@ | |||
- bazel test --config=ci $(./scripts/bazel_export_options) --build_tests_only --test_tag_filters=client_unit_tests,-gpu_only --test_env=RAY_CLIENT_MODE=1 python/ray/util/sgd/... | |||
|
|||
- label: ":octopus: Tune/SGD/Modin/Dask tests and examples. Python 3.7" | |||
conditions: ["RAY_CI_TUNE_AFFECTED", "RAY_CI_SGD_AFFECTED"] | |||
conditions: ["RAY_CI_TUNE_AFFECTED", "RAY_CI_TRAIN_AFFECTED"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing RAY_CI_SGD_AFFECTED
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it was ever needed in the first place.
Args: | ||
tracking_uri (Optional[str]): The tracking URI for where to manage | ||
experiments and runs. This can either be a local file path or a | ||
remote server. This arg gets passed directly to mlflow | ||
initialization. | ||
registry_uri (Optional[str]): The registry URI that gets passed | ||
directly to mlflow initialization. | ||
experiment_id (Optional[str]): The experiment id of an already | ||
existing experiment. If not | ||
passed in, experiment_name will be used. | ||
experiment_name (Optional[str]): The experiment name to use for this | ||
Train run. | ||
If the experiment with the name already exists with MLflow, | ||
it will be used. If not, a new experiment will be created with | ||
this name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document the behavior when these are None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate more on what you'd like to see here?
There is information in the description for the arguments on the behavior if None is passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so I'm not clear from this doc what the behavior is if tracking_uri
or registry_uri
is None
. Taking another look at the implementation and docs I now see that it's mentioned in logdir
doc - it would be helpful to include this information in each of these parameter docs as well.
I also can't tell from this doc at all what happens if experiment_name
is None
(and the corresponding environment variable is not set). From the error message, it seems like at least experiment_name
or experiment_id
must be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the explanation! Added it to the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far
@matthewdeng @krfricke please take another look! |
@matthewdeng can you take another look please? Also I agree with the future TODOs you listed out. Do you want to make Github issues for them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionality LGTM!
Failing tests are unrelated, going to merge |
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train
Closes #20642
Why are these changes needed?
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.