Validate experiment_id in fluent API start_run #5720
Conversation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
mlflow/utils/validation.py (outdated)

```
@@ -345,6 +345,19 @@ def _validate_experiment_name(experiment_name):
    )


def _validate_start_run_experiment_id(experiment_id):
```
There is already a function `_validate_experiment_id` that performs a regex check on the experiment_id and is used in a different capacity (file store operations `_get_experiment(experiment_id)`, `_get_experiment_files(experiment_id)`, `_get_experiment_tag_path(experiment_id)`). I didn't think it was worthwhile to combine the two, since this is distinct validation logic for user-provided experiment_ids that can fail fast on invalid types.
```diff
- def _validate_start_run_experiment_id(experiment_id):
+ def _validate_experiment_id(experiment_id):
```

Can we rename this to `_validate_experiment_id`, because this function can be used in other APIs? `_validate_start_run_experiment_id` sounds like we should use this function only in `mlflow.start_run`.
Left a brief explanation here: #5720 (comment). Do you think this logic should be combined?
Thanks for the explanation! How about `_validate_experiment_id_type`?
+1 I like it! :)
```python
    Check that a user-provided experiment_id is either a string, int, or None and raise an
    exception if it isn't.
    """
    if experiment_id is not None and not isinstance(experiment_id, (str, int)):
```
```diff
- if experiment_id is not None and not isinstance(experiment_id, (str, int)):
+ if not isinstance(experiment_id, (str, int, type(None))):
```

Can we simplify this line?
Never mind this comment. I think `experiment_id is not None` is more straightforward than `not isinstance(x, type(None))`.
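For illustration, the two spellings discussed above accept and reject exactly the same values; the helper names below are hypothetical, used only to contrast the styles:

```python
# Two equivalent ways to accept str, int, or None and reject anything else.
# check_explicit mirrors the form kept in the PR; check_type_none is the
# alternative floated (and then withdrawn) in this thread.
def check_explicit(x):
    return x is None or isinstance(x, (str, int))

def check_type_none(x):
    return isinstance(x, (str, int, type(None)))

# Both agree on every case: valid ids pass, container types fail.
for value in (None, "7", 7, [7], {7}, {"id": 7}, (7,)):
    assert check_explicit(value) == check_type_none(value)
```

So the choice is purely about readability, and the explicit `is not None` guard was preferred.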
Left one more comment, otherwise LGTM!
LGTM! Thanks @BenWilson2 !
LGTM, thank you
Signed-off-by: Ben Wilson benjamin.wilson@databricks.com
What changes are proposed in this pull request?
Perform a validation check in the fluent API `start_run()` that raises an exception if the supplied `experiment_id` is not an `int` or `str`, preventing tracking server retries on unhandled invalid input. Fixes #5713
How is this patch tested?
Additional unit test validating that an exception is raised on dict, set, list, and tuple inputs.
Does this PR change the documentation?
1. Check the status of the `ci/circleci: build_doc` check. If it's successful, proceed to the next step, otherwise fix it.
2. Click `Details` on the right to open the job page of CircleCI.
3. Click the `Artifacts` tab.
4. Open `docs/build/html/index.html`.

Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?

Components
- `area/artifacts`: Artifact stores and artifact logging
- `area/build`: Build and test infrastructure for MLflow
- `area/docs`: MLflow documentation pages
- `area/examples`: Example code
- `area/model-registry`: Model Registry service, APIs, and the fluent client calls for Model Registry
- `area/models`: MLmodel format, model serialization/deserialization, flavors
- `area/projects`: MLproject format, project running backends
- `area/scoring`: MLflow Model server, model deployment tools, Spark UDFs
- `area/server-infra`: MLflow Tracking server backend
- `area/tracking`: Tracking Service, tracking client APIs, autologging

Interface
- `area/uiux`: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- `area/docker`: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- `area/sqlalchemy`: Use of SQLAlchemy in the Tracking Service or Model Registry
- `area/windows`: Windows support

Language
- `language/r`: R APIs and clients
- `language/java`: Java APIs and clients
- `language/new`: Proposals for new client languages

Integrations
- `integrations/azure`: Azure and Azure ML integrations
- `integrations/sagemaker`: SageMaker integrations
- `integrations/databricks`: Databricks integrations

How should the PR be classified in the release notes? Choose one:
- `rn/breaking-change` - The PR will be mentioned in the "Breaking Changes" section
- `rn/none` - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- `rn/feature` - A new user-facing feature worth mentioning in the release notes
- `rn/bug-fix` - A user-facing bug fix worth mentioning in the release notes
- `rn/documentation` - A user-facing documentation change worth mentioning in the release notes