-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add generated run name to create run backend stores if not supplied #6736
Conversation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
mlflow/store/tracking/rest_store.py
Outdated
if MLFLOW_RUN_NAME not in [tag.key for tag in tags]: | ||
tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name())) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than adding this in rest_store, which is still a client-side abstraction, can we add the run name generation to the mlflow server
handler here:
mlflow/mlflow/server/handlers.py
Lines 666 to 683 in 7219771
def _create_run(): | |
request_message = _get_request_message( | |
CreateRun(), schema={"experiment_id": [_assert_string], "start_time": [_assert_intlike]} | |
) | |
tags = [RunTag(tag.key, tag.value) for tag in request_message.tags] | |
run = _get_tracking_store().create_run( | |
experiment_id=request_message.experiment_id, | |
user_id=request_message.user_id, | |
start_time=request_message.start_time, | |
tags=tags, | |
) | |
response_message = CreateRun.Response() | |
response_message.run.MergeFrom(run.to_proto()) | |
response = Response(mimetype="application/json") | |
response.set_data(message_to_json(response_message)) | |
return response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added (and removed from rest_store)!
mlflow/store/tracking/file_store.py
Outdated
# cli sends tags to the backend store as a dict | ||
elif isinstance(tags, dict) and not tags: | ||
if MLFLOW_RUN_NAME not in tags.keys(): | ||
tags = [RunTag(MLFLOW_RUN_NAME, _generate_random_name())] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? It's not clear to me how line 592 would work if that were the case, unless the dict keys are RunTag
entities, in which case we should fix the CLI behavior rather than expanding the type handling for FileStore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% due to trying to address a test failure (the test config in the unit test was incorrect). Fixed.
run.tags = [SqlTag(key=tag.key, value=tag.value) for tag in tags] if tags else [] | ||
if MLFLOW_RUN_NAME not in [tag.key for tag in tags]: | ||
run_name_tag = RunTag(MLFLOW_RUN_NAME, _generate_random_name()) | ||
if tags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if tags
were None
, then line 543 would fail. I think we can get rid of the if / else and always call append()
. If there's any possibility of tags being None
, we should validate it prior to line 543.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 Truly spectacular! Just left a few small comments. Can't wait to ship this in once they're addressed.
mlflow/utils/name_utils.py
Outdated
predicate = random.choice(_GENERATOR_PREDICATES).lower() | ||
noun = random.choice(_GENERATOR_NOUNS).lower() | ||
num = random.randint(0, 10**integer_scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like these words. There aren't too too many of them, so we don't risk adding an excessive amount of text content to the MLflow library, and it's pretty easy to extend them later if desired. We can generate ~ 20k unique pairings of predicate / animal and 20 million unique run names, which seems sufficiently distinct :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 @dbczumar - I am not able to understand how we are guarantying no name collisions by just relying on random. Is that handled in some other code flow that I have missed or there is something that I am missing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run name isn't a primary key. Why would a collision-free solution be required here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BenWilson2 . Yes you are correct, i think i got confused that name is a primary key as duplicate experiment name was not allowed in mlflow. It is cleared now.
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
if MLFLOW_RUN_NAME not in [tag.key for tag in tags]: | ||
tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document MLflow automatically generates a random run name when it's unspecified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for knowledge purpose. I know that we use all the diff types of stores where we need to set the run_name. Why are we setting it in the server/handler
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! Added additional information in docs and added a note in pydoc for fluent.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome PR, @BenWilson2 ! Is there a limit on total number of chars for generated run names? It'd make display easier if we know the max length of autogen run names.
Wrote a script to check the longest name: def choose_max(strings):
return max(strings, key=len)
longest_name = "-".join(
[choose_max(_GENERATOR_PREDICATES), choose_max(_GENERATOR_NOUNS), str(10**3)]
)
print(longest_name)
print(len(longest_name))
(assuming we always use the default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me once #6736 (comment) is addressed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Before merging, can we also display the run name column by default in the UI now? It's currently hidden by default, so users can't see our awesome new run names unless they use the column selector to show the column.
Thanks @BenWilson2 !
I think @hubertzub-db is going to make that change on the FE in the new version of experiment management. Is it okay to wait for his change to go in or do we want to make that change in the old UI for the time being as well? Also i think according to the code here: https://src.dev.databricks.com/mlflow/mlflow/-/blob/mlflow/server/js/src/experiment-tracking/components/ExperimentRunsTableMultiColumnView2.js?L219&subtree=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @BenWilson2 :)
Thank you for taking this on. Love it.
if MLFLOW_RUN_NAME not in [tag.key for tag in tags]: | ||
tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for knowledge purpose. I know that we use all the diff types of stores where we need to set the run_name. Why are we setting it in the server/handler
as well?
The run name column should show up after clearing the local storage in the browser :) |
For the REST interface :) (FYI, I originally put the change in the rest_store.py client-side logic before being educated by @dbczumar about how setting it in the backend server handler is more robust with fewer places to manage the logic) |
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Thanks, @harupy . 28 chars seem a bit long. Can we limit it to 20 or even 16? Essentially automatically reject any names longer than the max length and regenerate a new one in the logic. @BenWilson2 WDYT? |
@jinzhang21 I'm going to remove the longer noun names (replace them with shorter names) and provide a retry for elements max length of 20. I'll push the changes soon. |
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the tiniest of doc nits!
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
mlflow/utils/name_utils.py
Outdated
max_iter = 10 | ||
i = 0 | ||
while True: | ||
name = _generate_string(sep, integer_scale) | ||
if len(name) <= max_length: | ||
return name | ||
elif i == max_iter: | ||
return name[:max_length] | ||
else: | ||
i += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_iter = 10 | |
i = 0 | |
while True: | |
name = _generate_string(sep, integer_scale) | |
if len(name) <= max_length: | |
return name | |
elif i == max_iter: | |
return name[:max_length] | |
else: | |
i += 1 | |
for _ in range(10): | |
name = _generate_string(sep, integer_scale) | |
if len(name) <= max_length: | |
return name | |
return _generate_string(sep, integer_scale)[:max_length] |
nit: just for simplifying the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better :)
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…iendly-run-names Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…lflow#6736) * add run name generator Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> * address test failures Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> * fixes and pr feedback Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> * fix tests using tuples as empty list placeholders Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> * try fix for r test failure Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> * add notes in docs about the auto-generation of run_name Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> * add max length logic and test Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> * Update docs/source/tracking.rst Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com> * simplify retry logic Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com> Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Signed-off-by: Ben Wilson benjamin.wilson@databricks.com
Related Issues/PRs
#xxx
What changes are proposed in this pull request?
Runs that are started and do not have a defined
run_name
associated with them will now have a generated name created and set as a tag to the run.The format of the generated name is: {predicate}-{noun}-{random integer}
How is this patch tested?
Modifications to existing test suites to validate the generated name and the persistence of overridden names supplied by users.
Does this PR change the documentation?
Details
link on thePreview docs
check.Release Notes
Is this a user-facing change?
New runs will now have a
run_name
generated for them if not supplied at run creation time.What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/pipelines
: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templatesarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes