Add generated run name to create run backend stores if not supplied #6736

Merged
merged 10 commits into from
Sep 11, 2022

Conversation

BenWilson2
Member

Signed-off-by: Ben Wilson benjamin.wilson@databricks.com

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Runs that are started without a defined run_name will now have a name generated for them, which is set as a tag on the run.
The format of the generated name is: {predicate}-{noun}-{random integer}
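
As a rough illustration of the format above, here is a minimal sketch of such a generator. The word lists and the function name `generate_name` are hypothetical stand-ins (the PR's actual lists are not reproduced here); `sep` and `integer_scale` mirror the parameter names that come up later in this review.

```python
import random

# Hypothetical word lists for illustration only; not the lists used in the PR.
PREDICATES = ["clever", "gentle", "rapid"]
NOUNS = ["crane", "otter", "whale"]


def generate_name(sep="-", integer_scale=3, rng=random):
    """Build a name of the form {predicate}{sep}{noun}{sep}{random integer}."""
    predicate = rng.choice(PREDICATES).lower()
    noun = rng.choice(NOUNS).lower()
    # randint is inclusive on both ends, so the integer lies in [0, 10**integer_scale].
    num = rng.randint(0, 10**integer_scale)
    return f"{predicate}{sep}{noun}{sep}{num}"


print(generate_name())
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible in tests.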

How is this patch tested?

Modifications to existing test suites to validate the generated name and the persistence of overridden names supplied by users.

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Click the Details link on the Preview docs check.
  2. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

New runs will now have a run_name generated for them if not supplied at run creation time.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels Sep 8, 2022
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment on lines 188 to 190
if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
    tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name()))

Collaborator

Rather than adding this in rest_store, which is still a client-side abstraction, can we add the run name generation to the mlflow server handler here:

def _create_run():
    request_message = _get_request_message(
        CreateRun(), schema={"experiment_id": [_assert_string], "start_time": [_assert_intlike]}
    )
    tags = [RunTag(tag.key, tag.value) for tag in request_message.tags]
    run = _get_tracking_store().create_run(
        experiment_id=request_message.experiment_id,
        user_id=request_message.user_id,
        start_time=request_message.start_time,
        tags=tags,
    )
    response_message = CreateRun.Response()
    response_message.run.MergeFrom(run.to_proto())
    response = Response(mimetype="application/json")
    response.set_data(message_to_json(response_message))
    return response
?

Member Author

added (and removed from rest_store)!
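
For readers following along, the idea of filling in the run name in one server-side place can be sketched roughly as below. This is not the merged handler code: `RunTag` is a stand-in dataclass, `with_default_run_name` is a hypothetical helper, and the value assigned to `MLFLOW_RUN_NAME` is an assumption made for illustration.

```python
from dataclasses import dataclass

# Assumed value of the tag key; treat it as illustrative here.
MLFLOW_RUN_NAME = "mlflow.runName"


@dataclass
class RunTag:
    """Stand-in for MLflow's RunTag entity (a key/value pair)."""

    key: str
    value: str


def with_default_run_name(tags, generate_name):
    """Return a copy of `tags` that is guaranteed to carry a run-name tag."""
    tags = list(tags)
    if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
        tags.append(RunTag(MLFLOW_RUN_NAME, generate_name()))
    return tags


# A user-supplied name is kept; a missing one is filled in.
print(with_default_run_name([RunTag(MLFLOW_RUN_NAME, "my-run")], lambda: "calm-owl-7"))
print(with_default_run_name([], lambda: "calm-owl-7"))
```

Doing this once where the request is handled means every REST client gets the same behavior without each client having to implement it.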

Comment on lines 595 to 598
# cli sends tags to the backend store as a dict
elif isinstance(tags, dict) and not tags:
    if MLFLOW_RUN_NAME not in tags.keys():
        tags = [RunTag(MLFLOW_RUN_NAME, _generate_random_name())]
Collaborator

Are you sure? It's not clear to me how line 592 would work if that were the case, unless the dict keys are RunTag entities, in which case we should fix the CLI behavior rather than expanding the type handling for FileStore.

Member Author

100% due to trying to address a test failure (the test config in the unit test was incorrect). Fixed.

run.tags = [SqlTag(key=tag.key, value=tag.value) for tag in tags] if tags else []
if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
    run_name_tag = RunTag(MLFLOW_RUN_NAME, _generate_random_name())
    if tags:
Collaborator

If tags were None, then line 543 would fail. I think we can get rid of the if / else and always call append(). If there's any possibility of tags being None, we should validate it prior to line 543.

Member Author

changed!
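
The failure mode discussed in this thread can be shown with a small sketch. The helper names `tag_keys` and `normalize_tags` are hypothetical and only illustrate validating `tags` before the membership check; they are not functions from the PR.

```python
def tag_keys(tags):
    # Iterating over None raises TypeError, which is the failure the review points out.
    return [tag.key for tag in tags]


def normalize_tags(tags):
    """Validate up front so that later code can always iterate and append."""
    if tags is None:
        return []
    return list(tags)


try:
    tag_keys(None)
except TypeError as exc:
    print(f"unvalidated tags fail: {exc}")

# After normalization, append() is always safe and no if / else is needed.
tags = normalize_tags(None)
tags.append("placeholder-tag")
print(tags)
```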

Collaborator

@dbczumar dbczumar left a comment

@BenWilson2 Truly spectacular! Just left a few small comments. Can't wait to ship this in once they're addressed.

cc @sunishsheth2009 @jinzhang21

Comment on lines 13 to 15
predicate = random.choice(_GENERATOR_PREDICATES).lower()
noun = random.choice(_GENERATOR_NOUNS).lower()
num = random.randint(0, 10**integer_scale)
Collaborator

@dbczumar dbczumar Sep 8, 2022

I like these words. There aren't too many of them, so we don't risk adding an excessive amount of text content to the MLflow library, and it's pretty easy to extend them later if desired. We can generate ~20k unique pairings of predicate / animal and 20 million unique run names, which seems sufficiently distinct :D

@BenWilson2 @dbczumar - I don't understand how we guarantee no name collisions by relying on random alone. Is that handled in some other code path that I missed, or is there something else I am missing here?

Member Author

Run name isn't a primary key. Why would a collision-free solution be required here?

Thanks @BenWilson2. Yes, you are correct. I think I got confused and assumed the name was a primary key, since duplicate experiment names are not allowed in MLflow. It is clear now.
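
To put rough numbers on the thread above, here is a small sketch of the arithmetic. The list sizes (100 predicates, 200 nouns) are hypothetical values chosen only so that the pair count matches the ~20k figure mentioned; the collision estimate uses the standard birthday-problem approximation and assumes names are drawn uniformly.

```python
import math


def name_space_size(n_predicates, n_nouns, integer_scale=3):
    # randint(0, 10**integer_scale) is inclusive, giving 10**integer_scale + 1 integers.
    return n_predicates * n_nouns * (10**integer_scale + 1)


def collision_probability(n_runs, n_names):
    """Approximate chance that at least two of n_runs uniformly drawn names coincide."""
    return 1.0 - math.exp(-n_runs * (n_runs - 1) / (2.0 * n_names))


# Hypothetical list sizes: 100 * 200 = 20,000 predicate / noun pairs.
total = name_space_size(100, 200)
print(total)
print(collision_probability(1_000, total))
```

Collisions are therefore possible but harmless here, since the run name is a display tag rather than a key.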

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@harupy
Member

harupy commented Sep 9, 2022

How auto-generated names look in the MLflow UI:

image

image

Comment on lines +674 to +675
if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
    tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name()))
Member

@harupy harupy Sep 9, 2022

Can we document that MLflow automatically generates a random run name when one is not specified?

Collaborator

Just for knowledge purpose. I know that we use all the diff types of stores where we need to set the run_name. Why are we setting it in the server/handler as well?

Member Author

yep! Added additional information in docs and added a note in pydoc for fluent.py

Collaborator

@jinzhang21 jinzhang21 left a comment

Awesome PR, @BenWilson2 ! Is there a limit on total number of chars for generated run names? It'd make display easier if we know the max length of autogen run names.

@harupy
Member

harupy commented Sep 9, 2022

@jinzhang21

> Is there a limit on total number of chars for generated run names?

Wrote a script to check the longest name:

def choose_max(strings):
    return max(strings, key=len)


longest_name = "-".join(
    [choose_max(_GENERATOR_PREDICATES), choose_max(_GENERATOR_NOUNS), str(10**3)]
)
print(longest_name)
print(len(longest_name))

Output:

knowledgeable-crocodile-1000
28

(assuming we always use the default sep and integer_scale).

@harupy
Member

harupy commented Sep 9, 2022

How knowledgeable-crocodile-1000 is displayed on UI:

image

Member

@harupy harupy left a comment

Looks good to me once #6736 (comment) is addressed!

Collaborator

@dbczumar dbczumar left a comment

LGTM! Before merging, can we also display the run name column by default in the UI now? It's currently hidden by default, so users can't see our awesome new run names unless they use the column selector to show the column.

Thanks @BenWilson2 !

@sunishsheth2009
Collaborator

sunishsheth2009 commented Sep 9, 2022

> LGTM! Before merging, can we also display the run name column by default in the UI now? It's currently hidden by default, so users can't see our awesome new run names unless they use the column selector to show the column.

I think @hubertzub-db is going to make that change on the FE in the new version of experiment management. Is it okay to wait for his change to go in or do we want to make that change in the old UI for the time being as well?

Also, I think that according to the code here: https://src.dev.databricks.com/mlflow/mlflow/-/blob/mlflow/server/js/src/experiment-tracking/components/ExperimentRunsTableMultiColumnView2.js?L219&subtree=true
it is shown by default, unless I am missing something.

Collaborator

@sunishsheth2009 sunishsheth2009 left a comment

Awesome work @BenWilson2 :)
Thank you for taking this on. Love it.

@harupy
Member

harupy commented Sep 9, 2022

@dbczumar

> LGTM! Before merging, can we also display the run name column by default in the UI now? It's currently hidden by default, so users can't see our awesome new run names unless they use the column selector to show the column.

The run name column should show up after clearing the local storage in the browser :)

@BenWilson2
Member Author

> Just for knowledge purpose. I know that we use all the diff types of stores where we need to set the run_name. Why are we setting it in the server/handler as well?

For the REST interface :) (FYI, I originally put the change in the rest_store.py client-side logic before being educated by @dbczumar about how setting it in the backend server handler is more robust with fewer places to manage the logic)

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@jinzhang21
Collaborator

> How knowledgeable-crocodile-1000 is displayed on UI:
>
> image

Thanks, @harupy . 28 chars seem a bit long. Can we limit it to 20 or even 16? Essentially automatically reject any names longer than the max length and regenerate a new one in the logic. @BenWilson2 WDYT?

@BenWilson2
Member Author

> > How knowledgeable-crocodile-1000 is displayed on UI:
> > image
>
> Thanks, @harupy . 28 chars seem a bit long. Can we limit it to 20 or even 16? Essentially automatically reject any names longer than the max length and regenerate a new one in the logic. @BenWilson2 WDYT?

@jinzhang21 I'm going to remove the longer noun names (replacing them with shorter ones) and add a retry that enforces a max length of 20. I'll push the changes soon.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
docs/source/tracking.rst Outdated
Collaborator

@dbczumar dbczumar left a comment

LGTM with the tiniest of doc nits!

Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
Comment on lines 21 to 30
max_iter = 10
i = 0
while True:
    name = _generate_string(sep, integer_scale)
    if len(name) <= max_length:
        return name
    elif i == max_iter:
        return name[:max_length]
    else:
        i += 1
Member

@harupy harupy Sep 10, 2022

Suggested change
-    max_iter = 10
-    i = 0
-    while True:
-        name = _generate_string(sep, integer_scale)
-        if len(name) <= max_length:
-            return name
-        elif i == max_iter:
-            return name[:max_length]
-        else:
-            i += 1
+    for _ in range(10):
+        name = _generate_string(sep, integer_scale)
+        if len(name) <= max_length:
+            return name
+    return _generate_string(sep, integer_scale)[:max_length]

nit: just for simplifying the code.

Member Author

much better :)
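
For reference, the simplified retry-then-truncate logic agreed on above can be written as a standalone sketch. The name `bounded_name` and the `generate` callable are hypothetical: they stand in for the PR's generator so the behavior can be exercised deterministically. The defaults (max length 20, 10 attempts) follow the values discussed in this review.

```python
def bounded_name(generate, max_length=20, max_attempts=10):
    """Retry `generate` until a name fits; otherwise truncate one final draw."""
    for _ in range(max_attempts):
        name = generate()
        if len(name) <= max_length:
            return name
    # No candidate fit within the attempt budget: fall back to truncation.
    return generate()[:max_length]


# The first candidate is too long (28 chars), so the second one is returned.
candidates = iter(["knowledgeable-crocodile-1000", "calm-owl-7"])
print(bounded_name(lambda: next(candidates)))
```

The truncation fallback keeps the function total: it always returns a name of at most `max_length` characters, even if every draw is too long.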

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…iendly-run-names

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 enabled auto-merge (squash) September 11, 2022 02:34
@BenWilson2 BenWilson2 merged commit d0f763d into master Sep 11, 2022
nnethery pushed a commit to nnethery/mlflow that referenced this pull request Feb 1, 2024
…lflow#6736)

* add run name generator

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* address test failures

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* fixes and pr feedback

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* fix tests using tuples as empty list placeholders

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* try fix for r test failure

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* add notes in docs about the auto-generation of run_name

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* add max length logic and test

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* Update docs/source/tracking.rst

Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>

* simplify retry logic

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Labels
area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs.
6 participants