Add generated run name to create run backend stores if not supplied #6736

BenWilson2 · 2022-09-08T14:51:28Z

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Runs that are started without a run_name will now have a name generated for them and set as a tag on the run.
The format of the generated name is: {predicate}-{noun}-{random integer}
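
A minimal sketch of that format, for illustration only. The word lists below are hypothetical stand-ins (the real ones live in mlflow/utils/name_utils.py), and `generate_name` is an illustrative name rather than the function added by this PR:

```python
import random

# Hypothetical stand-ins for the PR's word lists; the real lists are longer.
PREDICATES = ["abundant", "bold", "capable"]
NOUNS = ["ant", "bear", "crow"]


def generate_name(sep="-", integer_scale=3):
    """Return a name shaped like {predicate}{sep}{noun}{sep}{random integer}."""
    predicate = random.choice(PREDICATES).lower()
    noun = random.choice(NOUNS).lower()
    # random.randint is inclusive on both ends, so this yields 0..10**integer_scale.
    num = random.randint(0, 10**integer_scale)
    return f"{predicate}{sep}{noun}{sep}{num}"


print(generate_name())
```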

How is this patch tested?

Modifications to existing test suites to validate the generated name and the persistence of overridden names supplied by users.

I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change the documentation?

No. You can skip the rest of this section.
Yes. Make sure the changed pages / sections render correctly by following the steps below.

Click the Details link on the Preview docs check.
Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

New runs will now have a run_name generated for them if not supplied at run creation time.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

dbczumar · 2022-09-08T17:49:59Z

mlflow/store/tracking/rest_store.py

+        if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
+            tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name()))
+


Rather than adding this in rest_store, which is still a client-side abstraction, can we add the run name generation to the mlflow server handler here:

mlflow/mlflow/server/handlers.py

Lines 666 to 683 in 7219771

def _create_run():
    request_message = _get_request_message(
        CreateRun(), schema={"experiment_id": [_assert_string], "start_time": [_assert_intlike]}
    )

    tags = [RunTag(tag.key, tag.value) for tag in request_message.tags]
    run = _get_tracking_store().create_run(
        experiment_id=request_message.experiment_id,
        user_id=request_message.user_id,
        start_time=request_message.start_time,
        tags=tags,
    )

    response_message = CreateRun.Response()
    response_message.run.MergeFrom(run.to_proto())
    response = Response(mimetype="application/json")
    response.set_data(message_to_json(response_message))
    return response

?

added (and removed from rest_store)!
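
A rough sketch of the server-side defaulting discussed in this thread, assuming a simplified `RunTag` stand-in instead of the MLflow entity. `ensure_run_name` is a hypothetical helper, not a function in MLflow, and the name generator is passed in so the example stays deterministic:

```python
from dataclasses import dataclass

MLFLOW_RUN_NAME = "mlflow.runName"  # tag key under which the run name is stored


@dataclass
class RunTag:
    # Simplified stand-in for mlflow.entities.RunTag.
    key: str
    value: str


def ensure_run_name(tags, generate_name):
    """Append a generated run-name tag unless the caller already supplied one."""
    if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
        tags.append(RunTag(MLFLOW_RUN_NAME, generate_name()))
    return tags


supplied = ensure_run_name([RunTag(MLFLOW_RUN_NAME, "my-run")], lambda: "bold-ant-7")
defaulted = ensure_run_name([], lambda: "bold-ant-7")
print(supplied[0].value, defaulted[0].value)
```

Doing this in the handler means every REST client gets the default without each client library having to implement it.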

dbczumar · 2022-09-08T18:38:31Z

mlflow/store/tracking/file_store.py

+        # cli sends tags to the backend store as a dict
+        elif isinstance(tags, dict) and not tags:
+            if MLFLOW_RUN_NAME not in tags.keys():
+                tags = [RunTag(MLFLOW_RUN_NAME, _generate_random_name())]


Are you sure? It's not clear to me how line 592 would work if that were the case, unless the dict keys are RunTag entities, in which case we should fix the CLI behavior rather than expanding the type handling for FileStore.

This was 100% due to my trying to address a test failure (the test config in the unit test was incorrect). Fixed.

dbczumar · 2022-09-08T18:39:56Z

mlflow/store/tracking/sqlalchemy_store.py

-            run.tags = [SqlTag(key=tag.key, value=tag.value) for tag in tags] if tags else []
+            if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
+                run_name_tag = RunTag(MLFLOW_RUN_NAME, _generate_random_name())
+                if tags:


If tags were None, then line 543 would fail. I think we can get rid of the if / else and always call append(). If there's any possibility of tags being None, we should validate it prior to line 543.
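
One way to read this suggestion, sketched under assumptions: normalize `tags` to a list first, then append unconditionally when no run name is present. `tags_with_run_name` is a hypothetical helper and `RunTag` here is a namedtuple stand-in, not the store's actual code:

```python
from collections import namedtuple

RunTag = namedtuple("RunTag", ["key", "value"])  # stand-in for the MLflow entity
MLFLOW_RUN_NAME = "mlflow.runName"


def tags_with_run_name(tags, generate_name):
    """Validate tags up front so the membership check and append never see None."""
    tags = list(tags) if tags else []  # None, () and [] all become a fresh list
    if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
        tags.append(RunTag(MLFLOW_RUN_NAME, generate_name()))
    return tags


print(tags_with_run_name(None, lambda: "bold-ant-7"))
```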

dbczumar

@BenWilson2 Truly spectacular! Just left a few small comments. Can't wait to ship this in once they're addressed.

cc @sunishsheth2009 @jinzhang21

dbczumar · 2022-09-08T18:42:01Z

mlflow/utils/name_utils.py

+    predicate = random.choice(_GENERATOR_PREDICATES).lower()
+    noun = random.choice(_GENERATOR_NOUNS).lower()
+    num = random.randint(0, 10**integer_scale)


I like these words. There aren't too many of them, so we don't risk adding an excessive amount of text content to the MLflow library, and it's pretty easy to extend them later if desired. We can generate ~ 20k unique pairings of predicate / noun and 20 million unique run names, which seems sufficiently distinct :D

@BenWilson2 @dbczumar - I don't understand how we guarantee that there are no name collisions by relying on random alone. Is that handled in some other code path that I missed, or is there something else I am missing here?

Run name isn't a primary key. Why would a collision-free solution be required here?

Thanks @BenWilson2. Yes, you are correct. I was confused and thought the name was a primary key, since duplicate experiment names are not allowed in MLflow. It is clear now.
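
The figures quoted in this thread can be checked with quick arithmetic. The sketch below assumes roughly 20,000 predicate/noun pairings (the estimate given above) and an integer_scale of 3, and uses the standard birthday-problem approximation to estimate how likely a repeated name is. Since run names are not a primary key, a repeat is harmless:

```python
import math

pairings = 20_000  # approximate predicate/noun pairings quoted in the review
integer_scale = 3  # assumed default
# randint(0, 10**integer_scale) is inclusive, so there are 10**3 + 1 suffixes.
suffixes = 10**integer_scale + 1
total_names = pairings * suffixes


def collision_probability(n_runs, space):
    """Birthday-problem approximation: 1 - exp(-n(n-1) / (2 * space))."""
    return 1.0 - math.exp(-n_runs * (n_runs - 1) / (2 * space))


print(total_names)
print(collision_probability(1_000, total_names))
```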

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

harupy · 2022-09-09T01:15:22Z

How auto-generated names look in the MLflow UI:

harupy · 2022-09-09T01:35:51Z

mlflow/server/handlers.py

+    if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
+        tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name()))


Can we document that MLflow automatically generates a random run name when one is not specified?

Just for my own knowledge: I know that there are different types of stores where we need to set the run_name. Why are we setting it in the server handler as well?

yep! Added additional information in docs and added a note in pydoc for fluent.py

jinzhang21

Awesome PR, @BenWilson2 ! Is there a limit on total number of chars for generated run names? It'd make display easier if we know the max length of autogen run names.

harupy · 2022-09-09T02:25:15Z

@jinzhang21

Is there a limit on total number of chars for generated run names?

Wrote a script to check the longest name:

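# The word lists are defined in mlflow/utils/name_utils.py (added by this PR).
from mlflow.utils.name_utils import _GENERATOR_NOUNS, _GENERATOR_PREDICATES


def choose_max(strings):
    return max(strings, key=len)


longest_name = "-".join(
    [choose_max(_GENERATOR_PREDICATES), choose_max(_GENERATOR_NOUNS), str(10**3)]
)
print(longest_name)
print(len(longest_name))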

knowledgeable-crocodile-1000
28

(assuming we always use the default sep and integer_scale).

harupy · 2022-09-09T02:26:17Z

How knowledgeable-crocodile-1000 is displayed on UI:

harupy

Looks good to me once #6736 (comment) is addressed!

dbczumar

LGTM! Before merging, can we also display the run name column by default in the UI now? It's currently hidden by default, so users can't see our awesome new run names unless they use the column selector to show the column.

Thanks @BenWilson2 !

sunishsheth2009 · 2022-09-09T05:16:57Z

LGTM! Before merging, can we also display the run name column by default in the UI now? It's currently hidden by default, so users can't see our awesome new run names unless they use the column selector to show the column.

I think @hubertzub-db is going to make that change on the FE in the new version of experiment management. Is it okay to wait for his change to go in or do we want to make that change in the old UI for the time being as well?

Also, according to the code here: https://src.dev.databricks.com/mlflow/mlflow/-/blob/mlflow/server/js/src/experiment-tracking/components/ExperimentRunsTableMultiColumnView2.js?L219&subtree=true
I think it is shown by default, unless I am missing something.

sunishsheth2009

Awesome work @BenWilson2 :)
Thank you for taking this on. Love it.

sunishsheth2009 · 2022-09-09T05:19:50Z

mlflow/server/handlers.py

+    if MLFLOW_RUN_NAME not in [tag.key for tag in tags]:
+        tags.append(RunTag(MLFLOW_RUN_NAME, _generate_random_name()))


Just for my own knowledge: I know that there are different types of stores where we need to set the run_name. Why are we setting it in the server handler as well?

harupy · 2022-09-09T09:09:29Z

@dbczumar

LGTM! Before merging, can we also display the run name column by default in the UI now? It's currently hidden by default, so users can't see our awesome new run names unless they use the column selector to show the column.

The run name column should show up after clearing the local storage in the browser :)

BenWilson2 · 2022-09-09T13:56:05Z

Just for my own knowledge: I know that there are different types of stores where we need to set the run_name. Why are we setting it in the server handler as well?

For the REST interface :) (FYI, I originally put the change in the rest_store.py client-side logic before being educated by @dbczumar about how setting it in the backend server handler is more robust with fewer places to manage the logic)

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

jinzhang21 · 2022-09-09T16:13:10Z

How knowledgeable-crocodile-1000 is displayed on UI:

Thanks, @harupy . 28 chars seem a bit long. Can we limit it to 20 or even 16? Essentially automatically reject any names longer than the max length and regenerate a new one in the logic. @BenWilson2 WDYT?

BenWilson2 · 2022-09-09T18:17:35Z

How knowledgeable-crocodile-1000 is displayed on UI:

Thanks, @harupy . 28 chars seem a bit long. Can we limit it to 20 or even 16? Essentially automatically reject any names longer than the max length and regenerate a new one in the logic. @BenWilson2 WDYT?

@jinzhang21 I'm going to replace the longer nouns with shorter ones and add retry logic so that generated names have a max length of 20. I'll push the changes soon.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

docs/source/tracking.rst

dbczumar

LGTM with the tiniest of doc nits!

Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>

harupy · 2022-09-10T04:47:54Z

mlflow/utils/name_utils.py

+    max_iter = 10
+    i = 0
+    while True:
+        name = _generate_string(sep, integer_scale)
+        if len(name) <= max_length:
+            return name
+        elif i == max_iter:
+            return name[:max_length]
+        else:
+            i += 1


Suggested change

-    max_iter = 10
-    i = 0
-    while True:
-        name = _generate_string(sep, integer_scale)
-        if len(name) <= max_length:
-            return name
-        elif i == max_iter:
-            return name[:max_length]
-        else:
-            i += 1
+    for _ in range(10):
+        name = _generate_string(sep, integer_scale)
+        if len(name) <= max_length:
+            return name
+    return _generate_string(sep, integer_scale)[:max_length]

nit: just for simplifying the code.

much better :)
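
For reference, a self-contained sketch of the simplified retry loop. `_generate_string` is replaced with a hypothetical stand-in that uses a tiny word list, and the default max_length of 20 is assumed from the discussion above rather than taken from the merged code:

```python
import random


def _generate_string(sep, integer_scale):
    # Hypothetical stand-in generator; some combinations exceed 20 characters.
    predicate = random.choice(["bold", "knowledgeable"])
    noun = random.choice(["ant", "crocodile"])
    num = random.randint(0, 10**integer_scale)
    return sep.join([predicate, noun, str(num)])


def generate_random_name(sep="-", integer_scale=3, max_length=20):
    """Try up to 10 times for a name that fits; otherwise truncate a fresh one."""
    for _ in range(10):
        name = _generate_string(sep, integer_scale)
        if len(name) <= max_length:
            return name
    return _generate_string(sep, integer_scale)[:max_length]


print(generate_random_name())
```

The truncating fallback keeps the function total: even when no candidate ever fits (for example, with a very small max_length), the caller still gets a name within the limit.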

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

add run name generator

d505c8b

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

github-actions bot added the area/tracking and rn/feature labels Sep 8, 2022

address test failures

f8d6b72

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

dbczumar reviewed Sep 8, 2022

BenWilson2 added 3 commits September 8, 2022 16:18

fixes and pr feedback

55bbedd

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

fix tests using tuples as empty list placeholders

ce60c61

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

try fix for r test failure

e5cfe5c

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

harupy reviewed Sep 9, 2022

jinzhang21 reviewed Sep 9, 2022

harupy approved these changes Sep 9, 2022

dbczumar approved these changes Sep 9, 2022

sunishsheth2009 approved these changes Sep 9, 2022

add notes in docs about the auto-generation of run_name

50687fd

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

add max length logic and test

bc42617

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

jinzhang21 approved these changes Sep 9, 2022

dbczumar reviewed Sep 9, 2022

dbczumar approved these changes Sep 9, 2022

Update docs/source/tracking.rst

76a8112

Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>

harupy reviewed Sep 10, 2022

BenWilson2 added 2 commits September 10, 2022 21:50

simplify retry logic

c4079c5

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

Merge branch 'friendly-run-names' of github.com:mlflow/mlflow into friendly-run-names

ed352ed

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

BenWilson2 enabled auto-merge (squash) September 11, 2022 02:34

BenWilson2 merged commit d0f763d into master Sep 11, 2022

BenWilson2 mentioned this pull request Nov 17, 2023

[FR] Renaming a run has no immediate visual effects #10371

Closed
