[FR] set_experiment not safe for concurrency #10334
Labels
area/tracking: Tracking service, tracking client APIs, autologging
enhancement: New feature or request
Willingness to contribute
Yes. I would be willing to contribute this feature with guidance from the MLflow community.
Proposal Summary
Currently, calling set_experiment with a new experiment name from multiple processes in parallel can lead to a race condition: every process looks up the experiment, fails to find it, tries to create it, and all but one error out because it has already been created. It would be nice for MLflow to catch that error and retrieve the experiment again once it exists.
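The catch-and-retry pattern described above can be sketched as follows. Note this is a minimal illustration, not MLflow's actual client: FakeStore, AlreadyExists, and set_experiment_safe are toy stand-ins used only to demonstrate the race and the recovery step.

```python
import threading

class AlreadyExists(Exception):
    """Stand-in for the 'experiment already exists' error a backend raises."""

class FakeStore:
    """Toy in-memory tracking backend (illustrative, not MLflow's API)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._experiments = {}
        self._next_id = 0

    def get_experiment_by_name(self, name):
        return self._experiments.get(name)

    def create_experiment(self, name):
        with self._lock:
            if name in self._experiments:
                raise AlreadyExists(name)
            self._experiments[name] = self._next_id
            self._next_id += 1
            return self._experiments[name]

def set_experiment_safe(store, name):
    """Get-or-create that tolerates a concurrent creator winning the race."""
    exp = store.get_experiment_by_name(name)
    if exp is not None:
        return exp
    try:
        return store.create_experiment(name)
    except AlreadyExists:
        # Another worker created it between our lookup and our create call:
        # fall back to fetching the now-existing experiment instead of failing.
        return store.get_experiment_by_name(name)

# Simulate many parallel workers all calling set_experiment on a fresh name.
store = FakeStore()
results = []
threads = [
    threading.Thread(target=lambda: results.append(set_experiment_safe(store, "hpo")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(set(results)))  # → [0]: every worker resolves to the one experiment
```

All eight workers converge on the same experiment id rather than erroring out, which is the behavior this request asks set_experiment to provide.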
Motivation
I'm doing hyper-parameter optimization in parallel, and I create the experiment on the workers. I could also ensure the experiment is created on the head node before dispatching to the workers, but it seems like this would be easy to support as a feature in MLflow.
Having multiple workers try to access a non-existing experiment in parallel seems like a common scenario.
Currently I do a retry in my own code, but that seems like the wrong place for it; I'd rather not have MLflow-specific retry logic in my application.
It shouldn't be very difficult, and it seems more appropriate to support directly in MLflow (based on my understanding of MLflow's scope, which is not very extensive).