[BUG] start_run() modifies tags dictionary #5190

matheusMoreno · 2021-12-21T19:35:49Z

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

Yes. I can contribute a fix for this bug independently.
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
No. I cannot contribute a bug fix at this time.

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04.3 LTS
MLflow installed from (source or binary): binary (pip)
MLflow version (run mlflow --version): 1.22.0
Python version: 3.9.5
npm version, if running the dev UI: -
Exact command to reproduce: -

Describe the problem

If you pass a dictionary variable as tags to the start_run() function, the function may modify that dictionary. For nested (child) runs, the mlflow.parentRunId tag is added to the caller's dictionary, so it persists and is applied to later (non-nested) runs that reuse the same variable.
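
The mechanism can be illustrated without MLflow. The sketch below is an assumption about the kind of code that produces this behavior, not MLflow's actual implementation; `fake_start_run` and its `parent_run_id` parameter are hypothetical names made up for illustration. It shows how aliasing the caller's dictionary and then adding a key leaks that key back to the caller.

```python
def fake_start_run(tags=None, parent_run_id=None):
    """Hypothetical stand-in for a run-starting function (not MLflow code)."""
    # `tags or {}` returns the caller's own dict object when it is non-empty,
    # so `run_tags` is an alias, not a copy.
    run_tags = tags or {}
    if parent_run_id is not None:
        # In-place write: this also changes the caller's dictionary.
        run_tags["mlflow.parentRunId"] = parent_run_id
    return run_tags


GLOBAL_TAGS = {"custom": "tag"}

fake_start_run(tags=GLOBAL_TAGS)                          # top-level run
fake_start_run(tags=GLOBAL_TAGS, parent_run_id="abc123")  # nested run

# The key written for the nested run is now present in the caller's dict.
leaked = "mlflow.parentRunId" in GLOBAL_TAGS
```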

Code to reproduce issue

import mlflow

GLOBAL_TAGS = {'custom': 'tag'}

with mlflow.start_run(tags=GLOBAL_TAGS) as run:
    run_id = run.info.run_id
    assert run.data.tags.get('mlflow.parentRunId') is None      # Should be None
    with mlflow.start_run(tags=GLOBAL_TAGS, nested=True) as child_run:
        assert child_run.data.tags.get('mlflow.parentRunId') == run_id

with mlflow.start_run(tags=GLOBAL_TAGS) as run:
    assert run.data.tags.get('mlflow.parentRunId') is None      # Should be None

This code raises the following error on my machine:

Traceback (most recent call last):
  File "/home/matheus/mlflow_bug/bug.py", line 14, in <module>
    assert run.data.tags.get('mlflow.parentRunId') is run_id
AssertionError

Other info / logs

I believe the issue can be fixed by changing the following lines:

mlflow/mlflow/tracking/fluent.py

Lines 287 to 289 in 0fa849a

    tags = context_registry.resolve_tags(user_specified_tags)

    active_run_obj = client.create_run(experiment_id=exp_id_for_run, tags=tags)

tags is a parameter of the start_run() function; it should not be overwritten. By changing the variable name to something else, the problem will hopefully be solved. I will investigate further and open a PR if I'm able to fix the issue.
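
For illustration only, here is one way such a function could avoid touching the caller's dictionary: copy it before adding anything. This is a sketch under the assumption that the leak comes from an in-place write to the dictionary the caller passed in; `start_run_sketch` and `parent_run_id` are hypothetical names, and this is not necessarily what the eventual fix does. If that assumption holds, renaming a local variable would not be enough by itself, because both names would still refer to the same object.

```python
from typing import Dict, Optional


def start_run_sketch(
    tags: Optional[Dict[str, str]] = None,
    parent_run_id: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in (not MLflow code) that never mutates `tags`."""
    # dict(...) makes a shallow copy, which is enough for flat str -> str tags.
    run_tags = dict(tags) if tags else {}
    if parent_run_id is not None:
        # The write goes to the copy, so the caller's dict is left alone.
        run_tags["mlflow.parentRunId"] = parent_run_id
    return run_tags


SHARED_TAGS = {"custom": "tag"}
child_tags = start_run_sketch(tags=SHARED_TAGS, parent_run_id="abc123")
later_tags = start_run_sketch(tags=SHARED_TAGS)
```

On an affected version, callers can get the same protection by passing a copy themselves, for example `mlflow.start_run(tags=dict(GLOBAL_TAGS), nested=True)`.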

What component(s), interfaces, languages, and integrations does this bug affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations


matheusMoreno added the bug label (Something isn't working) on Dec 21, 2021

github-actions bot added the area/tracking label (Tracking service, tracking client APIs, autologging) on Dec 21, 2021

matheusMoreno mentioned this issue Dec 21, 2021

Do not modify tags dict on start_run() #5191

Merged


harupy closed this as completed in #5191 Dec 23, 2021
