[BUG] start_run() modifies tags dictionary #5190

Closed
2 of 23 tasks
matheusMoreno opened this issue Dec 21, 2021 · 0 comments · Fixed by #5191
Labels
area/tracking Tracking service, tracking client APIs, autologging bug Something isn't working

Comments

@matheusMoreno
Contributor

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04.3 LTS
  • MLflow installed from (source or binary): binary (pip)
  • MLflow version (run mlflow --version): 1.22.0
  • Python version: 3.9.5
  • npm version, if running the dev UI: -
  • Exact command to reproduce: -

Describe the problem

If you pass a dictionary as the tags argument of start_run(), the function may modify that dictionary in place. For nested (child) runs, the mlflow.parentRunId tag is written into the caller's dictionary, so it persists into later (non-nested) runs that reuse the same variable.

Code to reproduce issue

import mlflow

GLOBAL_TAGS = {'custom': 'tag'}

with mlflow.start_run(tags=GLOBAL_TAGS) as run:
    run_id = run.info.run_id
    assert run.data.tags.get('mlflow.parentRunId') is None      # Should be None
    with mlflow.start_run(tags=GLOBAL_TAGS, nested=True) as child_run:
        assert child_run.data.tags.get('mlflow.parentRunId') == run_id

with mlflow.start_run(tags=GLOBAL_TAGS) as run:
    assert run.data.tags.get('mlflow.parentRunId') is None      # Should be None

This code raises the following error on my machine:

Traceback (most recent call last):
  File "/home/matheus/mlflow_bug/bug.py", line 14, in <module>
    assert run.data.tags.get('mlflow.parentRunId') is run_id
AssertionError
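
The behavior can also be illustrated without an MLflow installation. The sketch below is a stand-in, not MLflow's code: `fake_start_run` is a hypothetical function that, like the suspected code path, writes a parent-run key into whatever dictionary it receives. It also shows a caller-side workaround, passing a copy of the dictionary, which should sidestep the problem as long as the mutation is limited to the object passed in.

```python
# Hypothetical stand-in for the suspected behavior; this is NOT MLflow's implementation.
PARENT_KEY = "mlflow.parentRunId"


def fake_start_run(tags=None, parent_run_id=None):
    """Return the tags a run would be created with, mutating `tags` in place."""
    run_tags = tags or {}  # aliases the caller's dict whenever it is non-empty
    if parent_run_id is not None:
        run_tags[PARENT_KEY] = parent_run_id  # in-place write leaks to the caller
    return run_tags


GLOBAL_TAGS = {"custom": "tag"}

# A "nested" call pollutes the shared dictionary ...
fake_start_run(tags=GLOBAL_TAGS, parent_run_id="parent-123")
leaked = PARENT_KEY in GLOBAL_TAGS  # the caller's dict now carries the parent ID

# ... so a later top-level call inherits the stale parent ID.
later_tags = fake_start_run(tags=GLOBAL_TAGS)

# Workaround: hand each call its own shallow copy.
SAFE_TAGS = {"custom": "tag"}
fake_start_run(tags=dict(SAFE_TAGS), parent_run_id="parent-123")
safe_later_tags = fake_start_run(tags=dict(SAFE_TAGS))
```

With the real API, the equivalent workaround would be to call `mlflow.start_run(tags=dict(GLOBAL_TAGS), nested=True)` so that each run receives its own copy.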

Other info / logs

I believe the issue can be fixed by changing the following lines:

tags = context_registry.resolve_tags(user_specified_tags)
active_run_obj = client.create_run(experiment_id=exp_id_for_run, tags=tags)

tags is a parameter of the start_run() function; it should not be overwritten. By changing the variable name to something else, the problem will hopefully be solved. I will investigate further and open a PR if I'm able to fix the issue.
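
One detail may be worth checking before the rename: in Python, assigning a new value to a parameter name only rebinds the local name and leaves the caller's object untouched, whereas writing a key into the dictionary changes the caller's object. If the parent-run tag is written into the passed dictionary before the lines above run, copying the dictionary first may be needed as well. A minimal sketch with hypothetical functions (not MLflow's code; all names below are illustrative):

```python
import copy

PARENT_KEY = "mlflow.parentRunId"


def rebinds_only(tags):
    # Rebinding the local name creates a new dict; the caller's dict is untouched.
    tags = {**tags, PARENT_KEY: "parent-123"}
    return tags


def mutates_in_place(tags):
    # Writing a key changes the very object the caller passed in.
    tags[PARENT_KEY] = "parent-123"
    return tags


def copies_first(tags=None):
    # Defensive copy: work on a private dict, so later writes stay local.
    user_specified_tags = copy.deepcopy(tags) if tags else {}
    user_specified_tags[PARENT_KEY] = "parent-123"
    return user_specified_tags


a = {"custom": "tag"}
b = {"custom": "tag"}
c = {"custom": "tag"}

rebinds_only(a)        # a is unchanged afterwards
mutates_in_place(b)    # b now contains the parent-run key
resolved = copies_first(c)  # c is unchanged; resolved holds the extra key
```

A shallow copy (`dict(tags)`) would be enough for flat string tags; `copy.deepcopy` is the more conservative choice if tag values could themselves be mutable.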

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@matheusMoreno matheusMoreno added the bug Something isn't working label Dec 21, 2021
@github-actions github-actions bot added the area/tracking Tracking service, tracking client APIs, autologging label Dec 21, 2021