[BUG]: MLflow recipe still erroring out because of logging png file twice #11651

Open
5 of 23 tasks
Saugat168 opened this issue Apr 8, 2024 · 2 comments
Labels
  • area/artifacts: Artifact stores and artifact logging
  • area/examples: Example code
  • area/recipes: MLflow Recipes, Recipes APIs, Recipes configs, Recipe Templates
  • bug: Something isn't working
  • integrations/databricks: Databricks integrations

Saugat168 commented Apr 8, 2024

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

  • Client: 2.11.3
  • Tracking server: >=2.0.0 (unclear how to find the information from Databricks)

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.2 LTS (WSL on Windows)
  • Python version: 3.10
  • yarn version, if running the dev UI: N/A

Describe the problem

This is the same issue as listed in this bug (which has been closed): https://github.com/mlflow/mlflow/issues/8047

So, I have cloned the classification example: https://github.com/mlflow/recipes-examples/tree/main/classification

I am using the VS Code extension for Databricks.

Whether I run the commands in the notebook
projectroot/classification/notebooks/databricks.py

or run
mlflow recipes run --profile databricks

I hit the problem in the train step itself.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</title>
</head>
<body><h2>HTTP ERROR 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</h2>
<table>
<tr><th>URI:</th><td>/dbfs/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png</td></tr>
<tr><th>STATUS:</th><td>409</td></tr>
<tr><th>MESSAGE:</th><td>File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</td></tr>
<tr><th>SERVLET:</th><td>-</td></tr>
</table>

</body>
</html>

The previous report showed exactly the same error. It was apparently fixed by introducing a prefix to differentiate between the train and evaluate steps. The problem is that, within the train step itself, the confusion matrix image (training_confusion_matrix.png) is still logged twice (as was also reported in the earlier issue).
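One possible client-side guard against the double upload described above is to check the run's artifact listing before logging. This is only a sketch, not the fix adopted in MLflow: log_artifact_if_absent is a hypothetical helper, and client is assumed to behave like mlflow.tracking.MlflowClient (only its list_artifacts and log_artifact methods are used).

```python
import os
import posixpath


def log_artifact_if_absent(client, run_id, local_path, artifact_path=None):
    """Upload local_path to the run unless an artifact with that name exists.

    Hypothetical helper. `client` is assumed to behave like
    mlflow.tracking.MlflowClient: list_artifacts(run_id, path) returns
    objects with a `.path` attribute relative to the run's artifact root,
    and log_artifact(run_id, local_path, artifact_path) uploads one file.
    Returns True if the file was uploaded, False if it was skipped.
    """
    file_name = os.path.basename(local_path)
    # Artifact paths always use forward slashes, whatever the local OS is.
    target = posixpath.join(artifact_path, file_name) if artifact_path else file_name
    existing = {info.path for info in client.list_artifacts(run_id, artifact_path)}
    if target in existing:
        return False
    client.log_artifact(run_id, local_path, artifact_path)
    return True
```

Skipping keeps whichever copy was uploaded first, and the check followed by the upload is not atomic, so this would only paper over the collision rather than remove its cause.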

I tried editing .venv/lib/python3.10/site-packages/mlflow/recipes/steps/train.py (line 368) to add

mlflow.autolog(disable=True)

That worked for the train step, but then the same problem started happening in the evaluate step.

Initial error

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/step.py", line 132, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/train.py", line 488, in _run
    eval_result = mlflow.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 2102, in evaluate
    evaluate_result = _evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 1252, in _evaluate
    eval_result = evaluator.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1966, in evaluate
    evaluation_result = self._evaluate(model, is_baseline_model=False)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1862, in _evaluate
    self._log_artifacts()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1674, in _log_artifacts
    self._log_confusion_matrix()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1357, in _log_confusion_matrix
    self._log_image_artifact(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 769, in _log_image_artifact
    mlflow.log_artifact(artifact_file_local_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1057, in log_artifact
    MlflowClient().log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1189, in log_artifact
    self._tracking_client.log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 560, in log_artifact
    artifact_repo.log_artifact(local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 117, in log_artifact
    self._databricks_api_request(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 62, in _databricks_api_request
    return http_request_safe(host_creds=host_creds, endpoint=endpoint, method=method, **kwargs)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 145, in http_request_safe
    return verify_rest_response(response, endpoint)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 158, in verify_rest_response
    raise MlflowException(
mlflow.exceptions.MlflowException: API request to endpoint /dbfs/Shared/mlops_experiments/2e3c9e5a38164e9ab83c880264faa2d5/artifacts/training_confusion_matrix.png failed with error code 409 != 200. Response body: '<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/2e3c9e5a38164e9ab83c880264faa2d5/artifacts/training_confusion_matrix.png&apos;</title>
</head>
<body><h2>HTTP ERROR 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/2e3c9e5a38164e9ab83c880264faa2d5/artifacts/training_confusion_matrix.png&apos;</h2>
<table>
<tr><th>URI:</th><td>/dbfs/Shared/mlops_experiments/2e3c9e5a38164e9ab83c880264faa2d5/artifacts/training_confusion_matrix.png</td></tr>
<tr><th>STATUS:</th><td>409</td></tr>
<tr><th>MESSAGE:</th><td>File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/2e3c9e5a38164e9ab83c880264faa2d5/artifacts/training_confusion_matrix.png&apos;</td></tr>
<tr><th>SERVLET:</th><td>-</td></tr>
</table>

</body>
</html>

Error after editing .venv/lib/python3.10/site-packages/mlflow/recipes/steps/train.py (line 368) to add

mlflow.autolog(disable=True)

Note that I also added some extra logging statements for debugging; they appear in the output below.

Run MLflow Recipe step: evaluate
2024/04/08 14:51:52 INFO mlflow.recipes.step: Running step evaluate...
2024/04/08 14:51:56 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/04/08 14:51:56 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/04/08 14:51:56 INFO mlflow.models.evaluation.default_evaluator: The evaluation dataset is inferred as binary dataset, positive label is 1, negative label is 0.
2024/04/08 14:51:56 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2024/04/08 14:51:56 INFO mlflow.models.evaluation.default_evaluator: Debugging...
2024/04/08 14:51:56 INFO mlflow.models.evaluation.default_evaluator: prefix in artifact_file_local_path:val_
2024/04/08 14:51:56 INFO mlflow.models.evaluation.default_evaluator: Debugging...
2024/04/08 14:51:56 INFO mlflow.models.evaluation.default_evaluator: artifact_file_local_path:/tmp/tmpnhxqoytp/val_confusion_matrix.png
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/step.py", line 132, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/evaluate.py", line 214, in _run
    eval_result = mlflow.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 2102, in evaluate
    evaluate_result = _evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 1252, in _evaluate
    eval_result = evaluator.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1966, in evaluate
    evaluation_result = self._evaluate(model, is_baseline_model=False)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1862, in _evaluate
    self._log_artifacts()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1674, in _log_artifacts
    self._log_confusion_matrix()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1357, in _log_confusion_matrix
    self._log_image_artifact(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 769, in _log_image_artifact
    mlflow.log_artifact(artifact_file_local_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1057, in log_artifact
    MlflowClient().log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1189, in log_artifact
    self._tracking_client.log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 560, in log_artifact
    artifact_repo.log_artifact(local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 117, in log_artifact
    self._databricks_api_request(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 62, in _databricks_api_request
    return http_request_safe(host_creds=host_creds, endpoint=endpoint, method=method, **kwargs)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 145, in http_request_safe
    return verify_rest_response(response, endpoint)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 158, in verify_rest_response
    raise MlflowException(
mlflow.exceptions.MlflowException: API request to endpoint /dbfs/Shared/mlops_experiments/********/artifacts/val_confusion_matrix.png failed with error code 409 != 200. Response body: '<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/************/artifacts/val_confusion_matrix.png&apos;</title>
</head>

Strangely, the same fix of manually disabling autologging did not work in evaluate.py (.venv/lib/python3.10/site-packages/mlflow/recipes/steps/evaluate.py).

Tracking information

System information: Linux #1 SMP Fri Jan 27 02:56:13 UTC 2023
Python version: 3.10.6
MLflow version: 2.11.3
MLflow module location: /home/**/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/__init__.py
Tracking URI: databricks
Registry URI: databricks
MLflow environment variables: 
  MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR: false
  MLFLOW_EXPERIMENT_NAME: /Users/***@***.com/experiments/wine_classification_experiment
  MLFLOW_TRACKING_URI: databricks
MLflow dependencies: 
  Flask: 3.0.3
  Jinja2: 3.1.3
  alembic: 1.13.1
  azure-storage-file-datalake: 12.14.0
  boto3: 1.34.79
  botocore: 1.34.79
  click: 8.1.7
  cloudpickle: 3.0.0
  docker: 7.0.0
  entrypoints: 0.4
  gitpython: 3.1.43
  google-cloud-storage: 2.16.0
  graphene: 3.3
  gunicorn: 21.2.0
  importlib-metadata: 7.1.0
  markdown: 3.6
  matplotlib: 3.8.4
  numpy: 1.26.4
  packaging: 23.2
  pandas: 2.2.1
  protobuf: 4.25.3
  pyarrow: 15.0.2
  pytz: 2024.1
  pyyaml: 6.0.1
  querystring-parser: 1.2.4
  requests: 2.31.0
  scikit-learn: 1.4.1.post1
  scipy: 1.13.0
  sqlalchemy: 2.0.29
  sqlparse: 0.4.4

Code to reproduce issue

git clone https://github.com/mlflow/recipes-examples.git

cd recipes-examples/
python -m venv env
source env/bin/activate
pip install -r requirements.txt

cd classification/

# Configure MLflow to communicate with a Databricks-hosted tracking server
export MLFLOW_TRACKING_URI=databricks
# Specify the workspace hostname and token
export DATABRICKS_HOST="..."
export DATABRICKS_TOKEN="...."

# also changed profiles/databricks.yaml to set tracking and model registry uri
experiment:
  name: "/Users/***@***.com/experiments/wine_classifier_experiment"
  tracking_uri: "databricks"
  artifact_location: "dbfs:/Shared/mlops_experiments"

model_registry:
  registry_uri: "databricks-uc" 
  model_name: "sandbox.tests.red_wine_classifier"


mlflow recipes run --profile databricks

Stack trace

2024/04/08 16:53:58 INFO mlflow.recipes.recipe: Creating MLflow Recipe 'classification' with profile: 'databricks'
2024/04/08 16:53:58 INFO mlflow.utils.databricks_utils: No workspace ID specified; if your Databricks workspaces share the same host URL, you may want to specify the workspace ID (along with the host information in the secret manager) for run lineage tracking. For more details on how to specify this information in the secret manager, please refer to the Databricks MLflow documentation.
2024/04/08 16:53:58 INFO mlflow.utils.databricks_utils: No workspace ID specified; if your Databricks workspaces share the same host URL, you may want to specify the workspace ID (along with the host information in the secret manager) for run lineage tracking. For more details on how to specify this information in the secret manager, please refer to the Databricks MLflow documentation.
2024/04/08 16:53:58 INFO mlflow.utils.databricks_utils: No workspace ID specified; if your Databricks workspaces share the same host URL, you may want to specify the workspace ID (along with the host information in the secret manager) for run lineage tracking. For more details on how to specify this information in the secret manager, please refer to the Databricks MLflow documentation.
Run MLflow Recipe step: ingest
2024/04/08 16:53:59 INFO mlflow.recipes.step: Running step ingest...
Loading dataset CSV using `pandas.read_csv()` with default arguments and assumed index column 0 which may not produce the desired schema. If the schema is not correct, you can adjust it by modifying the `load_file_as_dataframe()` function in `steps/ingest.py`
Loading dataset CSV using `pandas.read_csv()` with default arguments and assumed index column 0 which may not produce the desired schema. If the schema is not correct, you can adjust it by modifying the `load_file_as_dataframe()` function in `steps/ingest.py`
Run MLflow Recipe step: split
2024/04/08 16:54:00 INFO mlflow.recipes.step: Running step split...
/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/split.py:133: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
  return data_subset.applymap(func)
/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/split.py:133: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
  return data_subset.applymap(func)
/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/split.py:133: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
  return data_subset.applymap(func)
/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/split.py:133: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
  return data_subset.applymap(func)
Run MLflow Recipe step: transform
2024/04/08 16:54:01 INFO mlflow.recipes.step: Running step transform...
Run MLflow Recipe step: train
2024/04/08 16:54:02 INFO mlflow.recipes.step: Running step train...
2024/04/08 16:54:04 INFO mlflow.recipes.steps.train: Training data has less than 5000 rows, skipping rebalancing.
2024/04/08 16:54:13 WARNING mlflow.sklearn: Model was missing function: predict. Not logging python_function flavor!
2024/04/08 16:54:14 WARNING mlflow.models.model: Model logged without a signature. Signatures will be required for upcoming model registry features as they validate model inputs and denote the expected schema of model outputs. Please visit https://www.mlflow.org/docs/2.11.3/models.html#set-signature-on-logged-model for instructions on setting a model signature on your logged model.
2024/04/08 16:54:30 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2024/04/08 16:54:30 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/04/08 16:54:30 INFO mlflow.models.evaluation.default_evaluator: The evaluation dataset is inferred as binary dataset, positive label is 1, negative label is 0.
2024/04/08 16:54:30 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/step.py", line 132, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/train.py", line 486, in _run
    eval_result = mlflow.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 2102, in evaluate
    evaluate_result = _evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 1252, in _evaluate
    eval_result = evaluator.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1962, in evaluate
    evaluation_result = self._evaluate(model, is_baseline_model=False)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1858, in _evaluate
    self._log_artifacts()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1670, in _log_artifacts
    self._log_confusion_matrix()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1353, in _log_confusion_matrix
    self._log_image_artifact(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 765, in _log_image_artifact
    mlflow.log_artifact(artifact_file_local_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1057, in log_artifact
    MlflowClient().log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1189, in log_artifact
    self._tracking_client.log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 560, in log_artifact
    artifact_repo.log_artifact(local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 117, in log_artifact
    self._databricks_api_request(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 62, in _databricks_api_request
    return http_request_safe(host_creds=host_creds, endpoint=endpoint, method=method, **kwargs)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 145, in http_request_safe
    return verify_rest_response(response, endpoint)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 158, in verify_rest_response
    raise MlflowException(
mlflow.exceptions.MlflowException: API request to endpoint /dbfs/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png failed with error code 409 != 200. Response body: '<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</title>
</head>
<body><h2>HTTP ERROR 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</h2>
<table>
<tr><th>URI:</th><td>/dbfs/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png</td></tr>
<tr><th>STATUS:</th><td>409</td></tr>
<tr><th>MESSAGE:</th><td>File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</td></tr>
<tr><th>SERVLET:</th><td>-</td></tr>
</table>

</body>
</html>
'
make: *** [Makefile:40: steps/train/outputs/run_id] Error 1
Traceback (most recent call last):
  File "/home/***/work/mlops-poc/.venv/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/cli.py", line 44, in run
    Recipe(profile=profile).run(step)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/classification/v1/recipe.py", line 266, in run
    return super().run(step=step)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/recipe.py", line 111, in run
    raise MlflowException(
mlflow.exceptions.MlflowException: Failed to run recipe 'classification':
The following error occurred while running step 'Step:train':
Traceback (most recent call last):
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/step.py", line 132, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/train.py", line 486, in _run
    eval_result = mlflow.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 2102, in evaluate
    evaluate_result = _evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/base.py", line 1252, in _evaluate
    eval_result = evaluator.evaluate(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1962, in evaluate
    evaluation_result = self._evaluate(model, is_baseline_model=False)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1858, in _evaluate
    self._log_artifacts()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1670, in _log_artifacts
    self._log_confusion_matrix()
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 1353, in _log_confusion_matrix
    self._log_image_artifact(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/models/evaluation/default_evaluator.py", line 765, in _log_image_artifact
    mlflow.log_artifact(artifact_file_local_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1057, in log_artifact
    MlflowClient().log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1189, in log_artifact
    self._tracking_client.log_artifact(run_id, local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 560, in log_artifact
    artifact_repo.log_artifact(local_path, artifact_path)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 117, in log_artifact
    self._databricks_api_request(
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 62, in _databricks_api_request
    return http_request_safe(host_creds=host_creds, endpoint=endpoint, method=method, **kwargs)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 145, in http_request_safe
    return verify_rest_response(response, endpoint)
  File "/home/***/work/mlops-poc/.venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 158, in verify_rest_response
    raise MlflowException(
mlflow.exceptions.MlflowException: API request to endpoint /dbfs/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png failed with error code 409 != 200. Response body: '<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</title>
</head>
<body><h2>HTTP ERROR 409 File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</h2>
<table>
<tr><th>URI:</th><td>/dbfs/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png</td></tr>
<tr><th>STATUS:</th><td>409</td></tr>
<tr><th>MESSAGE:</th><td>File already exists, cannot overwrite: &apos;/Shared/mlops_experiments/086c91cbe49f40558113010bd2008abd/artifacts/training_confusion_matrix.png&apos;</td></tr>
<tr><th>SERVLET:</th><td>-</td></tr>
</table>

</body>
</html>
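
The 409 response bodies in the traces above are HTML with the conflicting path embedded in an entity-escaped message. When triaging many such failures, a small parser can pull the path out. This is a minimal sketch using only the standard library: conflicting_artifact_path is a hypothetical helper, and it assumes the message wording stays exactly as shown in this report.

```python
import html
import re

# Matches the message text seen in the 409 bodies in this report.
_CONFLICT_RE = re.compile(r"File already exists, cannot overwrite: '([^']+)'")


def conflicting_artifact_path(response_body):
    """Return the path named in a 'File already exists' body, or None."""
    # The body escapes the quotes as &apos;, so decode entities first.
    match = _CONFLICT_RE.search(html.unescape(response_body))
    return match.group(1) if match else None
```

The same function could be applied to the text of the raised MlflowException, since that message includes the response body.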

Other info / logs

Commenting out the upload step in _log_image_artifact (default_evaluator.py) gets the recipe to run, but

def _log_image_artifact(
    self,
    do_plot,
    artifact_name,
):
    from matplotlib import pyplot

    artifact_file_name = f"{artifact_name}.png"
    artifact_file_local_path = self.temp_dir.path(artifact_file_name)

    try:
        pyplot.clf()
        do_plot()
        pyplot.savefig(artifact_file_local_path, bbox_inches="tight")
    finally:
        pyplot.close(pyplot.gcf())

    # Upload commented out as a local workaround for the 409 error:
    # mlflow.log_artifact(artifact_file_local_path)
    artifact = ImageEvaluationArtifact(uri=mlflow.get_artifact_uri(artifact_file_name))
    artifact._load(artifact_file_local_path)
    self.artifacts[artifact_name] = artifact

it then fails when registering the model in Databricks Unity Catalog, because a tag name containing . is not allowed:

mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Tag name mlflow.source.type is not valid

To get past that, I had to edit .venv/lib/python3.10/site-packages/mlflow/utils/mlflow_tags.py and replace the . with _ in these tag names:

MLFLOW_SOURCE_TYPE = "mlflow_source_type"  # "mlflow.source.type"
MLFLOW_RECIPE_TEMPLATE_NAME = "mlflow_pipeline_template_name"  # "mlflow.pipeline.template.name"
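
A less invasive variant of the same idea would be to rewrite tag names just before they are sent to the registry, rather than editing the constants inside the installed package. The sketch below is illustrative only: sanitize_tag_keys is a hypothetical helper, not an MLflow API, and the only rule it assumes is the one observed above (a . in the tag name is rejected). Where such a function would need to be called is not shown here.

```python
def sanitize_tag_keys(tags, replacement="_"):
    """Return a copy of `tags` with '.' in each key replaced.

    Hypothetical helper; tag values are left untouched. Raises ValueError
    if two keys become identical after replacement, so no tag is silently
    dropped.
    """
    sanitized = {}
    for key, value in tags.items():
        new_key = key.replace(".", replacement)
        if new_key in sanitized:
            raise ValueError(f"Tag keys collide after sanitizing: {new_key!r}")
        sanitized[new_key] = value
    return sanitized
```

For example, sanitize_tag_keys({"mlflow.source.type": "RECIPE"}) returns {"mlflow_source_type": "RECIPE"}, matching the manual edit above.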

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Saugat168 added the bug label Apr 8, 2024
github-actions bot added the area/artifacts, area/examples, area/recipes, and integrations/databricks labels Apr 8, 2024

harupy (Member) commented Apr 10, 2024

@Saugat168 Thanks for reporting this! It looks like we need to disable logging artifacts in autolog to avoid collisions.

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
