[BUG] MLflow Project Runs Failing on Azure Databricks with "Invalid backend config JSON" Error #8981
Comments
I've also raised an issue with the documentation in case it is a Databricks issue rather than an MLflow issue: MicrosoftDocs/azure-docs#111832
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
Hi @DJSaunders1997, could you try passing the content of your cluster-spec.json as a string directly to --backend-config on the command line to see if it works? The error message shows that this parameter is not being loaded correctly.
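A minimal sketch of preparing that inline string, assuming a POSIX shell such as Git Bash will run the command. It parses the spec first so malformed JSON fails locally, re-serialises it onto a single line, and quotes it with `shlex.quote`. The helper name `inline_backend_config`, the `<project-uri>` placeholder, and the trimmed example spec are all hypothetical; substitute the contents of your own cluster-spec.json.

```python
import json
import shlex


def inline_backend_config(spec_text: str) -> str:
    """Return a shell-quoted, single-line JSON string for --backend-config.

    Hypothetical helper: parsing first means malformed JSON fails here,
    on your machine, instead of inside the Databricks job.
    """
    spec = json.loads(spec_text)  # raises json.JSONDecodeError if malformed
    compact = json.dumps(spec, separators=(",", ":"))  # one line, no spaces
    return shlex.quote(compact)  # POSIX (bash) quoting


if __name__ == "__main__":
    # Placeholder spec, trimmed for illustration; use your real cluster spec.
    example_spec = '{"spark_version": "13.2.x-scala2.12", "num_workers": 1}'
    arg = inline_backend_config(example_spec)
    print(f"mlflow run <project-uri> --backend databricks --backend-config {arg}")
```

Because the JSON is wrapped in single quotes, bash passes it through unchanged; no escaping of the inner double quotes is needed.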
Hi, thanks for your response. I ran the command in Git Bash, using the latest spark_version runtime:
and I still get the same error:
Is there a different way I should represent the cluster spec as a string?
Python API
I've also attempted to run projects using the Python API, where backend_config has been passed as:
all of which give the same error on Databricks. Do you have any other ideas of what I should try next? Thanks!
@DJSaunders1997 Could you open a ticket with the Azure Databricks support team? They should be able to help you with it :)
Hi. Is there any progress on this? Any workaround? Thanks.
The JSONDecodeError occurs because MLflow uses Windows command-line escaping to assemble a command that is then executed in bash on Databricks. More info in #10811. As a workaround, you can patch the quoting function before calling mlflow.run:
import shlex
import mlflow
mlflow.projects.databricks.quote = shlex.quote  # HACK to fix the encoding of the MLflow backend configuration
mlflow.run(...)
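The failure mode described above can be reproduced locally without Databricks. This is an illustration under assumptions: `cmd_style_quote` is a hypothetical stand-in for Windows/cmd-style escaping (wrap the argument in double quotes and double any embedded double quotes) and is not MLflow's actual quoting code, and `shlex.split` in POSIX mode is used to approximate how bash removes quotes from the assembled command line. The example config is a placeholder.

```python
import json
import shlex

# Placeholder backend config; the real one comes from cluster-spec.json.
CONFIG = json.dumps({"spark_version": "13.2.x-scala2.12", "num_workers": 1})


def cmd_style_quote(s: str) -> str:
    """Hypothetical Windows/cmd-style quoting: wrap in "..." and double inner quotes."""
    return '"' + s.replace('"', '""') + '"'


def as_seen_by_bash(quoted: str) -> str:
    """Approximate bash quote removal for a single argument (POSIX shlex rules)."""
    (word,) = shlex.split(quoted)  # expect exactly one shell word
    return word


def parses(s: str) -> bool:
    """Return True if s is valid JSON."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    for name, quote in [("cmd-style", cmd_style_quote), ("shlex.quote", shlex.quote)]:
        received = as_seen_by_bash(quote(CONFIG))
        print(f"{name:12} -> {received!r}  valid JSON: {parses(received)}")
```

Under these assumptions, bash treats each doubled `""` as an empty quoted string, so every double quote in the JSON is stripped and the receiving side sees `{spark_version: ...}`, which `json.loads` rejects. Swapping in `shlex.quote`, as the workaround does, yields an argument that survives quote removal intact.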
Issues Policy acknowledgement
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
MLflow version
Client:
$ mlflow --version
mlflow, version 2.4.1
System information
Describe the problem
Following Azure Databricks MLFlow Project Documentation
I've followed the official Azure Databricks instructions on how to configure MLflow to run projects and copied the example cluster-spec.json: https://learn.microsoft.com/en-us/azure/databricks/mlflow/projects
cluster-spec.json
MLflow appears to be working, as I can run commands such as
$ mlflow experiments search
and get results from my Databricks tracking server. Running the command:
successfully creates a job run in Databricks.
However, this job fails to run on Databricks due to an issue installing MLflow:
I believe this failure is due to an incompatibility between Databricks Runtime 7.x and the latest version of MLflow:
https://docs.databricks.com/release-notes/runtime/releases.html#mlflow-compatibility-matrix
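Before resubmitting, a small pre-flight check can confirm which runtime the cluster spec actually requests. This is a sketch assuming the spec has a top-level `spark_version` key in the usual `<major>.<minor>.x-...` form (as in the strings quoted in this issue); `runtime_version` and `check_runtime` are hypothetical helper names, and the minimum runtime is deliberately left as a parameter to be looked up in the compatibility matrix above rather than hard-coded here.

```python
import json
import re
from typing import Tuple

# Matches the leading "<major>.<minor>.x-" part of a Databricks spark_version
# string, e.g. "13.2.x-cpu-ml-scala2.12" or "7.3.x-scala2.12".
_RUNTIME_RE = re.compile(r"^(\d+)\.(\d+)\.x-")


def runtime_version(spark_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a spark_version string."""
    match = _RUNTIME_RE.match(spark_version)
    if match is None:
        raise ValueError(f"Unrecognised spark_version: {spark_version!r}")
    return int(match.group(1)), int(match.group(2))


def check_runtime(spec_text: str, minimum: Tuple[int, int]) -> bool:
    """Return True if the spec's runtime is at least `minimum` (major, minor).

    `minimum` is an input: take it from the compatibility matrix for the
    MLflow version you are running.
    """
    spec = json.loads(spec_text)
    return runtime_version(spec["spark_version"]) >= minimum
```

Usage would look like `check_runtime(open("cluster-spec.json").read(), minimum)`. Tuples compare element by element, so `(13, 2) >= (7, 3)` compares the major version first and only falls back to the minor version on a tie.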
Bumping up Databricks runtime
Based on the compatibility matrix, I've amended my cluster-spec.json to use the latest runtime, which should be compatible with the latest MLflow version.
However, this run also fails:
Full Standard error trace
The 'Invalid backend config JSON' error would suggest an issue with the cluster spec; however, the cluster was created without issue and only the job run has failed.
The same error is also shown when running with the latest ML runtime:
13.2.x-cpu-ml-scala2.12
I'm not sure what else to try.
Let me know if there's any more info I can give regarding this issue :)
Tracking information
Code to reproduce issue
Stack trace
Full Standard error trace
Other info / logs
Standard output trace
What component(s) does this bug affect?
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
What language(s) does this bug affect?
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations