Replace Window shell-escaping of databricks run command with bash escaping #10811

Merged
merged 3 commits into from Jan 12, 2024

wolpl
Contributor

@wolpl wolpl commented Jan 11, 2024

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10811/merge

Checkout with GitHub CLI

gh pr checkout 10811

Related Issues/PRs

#8981 MLflow Project Runs Failing on Azure Databricks with "Invalid backend config JSON" Error

What changes are proposed in this pull request?

Launching an MLflow Projects run on Databricks from Windows currently fails with a JSONDecodeError, as described in the issue linked above.
The cause is that shell escaping is applied to the parameters when the MLflow run command, which is later executed on Databricks, is constructed. The escaping uses the quote() function from mlflow.utils.string_utils, which applies Windows command-line escaping on Windows machines:

def quote(s):
    return mslex_quote(s) if os.name == "nt" else shlex.quote(s)

Because the command is then sent to Databricks and executed on a Linux machine, the parameters cannot be parsed there and the run fails.

Consequently, bash escaping should always be used when assembling the mlflow run command for Databricks, regardless of the OS that launches the run.
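
As a rough illustration of the idea (not the actual diff in this PR), the sketch below always applies POSIX quoting via the standard library's shlex.quote, independent of os.name. The helper name build_remote_command, the project path, and the run ID are hypothetical placeholders.

```python
import json
import shlex


def build_remote_command(args):
    """Join args into one command line, using POSIX (bash) quoting on every OS.

    Hypothetical helper for illustration; it never consults os.name.
    """
    return " ".join(shlex.quote(arg) for arg in args)


# Placeholder backend config; the run ID is made up for this example.
backend_config = json.dumps({"_mlflow_local_backend_run_id": "abc123"})

command = build_remote_command(
    ["mlflow", "run", "/path/to/project", "-c", backend_config]
)
print(command)
# mlflow run /path/to/project -c '{"_mlflow_local_backend_run_id": "abc123"}'

# Round trip: POSIX-style splitting recovers the JSON argument unchanged.
recovered = json.loads(shlex.split(command)[-1])
print(recovered["_mlflow_local_backend_run_id"])  # abc123
```

shlex.quote is pure Python, so it should produce the same single-quoted output whether the launching machine runs Windows, macOS, or Linux.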

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Before this change, when running on Windows, the mlflow_run_cmd variable assigned in

mlflow_run_cmd = " ".join([quote(elem) for elem in mlflow_run_arr])
was set to, for example:

mlflow run /databricks/mlflow/projects/d0500... -c ^"{\^"_mlflow_local_backend_run_id\^": \^"af4...\^"}^"

Now it is:

mlflow run /databricks/mlflow/projects/d0500.... -c '{"_mlflow_local_backend_run_id": "4d0..."}'

In both commands I omitted the entry point and shortened the run ID and project ID. The first command clearly cannot be parsed by bash on Databricks, because it is formatted for a Windows command line.
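
The difference can be reproduced locally with a small sketch. It assumes that shlex.split in its default POSIX mode is a reasonable stand-in for how bash splits the command line; the project path and run ID are hypothetical placeholders, and backend_config_parses is a helper invented for this example.

```python
import json
import shlex

# Windows cmd-style escaping, shaped like the "before" command above.
# The project path and run ID are placeholders, not real values.
windows_escaped = (
    r"mlflow run /path/to/project -c "
    r'^"{\^"_mlflow_local_backend_run_id\^": \^"abc123\^"}^"'
)

# POSIX (bash) escaping, shaped like the "after" command above.
posix_escaped = (
    "mlflow run /path/to/project -c "
    '\'{"_mlflow_local_backend_run_id": "abc123"}\''
)


def backend_config_parses(command):
    """Split the command with POSIX rules, then try to decode the last argument as JSON."""
    last_arg = shlex.split(command)[-1]
    try:
        json.loads(last_arg)
    except json.JSONDecodeError:
        return False
    return True


print(backend_config_parses(windows_escaped))  # False
print(backend_config_parses(posix_escaped))    # True
```

Under POSIX rules the carets are ordinary characters and the double quotes are consumed as shell quoting, so the argument that reaches the JSON parser no longer starts with a brace or contains any quoted keys.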

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Fix a JSONDecodeError crash in MLflow Projects runs on Databricks that are launched from Windows.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

…aping

Signed-off-by: wolpl <18027263+wolpl@users.noreply.github.com>
@github-actions github-actions bot added the area/projects (MLproject format, project running backends), area/windows (Issue is unique to windows.), integrations/databricks (Databricks integrations), and rn/bug-fix (Mention under Bug Fixes in Changelogs.) labels Jan 11, 2024

github-actions bot commented Jan 11, 2024

Documentation preview for 4964321 will be available here when this CircleCI job completes successfully.


@BenWilson2
Member

@wolpl great catch! Triggered CI for the change :)

@harupy
Member

harupy commented Jan 12, 2024

@mlflow-automation autoformat

Member

@harupy harupy left a comment

LGTM!

@harupy harupy merged commit e50495d into mlflow:master Jan 12, 2024
36 checks passed
@wolpl wolpl deleted the fix-databricks-run-from-windows branch January 12, 2024 06:54
4 participants