Fix for windows traversal attack #10647

Merged: 4 commits, Dec 8, 2023

BenWilson2 (Member) commented Dec 8, 2023

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10647/merge

Checkout with GitHub CLI

gh pr checkout 10647

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

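Changed posixpath.basename to os.path.basename so that the file name is extracted using the host platform's path rules. On Windows this also splits on backslashes, which protects against backslash-separated traversal paths.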
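
A minimal sketch of why the swap matters, not taken from the MLflow code base. It uses ntpath directly (the module that os.path resolves to on Windows) so that it behaves the same on any OS. The URL path and the destination directory are made-up values for illustration.

```python
import ntpath      # Windows path rules; os.path is this module on Windows
import posixpath   # POSIX path rules

# Hypothetical URL path carrying backslash-separated traversal segments.
url_path = "/files/..\\..\\evil.txt"

# posixpath splits only on "/", so the backslashes survive in the "file name".
posix_name = posixpath.basename(url_path)

# ntpath splits on both "/" and "\", so only the final component remains.
nt_name = ntpath.basename(url_path)

# Hypothetical Windows destination directory.
dst_dir = "C:\\tmp\\downloads"

# Joining the POSIX-style result onto the destination escapes the directory.
escaped = ntpath.normpath(ntpath.join(dst_dir, posix_name))

# Joining the Windows-style result stays inside the directory.
contained = ntpath.normpath(ntpath.join(dst_dir, nt_name))

print(posix_name)
print(nt_name)
print(escaped)
print(contained)
```

On non-Windows hosts os.path is posixpath, so the change should make no difference there.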

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Provide a fix for a potential path traversal attack on Windows when using the mlflow.data API to load files in a Windows environment.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

github-actions bot commented Dec 8, 2023

Documentation preview for 260ec7f will be available here when this CircleCI job completes successfully.


github-actions bot added labels on Dec 8, 2023: area/server-infra (MLflow Tracking server backend), area/windows (Issue is unique to windows), rn/bug-fix (Mention under Bug Fixes in Changelogs).
@@ -70,8 +69,8 @@ def load(self, dst_path=None) -> str:
                     f"Invalid filename in Content-Disposition header: {basename}. "
                     "It must be a file name, not a path."
                 )
-        elif path is not None and len(posixpath.basename(path)) > 0:
-            basename = posixpath.basename(path)
+        elif path is not None and len(os.path.basename(path)) > 0:
Member

What does len(os.path.basename(path)) > 0 check?

Member

Does this check that basename is not an empty string?

Member Author

I tried to not mess with the original implementation for this fix.

It's definitely an odd way of handling assignment of the basename. I'll adjust the logic to simplify a bit (without touching too much) just to make this more readable and prevent repeated unnecessary checks on that path.
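
One way such a simplification could look, as a hedged sketch rather than the change that was actually merged: compute the basename once and fall back to a default when it is empty. The helper name choose_basename and the fallback value are hypothetical.

```python
import os

# Hypothetical fallback name, used only for illustration.
FALLBACK_NAME = "downloaded_file"


def choose_basename(path, fallback=FALLBACK_NAME):
    """Return the last component of path, or fallback if there is none."""
    # Evaluate basename once. An empty string is falsy, so `or` covers both
    # a missing path and a path that ends in a separator.
    basename = os.path.basename(path) if path is not None else ""
    return basename or fallback


print(choose_basename("/data/train.csv"))
print(choose_basename("/data/"))
print(choose_basename(None))
```

This keeps the original behavior of the two-step check while calling os.path.basename only once per path.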

@@ -70,8 +69,8 @@ def load(self, dst_path=None) -> str:
                     f"Invalid filename in Content-Disposition header: {basename}. "
                     "It must be a file name, not a path."
                 )
-        elif path is not None and len(posixpath.basename(path)) > 0:
-            basename = posixpath.basename(path)
+        elif path is not None and len(os.path.basename(path)) > 0:
Member

/ai Can you guess what len(os.path.basename(path)) > 0 is for?


@harupy

The len(os.path.basename(path)) > 0 code is checking if the base name of the file path, i.e., the file or directory name at the end of the path, is not empty. If the length of this base name is greater than 0, that means there is a valid file or directory name present.
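
A small illustration of when that check fails, using made-up paths. Forward slashes are used so the results are the same under POSIX and Windows path rules.

```python
import os

# Hypothetical paths for illustration only.
with_file = os.path.basename("/datasets/train.csv")   # path ends in a file name
trailing_sep = os.path.basename("/datasets/")          # path ends in a separator
empty = os.path.basename("")                           # empty path

# The length check is True only when a final component exists.
for name in (with_file, trailing_sep, empty):
    print(repr(name), len(name) > 0)
```

In practice the check mainly filters out paths that end in a separator, for which basename returns an empty string.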

Click here to see the usage data
{
  "prompt_tokens": 72,
  "completion_tokens": 68,
  "total_tokens": 140,
  "estimated_cost_in_usd": 0.00624
}

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
harupy (Member) left a comment

LGTM, let's ask the reporter to validate the fix.

Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
BenWilson2 (Member Author)

Received confirmation from researcher that this PR fixes the security issue. Merging.

BenWilson2 merged commit 1c6309f into mlflow:master on Dec 8, 2023
36 checks passed
BenWilson2 deleted the fix-path-traversal-windows branch on December 8, 2023 15:15
harupy added a commit that referenced this pull request Dec 14, 2023
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>