Disable '..' in query string for artifact URI #10653

B-Step62 · 2023-12-08T06:44:02Z

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10653/merge

Checkout with GitHub CLI

gh pr checkout 10653

What changes are proposed in this pull request?

Fixes a path traversal vulnerability.

Problem

When a user creates an experiment with a base artifact URI such as http://foo/bar, MLflow appends the sub-path /{{run_id}}/artifact to it when saving or reading each run's artifacts, giving `http://foo/bar/{{run_id}}/artifact`. Every GET request for artifacts is validated so that the requested path is under this path, which effectively prevents attackers from reading any files outside that run's directory.

However, there is one way to bypass this check: the query string. When MLflow appends /{{run_id}}/artifact, the sub-path is inserted before the query string of the specified artifact root. For example, if the artifact root is http://foo/bar?a=a, the run's artifact URI will be `http://foo/bar/{{run_id}}/artifact?a=a`.

This allows path traversal by adding a malformed query string such as "../../../../etc", which makes the run's artifact location http://foo/bar/{{run_id}}/artifact?../../../../etc. That location is then resolved to /etc as a local path.
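To make the mechanics concrete, here is a minimal sketch using only the standard library. `append_subpath` is a hypothetical stand-in for MLflow's path-appending logic, `RUN_ID` is a placeholder, and the final `normpath` call is only an assumption about how a naive local-path resolution could behave, not MLflow's actual code.

```python
import posixpath
from urllib.parse import urlparse, urlunparse


def append_subpath(uri: str, subpath: str) -> str:
    """Hypothetical helper: append `subpath` to the URI path, keeping the query."""
    parsed = urlparse(uri)
    new_path = posixpath.join(parsed.path, subpath)
    return urlunparse(parsed._replace(path=new_path))


# Artifact root whose query string carries the traversal payload.
root = "http://foo/bar?../../../../etc"
run_uri = append_subpath(root, "RUN_ID/artifact")
# The sub-path lands before the query string, which is carried over untouched:
# http://foo/bar/RUN_ID/artifact?../../../../etc

# Assumed naive resolution: treat "path?query" as one local path and normalize it.
parsed = urlparse(run_uri)
naive_local = posixpath.normpath(parsed.path + "?" + parsed.query)
print(run_uri)
print(naive_local)  # "/etc": the ".." segments climb out of the run directory
```

A check that only looks at the path component sees `/bar/RUN_ID/artifact` and finds nothing wrong, which is why the query string has to be validated separately.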

Solution

This PR resolves this by explicitly validating the query string passed as part of the artifact URI. It simply checks whether the query string contains ".." (after decoding).

Some alternatives were considered:

Ignore the query string entirely when appending {{run_id}}/artifacts => I'm afraid that in some cases we need the query string to access the artifact location, e.g. an S3 ARN can have a region as a query string.
Remove everything after "?" when resolving the local path => "?" is a valid character on some OSes.

Currently users can specify any query string in an artifact URI, for example "http:///?/../path".
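The check described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `validate_query_string` is a hypothetical name, and it raises a plain `ValueError` rather than whatever exception type MLflow uses.

```python
from urllib.parse import unquote, urlparse


def validate_query_string(uri: str) -> None:
    """Hypothetical check: reject URIs whose decoded query string contains '..'."""
    query = urlparse(uri).query
    # Decode repeatedly so that '%2E%2E' or '%252E%252E' cannot hide a '..'.
    decoded = unquote(query)
    while decoded != query:
        query, decoded = decoded, unquote(decoded)
    if ".." in query:
        raise ValueError("Invalid query string")


# An ordinary query string passes silently.
validate_query_string("http://foo/bar?a=a")

# Plain and double-encoded traversal payloads are both rejected.
for bad in ("http://foo/bar?../../../../etc", "http://foo/bar?%252E%252E/etc"):
    try:
        validate_query_string(bad)
    except ValueError as exc:
        print(f"rejected {bad!r}: {exc}")
```

Keeping the query string and only rejecting ".." leaves legitimate uses, such as a region parameter, working.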

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Validated that the fix prevents experiment creation with a malformed query string:

curl -X POST -H 'Content-Type: application/json' -d '{"name": "poc", "artifact_location": "http:///??/../../../../../../../../../../../../../../etc/"}' 'http://127.0.0.1:5000/ajax-api/2.0/mlflow/experiments/create'

{"error_code": "INVALID_PARAMETER_VALUE", "message": "Invalid query string"}

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

github-actions · 2023-12-08T06:44:19Z

Documentation preview for 3ca7f44 will be available here when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/7138521782.

harupy · 2023-12-08T06:53:13Z

mlflow/utils/uri.py

+def _decode(url):
+    # Keep decoding until the url stops changing
+    while True:
+        decoded = urllib.parse.unquote(url)
+        if decoded == url:
+            return url
+        url = decoded


/ai any chance that this function goes into an infinite loop?

@harupy

Yes, there is a chance that this function could go into an infinite loop if urllib.parse.unquote(url) continually returns a value that is not equal to the original url. However, this is extremely unlikely due to how urllib.parse.unquote() operates. It decodes percent-encoded characters in a URL into their corresponding characters. In practice, it means that with each loop, the URL should become less percent-encoded, and eventually, it should be fully decoded so that decoded == url and the function would then exit the loop. But if there's a strange behavior with unquote() or the input url constantly changes outside this function in a multithreaded environment it might cause an infinite loop.

/ai Can we use recursion instead?

@harupy

Yes, it is possible to re-implement this function using recursion. Here would be a possible solution:

def _decode(url):
    decoded = urllib.parse.unquote(url)
    if decoded == url:
        return url
    else:
        return _decode(decoded)

This version of the function does essentially the same thing as the original: it decodes the URL and continues to decode the result until decoding no longer produces any changes.

Remember though, Python has a maximum recursion depth (usually 1000). While in normal URL decoding cases you would likely never reach near it, malformed or malicious strings may cause a maximum recursion depth exceeded error. Proper error handling or a depth check should be added if this is a concern.

I'd use this:

for _ in range(100):
    decoded = urllib.parse.unquote(url)
    if decoded == url:
        return url
    url = decoded
raise ValueError("Failed to decode")  # No chance for an infinite loop

I feel like 10 is enough 🤔 Who would encode a path to such complex stuff?

Sounds good. 10 should be enough.

Makes sense, will limit it to 10:)

If we reach 10, I think that's some malicious URI.
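For reference, a self-contained sketch of the bounded loop with the cap of 10 agreed on above. The names `decode_bounded`, `MAX_DECODE_PASSES`, and `encode_layers` are hypothetical and exist only for this illustration; the merged code may differ.

```python
from urllib.parse import quote, unquote

MAX_DECODE_PASSES = 10  # cap discussed in this thread


def decode_bounded(url: str) -> str:
    """Percent-decode until the string stops changing, giving up after the cap."""
    for _ in range(MAX_DECODE_PASSES):
        decoded = unquote(url)
        if decoded == url:
            return url
        url = decoded
    raise ValueError("Failed to decode")


def encode_layers(layers: int) -> str:
    """Return '..' percent-encoded `layers` times (test helper, layers >= 1)."""
    text = "%2E%2E"  # one layer; quote() never encodes '.', so start here
    for _ in range(layers - 1):
        text = quote(text, safe="")
    return text


print(decode_bounded(encode_layers(2)))  # '%252E%252E' decodes to '..'

try:
    decode_bounded(encode_layers(10))
except ValueError as exc:
    print(f"ten layers hit the cap: {exc}")
```

Note that the final pass is spent confirming that nothing changed, so with a cap of 10 at most nine layers of encoding are accepted; a tenth layer is treated as a malicious URI and rejected.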

serena-ruan

LGTM!

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

harupy

LGTM

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

Disable '..' in query string for artifact URI

cf02002

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

B-Step62 requested review from harupy, serena-ruan and daniellok-db December 8, 2023 06:44

github-actions bot added the area/artifacts (Artifact stores and artifact logging) and rn/bug-fix (Mention under Bug Fixes in Changelogs) labels Dec 8, 2023

harupy reviewed Dec 8, 2023

serena-ruan approved these changes Dec 8, 2023

Prevent inifite decode loop

3ca7f44

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

B-Step62 force-pushed the fix_ldi branch from 8bb75f1 to 3ca7f44 Compare December 8, 2023 07:15

harupy approved these changes Dec 8, 2023

B-Step62 merged commit b4b0b8a into mlflow:master Dec 11, 2023
36 checks passed

harupy mentioned this pull request Dec 13, 2023

Run python3 dev/update_mlflow_versions.py pre-release ... #10679

Merged

44 tasks

harupy pushed a commit that referenced this pull request Dec 14, 2023

Disable '..' in query string for artifact URI (#10653)

556f1a0

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

harupy pushed a commit that referenced this pull request Dec 14, 2023

Disable '..' in query string for artifact URI (#10653)

c3b3cae

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

harupy pushed a commit that referenced this pull request Dec 14, 2023

Disable '..' in query string for artifact URI (#10653)

0f416e1

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

B-Step62 deleted the fix_ldi branch January 10, 2024 01:19

Haxatron mentioned this pull request Jan 24, 2024

Validate fragment and URL params #10880

Open

37 tasks

daniellok-db mentioned this pull request Apr 23, 2024

Add a parse/unparse step when validating whether a URI is safe #11800

Merged

39 tasks
