Skip to content

[security] fix(files): contain stored upload paths#92

Merged
mbakgun merged 1 commit into
heymrun:mainfrom
Hinotoi-agent:fix/file-upload-path-containment
May 10, 2026
Merged

[security] fix(files): contain stored upload paths#92
mbakgun merged 1 commit into
heymrun:mainfrom
Hinotoi-agent:fix/file-upload-path-containment

Conversation

@Hinotoi-agent
Copy link
Copy Markdown
Contributor

Summary

This PR fixes a file-storage path traversal issue in the manual upload flow. The upload endpoint accepted the client-controlled multipart filename and passed it directly into the stored path under the configured file-storage root.

The patch now rejects filenames with path components, validates resolved paths before disk access, converts invalid upload filenames into a 400 response, and adds focused regression coverage for POSIX traversal, Windows-style separators, valid uploads, and legacy escaped storage paths.

Security issues covered

Issue Severity Affected surface Fix
Authenticated upload filename traversal can escape the configured file-storage root High POST /api/files/upload and shared file-storage helpers Reject path-component filenames and enforce resolved-path containment

Before this PR

  • upload_file() passed UploadFile.filename directly to store_file().
  • store_file() built relative_path = f"{owner_id}/{file_uuid}/{filename}" without rejecting .., /, \\, drive prefixes, or NUL bytes.
  • The write target was computed as _storage_root() / relative_path without checking that the resolved destination stayed under the storage root.
  • get_file_path() also trusted GeneratedFile.storage_path, so legacy or malicious stored paths could resolve outside the storage root.

After this PR

  • Stored filenames must be simple filenames, not paths.
  • POSIX traversal, Windows-style separators, drive/root prefixes, empty filenames, . / .., and NUL bytes are rejected.
  • Disk paths are resolved and verified to remain under the configured storage root before write/read/delete helpers use them.
  • Manual uploads return 400 Bad Request for invalid filenames instead of allowing the error to become an internal server error.

Why this matters

A filename in a multipart upload is attacker-controlled input. When it is appended into a filesystem path without normalization or containment checks, an authenticated user can write arbitrary bytes outside the intended storage directory, limited by the backend process permissions.

Depending on deployment layout and writable paths, that can corrupt application data, overwrite app-owned files, poison served content, or become code execution if an executed or reloaded path is writable.

Attack flow

  1. An attacker signs in as any user.
  2. The attacker sends POST /api/files/upload with a multipart filename such as ../../../poc-owned-by-attacker.txt.
  3. The backend appends that filename under {storage_root}/{owner_id}/{file_uuid}/.
  4. Without containment, the resolved path escapes the configured storage root and writes attacker-controlled bytes as the backend process user.

Affected code

  • backend/app/api/files.py
    • upload_file()
  • backend/app/services/file_storage.py
    • store_file()
    • get_file_path()
    • delete_file() through get_file_path()

Root cause

  • The upload filename crossed from an HTTP client boundary into a filesystem path without being treated as untrusted.
  • The storage layer constructed paths by string interpolation instead of limiting storage filenames to a single path segment.
  • The storage layer did not verify resolved-path containment before filesystem operations.

CVSS assessment

  • CVSS v3.1: 7.6 High
  • Vector: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:H/A:L
  • Rationale: exploitation is network-reachable for an authenticated low-privilege user and requires no user interaction. The primary impact is integrity because attacker-controlled bytes can be written outside the intended storage directory; confidentiality and availability impacts are deployment-dependent but plausible through app-owned file overwrite or corruption.

Safe reproduction steps

On vulnerable code, the issue can be reproduced with a local harness that uses the same storage path construction:

  1. Configure a temporary storage root.
  2. Call the storage path construction with an authenticated owner id, a generated file id, and filename ../../../poc-owned-by-attacker.txt.
  3. Write bytes through the computed path.
  4. Observe the file being created outside the temporary storage root.

This PR adds regression tests that exercise this behavior safely without writing outside the temporary test directory.

Expected vulnerable behavior

On vulnerable code, a filename containing traversal segments resolves outside the configured file-storage root and writes attacker-controlled content there.

Changes in this PR

  • Add filename validation for storage filenames.
  • Reject path components and traversal before any disk write.
  • Add a resolved-path containment helper for storage paths.
  • Use the containment helper in store_file() and get_file_path().
  • Return 400 Bad Request for invalid manual upload filenames.
  • Add regression tests for traversal rejection, Windows-style path rejection, safe-file persistence, and legacy escaped path rejection.

Files changed

Category Files What changed
API handling backend/app/api/files.py Converts storage validation failures from manual upload into a 400 response
Storage hardening backend/app/services/file_storage.py Adds filename validation and resolved-path containment
Tests backend/tests/test_file_storage.py Adds focused storage path safety regression tests

Maintainer impact

This keeps existing safe filenames working while refusing filenames that behave as paths. Stored files still live under the same {owner_id}/{file_uuid}/{filename} layout for valid filenames.

The main behavior change is that clients attempting to upload path-like filenames now receive a client error instead of having those names accepted into storage paths.

Fix rationale

The storage boundary should not depend on callers remembering to sanitize filenames. The storage layer is the last common point before disk access, so it now enforces both input shape and resolved-path containment.

The containment check also protects reads/deletes through get_file_path() if older rows or future bugs introduce unsafe storage_path values.

Type of change

  • Security fix
  • Bug fix
  • Tests added or updated
  • Documentation update
  • Breaking change

Test plan

Commands run:

cd backend
uv run pytest tests/test_file_storage.py
uv run ruff check app/services/file_storage.py app/api/files.py tests/test_file_storage.py
uv run ruff format --check app/services/file_storage.py app/api/files.py tests/test_file_storage.py
ENCRYPTION_KEY=<64-byte-test-key> ./run_tests.sh

Result:

  • tests/test_file_storage.py: 4 passed
  • Backend test suite: 742 passed
  • Ruff check/format check passed for touched files

Note: ./check.sh could not complete in this local environment because bun is not installed (./check.sh: line 10: bun: command not found).

Disclosure notes

This PR is intentionally bounded to file-storage path containment for generated/manual-uploaded files. It does not claim broader sandboxing or protection for unrelated filesystem access paths.

@mbakgun
Copy link
Copy Markdown
Contributor

mbakgun commented May 10, 2026

Thanks ! 🙏

@mbakgun mbakgun merged commit 835843e into heymrun:main May 10, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants