fix(security): reject symlinks/hardlinks in BaseFileComponent TAR extraction (GHSA-ccv6-r384-xp75)#12945

Merged
erichare merged 1 commit into release-1.9.2 from security/ghsa-ccv6-r384-xp75-tar-symlink on Apr 30, 2026

@erichare (Collaborator) commented Apr 30, 2026

Summary

Closes the arbitrary-file-read → RCE chain reported in the security advisory GHSA-ccv6-r384-xp75.

BaseFileComponent._unpack_bundle._safe_extract_tar (in src/lfx/src/lfx/base/data/base_file.py) only validated that a TAR member's name did not escape the extract directory. Symlinks were accepted, so a member like leak -> /home/langflow/.langflow/secret_key was extracted untouched. The post-extraction temp_dir_path.iterdir() walk in _unpack_and_collect_files then handed the link to process_files(), whose concrete implementations (FileComponent, DoclingInline/Remote, NvidiaIngest, VideoFile, Unstructured) call path.read_bytes() — which follows the link.
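
To make the gap concrete, here is a minimal sketch (not the project's code) of a name-only containment check applied to a symlink member. The helper names `name_stays_inside` and `build_symlink_tar` are hypothetical, and the archive is built in memory so nothing is written to disk. The member's name is harmless, so the check passes even though its link target points outside the extract directory.

```python
import io
import tarfile
from pathlib import Path


def name_stays_inside(member: tarfile.TarInfo, output_dir: Path) -> bool:
    """Name-only check (hypothetical): does output_dir / member.name stay inside output_dir?"""
    root = output_dir.resolve()
    target = (root / member.name).resolve()
    return target == root or root in target.parents


def build_symlink_tar(name: str, link_target: str) -> bytes:
    """Build an in-memory TAR holding a single symlink member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.type = tarfile.SYMTYPE      # mark the member as a symbolic link
        info.linkname = link_target      # where the link would point after extraction
        tar.addfile(info)
    return buf.getvalue()


archive = build_symlink_tar("leak", "/home/langflow/.langflow/secret_key")
with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
    member = tar.getmembers()[0]
    # The name "leak" resolves inside the extract dir, so the name-only check passes.
    passes_name_check = name_stays_inside(member, Path("extract_dir"))
    # The member type and link target are what the check never looked at.
    is_link = member.issym()
    link_target = member.linkname
```

The point of the sketch is that `member.name` and `member.linkname` are independent fields: validating the first says nothing about the second.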

The reporter's exploit chain:

  1. Build a TAR containing a single symlink at langflow_secret -> ~/.langflow/secret_key.
  2. Upload through any of the affected components into a flow with a vector store + chatbot.
  3. Ask the chatbot to recite the document → JWT signing key leaks.
  4. Forge a JWT for an admin user, create a flow with the Python interpreter node, achieve RCE on the host.

Why every supported Python is vulnerable

Python's tarfile gained extraction filters through PEP 706, but the safe data filter only became the default in Python 3.14. The project's pyproject.toml declares requires-python = ">=3.10,<3.14", so on every supported interpreter the legacy fully_trusted behavior was in effect and symlinks survived extraction. (3.12/3.13 emit a DeprecationWarning when no filter is passed but still extract everything.)
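
For comparison, the following sketch shows how the standard library's `data` filter treats the same kind of member when it is requested explicitly. This is not what the PR does (the PR rejects member types itself); it only illustrates the behavior that was not the default. The helper `data_filter_verdict` is hypothetical, and the sketch probes for `tarfile.data_filter` with `hasattr` because availability depends on the exact interpreter release.

```python
import tarfile
from pathlib import Path


def data_filter_verdict(member: tarfile.TarInfo, dest: Path) -> str:
    """Report how tarfile's 'data' filter would treat a member (hypothetical helper)."""
    if not hasattr(tarfile, "data_filter"):
        return "unavailable"  # interpreter predates the PEP 706 filter API
    try:
        tarfile.data_filter(member, str(dest))
    except tarfile.FilterError:
        return "rejected"     # e.g. a link to an absolute path
    return "allowed"


# A symlink member pointing at an absolute path, as in the advisory.
link = tarfile.TarInfo(name="leak")
link.type = tarfile.SYMTYPE
link.linkname = "/home/langflow/.langflow/secret_key"

# An ordinary empty regular file for contrast.
regular = tarfile.TarInfo(name="notes.txt")
regular.size = 0

link_verdict = data_filter_verdict(link, Path("extract_dir"))
regular_verdict = data_filter_verdict(regular, Path("extract_dir"))
```

Where the filter API exists, the link is expected to be refused and the regular file accepted; on an interpreter without it, both calls report "unavailable", which is one reason an explicit member-type check is the more portable choice.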

Fix

  • _safe_extract_tar now rejects symbolic-link, hard-link, FIFO, and device-node members with a clear ValueError, allowing only regular files and directories.
  • _unpack_and_collect_files filters is_symlink() entries out of both the post-extraction iterdir() and the recursive directory rglob walk, so a future bundle format that doesn't go through _safe_extract_tar cannot reintroduce the issue.
  • No public API changes. All six affected components inherit the fix automatically.
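
The first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in base_file.py: the function name `safe_extract_tar`, the helpers `make_tar` and `try_extract`, and the exact error messages are hypothetical. The sketch validates every member before writing anything, so a rejected archive leaves the output directory empty.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def safe_extract_tar(tar: tarfile.TarFile, output_dir: Path) -> None:
    """Extract only regular files and directories (hypothetical sketch)."""
    root = output_dir.resolve()
    members = tar.getmembers()
    # Pass 1: validate all members so nothing is written if any member is unsafe.
    for member in members:
        if not (member.isfile() or member.isdir()):
            # Covers symlinks, hardlinks, FIFOs, and device nodes.
            raise ValueError(f"Unsupported TAR member type: {member.name}")
        target = (root / member.name).resolve()
        if target != root and root not in target.parents:
            raise ValueError(f"Attempted path traversal in TAR file: {member.name}")
    # Pass 2: extract; request the 'data' filter where the interpreter offers it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for member in members:
        tar.extract(member, path=str(root), **extra)


def make_tar(add_link: bool) -> bytes:
    """Build an in-memory TAR with one regular file and, optionally, one symlink."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello"
        info = tarfile.TarInfo(name="notes.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        if add_link:
            link = tarfile.TarInfo(name="leak")
            link.type = tarfile.SYMTYPE
            link.linkname = "/etc/hostname"
            tar.addfile(link)
    return buf.getvalue()


def try_extract(archive: bytes) -> tuple[bool, list[str]]:
    """Return (succeeded, names left in the output directory)."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        try:
            with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
                safe_extract_tar(tar, out)
            ok = True
        except ValueError:
            ok = False
        return ok, sorted(p.name for p in out.iterdir())


benign_result = try_extract(make_tar(add_link=False))
malicious_result = try_extract(make_tar(add_link=True))
```

Using an allowlist (`isfile()` or `isdir()`) rather than a denylist of link types means any member type not explicitly permitted is refused.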

Tests

src/lfx/tests/unit/base/data/test_base_file_unpack.py (new file, 9 tests):

  • TAR with absolute-path symlink → ValueError, nothing extracted
  • TAR with ../-escape symlink → ValueError
  • TAR with hardlink → ValueError
  • TAR with FIFO/device member → ValueError
  • Benign TAR with regular files only → still extracts cleanly
  • Benign ZIP → still extracts cleanly
  • Defense-in-depth: planted symlink in extracted dir is filtered out before reaching process_files()
  • Unsupported bundle format still raises clearly
  • End-to-end PoC repro: tarfile.add of a real on-disk symlink → refused
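
The defense-in-depth case in the list above can be illustrated with a small sketch. It assumes a POSIX filesystem where `os.symlink` is permitted, and the helper `collect_regular_files` is a hypothetical stand-in for the filtering in _unpack_and_collect_files, not the actual implementation.

```python
import os
import tempfile
from pathlib import Path


def collect_regular_files(root: Path) -> list[Path]:
    """Return regular files under root, skipping every symlink (hypothetical helper)."""
    collected = []
    for path in sorted(root.rglob("*")):
        if path.is_symlink():
            continue  # never hand a link to downstream readers
        if path.is_file():
            collected.append(path)
    return collected


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # A file outside the extracted bundle that must not be reachable.
    secret = base / "secret_key"
    secret.write_text("not part of the bundle")

    # Simulate an already-extracted bundle with symlinks planted in it.
    extracted = base / "extracted"
    (extracted / "docs").mkdir(parents=True)
    (extracted / "docs" / "a.txt").write_text("ok")
    os.symlink(secret, extracted / "leak")
    os.symlink(secret, extracted / "docs" / "nested_leak")

    collected_names = [
        p.relative_to(extracted).as_posix() for p in collect_regular_files(extracted)
    ]
```

The `is_symlink()` test has to come before `is_file()`, because `is_file()` follows links and would report a planted link to a regular file as a file.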

Test plan

  • cd src/lfx && uv run pytest tests/unit/base/data/ -xvs — 57 passed, 1 skipped (unrelated)
  • uv run ruff check clean on changed files
  • Pre-commit hooks (ruff, ruff-format, secrets scan) pass
  • Manual repro before/after fix using the advisory PoC archive (recommended for reviewer)

Affected components

All six listed in the advisory inherit BaseFileComponent and are covered by this single fix:

  • src/lfx/src/lfx/components/files_and_knowledge/file.py (FileComponent — Read File)
  • src/lfx/src/lfx/components/docling/docling_inline.py (DoclingInlineComponent)
  • src/lfx/src/lfx/components/docling/docling_remote.py (DoclingRemoteComponent)
  • src/lfx/src/lfx/components/nvidia/nvidia_ingest.py (NvidiaIngestComponent)
  • src/lfx/src/lfx/components/twelvelabs/video_file.py (VideoFileComponent)
  • src/lfx/src/lfx/components/unstructured/unstructured.py (UnstructuredComponent)

Credit: Ori Lahav, Security Researcher @ Rubrik Inc.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Enhanced security for bundle and archive extraction to prevent symlink traversal attacks. Archive processing now rejects symlinks, hardlinks, and special files, with stricter validation of extracted contents to protect against unsafe operations.

@github-actions github-actions Bot added the bug Something isn't working label Apr 30, 2026
@coderabbitai Bot (Contributor) commented Apr 30, 2026

Walkthrough

Security improvements to bundle extraction paths defensively prevent symlink traversal and unsafe TAR member extraction. Implementation filters symlinks and non-regular files during directory recursion and post-extraction processing. TAR extraction now strictly rejects symlinks, hardlinks, and non-regular members. Comprehensive unit tests verify rejection of unsafe members and successful extraction of regular files.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Bundle Extraction Security Hardening: src/lfx/src/lfx/base/data/base_file.py | Added defensive filtering in directory recursion and bundle extraction to exclude symlinks and non-regular files. _safe_extract_tar() now rejects symlink, hardlink, and non-regular members, raising ValueError. Updated docstring to reflect stricter safety requirements. |
| Bundle Extraction Tests: src/lfx/tests/unit/base/data/test_base_file_unpack.py | New test module covering TAR/ZIP bundle extraction with comprehensive validation of security constraints. Tests verify rejection of symlinks (absolute/relative targets), hardlinks, FIFOs, and non-TAR/ZIP inputs; assert that the extraction directory remains empty on error; and validate successful extraction of regular files and post-extraction symlink filtering. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 7 | ❌ 2

❌ Failed checks (2 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Quality And Coverage | ❓ Inconclusive | Unable to locate the test file and implementation file mentioned in the PR summary in the current repository. | Verify that the files are committed and accessible; then provide their contents for evaluation of test coverage and quality. |
| Test File Naming And Structure | ❓ Inconclusive | Unable to retrieve test file content due to missing shell execution capability in current environment. | Provide the actual test file content or use a system with shell access to retrieve src/lfx/tests/unit/base/data/test_base_file_unpack.py |

✅ Passed checks (7 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main security fix: rejecting symlinks and hardlinks in BaseFileComponent TAR extraction, with a reference to the specific security advisory (GHSA). This accurately reflects the primary changes across all modified files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Test Coverage For New Implementations | ✅ Passed | PR includes comprehensive test coverage for security fixes with 236-line test file containing 9 unit tests covering symlink rejection, hardlink rejection, FIFO rejection, benign extraction, post-extraction filtering, and unsupported formats. |
| Excessive Mock Usage Warning | ✅ Passed | Test file demonstrates excellent design with minimal mocks, using real tarfile/zipfile objects and filesystem operations to verify security-critical behavior. |

@codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 53.01%. Comparing base (bc16a40) to head (107e405).
⚠️ Report is 3 commits behind head on release-1.9.2.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/lfx/src/lfx/base/data/base_file.py | 85.71% | 1 Missing ⚠️ |

❌ Your project status has failed because the head coverage (50.21%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

@@                Coverage Diff                @@
##           release-1.9.2   #12945      +/-   ##
=================================================
- Coverage          53.16%   53.01%   -0.16%     
=================================================
  Files               2031     2031              
  Lines             184675   183971     -704     
  Branches           28934    26213    -2721     
=================================================
- Hits               98189    97538     -651     
+ Misses             85382    85326      -56     
- Partials            1104     1107       +3     

| Flag | Coverage Δ |
| --- | --- |
| backend | 56.29% <ø> (+<0.01%) ⬆️ |
| frontend | 52.92% <ø> (-0.24%) ⬇️ |
| lfx | 50.21% <85.71%> (+0.07%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/lfx/src/lfx/base/data/base_file.py | 41.48% <85.71%> (+5.39%) ⬆️ |

... and 242 files with indirect coverage changes

@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 30, 2026
@erichare erichare requested review from Adam-Aghili and Jkavia April 30, 2026 16:00
@erichare erichare changed the base branch from main to release-1.9.2 April 30, 2026 16:00
@coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (1)
src/lfx/src/lfx/base/data/base_file.py (1)

764-775: Consider adding explicit symlink handling in _safe_extract_zip for consistency with the TAR handler.

Python's ZipFile.extract() does not create filesystem symlinks from ZIP entries—they are extracted as regular files—and the post-extraction filter at lines 726–730 already removes any symlinks that slip through. However, adding an explicit check similar to the TAR handler (which rejects symlink and hardlink members) would provide defense-in-depth and make the security intent explicit. This remains optional since the current behavior is safe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lfx/src/lfx/base/data/base_file.py` around lines 764 - 775, Update
_safe_extract_zip to explicitly detect and reject symlink entries like the TAR
handler: iterate over bundle.infolist() to get ZipInfo objects, skip mac
resource-fork names as before, inspect each ZipInfo's external_attr (check Unix
file type bits: ((zi.external_attr >> 16) & 0o170000) == 0o120000) and raise
ValueError("Attempted Path Traversal in ZIP File: {member}") or a similar
rejection if it is a symlink (and optionally reject other non-regular file
types), then perform the path traversal check on the target path and call
bundle.extract only for allowed entries; reference the function name
_safe_extract_zip and use ZipInfo objects instead of namelist().
📥 Commits

Reviewing files that changed from the base of the PR and between 7640ce6 and d73abf3.

📒 Files selected for processing (2)
  • src/lfx/src/lfx/base/data/base_file.py
  • src/lfx/tests/unit/base/data/test_base_file_unpack.py

…raction (GHSA-ccv6-r384-xp75)

`BaseFileComponent._unpack_bundle._safe_extract_tar` accepted any TAR member
type and only checked that `output_dir / member.name` did not escape the
extract dir. That check was performed before extraction, so a symlink whose
*target* was an absolute path (or `../` escape) was extracted untouched.
Once on disk the link was iterated by `temp_dir_path.iterdir()` and handed
to `process_files()`, whose concrete implementations (FileComponent,
DoclingInline/Remote, NvidiaIngest, VideoFile, Unstructured) call
`path.read_bytes()` and follow the link to read arbitrary host files.

The reporter's exploit chain leaks `~/.langflow/secret_key`, forges a JWT
for an admin user, and then runs arbitrary code through the Python
interpreter node, achieving RCE.

Python's `tarfile` only defaults to the safe `data` filter on Python 3.14,
which langflow's `requires-python = ">=3.10,<3.14"` excludes — so every
supported interpreter was vulnerable.

Fix:
- `_safe_extract_tar` now rejects symbolic-link, hard-link, FIFO, and
  device-node members with a `ValueError` and only extracts regular files
  and directories.
- `_unpack_and_collect_files` skips any `is_symlink()` entries from the
  extracted bundle directory and from recursive directory walks as
  defense-in-depth in case a future bundle format slips a link through.
- New `tests/unit/base/data/test_base_file_unpack.py` covers symlink (abs
  + relative escape), hardlink, FIFO rejection, benign tar/zip extraction,
  the post-extraction symlink filter, and an end-to-end repro mirroring
  the advisory PoC (real filesystem symlink → tarfile.add).

Refs: https://github.com/langflow-ai/langflow/security/advisories/GHSA-ccv6-r384-xp75
@erichare erichare force-pushed the security/ghsa-ccv6-r384-xp75-tar-symlink branch from d73abf3 to 107e405 Compare April 30, 2026 16:01
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 30, 2026
@github-actions Bot (Contributor) commented

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| Coverage: 35% | 35.26% (40427/114627) | 68.08% (5580/8196) | 35.87% (940/2620) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 4007 | 0 💤 | 0 ❌ | 0 🔥 | 7m 6s ⏱️ |

@erichare erichare requested a review from jordanrfrazier April 30, 2026 16:56
@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Apr 30, 2026
@erichare erichare added this pull request to the merge queue Apr 30, 2026
Merged via the queue into release-1.9.2 with commit cb06a66 Apr 30, 2026
192 of 195 checks passed
@erichare erichare deleted the security/ghsa-ccv6-r384-xp75-tar-symlink branch April 30, 2026 17:45
Labels

bug Something isn't working lgtm This PR has been approved by a maintainer

2 participants