fix(security): reject symlinks/hardlinks in BaseFileComponent TAR extraction (GHSA-ccv6-r384-xp75)#12945
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughSecurity improvements to bundle extraction paths defensively prevent symlink traversal and unsafe TAR member extraction. Implementation filters symlinks and non-regular files during directory recursion and post-extraction processing. TAR extraction now strictly rejects symlinks, hardlinks, and non-regular members. Comprehensive unit tests verify rejection of unsafe members and successful extraction of regular files. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 7 | ❌ 2❌ Failed checks (2 inconclusive)
✅ Passed checks (7 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
❌ Your project status has failed because the head coverage (50.21%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## release-1.9.2 #12945 +/- ##
=================================================
- Coverage 53.16% 53.01% -0.16%
=================================================
Files 2031 2031
Lines 184675 183971 -704
Branches 28934 26213 -2721
=================================================
- Hits 98189 97538 -651
+ Misses 85382 85326 -56
- Partials 1104 1107 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lfx/src/lfx/base/data/base_file.py (1)
764-775: Consider adding explicit symlink handling in_safe_extract_zipfor consistency with the TAR handler.Python's
ZipFile.extract()does not create filesystem symlinks from ZIP entries—they are extracted as regular files—and the post-extraction filter at lines 726–730 already removes any symlinks that slip through. However, adding an explicit check similar to the TAR handler (which rejects symlink and hardlink members) would provide defense-in-depth and make the security intent explicit. This remains optional since the current behavior is safe.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lfx/src/lfx/base/data/base_file.py` around lines 764 - 775, Update _safe_extract_zip to explicitly detect and reject symlink entries like the TAR handler: iterate over bundle.infolist() to get ZipInfo objects, skip mac resource-fork names as before, inspect each ZipInfo's external_attr (check Unix file type bits: ((zi.external_attr >> 16) & 0o170000) == 0o120000) and raise ValueError("Attempted Path Traversal in ZIP File: {member}") or a similar rejection if it is a symlink (and optionally reject other non-regular file types), then perform the path traversal check on the target path and call bundle.extract only for allowed entries; reference the function name _safe_extract_zip and use ZipInfo objects instead of namelist().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/lfx/src/lfx/base/data/base_file.py`:
- Around line 764-775: Update _safe_extract_zip to explicitly detect and reject
symlink entries like the TAR handler: iterate over bundle.infolist() to get
ZipInfo objects, skip mac resource-fork names as before, inspect each ZipInfo's
external_attr (check Unix file type bits: ((zi.external_attr >> 16) & 0o170000)
== 0o120000) and raise ValueError("Attempted Path Traversal in ZIP File:
{member}") or a similar rejection if it is a symlink (and optionally reject
other non-regular file types), then perform the path traversal check on the
target path and call bundle.extract only for allowed entries; reference the
function name _safe_extract_zip and use ZipInfo objects instead of namelist().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e4a8722e-f5eb-4379-8dbb-38a0428aa7ed
📒 Files selected for processing (2)
src/lfx/src/lfx/base/data/base_file.pysrc/lfx/tests/unit/base/data/test_base_file_unpack.py
…raction (GHSA-ccv6-r384-xp75) `BaseFileComponent._unpack_bundle._safe_extract_tar` accepted any TAR member type and only checked that `output_dir / member.name` did not escape the extract dir. That check was performed before extraction, so a symlink whose *target* was an absolute path (or `../` escape) was extracted untouched. Once on disk the link was iterated by `temp_dir_path.iterdir()` and handed to `process_files()`, whose concrete implementations (FileComponent, DoclingInline/Remote, NvidiaIngest, VideoFile, Unstructured) call `path.read_bytes()` and follow the link to read arbitrary host files. The reporter's exploit chain leaks `~/.langflow/secret_key`, forges a JWT for an admin user, and then runs arbitrary code through the Python interpreter node, achieving RCE. Python's `tarfile` only defaults to the safe `data` filter on Python 3.14, which langflow's `requires-python = ">=3.10,<3.14"` excludes — so every supported interpreter was vulnerable. Fix: - `_safe_extract_tar` now rejects symbolic-link, hard-link, FIFO, and device-node members with a `ValueError` and only extracts regular files and directories. - `_unpack_and_collect_files` skips any `is_symlink()` entries from the extracted bundle directory and from recursive directory walks as defense-in-depth in case a future bundle format slips a link through. - New `tests/unit/base/data/test_base_file_unpack.py` covers symlink (abs + relative escape), hardlink, FIFO rejection, benign tar/zip extraction, the post-extraction symlink filter, and an end-to-end repro mirroring the advisory PoC (real filesystem symlink → tarfile.add). Refs: https://github.com/langflow-ai/langflow/security/advisories/GHSA-ccv6-r384-xp75
d73abf3 to
107e405
Compare
Summary
Closes the arbitrary-file-read → RCE chain reported in the security advisory GHSA-ccv6-r384-xp75.
BaseFileComponent._unpack_bundle._safe_extract_tar(insrc/lfx/src/lfx/base/data/base_file.py) only validated that a TAR member's name did not escape the extract directory. Symlinks were accepted, so a member likeleak -> /home/langflow/.langflow/secret_keywas extracted untouched. The post-extractiontemp_dir_path.iterdir()walk in_unpack_and_collect_filesthen handed the link toprocess_files(), whose concrete implementations (FileComponent,DoclingInline/Remote,NvidiaIngest,VideoFile,Unstructured) callpath.read_bytes()— which follows the link.The reporter's exploit chain:
langflow_secret -> ~/.langflow/secret_key.Why every supported Python is vulnerable
Python's
tarfileonly made the safedata_filterthe default in Python 3.14 (PEP 706). The project'spyproject.tomldeclaresrequires-python = ">=3.10,<3.14", so on every supported interpreter the legacyfully_trustedfilter was in effect and symlinks survived extraction. (3.12/3.13 emit aDeprecationWarningbut still extract everything.)Fix
_safe_extract_tarnow rejects symbolic-link, hard-link, FIFO, and device-node members with a clearValueError, allowing only regular files and directories._unpack_and_collect_filesfiltersis_symlink()entries out of both the post-extractioniterdir()and the recursive directoryrglobwalk, so a future bundle format that doesn't go through_safe_extract_tarcannot reintroduce the issue.Tests
src/lfx/tests/unit/base/data/test_base_file_unpack.py(new file, 9 tests):ValueError, nothing extracted../-escape symlink →ValueErrorValueErrorValueErrorprocess_files()tarfile.addof a real on-disk symlink → refusedTest plan
cd src/lfx && uv run pytest tests/unit/base/data/ -xvs— 57 passed, 1 skipped (unrelated)uv run ruff checkclean on changed filesAffected components
All six listed in the advisory inherit
BaseFileComponentand are covered by this single fix:src/lfx/src/lfx/components/files_and_knowledge/file.py(FileComponent — Read File)src/lfx/src/lfx/components/docling/docling_inline.py(DoclingInlineComponent)src/lfx/src/lfx/components/docling/docling_remote.py(DoclingRemoteComponent)src/lfx/src/lfx/components/nvidia/nvidia_ingest.py(NvidiaIngestComponent)src/lfx/src/lfx/components/twelvelabs/video_file.py(VideoFileComponent)src/lfx/src/lfx/components/unstructured/unstructured.py(UnstructuredComponent)Credit: Ori Lahav, Security Researcher @ Rubrik Inc.
Summary by CodeRabbit
Release Notes