🔒 Fix path traversal vulnerability in local_file source#77
Conversation
Added validation in `get_value` to ensure `path` components do not contain `ParentDir`, `RootDir`, or `Prefix` elements before joining them with `self.root_path`. This prevents attackers from accessing files outside the specified root path. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
recoco-docs | 1ae455b | Commit Preview URL Branch Preview URL |
Mar 13 2026, 04:18 AM |
There was a problem hiding this comment.
Pull request overview
This PR mitigates a path traversal issue in the local_file source by rejecting unsafe key paths before reading from disk.
Changes:
- Adds component-based validation to reject
.., absolute paths (/), and Windows drive/prefix paths (C:). - Keeps existing file-include pattern checks, returning
NonExistencefor rejected paths.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let path_obj = Path::new(path); | ||
|
|
||
| // Prevent path traversal vulnerabilities by verifying the path | ||
| // doesn't contain parent directory or absolute components. | ||
| if path_obj.components().any(|c| { | ||
| matches!( | ||
| c, | ||
| std::path::Component::ParentDir | ||
| | std::path::Component::RootDir | ||
| | std::path::Component::Prefix(_) | ||
| ) | ||
| }) || !self.pattern_matcher.is_file_included(path) { |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@bashandbone I've opened a new pull request, #83, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
🤖 Hi @bashandbone, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
I have updated the PR to also canonicalize both the joined path and the |
Added validation in `get_value` to ensure `path` components do not contain `ParentDir`, `RootDir`, or `Prefix` elements before joining them with `self.root_path`. This prevents attackers from accessing files outside the specified root path. Also mitigates symlink-based path traversal by canonicalizing and checking boundaries to ensure the canonicalized target path starts with the canonicalized root path. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
There was a problem hiding this comment.
This Pull Request effectively addresses a path traversal vulnerability in the local_file source by implementing component-based path validation and boundary checks using canonicalization. These changes significantly improve the security posture of the local file ingestion by preventing unauthorized access to files outside the specified root directory.
🔍 General Feedback
- Security: The combination of denylisting unsafe path components (
.., absolute paths) and verifying canonical paths against the root is a robust defense-in-depth approach. - Performance: The frequent use of synchronous
std::fs::canonicalizeon the root path in theget_valuemethod introduces unnecessary I/O overhead. Caching this value would be a beneficial optimization. - Async Hygiene: Moving towards
tokio::fsfor all I/O operations in thisasyncexecutor will help ensure that the runtime threads are never blocked by file system operations.
|
|
||
| // Prevent path traversal vulnerabilities by verifying the path | ||
| // doesn't contain parent directory or absolute components. | ||
| if path_obj.components().any(|c| { |
There was a problem hiding this comment.
🟢 You can use path_obj.is_absolute() to check for RootDir and Prefix components more idiomatically. The current approach with components().any(...) is correct as it also handles ParentDir in the same pass, so this is just a minor stylistic suggestion.
| let path = self.root_path.join(path); | ||
|
|
||
| // Mitigate symlink-based path traversal by canonicalizing and checking boundaries | ||
| let root_canon = match std::fs::canonicalize(&self.root_path) { |
There was a problem hiding this comment.
🟡 Calculating the canonical root path on every call to get_value is inefficient as it involves redundant I/O. Since root_path is constant for the lifetime of the Executor, it should be pre-calculated and stored in the Executor struct during construction in build_executor.
Additionally, std::fs::canonicalize is a synchronous operation that can block the async runtime thread. Consider using tokio::fs::canonicalize(...).await instead.
| } | ||
| Err(e) => Err(e)?, | ||
| }; | ||
|
|
There was a problem hiding this comment.
🟡 std::fs::canonicalize is a synchronous I/O operation that can block the async executor thread. Since this function is async, it is recommended to use tokio::fs::canonicalize(&path).await to maintain responsiveness.
Added validation in `get_value` to ensure `path` components do not contain `ParentDir`, `RootDir`, or `Prefix` elements before joining them with `self.root_path`. This prevents attackers from accessing files outside the specified root path. Also mitigates symlink-based path traversal by canonicalizing and checking boundaries to ensure the canonicalized target path starts with the canonicalized root path. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
* Initial plan * fix: add symlink-safe path validation using canonicalize in local_file source Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
@jules please address the comments/suggestions in this thread |
I have addressed all the PR comments. The canonicalization of |
Added validation in `get_value` to ensure `path` components do not contain `ParentDir`, `RootDir`, or `Prefix` elements before joining them with `self.root_path`. This prevents attackers from accessing files outside the specified root path. Also mitigates symlink-based path traversal by canonicalizing and checking boundaries to ensure the canonicalized target path starts with the canonicalized root path. Cached root path canonicalization in Executor and swapped to tokio async canonicalize. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
🎯 What: A path traversal vulnerability in
⚠️ Risk: An attacker with control over the
crates/recoco-core/src/ops/sources/local_file.rsallows reading arbitrary files outside theroot_path.keyparameter could craft paths (e.g.,../../etc/passwdor/etc/passwd) that escape the intended root directory, potentially leading to unauthorized disclosure of sensitive system files.🛡️ Solution: Modified
get_valueto validate the path by iterating through its components usingstd::path::Path::new(path).components(). The code now explicitly checks and rejects paths containingParentDir(..),RootDir(/), orPrefix(e.g.,C:) components before attempting to join the path with theroot_path. If an unsafe path is provided, it gracefully returnsNonExistence.PR created automatically by Jules for task 17074725847278981384 started by @bashandbone