New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plumb symlink support through the Pants engine #16844
Conversation
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/rust/engine/fs/src/directory.rs
Outdated
/// Return a pair of Vecs of the file paths and directory paths in this DigestTrie, each in | ||
/// sorted order. | ||
/// | ||
/// TODO: This should probably be implemented directly by consumers via `walk`, since they | ||
/// can directly allocate the collections that they need. | ||
pub fn files_and_directories(&self) -> (Vec<PathBuf>, Vec<PathBuf>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split this into 3 implementations, as only one caller needed all the items at the same time, and it was for __repr__
, so perf doesn't matter that much.
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
@@ -227,6 +238,7 @@ pub trait SnapshotOps: Clone + Send + Sync + 'static { | |||
directory::Entry::File(f) => { | |||
files.insert(path.to_owned(), f.digest()); | |||
} | |||
directory::Entry::Symlink(_) => todo!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any change here depends on what (if any) changes are needed to the signature of DigestTrie::from_path_stats
to support symlinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
src/rust/engine/fs/src/directory.rs
Outdated
} | ||
} | ||
} | ||
|
||
// TODO: `PathStat` owns its path, which means it can't be used via recursive slicing. See | ||
// whether these types can be merged. | ||
enum TypedPath<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that this code hasn't changed suggests that PathStat
, which is the type produced while VFS::expand
walks the filesystem has not been changed.
If you get tests passing without that change, then I think that what you will have accomplished is that our VFS code (which is what matches globs in the repo for PathGlobs
->Digest
, and which captures the outputs of processes from sandboxes) will never actually create a DigestTrie
/Digest
/Snapshot
containing a symlink, BUT when one was created synthetically, it would present as such in memory, and be successfully materialized.
That's probably a good breaking point for a first PR, so you might want to avoid dipping into VFS
yet if you can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah just started to muck with fs_test
and noticed this behavior.
I like the idea of splitting into multiple PRs.
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
Opening for review. LMK if we should tackle the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening for review. LMK if we should tackle the path_stats in this PR, or wait for follow-up.
I'm fine with the from_path_stats
changes going to a separate PR. I would like to see some tests in this PR for the changes made in this PR.
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
Regarding absolute symlinks: it should just amount to ignoring symlinks dests which are: Path::new(...).is_absolute()
(https://doc.rust-lang.org/nightly/std/path/struct.Path.html#method.is_absolute) in walk
?
src/rust/engine/fs/src/directory.rs
Outdated
// We don't return a Result, so log and move on | ||
warn!("Exceeded the maximum link depth while traversing links. Stopping traversal."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably include some information on the path that was traversed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing we'd want to trim the path. Suggestion on some Rust-y code that'd do that?
Yeah I just hit the buzzer before I implemented absolute paths. It's up next. |
Absolute paths are donezo. Only problem I think is a symlink to a single |
Ah I think it's a combo of handling a symlink with a bare ".", And peeking for a "." In the components loop in entry_helper. |
Well that was painful... |
Hehe my abbreviated SHA for the commit "im dead inside" ends in |
path_so_far.push(component); | ||
let component = component.as_os_str(); | ||
logical_path.push(component); | ||
if component == Component::ParentDir { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very well could be a while
loop, but the perf seems negligible and I'm tired of staring at the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for persevering here, and for being so thorough with test coverage!
use crate::MAX_LINK_DEPTH; | ||
use hashing::EMPTY_DIGEST; | ||
use std::path::{Path, PathBuf}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this... I'm lightly embarassed not to have had rust side tests for walk
/entry
already... but on the other hand, it used to be a lot simpler, heh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having these tests is the only way I survived honestly. Turnaround time for changes went from minutes to seconds, and the ability to log/debug easily was invaluable.
I was not about to half-ass this. Especially because bugs here would absolutely wreck havoc on plugin authors and the end-user. |
Hmm wasn't expecting |
Hahahaha it was a |
Oh one thing to note is when we're walking a Trie as symlink oblivious, directory symlinks are not passed through the function as directories. E.g. self/foo.bar might be an entry for a file file while self is not a directory entry. In the case that self is symlink. Cc @stuhood |
fn walk_too_many_links_subdir() { | ||
let tree = make_tree(vec![ | ||
TypedPath::File { | ||
path: Path::new("a/file.txt"), | ||
is_executable: false, | ||
}, | ||
TypedPath::Link { | ||
path: Path::new("a/self"), | ||
target: Path::new("."), | ||
}, | ||
]); | ||
assert_walk( | ||
&tree, | ||
(0..MAX_LINK_DEPTH) | ||
.into_iter() | ||
.map(|n| ("a/".to_string() + &"self/".repeat(n.into()) + "file.txt")) | ||
.collect::<Vec<_>>(), | ||
vec!["".to_string(), "a".to_string()], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah @stuhood here's what I'm talking about. a/self/file.txt
is a file (entry is for a/file.txt
, but the only directory entries are root and a
.
I think conceptually it makes sense. There is no entry for a/self
. But an interesting quirk.
This plumbs everything needed to support symlinks through the engine without actually having them be picked up by path globbing (which will come later)