Plumb symlink support through the Pants engine #16844

Merged
merged 49 commits into from Nov 4, 2022

thejcannon
Member

@thejcannon thejcannon commented Sep 13, 2022

This plumbs everything needed to support symlinks through the engine, without yet having them picked up by path globbing (which will come in a later change).

Tom Dyas and others added 5 commits September 12, 2022 12:29
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Member Author

@thejcannon thejcannon left a comment

@tdyas or @stuhood can y'all comment on the todo!()s? Anything from actual code to copy+paste to high-level description of how to handle that case would be very nice 😄

Comment on lines 507 to 512
/// Return a pair of Vecs of the file paths and directory paths in this DigestTrie, each in
/// sorted order.
///
/// TODO: This should probably be implemented directly by consumers via `walk`, since they
/// can directly allocate the collections that they need.
pub fn files_and_directories(&self) -> (Vec<PathBuf>, Vec<PathBuf>) {
Member Author

I split this into 3 implementations, as only one caller needed all the items at the same time, and it was for __repr__, so perf doesn't matter that much.
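
The TODO above suggests consumers build their own collections via `walk`. A minimal sketch of that idea, assuming a toy `Node` tree rather than the real `DigestTrie` (the names `Node`, `walk`, `file_names`, and `dir_names` are hypothetical and not part of the Pants codebase):

```rust
/// Toy stand-in for a trie entry; NOT the real `directory::Entry`.
enum Node {
    File(&'static str),
    Dir(&'static str, Vec<Node>),
}

/// Pre-order traversal that hands every node to the caller's callback.
fn walk(nodes: &[Node], visit: &mut impl FnMut(&Node)) {
    for node in nodes {
        visit(node);
        if let Node::Dir(_, children) = node {
            walk(children, visit);
        }
    }
}

/// Collects only file names, allocating a single Vec.
fn file_names(nodes: &[Node]) -> Vec<&'static str> {
    let mut out = Vec::new();
    walk(nodes, &mut |n: &Node| {
        if let Node::File(name) = n {
            out.push(*name);
        }
    });
    out
}

/// Collects only directory names, allocating a single Vec.
fn dir_names(nodes: &[Node]) -> Vec<&'static str> {
    let mut out = Vec::new();
    walk(nodes, &mut |n: &Node| {
        if let Node::Dir(name, _) = n {
            out.push(*name);
        }
    });
    out
}
```

Each caller pays only for the collection it needs, which is the motivation given for splitting the combined function.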

@@ -227,6 +238,7 @@ pub trait SnapshotOps: Clone + Send + Sync + 'static {
directory::Entry::File(f) => {
files.insert(path.to_owned(), f.digest());
}
directory::Entry::Symlink(_) => todo!(),
Contributor

Any change here depends on what (if any) changes are needed to the signature of DigestTrie::from_path_stats to support symlinks.

Sponsor Member

@stuhood stuhood left a comment

Nice!

}
}
}

// TODO: `PathStat` owns its path, which means it can't be used via recursive slicing. See
// whether these types can be merged.
enum TypedPath<'a> {
Sponsor Member

@stuhood stuhood Sep 13, 2022

The fact that this code hasn't changed suggests that PathStat, which is the type produced while VFS::expand walks the filesystem, has not been changed.

If you get tests passing without that change, then I think that what you will have accomplished is that our VFS code (which is what matches globs in the repo for PathGlobs->Digest, and which captures the outputs of processes from sandboxes) will never actually create a DigestTrie/Digest/Snapshot containing a symlink, BUT when one was created synthetically, it would present as such in memory, and be successfully materialized.

That's probably a good breaking point for a first PR, so you might want to avoid dipping into VFS yet if you can.

Member Author

Yeah just started to muck with fs_test and noticed this behavior.

I like the idea of splitting into multiple PRs.

@thejcannon thejcannon changed the title WIP: Add symlink support to the Pants engine WIP: Plumb symlink support through the Pants engine Sep 13, 2022
@thejcannon thejcannon marked this pull request as ready for review September 14, 2022 01:20
@thejcannon
Member Author

Opening for review. LMK if we should tackle the path_stats in this PR, or wait for follow-up.

Contributor

@tdyas tdyas left a comment

Opening for review. LMK if we should tackle the path_stats in this PR, or wait for follow-up.

I'm fine with the from_path_stats changes going to a separate PR. I would like to see some tests in this PR for the changes made in this PR.

@thejcannon thejcannon added the category:bugfix Bug fixes for released features label Sep 14, 2022
Sponsor Member

@stuhood stuhood left a comment

Thanks a lot!

Regarding absolute symlinks: it should just amount to ignoring symlink destinations for which Path::new(...).is_absolute() is true (https://doc.rust-lang.org/nightly/std/path/struct.Path.html#method.is_absolute) in walk?
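
A minimal sketch of that check, assuming a hypothetical helper `followable_target` (not a function in the Pants codebase) that a walk could call before following a link. Only `Path::is_absolute` comes from the standard library; the rest is illustrative:

```rust
use std::path::Path;

/// Hypothetical helper: returns the target if it may be followed, or `None`
/// when the symlink destination is absolute and should be ignored.
fn followable_target(target: &Path) -> Option<&Path> {
    if target.is_absolute() {
        // Absolute destinations point outside the trie, so they are skipped.
        None
    } else {
        Some(target)
    }
}
```

On Unix, `followable_target(Path::new("/etc/hosts"))` yields `None`, while a relative target such as `../lib/file.txt` is passed through unchanged. Note that `is_absolute` is platform-dependent: on Windows a path needs a prefix such as a drive letter to count as absolute.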

Comment on lines 663 to 664
// We don't return a Result, so log and move on
warn!("Exceeded the maximum link depth while traversing links. Stopping traversal.");
Sponsor Member

This should probably include some information on the path that was traversed?

Member Author

I'm guessing we'd want to trim the path. Suggestion on some Rust-y code that'd do that?
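
One possible way to trim the path, sketched under the assumption that keeping only the trailing components is what is wanted. The function name `trimmed_display` and the `max_components` parameter are hypothetical:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: renders at most the last `max_components` components
/// of `path`, prefixing ".../" when leading components were dropped.
fn trimmed_display(path: &Path, max_components: usize) -> String {
    let components: Vec<_> = path.components().collect();
    if components.len() <= max_components {
        // Short enough already: show the path unchanged.
        return path.display().to_string();
    }
    // Rebuild a path from the trailing components only.
    let tail: PathBuf = components[components.len() - max_components..]
        .iter()
        .collect();
    format!(".../{}", tail.display())
}
```

The returned string could then be interpolated into the existing `warn!` message, so the log shows where traversal stopped without printing an arbitrarily long chain of repeated link names.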

@thejcannon
Member Author

Yeah I just hit the buzzer before I implemented absolute paths. It's up next.

@thejcannon
Member Author

thejcannon commented Oct 28, 2022

Absolute paths are donezo. The only problem, I think, is a symlink to a single ".". I think it needs to be handled very specially, possibly in multiple places (wherever we know what the preceding path component is).

@thejcannon
Member Author

Ah, I think it's a combo of handling a symlink with a bare "." and peeking for a "." in the components loop in entry_helper.
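
To illustrate why a bare "." needs its own case, here is a purely lexical sketch of resolving a relative link target against the directory that contains the link. This is not the PR's `entry_helper`; the function `resolve_link` is hypothetical and only uses `std::path`:

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: lexically resolves `target` relative to `link_parent`.
/// Returns `None` if the target is absolute or escapes above the root.
fn resolve_link(link_parent: &Path, target: &Path) -> Option<PathBuf> {
    let mut resolved = link_parent.to_path_buf();
    for component in target.components() {
        match component {
            // "." refers to the directory we are already in: nothing to push.
            Component::CurDir => {}
            // ".." drops the last component, failing if none is left.
            Component::ParentDir => {
                if !resolved.pop() {
                    return None;
                }
            }
            Component::Normal(name) => resolved.push(name),
            // Absolute targets are not followed in this sketch.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}
```

With this, a link `a/self` whose target is "." resolves back to `a`, which is exactly the self-referential case that needs a depth limit. `Path::components` already drops "." in the middle of a path, so only a leading "." shows up as `Component::CurDir`.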

@thejcannon
Member Author

Well that was painful...

@thejcannon
Member Author

Hehe my abbreviated SHA for the commit "im dead inside" ends in d3ad

path_so_far.push(component);
let component = component.as_os_str();
logical_path.push(component);
if component == Component::ParentDir {
Member Author

This very well could be a while loop, but the perf seems negligible and I'm tired of staring at the code.

Sponsor Member

@stuhood stuhood left a comment

Thanks a lot for persevering here, and for being so thorough with test coverage!

use crate::MAX_LINK_DEPTH;
use hashing::EMPTY_DIGEST;
use std::path::{Path, PathBuf};

Sponsor Member

Thanks for doing this... I'm lightly embarrassed not to have had Rust-side tests for walk/entry already... but on the other hand, it used to be a lot simpler, heh.

Member Author

Having these tests is the only way I survived honestly. Turnaround time for changes went from minutes to seconds, and the ability to log/debug easily was invaluable.

@thejcannon
Member Author

Thanks a lot for persevering here, and for being so thorough with test coverage!

I was not about to half-ass this. Especially because bugs here would absolutely wreak havoc on plugin authors and end users.

@thejcannon
Member Author

Hmm, wasn't expecting test_digest_entries_handles_symlinks to time out ❗

@thejcannon
Member Author

Hahahaha it was a panic! from me forgetting about symlink-aware code 😂

@thejcannon
Member Author

thejcannon commented Nov 3, 2022

Oh, one thing to note: when we're walking a trie in symlink-oblivious mode, directory symlinks are not passed through the function as directories.

E.g. self/foo.bar might be an entry for a file while self is not a directory entry, in the case that self is a symlink.

Cc @stuhood

Comment on lines +353 to +370
fn walk_too_many_links_subdir() {
  let tree = make_tree(vec![
    TypedPath::File {
      path: Path::new("a/file.txt"),
      is_executable: false,
    },
    TypedPath::Link {
      path: Path::new("a/self"),
      target: Path::new("."),
    },
  ]);
  assert_walk(
    &tree,
    (0..MAX_LINK_DEPTH)
      .into_iter()
      .map(|n| ("a/".to_string() + &"self/".repeat(n.into()) + "file.txt"))
      .collect::<Vec<_>>(),
    vec!["".to_string(), "a".to_string()],
Member Author

Oh yeah @stuhood, here's what I'm talking about. a/self/file.txt is a file (the entry is for a/file.txt), but the only directory entries are root and a.

I think conceptually it makes sense. There is no entry for a/self. But an interesting quirk.
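
The quirk can be reproduced with a toy model. The sketch below is not the engine's `DigestTrie::walk`: `Entry`, `walk`, `walk_example`, and the `links_left` budget are hypothetical, and the only symlink it models is one whose target is ".". It shows files appearing under `a/self/...` while no directory entry is ever emitted for `a/self`:

```rust
use std::collections::BTreeMap;

/// Toy trie entry; `SelfLink` models a symlink whose target is ".".
enum Entry {
    File,
    Directory(BTreeMap<String, Entry>),
    SelfLink,
}

/// Symlink-oblivious walk: links are followed (up to `links_left` times) but
/// never reported as directories themselves.
fn walk(
    dir: &BTreeMap<String, Entry>,
    prefix: &str,
    links_left: usize,
    files: &mut Vec<String>,
    dirs: &mut Vec<String>,
) {
    for (name, entry) in dir {
        let path = if prefix.is_empty() {
            name.clone()
        } else {
            format!("{}/{}", prefix, name)
        };
        match entry {
            Entry::File => files.push(path),
            Entry::Directory(children) => {
                dirs.push(path.clone());
                walk(children, &path, links_left, files, dirs);
            }
            // Re-walk the containing directory under the link's path,
            // without recording the link as a directory.
            Entry::SelfLink => {
                if links_left > 0 {
                    walk(dir, &path, links_left - 1, files, dirs);
                }
            }
        }
    }
}

/// Builds the tree {a/file.txt, a/self -> "."} and walks it from the root.
fn walk_example(max_links: usize) -> (Vec<String>, Vec<String>) {
    let mut a = BTreeMap::new();
    a.insert("file.txt".to_string(), Entry::File);
    a.insert("self".to_string(), Entry::SelfLink);
    let mut root = BTreeMap::new();
    root.insert("a".to_string(), Entry::Directory(a));

    let mut files = Vec::new();
    let mut dirs = vec![String::new()]; // "" stands for the root directory
    walk(&root, "", max_links, &mut files, &mut dirs);
    (files, dirs)
}
```

With a budget of two links, the files are `a/file.txt`, `a/self/file.txt`, and `a/self/self/file.txt`, while the directories remain just the root and `a`. How the real implementation counts depth against `MAX_LINK_DEPTH` may differ from this toy budget.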

@thejcannon thejcannon merged commit e9ae0a3 into pantsbuild:main Nov 4, 2022
@thejcannon thejcannon deleted the symlinks branch November 4, 2022 15:37
Labels
category:bugfix Bug fixes for released features