
Consider preserving mtimes of archives #7590

Closed
SimonSapin opened this issue Nov 14, 2019 · 16 comments · Fixed by #7935
Labels
A-filesystem Area: issues with filesystems A-rebuild-detection Area: rebuild detection and fingerprinting C-bug Category: bug

Comments

@SimonSapin
Contributor

#7465 disabled preserving modification times when extracting tarballs.

Some crates have a build script that runs make or some other tool that looks at mtimes to decide whether to (re)build generated files. Some of them ship with pre-generated files, in order to reduce system dependencies for users.

This seems to be the case for https://crates.io/crates/harfbuzz-sys. In recent versions of Cargo, it intermittently fails to build if Ragel is not available: servo/servo#24611

Although this situation may not be common, preserving mtimes would avoid this kind of failure, or at least make it deterministic. Should #7465 be reverted?
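The failure mode comes down to make's timestamp comparison. Here is a minimal std-only Rust sketch of the staleness check a Makefile performs (file names are hypothetical, loosely modeled on harfbuzz's Ragel inputs):

```rust
use std::fs;
use std::path::Path;

/// Mimics make's staleness check: rebuild `target` if it is missing
/// or older than `source`. (Illustrative sketch; paths are hypothetical.)
fn needs_rebuild(source: &Path, target: &Path) -> std::io::Result<bool> {
    let src_mtime = fs::metadata(source)?.modified()?;
    match fs::metadata(target) {
        Ok(meta) => Ok(meta.modified()? < src_mtime),
        Err(_) => Ok(true), // target missing: must rebuild
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("mtime-demo");
    fs::create_dir_all(&dir)?;
    let src = dir.join("hb-ot-shape.rl"); // Ragel source (hypothetical name)
    let gen = dir.join("hb-ot-shape.cc"); // pre-generated output
    fs::write(&src, "ragel input")?;
    fs::write(&gen, "generated code")?;
    // When cargo extracts both files with "now" as the mtime, their relative
    // ordering is arbitrary, so make may wrongly decide the output is stale.
    println!("rebuild? {}", needs_rebuild(&src, &gen)?);
    Ok(())
}
```

With preserved mtimes, the pre-generated `.cc` file keeps its original timestamp, which is newer than the `.rl` source, and make correctly skips the Ragel step.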

@jdm

jdm commented Nov 15, 2019

servo/servo#24611 is hitting us extremely hard and making it difficult for any Servo pull requests to merge. Can we please revert #7465?

@SimonSapin
Contributor Author

CC @rust-lang/cargo

@alexcrichton
Member

I'm sort of curious as to why a revert is being pursued here. This sounds like a bug in a build script, or a broken assumption that was never really guaranteed to be upheld anyway. The proposed fix is to revert this change, which forces every single user of Cargo to run syscalls that are, 99.9% of the time, extraneous updates to mtime information. It seems like a better fix is something like servo/rust-harfbuzz#173, which fixes this at the source of the crate, since the problem is localized to just that crate.

I can understand the desire of "something broke, please revert what broke things", but I'd like to push back at least a little bit and ask if a fix for the crate itself can be shipped. Landing a fix in cargo will take quite some time to actually propagate to nightly, so reverting this won't be a quick fix either, whereas shipping a new version of the sys crate and updating to it would fix the issue immediately.

@jdm

jdm commented Nov 18, 2019

Why is it a bug in the build script? The build script invokes a make build, and Makefiles make decisions based on file mtimes. Source release packages like fontconfig and harfbuzz are set up so that make builds work out of the box without regenerating files unnecessarily, and so the files present in the Rust source packages of these projects maintain the same mtimes as the original source releases.

@alexcrichton
Member

So again, I'm basically just writing down my 2 cents. If you push back against them, that's fine; I'll just be quiet here. Also, again, I think it's fine to revert this and push it up — I just don't have time to do so myself.

I personally consider this a bug in the build script because it's relying on the mtimes of files being extracted from a tarball, which to me is just plain weird. That's a personal opinion, though, so please don't take that as the view of the cargo team or something like that. It seems to me that the build system of the crate here can be improved to be more robust and not so fragile as to rely on mtimes.

@lukaslueg
Contributor

FWIW I consider the preservation of mtimes common behavior, exactly for purposes like this.

Things like this usually break for other reasons, like weird shared network drives where mtimes are ignored or not set immediately. Yet if your build breaks due to that, you usually know who is to blame.

@SimonSapin
Contributor Author

We’ve landed a work-around for the case of Ragel in harfbuzz-sys. Preserving mtimes would still avoid this entire class of issues. I don’t have a strong opinion whether that’s worth the cost of these extra syscalls.

@pnkfelix
Member

Sorry for butting in, but I wanted to ask: why don't we add an option to Cargo.toml for crates that want to opt into mtime-preserving behavior?

That way, crates that have a build.rs script that depend on mtime-preservation can ask for it, while the presumed more common case of not needing it can be made the default so that most crates will reap the reduced syscall count?

@SimonSapin
Contributor Author

Cargo.toml itself is in the tarball that we're extracting, with or without mtimes. So we'd first have to extract without mtimes, then (optionally) scan the tarball again just to apply mtimes. Or maybe do an initial scan to extract just Cargo.toml. Not impossible, but more tricky than passing a boolean to the tar crate and using its existing functionality.
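For illustration, the per-file cost under discussion is a single extra timestamp syscall after each extracted file is written. A std-only sketch of what preserving a header's mtime amounts to (using `File::set_modified`, stable since Rust 1.75; the actual work would be done by the tar crate, and the timestamp value here is made up):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::time::{Duration, SystemTime};

/// Illustration of the "extra syscall": after writing an extracted file,
/// stamp it with the mtime recorded in the archive header.
/// (std-only sketch; in cargo this happens inside the tar crate.)
fn write_with_mtime(
    path: &std::path::Path,
    contents: &[u8],
    header_mtime_secs: u64,
) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(contents)?;
    // This set_modified call is the per-file cost of preserving mtimes.
    let mtime = SystemTime::UNIX_EPOCH + Duration::from_secs(header_mtime_secs);
    f.set_modified(mtime)?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("extracted-file.rs");
    write_with_mtime(&path, b"fn main() {}", 1_573_689_600)?; // arbitrary 2019 timestamp
    println!("{:?}", fs::metadata(&path)?.modified()?);
    Ok(())
}
```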

@lukaslueg
Contributor

I had noticed in another context that it would be nice if cargo package placed the manifest as the very first entry in the tarball. My original intention was to make it faster to find the manifest when dealing with packaged crates for metadata extraction; this may be another use case.

Suppose cargo package places the manifest as the very first entry in the archive. When unpacking a crate:

  • We check whether the first entry in the archive is exactly /Cargo.toml. If it is, we can take a peek at it (using a less robust parsing; we don't care about [cfg] and friends). If preserve_mtimes is true, we do as told.
  • If the first entry in the archive is not /Cargo.toml, or if preserve_mtimes is false or not present, we skip setting mtimes.

That way, currently packaged crates keep today's behavior of not preserving mtimes, yet gain a slight(?) performance benefit when unpacking. The same applies to crates that don't care about mtimes. If a crate needs them, it can set preserve_mtimes, and future versions of cargo will honor that flag as a documented feature.
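A ustar header is a fixed 512-byte record with the NUL-terminated entry path in its first 100 bytes, so peeking at the first entry's name is cheap. A std-only sketch of the proposed check (helper names are hypothetical; in practice cargo would use the tar crate's entry iterator, and crate tarballs actually prefix entries with `<name>-<version>/`):

```rust
/// Extract the entry name from a raw ustar header: the NUL-terminated
/// path occupies bytes 0..100 of the 512-byte record.
/// (Hypothetical helper; the tar crate does this parsing for real.)
fn first_entry_name(header: &[u8; 512]) -> Option<&str> {
    let name_field = &header[..100];
    let len = name_field.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&name_field[..len]).ok()
}

/// The proposed check: is the archive's first entry the manifest?
/// Packaged crates store entries as "<name>-<version>/Cargo.toml".
fn wants_manifest_peek(header: &[u8; 512]) -> bool {
    first_entry_name(header)
        .map(|n| n.ends_with("/Cargo.toml"))
        .unwrap_or(false)
}

fn main() {
    let mut header = [0u8; 512];
    let name = b"harfbuzz-sys-0.3.0/Cargo.toml"; // hypothetical first entry
    header[..name.len()].copy_from_slice(name);
    println!("manifest first? {}", wants_manifest_peek(&header));
}
```

Only when this check succeeds would cargo parse the manifest for a preserve_mtimes flag; otherwise it falls back to today's behavior.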

@ehuss
Contributor

ehuss commented Nov 20, 2019

If someone wants to send a PR to revert this, I think we would be OK with it. It would help to back it up with numbers showing that it doesn't negatively impact performance on slower file systems.

@ehuss ehuss added A-filesystem Area: issues with filesystems A-rebuild-detection Area: rebuild detection and fingerprinting labels Dec 3, 2019
@cwndrws
Contributor

cwndrws commented Feb 26, 2020

Not sure if this is still a thing that folks want to do, but triage sent it to me, so I figured I'd get the conversation started again with a PR: #7935

@ehuss
Contributor

ehuss commented Feb 26, 2020

> It would help to back it up with numbers that it doesn't negatively impact performance on slower file systems.

@cwndrws Can you provide the requested information on the performance impact?

@cwndrws
Contributor

cwndrws commented Feb 27, 2020

@ehuss Are there any existing benchmarks that test this sort of performance? If not, what were you thinking of as a way of testing it? Just running cargo on a project with a substantial number of dependencies, with and without the revert?

@ehuss
Contributor

ehuss commented Feb 27, 2020

There are no existing benchmarks. To test it, I would take a largish project and delete $CARGO_HOME/registry/src and time cargo fetch. I would run it on a Windows machine with a magnetic hard drive as a probably worst-case test, but test on a few other platforms as well.

@cwndrws
Contributor

cwndrws commented Mar 5, 2020

@ehuss I added some timing numbers to the PR.
