Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude the target directory from backups using CACHEDIR.TAG #8378

Merged
merged 13 commits into from Jul 2, 2020

Conversation

jstasiak
Copy link
Contributor

This patch follows the lead of #4386 (which excludes target directories
from Time Machine backups) and is motived by the same reasons listen
in #3884. CACHEDIR.TAG is an OS-independent mechanism supported by Borg,
restic, GNU Tar and other backup/archiving solutions.

See https://bford.info/cachedir/ for more information about the
specification. This has been discussed in Rust Internals earlier this
year[1] and it seems like it's an uncontroversial improvement so I went
ahead with the patch.

One thing I'm wondering is whether this should maybe cover the whole main target directory (right now it applies to target/debug, target/release etc. but not to target root).

[1] https://internals.rust-lang.org/t/pre-rfc-put-cachedir-tag-into-target/12262/11

This patch follows the lead of rust-lang#4386 (which excludes target directories
from Time Machine backups) and is motived by the same reasons listen
in rust-lang#3884. CACHEDIR.TAG is an OS-independent mechanism supported by Borg,
restic, GNU Tar and other backup/archiving solutions.

See https://bford.info/cachedir/ for more information about the
specification. This has been discussed in Rust Internals earlier this
year[1] and it seems like it's an uncontroversial improvement so I went
ahead with the patch.

[1] https://internals.rust-lang.org/t/pre-rfc-put-cachedir-tag-into-target/12262/11
@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Eh2406 (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 18, 2020
@Eh2406
Copy link
Contributor

Eh2406 commented Jun 18, 2020

Josh you were active on the internals thread:
@bors r? @joshtriplett

@rust-highfive rust-highfive assigned joshtriplett and unassigned Eh2406 Jun 18, 2020
@jstasiak
Copy link
Contributor Author

Ah yeah, I forgot to mention Josh explicitly, thanks for that. Also I'll rerun the test suite locally and address the build failures.

@jstasiak
Copy link
Contributor Author

I'm unable to reproduce the resolver job failure (https://dev.azure.com/rust-lang/cargo/_build/results?buildId=32288&view=logs&j=f961518a-cff1-54aa-915a-1a77d90c89d9&t=b666365a-259d-50d7-5aed-f8ff536de640) locally (Mac OS 10.14), all tests pass for me and I'm kinda stuck on this.

@jstasiak
Copy link
Contributor Author

I also have the following concern: I imagine it's possible for cargo to be interrupted right after creating the target directory (whether we consider target/ or target/debug/) but before applying this backup-exclusion login. Then this logic

        if !dest.as_path_unlocked().exists() {
            dest.create_dir()?;
            exclude_from_backups(dest.as_path_unlocked());
        }

would not trigger on the re-run and the directory would be included in backups until next working directory cleanup/target directory recreation. However small the chance of that is I'm wondering if maybe the exclude_from_backups() step should always happen (assuming its idempotence). Granted, that would counter user's actions to actually make that directory included in backups (by removing the flags) so I don't have a strong opinion on this, maybe it's ok to be slightly flaky here.

@joshtriplett
Copy link
Member

I also have the following concern: I imagine it's possible for cargo to be interrupted right after creating the target directory (whether we consider target/ or target/debug/) but before applying this backup-exclusion login.

I can see a straightforward way to fix that. When creating a new target directory:

  • Create a temporary directory.
  • Create the CACHEDIR.TAG file in that temporary directory.
  • Rename the temporary directory to target.

@jstasiak
Copy link
Contributor Author

Yeah I decided against it before but you convinced me to give it a shot – I pushed a commit that does just what you suggest.

@jstasiak
Copy link
Contributor Author

FWIW I had a quick look around and the only other open source backup solution I found (I only looked at open source ones) that supports this out of the box is attic, others (like rsync and bacula) support this in a exclude-directory-if-a-file-named-like-this-exists-in-it way – so while my initial commit message may be technically correct I wanted the above to be known, I don't want to oversell this solution on false premises.

@joshtriplett
Copy link
Member

@jstasiak duplicity similarly has an --exclude-if-present option; I used that when I used duplicity, before I switched to restic.

@joshtriplett
Copy link
Member

joshtriplett commented Jun 24, 2020

Could you please add a test that confirms this creates CACHEDIR.TAG, and a second test that confirms this doesn't create CACHEDIR.TAG in an existing target dir?

@jstasiak
Copy link
Contributor Author

Sure, done. I wasn't sure if `testsuite/build.rs`` was the right place for them but I didn't find a better one, let me know if it's ok.

This commit changed things a bit: 6263d72.
@jstasiak
Copy link
Contributor Author

jstasiak commented Jul 1, 2020

I'm still wondering whether this shouldn't maybe cover the whole target root. Right now I have the following subdirectories in target in one of my projects:

target% du -d 1 -h
121M	./rls
 48M	./release
 67M	./doc
240M	./debug
476M	.

debug and release is covered by this, but rls and doc I've no idea. If they're not I'm happy to add a patch to cover doc in this or a separate PR, rls is a separate repository I think so I can open a PR there, but that'd still mean the target subdirectories are excluded, each one on its own, not the target wholesale.

@joshtriplett
Copy link
Member

@jstasiak I think it would be appropriate to add a CACHEDIR.TAG in doc, and for the code creating rls to do the same there.

If we were doing this from day 1, I would suggest covering all of target with a single CACHEDIR.TAG. But since we're introducing this now, when many other uses may have arisen for target, it does seem safer to handle only the target subdirs cargo creates, and let other tools handle the target subdirs they create.

@jstasiak
Copy link
Contributor Author

jstasiak commented Jul 1, 2020

Ok, fair. I added a patch to add this to target/doc as well + some reorganization to reuse the code that's already there. I wasn't sure about the name for the new create_dir_ function, feel free to change it or I can change it if there's a better suggestion.

If this is accepted I'll create a patch for RLS to match this.

@ehuss
Copy link
Contributor

ehuss commented Jul 1, 2020

I think it is reasonable to consider marking the entire target directory as a cache directory. Unfortunately I don't see much of a discussion about why the current behavior was chosen (#4386 and #3884). @alexcrichton do you have any opinion on marking the entire target directory as a cache directory?

I'm pretty certain we do not want to include the CACHEDIR.TAG in target/doc. That directory is a "final artifact" that is often used for publishing, which is definitely not a temporary cache. Usually that is done by copying or compressing the entire directory wholesale to some output, and we don't want people pull in that file by accident.

Also, I'd prefer to avoid adding additional tests for each command. I think it's fine to just have a single test that runs multiple commands as needed. (Each additional test has a cost, so it helps to avoid adding too many unless necessary.)

(I'm amused to see Josh has experience with this from 9 years ago.)

@jstasiak
Copy link
Contributor Author

jstasiak commented Jul 1, 2020

OK, I went ahead and added two more commits: one changing the logic to exclude whole target/ from backups (I didn't move the new created_dir_..._atomic function back into layout (now that it's used only there) as I didn't want to add even more lines to the diff but I can do that if it's desired. Given that the tests pass there should be some variant of this patch (cherry-picking involved) in this PR to get this merged, I hope. :)

PS. I also saw that Firefox thread ;)

@joshtriplett joshtriplett added the T-cargo Team: Cargo label Jul 1, 2020
@joshtriplett
Copy link
Member

@jstasiak Thank you for working on this, and for dealing with all the back-and-forth in requirements.

I think this is ready to merge.

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Jul 1, 2020

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period An FCP proposal has started, but not yet signed off. disposition-merge FCP with intent to merge final-comment-period FCP — a period for last comments before action is taken and removed proposed-final-comment-period An FCP proposal has started, but not yet signed off. labels Jul 1, 2020
@rfcbot
Copy link
Collaborator

rfcbot commented Jul 2, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@ehuss
Copy link
Contributor

ehuss commented Jul 2, 2020

Thanks @jstasiak!

@bors r+

@bors
Copy link
Collaborator

bors commented Jul 2, 2020

📌 Commit 5f2ba2b has been approved by ehuss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 2, 2020
@bors
Copy link
Collaborator

bors commented Jul 2, 2020

⌛ Testing commit 5f2ba2b with merge cf3bfc9...

@bors
Copy link
Collaborator

bors commented Jul 2, 2020

☀️ Test successful - checks-azure
Approved by: ehuss
Pushing cf3bfc9 to master...

@bors bors merged commit cf3bfc9 into rust-lang:master Jul 2, 2020
@jstasiak
Copy link
Contributor Author

jstasiak commented Jul 2, 2020

Thank you all for the comments and patience, I'm really glad this managed to land. :) I'm still slightly unsure if maybe errors on writing to CACHEDIR.TAG shouldn't be silenced (I followed the lead of Time Machine-specific code already there) but on the other hand I can't imagine too many scenarios in which creating and renaming a directory will work but writing a small number of bytes to a file inside the directory won't so I guess it's not worth worrying about this too much?

Thanks again!

@jstasiak jstasiak deleted the backups branch July 2, 2020 16:49
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 3, 2020
Update cargo, rls

## cargo
14 commits in c26576f9adddd254b3dd63aecba176434290a9f6..fede83ccf973457de319ba6fa0e36ead454d2e20
2020-06-23 16:21:21 +0000 to 2020-07-02 21:51:34 +0000
- Fix overflow error on 32-bit. (rust-lang/cargo#8446)
- Exclude the target directory from backups using CACHEDIR.TAG (rust-lang/cargo#8378)
- CONTRIBUTING.md: Link to Zulip rather than Discord (rust-lang/cargo#8436)
- Update built-in help for features (rust-lang/cargo#8433)
- Update core-foundation requirement from 0.7.0 to 0.9.0 (rust-lang/cargo#8432)
- Parse `# env-dep` directives in dep-info files (rust-lang/cargo#8421)
- Move string interning to util (rust-lang/cargo#8419)
- Expose built cdylib artifacts in the Compilation structure (rust-lang/cargo#8418)
- Remove unused serde_derive dependency from the crates.io crate (rust-lang/cargo#8416)
- Remove unused remove_dir_all dependency (rust-lang/cargo#8412)
- Improve git error messages a bit (rust-lang/cargo#8409)
- Improve the description of Config.home_path (rust-lang/cargo#8408)
- Improve support for non-`master` main branches (rust-lang/cargo#8364)
- Document that OUT_DIR in JSON messages is an absolute path (rust-lang/cargo#8403)

## rls
2020-06-19 15:36:00 +0200 to 2020-06-30 23:34:52 +0200
- Update cargo (rust-lang/rls#1686)
jstasiak added a commit to jstasiak/rls that referenced this pull request Jul 12, 2020
Cargo excludes newly created target/ from backups[1]. This is an attempt
to make rls interact nicely with it and prevent build directories like
target/debug/ and target/release/ from polluting backups.

The cargo dependency is updated to the latest cargo in order for the
paths::create_dir_all_excluded_from_backups_atomic() function to become
available to rls.

[1] rust-lang/cargo#8378
@rfcbot rfcbot added finished-final-comment-period FCP complete and removed final-comment-period FCP — a period for last comments before action is taken labels Jul 12, 2020
Minoru added a commit to Minoru/dotfiles that referenced this pull request Sep 19, 2020
"target" is Cargo's (Rust's) build directory. Cargo recently learned to
put CACHEDIR.TAG in those directories[1], so I no longer need to exclude
them manually.

1. rust-lang/cargo#8378
jstasiak added a commit to jstasiak/cargo that referenced this pull request Dec 13, 2020
This follows rust-langGH-8378 which applies this technique to target directories
inside projects. With the patch two large-ish throwaway directories are
also excluded from backups.

I decided against excluding $CARGO_HOME/bin or $CARGO_HOME as a whole,
because there may be important binaries needed by a user to run their
system and there are credentials in the credentials file – I'm happy to
simplify this and exclude whole $CARGO_HOME though.
@ehuss ehuss added this to the 1.46.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge FCP with intent to merge finished-final-comment-period FCP complete S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-cargo Team: Cargo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants