New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removal of untracked files/directories and the current working directory #1085
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
newren
force-pushed
the
untracked_and_cwd_removal_and_misc
branch
from
September 7, 2021 22:02
9fd54a7
to
828fdd0
Compare
Both Johannes and I assumed (perhaps due to familiarity with rebase) that am --abort would return the user to a clean state. However, since am, unlike rebase, is intended to be used within a dirty working tree, --abort will only clean the files involved in the am operation. Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Noting that unpack_trees treats reset=1 & update=1 as license to nuke untracked files, I looked for code paths that use this combination and tried to generate testcases which demonstrated unintentional loss of untracked files and directories. I found several. I also include testcases for `git reset --{hard,merge,keep}`. A hard reset is perhaps the most direct test of unpack_tree's reset=1 behavior, but we cannot make `git reset --hard` preserve untracked files without some migration work. Also, the two commands `checkout --force` (because of the --force) and `read-tree --reset` (because it's plumbing and we need to keep it backward compatible) were left out as we expect those to continue removing untracked files and directories. Signed-off-by: Elijah Newren <newren@gmail.com>
Traditionally, unpack_trees_options->reset was used to signal that it was okay to delete any untracked files in the way. This was used by `git read-tree --reset`, but then started appearing in other places as well. However, many of the other uses should not be deleting untracked files in the way. Split this into two separate fields: reset_nuke_untracked reset_keep_untracked and, since many code paths in unpack_trees need to be followed for both of these flags, introduce a third one for convenience: reset_either which is simply an or-ing of the other two. Modify existing callers so that read-tree --reset reset --hard checkout --force continue using reset_nuke_untracked, but so that other callers, including am checkout without --force stash (though currently dead code; reset always had a value of 0) numerous callers from rebase/sequencer to reset_head() will use the new reset_keep_untracked field. Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
In the last few commits we focused on code in unpack-trees.c that mistakenly removed untracked files or directories. There may be more of those, but in this commit we change our focus: callers of toplevel commands that are expected to remove untracked files or directories. As noted previously, we have toplevel commands that are expected to delete untracked files such as 'read-tree --reset', 'reset --hard', and 'checkout --force'. However, that does not mean that other highlevel commands that happen to call these other commands thought about or conveyed to users the possibility that untracked files could be removed. Audit the code for such callsites, and add comments near existing callsites to mention whether these are safe or not. My auditing is somewhat incomplete, though; it skipped several cases: * git-rebase--preserve-merges.sh: is in the process of being deprecated/removed, so I won't leave a note that there are likely more bugs in that script. * contrib/git-new-workdir: why is the -f flag being used in a new empty directory?? It shouldn't hurt, but it seems useless. * git-p4.py: Don't see why -f is needed for a new dir (maybe it's not and is just superfluous), but I'm not at all familiar with the p4 stuff * git-archimport.perl: Don't care; arch is long since dead * git-cvs*.perl: Don't care; cvs is long since dead Signed-off-by: Elijah Newren <newren@gmail.com>
Some commands have traditionally also removed untracked files (or directories) that were in the way of a tracked file we needed. Document these cases. Signed-off-by: Elijah Newren <newren@gmail.com>
Numerous commands will remove empty working directories, especially if they are in the way of placing needed files. That is normally fine, but removing the current working directory can cause confusion for the user when they run subsequent commands. Add some tests checking for such problems. Signed-off-by: Elijah Newren <newren@gmail.com>
Removing the current working directory causes all subsequent git commands (and likely a number of non-git commands) run from that directory to get confused and fail with a message about being unable to read the current working directory. That confuses end users, particularly since the command they get the error from is not the one that caused the problem; the problem came from the side-effect of some previous command. We would like to avoid removing the current working directory; towards this end, introduce a new the_cwd variable that tracks the current working directory. Subsequent commits will make use of this new variable. Signed-off-by: Elijah Newren <newren@gmail.com>
In the past, when a directory needs to be removed to make room for a file, we have always errored out when that directory contains any untracked (but not ignored) files. Add an extra condition on that: also error out if the directory is the current working directory. Signed-off-by: Elijah Newren <newren@gmail.com>
When running e.g. `git reset --hard` from a subdirectory, and that subdirectory is in the way of adding needed files, bail with an error message. Note that we kind of a couple lines of duplicated code here; the last two lines of verify_clean_subdirectory() is the same as two of the lines added here. Trying to share that code would require moving it to the end of verify_absent_1(), making the reset_nuke_tracked case jump directly to it, and then have the multiple !reset_nuke_tracked cases not return early but run through that code. We do not want to simply check the cwd-in-the-way at the beginning of verify_absent_1() for the !reset_nuke_tracked case, because any error messages about untracked files still being in the directory should take precedence over error messages about removing the current working directory. Signed-off-by: Elijah Newren <newren@gmail.com>
symlinks has a pair of schedule_dir_for_removal() and remove_scheduled_dirs() functions that ensure that directories made empty by removing other files also themselves get removed. However, we want to exclude the current working directory and leave it around so that subsequent git commands (and non-git commands) that the user runs afterwards don't cause the user to get confused. Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
dir has a convenient remove_path() helper that will both remove a file in a directory and remove its containing directory if it becomes empty as a result of the removal, recursing all the way up. However, we do not want the current working directory to be removed, even if it becomes empty. dir also has a remove_dir_recursively() function which appears to mostly be used to remove metadata directories or temporary directories or submodules or worktrees. I am not sure if it needs to be protected against removing the current working directory, but did so for good measure. Signed-off-by: Elijah Newren <newren@gmail.com>
newren
force-pushed
the
untracked_and_cwd_removal_and_misc
branch
from
September 11, 2021 23:07
7020d99
to
6f772bb
Compare
I decided to split this review up into separate series: |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Here's a series of patches that started with a bug report about the current working directory being deleted by rebase[1], and is based on a slightly modified idea from Peff to make the code handle such cases more carefully[2]. While working on it, I discovered that my premise that codepaths already carefully protect untracked (but not ignored) files was incorrect, and another rebase bug report came in at the time about that issue as well[3]. I'm using Junio's declaration about
checkout -f
andreset --hard
(and also presuming that sinceread-tree --reset
is porcelain that its behavior should be left alone)[4] in this series. The fun didn't end there either; I also discovered some other bugs in am & stash as a side effect and which affected my ability to write testcases for those commands. And I noticed a few places that the documentation should be improved.So, a long series. I could potentially break it up into five (partially dependent) series if others prefer:
The first three series would be independent, and the last two would depend on all the previous ones. Let me know if you'd rather see them submitted this way.
SIDENOTE (since it was brought up in the threads I reference):
There's another related topic here of treating ignored files as precious instead of deleting them. I do not try to handle that here, but I believe that would actually be relatively easy to handle. If you leave unpack_trees_options->dir as NULL, then they are treated as precious (my original patch 8 made that mistake). There's a few other locations that already optionally set up unpack_trees_options->dir (a quick search for "overwrite_ignore" and "overwrite-ignore" will find them), so we'd just need to copy that optionality to these new callsites (and perhaps make a global core.overwrite_ignored config option to affect all of these). Of course, doing so would globally treat ignored files as precious rather than allowing them to be configured on a per-path bases, but honestly I think the idea of configuring ignored files as precious on a per-path basis sounds like insanity (we have enough bugs with untracked and ignored files without adding yet another type, tla/baz was excessively confusing to me due in part to the number of types of files and I'd rather not see such ideas ported to git, configuring per-path rules sounds like lots of work for end users to configure, etc.), so if someone wants to pursue the precious-ignored concept then I'd much rather see it done as a global setting. Just my $0.02.
[1] https://lore.kernel.org/git/xmqqv93n7q1v.fsf@gitster.g/
[2] https://lore.kernel.org/git/YS8eEtwQvF7TaLCb@coredump.intra.peff.net/
[3] https://lore.kernel.org/git/CABPp-BFyR19ch71W10oJDFuRX1OHzQ3si971pMn6dPtHKxJDXQ@mail.gmail.com/
[4] https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/
cc: Johannes Sixt j6t@kdbg.org
cc: Jeff King peff@peff.net
cc: Taylor Blau me@ttaylorr.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Fedor Biryukov fedor.birjukov@gmail.com
cc: Yuri yuri@rawbw.com
cc: Philip Oakley philipoakley@iee.email