Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removal of untracked files/directories and the current working directory #1085

Closed
wants to merge 17 commits into from

Conversation

newren
Copy link
Contributor

@newren newren commented Sep 7, 2021

Here's a series of patches that started with a bug report about the current working directory being deleted by rebase[1], and is based on a slightly modified idea from Peff to make the code handle such cases more carefully[2]. While working on it, I discovered that my premise that codepaths already carefully protect untracked (but not ignored) files was incorrect, and another rebase bug report came in at the time about that issue as well[3]. I'm using Junio's declaration about checkout -f and reset --hard (and also presuming that since read-tree --reset is porcelain that its behavior should be left alone)[4] in this series. The fun didn't end there either; I also discovered some other bugs in am & stash as a side effect and which affected my ability to write testcases for those commands. And I noticed a few places that the documentation should be improved.

So, a long series. I could potentially break it up into five (partially dependent) series if others prefer:

  • Patches 1-3: fix up stash bugs I found while doing this work
  • Patches 4-5: document some am bugs and fix one of them
  • Patch 6: testcase cleanups
  • Patch 7-10: new testcases for nuking of untracked files, fixes for some them, and better docs
  • Patches 11-18: new testcases for nuking cwd and fixes for all these testcases
    The first three series would be independent, and the last two would depend on all the previous ones. Let me know if you'd rather see them submitted this way.

SIDENOTE (since it was brought up in the threads I reference):

There's another related topic here of treating ignored files as precious instead of deleting them. I do not try to handle that here, but I believe that would actually be relatively easy to handle. If you leave unpack_trees_options->dir as NULL, then they are treated as precious (my original patch 8 made that mistake). There's a few other locations that already optionally set up unpack_trees_options->dir (a quick search for "overwrite_ignore" and "overwrite-ignore" will find them), so we'd just need to copy that optionality to these new callsites (and perhaps make a global core.overwrite_ignored config option to affect all of these). Of course, doing so would globally treat ignored files as precious rather than allowing them to be configured on a per-path bases, but honestly I think the idea of configuring ignored files as precious on a per-path basis sounds like insanity (we have enough bugs with untracked and ignored files without adding yet another type, tla/baz was excessively confusing to me due in part to the number of types of files and I'd rather not see such ideas ported to git, configuring per-path rules sounds like lots of work for end users to configure, etc.), so if someone wants to pursue the precious-ignored concept then I'd much rather see it done as a global setting. Just my $0.02.

[1] https://lore.kernel.org/git/xmqqv93n7q1v.fsf@gitster.g/
[2] https://lore.kernel.org/git/YS8eEtwQvF7TaLCb@coredump.intra.peff.net/
[3] https://lore.kernel.org/git/CABPp-BFyR19ch71W10oJDFuRX1OHzQ3si971pMn6dPtHKxJDXQ@mail.gmail.com/
[4] https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/

cc: Johannes Sixt j6t@kdbg.org
cc: Jeff King peff@peff.net
cc: Taylor Blau me@ttaylorr.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Fedor Biryukov fedor.birjukov@gmail.com
cc: Yuri yuri@rawbw.com
cc: Philip Oakley philipoakley@iee.email

@newren newren force-pushed the untracked_and_cwd_removal_and_misc branch from 9fd54a7 to 828fdd0 Compare September 7, 2021 22:02
Both Johannes and I assumed (perhaps due to familiarity with rebase)
that am --abort would return the user to a clean state.  However, since
am, unlike rebase, is intended to be used within a dirty working tree,
--abort will only clean the files involved in the am operation.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Noting that unpack_trees treats reset=1 & update=1 as license to nuke
untracked files, I looked for code paths that use this combination and
tried to generate testcases which demonstrated unintentional loss of
untracked files and directories.  I found several.

I also include testcases for `git reset --{hard,merge,keep}`.  A hard
reset is perhaps the most direct test of unpack_tree's reset=1 behavior,
but we cannot make `git reset --hard` preserve untracked files without
some migration work.

Also, the two commands `checkout --force` (because of the --force) and
`read-tree --reset` (because it's plumbing and we need to keep it
backward compatible) were left out as we expect those to continue
removing untracked files and directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
Traditionally, unpack_trees_options->reset was used to signal that it
was okay to delete any untracked files in the way.  This was used by
`git read-tree --reset`, but then started appearing in other places as
well.  However, many of the other uses should not be deleting untracked
files in the way.  Split this into two separate fields:
   reset_nuke_untracked
   reset_keep_untracked
and, since many code paths in unpack_trees need to be followed for both
of these flags, introduce a third one for convenience:
   reset_either
which is simply an or-ing of the other two.

Modify existing callers so that
   read-tree --reset
   reset --hard
   checkout --force
continue using reset_nuke_untracked, but so that other callers,
including
   am
   checkout without --force
   stash  (though currently dead code; reset always had a value of 0)
   numerous callers from rebase/sequencer to reset_head()
will use the new reset_keep_untracked field.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, we have toplevel commands that are expected to
delete untracked files such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other highlevel
commands that happen to call these other commands thought about or
conveyed to users the possibility that untracked files could be removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Signed-off-by: Elijah Newren <newren@gmail.com>
Some commands have traditionally also removed untracked files (or
directories) that were in the way of a tracked file we needed.  Document
these cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
Numerous commands will remove empty working directories, especially if
they are in the way of placing needed files.  That is normally fine, but
removing the current working directory can cause confusion for the user
when they run subsequent commands.  Add some tests checking for such
problems.

Signed-off-by: Elijah Newren <newren@gmail.com>
Removing the current working directory causes all subsequent git
commands (and likely a number of non-git commands) run from that
directory to get confused and fail with a message about being unable to
read the current working directory.  That confuses end users,
particularly since the command they get the error from is not the one
that caused the problem; the problem came from the side-effect of some
previous command.

We would like to avoid removing the current working directory; towards
this end, introduce a new the_cwd variable that tracks the current
working directory.  Subsequent commits will make use of this new
variable.

Signed-off-by: Elijah Newren <newren@gmail.com>
In the past, when a directory needs to be removed to make room for a
file, we have always errored out when that directory contains any
untracked (but not ignored) files.  Add an extra condition on that: also
error out if the directory is the current working directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
When running e.g. `git reset --hard` from a subdirectory, and that
subdirectory is in the way of adding needed files, bail with an error
message.

Note that we kind of a couple lines of duplicated code here; the last
two lines of verify_clean_subdirectory() is the same as two of the lines
added here.  Trying to share that code would require moving it to the
end of verify_absent_1(), making the reset_nuke_tracked case jump
directly to it, and then have the multiple !reset_nuke_tracked cases not
return early but run through that code.  We do not want to simply check
the cwd-in-the-way at the beginning of verify_absent_1() for the
!reset_nuke_tracked case, because any error messages about untracked
files still being in the directory should take precedence over error
messages about removing the current working directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
symlinks has a pair of schedule_dir_for_removal() and
remove_scheduled_dirs() functions that ensure that directories made
empty by removing other files also themselves get removed.  However, we
want to exclude the current working directory and leave it around so
that subsequent git commands (and non-git commands) that the user runs
afterwards don't cause the user to get confused.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
dir has a convenient remove_path() helper that will both remove a file
in a directory and remove its containing directory if it becomes empty
as a result of the removal, recursing all the way up.  However, we do
not want the current working directory to be removed, even if it becomes
empty.

dir also has a remove_dir_recursively() function which appears to mostly
be used to remove metadata directories or temporary directories or
submodules or worktrees.  I am not sure if it needs to be protected
against removing the current working directory, but did so for good
measure.

Signed-off-by: Elijah Newren <newren@gmail.com>
@newren newren force-pushed the untracked_and_cwd_removal_and_misc branch from 7020d99 to 6f772bb Compare September 11, 2021 23:07
@newren newren closed this Sep 12, 2021
@newren newren deleted the untracked_and_cwd_removal_and_misc branch December 31, 2021 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant