Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibility of data loss when using git commit --all with destructive hooks #1499

Closed
tchamberlin opened this issue Jun 9, 2020 · 2 comments
Labels

Comments

@tchamberlin
Copy link

tchamberlin commented Jun 9, 2020

I've noticed a troubling difference in behavior between commits performed via git add --all && git commit vs git commit --all with respect to destructive actions taken by hooks. In the former case, the files are left untouched on-disk in staging, but in the latter case there appears to be no way to recover them.

Of course we would hope that our hooks behave, but what if they don't, even if by accident? Here's two explicit examples of the differences in these two use cases when using an (obviously ridiculous) hook that deletes all files-to-be-committed.

Case 1 (git add --all && git commit)

When manually staging files via git add, destructive actions taken by scripts are reversible:

$ echo "this will NOT be lost forever" >> foo.py

$ git add foo.py 

$ git commit -m "this will leave foo.py staged"
Delete Everything........................................................Failed
- hook id: delete_everything
- files were modified by this hook

$ git status --short
MD foo.py

$ git checkout -- foo.py

$ cat foo.py 
print('hello world')
this will NOT be lost forever

Case 2 (git commit --all)

However, when committing via git commit --all, the contents cannot be recovered:

# Simple repo, with just a source file and the pre-commit configuration
$ ls -A
foo.py  .git  .pre-commit-config.yaml

$ cat foo.py 
print('hello world')

$ cat .pre-commit-config.yaml 
repos:
-   repo: https://github.com/tchamberlin/delete-everything-pre-commit
    rev: 14f47f9929795f80c281fc7e95c284a94feea733
    hooks:
    -   id: delete_everything

# Make a local modification
$ echo "this will be lost forever" >> foo.py 

$ cat foo.py
print("hello world")
this will be lost forever

# If we attempt to commit without first staging, then I can't see any way of recovering
# deleted files
$ git commit --all --message "foo.py will lose data that cannot be recovered"
Delete Everything........................................................Failed
- hook id: delete_everything
- files were modified by this hook

# Nothing staged, and file has been deleted
$ git status --short
 D foo.py

# Nothing stashed
$ git stash list

# Nothing useful in the cache:
$ ls -A ~/.cache/pre-commit/
db.db  .lock  README  repo4diajf1x

The wording of the option help text for git commit --all indicates that the files should still be staged prior to commit:

-a, --all
   Tell the command to automatically stage files that have been modified
   and deleted, but new files you have not told Git about are not
   affected.

But, perhaps it's a bit different internally?

Environment

  • OS: Ubuntu 18.04.4 LTS
  • git: 2.7.17
  • pre-commit: 2.4.0
  • Python: 3.8.2

Questions

  1. Am I understanding the situation correctly, or have I missed something (i.e. are the files stashed somewhere that I've missed)?
  2. Is there a workaround that you all have found for this?
  3. Is there anything that can be done to make these situations functionally equivalent on the pre-commit side, or are there simply implementation differences on the git side that make it impossible?

Related resources

pre-commit itself will never touch the staging area

#747

Modifying by default is by-design

pre-commit only runs on the staged contents of files by temporarily saving the contents of your files at commit time and stashing the unstaged changes while running hooks.

Thanks!

@asottile
Copy link
Member

asottile commented Jun 9, 2020

this doesn't have anything to do with pre-commit, this is just how git works 🤷

take your example but instead write a bad shell script at .git/hooks/pre-commit:

$ cat .git/hooks/pre-commit
#!/usr/bin/env bash
set -euxo pipefail
: BAI BAI
rm foo.py
exit 1
$ echo 'lost forever' > foo.py
$ git commit
+ : BAI BAI
+ rm foo.py
+ exit 1
$ git status
On branch master
Changes not staged for commit:
	deleted:    foo.py

no changes added to commit

if you saw output that looks like this it'd be pre-commit's doing as it manages partially-staged things:

[WARNING] Unstaged files detected.
[INFO] Stashing unstaged files to /home/asottile/.cache/pre-commit/patch1591732590.

in the case of git commit --all, things look to pre-commit as if everything is staged (indiscernible from actually being staged)

@tchamberlin
Copy link
Author

Thanks, that helps!

It turns out that the to-be-committed files are staged during git commit --all (which I finally thought to verify with a git status during the pre-commit hook). This of course also means that there are blobs floating around with the to-be-committed state of the files, and we can recover it via git fsck --lost-found or similar (and the original premise of my question is wrong 😄).

Anyway, thanks for the sanity check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants