Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New hook 'destroyed-symlinks' #511

Merged
merged 1 commit into from
Nov 18, 2020
Merged

Conversation

m-khvoinitsky
Copy link
Contributor

@m-khvoinitsky m-khvoinitsky commented Aug 2, 2020

The hook detects symlinks which are changed to regular files with a content of a path which that symlink was pointing to.
This usually happens on Windows in case when user without a permission for creating symlinks clones repository with symlinks.

Some implementation notes: bytes is used for filenames in order to avoid having any troubles with encodings. After all, the hook check for destroyed symlinks, not for bad file encoding. ;)

P. S. If you don't think that this hook would be useful for general audience, I'm fine with hosting it in the separate repo. Just wanted to try.

@m-khvoinitsky
Copy link
Contributor Author

No idea why coverage report is different for pypy, will investigate it.

@m-khvoinitsky
Copy link
Contributor Author

I've finally managed to find a time to dig into pypy issue. The reason was the following. In CPython in code like this:

print('something', file=open(somefile, 'a'))

file object losts its last reference it this function call and it is destroyed forcing file buffers to flush. So, for some reason, this doesn't happen in pypy. Workaround would be using explicit flush=True or using context manager which I did.
@asottile, please, take a look.

pre_commit_hooks/destroyed_symlinks.py Outdated Show resolved Hide resolved
).split(b'\0'):
splitted = line.split(b' ')
if splitted and splitted[0] == ORDINARY_CHANGED_ENTRIES_MARKER:
# variable names are taken from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these variable names are terrible, I don't particularly care if they match upstream let's make them readable

import sys
from operator import methodcaller
from subprocess import check_call
from subprocess import check_output
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest of the project prefers not to from import from the standard library

def find_destroyed_symlinks(autofix: bool) -> Sequence[bytes]:
destroyed_links = []
for line in check_output(
['git', 'status', '--porcelain=v2', '-z'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre_commit_hooks/destroyed_symlinks.py Outdated Show resolved Hide resolved
# by something like trailing-whitespace and/or
# mixed-line-ending hooks so we need to go deeper
index_size = int(
check_output(['git', 'cat-file', '-s', hI]).strip(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a cmd_output helper in pre_commit_hooks/util.py that probably makes most of this easier to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    kwargs.setdefault('stderr', subprocess.PIPE)

This eats stderr output which is probably not the best idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's just the default, you can configure it otherwise. and actually it's probably better as a pipe so the errors only display when the tool breaks

index_size = int(
check_output(['git', 'cat-file', '-s', hI]).strip(),
)
# Most filesystems limit path length to 4096 bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

windows allows paths longer than this with UNC naming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there definitely should be a limit in order to avoid wasting RAM in case when symlink was deliberately replaced with big file, which, on the other hand is not good idea in git so I don't know what is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably limit the read to the original symlink destination filename's length (that's how windows usually represents these on disk anyway iirc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably limit the read to the original symlink destination filename's length

Plus a few bytes for various newlines. That's a good idea.

if autofix:
check_call([
'git',
'update-index',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit's stance is to never modify the staging area, I believe this does that so it can't be allowed -- printing this command seems fine though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, update-index is redundant here and even harmful — if the file was modified by some other hooks, symlinks would be broken. Instead, I should just unstage the file. Is this allowed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just leave it and print a command for the user to run

Copy link
Contributor Author

@m-khvoinitsky m-khvoinitsky Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if this is opt-in feature (which it is)? May be rename --autofix to --unstage to make it more explicit?
Printing the command (and may be some help) if --autofix is not enabled is good idea though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's kinda two problems:

  1. pre-commit won't notice the differences
  2. it might get auto committed when wrong

if there's a way to modify the file to look like it's the symlink again in the unstaged portion that would be best

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed --autofix, now only the message with an instruction is displayed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's a way to modify the file to look like it's the symlink again in the unstaged portion that would be best

If it was possible, git would have done this. That's a problem.

entry: destroyed-symlinks
language: python
types: [file]
pass_filenames: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should still take filenames and filter the results to that -- that way during merge conflict resolution if someone else makes a mistake this doesn't blame the user performing the merge

source_repo = tmpdir.join('src')
os.makedirs(source_repo, exist_ok=True)
test_repo = tmpdir.join('test')
with source_repo.as_cwd():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that you can use git -C source_repo ... instead of as_cwd()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know but this way it's a bit more readable IMO.

Comment on lines 27 to 30
*itertools.chain(
('git', 'status', '--porcelain=v2', '-z', '--'),
files,
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be:

cmd_output('git', 'status', '--porcelain=v2', '-z', '--', *files)

README.md Outdated
Detects symlinks which are changed to regular files with a content of a path which that symlink was pointing to.
This usually happens on Windows in case when user without a permission for creating symlinks clones repository with symlinks.
The following argument is available:
- `--autofix` - unstage detected broken symlinks so they won't be commited.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this option is gone now

mode_HEAD == PERMS_LINK,
mode_index != PERMS_LINK,
mode_index != PERMS_NONEXIST,
)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all(( here is going to be slower than putting and to join these conditions



def normalize_content(content: bytes) -> bytes:
return content.rstrip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably just inline this

print('You should unstage affected files:')
print(
' git reset HEAD -- {}'.format(
' '.join(map(shlex.quote, destroyed_links)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use map / filter / reduce -- a list comprehension is more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tastes differ.

print(
'And retry commit. As a long term solution '
'you may try to explicitly tell git that your '
'envionment does not support symlinks:',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

environment

'you may try to explicitly tell git that your '
'envionment does not support symlinks:',
)
print(' git config --local core.symlinks false')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc --local is the default for this command

@@ -13,7 +15,11 @@ def added_files() -> Set[str]:
return set(cmd_output(*cmd).splitlines())


def cmd_output(*cmd: str, retcode: Optional[int] = 0, **kwargs: Any) -> str:
def cmd_output(
*cmd: Union[str, bytes],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixing here used to have problems on windows, I'm not sure if it's still a problem in python3 though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted. It was needed in the previous version where everything was bytes. I don't have Windows nearby to check at the moment, however, tests weren't failing.
They use os.fsdecode (subprocess.py#L561) which, quoting the doc, decode the path-like filename from the filesystem encoding with 'surrogateescape' error handler, or 'strict' on Windows; return str unchanged (good bad example of mixing str and bytes).

… regular files with a content of a path which that symlink was pointing to; move zsplit to util
Copy link
Member

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asottile asottile merged commit 4b863f1 into pre-commit:master Nov 18, 2020
@m-khvoinitsky
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants