Can't match files based on shebang language #257
Often we have scripts written in Python with no extension. Hooks usually miss these since they match on extension.
It would be nice if pre-commit could somehow read shebangs. Maybe this is something to think about when trying to solve #220 as well. I'm not sure how best to approach this yet.
I spent a little time trying to think of a good way to handle this, as well as #254 (ability for hooks to target types of objects, e.g. file/symlink/submodule) and #220 (ability for hooks to target "text" files without maintaining a list of "text" extensions).
Here's my suggestion:
Files get automatically tagged
Each file gets tagged with a list of tags describing it. For example, an executable file
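To make the tagging idea concrete, here is a minimal sketch of how coarse tags could be derived from the filesystem alone. The function name tags_for_path and the tag names are hypothetical, and the text/binary check is an assumed heuristic (a NUL byte in the first 1024 bytes means "binary"). Submodules are not detected; that would need information from git rather than from the working tree.

```python
import os
import stat


def tags_for_path(path):
    """Return a set of coarse tags for `path` (hypothetical helper)."""
    st = os.lstat(path)  # lstat, so a symlink is reported as a symlink, not followed
    if stat.S_ISLNK(st.st_mode):
        return {"symlink"}
    if stat.S_ISDIR(st.st_mode):
        return {"directory"}
    tags = {"file"}
    # Any execute bit (user, group, or other) counts as executable here.
    if st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        tags.add("executable")
    else:
        tags.add("non-executable")
    # Assumed heuristic: a NUL byte in the first 1024 bytes means binary.
    with open(path, "rb") as f:
        chunk = f.read(1024)
    tags.add("binary" if b"\0" in chunk else "text")
    return tags
```

An extensionless script marked executable would come back as {"file", "executable", "text"}; a language tag such as "python" would have to come from a separate extension/shebang lookup.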
Hooks target file types
Hooks can then specify
An added benefit is that we no longer need to duplicate the hook's extensions in our repos if we want to change the path.
Currently we have to do
I would suggest not including an
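One possible matching rule, sketched here as an assumption rather than a settled design: a hook lists the types it wants, and it runs on a file only when every listed type appears among that file's tags. The function hook_applies and the example tag sets are hypothetical.

```python
def hook_applies(hook_types, file_tags):
    """Return True if every type the hook asks for is among the file's tags.

    Hypothetical rule: the entries in a hook's types list are ANDed together.
    """
    return set(hook_types).issubset(file_tags)


# Tags as an automatic tagger might produce them (illustrative values only).
extensionless_script = {"file", "executable", "text", "python"}
shell_script = {"file", "executable", "text", "shell"}

print(hook_applies(["python"], extensionless_script))      # True
print(hook_applies(["python"], shell_script))              # False
print(hook_applies(["text", "executable"], shell_script))  # True
```

With this rule, a Python linter that asks for ["python"] picks up the extensionless script without any extension regex at all.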
Making this happen is fairly easy (and I'd volunteer to do the work, since I really want to check files with shebangs):
It's pretty much backwards compatible. Nothing changes about
The main issue is if people upgrade hooks but don't upgrade pre-commit. I don't know a good solution here, unfortunately.
My thinking is to AND them. The list of objects to consider running hooks on is built as it is currently (no change; all objects that match the
In this way, hooks are entirely backwards compatible (except for symlinks/submodules, assuming the
I think this is also the most obvious from a developer's perspective. For
You can use
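The AND approach can be sketched as a two-stage filter: first the existing regex match on paths, then the type check. filter_for_hook, the tag values, and the default types=("file",) are hypothetical; that default is just one way to keep symlinks and submodules away from hooks that don't ask for them. Matching the files pattern with re.search is also an assumption here.

```python
import re


def filter_for_hook(filenames, tags_by_file, files_regex="", types=("file",)):
    """Return the files a hook should run on (hypothetical two-stage filter)."""
    pattern = re.compile(files_regex)
    # Stage 1: unchanged path filtering; an empty regex matches every path.
    by_path = [name for name in filenames if pattern.search(name)]
    # Stage 2: AND in the type check; every requested type must be a tag of the file.
    wanted = set(types)
    return [name for name in by_path if wanted <= tags_by_file[name]]


tags_by_file = {
    "setup.py": {"file", "text", "python"},
    "bin/deploy": {"file", "executable", "text", "python"},
    "bin/run.sh": {"file", "executable", "text", "shell"},
    "vendor/lib": {"submodule"},
}
names = list(tags_by_file)

print(filter_for_hook(names, tags_by_file, types=["python"]))
# ['setup.py', 'bin/deploy']
print(filter_for_hook(names, tags_by_file, files_regex=r"^bin/", types=["python"]))
# ['bin/deploy']
print(filter_for_hook(names, tags_by_file))
# ['setup.py', 'bin/deploy', 'bin/run.sh']  (the submodule has no "file" tag)
```

Because stage 1 is untouched, a hook that only sets a files regex keeps selecting the same regular files as before.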
Overall: let's do it!
The problem with that is that you have to go check what the hook uses for
Agreed, it's not a huge deal in practice, but it does feel kinda yucky.
Just one clarification before I start coding. I'm envisioning updating a hook like
Basically, the idea would be to omit
This will break the case where people do
Is that acceptable? The good news is that it only breaks if people autoupdate, since we pin SHAs, and it is usually easy to update pre-commit.
The new version of pre-commit would still work with old repos.
I think we should require
My concern about removing
Agreed that we should still require either
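Assuming the requirement being discussed is that every hook sets at least one of files or types, the check is small. validate_hook is a hypothetical function operating on a hook definition represented as a plain dict; it is not pre-commit's actual schema code.

```python
def validate_hook(hook):
    """Reject a hook definition that sets neither files nor types (hypothetical rule)."""
    if "files" not in hook and "types" not in hook:
        hook_id = hook.get("id", "<unknown>")
        raise ValueError(f"hook {hook_id!r} must specify `files` or `types`")
    return hook


validate_hook({"id": "flake8", "types": ["python"]})  # ok: types only
validate_hook({"id": "flake8", "files": r"\.py$"})    # ok: files only (old style)
try:
    validate_hook({"id": "flake8"})
except ValueError as exc:
    print(exc)  # hook 'flake8' must specify `files` or `types`
```

Keeping files-only definitions valid is what lets existing hook repos keep working unchanged.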
Unfortunately, the stdlib mimetypes module looks at the file extension (path name) and nothing else.
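A quick way to see this limitation: write an extensionless script with a Python shebang and ask mimetypes.guess_type about it. The file name "deploy" is arbitrary.

```python
import mimetypes
import os
import tempfile

# Write an extensionless Python script to a temporary directory.
tmpdir = tempfile.mkdtemp()
script = os.path.join(tmpdir, "deploy")
with open(script, "w") as f:
    f.write("#!/usr/bin/env python3\nprint('hello')\n")

# guess_type() works purely from the name, so the shebang is never read.
print(mimetypes.guess_type(script))  # (None, None)
# Adding a .py suffix is enough to get a guess; the file need not even exist.
print(mimetypes.guess_type(script + ".py")[0] is not None)  # True
```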
I can understand if pre-commit isn't interested in handling files without extensions as it does make things more complex. I think it's probably a fairly common use-case, though.
My thinking is that we can write a fairly simple classifier that just reads file extensions and shebangs, and it will work great in almost every case. Here is my first attempt:
The tags look like:
The code is probably not entirely correct, but this is the spirit of what I'd like to do.
As you mentioned, it does require maintaining a mapping of extension to file type plus interpreter to file type. Maybe maintaining this in pre-commit doesn't make sense, and it would be better to package it? (I don't mind doing that.) Especially when I think about writing tests for this, packaging it seems to make some sense.
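For illustration, here is an independent sketch (not the attempt referred to above) of a classifier that checks the extension first and falls back to the shebang. The two lookup tables stand in for the extension and interpreter mappings mentioned above; their contents, and the names parse_shebang and language_tags, are hypothetical. Shebang handling is deliberately simple: it skips env's dash options and strips a trailing version number, but does not handle forms like env VAR=1 python.

```python
import os
import re

# Hypothetical lookup tables; a real classifier would need many more entries.
EXTENSIONS = {".py": {"python"}, ".sh": {"shell"}, ".rb": {"ruby"}, ".js": {"javascript"}}
INTERPRETERS = {"python": {"python"}, "sh": {"shell"}, "bash": {"shell"},
                "ruby": {"ruby"}, "node": {"javascript"}}


def parse_shebang(path):
    """Return the interpreter name from a '#!' line, or None if there isn't one."""
    with open(path, "rb") as f:
        first_line = f.readline(256)  # cap the read; shebang lines are short
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].decode("utf-8", "replace").split()
    if not parts:
        return None
    cmd = os.path.basename(parts[0])
    if cmd == "env":
        # '#!/usr/bin/env python3' -> 'python3'; skip env options such as '-S'.
        rest = [p for p in parts[1:] if not p.startswith("-")]
        if not rest:
            return None
        cmd = rest[0]
    # Drop a trailing version, e.g. 'python3.6' -> 'python'.
    return re.sub(r"[\d.]+$", "", cmd)


def language_tags(path):
    """Guess language tags from the extension first, then from the shebang."""
    _, ext = os.path.splitext(path)
    if ext in EXTENSIONS:
        return set(EXTENSIONS[ext])
    return set(INTERPRETERS.get(parse_shebang(path), ()))
```

Checking the extension first means most files are classified without being opened; only extensionless (or unknown-extension) files pay for reading the first line.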