Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ocp-indent pre-commit hook #236

Closed
sim642 opened this issue May 20, 2021 · 19 comments
Closed

ocp-indent pre-commit hook #236

sim642 opened this issue May 20, 2021 · 19 comments
Assignees
Labels

Comments

@sim642
Copy link
Member

sim642 commented May 20, 2021

make dev sets up a Git pre-commit hook, which prevents committing any files with indentation that the tool considers "wrong":

analyzer/make.sh

Lines 102 to 110 in f99164f

;; dev)
echo "Installing opam packages for development..."
opam install utop ocaml-lsp-server ocp-indent ocamlformat ounit2 earlybird
# ocaml-lsp-server is needed for https://github.com/ocamllabs/vscode-ocaml-platform
echo "Be sure to adjust your vim/emacs config!"
echo "Installing Pre-commit hook..."
cd .git/hooks; ln -s ../../scripts/hooks/pre-commit; cd -
echo "Installing gem parallel (not needed for ./scripts/update_suite.rb -s)"
sudo gem install parallel

I deleted this hook locally already years ago and I guess many of you have done the same (or never even set it up). It was impossible to make tiny correctly indented changes because it also forced to fix the indentation of the rest of the file, which hadn't been touched for a long time (I guess ocp-indent cannot be told to just complain about the modified changes). This is clearly still an issue: vandah@4e8840c.

It forces the awful practice of doing logic changes together with completely unrelated formatting changes in the same commit. This obfuscates the Git history and makes Git blame more difficult to use, as one cannot just ignore formatting commits if they also contain logic changes: https://github.com/goblint/analyzer/blob/master/.git-blame-ignore-revs. Our suggested make dev setup should not be encouraging this practice!

I'm wondering what everyone's thoughts about this are, because it doesn't seem great that we recommend a pre-commit hook, which will immediately hinder development and just push everyone to remove the hook, which in itself leads again to more misindented code being committed.

One thought I have is to remove the hook, so locally you can make all the changes without being forced to combine unrelated formatting changes. Instead we could have a GitHub Actions job that runs ocp-indent. That would of course require fixing all the indentation problems that already are around and also agreeing that ocp-indent is the gold standard for our code style.

@sim642 sim642 added the cleanup label May 20, 2021
@michael-schwarz
Copy link
Member

I'm wondering what everyone's thoughts about this are

Honestly, I would just go with removing the hook. I think indenting as it currently is in the code base is not too bad, and there might be specific instances where breaking conventions as enforced by ocp-indent actually makes the code easier to understand.

@jerhard
Copy link
Member

jerhard commented May 20, 2021

I think having a common, automatable indentation style would be a good thing. From what I saw, the changes that ocp-indent makes to indentation are mostly minor, but it makes the indentation more consistent. It does not seem to insert line breaks depending on the syntax, which would limit flexibility.

That would of course require fixing all the indentation problems that already are around and also agreeing that ocp-indent is the gold standard for our code style.

Are there other indentation tools for OCaml? Not that I know of.

@vogler
Copy link
Collaborator

vogler commented May 20, 2021

It forces the awful practice of doing logic changes together with completely unrelated formatting changes in the same commit. This obfuscates the Git history and makes Git blame more difficult to use ...

Yes, true, also deleted it. Was too lazy to make it only apply to the diff.
Changing the xargs arg from

git diff --cached --name-only | grep -E ".*\.ml(i|l|y)?$" | xargs -I% bash -c 'diff <(git show :%) <(git show :% | ocp-indent) || (echo; echo "ocp-indent says you suck at indentation in % (diff shown above)."; false)' || (echo "You shall not commit!"; exit 1)

to

diff <(diff <(git show HEAD:%) <(git show :%)) <(diff <(git show HEAD:% | ocp-indent) <(git show :% | ocp-indent))

would work, but it seems pretty convoluted and maybe too expensive. It shows the diff between the to-be-committed changes and the same run through ocp-indent.
ocp-indent has an option --lines=RANGE, it's probably better to use that.
Sadly, it doesn't support multiple ranges (ocp-indent -l10-15 -l110-130 src/solvers/td3.ml), so you'd have to call it for all hunks of a file.

One thought I have is to remove the hook, so locally you can make all the changes without being forced to combine unrelated formatting changes. Instead we could have a GitHub Actions job that runs ocp-indent. That would of course require fixing all the indentation problems that already are around and also agreeing that ocp-indent is the gold standard for our code style.

I think it's good to have it run on the changes you want to commit.
Idk how well it would work to have it run as an action. I could imagine that it also becomes tedious since you might commit and push something, then it creates a commit that fixes indention, you continue working, want to push again, but can't, pull and get merge conflicts. Also, when you want to revert a commit, you then have to revert the indent-commit etc.

Probably best to run ocp-indent on everything when we feel like it's a good time, and in-between rely on the hook to at least keep the changes well-indented.
I think last time I ran it on everything, there also were some changes that I didn't want to stage since they made the code less readable or split it up too much - don't remember what it was.

@vogler
Copy link
Collaborator

vogler commented May 20, 2021

Are there other indentation tools for OCaml? Not that I know of.

There's also https://github.com/ocaml-ppx/ocamlformat, but that's probably too disruptive since it just pretty-prints the parse tree and adds the comments.

@sim642
Copy link
Member Author

sim642 commented May 20, 2021

I think last time I ran it on everything, there also were some changes that I didn't want to stage since they made the code less readable or split it up too much - don't remember what it was.

This is exactly why I was wondering if ocp-indent should be our gold standard. Because if there are places where we don't agree, it's very inconvenient. You cannot then just blindly reindent the whole project then. It's also a major problem for using the pre-commit hook: it's almost impossible to make changes to a file which contains such indentation that ocp-indent doesn't agree with. To do that you'd have to remove the hook, commit and then re-add the hook.

Not sure if ocp-indent has any means to disable it for some section of code (e.g. with comments). If it does, then that would help. If it doesn't, then enforcing the style is problematic.

Idk how well it would work to have it run as an action. I could imagine that it also becomes tedious since you might commit and push something, then it creates a commit that fixes indention

With Actions I just meant it would complain in a CI job, but wouldn't actually make new commits automatically. You'd see the job having failed because of indentation problems and then you can fix it yourself. Due to the aforementioned indentation disagreements, maybe it would be better if the CI job always succeeded (doesn't cause failure) but just outputs warnings. Of course that could mean nobody looks at them, although it would at least be better than all regular developers removing the hook and nobody seeing the output.

Or if we make the ocp-indent CI job do the slightly more expensive thing of only looking at different modified ranges, it could also cause failures to prevent new misindentation being introduced. Although it would add an extra level of complication since you could push many commits at once while the CI job would only run on the last one.

Honestly, I would just go with removing the hook. I think indenting as it currently is in the code base is not too bad

I agree with this. It hasn't been a big issue without using the hook, so we'd probably be fine without. And if anyone notices definitely misindented code in PRs or just browsing the code, it's not hard to manually make those fixes as they're found.
ocp-indent is often more pedantic than what would also be completely acceptable and readable, and exists in the codebase already.

@vogler
Copy link
Collaborator

vogler commented May 20, 2021

It's also a major problem for using the pre-commit hook: it's almost impossible to make changes to a file which contains such indentation that ocp-indent doesn't agree with.

If it only diffs the staged changes as in my comment above, it shouldn't be a major problem. You just can't commit any changes that don't comply with ocp-indent.

To do that you'd have to remove the hook, commit and then re-add the hook.

There's git commit -n.

Not sold on CI since you only find out after push which is too late and causes more overhead than pre-commit.
Adding it as some further CI check doesn't hurt, but IMO it'd better if it were avoided in the first place.
We can also configure stuff we don't agree with: http://manpages.ubuntu.com/manpages/bionic/man1/ocp-indent.1.html#configuration
AFAIR it was fine mostly - we just have some code like

if x then y else
z

where z is many lines that would be indented.
Maybe this kind of code should just be indented or refactored by pulling z out.

@sim642
Copy link
Member Author

sim642 commented May 21, 2021

If it only diffs the staged changes as in my comment above, it shouldn't be a major problem. You just can't commit any changes that don't comply with ocp-indent.

I'm for updating the pre-commit hook to do that, then we could hopefully keep things more or less as is. It might motivate me to also add it back if it doesn't cause extra inconvenience.

There's git commit -n.

TIL. Although if the updated hook works, then this hopefully wouldn't really be needed. Although maybe it's a good idea to mention this command somewhere, maybe in docs or just the hook (or even its output).

Not sold on CI since you only find out after push which is too late and causes more overhead than pre-commit.
Adding it as some further CI check doesn't hurt, but IMO it'd better if it were avoided in the first place.

The reason I was thinking of CI was because as many people might have the hook already removed, they could still keep pushing misindented code. But unless we reindent everything to match ocp-indent (which I think we don't have to do), the CI would just keep complaining about the already existing misindented code.

@vogler
Copy link
Collaborator

vogler commented May 21, 2021

Used the above. A bit of a hack since it shows the diff of diffs, but should be good enough.
Maybe a good time to restore the symlink and report if it misbehaves.
If no, we can also add it as a GitHub action.

@michael-schwarz
Copy link
Member

Now that we have the hook back, we can close this, right?

@sim642
Copy link
Member Author

sim642 commented May 26, 2021

I will document something about it and I think then I can close the issue for now.

If any complications arise again, we can discuss them further in this issue and reopen when it seems necessary to try some other solution.

@sim642
Copy link
Member Author

sim642 commented May 26, 2021

Well, this didn't take long... The hook doesn't behave well with merge commits:

  1. The changes of a merge commit are all the changes being merged, so if any of the commits being merged themselves had indentation problems, then the hook will also prevent merging.
  2. When that happens or there are conflicts, then after resolving those the natural thing to do is git merge --continue, which makes the merge commit. But that command doesn't allow --no-verify! Only the default (without --continue) variant allows --no-verify.

So I still ended up temporarily removing the hook. Not sure how often this would be an issue though.

Right now I was merging master into #196 to fix the conflicts. The misindentation things coming from master are related to all the cleanup changes I've been making, where I didn't fix existing indentation problems. But since those changes show up in the diff, ocp-indent starts complaining about a hundred places since the cleanup changes were very spread around, touching almost all printables and lattices. So the hook might annoy others' attempts at merging master into their branches as well, just so you know.

@sim642
Copy link
Member Author

sim642 commented May 26, 2021

Another fun bug of the hook: it complains about misindented code that's being deleted, because that's also in the diff.

@sim642
Copy link
Member Author

sim642 commented May 26, 2021

Another fun bug of the hook: it complains about misindented code that's being deleted, because that's also in the diff.

I'm sorry, but this is worse than it first seemed. Not only does the new hook complain about misindented code that's now deleted, it also complains about misindented code that's now changed to be properly indented because that's also in the diff!

For example, if you change

let x =
y

into

let x =
  y

The new hook first computes the diff without ocp-indented files, which is:

< y
---
>   y

Second it computes the diff with ocp-indented files, which are both equal to the new properly indented one, so the diff is empty.
Finally it computes the diff between the two diffs, which is:

< < y
< ---
< >   y

And because this is non-empty, it complains. I hope I'm understanding this correctly. (Not to mention it's essentially impossible to read this diff of diffs.)

The old hook, which checks the entire file (new without ocp-indent vs new with ocp-indent) sees no difference, because the new file is properly indented and everything is fine. So the new hook is complaining about things that the old one didn't.

@sim642 sim642 reopened this May 26, 2021
@vogler
Copy link
Collaborator

vogler commented May 26, 2021

Didn't test anything but adding misindented code 🙈
What you actually want is some set minus instead of diff, but I guess that can be achieved by piping it through sed and ignoring all the lines starting with < (and lstrip > to make it look like a normal diff).

Concerning merge commits: you wrote it checks for each commit - I thought that would only be the case for rebase, and for a merge commit it should only check that commit.
My git-fu is not that strong. There's probably a better way to get the to-be-committed changes that also plays well with merges than the git show approach above.
When there are conflicts then you'd likely also have to take MERGE_HEAD into account.
In case of an octopus merge, do we need to diff all parents?
Maybe easier to ignore merge commits.

@sim642
Copy link
Member Author

sim642 commented May 26, 2021

Didn't test anything but adding misindented code see_no_evil
What you actually want is some set minus instead of diff, but I guess that can be achieved by piping it through sed and ignoring all the lines starting with < (and lstrip > to make it look like a normal diff).

At this point I'm wondering if multiple calls using --lines argument extracted from diff lines would be more straightforward.

Concerning merge commits: you wrote it checks for each commit - I thought that would only be the case for rebase, and for a merge commit it should only check that commit.

Well, it doesn't re-check for each commit, but if you git merge --no-commit (or it stays in MERGING state due to conflicts), then all changes being merged are staged I think, so that's why it would complain again.

Maybe easier to ignore merge commits.

This is probably the most reasonable approach, we don't need it to be that advanced. (Also I don't think anyone in Goblint's history has ever performed an octopus merge, we aren't the linux kernel.)

Technically a merge commit could also introduce new changes whose indentation may be off, but given our imperfect indentation state, such rare occurrence wouldn't be significant.

vogler added a commit that referenced this issue May 26, 2021
command string gets too long for xargs to replace:

~~~
fatal: ambiguous argument ':%': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
~~~
@vogler
Copy link
Collaborator

vogler commented Jun 8, 2021

Changed from xargs to for-loop in a381b47, but noticed that a simple sed replacement is not enough.

At this point I'm wondering if multiple calls using --lines argument extracted from diff lines would be more straightforward.

This is probably the way to go.

@vogler
Copy link
Collaborator

vogler commented Jun 9, 2021

I changed it to be based on --lines. Should be better.
No longer uses HEAD, so maybe it's also fine for merges now.

@michael-schwarz
Copy link
Member

I think this seems to be working for almost everyone now. We can reopen if we encounter any new problems.

@vogler
Copy link
Collaborator

vogler commented Jul 27, 2021

Stumbled upon this: https://pre-commit.ci
Not sure if we want automatic commits that fix indentation in PRs.
Also, https://pre-commit.com does not list OCaml, so we might as well just add another GitHub workflow for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants