Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some basic stats on which PRs merge now vs after black etc #7470

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

toofar
Copy link
Member

@toofar toofar commented Nov 9, 2022

One of the main issues remaining around the 3.0 release is what autoformatters we can introduce to the codebase and how badly that'll conflict with the open PRs. As asked in #1455 (comment)

This is an attempt to provide some numbers to support that decisions. It runs black and friends then tries to merge all the PRs.

Questions I believe we want answered from this investigation:

  1. How many PRs merge to master currently? 29 (64%)
  2. Of those how many merge with various auto formatters applied to master? 15-16
  3. Can we massage the PR branches to apply the same auto formatting and improve that? yes - 24-? (I've only checked with black, not black+isort+pyupgrade)
  4. Of those (that apply to master?) how many merge to qt6-v2? 24
  5. Can we improve that?
    • I expect the main source of conflicts will be from all the imports changing, we could write an "autoformatter" script which re-wrote any PyQt5 imports in the top ~100 lines of the tree files during a rebase
  6. Of those how many merge with various autoformatters applied to qt6-v2?

The script needs updating to try some different merge strategies to see if we can help people keep their PRs inline with the upcoming changes without all having to be updated manually. Also removing webkit will probably cause conflicts too.

TODOs are at the bottom of the script.

Note that this won't attempt to make a runnable qutebrowser after all the automated merges. usort and isort at least introduce some recursive import issues.

Apologies for writing it in shell code. Running a bunch of commands comes a bit more naturally to me like this than python. I don't expect this PR to be merged anyhow, it's opened for visibility and collaboration.

Current summary:

report-master.csv succeeded=29 failed=45
report-qt6-v2.csv succeeded=19 failed=55
report-tmp-black.csv succeeded=16 failed=58
report-tmp-black_isort.csv succeeded=15 failed=59
report-tmp-black_isort_pyupgrade.csv succeeded=15 failed=59
report-tmp-black_pyupgrade.csv succeeded=16 failed=58
report-tmp-black_usort.csv succeeded=15 failed=59
report-tmp-black-rebase.csv succeeded=24 failed=5
report-tmp-black_isort_pyupgrade-rebase.csv succeeded=19 failed=10

One of the main issues remaining about the 3.0 release is what
autoformatters we can introduce to the codebase and how badly that'll
conflict with the open PRs. #1455

This is an attempt to provide some numbers to support that decisions. It
runs black and friends then tries to merge all the PRs.

The script needs updating to try some different merge strategies to see
if we can help people keep their PRs inline with the upcoming changes
without all having to be updated manually.

TODOs are at the bottom of the script.
See the comments in the script for more details. We are trying to see if
we can migrate PRs to be based off of a master branch that has had
autoformatting applied to it. Doing this via a merge is very difficult
because once you have conflict markers you can't really do much.

So you need to apply the autoformatters to the base branch and the PR
branch and then try to point the changed PR branch at the changed
master.

I tried doing it via a rebase first, which necessitates re-writing the
commits transparently when applying them. It seems to work well enough.
An initial one shows all of the PRs that applies successfully to master
applied to the changed master after going through this process.

I tried doing it via a rebase first because I find rebases easier to
reason about than merges. I did add a merge strategy too but it
performed poorly. I haven't put much thought into it and are probably
doing something wrong.

There's some duplication and hardcoded stuff. The tool "aliases" I moved
up because the re-writing is done in two places now. I'm hardcoding the
master report for excluding already-failing PRs, that might end up being
fine.

Next steps:
* verify this rebase strategy some more
* see how many fail when rebasing on qt6-v2 too
* look at the merge strategy again
* clean up the script more, consider porting it to python if the results
  are anywhere near acceptable
It seems if you specify the same attribute multiple times in
gitattributes the last one wins. So on runs with multiple tools we were
only running the last one.

This switches to running all the tools from a shell script. Which is
written out from a shell script. I think there is probably a few more
cards we can stack on here before it all comes tumbling down.

TODO:
* why are we not seeing much of an improvement for the more complicated case,
  weren't most of the conflicts in the first place from black?

report-black.csv succeeded=16 failed=58
report-tmp-black-rebase.csv succeeded=20 failed=9

report-black_isort_pyupgrade.csv succeeded=15 failed=59
report-tmp-black_isort_pyupgrade-rebase.csv succeeded=16 failed=13
Previously it was setting head_sha within the loop. When doing
rebase_with_formatting and that failed we weren't resetting to $base properly.
So it ended up trying to merge subsequent PRs onto previous ones, which caused
additional PRs to show as failed.

Change to save the base branch sha before we start doing work and just reset
at the top of the loop instead of trying to do it on each failure path.
The cache for changes due to pyupgrade aren't the same from "pyupgrade"
as "black pyupgrade", so include the full set of commands in the cache
path.

Unfortunately that makes the cache less useful across different
configurations, but hopefully it still helps me when developing this.
I'm seeing conflicts when trying to apply later commits that are
complaining about formatting changes made in earlier commits. In
particular the changes that are left in the working tree during a rebase
that are getting picked up by the "fix lint" commits.

I'm a little worried about this. For resolving those particular
conflicts -X theirs is the right thing to do, just take the new commit,
it already essentially replaces prior ones. But in cases where there is
a real conflict with master might we be obscuring that? Should be able
to test that.

TODO:
* look more into the cause of the "fix lint" commits, maybe even try
  changing that to `reset HEAD` since those are just lint changes
* test if -X theirs will obscure real conflicts, look into some PRs to
  see if it is doing weird stuff (I checked 7405 and 7312 already and
  they look good!)
Add a second argument to `maybepause()` to get it to pause working so
you can investigate in another terminal.
TODO: look at merging instead of rebasing
Pyupgrade and black were fighting, which made this difficult to pin
down. #4578 was a good test case.
This makes it a bit slow now, of course.
clean up temp files
always log the fail reason even if not pausing
Weird, not sure where these empty files are coming from.

Also tell pyupgrade to always return 0 where nothing odd happens.
It seems -X renormalize works from with merge. And I think more
consistently than rebasing? Not sure, need to go back and check after
the last round of fixes.

This just merges an auto-formatted merge-base branch into the PR, and
applies more autoformatting in the merge commit. It does not try to
merge the PR into a more recent base branch (which is what we want to do
with the open PRs). So it's not an answer yet but it might be the first
step in one.

Probably there could be more code de-duplicated between
merge_with_formatting and rebase_with_formatting.

TODO:
* look at merging into the more recent base instead of the temp one
* if that doesn't work see if we can rebase the merge commit up or
  something after this initial formatting related merge
I'm more interested in refining the strategies to deal with adapting to
whatever tools we choose to go with right now
I new there was something weird about these issues. They were due to
formatting changes but the seemed like ones that should have been taken
care of by the smudge filter. Turns out the issue was twofold:

1. there are some formatting changes that arise from running tools one
   after the other, so that was resolved by running them all twice, so
   they all get the change to run on each other's output
2. I think I typoed renormalize in my test script (s instead of z) which
   made me think the option either did nothing or was on by default.
   Much confusion.
@The-Compiler The-Compiler moved this from Done to WIP in Pull request backlog Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Small old PRs
Development

Successfully merging this pull request may close these issues.

None yet

1 participant