New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_owners.csv causes repo-wide merge conflicts #35850
Comments
c.f. #35471 (comment) |
We definitely want to improve this. I would like to see evidence that rebases are a widespread problem (it should show up here: http://velodrome.k8s.io/dashboard/db/kubernetes-developer-velocity) before disabling this check. Batched merges should provide a minor improvement here. And contribx also has some ongoing discussions about auto-generating this file via test comments. |
I like the intent of the test owner check, but a week before code freeze is not the time to explore how much this impacts velocity. The mentioned improvements should probably be done before blocking merge on a single file repo wide |
I do not want to extrapolate too much from a single PR that changes 651 files. What evidence suggests this is a widespread problem? The data I have right now tells me that:
|
One more rebase: #35673. The PR is just moving a test to a different location. |
I copied the test_owners.csv before and after the rebase of #35673, in case it was useful to investigate what caused the frequent rebase: https://gist.github.com/caesarxuchao/b3b87d8eb7033e19ed66d023e35cdf50 |
Any time two commits change the same file one needs to be rebased -- even if there are no merge conflicts. |
Does that mean two PRs that add/remove two different tests will cause each other to rebase? |
Yes 😞 |
#35471 just got conflict in test_owners.csv again. I added a test in that PR. |
You are changing 614 files in that PR. You will need to rebase any time anyone makes any change to those 614 files. There are 20 PRs queued up to merge. I'm pretty confident multiple of those PRs will change some of those 614 files, requiring you to do a few more rebases after this one. I strong recommend putting large, refactoring PRs like this one at the top of the queue -- this is how we handle this sort of thing. |
(Note that another problem is that the submit queue has been blocked 30-60% of the past few days, exacerbating this situation) |
I took your advice and bumped the priority. (Most of the 614 files are pkg/client/clientset_generated/, not many PRs touch those. I rebased 3 times since last night, 2 of them are because of the test_owners.csv.) |
Thanks, very sorry for the trouble here! This is definitely something @rmmh and I want to improve very quickly (and we can evaluate on Monday if we need to defer putting in place this check) |
How is rebase/pr calculated? Is it when the bot says "rebase needed"? Which PRs is it measured against? I see 100 prs modified in the last 24 hours (last update as proxy for push / comment / review rate). I see ~32 with rebase-needed that have modified in the last 24 hours. The 95% is 2 for the last week. I'm not sure what information that is giving me, but I would expect the result this week to have spiked to at minimum 3 just based on the instantaneous rate of rebase_needed. It might be more accurate to measure based on total rebase_needed. Every rebase_needed is 5-15 minutes of developer time (especially if it causes a context switch). So if we see a spike in total rebase needed, we are wasting more developer time (it could be adjusted based on a pool of recently modified PRs, probably). I'm skeptical that this single file isn't having a significant impact across all developers. Do we have a measurement for time from the minute lgtm goes on a PR to the minute it merges? That latency is also valuable and is a more reliable measure of "time required to babysit a valid PR". |
That said I don't see any of the <12hr updated PRs that have rebase-needed conflicting on this file (is that because they haven't rebased yet)? Agree hard data on number of conflicts on certain types of files would be useful to identify impact of rebases, so we can quantify developer cost due to conflicts, vs due to flakes (flakes are cheaper to resolve - 1-5 minutes, usually). |
The lgtm->merge data you asked for is also available on velodrome site Merge latency graph). The data show that right now 20% of PRs take at least 30 hours to get committed 😞 As we mentioned at the beginning of October our goal to merge everything 4 hours after getting an LGTM. The first thing we plan to try are batched merges. Also note that the submit queue plots how long things spend on the queue: http://submit-queue.k8s.io/#/e2e, which shows that this spiked dramatically this week from < 10hrs to 70hrs (second/backlog chart). This is largely because of test failures starting on 10/25 causing the submit queue to spend 50% of the day doing nothing (third/unblocked chart). I do not see a correlation between adding this check on at the end of 10/28 and the merge experience declining. If anything the reverse seems true (although I suspect that is just luck) |
I believe rebase rate is the number of times a PR has been in a state requiring a rebase. We have 50p: 0, 80p: 0, 95p: 2. This means 95% of PRs committed in the last week were in a needs-rebase state 2 or fewer times, and 5% of PRs were in a needs-rebase state 2 or more times. Is that right @apelisse? |
test_owners.csv doesn't cause excessive rebases as far as I can tell. Two PRs can affect the same file without necessarily causing merge conflicts, depending on the merge algorithm used. As long as your different commits don't change the same lines in the file, they can be merged without conflicts-- In the particular linked case, a merge conflict was caused by a bad test_owners.csv change in b0421c1 that altered more things than were necessary. I'm looking into what @ConnorDoyle's environment looked like to cause that sort of change. |
What does "committed" mean here? "Had a pr status update that included a On Oct 29, 2016, at 6:26 PM, Erick Fejta notifications@github.com wrote: I believe rebase rate is the number of times a PR has been in a state This means 95% of PRs committed in the last week were in a needs-rebase — |
Should we be bucketing PRs differently (by size category)? A two line PR A metric I want to know (beyond just anecdote) is how many people are On Oct 30, 2016, at 4:20 AM, Ryan Hitchman notifications@github.com wrote: test_owners.csv doesn't cause excessive rebases as far as I can tell. Two As long as your different commits don't change the same lines in the In the particular linked case, a merge conflict was caused by a bad — |
By committed I mean the PR is merged and closed. So 95% of PRs that merged in the last week had 2 or fewer rebases during the entire lifetime of the PR. |
Ok - probably would expect to break it down in more detail between On Oct 30, 2016, at 5:57 PM, Erick Fejta notifications@github.com wrote: By committed I mean the PR is merged and closed. So 95% of PRs that merged — |
Sure! You can make changes to that effect here: https://github.com/kubernetes/test-infra/blob/master/velodrome/transform/rebase.go My interpretation of the data is that rebase pain is a) relatively rare and b) concentrated around a few types of PRs (API changes, refactoring, build changes). This makes me think that simply sending these PRs to the front of the queue with a P1 label &c is sufficient for now, freeing us to focus on our other efforts until further attention is required. |
updates to test/test_owners.csv are now required to merge new tests
This adds a single file that must be updated for any new test package in the repo. This is causing completely unrelated PRs to have merge conflicts with each other
This check should be removed from blocking the merge queue until after code freeze or until an approach that removes the repo-wide single-file contention is implemented
The text was updated successfully, but these errors were encountered: