test_owners.csv causes repo-wide merge conflicts #35850

liggitt · 2016-10-29T17:42:30Z

updates to test/test_owners.csv are now required to merge new tests

This adds a single file that must be updated for any new test package in the repo. This is causing completely unrelated PRs to have merge conflicts with each other

This check should be removed from blocking the merge queue until after code freeze or until an approach that removes the repo-wide single-file contention is implemented

liggitt · 2016-10-29T17:44:10Z

c.f. #35471 (comment)

fejta · 2016-10-29T18:23:04Z

We definitely want to improve this.

I would like to see evidence that rebases are a widespread problem (it should show up here: http://velodrome.k8s.io/dashboard/db/kubernetes-developer-velocity) before disabling this check.

Batched merges should provide a minor improvement here. And contribx also has some ongoing discussions about auto-generating this file via test comments.

liggitt · 2016-10-29T18:30:15Z

I like the intent of the test owner check, but a week before code freeze is not the time to explore how much this impacts velocity. The mentioned improvements should probably be done before blocking merge on a single file repo wide

fejta · 2016-10-29T18:47:32Z

I do not want to extrapolate too much from a single PR that changes 651 files. What evidence suggests this is a widespread problem?

The data I have right now tells me that:

velodrome shows fewer than 20% of PRs need any rebases.
The test owner file changes a couple times a day: https://github.com/kubernetes/kubernetes/commits/master/test/test_owners.csv

caesarxuchao · 2016-10-29T19:19:15Z

One more rebase: #35673. The PR is just moving a test to a different location.

caesarxuchao · 2016-10-29T19:33:51Z

I copied the test_owners.csv before and after the rebase of #35673, in case it was useful to investigate what caused the frequent rebase: https://gist.github.com/caesarxuchao/b3b87d8eb7033e19ed66d023e35cdf50

fejta · 2016-10-29T19:36:35Z

Any time two commits change the same file one needs to be rebased -- even if there are no merge conflicts.

caesarxuchao · 2016-10-29T19:48:06Z

Any time two commits change the same file one needs to be rebased

Does that mean two PRs that add/remove two different tests will cause each other to rebase?

fejta · 2016-10-29T19:48:23Z

Yes 😞

caesarxuchao · 2016-10-29T20:27:55Z

#35471 just got conflict in test_owners.csv again. I added a test in that PR.

fejta · 2016-10-29T20:33:45Z

You are changing 614 files in that PR. You will need to rebase any time anyone makes any change to those 614 files. There are 20 PRs queued up to merge. I'm pretty confident multiple of those PRs will change some of those 614 files, requiring you to do a few more rebases after this one.

I strong recommend putting large, refactoring PRs like this one at the top of the queue -- this is how we handle this sort of thing.

fejta · 2016-10-29T20:36:25Z

(Note that another problem is that the submit queue has been blocked 30-60% of the past few days, exacerbating this situation)

caesarxuchao · 2016-10-29T20:40:57Z

I took your advice and bumped the priority. (Most of the 614 files are pkg/client/clientset_generated/, not many PRs touch those. I rebased 3 times since last night, 2 of them are because of the test_owners.csv.)

fejta · 2016-10-29T20:49:32Z

Thanks, very sorry for the trouble here! This is definitely something @rmmh and I want to improve very quickly (and we can evaluate on Monday if we need to defer putting in place this check)

smarterclayton · 2016-10-29T21:30:04Z

How is rebase/pr calculated? Is it when the bot says "rebase needed"? Which PRs is it measured against?

I see 100 prs modified in the last 24 hours (last update as proxy for push / comment / review rate). I see ~32 with rebase-needed that have modified in the last 24 hours. The 95% is 2 for the last week. I'm not sure what information that is giving me, but I would expect the result this week to have spiked to at minimum 3 just based on the instantaneous rate of rebase_needed.

It might be more accurate to measure based on total rebase_needed. Every rebase_needed is 5-15 minutes of developer time (especially if it causes a context switch). So if we see a spike in total rebase needed, we are wasting more developer time (it could be adjusted based on a pool of recently modified PRs, probably). I'm skeptical that this single file isn't having a significant impact across all developers.

Do we have a measurement for time from the minute lgtm goes on a PR to the minute it merges? That latency is also valuable and is a more reliable measure of "time required to babysit a valid PR".

smarterclayton · 2016-10-29T21:38:14Z

That said I don't see any of the <12hr updated PRs that have rebase-needed conflicting on this file (is that because they haven't rebased yet)? Agree hard data on number of conflicts on certain types of files would be useful to identify impact of rebases, so we can quantify developer cost due to conflicts, vs due to flakes (flakes are cheaper to resolve - 1-5 minutes, usually).

fejta · 2016-10-29T22:12:17Z

The lgtm->merge data you asked for is also available on velodrome site Merge latency graph). The data show that right now 20% of PRs take at least 30 hours to get committed 😞

As we mentioned at the beginning of October our goal to merge everything 4 hours after getting an LGTM. The first thing we plan to try are batched merges.

Also note that the submit queue plots how long things spend on the queue: http://submit-queue.k8s.io/#/e2e, which shows that this spiked dramatically this week from < 10hrs to 70hrs (second/backlog chart). This is largely because of test failures starting on 10/25 causing the submit queue to spend 50% of the day doing nothing (third/unblocked chart).

I do not see a correlation between adding this check on at the end of 10/28 and the merge experience declining. If anything the reverse seems true (although I suspect that is just luck)

fejta · 2016-10-29T22:25:59Z

I believe rebase rate is the number of times a PR has been in a state requiring a rebase. We have 50p: 0, 80p: 0, 95p: 2.

This means 95% of PRs committed in the last week were in a needs-rebase state 2 or fewer times, and 5% of PRs were in a needs-rebase state 2 or more times.

Is that right @apelisse?

rmmh · 2016-10-30T08:19:50Z

test_owners.csv doesn't cause excessive rebases as far as I can tell. Two PRs can affect the same file without necessarily causing merge conflicts, depending on the merge algorithm used.

As long as your different commits don't change the same lines in the file, they can be merged without conflicts-- test/.gitattributes makes it use the merge=union option for combining changes line by line.

In the particular linked case, a merge conflict was caused by a bad test_owners.csv change in b0421c1 that altered more things than were necessary. I'm looking into what @ConnorDoyle's environment looked like to cause that sort of change.

smarterclayton · 2016-10-30T16:41:55Z

What does "committed" mean here? "Had a pr status update that included a
commit"?

On Oct 29, 2016, at 6:26 PM, Erick Fejta notifications@github.com wrote:

I believe rebase rate is the number of times a PR has been in a state
requiring a rebase. We have 50p: 0, 80p: 0, 95p: 2.

This means 95% of PRs committed in the last week were in a needs-rebase
state 2 or fewer times, and 5% of PRs were in a needs-rebase state 2 or
more times.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#35850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p84xPi3fLNhcCauovzu3Ta4Ceg1qks5q48gFgaJpZM4KkLAS
.

smarterclayton · 2016-10-30T20:56:34Z

Should we be bucketing PRs differently (by size category)? A two line PR
and a 300 line PR seem unlikely to have the same characteristics wrt time
in queue and number of rebases. Rebase cost for a small PR is
disproportionate to anything larger.

A metric I want to know (beyond just anecdote) is how many people are
dropping major refactors in the last two weeks before code freeze and how
that impacts other developers (probably related to PR sizeg. The last two
releases have been getting much hotter as we get more serious about code
freeze and time spent reacting to those refactors more impactful.

On Oct 30, 2016, at 4:20 AM, Ryan Hitchman notifications@github.com wrote:

test_owners.csv doesn't cause excessive rebases as far as I can tell. Two
PRs can affect the same file without necessarily causing merge conflicts,
depending on the merge algorithm used.

As long as your different commits don't change the same lines in the
file, they can be merged without conflicts-- test/.gitattributes makes it
use the merge=union option for combining changes line by line.

In the particular linked case, a merge conflict was caused by a bad
test_owners.csv change in b0421c1
b0421c1
that altered more things than were necessary. I'm looking into what
@ConnorDoyle https://github.com/ConnorDoyle's environment looked like to
cause that sort of change.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#35850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p7jsd0yqUtW6UUZ21fi8hJ8aL0Frks5q5FMzgaJpZM4KkLAS
.

fejta · 2016-10-30T21:57:14Z

By committed I mean the PR is merged and closed. So 95% of PRs that merged in the last week had 2 or fewer rebases during the entire lifetime of the PR.

smarterclayton · 2016-10-30T23:40:52Z

Ok - probably would expect to break it down in more detail between
meaningful PRs and simple PRs. A 2 line PR merging doesn't really impact
the same way more involved PRs do.

On Oct 30, 2016, at 5:57 PM, Erick Fejta notifications@github.com wrote:

By committed I mean the PR is merged and closed. So 95% of PRs that merged
in the last week had 2 or fewer rebases during the entire lifetime of the
PR.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#35850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_pxSRJcWZN8clKlgvrl7Iw85csr9lks5q5RLJgaJpZM4KkLAS
.

fejta · 2016-10-31T00:35:18Z

Sure! You can make changes to that effect here: https://github.com/kubernetes/test-infra/blob/master/velodrome/transform/rebase.go

My interpretation of the data is that rebase pain is a) relatively rare and b) concentrated around a few types of PRs (API changes, refactoring, build changes). This makes me think that simply sending these PRs to the front of the queue with a P1 label &c is sufficient for now, freeing us to focus on our other efforts until further attention is required.

liggitt assigned fejta Oct 29, 2016

k8s-github-robot added area/kubectl area/test-infra labels Oct 29, 2016

fejta assigned rmmh and unassigned fejta Oct 29, 2016

fejta added the sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. label Oct 29, 2016

caesarxuchao mentioned this issue Oct 30, 2016

Minor change: move watch.ListWatchUntil to its own package #35673

Merged

liggitt mentioned this issue Oct 31, 2016

Allow apiserver to choose preferred kubelet address type #35497

Merged

k8s-github-robot closed this as completed in 57a152c Nov 3, 2016

derekwaynecarr mentioned this issue Mar 1, 2017

Improvements for hack/test_owners.csv #42312

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_owners.csv causes repo-wide merge conflicts #35850

test_owners.csv causes repo-wide merge conflicts #35850

liggitt commented Oct 29, 2016

liggitt commented Oct 29, 2016

fejta commented Oct 29, 2016

liggitt commented Oct 29, 2016

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016 •

edited

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016 •

edited

smarterclayton commented Oct 29, 2016

smarterclayton commented Oct 29, 2016 •

edited

fejta commented Oct 29, 2016 •

edited

fejta commented Oct 29, 2016 •

edited

rmmh commented Oct 30, 2016

smarterclayton commented Oct 30, 2016

smarterclayton commented Oct 30, 2016

fejta commented Oct 30, 2016

smarterclayton commented Oct 30, 2016

fejta commented Oct 31, 2016

test_owners.csv causes repo-wide merge conflicts #35850

test_owners.csv causes repo-wide merge conflicts #35850

Comments

liggitt commented Oct 29, 2016

liggitt commented Oct 29, 2016

fejta commented Oct 29, 2016

liggitt commented Oct 29, 2016

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016 • edited

fejta commented Oct 29, 2016

caesarxuchao commented Oct 29, 2016

fejta commented Oct 29, 2016 • edited

smarterclayton commented Oct 29, 2016

smarterclayton commented Oct 29, 2016 • edited

fejta commented Oct 29, 2016 • edited

fejta commented Oct 29, 2016 • edited

rmmh commented Oct 30, 2016

smarterclayton commented Oct 30, 2016

smarterclayton commented Oct 30, 2016

fejta commented Oct 30, 2016

smarterclayton commented Oct 30, 2016

fejta commented Oct 31, 2016

fejta commented Oct 29, 2016 •

edited

fejta commented Oct 29, 2016 •

edited

smarterclayton commented Oct 29, 2016 •

edited

fejta commented Oct 29, 2016 •

edited

fejta commented Oct 29, 2016 •

edited