Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_owners.csv causes repo-wide merge conflicts #35850

Closed
liggitt opened this issue Oct 29, 2016 · 24 comments
Closed

test_owners.csv causes repo-wide merge conflicts #35850

liggitt opened this issue Oct 29, 2016 · 24 comments
Assignees
Labels
area/kubectl area/test-infra sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience.

Comments

@liggitt
Copy link
Member

liggitt commented Oct 29, 2016

updates to test/test_owners.csv are now required to merge new tests

This adds a single file that must be updated for any new test package in the repo. This is causing completely unrelated PRs to have merge conflicts with each other

This check should be removed from blocking the merge queue until after code freeze or until an approach that removes the repo-wide single-file contention is implemented

@liggitt
Copy link
Member Author

liggitt commented Oct 29, 2016

c.f. #35471 (comment)

@fejta fejta assigned rmmh and unassigned fejta Oct 29, 2016
@fejta fejta added the sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. label Oct 29, 2016
@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

We definitely want to improve this.

I would like to see evidence that rebases are a widespread problem (it should show up here: http://velodrome.k8s.io/dashboard/db/kubernetes-developer-velocity) before disabling this check.

Batched merges should provide a minor improvement here. And contribx also has some ongoing discussions about auto-generating this file via test comments.

@liggitt
Copy link
Member Author

liggitt commented Oct 29, 2016

I like the intent of the test owner check, but a week before code freeze is not the time to explore how much this impacts velocity. The mentioned improvements should probably be done before blocking merge on a single file repo wide

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

I do not want to extrapolate too much from a single PR that changes 651 files. What evidence suggests this is a widespread problem?

The data I have right now tells me that:

@caesarxuchao
Copy link
Member

One more rebase: #35673. The PR is just moving a test to a different location.

@caesarxuchao
Copy link
Member

I copied the test_owners.csv before and after the rebase of #35673, in case it was useful to investigate what caused the frequent rebase: https://gist.github.com/caesarxuchao/b3b87d8eb7033e19ed66d023e35cdf50

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

Any time two commits change the same file one needs to be rebased -- even if there are no merge conflicts.

@caesarxuchao
Copy link
Member

Any time two commits change the same file one needs to be rebased

Does that mean two PRs that add/remove two different tests will cause each other to rebase?

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

Yes 😞

@caesarxuchao
Copy link
Member

#35471 just got conflict in test_owners.csv again. I added a test in that PR.

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

You are changing 614 files in that PR. You will need to rebase any time anyone makes any change to those 614 files. There are 20 PRs queued up to merge. I'm pretty confident multiple of those PRs will change some of those 614 files, requiring you to do a few more rebases after this one.

I strong recommend putting large, refactoring PRs like this one at the top of the queue -- this is how we handle this sort of thing.

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

(Note that another problem is that the submit queue has been blocked 30-60% of the past few days, exacerbating this situation)

@caesarxuchao
Copy link
Member

I took your advice and bumped the priority. (Most of the 614 files are pkg/client/clientset_generated/, not many PRs touch those. I rebased 3 times since last night, 2 of them are because of the test_owners.csv.)

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

Thanks, very sorry for the trouble here! This is definitely something @rmmh and I want to improve very quickly (and we can evaluate on Monday if we need to defer putting in place this check)

@smarterclayton
Copy link
Contributor

How is rebase/pr calculated? Is it when the bot says "rebase needed"? Which PRs is it measured against?

I see 100 prs modified in the last 24 hours (last update as proxy for push / comment / review rate). I see ~32 with rebase-needed that have modified in the last 24 hours. The 95% is 2 for the last week. I'm not sure what information that is giving me, but I would expect the result this week to have spiked to at minimum 3 just based on the instantaneous rate of rebase_needed.

It might be more accurate to measure based on total rebase_needed. Every rebase_needed is 5-15 minutes of developer time (especially if it causes a context switch). So if we see a spike in total rebase needed, we are wasting more developer time (it could be adjusted based on a pool of recently modified PRs, probably). I'm skeptical that this single file isn't having a significant impact across all developers.

Do we have a measurement for time from the minute lgtm goes on a PR to the minute it merges? That latency is also valuable and is a more reliable measure of "time required to babysit a valid PR".

@smarterclayton
Copy link
Contributor

smarterclayton commented Oct 29, 2016

That said I don't see any of the <12hr updated PRs that have rebase-needed conflicting on this file (is that because they haven't rebased yet)? Agree hard data on number of conflicts on certain types of files would be useful to identify impact of rebases, so we can quantify developer cost due to conflicts, vs due to flakes (flakes are cheaper to resolve - 1-5 minutes, usually).

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

The lgtm->merge data you asked for is also available on velodrome site Merge latency graph). The data show that right now 20% of PRs take at least 30 hours to get committed 😞

As we mentioned at the beginning of October our goal to merge everything 4 hours after getting an LGTM. The first thing we plan to try are batched merges.

Also note that the submit queue plots how long things spend on the queue: http://submit-queue.k8s.io/#/e2e, which shows that this spiked dramatically this week from < 10hrs to 70hrs (second/backlog chart). This is largely because of test failures starting on 10/25 causing the submit queue to spend 50% of the day doing nothing (third/unblocked chart).

I do not see a correlation between adding this check on at the end of 10/28 and the merge experience declining. If anything the reverse seems true (although I suspect that is just luck)

@fejta
Copy link
Contributor

fejta commented Oct 29, 2016

I believe rebase rate is the number of times a PR has been in a state requiring a rebase. We have 50p: 0, 80p: 0, 95p: 2.

This means 95% of PRs committed in the last week were in a needs-rebase state 2 or fewer times, and 5% of PRs were in a needs-rebase state 2 or more times.

Is that right @apelisse?

@rmmh
Copy link
Contributor

rmmh commented Oct 30, 2016

test_owners.csv doesn't cause excessive rebases as far as I can tell. Two PRs can affect the same file without necessarily causing merge conflicts, depending on the merge algorithm used.

As long as your different commits don't change the same lines in the file, they can be merged without conflicts-- test/.gitattributes makes it use the merge=union option for combining changes line by line.

In the particular linked case, a merge conflict was caused by a bad test_owners.csv change in b0421c1 that altered more things than were necessary. I'm looking into what @ConnorDoyle's environment looked like to cause that sort of change.

@smarterclayton
Copy link
Contributor

What does "committed" mean here? "Had a pr status update that included a
commit"?

On Oct 29, 2016, at 6:26 PM, Erick Fejta notifications@github.com wrote:

I believe rebase rate is the number of times a PR has been in a state
requiring a rebase. We have 50p: 0, 80p: 0, 95p: 2.

This means 95% of PRs committed in the last week were in a needs-rebase
state 2 or fewer times, and 5% of PRs were in a needs-rebase state 2 or
more times.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#35850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p84xPi3fLNhcCauovzu3Ta4Ceg1qks5q48gFgaJpZM4KkLAS
.

@smarterclayton
Copy link
Contributor

Should we be bucketing PRs differently (by size category)? A two line PR
and a 300 line PR seem unlikely to have the same characteristics wrt time
in queue and number of rebases. Rebase cost for a small PR is
disproportionate to anything larger.

A metric I want to know (beyond just anecdote) is how many people are
dropping major refactors in the last two weeks before code freeze and how
that impacts other developers (probably related to PR sizeg. The last two
releases have been getting much hotter as we get more serious about code
freeze and time spent reacting to those refactors more impactful.

On Oct 30, 2016, at 4:20 AM, Ryan Hitchman notifications@github.com wrote:

test_owners.csv doesn't cause excessive rebases as far as I can tell. Two
PRs can affect the same file without necessarily causing merge conflicts,
depending on the merge algorithm used.

As long as your different commits don't change the same lines in the
file, they can be merged without conflicts-- test/.gitattributes makes it
use the merge=union option for combining changes line by line.

In the particular linked case, a merge conflict was caused by a bad
test_owners.csv change in b0421c1
b0421c1
that altered more things than were necessary. I'm looking into what
@ConnorDoyle https://github.com/ConnorDoyle's environment looked like to
cause that sort of change.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#35850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p7jsd0yqUtW6UUZ21fi8hJ8aL0Frks5q5FMzgaJpZM4KkLAS
.

@fejta
Copy link
Contributor

fejta commented Oct 30, 2016

By committed I mean the PR is merged and closed. So 95% of PRs that merged in the last week had 2 or fewer rebases during the entire lifetime of the PR.

@smarterclayton
Copy link
Contributor

Ok - probably would expect to break it down in more detail between
meaningful PRs and simple PRs. A 2 line PR merging doesn't really impact
the same way more involved PRs do.

On Oct 30, 2016, at 5:57 PM, Erick Fejta notifications@github.com wrote:

By committed I mean the PR is merged and closed. So 95% of PRs that merged
in the last week had 2 or fewer rebases during the entire lifetime of the
PR.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#35850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_pxSRJcWZN8clKlgvrl7Iw85csr9lks5q5RLJgaJpZM4KkLAS
.

@fejta
Copy link
Contributor

fejta commented Oct 31, 2016

Sure! You can make changes to that effect here: https://github.com/kubernetes/test-infra/blob/master/velodrome/transform/rebase.go

My interpretation of the data is that rebase pain is a) relatively rare and b) concentrated around a few types of PRs (API changes, refactoring, build changes). This makes me think that simply sending these PRs to the front of the queue with a P1 label &c is sufficient for now, freeing us to focus on our other efforts until further attention is required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubectl area/test-infra sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience.
Projects
None yet
Development

No branches or pull requests

6 participants