
Suggestions around merging PRs w/o running tests #5334

Closed
kargakis opened this Issue Nov 3, 2017 · 26 comments

@kargakis
Member

kargakis commented Nov 3, 2017

We have a process in place today that ensures our web console (a separate project from the main repo) is vendored back into the main repo. The main issue with the state of the world is that this process creates a bump commit with all changes since the last time we vendored and pushes that commit directly to master. This is obviously problematic as it breaks the merge queue, so we want to rearchitect the flow and make the bump commit go through the merge queue as a normal PR.

The problem with making this a PR:
PRs can get plagued by flakes, and we need the bump PR to merge as quickly as possible. I think we don't care about running tests at all (similarly to how we don't care about running tests for documentation fixes). If a bump stays in the queue for too long, the next bump PR needs to invalidate the previous one. The logic to handle this sounds too hairy.

My question:
How can we handle this with tide in place? I was thinking of a way to mark prowjobs as blessed, which would force plank/jenkins-operator to clear all statuses and skip tests, but it needs more thought. Any other ideas?

@stevekuznetsov @cjwagner @spxtr @rmmh @fejta @BenTheElder

fyi @jwforres @spadgett

/area prow
/kind feature

@stevekuznetsov

Contributor

stevekuznetsov commented Nov 3, 2017

The similar situation upstream in k/k is the docs-only-retest-not-required -- I know I've helped babysit a PR from a new contributor who found something wrong in a README and needed to have it explained why we were running tests on their PR for two days before it merged. We could think about approaching this as human approval plus matching a set of file regex constraints to not require status contexts to be posted or to be allowed to enter the tide pool.

@cjwagner

Member

cjwagner commented Nov 3, 2017

The main issue with the state of the world is that this process creates a bump commit with all changes since the last time we vendored and pushes that commit directly to master. This is obviously problematic as it breaks the merge queue

Why does this break the merge queue? It will restart testing on PRs because there will be a new master HEAD, but the merge queue should continue to merge properly? Pushing directly to master seems like the right way to merge a change that needs to be merged quickly and doesn't require testing.

I'd prefer to avoid adding special logic to make Tide ignore status contexts on some PRs if that is possible. I think it's better if statuses that aren't required don't report to GitHub or are marked as skipped.

@kargakis

Member Author

kargakis commented Nov 3, 2017

Why does this break the merge queue? It will restart testing on PRs because there will be a new master HEAD, but the merge queue should continue to merge properly? Pushing directly to master seems like the right way to merge a change that needs to be merged quickly and doesn't require testing.

It breaks any batch or single-PR run that is in flight at that point.

I'd prefer to avoid adding special logic to make Tide ignore status contexts on some PRs if that is possible. I think it's better if statuses that aren't required don't report to GitHub or are marked as skipped.

FWIW, nobody proposed that (yet). I was thinking of a way to mark all contexts as green inside plank/jenkins-operator. We want to avoid testing in other cases too, like documentation fixes. I don't think it's viable to merge directly to master every time such a PR comes up.

@kargakis

Member Author

kargakis commented Nov 3, 2017

FWIW, nobody proposed that (yet)

Seems like @stevekuznetsov did. Nice.

@stevekuznetsov

Contributor

stevekuznetsov commented Nov 3, 2017

@cjwagner "breaking" a batch here just means the mungers will refuse to merge it since master HEAD changed. Not sure if that was clear before.

@cjwagner

Member

cjwagner commented Nov 3, 2017

How often is a vendor bump done? If this is infrequent it doesn't seem like a big deal if 2 presubmit jobs have to restart.

@stevekuznetsov

Contributor

stevekuznetsov commented Nov 3, 2017

We would love to do them every time the other repo gets a commit, which used to mean ten or fifteen times a day. We're doing it once a day now, at night, but it still means the queue wastes a lot of time on it, and when flakes are already making the queue inefficient, it feels bad to see a batch job that could have merged five PRs fail because of this.

@fejta-bot


fejta-bot commented Feb 7, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@stevekuznetsov

Contributor

stevekuznetsov commented Feb 8, 2018

/remove-lifecycle stale

@spiffxp referenced this issue Feb 21, 2018: Implement tide and replace the submit queue #3866 (Closed)

@kargakis referenced this issue Feb 22, 2018: Switch k/k to Tide. #6417 (Closed)

@BenTheElder

Member

BenTheElder commented Feb 22, 2018

FWIW, I'm not a fan of anything merging into kubernetes without at least presubmits; how do we guarantee the change doesn't break the repo? I think we can partially solve this by making batching better in tide.

Manual merging without the submit queue can be done by ...manually merging...?

@stevekuznetsov

Contributor

stevekuznetsov commented Feb 22, 2018

Yeah, there's some disconnect with the "stigma" of pushing the button, and anecdotally people use the retest-not-required label to get away from that. Agreed that efficient batching would effectively solve this as well, since the real pain point is the time taken to merge.

@chrislovecnm

Member

chrislovecnm commented Mar 9, 2018

/cc

@chrislovecnm

Member

chrislovecnm commented Mar 10, 2018

Having this testing functionality is a big need. When, for instance, a PR only fixes spelling errors in non-Go files, we should not be running e2e.

Is anyone working on this?

@BenTheElder

Member

BenTheElder commented Mar 11, 2018

Having this testing functionality is a big need. When, for instance, a PR only fixes spelling errors in non-Go files, we should not be running e2e.

That exists with run_if_changed; on individual jobs you can set something like run_if_changed: ".*\.go$" so the job only runs when Go files change.
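
For illustration, a minimal sketch (hypothetical names, not the actual Prow code) of how a run_if_changed-style filter decides whether a job needs to run, assuming the pattern is a Go regexp matched against each changed file:

```go
package main

import (
	"fmt"
	"regexp"
)

// shouldRun reports whether a presubmit with a run_if_changed-style
// pattern needs to run: it runs if any changed file matches the pattern.
func shouldRun(runIfChanged string, changedFiles []string) (bool, error) {
	re, err := regexp.Compile(runIfChanged)
	if err != nil {
		return false, fmt.Errorf("invalid run_if_changed pattern %q: %v", runIfChanged, err)
	}
	for _, f := range changedFiles {
		if re.MatchString(f) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	run, _ := shouldRun(`.*\.go$`, []string{"README.md", "docs/typo-fix.md"})
	fmt.Println(run) // false: a docs-only change would skip this job
}
```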

@kargakis

Member Author

kargakis commented Mar 11, 2018

That exists with run_if_changed; on individual jobs you can set something like run_if_changed: ".*\.go$" so the job only runs when Go files change.

Setting every job to use run_if_changed does not sound ideal, but it can work. It would be nice to offer a per-repo run_if_changed option (and potentially a run_if_not_changed option too, since Go regexps do not support negative lookaheads).
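
To sketch why the lack of negative lookahead need not block a hypothetical run_if_not_changed: the negation can live in the matching code rather than in the pattern. This is only an illustration with made-up names, not an existing Prow option:

```go
package main

import (
	"fmt"
	"regexp"
)

// canSkip reports whether a job could be skipped because every changed
// file matches the "not changed" pattern (e.g. docs-only changes).
// Negating the per-file match avoids needing lookaheads in the regexp.
func canSkip(runIfNotChanged string, changedFiles []string) (bool, error) {
	re, err := regexp.Compile(runIfNotChanged)
	if err != nil {
		return false, err
	}
	for _, f := range changedFiles {
		if !re.MatchString(f) {
			// A file outside the exempt set changed, so the job must run.
			return false, nil
		}
	}
	return len(changedFiles) > 0, nil
}

func main() {
	skip, _ := canSkip(`\.md$`, []string{"README.md", "docs/install.md"})
	fmt.Println(skip) // true: only markdown files changed
}
```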

@kargakis

Member Author

kargakis commented Apr 2, 2018

@stevekuznetsov has a proposal that would satisfy this if implemented

/assign stevekuznetsov

@fejta

Contributor

fejta commented Jun 7, 2018

I would rather we fix batched merges to run all powers of 2 in parallel: if there are 10 open PRs, it runs a 1-, 2-, 4-, and 8-PR batch, and merges the largest set that passes.
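
A rough sketch of that batching scheme (hypothetical helper names, not the tide implementation): with N pending PRs, kick off batches of the first 1, 2, 4, ... PRs in parallel and merge the largest batch whose run passed.

```go
package main

import "fmt"

// powerOfTwoBatchSizes returns candidate batch sizes 1, 2, 4, ...
// capped at the number of pending PRs, e.g. 10 PRs -> [1 2 4 8].
func powerOfTwoBatchSizes(pending int) []int {
	var sizes []int
	for s := 1; s <= pending; s *= 2 {
		sizes = append(sizes, s)
	}
	return sizes
}

// largestPassing picks the biggest batch size whose test run passed;
// passed would be filled in from the parallel batch job results.
func largestPassing(sizes []int, passed map[int]bool) int {
	best := 0
	for _, s := range sizes {
		if passed[s] && s > best {
			best = s
		}
	}
	return best
}

func main() {
	sizes := powerOfTwoBatchSizes(10)
	fmt.Println(sizes) // [1 2 4 8]
	// Suppose the 1-, 2-, and 4-PR batches passed but the 8-PR batch flaked:
	fmt.Println(largestPassing(sizes, map[int]bool{1: true, 2: true, 4: true})) // 4
}
```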

@fejta

Contributor

fejta commented Jun 7, 2018

Reprioritizing the queue has caused as much harm as good on mungegithub. I suspect overall project velocity would be higher if we eliminated all this functionality.

Whenever a higher-priority PR gets approved, we wind up throwing away the pending test results, a) causing delays while we wait for replacement results to complete, and b) not merging the batch run that would have merged 5 PRs (ideally all PRs) instead of 1.

A secondary problem is that once we start ordering or classifying mergeable PRs this naturally causes people to spend time and energy attempting to make the ordering/classification more correct.

Tide is an attempt to deal with these issues. It already mitigates some of them: waiting for the batch run to complete before considering whether to merge a serial run, for example. Similar mitigations would work here (but would not address the complexity creep).

@kargakis

Member Author

kargakis commented Jun 30, 2018

I think we can fix this w/o adding the concept of priority in tide.

@kargakis

Member Author

kargakis commented Jul 23, 2018

@stevekuznetsov is the override plugin enough for covering our use case?

@stevekuznetsov

Contributor

stevekuznetsov commented Jul 23, 2018

Not really; we'd have to /override every context and hope the PR gets chosen into the next batch, and that the batch would merge. If not, we'd need to re-/override after the failed batch results get posted to the PR, and so on.

@stevekuznetsov

Contributor

stevekuznetsov commented Jul 23, 2018

This is more like #7899

@fejta-bot


fejta-bot commented Oct 21, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot


fejta-bot commented Nov 20, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@fejta-bot


fejta-bot commented Dec 20, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

Contributor

k8s-ci-robot commented Dec 20, 2018

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
