Fix bors problems or decide to remove it #6139

ekager · 2019-10-19T22:52:59Z

Because I don't like it 👸🏼

┆Issue is synchronized with this Jira Task

ekager · 2019-10-19T22:55:11Z

My arguments:

- We've spent more time on issues with it than it has saved us
- Flakiness / weird error messages that aren't easily resolved (at least with my level of bors understanding)
- When it autocloses the PR, no one has the immediate responsibility to mark the linked issue "QA-Needed" or to add the milestone, so it's still a manual process: you just have to remember to go back and do it

ekager · 2019-10-19T22:56:29Z

If it were possible to make bors an optional check, so that non-admins could still merge PRs if bors fails or without bors at all (once all tests pass on the PR, like before), I would consider keeping it around if people have strong opinions.

ekager · 2019-10-19T22:58:29Z

In the small amount of time we've had bors, I've run into multiple cases where bors refuses to merge a PR (even a small change), and after a million "bors retry"s I'm pretty over it. I've wasted enough time coming back and checking on it that it would have been 1000x faster to just merge it myself.

NotWoods · 2019-10-20T01:49:26Z

It's also impossible (as far as I can tell) to see the problems now if a build fails. This might be due to configuration, since A-C shows taskcluster checks still.

sblatz · 2019-10-21T13:41:20Z

> When it autocloses the PR, no one has the immediate responsibility to mark the linked issue "QA-Needed" or to add the milestone, so it's still a manual process: you just have to remember to go back and do it

I think we originally thought we could automate more of this so it’d not be manual at all. Since we can’t, I think this is one of the stronger arguments for removing it. Things are more likely to fall through the cracks.

I’m not sure how AC hasn’t run into all of these merging problems. I wonder if @pocmo can shed some light on that? We consistently get PRs that end up in an impossible-to-merge state no matter how many times we retry.

If we can’t resolve these issues I’m definitely in @ekager’s camp of removing it.

pocmo · 2019-10-22T13:45:54Z

> I’m not sure how AC hasn’t run into all of these merging problems. I wonder if @pocmo can shed some light on that? We consistently get PRs that end up in an impossible-to-merge state no matter how many times we retry.

bors has been very stable for us. If there were problems then they were caused by taskcluster tasks failing, freezing, not reporting status etc. - even without bors that seems to be something we should get resolved.

I'm happy to help fix and debug that. What errors did bors report (build failed, timeout, not responding with anything at all..)?

I usually start by looking at the taskcluster checks and tasks for the merge commit bors creates. One of them is usually the culprit.

In general bors is super helpful to keep a healthy green master since it merges PRs that have been tested against the latest master change. I wouldn't give that up if we can fix it. :)

pocmo · 2019-10-22T15:15:45Z

What I just saw on my PR is:

- UI tests are taking ages and fail at the end (-> Long-term: let's see if we can make them faster; short-term: fix the UI tests and/or consider removing UI tests from the checks bors waits for)
- Bors did not fail immediately when a task failed and was instead still waiting on the other tasks (and eventually timed out). I'm not sure whether we can configure that, but in AC we only let bors wait on one "complete" task and that worked.
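
The difference between waiting on every task and waiting on a single aggregate "complete" status can be sketched as follows. This is only an illustration of the two policies, assuming each task reports one of three states; it is not how bors or Taskcluster implement them, and `State`, `wait_for_all`, `fail_fast`, and the task names are hypothetical.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


def wait_for_all(tasks: Dict[str, State]) -> State:
    """Report nothing until every task has finished (can end in a timeout)."""
    if any(s is State.PENDING for s in tasks.values()):
        return State.PENDING
    if any(s is State.FAILURE for s in tasks.values()):
        return State.FAILURE
    return State.SUCCESS


def fail_fast(tasks: Dict[str, State]) -> State:
    """Aggregate status: fail as soon as any single task has failed."""
    if any(s is State.FAILURE for s in tasks.values()):
        return State.FAILURE
    if any(s is State.PENDING for s in tasks.values()):
        return State.PENDING
    return State.SUCCESS


# One task has failed while another never finishes.
tasks = {"build": State.FAILURE, "ui-test": State.PENDING}
slow_result = wait_for_all(tasks)  # still pending, so the bot keeps waiting
fast_result = fail_fast(tasks)     # failure is visible immediately
```

With the first policy a stuck or unscheduled task hides the failure until the timeout; with the second, the failed build is reported right away.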

JohanLorenzo · 2019-10-24T12:49:00Z

Hey there!

Thanks for raising concerns about bors! I don't personally have any preference about using bors itself. Although I should say a group of Mozillians from different teams came to the consensus that having an autoland bot is the best solution we can have at the moment. The context and the decision are gathered in this document. Bors is one autoland bot, I'm open to switching to another bot.

> Flakiness / weird error messages that aren't easily resolved (at least with my level of bors understanding)

I confess I'm disappointed in bors try. It intermittently times out, without any explanatory error message. This issue has been reported at bors-ng/bors-ng#739. So far, no active developer has interacted with us.

An example of a weird error message I got is this one: #6117 (comment).

I think no matter what autoland bot we use, we will run into setup issues. Bors has proven good enough for https://github.com/servo/servo, which is similar in complexity to Gecko. What differs between servo and Fenix+AC is the way we use taskcluster-github. Fenix+AC have used Github Checks (the new API) for the past month, while Servo remains on Github Statuses (the old API). I believe the situation can be improved if bors provides better support for Github Statuses.

> It's also impossible (as far as I can tell) to see the problems now if a build fails. This might be due to configuration, since A-C shows taskcluster checks still.

I might not have the full context, but Fenix has shown Taskcluster checks (aka Github Checks) for the past month. Do you have an example of a failed build where you cannot see what the problem is?

> UI tests are taking ages and fail at the end (-> Long-term: let's see if we can make them faster; short-term: fix the UI tests and/or consider removing UI tests from the checks bors waits for)

👍
Another short term fix I can help with:

fenix/taskcluster/ci/test/kind.yml

Line 51 in 210e358

# TODO Generate APKs in a build task instead

This way, the UI tests just run the tests and don't build a dedicated APK in the same task.

> Bors did not fail immediately when a task failed and was instead still waiting on the other tasks (and eventually timed out). I'm not sure whether we can configure that, but in AC we only let bors wait on one "complete" task and that worked.

We can be at parity with AC, if I manage to land #6117 (let's discuss the technical details of why I cannot in the PR itself).

Please let me know if this helps address your major concerns 🙂

liuche · 2019-10-29T17:38:08Z

For now, we've decided:

- On Friday (11/1), we'll turn on bors but not make it a merge requirement
- If bors is running (because someone started a bors task), DO NOT MANUALLY MERGE
- Otherwise, you can merge manually if things look fine

sblatz · 2019-11-12T18:40:13Z

I'll take this away and investigate if it's working now :)

sblatz · 2019-12-17T16:54:17Z

After some more testing with this, I don't think bors solves a use case for us as we still need to babysit it. Because of that I'm going to move ahead with removing it.

JohanLorenzo · 2019-12-17T17:18:10Z

👍 I haven't managed to get it working on Fenix since then. I think I tried like half a dozen times. It has always timed out. I'm fine removing bors from Fenix.

ekager added the eng:health Improve code health label Oct 19, 2019

ekager added the needs:group-triage label Oct 19, 2019

NotWoods added the eng:automation Build automation, Continuous integration, .. label Oct 20, 2019

ekager removed the eng:health Improve code health label Oct 20, 2019

sblatz added a commit to sblatz/fenix that referenced this issue Oct 21, 2019

Closes mozilla-mobile#6139: Removes bors

5d5b5da

sblatz changed the title ~~Get rid of bors~~ Fix bors problems or decide to remove it Oct 23, 2019

JohanLorenzo mentioned this issue Oct 25, 2019

Let bors wait on complete-task which errors out whenever all task have run #6117

Closed

5 tasks

This was referenced Nov 7, 2019

Bors timeout when task fails (complete-push unscheduled vs. failed) mozilla-mobile/android-components#4686

Closed

Split UI test into a build task and a test one #6517

Merged

boek assigned sblatz Nov 12, 2019

boek removed the needs:group-triage label Nov 12, 2019

sblatz added a commit to sblatz/fenix that referenced this issue Dec 17, 2019

Closes mozilla-mobile#6139: Removes bors

836ef14

sblatz closed this as completed in 99f3804 Dec 17, 2019

sblatz added the eng:qa:not-needed Added by QA to issues that cannot be tested label Dec 17, 2019

JohanLorenzo mentioned this issue Mar 27, 2020

Way to approve contributer pull requests for automation #9295

Closed
