More Linux/Android resources for staging are needed #96864

zanderso · 2022-01-19T16:05:23Z

Flaky tests and benchmarks get moved to staging and need some number of consecutive non-flaky runs before they can move back to prod. However, staging appears to be under-provisioned, so reaching that number of runs is going to take a long time. For example, Linux_android opacity_peephole_fade_transition_text_perf__e2e_summary has run on only 6 of the last 70 framework commits. Several other benchmarks appear to be receiving similar treatment. By contrast, Linux_android animated_placeholder_perf__e2e_summary has run on 50 of the last 70 commits.
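To put the two run rates side by side, here is a rough sketch of the arithmetic. The threshold of 10 runs is purely hypothetical (the actual number of runs required to leave staging is not stated here), and the helper names are made up for illustration. It also assumes the observed run rate stays constant.

```python
def run_coverage(runs: int, commits: int) -> float:
    """Fraction of commits on which the test actually ran."""
    return runs / commits


def commits_needed(required_runs: int, runs: int, commits: int) -> int:
    """Commits needed to accumulate `required_runs` runs at the observed rate.

    Uses integer ceiling division so the result is never rounded down.
    """
    return (required_runs * commits + runs - 1) // runs


# Figures quoted above: 6 of 70 commits vs. 50 of 70 commits.
slow = run_coverage(6, 70)
fast = run_coverage(50, 70)
print(f"opacity_peephole_fade_transition_text_perf: {slow:.1%} of commits")
print(f"animated_placeholder_perf: {fast:.1%} of commits")

# HYPOTHETICAL threshold of 10 runs, only to show the scale of the gap.
HYPOTHETICAL_REQUIRED_RUNS = 10
print(commits_needed(HYPOTHETICAL_REQUIRED_RUNS, 6, 70), "commits at the slow rate")
print(commits_needed(HYPOTHETICAL_REQUIRED_RUNS, 50, 70), "commits at the fast rate")
```

Under that assumed threshold, the under-scheduled benchmark would need roughly eight times as many commits to qualify as the well-scheduled one.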

Marking P2 to determine whether this is really due to insufficient resources or rather due to a bug in scheduling.

/cc @godofredoc

keyonghan · 2022-01-19T20:00:38Z

We have three linux bots running tasks in staging. Four more are idle (1 motog4 (M), 3 samsung) and are never scheduled to run tests.

A couple of things to move forward:

I don't think we need to run all linux/android devicelab tests in staging (https://ci.chromium.org/p/flutter/g/devicelab_staging/console). This consumes most of the resources and causes builds to queue up and run in batches. The next time we need to validate new configs/hardware, we can re-enable the tests, validate, and then skip them again.
We have a KR to support high-end phones in Q1 to make sure benchmarks are collected smoothly. We need to start a plan to run tests in those new testbeds.

As a short-term workaround, we can limit the number of devicelab tests running in the staging pool, leaving room to validate the tests that are actually flaky.

keyonghan · 2022-01-19T20:17:45Z

https://flutter-review.googlesource.com/c/infra/+/25440 to skip ~35 tests migrated from mac/android.

keyonghan · 2022-01-20T17:41:19Z

I believe those flaky tests are now being picked up frequently enough. Here are the top ones

godofredoc · 2022-01-20T23:09:29Z

This issue has been mitigated by removing a subset of tests from staging.

zanderso · 2022-01-20T23:44:14Z

Thanks! I think we can close this as fixed, and I'll re-open or file a new one as needed.

zanderso · 2022-01-21T03:36:55Z

It still seems like there is a capacity issue.

Also, there are only two windows bots on staging? They've both been offline for 9+ hours.

keyonghan · 2022-01-21T18:36:21Z

Yeah, builds are queued up quickly especially when we have several commits merged around the same time.

There are currently 3 linux staging bots running 25 staging linux/android tests, whereas there are 17 linux prod bots running 83 prod linux/android tests (current 90th-percentile queue time is 22 min; the SLO is 35 min).

It makes sense to me to migrate, say 2 bots, from prod to staging to help validate the flaky tests for now. We can expand the prod capacity when new bots are available.

For windows bots, opened #97017.

keyonghan · 2022-01-24T18:23:13Z

Instead of migrating bots between prod and staging, https://flutter-review.googlesource.com/c/infra/+/25566 changes devicelab staging linux tests to run in A02 testbeds.
Only benchmarks and bringup:true tests now run on motoG4. This cuts 10 more tests.

keyonghan · 2022-01-26T23:20:03Z

Linux staging builders are being picked up and run in a timely manner now. Will monitor for a while before closing.

keyonghan · 2022-01-29T02:10:34Z

Builds are running at a frequent pace. Closing.

github-actions · 2022-02-12T02:14:50Z

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

zanderso added team-infra Owned by Infrastructure team P2 labels Jan 19, 2022

zanderso added this to New in Infra Ticket Queue via automation Jan 19, 2022

yusuf-goog moved this from New to Triaged in Infra Ticket Queue Jan 19, 2022

keyonghan self-assigned this Jan 19, 2022

keyonghan moved this from Triaged to In progress in Infra Ticket Queue Jan 19, 2022

godofredoc added the passed secondary triage label Jan 20, 2022

zanderso closed this as completed Jan 20, 2022

Infra Ticket Queue automation moved this from In progress to Done Jan 20, 2022

zanderso reopened this Jan 21, 2022

Infra Ticket Queue automation moved this from Done to In progress Jan 21, 2022

keyonghan closed this as completed Jan 29, 2022

Infra Ticket Queue automation moved this from In progress to Done Jan 29, 2022

github-actions bot locked as resolved and limited conversation to collaborators Feb 12, 2022

flutter-triage-bot bot added P0 Critical issues such as a build break or regression and removed P2 labels Jun 28, 2023
