[infra] reduce cirrus task dependencies #49454
fluttergithubbot merged 2 commits into flutter:master
Conversation
I think part of the idea is that deploy may not test things that indicate a deploy should not be done. Are these tasks actually deploying anything though? If not then this change should be fine.
- framework_tests-widgets-linux
- framework_tests-libraries-linux
- framework_tests-misc-linux
- tool_tests-general-linux
- tool_tests-commands-linux
- tool_tests-integration-linux
- build_tests-linux
- hostonly_devicelab_tests-0-linux
- hostonly_devicelab_tests-1-linux
- hostonly_devicelab_tests-2-linux
- hostonly_devicelab_tests-3_last-linux
- firebase_test_lab_tests-linux
Is there any way to make this conditional on what branch we're on?
Maybe even just making two separate tasks, one of which is always skipped where we actually deploy and has no deps, one of which is skipped when we don't deploy and has the deps?
Not that I can see from looking in the logs
Talked offline. It would be helpful to have a comment here explaining why we think this is safe: it doesn't actually upload on master, by the time it gets to dev it's been thoroughly tested, and adding the dependencies greatly slows down CI.
Updated with comments, PTAL
It doesn't look like we actually deploy on commits to master though: https://github.com/flutter/flutter/blob/master/dev/bots/deploy_gallery.sh#L39
This pull request is not suitable for automatic merging in its current state.
Description
Currently the deploy_gallery-* tasks depend on a large number of other cirrus tasks. The intention, I believe, is to avoid running these deployments on obviously failing builds.
Unfortunately, cirrus infra is unstable: tasks fail because agents are killed or because of other issues that are not diagnosable via logs. When a parent task fails due to an infra issue, its child tasks are cancelled as well. When the failed parent task is rerun, the child task naturally cannot be rerun until the parent completes.
Since we have moved to a model where the build is considered red until 100% of previously failing tasks have passed, this behavior roughly doubles the outage time from a flaked test. If you conservatively assume each cirrus task takes 30 minutes to schedule and run, then a failure of one of the depends_on tasks would lead to approximately an hour of outage.
To reduce the outage to "only" half an hour, I propose removing all "depends_on" clauses except for analyze_linux/docs, which is faster than average and fairly stable.
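The outage arithmetic above can be sketched roughly as follows. This is a simplified model, not anything taken from the cirrus configuration: the function name `outage_minutes` and its parameters are hypothetical, and it assumes a single flake, a rerun that passes, and the 30-minute per-task estimate from the description.

```python
# Simplified model of how long the build stays red after one parent task flakes.
# Assumptions (not from the cirrus config): one flake, the rerun passes, and
# every task takes the same time to schedule and run.

TASK_MINUTES = 30  # the description's conservative per-task estimate


def outage_minutes(child_depends_on_parent: bool,
                   task_minutes: int = TASK_MINUTES) -> int:
    """Minutes from the parent's failure until every task has passed."""
    # The failed parent has to be rerun in either case.
    outage = task_minutes
    if child_depends_on_parent:
        # The child was cancelled with the parent, so it can only start
        # after the parent's rerun has completed.
        outage += task_minutes
    return outage


if __name__ == "__main__":
    print("with depends_on:   ", outage_minutes(True), "minutes")
    print("without depends_on:", outage_minutes(False), "minutes")
```

Under these assumptions, a child task that keeps its dependency waits through two sequential runs (the parent rerun, then its own run), while an independent child has already run in parallel with the original parent attempt, so only the parent rerun contributes to the outage.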