ToT is red due to a row of infra failures #100233

Closed
keyonghan opened this issue Mar 16, 2022 · 9 comments
Labels
team-infra Owned by Infrastructure team

Comments

@keyonghan
Contributor

Lots of tests failed due to an infra failure when updating task status:

[Screenshot: Screen Shot 2022-03-16 at 9 49 47 AM]

Example build: https://ci.chromium.org/ui/p/flutter/builders/prod/Linux_android%20devtools_profile_start_test/2056/overview

AuthenticatedClientError:
  URI: https://flutter-dashboard.appspot.com/api/update-task-status
  HTTP Status: 500
  Response body:
Key not found: Commit:flutter/flutter/master/1c2c9421121f345392162e3dde0a784d0cbdb69b
#0      DatastoreDB.lookupValue (package:gcloud/src/db/db.dart:381:9)
<asynchronous suspension>
#1      UpdateTaskStatus._constructCommitKey (package:cocoon_service/src/request_handlers/update_task_status.dart:115:27)
<asynchronous suspension>
#2      UpdateTaskStatus._getTaskFromNamedParams (package:cocoon_service/src/request_handlers/update_task_status.dart:77:35)
<asynchronous suspension>
#3      UpdateTaskStatus.post (package:cocoon_service/src/request_handlers/update_task_status.dart:58:23)
<asynchronous suspension>
#4      RequestHandler.service.<anonymous closure> (package:cocoon_service/src/request_handling/request_handler.dart:52:22)
<asynchronous suspension>
#5      ApiRequestHandler.service.<anonymous closure> (package:cocoon_service/src/request_handling/api_request_handler.dart:151:7)
<asynchronous suspension>
#6      ApiRequestHandler.service (package:cocoon_service/src/request_handling/api_request_handler.dart:150:5)
<asynchronous suspension>
#7      main.<anonymous closure>.<anonymous closure> (file:///app/bin/server.dart:267:9)
<asynchronous suspension>
@keyonghan keyonghan added the team-infra Owned by Infrastructure team label Mar 16, 2022
@keyonghan keyonghan added this to New in Infra Ticket Queue via automation Mar 16, 2022
@keyonghan
Contributor Author

/cc @godofredoc @CaseyHillers This is the failure I mentioned yesterday when updating task status.

@godofredoc
Contributor

What is missing to move the test results processing to pub/sub?

@CaseyHillers
Contributor

@yusuf-goog was starting something in flutter/cocoon#1574. Once a Cocoon handler has been added, we can update the devicelab test runner to push to Pub/Sub instead of making HTTP requests.
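
As a rough illustration of why a queue could help here, the sketch below models a subscriber that declines a status message until the commit it refers to exists, so the message is redelivered later instead of failing outright. `InMemoryTopic`, `handle_status`, the message fields, the task name, the status value, and the shortened commit hash are all assumptions made for this example; it does not use the real Cloud Pub/Sub client or Cocoon's handler.

```python
from collections import deque


class InMemoryTopic:
    """Toy stand-in for a message topic; NOT the Cloud Pub/Sub client API."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        self._queue.append(message)

    def deliver_once(self, handler):
        """Hand each pending message to handler once; requeue any it declines."""
        for _ in range(len(self._queue)):
            message = self._queue.popleft()
            if not handler(message):  # False means "nack": try again later
                self._queue.append(message)
        return len(self._queue)  # number of messages still waiting


known_commits = set()  # hypothetical stand-in for the Commit entities
task_status = {}       # hypothetical stand-in for the Task entities


def handle_status(message):
    """Acknowledge (True) only if the referenced commit already exists."""
    if message["commit"] not in known_commits:
        return False
    task_status[(message["commit"], message["task"])] = message["status"]
    return True


topic = InMemoryTopic()
topic.publish({
    "commit": "1c2c942",  # shortened, illustrative hash
    "task": "devtools_profile_start_test",
    "status": "Succeeded",
})

left_after_first = topic.deliver_once(handle_status)   # commit unknown: stays queued
known_commits.add("1c2c942")                            # the commit entity shows up later
left_after_second = topic.deliver_once(handle_status)  # redelivery now succeeds
```

In this model the early message is neither lost nor turned into a build failure; it simply waits until the data it depends on is present.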

@godofredoc
Contributor

Sounds like the problem here is that something is creating an entry in the data models that is out of sync with the handlers' requests. We need to fix that race condition or avoid depending on the data model entry existing.
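
The two options mentioned here can be sketched as follows. This is a minimal model under stated assumptions, not Cocoon's code: `FakeDatastore`, both function names, the task name, and the choice of a 503 status for the "not there yet" case are hypothetical. The real handler is written in Dart and looks the commit up through `package:gcloud`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDatastore:
    """Hypothetical in-memory store standing in for the real Datastore."""
    commits: set = field(default_factory=set)
    tasks: dict = field(default_factory=dict)


def update_task_status(db, commit_key, task_name, status):
    """Option 1: keep the dependency, but report a retryable condition
    instead of letting the missing key surface as an opaque 500."""
    if commit_key not in db.commits:
        return 503, f"Commit not found yet, retry later: {commit_key}"
    db.tasks[(commit_key, task_name)] = status
    return 200, "OK"


def update_task_status_upsert(db, commit_key, task_name, status):
    """Option 2: drop the dependency and record the result even if the
    commit entity has not been created yet."""
    db.tasks[(commit_key, task_name)] = status
    return 200, "OK"


db = FakeDatastore()
key = "flutter/flutter/master/1c2c9421121f345392162e3dde0a784d0cbdb69b"

early = update_task_status(db, key, "some_test", "Succeeded")  # before the commit exists
db.commits.add(key)                                            # commit entity created
late = update_task_status(db, key, "some_test", "Succeeded")   # same request, later

other_db = FakeDatastore()
upserted = update_task_status_upsert(other_db, key, "some_test", "Succeeded")
```

Option 1 only helps if the caller retries on the retryable status; option 2 needs a later step that reconciles task results with the commit once it is created.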

@CaseyHillers
Contributor

To recap, this is what happened:

  1. flutter/flutter is on the LUCI scheduler, which wasn't impacted by the GitHub outage today
  2. Builds triggered and finished before Cocoon received webhook notifications to create the Task entities
  3. When the LUCI builds sent requests to /api/update-task-status, they hit the 500 error

This is a duplicate of #78876, which we will mark as fixed once flutter/flutter is moved to the Cocoon scheduler.

@godofredoc godofredoc moved this from New to Triaged in Infra Ticket Queue Mar 18, 2022
@godofredoc
Contributor

After fixing the recipes, the test has been passing consistently.

Infra Ticket Queue automation moved this from Triaged to Done Mar 18, 2022
@godofredoc godofredoc reopened this Mar 21, 2022
Infra Ticket Queue automation moved this from Done to In progress Mar 21, 2022
@CaseyHillers
Contributor

Any action items (AIs) for this being reopened? Since this was caused by the GitHub webhook outage, this will be obsolete once #92300 lands (which moves the scheduler to be dependent on GitHub webhooks).

@godofredoc
Contributor

It was reopened because we were missing the context of what actually fixed it. Thanks for updating!

@github-actions

github-actions bot commented Apr 5, 2022

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 5, 2022