Flutter dashboard does not refresh luci builds data as expected #66846

Closed
godofredoc opened this issue Sep 28, 2020 · 18 comments
Labels
P0 Critical issues such as a build break or regression team-infra Owned by Infrastructure team

Comments

@godofredoc
Contributor

The expectation is that Flutter build dashboard tasks are updated within at most 2 minutes after the LUCI build completes. The observed behavior is that the tree sometimes stays closed for more than 15 minutes after a red task has turned green.

I think this is a problem in two areas:

  • The endpoint may not be updating the tasks frequently enough.
  • A visualization problem where the cached data shows a task state from up to 15 mins ago.

\cc @CaseyHillers @keyonghan

@godofredoc godofredoc added this to New in Infra Ticket Queue via automation Sep 28, 2020
@CaseyHillers
Contributor

The build dashboard data endpoint /api/public/get-status has a 1-minute cache TTL (source).

The benchmark data for the performance dashboard is stored at 15-minute intervals. Should that be reduced to 1 minute as well?

Is it possible that this runs on a cron job when it should be using pub/sub?
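Both areas add up: if a cron job polls LUCI and the read endpoint caches its response, the worst-case delay before the dashboard shows a finished build is roughly the sum of the two intervals. The sketch below is a back-of-the-envelope model under that assumption, not Cocoon code; the function names and the 5-minute poll interval in the example are hypothetical, and only the 1-minute get-status TTL comes from this thread.

```python
# Back-of-the-envelope model (hypothetical, not Cocoon code) of how stale the
# build dashboard can be after a LUCI build finishes.


def worst_case_staleness_polling(poll_interval_s: float,
                                 cache_ttl_s: float,
                                 client_refresh_s: float = 0.0) -> float:
    """Upper bound (seconds) when build results are pulled by a cron job."""
    # A build may finish just after a poll, so it waits a full poll interval
    # before its new status is written to the datastore.
    # A cached get-status response built just before that write can keep
    # being served for a full cache TTL.
    # The browser only sees the change on its next refresh.
    return poll_interval_s + cache_ttl_s + client_refresh_s


def worst_case_staleness_push(cache_ttl_s: float,
                              processing_s: float = 0.0,
                              client_refresh_s: float = 0.0) -> float:
    """Upper bound (seconds) when build updates are pushed via pub/sub."""
    # The poll interval drops out; only message handling, the cache TTL and
    # the client refresh remain.
    return processing_s + cache_ttl_s + client_refresh_s


if __name__ == "__main__":
    # Hypothetical 5-minute cron versus push, both with the 1-minute TTL.
    print(worst_case_staleness_polling(poll_interval_s=300, cache_ttl_s=60))  # 360.0
    print(worst_case_staleness_push(cache_ttl_s=60))  # 60.0
```

Under this model, shortening the cache TTL alone can never bring the delay below the poll interval, which is why switching to push targets the larger term.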

@godofredoc
Copy link
Contributor Author

We are not using pub/sub for prod builds yet, but that is a very good idea. We can add a pub/sub topic to the prod builder requests and process the data as it arrives rather than polling the server.

@digiter digiter added team-infra Owned by Infrastructure team P2 labels Sep 28, 2020
@digiter digiter moved this from New to Triaged in Infra Ticket Queue Sep 28, 2020
@godofredoc godofredoc removed this from Triaged in Infra Ticket Queue Oct 1, 2020
@godofredoc
Contributor Author

Moving this out of the infra ticket queue because the implementation will take more than a couple of days.

@CaseyHillers CaseyHillers self-assigned this Oct 6, 2020
@CaseyHillers CaseyHillers added this to To do in [infra] Devicelab on LUCI via automation Oct 7, 2020
@keyonghan
Contributor

Possibly related: there used to be a bug when refreshing chromebot status, which was fixed in flutter/cocoon#963.

@CaseyHillers
Contributor

I looked into this some more. We need to set the notifier of LUCI prod builds to point to a flutter-dashboard pub/sub topic. This requires changes in both flutter/infra (LUCI schedules the initial build) and flutter/cocoon (which schedules reruns).

There is an existing pub/sub handler, luci-status-handler, which forwards updates to GitHub checks for try jobs. For simplicity, we can create a prod version of that handler and look into refactoring the duplicated logic later.

We'll also need to figure out why flutter/cocoon#992 failed when JSON-encoding the LUCI requests.

@CaseyHillers
Contributor

The DeviceLab LUCI tests now upload their results during the recipe. This allows for quicker updates than the current LUCI cron job.

Depending on the complexity of adding pub/sub to LUCI prod builders, it may be better to generalize the devicelab upload to all Flutter builders.

@CaseyHillers
Contributor

I updated the Git-on-Borg (GoB) mirror fetch cron from the default of 15 mins to 5 mins for the Flutter GitHub mirrors. This should reduce the time it takes for LUCI to pick up commits from an average of 7.5 mins to 2.5 mins.

5 mins is the minimum interval the GoB fetch allows. If needed, we can send an RPC to the GoB backend to trigger updates from the GitHub webhooks.

$ gob-ctl repos update-mirror-config github/flutter --uri https://github.com/flutter/.git --fetch_frequency 5m

uri: "https://github.com/flutter/.git"
fetch_frequency: {
  seconds: 300
}
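The 7.5 and 2.5 minute averages follow directly from the fetch interval: if commits land at random times, a commit waits on average half an interval for the next fetch and at most a full interval. A minimal sketch of that arithmetic, assuming uniformly distributed commit times and instantaneous fetches (the function name is hypothetical):

```python
from typing import Tuple


def mirror_delay_minutes(fetch_interval_min: float) -> Tuple[float, float]:
    """Return (average, worst-case) minutes before a pushed commit reaches GoB."""
    # Assumes commits land at uniformly random times relative to the fetch
    # schedule and that each fetch completes instantly, so the wait is
    # uniform on [0, fetch_interval_min] with mean fetch_interval_min / 2.
    return fetch_interval_min / 2, fetch_interval_min


if __name__ == "__main__":
    for interval in (30, 15, 5):
        avg, worst = mirror_delay_minutes(interval)
        print(f"fetch every {interval} min: average {avg} min, worst {worst} min")
```

The same arithmetic gives a 15-minute average for a 30-minute fetch interval.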

@godofredoc
Contributor Author

Can we do the same for the engine repo?

/cc @zanderso FYI

Although I thought this bug was related to the cells not being updated because of the cache and/or the latency to process build status.

@CaseyHillers
Contributor

Can we do the same for the engine repo?

I applied it to all mirrors in the Flutter org (there was only one config).

Although I thought this bug was related to the cells not being updated because of the cache and/or the latency to process build status.

This helps LUCI tasks go from gray to yellow more quickly. I documented it here because it's not obvious why there was occasionally a 15-minute delay before a task went from gray to yellow.

@keyonghan
Contributor

gray => yellow means the status changed from New to In Process. It may be related to capacity scheduling latency. Do we still see the 15-minute delay occasionally?

@CaseyHillers
Contributor

I fixed the fetch frequency. There were separate configs for each repo, and for some reason flutter/flutter was set to sync every 30 minutes.

chillers@chillzone:~$ gob-ctl repos update-mirror-config github/flutter/flutter --uri https://github.com/flutter/flutter.git --fetch_frequency 5m
uri: "https://github.com/flutter/flutter.git"
fetch_frequency: {
  seconds: 300
}
chillers@chillzone:~$ gob-ctl repos update-mirror-config github/flutter/engine --uri https://github.com/flutter/engine.git --fetch_frequency 5m
uri: "https://github.com/flutter/engine.git"
fetch_frequency: {
  seconds: 300
}

It is possibly related to capacity scheduling latency. Do we still see the 15 min delay occasionally?

I believe the issue is that LUCI builds from this GoB mirror. It cannot schedule builds for a new commit until the commit has been mirrored from GitHub to GoB. Validating a commit on LUCI breaks down as follows:

  1. Mirror from GitHub to GoB (1 minute to 30 minutes, average 15 mins)
  2. LUCI scheduler detects new commit and schedules builds (<1 minute)
  3. Builds queued until capacity is available (see infra metrics dashboard for latest)
  4. Builds run (~1 hour)

Step 1 affects the gray -> yellow timing.
Steps 2-4 affect the yellow -> completion timing.
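The breakdown can be written as a small model that maps the four steps onto the two dashboard transitions, following the step-to-transition split as stated. The class name is hypothetical and the numbers are illustrative: 15 and 2.5 minutes are the average mirror waits for 30- and 5-minute fetch intervals, the 1-minute scheduling and ~1-hour run times are the approximate figures from the list, and queue time is left at zero as a placeholder because it varies.

```python
from dataclasses import dataclass


@dataclass
class CommitTimeline:
    """Per-stage durations, in minutes, for validating one commit on LUCI."""
    mirror_min: float    # 1. GitHub -> GoB mirror wait
    schedule_min: float  # 2. LUCI scheduler detects the commit and schedules builds
    queue_min: float     # 3. builds queued until capacity is available
    run_min: float       # 4. builds run

    def gray_to_yellow(self) -> float:
        # Per the breakdown, only the mirror step delays the cell turning yellow.
        return self.mirror_min

    def yellow_to_completion(self) -> float:
        # Scheduling, queueing and execution happen after the cell is yellow.
        return self.schedule_min + self.queue_min + self.run_min

    def total(self) -> float:
        return self.gray_to_yellow() + self.yellow_to_completion()


if __name__ == "__main__":
    # queue_min=0 is a placeholder; real queue times come from infra metrics.
    before = CommitTimeline(mirror_min=15, schedule_min=1, queue_min=0, run_min=60)
    after = CommitTimeline(mirror_min=2.5, schedule_min=1, queue_min=0, run_min=60)
    print(before.total(), after.total())
```

Since the ~1-hour run dominates the total, the mirror change mainly shows up in how quickly cells leave gray rather than in end-to-end time.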

@keyonghan
Contributor

Thanks for the explanations.
One note on step 2: when there are more commits, it may affect gray -> yellow as well, because of our triggering policy's max_batch_size and max_concurrent_invocations settings.
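To illustrate the note above, here is a deliberately simplified model of how batching limits can leave new commits gray. It is not the LUCI scheduler's implementation; it only assumes that pending commits are picked up oldest-first, that each invocation covers at most max_batch_size commits, and that at most max_concurrent_invocations run at once. The function name is hypothetical.

```python
from typing import List


def start_batches(pending: List[str], running: int,
                  max_concurrent_invocations: int,
                  max_batch_size: int) -> List[List[str]]:
    """Return the batches of commits that would start now; the rest stay gray."""
    # Simplified model only: pending commits are taken oldest-first, each new
    # invocation covers at most max_batch_size commits, and no more than
    # max_concurrent_invocations invocations may run at once.
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    free_slots = max(0, max_concurrent_invocations - running)
    batches: List[List[str]] = []
    queue = list(pending)
    while free_slots > 0 and queue:
        batches.append(queue[:max_batch_size])
        del queue[:max_batch_size]
        free_slots -= 1
    return batches


if __name__ == "__main__":
    commits = ["c1", "c2", "c3", "c4", "c5"]
    # All slots busy: nothing starts, so every commit stays gray.
    print(start_batches(commits, running=2, max_concurrent_invocations=2, max_batch_size=2))
    # Two free slots: c1-c4 start in two batches; c5 waits for a free slot.
    print(start_batches(commits, running=0, max_concurrent_invocations=2, max_batch_size=2))
```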

@godofredoc
Contributor Author

\cc @CaseyHillers can we close this one or are we still missing something?

@CaseyHillers CaseyHillers removed their assignment Mar 2, 2021
@CaseyHillers
Contributor

No. Cocoon still relies on the cron-based method for updating LUCI tasks on the dashboard.

Unassigning myself as I'm not actively working on this.

\cc @yusufm

@godofredoc
Contributor Author

Assigning to Yusuf: this impacts the team, and it prevents adoption of the flutter-dashboard by teams other than framework.

@yusuf-goog
Contributor

This is currently blocked on flutter/flutter moving onto the Cocoon scheduler, which should fix this.

@keyonghan keyonghan assigned CaseyHillers and unassigned yusuf-goog Apr 21, 2022
@CaseyHillers
Contributor

Deduping with #92300

[infra] Devicelab on LUCI automation moved this from To do to Done Apr 21, 2022
@github-actions

github-actions bot commented May 5, 2022

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 5, 2022
@flutter-triage-bot flutter-triage-bot bot added P0 Critical issues such as a build break or regression and removed P2 labels Jun 28, 2023
6 participants