Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[scheduler] Add batch scheduling logic #1685

Merged
merged 6 commits into from
Mar 31, 2022

Conversation

CaseyHillers
Copy link
Contributor

Adds batch scheduling logic for devicelab tests. When the platform contains ios or android, builds will only be triggered in batches of 3. This means the previous 2 tasks must be marked as new.

Additionally, marks builds that are triggered as in progress (an indicator they've been scheduled)

Issues

flutter/flutter#100793
flutter/flutter#92300

Future work

Continue flutter/flutter#100793 by adding backfilling when there is capacity. Once that PR lands, the remaining targets in flutter/flutter can be migrated to the cocoon scheduler.

@CaseyHillers
Copy link
Contributor Author

Friendly ping

required DatastoreService datastore,
}) async {
final List<Task> recentTasks = await datastore.queryRecentTasksByName(name: task.name!).toList();
// Ensure task isn't considered in recentTasks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case we split the scheduling logic from the tasks. In the current PR state, it shouldn't happen. However, we need to move the post-submit triggering logic to be based on GoB events (flutter/flutter#100911)

The returned column may or may not include the task we're scheduling for, so as a preemptive measure we remove it if it exists.

required Task task,
required DatastoreService datastore,
}) async {
final List<Task> recentTasks = await datastore.queryRecentTasksByName(name: task.name!).toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a task with recent statuses:
new
pass
pass
pass
pass
, it will not be scheduled based on current logic. What if there is no new commit for 10 hours? We should run TOT tests regularly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the PRs small, I deferred the backfilling logic to the next PR (overall work tracked in flutter/flutter#100793).

The plan is for the backfill cron job to detect new-pass columns and trigger a build (but skip on running-pass columns).

test('triggers after batch size', () async {
db.addOnQuery<Task>((Iterable<Task> results) => allPending);
expect(
await policy.triggerPriority(task: generateTask(4), datastore: datastore), LuciBuildService.kDefaultPriority);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a batch with size 4. shall we test a batch size 3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I made the test 3, and I added another test for when there is 2 pending tasks.

class BatchPolicy implements SchedulerPolicy {
static const int kBatchSize = 3;
@override
Future<int?> triggerPriority({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add a doc explaining the policy for different cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

final int? priority = await policy.triggerPriority(task: task, datastore: datastore);
if (priority != null) {
// Mark task as in progress to ensure it isn't scheduled over
task.status = Task.statusInProgress;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race condition here. The refresh-chromebot-status API may update the task back to New if the task is not scheduled yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ciYaml.getPostSubmitBuilders is used in RefreshChromebotStatus, and it only returns targets based on the luci scheduler. This scheduling logic is only for targets with the cocoon scheduler.

return LuciBuildService.kRerunPriority;
}

if (recentTasks[0].status == Task.statusNew && recentTasks[1].status == Task.statusNew) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rethink this part: the current logic seems schedule tasks in batch size >= kBatchSize. For example, if we have new, new, new, new, then we will batch schedule the latest in a size of 4. Or will our logic guarantee this case will not happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test for this to show the batch size of kBatchSize is respected.

Yes, if there. is new, new, new, new it would schedule, but that shouldn't happen in prod (it's more of a theoretical).

Copy link
Contributor

@keyonghan keyonghan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CaseyHillers CaseyHillers added the waiting for tree to go green Merge PR when tree becomes green via fluttergithubbot label Mar 31, 2022
@fluttergithubbot
Copy link

This pull request is not suitable for automatic merging in its current state.

  • The status or check suite Linux Cocoon has failed. Please fix the issues identified (or deflake) before re-applying this label.

@fluttergithubbot fluttergithubbot removed the waiting for tree to go green Merge PR when tree becomes green via fluttergithubbot label Mar 31, 2022
@CaseyHillers CaseyHillers added the waiting for tree to go green Merge PR when tree becomes green via fluttergithubbot label Mar 31, 2022
@fluttergithubbot fluttergithubbot merged commit 72f9823 into flutter:main Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for tree to go green Merge PR when tree becomes green via fluttergithubbot
Projects
None yet
3 participants