-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[scheduler] Add batch scheduling logic #1685
Conversation
required DatastoreService datastore, | ||
}) async { | ||
final List<Task> recentTasks = await datastore.queryRecentTasksByName(name: task.name!).toList(); | ||
// Ensure task isn't considered in recentTasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case we split the scheduling logic from the tasks. In the current PR state, it shouldn't happen. However, we need to move the post-submit triggering logic to be based on GoB events (flutter/flutter#100911)
The returned column may or may not include the task we're scheduling for, so as a preemptive measure we remove it if it exists.
required Task task, | ||
required DatastoreService datastore, | ||
}) async { | ||
final List<Task> recentTasks = await datastore.queryRecentTasksByName(name: task.name!).toList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a task with recent statuses:
new
pass
pass
pass
pass
, it will not be scheduled based on current logic. What if there is no new commit for 10 hours? We should run TOT tests regularly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep the PRs small, I deferred the backfilling logic to the next PR (overall work tracked in flutter/flutter#100793).
The plan is for the backfill cron job to detect new-pass columns and trigger a build (but skip on running-pass columns).
test('triggers after batch size', () async { | ||
db.addOnQuery<Task>((Iterable<Task> results) => allPending); | ||
expect( | ||
await policy.triggerPriority(task: generateTask(4), datastore: datastore), LuciBuildService.kDefaultPriority); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a batch with size 4. shall we test a batch size 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I made the test 3, and I added another test for when there is 2 pending tasks.
class BatchPolicy implements SchedulerPolicy { | ||
static const int kBatchSize = 3; | ||
@override | ||
Future<int?> triggerPriority({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add a doc explaining the policy for different cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added!
final int? priority = await policy.triggerPriority(task: task, datastore: datastore); | ||
if (priority != null) { | ||
// Mark task as in progress to ensure it isn't scheduled over | ||
task.status = Task.statusInProgress; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race condition here. The refresh-chromebot-status API may update the task back to New
if the task is not scheduled yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ciYaml.getPostSubmitBuilders
is used in RefreshChromebotStatus, and it only returns targets based on the luci scheduler. This scheduling logic is only for targets with the cocoon scheduler.
return LuciBuildService.kRerunPriority; | ||
} | ||
|
||
if (recentTasks[0].status == Task.statusNew && recentTasks[1].status == Task.statusNew) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rethink this part: the current logic seems schedule tasks in batch size >= kBatchSize
. For example, if we have new, new, new, new
, then we will batch schedule the latest in a size of 4. Or will our logic guarantee this case will not happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test for this to show the batch size of kBatchSize
is respected.
Yes, if there. is new, new, new, new it would schedule, but that shouldn't happen in prod (it's more of a theoretical).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This pull request is not suitable for automatic merging in its current state.
|
Adds batch scheduling logic for devicelab tests. When the platform contains
ios
orandroid
, builds will only be triggered in batches of 3. This means the previous 2 tasks must be marked as new.Additionally, marks builds that are triggered as in progress (an indicator they've been scheduled)
Issues
flutter/flutter#100793
flutter/flutter#92300
Future work
Continue flutter/flutter#100793 by adding backfilling when there is capacity. Once that PR lands, the remaining targets in flutter/flutter can be migrated to the cocoon scheduler.