-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[scheduler] Add batch scheduling logic #1685
Changes from 3 commits
9444d6f
510f0b3
56b4f75
d72a3c1
ad4591b
70a7cfc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
// Copyright 2021 The Flutter Authors. All rights reserved. | ||
// Use of this source code is governed by a BSD-style license that can be | ||
// found in the LICENSE file. | ||
|
||
import 'package:cocoon_service/src/service/datastore.dart'; | ||
|
||
import '../../model/appengine/task.dart'; | ||
import '../logging.dart'; | ||
import '../luci_build_service.dart'; | ||
|
||
/// Interface for implementing various scheduling policies in the Cocoon scheduler. | ||
abstract class SchedulerPolicy { | ||
/// Returns the priority of [Task]. | ||
/// | ||
/// If null is returned, the task should not be scheduled. | ||
Future<int?> triggerPriority({ | ||
required Task task, | ||
required DatastoreService datastore, | ||
}); | ||
} | ||
|
||
/// Every [Task] is triggered to run. | ||
class GuranteedPolicy implements SchedulerPolicy { | ||
@override | ||
Future<int?> triggerPriority({ | ||
required Task task, | ||
required DatastoreService datastore, | ||
}) async => | ||
LuciBuildService.kDefaultPriority; | ||
} | ||
|
||
/// [Task] is run at least every 3 commits. | ||
/// | ||
/// If there is capacity, a backfiller cron triggers the latest task that was not run | ||
/// to ensure ToT is always tested. | ||
/// | ||
/// This is intended for targets that are run in an infra pool that has limited capacity, | ||
/// such as the on device tests in the DeviceLab. | ||
class BatchPolicy implements SchedulerPolicy { | ||
static const int kBatchSize = 3; | ||
@override | ||
Future<int?> triggerPriority({ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: add a doc explaining the policy for different cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added! |
||
required Task task, | ||
required DatastoreService datastore, | ||
}) async { | ||
final List<Task> recentTasks = await datastore.queryRecentTasksByName(name: task.name!).toList(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For a task with recent statuses: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To keep the PRs small, I deferred the backfilling logic to the next PR (overall work tracked in flutter/flutter#100793). The plan is for the backfill cron job to detect new-pass columns and trigger a build (but skip on running-pass columns). |
||
// Ensure task isn't considered in recentTasks | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why do we need this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the case we split the scheduling logic from the tasks. In the current PR state, it shouldn't happen. However, we need to move the post-submit triggering logic to be based on GoB events (flutter/flutter#100911) The returned column may or may not include the task we're scheduling for, so as a preemptive measure we remove it if it exists. |
||
recentTasks.removeWhere((Task t) => t.commitKey == task.commitKey); | ||
if (recentTasks.length < kBatchSize - 1) { | ||
log.warning('${task.name} has less than $kBatchSize, triggerring all builds regardless of policy'); | ||
return LuciBuildService.kDefaultPriority; | ||
} | ||
|
||
// Prioritize tasks that recently failed. | ||
if (_isFailed(recentTasks[0]) || _isFailed(recentTasks[1])) { | ||
return LuciBuildService.kRerunPriority; | ||
} | ||
|
||
if (recentTasks[0].status == Task.statusNew && recentTasks[1].status == Task.statusNew) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rethink this part: the current logic seems schedule tasks in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a test for this to show the batch size of Yes, if there. is new, new, new, new it would schedule, but that shouldn't happen in prod (it's more of a theoretical). |
||
return LuciBuildService.kDefaultPriority; | ||
} | ||
|
||
return null; | ||
} | ||
|
||
bool _isFailed(Task task) { | ||
return task.status == Task.statusFailed || task.status == Task.statusInfraFailure; | ||
} | ||
} | ||
|
||
/// [Task] run outside of Cocoon are not triggered by the Cocoon scheduler. | ||
class OmitPolicy implements SchedulerPolicy { | ||
@override | ||
Future<int?> triggerPriority({ | ||
required Task task, | ||
required DatastoreService datastore, | ||
}) async => | ||
null; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
// Copyright 2021 The Flutter Authors. All rights reserved. | ||
// Use of this source code is governed by a BSD-style license that can be | ||
// found in the LICENSE file. | ||
|
||
import 'package:cocoon_service/src/model/appengine/task.dart'; | ||
import 'package:cocoon_service/src/service/datastore.dart'; | ||
import 'package:cocoon_service/src/service/luci_build_service.dart'; | ||
import 'package:cocoon_service/src/service/scheduler/policy.dart'; | ||
import 'package:test/test.dart'; | ||
|
||
import '../../src/datastore/fake_datastore.dart'; | ||
import '../../src/utilities/entity_generators.dart'; | ||
|
||
void main() { | ||
group('BatchPolicy', () { | ||
late FakeDatastoreDB db; | ||
late DatastoreService datastore; | ||
|
||
final BatchPolicy policy = BatchPolicy(); | ||
|
||
setUp(() { | ||
db = FakeDatastoreDB(); | ||
datastore = DatastoreService(db, 5); | ||
}); | ||
|
||
final List<Task> allPending = <Task>[ | ||
generateTask(2), | ||
generateTask(1), | ||
]; | ||
|
||
final List<Task> latestFinishedButRestPending = <Task>[ | ||
generateTask(2, status: Task.statusSucceeded), | ||
generateTask(1), | ||
]; | ||
|
||
final List<Task> latestFailed = <Task>[ | ||
generateTask(2, status: Task.statusFailed), | ||
generateTask(1), | ||
]; | ||
|
||
final List<Task> latestPending = <Task>[ | ||
generateTask(2), | ||
generateTask(1, status: Task.statusSucceeded), | ||
]; | ||
|
||
test('triggers after batch size', () async { | ||
db.addOnQuery<Task>((Iterable<Task> results) => allPending); | ||
expect( | ||
await policy.triggerPriority(task: generateTask(3), datastore: datastore), LuciBuildService.kDefaultPriority); | ||
}); | ||
|
||
test('triggers with higher priority on recent failures', () async { | ||
db.addOnQuery<Task>((Iterable<Task> results) => latestFailed); | ||
expect( | ||
await policy.triggerPriority(task: generateTask(4), datastore: datastore), LuciBuildService.kRerunPriority); | ||
}); | ||
|
||
test('does not trigger when a test was recently scheduled', () async { | ||
db.addOnQuery<Task>((Iterable<Task> results) => latestFinishedButRestPending); | ||
expect(await policy.triggerPriority(task: generateTask(4), datastore: datastore), isNull); | ||
}); | ||
|
||
test('does not trigger when pending queue is smaller than batch', () async { | ||
db.addOnQuery<Task>((Iterable<Task> results) => latestPending); | ||
expect(await policy.triggerPriority(task: generateTask(3), datastore: datastore), isNull); | ||
}); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race condition here. The refresh-chromebot-status API may update the task back to
New
if the task is not scheduled yet.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ciYaml.getPostSubmitBuilders
is used in RefreshChromebotStatus, and it only returns targets based on the luci scheduler. This scheduling logic is only for targets with the cocoon scheduler.