Codecov Report
@@ Coverage Diff @@
## master #73 +/- ##
=======================================
Coverage 64.29% 64.29%
=======================================
Files 24 24
Lines 2644 2644
=======================================
Hits 1700 1700
Misses 944 944
- All session creation requests are enqueued to the job queue, and the response is returned immediately without waiting for the actual kernel creation.
  - After a kernel creation request comes in or a kernel is terminated, `ScalingGroup.schedule()` is invoked. It calls `AbstractJobScheduler.schedule()` and then actually creates the kernels returned by the scheduler.
- Scale in/out when a kernel is created/terminated or an agent joins/leaves.
  - Note that scaling always precedes scheduling. When scaling up, scaling does not immediately affect the available resource shares, since starting a new instance takes a considerable amount of time, whereas scheduling depends on the currently available resource shares. Therefore, the order of scheduling and scaling does not matter when scaling up. However, when scaling down, `AbstractScalingDriver` should "mark" agents to be terminated in the (near) future, and the scheduler should avoid assigning kernels to those agents. This forces scaling down to be done before scheduling.
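To make the ordering above concrete, here is a minimal sketch, not the actual implementation in this PR. `ScalingGroupSketch`, `Agent`, `Job`, the `draining` flag, and the slot-counting scale-in policy are all hypothetical names and simplifications chosen for illustration. It only shows three things: requests are queued and answered immediately, the scale-in step runs first and marks agents, and the scheduling step skips marked agents.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Agent:
    agent_id: str
    free_slots: int
    draining: bool = False  # hypothetical "marked for termination" flag


@dataclass
class Job:
    job_id: str
    slots: int


class ScalingGroupSketch:
    """Illustrative stand-in for a scaling group; not the real class."""

    def __init__(self, agents, min_agents=1):
        self.agents = list(agents)
        self.min_agents = min_agents
        self.pending = deque()   # job queue of not-yet-placed requests
        self.placements = {}     # job_id -> agent_id

    def enqueue(self, job):
        # Queue the request and answer at once; no kernel is created here.
        self.pending.append(job)
        return {"job_id": job.job_id, "status": "PENDING"}

    def scale(self):
        # Assumed scale-in policy: mark idle agents as draining while the
        # remaining free capacity still covers the queued demand.
        demand = sum(job.slots for job in self.pending)
        busy = set(self.placements.values())
        active = [a for a in self.agents if not a.draining]
        capacity = sum(a.free_slots for a in active)
        remaining = len(active)
        for agent in reversed(active):
            idle = agent.agent_id not in busy
            enough = capacity - agent.free_slots >= demand
            if idle and enough and remaining > self.min_agents:
                agent.draining = True
                remaining -= 1
                capacity -= agent.free_slots

    def schedule(self):
        # Scaling runs first so that the scheduler sees the marks.
        self.scale()
        still_pending = deque()
        while self.pending:
            job = self.pending.popleft()
            target = next(
                (a for a in self.agents
                 if not a.draining and a.free_slots >= job.slots),
                None,
            )
            if target is None:
                still_pending.append(job)  # stays queued for a later pass
                continue
            target.free_slots -= job.slots
            self.placements[job.job_id] = target.agent_id
        self.pending = still_pending
        return dict(self.placements)
```

In this sketch a job that only fits on a draining agent stays in the queue, which is the behaviour the note about scaling down asks for. A real driver would also need a scale-out path and a way to unmark agents.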
Scaling group
Design?
1d5d39f to b69ba4c
I think I need some more time to design the interfaces. Please wait until I request a PR review offline.
We also need to consider some corner cases like this:
Suggestion: I think it would be good to separate AbstractScalingDriver into AbstractScalingDriver and VenderScalingDriverMixin. Does this seem like over-engineering?
8a42e4a to cd93502
145c893 to 82a5d2d
Let the job scheduler determine the instances required to schedule pending jobs; the scaling driver should rely on this when scaling out or in
For example, do not schedule jobs that need no GPU onto agents with GPU slots
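A rough sketch of how these two points could fit together, under assumptions of my own: `AgentInfo`, `PendingJob`, `is_eligible`, and `plan` are hypothetical names, and counting unplaced jobs per kind is only a crude stand-in for "required instances". The scheduler filters out GPU agents for jobs that need no GPU and reports what it could not place, so a scaling driver could base its scale-out decision on that report.

```python
from dataclasses import dataclass


@dataclass
class AgentInfo:
    agent_id: str
    cpu_free: int
    gpu_free: int = 0
    has_gpu: bool = False


@dataclass
class PendingJob:
    job_id: str
    cpu: int
    gpu: int = 0


def is_eligible(job, agent):
    # Keep GPU agents free for GPU jobs: a job without GPU needs is
    # never placed on an agent that has GPU slots.
    if job.gpu == 0 and agent.has_gpu:
        return False
    return agent.cpu_free >= job.cpu and agent.gpu_free >= job.gpu


def plan(jobs, agents):
    """Return (assignments, unmet).

    assignments maps job_id -> agent_id; unmet counts the jobs of each
    kind that found no agent. Note that this mutates the free resources
    recorded on the given agents.
    """
    assignments = {}
    unmet = {"cpu": 0, "gpu": 0}
    for job in jobs:
        target = next((a for a in agents if is_eligible(job, a)), None)
        if target is None:
            unmet["gpu" if job.gpu else "cpu"] += 1
            continue
        target.cpu_free -= job.cpu
        target.gpu_free -= job.gpu
        assignments[job.job_id] = target.agent_id
    return assignments, unmet
```

With one CPU-only agent and one GPU agent, a CPU job that does not fit on the CPU agent is reported as unmet instead of spilling onto the GPU agent; the driver would then add a CPU instance, not a GPU one.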
This is now considered superseded by #167.
Let's implement policy-based scalers and scaling groups.
Concept:
Policy options:
Batch job functions:
JobSpecAdaptor
Related: