
FairQueuing: Adjust virtual start time synchronization #91567

Closed

Conversation

yue9944882
Member

/sig api-machinery
/kind feature

In the original paper, S_i -- the virtual start time of packet i -- is defined as:

S_i = max(R(t), F_{i-1})

Currently, our implementation computes F_{i-1} as the sum of the queue's current virtual start time and an initial guess at the service time, and it sets the virtual start time with the following logic:

  • sets S_i to R(t) for an empty queue
  • sets S_i to F_{i-1} for a non-empty queue

Mostly, F_{i-1} will be greater than R(t), since the guessed service time is the default request-duration limit (1 minute). However, there can be cases where R(t) > F_{i-1}, and then the behavior differs from what the paper proposes. For example, a queue that keeps serving "mouse" requests but never goes empty will see its virtual start time fall further and further below R(t), without bound.

This pull updates the computation of the virtual start time to match the paper.
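
For concreteness, here is a minimal sketch of the adjusted rule. All names here (queue, virtualStart, enqueue, guessedServiceTime) are illustrative stand-ins, not the actual identifiers in the apiserver's queueset package:

```go
package fq

import "math"

// queue is a toy stand-in for a fair-queuing queue, with only the fields
// needed to illustrate the virtual start time rule.
type queue struct {
	virtualStart float64 // S of the request at the head of the queue
	length       int     // number of requests waiting or executing
}

// enqueue records a new request's virtual start time. For a non-empty queue
// it applies the paper's rule S_i = max(R(t), F_{i-1}) instead of using
// F_{i-1} unconditionally, so the start time can no longer drift unboundedly
// below the virtual clock R(t).
func (q *queue) enqueue(rNow, guessedServiceTime float64) {
	if q.length == 0 {
		q.virtualStart = rNow
	} else {
		fPrev := q.virtualStart + guessedServiceTime // estimate of F_{i-1}
		q.virtualStart = math.Max(rNow, fPrev)
	}
	q.length++
}
```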

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 29, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yue9944882
To complete the pull request process, please assign liggitt
You can assign the PR to them by writing /assign @liggitt in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MikeSpreitzer
Member

Here is an illustrative example. Suppose C=3 and there are two queues. The first queue very nearly always has one request executing and none queued; just before the executing request finishes, another request arrives. The second queue very nearly always has two requests executing and none queued; just before either of the executing requests finishes, another arrives. Thus, at each time when a request finishes executing, there is no choice about which request to dispatch and the imbalanced service is perpetuated. This is the right result because the first queue is requesting half as much service as the second.

The concern here is that the first queue builds up ever more credit, which is the difference in queue final finish times. If ever the first queue switches to offering more than half the load, it will receive an extended burst of more than 50% service.

@MikeSpreitzer
Member

MikeSpreitzer commented Jun 1, 2020

There are two layers of confounding facts here: the ability to service multiple requests at once, and the fact that a request's service time is not known until it really finishes. Let us separate them. Consider the problem of fair queuing for a router that is transmitting on C parallel links rather than just 1. Define a round to be sending one bit from each non-empty queue. Suppose the links go at 1 Gb/s. The formula for dR/dt is directly analogous to the one for Fair Queuing for Server Requests:

min(sum [over q] packets(q, t), C) * (bits sendable over one link in 1 second) / NEQ(t).
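
Transcribed literally into code (hypothetical names, purely illustrative):

```go
// rateOfR transcribes the dR/dt formula above: min(total backlogged packets, C)
// times the per-link rate, divided by NEQ, the number of non-empty queues.
func rateOfR(packetsPerQueue []int, c int, linkRate float64) float64 {
	totalPackets, nonEmptyQueues := 0, 0
	for _, n := range packetsPerQueue {
		totalPackets += n
		if n > 0 {
			nonEmptyQueues++
		}
	}
	if nonEmptyQueues == 0 {
		return 0 // no backlog, so the virtual clock has nothing to pace
	}
	if totalPackets > c {
		totalPackets = c
	}
	return float64(totalPackets) * linkRate / float64(nonEmptyQueues)
}
```

With the two queues below holding 1 and 2 packets, C=3, and 1 Gb/s links, rateOfR([]int{1, 2}, 3, 1.0) returns 1.5, the 1.5 G/s used in the example.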

In the illustrative example above, dR/dt is 1.5 G/s. Start the scenario at t=0s & R=0G. Suppose the first packet of the first queue has length = 1Gb and thus will finish at t=1s. But the virtual finish time of the first queue is R=1G, which arrives earlier at t=2/3 s. From t=2/3 s to when the second packet of the first queue arrives at t=(1-epsilon)s, the first queue is empty in the virtual world and dR/dt is 1 G/s. This is a peculiar stretch of time. The queue is empty in the virtual world and non-empty in the real world. Our current implementation does not really contemplate such a thing. When the first queue's second request arrives at t=(1-epsilon)s, R=(4/3 - epsilon)G and the request's virtual start time should be set to that (not 1G) because the queue was empty in the virtual world. This setting solves one side of the problem of continual buildup of credit. It does not solve the other side. The second queue's virtual finish time grows ever greater than R(t) because it is receiving service at a rate of 2 Gb/s in the real world while dR/dt says it should be getting 1.5 Gb/s for 2/3 of the time (2 Gb/s for the other 1/3 of the time).

Now let's bring in the second confounding fact: the implementation does not know a request's execution duration until it is over. So, in the example above, the accounting for R(t) does not get adjusted at t=2/3 s; only at t=1s is it discovered that the accounting should have been adjusted for t between 2/3 s and (1-epsilon)s. In this simple example the accounting for a given range of real time has to be revised at most once. But in larger examples, it may be necessary to revise the accounting for a given range of real time multiple times.

@MikeSpreitzer
Member

MikeSpreitzer commented Jun 1, 2020

To solve both sides of the credit buildup problem, perhaps we need to look back to the generic max-min fairness solution, and say that a round consists of giving min(mu_fair, reqs(q)) ns of service to each queue q where mu_fair solves ( sum [over q] min(mu_fair, reqs(q)) ) = min(C, sum [over q] reqs(q)). In the simple example, almost all the time this will say that a round consists of giving 1 ns of service to the first queue and 2 ns of service to the second queue.
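
One way to find that mu_fair is the standard water-filling computation. The following is only a sketch of the arithmetic under the definition above, not apiserver code:

```go
package main

import (
	"fmt"
	"sort"
)

// muFair solves sum [over q] min(mu_fair, reqs(q)) = min(C, sum [over q] reqs(q))
// by water-filling over the sorted demands. It assumes at least one queue.
func muFair(reqs []float64, c float64) float64 {
	demands := append([]float64(nil), reqs...)
	sort.Float64s(demands)

	budget := 0.0
	for _, d := range demands {
		budget += d
	}
	if c < budget {
		budget = c // min(C, sum [over q] reqs(q))
	}

	// Queues whose demand fits under the rising water level are satisfied in
	// full; the remaining budget is split evenly among the rest.
	remaining := budget
	for i, d := range demands {
		share := remaining / float64(len(demands)-i)
		if d >= share {
			return share // the water level: every remaining queue gets this much
		}
		remaining -= d
	}
	// Every demand fits inside the budget; any mu >= the largest demand works.
	return demands[len(demands)-1]
}

func main() {
	// The simple example above: demands of 1 ns and 2 ns per round with C = 3
	// yield mu_fair = 2, i.e. 1 ns to the first queue and 2 ns to the second.
	fmt.Println(muFair([]float64{1, 2}, 3))
}
```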

@yue9944882
Member Author

yue9944882 commented Jun 1, 2020

giving min(mu_fair, reqs(q)) ns of service to each queue q where mu_fair solves C = sum [over q] min(mu_fair, reqs(q))

@MikeSpreitzer can you elaborate on how to compute mu_fair dynamically in a running apiserver? I can think of two approaches, each with a defect:

  • Compute mu_fair every round (upon every request): this works, but at the expense of locking and looping over the queues of a priority level for every request; the performance impact gets worse as the number of queues in the priority level grows.
  • Compute mu_fair for each priority level periodically: this costs less, but it can't properly handle unstable/bursty concurrency (total executing requests). Computing a moving average of mu_fair would probably help a bit (see the sketch below).
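
For concreteness, a sketch of that second option: recompute mu_fair on a timer per priority level and smooth it with an exponential moving average to damp bursts. Everything here (the names, the smoothing factor) is hypothetical:

```go
// muFairSmoother damps a periodically recomputed mu_fair with an exponential
// moving average, so a single bursty sample does not swing dispatching.
// This is an illustrative sketch, not code from the apiserver.
type muFairSmoother struct {
	value       float64 // current smoothed estimate
	alpha       float64 // weight of the newest sample, in (0, 1]
	initialized bool
}

// observe folds in a freshly computed mu_fair and returns the smoothed value.
func (s *muFairSmoother) observe(fresh float64) float64 {
	if !s.initialized {
		s.value, s.initialized = fresh, true
	} else {
		s.value = s.alpha*fresh + (1-s.alpha)*s.value
	}
	return s.value
}
```

A goroutine per priority level could call observe on a ticker, keeping the per-request dispatch path free of the extra locking that the first option would require.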

@fedebongio
Contributor

/assign @lavalamp

@lavalamp
Member

lavalamp commented Jun 2, 2020

Please make a test that demonstrates the difference.
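
In the spirit of that request, a toy test against the sketch in the PR description above (not the real queueset API) might look like this; it checks that a never-empty queue of "mouse" requests no longer drifts below R(t):

```go
package fq

import "testing"

// TestVirtualStartNeverLagsR exercises the toy enqueue from the sketch above:
// a queue that never goes empty keeps receiving short ("mouse") requests while
// the virtual clock R races ahead; with the max rule the start time keeps up.
func TestVirtualStartNeverLagsR(t *testing.T) {
	q := &queue{length: 1} // the queue never goes empty in this scenario
	r := 0.0
	const guess = 0.001 // tiny guessed service time for mouse requests
	for i := 0; i < 1000; i++ {
		r += 1.0 // R(t) advances much faster than the guessed service time
		q.enqueue(r, guess)
		if q.virtualStart < r {
			t.Fatalf("virtual start %v fell below R(t) %v", q.virtualStart, r)
		}
	}
}
```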

@MikeSpreitzer
Member

Without denying the usefulness of a test, the problem can be seen with a simple thought experiment, as outlined in #91567 (comment).

@lavalamp
Member

lavalamp commented Jun 2, 2020

A test demonstrates that the thought experiment matches up with a real experiment, and helps all future maintainers preserve the property we believe this change gives us.

@k8s-ci-robot
Contributor

@yue9944882: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 18, 2020
@yue9944882
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 21, 2020
@yue9944882
Member Author

closing in favor of #95986

@yue9944882 yue9944882 closed this Nov 3, 2020