
FairQueuing: Adjust virtual start time synchronization #91567

Closed

Conversation

yue9944882
Member

/sig api-machinery
/kind feature

In the original paper, S_i -- the virtual start time of packet i -- is defined as:

S_i = max(R(t), F_{i-1})

Currently, our implementation computes F_{i-1} as the sum of the queue's current virtual start time and an initial guess at the service time, and it sets the virtual start time with the following logic:

  • sets S_i to R(t) for an empty queue
  • sets S_i to F_{i-1} for a non-empty queue

Mostly, F_{i-1} will be greater than R(t), since the guessed service time is the default request-duration limit (1 minute). However, there can be cases where R(t) > F_{i-1}, and then the behavior differs from what the paper proposes. For example, a queue that keeps serving "mouse" requests but never goes empty will see its virtual start time fall further and further below R(t), without bound.

This pull updates the computation of the virtual start time to match the paper.
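
For concreteness, here is a minimal sketch of the adjusted rule. All names here (queue, virtualStart, enqueue, guessedServiceTime) are illustrative stand-ins, not the actual identifiers in the apiserver's queueset package:

```go
package fq

import "math"

// queue is a toy stand-in for a fair-queuing queue, with only the fields
// needed to illustrate the virtual start time rule.
type queue struct {
	virtualStart float64 // S of the request at the head of the queue
	length       int     // number of requests waiting or executing
}

// enqueue records a new request's virtual start time. For a non-empty queue
// it applies the paper's rule S_i = max(R(t), F_{i-1}) instead of using
// F_{i-1} unconditionally, so the start time can no longer drift unboundedly
// below the virtual clock R(t).
func (q *queue) enqueue(rNow, guessedServiceTime float64) {
	if q.length == 0 {
		q.virtualStart = rNow
	} else {
		fPrev := q.virtualStart + guessedServiceTime // estimate of F_{i-1}
		q.virtualStart = math.Max(rNow, fPrev)
	}
	q.length++
}
```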

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 29, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yue9944882
To complete the pull request process, please assign liggitt
You can assign the PR to them by writing /assign @liggitt in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MikeSpreitzer
Member

Here is an illustrative example. Suppose C=3 and there are two queues. The first queue very nearly always has one request executing and none queued; just before the executing request finishes, another request arrives. The second queue very nearly always has two requests executing and none queued; just before either of the executing requests finishes, another arrives. Thus, at each time when a request finishes executing, there is no choice about which request to dispatch and the imbalanced service is perpetuated. This is the right result because the first queue is requesting half as much service as the second.

The concern here is that the first queue builds up ever more credit, which is the difference in queue final finish times. If ever the first queue switches to offering more than half the load, it will receive an extended burst of more than 50% service.

@MikeSpreitzer
Member

MikeSpreitzer commented Jun 1, 2020

There are two layers of confounding facts here: the ability to service multiple requests at once, and the fact that a request's service time is not known until it really finishes. Let us separate them. Consider the problem of fair queuing for a router that is transmitting on C parallel links rather than just 1. Define a round to be sending one bit from each non-empty queue. Suppose the links go at 1 Gb/s. The formula for dR/dt is directly analogous to the one for Fair Queuing for Server Requests:

min(sum [over q] packets(q, t), C) * (bits sendable over one link in 1 second) / NEQ(t).
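
Transcribed literally into code (hypothetical names, purely illustrative):

```go
// rateOfR transcribes the dR/dt formula above: min(total backlogged packets, C)
// times the per-link rate, divided by NEQ, the number of non-empty queues.
func rateOfR(packetsPerQueue []int, c int, linkRate float64) float64 {
	totalPackets, nonEmptyQueues := 0, 0
	for _, n := range packetsPerQueue {
		totalPackets += n
		if n > 0 {
			nonEmptyQueues++
		}
	}
	if nonEmptyQueues == 0 {
		return 0 // no backlog, so the virtual clock has nothing to pace
	}
	if totalPackets > c {
		totalPackets = c
	}
	return float64(totalPackets) * linkRate / float64(nonEmptyQueues)
}
```

With the two queues below holding 1 and 2 packets, C=3, and 1 Gb/s links, rateOfR([]int{1, 2}, 3, 1.0) returns 1.5, the 1.5 G/s used in the example.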

In the illustrative example above, dR/dt is 1.5 G/s. Start the scenario at t=0s & R=0G. Suppose the first packet of the first queue has length = 1Gb and thus will finish at t=1s. But the virtual finish time of the first queue is R=1G, which arrives earlier at t=2/3 s. From t=2/3 s to when the second packet of the first queue arrives at t=(1-epsilon)s, the first queue is empty in the virtual world and dR/dt is 1 G/s. This is a peculiar stretch of time. The queue is empty in the virtual world and non-empty in the real world. Our current implementation does not really contemplate such a thing. When the first queue's second request arrives at t=(1-epsilon)s, R=(4/3 - epsilon)G and the request's virtual start time should be set to that (not 1G) because the queue was empty in the virtual world. This setting solves one side of the problem of continual buildup of credit. It does not solve the other side. The second queue's virtual finish time grows ever greater than R(t) because it is receiving service at a rate of 2 Gb/s in the real world while dR/dt says it should be getting 1.5 Gb/s for 2/3 of the time (2 Gb/s for the other 1/3 of the time).

Now let's bring in the second confounding fact: the implementation does not know a request's execution duration until it is over. So, in the example above, the accounting for R(t) does not get adjusted at t=2/3 s; only at t=1s is it discovered that the accounting should have been adjusted for t between 2/3 s and (1-epsilon)s. In this simple example the accounting for a given range of real time has to be revised at most once. But in larger examples, it may be necessary to revise the accounting for a given range of real time multiple times.

@MikeSpreitzer
Member

MikeSpreitzer commented Jun 1, 2020

To solve both sides of the credit buildup problem, perhaps we need to look back to the generic max-min fairness solution, and say that a round consists of giving min(mu_fair, reqs(q)) ns of service to each queue q where mu_fair solves ( sum [over q] min(mu_fair, reqs(q)) ) = min(C, sum [over q] reqs(q)). In the simple example, almost all the time this will say that a round consists of giving 1 ns of service to the first queue and 2 ns of service to the second queue.
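
One way to find that mu_fair is the standard water-filling computation. The following is only a sketch of the arithmetic under the definition above, not apiserver code:

```go
package main

import (
	"fmt"
	"sort"
)

// muFair solves sum [over q] min(mu_fair, reqs(q)) = min(C, sum [over q] reqs(q))
// by water-filling over the sorted demands. It assumes at least one queue.
func muFair(reqs []float64, c float64) float64 {
	demands := append([]float64(nil), reqs...)
	sort.Float64s(demands)

	budget := 0.0
	for _, d := range demands {
		budget += d
	}
	if c < budget {
		budget = c // min(C, sum [over q] reqs(q))
	}

	// Queues whose demand fits under the rising water level are satisfied in
	// full; the remaining budget is split evenly among the rest.
	remaining := budget
	for i, d := range demands {
		share := remaining / float64(len(demands)-i)
		if d >= share {
			return share // the water level: every remaining queue gets this much
		}
		remaining -= d
	}
	// Every demand fits inside the budget; any mu >= the largest demand works.
	return demands[len(demands)-1]
}

func main() {
	// The simple example above: demands of 1 ns and 2 ns per round with C = 3
	// yield mu_fair = 2, i.e. 1 ns to the first queue and 2 ns to the second.
	fmt.Println(muFair([]float64{1, 2}, 3))
}
```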

@yue9944882
Member Author

yue9944882 commented Jun 1, 2020

giving min(mu_fair, reqs(q)) ns of service to each queue q where mu_fair solves C = sum [over q] min(mu_fair, reqs(q))

@MikeSpreitzer can you elaborate on how to compute mu_fair dynamically in a running apiserver? I can think of two approaches, each with a defect:

  • Compute mu_fair every round (upon every request): this works, but at the expense of locking and looping over the queues of a priority level for every request; the performance impact gets worse as the number of queues in the priority level grows.
  • Compute mu_fair for each priority level periodically: this costs less, but it can't properly handle unstable/bursty concurrency (total executing requests). Computing a moving average of mu_fair would probably help a bit (see the sketch below).
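
For concreteness, a sketch of that second option: recompute mu_fair on a timer per priority level and smooth it with an exponential moving average to damp bursts. Everything here (the names, the smoothing factor) is hypothetical:

```go
// muFairSmoother damps a periodically recomputed mu_fair with an exponential
// moving average, so a single bursty sample does not swing dispatching.
// This is an illustrative sketch, not code from the apiserver.
type muFairSmoother struct {
	value       float64 // current smoothed estimate
	alpha       float64 // weight of the newest sample, in (0, 1]
	initialized bool
}

// observe folds in a freshly computed mu_fair and returns the smoothed value.
func (s *muFairSmoother) observe(fresh float64) float64 {
	if !s.initialized {
		s.value, s.initialized = fresh, true
	} else {
		s.value = s.alpha*fresh + (1-s.alpha)*s.value
	}
	return s.value
}
```

A goroutine per priority level could call observe on a ticker, keeping the per-request dispatch path free of the extra locking that the first option would require.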

@fedebongio
Contributor

/assign @lavalamp

@lavalamp
Member

lavalamp commented Jun 2, 2020

Please make a test that demonstrates the difference.
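
In the spirit of that request, a toy test against the sketch in the PR description above (not the real queueset API) might look like this; it checks that a never-empty queue of "mouse" requests no longer drifts below R(t):

```go
package fq

import "testing"

// TestVirtualStartNeverLagsR exercises the toy enqueue from the sketch above:
// a queue that never goes empty keeps receiving short ("mouse") requests while
// the virtual clock R races ahead; with the max rule the start time keeps up.
func TestVirtualStartNeverLagsR(t *testing.T) {
	q := &queue{length: 1} // the queue never goes empty in this scenario
	r := 0.0
	const guess = 0.001 // tiny guessed service time for mouse requests
	for i := 0; i < 1000; i++ {
		r += 1.0 // R(t) advances much faster than the guessed service time
		q.enqueue(r, guess)
		if q.virtualStart < r {
			t.Fatalf("virtual start %v fell below R(t) %v", q.virtualStart, r)
		}
	}
}
```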

@MikeSpreitzer
Member

Without denying the usefulness of a test, the problem can be seen with a simple thought experiment, as outlined in #91567 (comment).

@lavalamp
Member

lavalamp commented Jun 2, 2020

A test demonstrates that the thought experiment matches up with a real experiment, and helps all future maintainers preserve the property we believe this change gives us.

@k8s-ci-robot
Contributor

@yue9944882: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 18, 2020
@yue9944882
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 21, 2020
@yue9944882
Member Author

closing in favor of #95986

@yue9944882 yue9944882 closed this Nov 3, 2020