
Configurable throughput for clients to the API server. #447

Merged

Conversation

@RaunakShah (Contributor) commented Jun 19, 2020

Adds configurable throughput for clients to the API server.
Uses a separate client for the leader election goroutine.

Testing
Kubernetes version: 1.17.4
CSI driver: Dummy driver that always returns success for CreateVolume() and DeleteVolume() callbacks, with randomized VolumeIDs.

Test: Use 64 threads to create and delete PVCs in a loop for 5 minutes. The number of operations successfully executed within this time gives the throughput (a sketch of a comparable benchmark loop is included after the results below).
Logs - https://gist.github.com/RaunakShah/ac18639695ff968df84021c1ba12df95

  1. QPS = 5.0, Burst = 10 (Default)
    a. Number of successful Create and Delete operations - 728
    b. Ops/sec - 2.435
    I have not pasted the entire log file here but there were 1780 instances of "Throttling ..." within these 5 minutes.
root@k8-master-440:/var/log/containers# cat !$ | grep Throttl | wc -l
cat vsphere-csi-controller-69cbc685fb-8gn9p_kube-system_csi-provisioner-bc80ce4ae96c1b0025f73b2166da2028fc59be55c9aa5b1ea12d848df0fa3fa6.log | grep Throttl | wc -l
1780
  2. QPS = 10.0, Burst = 20
    a. Number of successful Create and Delete operations - 1404
    b. Ops/sec - 4.69

  3. QPS = 25.0, Burst = 50
    a. Number of successful Create and Delete operations - 1666
    b. Ops/sec - 5.57

  4. QPS = 50.0, Burst = 100
    a. Number of successful Create and Delete operations - 1703
    b. Ops/sec - 5.696
    Zero instances of "Throttling ..." within these 5 minutes.

root@k8-master-440:/var/log/containers# cat !$ | grep Throt | wc -l
cat vsphere-csi-controller-7bcbdbc485-9g9vp_kube-system_csi-provisioner-16fff97dcb70cf23f09004656e5b32e574d59be39d4f1ec5ebd5f3a36b25f29a.log | grep Throt | wc -l
0
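The exact benchmark tool is not included in this PR (see the question from @chrishenzie further down), so the following is only a minimal sketch of a comparable load generator. Assumptions not stated in the PR: client-go v0.18+ context-aware clientset methods, a kubeconfig at /root/.kube/config, an existing StorageClass named benchmark-sc backed by the dummy driver, and the default namespace.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: kubeconfig path; in-cluster config would also work.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	scName := "benchmark-sc" // assumed StorageClass backed by the dummy CSI driver
	duration := 5 * time.Minute
	ctx, cancel := context.WithTimeout(context.Background(), duration)
	defer cancel()

	var ops int64
	var wg sync.WaitGroup
	for w := 0; w < 64; w++ { // 64 concurrent workers, as in the test above
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for i := 0; ctx.Err() == nil; i++ {
				name := fmt.Sprintf("bench-%d-%d", worker, i)
				pvc := &corev1.PersistentVolumeClaim{
					ObjectMeta: metav1.ObjectMeta{Name: name},
					Spec: corev1.PersistentVolumeClaimSpec{
						AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
						StorageClassName: &scName,
						Resources: corev1.ResourceRequirements{
							Requests: corev1.ResourceList{
								corev1.ResourceStorage: resource.MustParse("1Gi"),
							},
						},
					},
				}
				if _, err := cs.CoreV1().PersistentVolumeClaims("default").Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
					continue
				}
				// A fuller benchmark would wait for the PVC to become Bound
				// (i.e. CreateVolume succeeded) before deleting it.
				if err := cs.CoreV1().PersistentVolumeClaims("default").Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
					continue
				}
				atomic.AddInt64(&ops, 2) // one Create + one Delete
			}
		}(w)
	}
	wg.Wait()
	fmt.Printf("ops: %d, ops/sec: %.2f\n", ops, float64(ops)/duration.Seconds())
}
```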

What type of PR is this?
/kind bug

What this PR does / why we need it:
This change adds two new flags, "kube-api-qps" and "kube-api-burst", to configure the QPS and Burst of clients to the API server. The client defaults (QPS 5, Burst 10) are not very efficient at larger scales.
The default QPS and Burst for clients are changed from 5 to 50.
This change also uses a separate client for the leader election goroutine (a minimal sketch of the resulting wiring is shown below).
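A minimal sketch of the approach described above, assuming in-cluster config and the standard library flag package (the actual csi-provisioner code also supports a kubeconfig flag and uses a Float32 flag helper, so details differ); this is not the PR's exact code:

```go
package main

import (
	"flag"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/klog"
)

var (
	kubeAPIQPS   = flag.Float64("kube-api-qps", 5, "QPS to use while communicating with the kubernetes apiserver.")
	kubeAPIBurst = flag.Int("kube-api-burst", 10, "Burst to use while communicating with the kubernetes apiserver.")
)

func main() {
	flag.Parse()

	config, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatalf("Failed to create config: %v", err)
	}
	// Apply the configurable rate limits to the main provisioning client.
	config.QPS = float32(*kubeAPIQPS)
	config.Burst = *kubeAPIBurst

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		klog.Fatalf("Failed to create client: %v", err)
	}

	// Build a second clientset used only by the leader election goroutine, so
	// that bursts of provisioning calls cannot starve lease renewals (#416).
	leaderElectionClient, err := kubernetes.NewForConfig(rest.CopyConfig(config))
	if err != nil {
		klog.Fatalf("Failed to create leader-election client: %v", err)
	}

	_ = clientset
	_ = leaderElectionClient
	// ... construct the CSI provisioner and run leader election here ...
}
```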

Which issue(s) this PR fixes:

Fixes #416 (frequently switching leader when dynamically provisioning many PV objects at the same time)

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Use a separate client for the leader election goroutine and add kube-api-qps and kube-api-burst configurable parameters for the provisioner's Kubernetes client.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 19, 2020
@k8s-ci-robot (Contributor)

Hi @RaunakShah. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 19, 2020
@RaunakShah (Contributor, Author)

/assign @xing-yang @jsafrane

@xing-yang (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 19, 2020
Review comments on cmd/csi-provisioner/csi-provisioner.go (outdated, resolved):
@@ -74,6 +74,9 @@ var (

defaultFSType = flag.String("default-fstype", "", "The default filesystem type of the volume to provision when fstype is unspecified in the StorageClass. If the default is not set and fstype is unset in the StorageClass, then no fstype will be set")

kubeAPIQPS = flag.Float32("kube-api-qps", 50, "QPS to use while talking with the kubernetes apiserver. Defaults to 50.0.")
kubeAPIBurst = flag.Int("kube-api-burst", 50, "Burst to use while talking with the kubernetes apiserver. Defaults to 50.")

Contributor:
We probably want to have different sets of QPS/Burst config parameters for the leader election client and the client for create/delete operations.

Contributor (Author):
I thought about that too, but it seemed a little superfluous to add two more flags just for a leader election client. There's also a snapshot client that uses the same config, so that would need its own QPS and Burst too then..

Contributor:
I think it's fine for the snapshot client to use the same config as the client that handles create/delete operations.
Let's see if @msau42 has any suggestions here.

@@ -124,10 +127,15 @@ func main() {
if err != nil {
klog.Fatalf("Failed to create config: %v", err)
}

config.QPS = *kubeAPIQPS
config.Burst = *kubeAPIBurst
Collaborator:
Can you explain how Burst interacts with QPS?

Also, are you able to get a profile of what operations are coming from the provisioner? I want to make sure we're not hiding inefficiencies or bugs that could be fixed, such as not using informers or not rate limiting/backing off. It may be worth talking to sig-scalability or sig-api-machinery about what reasonable qps values would be, because there are tradeoffs if we give everything a high qps.

Contributor (Author):
NewForConfig() uses a token-bucket approach. QPS is the sustained number of queries allowed per second, and Burst is the size of the token bucket, which refills at the QPS rate. So requests to the API server are only throttled once a burst of "Burst" requests has drained the bucket faster than it refills.
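As a small illustration (not code from this PR): when no custom RateLimiter is set, client-go builds a token-bucket limiter from QPS and Burst, roughly equivalent to the following, where the first Burst requests pass immediately and later requests wait for the bucket to refill at the QPS rate.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Defaults used by client-go: QPS = 5, Burst = 10.
	limiter := flowcontrol.NewTokenBucketRateLimiter(5.0, 10)

	start := time.Now()
	for i := 1; i <= 20; i++ {
		limiter.Accept() // requests 1-10 return immediately; later ones wait roughly 200ms each
		fmt.Printf("request %d admitted after %v\n", i, time.Since(start).Round(time.Millisecond))
	}
}
```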

There are some Get() operations on the API server that could potentially be replaced with informers, but Create()/Update()/Delete() are still limited by the throughput of these clients. I've updated the testing section with more information but using QPS/Burst of 50/100 increases ops/sec of CreateVolume and DeleteVolume from ~2.5 to ~5.5.

Can we potentially tag someone from sig-scalability or sig-api-machinery here for their input?

Collaborator:
I'm surprised we need a qps of 50 to achieve 5.5 ops per second. Does that imply we're making 10 api calls per operation?

cc @wojtek-t if you have any thoughts on reasonable client qps/burst values

Forgive my ignorance here, but is csi-provisioner an O(1) instance in the cluster, or can there be a lot of them (e.g. one per node or something like that)?
If it is O(1), setting it to 50 sounds fine; if there are a lot of them, that sounds like too much to me.

BTW - I second Michelle's question - if you need 50 qps to achieve 5.5 ops, that doesn't sound very promising...

Collaborator:

@RaunakShah thanks for putting together the benchmarks! The results are concerning to me: increasing qps 10x only achieves a 2x increase in ops throughput. Could you also try increasing qps 2x and 5x to get some more datapoints?

(Also separately, I think your benchmarking work is very valuable, and we should consider adding it into our CI and also try to get CSI tested in sig-scalability's test framework as well)

Contributor (Author):

Yup, I'll play around a little and add some more results shortly!
I'm not sure if these results are entirely accurate because I've done some hacking on the CSI driver as well, but they're a reasonable baseline, I would think.

@RaunakShah (Contributor, Author) commented Jun 25, 2020:

@msau42 I've updated the PR description with the test results. Please take a look!

I also did some further analysis. Interestingly, there were "Throttling ... " messages for four operations:

  1. Posting events:
{"log":"I0625 17:10:17.542869       1 request.go:565] Throttling request took 221.204186ms, request: POST:https://10.96.0.1:443/api/v1/namespaces/default/events\n","stream":"stderr","time":"2020-06-25T17:10:17.54314291Z"}
  2. Posting PVs:
{"log":"I0625 17:10:17.741553       1 request.go:565] Throttling request took 402.941089ms, request: POST:https://10.96.0.1:443/api/v1/persistentvolumes\n","stream":"stderr","time":"2020-06-25T17:10:17.741860753Z"}
  3. Deleting PVs:
{"log":"I0625 17:10:21.741317       1 request.go:565] Throttling request took 3.330344146s, request: DELETE:https://10.96.0.1:443/api/v1/persistentvolumes/pvc-90b8a0f3-6914-4ae3-9744-74fbc6e0f3c4\n","stream":"stderr","time":"2020-06-25T17:10:21.741666156Z"}
  4. Get PVs:
{"log":"I0625 17:10:20.541396       1 request.go:565] Throttling request took 2.367093054s, request: GET:https://10.96.0.1:443/api/v1/persistentvolumes/pvc-c6fca517-345a-42b5-bde1-ea707f16e084\n","stream":"stderr","time":"2020-06-25T17:10:20.541660324Z"}

(1), (2), and (3) above probably can't be replaced, but the external-provisioner lib does a Get() on the API server for every provision and delete operation (ref: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1290 and https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1428). The same clientset is used to initialize ProvisionController - https://github.com/kubernetes-csi/external-provisioner/blob/master/cmd/csi-provisioner/csi-provisioner.go#L246

I did another test by adding a pvLister in sig-storage-lib-external-provisioner instead of querying the API server, and got a ~2x increase in throughput for the default qps and burst! I still see "Throttling ... " messages up to a QPS of 25, but not for any Get() operations (a rough sketch of the lister approach is included after the results below):

  1. QPS = 5.0, Burst = 10
    a. Number of successful Create and Delete operations - 1425
    b. Ops/sec - 4.76
root@k8-master-440:/var/log/containers/with-lib-change# cat qps5.log | wc -l
2450
  2. QPS = 10.0, Burst = 20
    a. Number of successful Create and Delete operations - 1586
    b. Ops/sec - 5.30
root@k8-master-440:/var/log/containers/with-lib-change# cat qps10.log | wc -l
3137
  3. QPS = 25.0, Burst = 50
    a. Number of successful Create and Delete operations - 1694
    b. Ops/sec - 5.66
root@k8-master-440:/var/log/containers/with-lib-change# cat qps25.log | wc -l
0

Since this change doesn't address sig-storage-lib-external-provisioner, I kept this out of the PR description, but I think it's worth addressing in subsequent changes.
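For reference, a rough sketch of the pvLister experiment described above (names such as newPVLister and lookupPV are illustrative; the real change would live inside sig-storage-lib-external-provisioner): an informer-backed PersistentVolume lister serves the per-operation lookups from a local cache instead of issuing rate-limited GET calls.

```go
package pvcache

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
)

// newPVLister starts a shared informer for PersistentVolumes and returns a
// lister that reads from the informer's local cache.
func newPVLister(cs kubernetes.Interface, stopCh <-chan struct{}) corelisters.PersistentVolumeLister {
	factory := informers.NewSharedInformerFactory(cs, time.Hour /* resync */)
	lister := factory.Core().V1().PersistentVolumes().Lister()
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	return lister
}

// lookupPV replaces a GET /api/v1/persistentvolumes/<name> call (which counts
// against the client's QPS/Burst budget) with a cache read; the trade-off is
// that the cache can lag slightly behind the API server.
func lookupPV(lister corelisters.PersistentVolumeLister, name string) error {
	pv, err := lister.Get(name)
	if err != nil {
		return err // includes a NotFound error when the PV is absent from the cache
	}
	fmt.Printf("found PV %s in phase %s\n", pv.Name, pv.Status.Phase)
	return nil
}
```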

Collaborator:

That's really great! Assuming we can fix lib-external-provisioner, it doesn't look like increasing the default qps/burst values will significantly improve throughput. What do you think of this plan?

  • Make improvements to sig-storage-lib-external-provisioner to use informers
  • Separate out clientset used for leader election
  • Make qps/burst configurable but use client default values as the default
  • Future: tuning other parameters like backoff ratio/interval
  • Future: Add benchmarking integration test using mock driver to our CI
  • Future: Add e2e scalability tests using mock driver to sig-scalability test framework

Contributor (Author):

Yup, sounds good to me. I've updated this PR accordingly. Should I go ahead and raise an issue/PR/both on lib-external-provisioner?

Collaborator:

Yup that sounds good!

@RaunakShah RaunakShah force-pushed the configurable_throughput branch 2 times, most recently from d101568 to 7778d54 on June 26, 2020 03:12
Defaults for QPS and Burst remain unchanged.
This change also creates a new client for leader election.
@msau42 (Collaborator) commented Jun 26, 2020

/lgtm
/approve

Can you also extend the release note to say that the leader election client is separate? Go ahead and "/hold cancel" when updated, thanks!
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, RaunakShah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2020
@RaunakShah (Contributor, Author)

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2020
@k8s-ci-robot k8s-ci-robot merged commit 6bc88c9 into kubernetes-csi:master Jun 26, 2020
@chrishenzie (Contributor)

Hi @RaunakShah, do you mind sharing the benchmark setup/tool you used to collect these metrics? I'm trying to gain more context on this issue and am curious how contributors collect these metrics.
