
Replace HTTP compression with a more scoped impl, only use on responses > 128KB #77449

Merged - 2 commits merged into kubernetes:master from smarterclayton:compress_2 on Jul 9, 2019

Conversation

@smarterclayton (Contributor) commented May 4, 2019

The previous HTTP compression implementation functioned as an HTTP filter, which required it to deal with a number of special cases that complicated the implementation and prevented it from ever being turned on by default.

Instead, when we write an API object to a response, handle only the one case of a valid Kube object being encoded to the output. This allows a more limited implementation that does not impact other code flows, is easier to reason about, and lets this feature be promoted to beta.

Because Go clients request gzip encoding by default, and gzip has a significant CPU cost on small requests, ignore requests to compress objects that are smaller than 128KB in size. The goal of this feature is to reduce bandwidth and latency requirements on large lists, even with chunking, and 128KB is smaller than a 500-pod page but larger than almost any single-object response.
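
As a rough illustration of this approach, here is a minimal sketch with made-up names (not the actual apiserver code): encode the object first, then apply gzip only when the client asked for it and the payload crosses the threshold.

package compressionsketch

import (
	"compress/gzip"
	"net/http"
	"strings"
)

const defaultGzipThresholdBytes = 128 * 1024

// writeMaybeCompressed takes an already-encoded body and gzips it only when
// the client asked for gzip and the payload is large enough to be worth the CPU.
func writeMaybeCompressed(w http.ResponseWriter, req *http.Request, statusCode int, body []byte) error {
	if len(body) > defaultGzipThresholdBytes &&
		strings.Contains(req.Header.Get("Accept-Encoding"), "gzip") {
		w.Header().Set("Content-Encoding", "gzip")
		w.WriteHeader(statusCode)
		gw := gzip.NewWriter(w)
		if _, err := gw.Write(body); err != nil {
			return err
		}
		return gw.Close()
	}
	// Small responses are written uncompressed.
	w.WriteHeader(statusCode)
	_, err := w.Write(body)
	return err
}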

Also fixes a bug introduced in #50342: httpResponseWriterWithInit.Write was not defined on a pointer receiver, so the init code was called repeatedly:

2019/05/04 19:15:31 http: superfluous response.WriteHeader call from k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.httpResponseWriterWithInit.Write (writers.go:56)
2019/05/04 19:15:31 http: superfluous response.WriteHeader call from k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.httpResponseWriterWithInit.Write (writers.go:56)
2019/05/04 19:15:31 http: superfluous response.WriteHeader call from k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.httpResponseWriterWithInit.Write (writers.go:56)
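
A simplified illustration of that value-receiver pitfall (type and field names are made up): the flag set inside Write mutates a copy, so the init path runs on every call.

package receiverbug

import "net/http"

type writerWithInit struct {
	hasWritten bool
	statusCode int
	inner      http.ResponseWriter
}

// Buggy: value receiver, so each call operates on a fresh copy of the struct.
func (w writerWithInit) Write(b []byte) (int, error) {
	if !w.hasWritten {
		w.inner.WriteHeader(w.statusCode) // fires again on the next Write
		w.hasWritten = true               // lost when this copy is discarded
	}
	return w.inner.Write(b)
}

// The fix is to declare the method on a pointer receiver,
// func (w *writerWithInit) Write(...), so the flag survives across calls.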

/kind bug

KEP kubernetes/enhancements#1115

Kubernetes now supports transparent compression of API responses. Clients that send `Accept-Encoding: gzip` will receive a gzip-compressed response body if the response is larger than 128KB. Go clients request gzip encoding by default and should see reduced transfer times for very large API responses. Clients in other languages may need to make changes to benefit from compression.
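
A quick way to observe the behavior in this release note against a running cluster (assumes a local kubectl proxy on 127.0.0.1:8001; the endpoint is just an example): make a raw Go request with transparent decompression disabled so the Content-Encoding header stays visible.

package main

import (
	"fmt"
	"net/http"
)

func main() {
	tr := &http.Transport{DisableCompression: true} // keep Content-Encoding visible
	req, err := http.NewRequest("GET", "http://127.0.0.1:8001/api/v1/pods", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept-Encoding", "gzip")
	resp, err := tr.RoundTrip(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	// Expect empty for small responses and "gzip" for lists larger than 128KB.
	fmt.Println("Content-Encoding:", resp.Header.Get("Content-Encoding"))
}
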
@k8s-ci-robot (Contributor) commented May 4, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton smarterclayton force-pushed the smarterclayton:compress_2 branch from 2e5b83d to c1aea48 May 4, 2019

@smarterclayton (Contributor, Author) commented May 4, 2019

@liggitt since you reviewed #50342

@smarterclayton smarterclayton force-pushed the smarterclayton:compress_2 branch from c1aea48 to 63522a4 May 4, 2019

@fejta-bot commented May 4, 2019

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@smarterclayton (Contributor, Author) commented May 4, 2019

/retest

@smarterclayton smarterclayton force-pushed the smarterclayton:compress_2 branch from 63522a4 to 7ede87d May 5, 2019

@smarterclayton (Contributor, Author) commented May 5, 2019

# 99th for this PR on cluster lists
$ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1124864914138075139/artifacts/APIResponsiveness_density_2019-05-05T03:01:32Z.json
LIST  cronjobs                cluster  5.486   46
LIST  namespaces              cluster  13.926  16
LIST  jobs                    cluster  14.989  46
LIST  persistentvolumes       cluster  26.484  16
LIST  pods                    cluster  31.564  4
LIST  replicationcontrollers  cluster  33.16   6
LIST  services                cluster  42.151  6
LIST  nodes                   cluster  74.265  117

# 99th for master(ish) on cluster lists
$ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77341/pull-kubernetes-e2e-gce-100-performance/1124998416011628545/artifacts/APIResponsiveness_density_2019-05-05T11:51:20Z.json
LIST  services                cluster  1.796    4
LIST  pods                    cluster  5.508    5
LIST  jobs                    cluster  9.609    36
LIST  replicationcontrollers  cluster  15.28    6
LIST  cronjobs                cluster  33.418   36
LIST  namespaces              cluster  86.998   12
LIST  persistentvolumes       cluster  89.04    12
LIST  nodes                   cluster  194.802  90

Same but namespace scoped

○ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1124864914138075139/artifacts/APIResponsiveness_density_2019-05-05T03:01:32Z.json
LIST  limitranges             namespace  3.034    16
LIST  secrets                 namespace  4.852    159
LIST  replicasets             namespace  10.3     16
LIST  ingresses               namespace  17.373   16
LIST  endpoints               namespace  19.825   16
LIST  statefulsets            namespace  22.191   16
LIST  cronjobs                namespace  25.027   16
LIST  persistentvolumeclaims  namespace  42.206   16
LIST  daemonsets              namespace  46.348   16
LIST  pods                    namespace  48.785   1019
LIST  configmaps              namespace  51.115   119
LIST  resourcequotas          namespace  68.432   17
LIST  jobs                    namespace  82.747   16
LIST  replicationcontrollers  namespace  88.394   16
LIST  deployments             namespace  91.022   16
LIST  services                namespace  128.396  16
○ api-100 https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77341/pull-kubernetes-e2e-gce-100-performance/1124998416011628545/artifacts/APIResponsiveness_density_2019-05-05T11:51:20Z.json
LIST  persistentvolumeclaims  namespace  6.919   12
LIST  secrets                 namespace  7.289   136
LIST  endpoints               namespace  19.084  12
LIST  configmaps              namespace  22.612  79
LIST  cronjobs                namespace  23.684  12
LIST  limitranges             namespace  24.617  14
LIST  daemonsets              namespace  27.742  12
LIST  replicasets             namespace  29.692  12
LIST  statefulsets            namespace  29.859  12
LIST  pods                    namespace  31.246  1015
LIST  replicationcontrollers  namespace  32.53   12
LIST  deployments             namespace  37.013  12
LIST  resourcequotas          namespace  38.098  14
LIST  jobs                    namespace  38.902  12
LIST  services                namespace  43.577  12
LIST  ingresses               namespace  68.877  12

It looks like this brings in the 99th-percentile tail on large lists significantly, at the cost of slightly higher latency on small lists. We could potentially tune the threshold higher than 16KB - for instance 32KB or 128KB - which could reduce the tail-latency impact on smaller lists.

@kubernetes/sig-scalability-pr-reviews

@smarterclayton smarterclayton changed the title Replace HTTP compression with an inline handler Replace HTTP compression with an inline handler, only use on responses > 16KB May 5, 2019

@smarterclayton smarterclayton force-pushed the smarterclayton:compress_2 branch from 7ede87d to 1bf167e May 5, 2019

@smarterclayton (Contributor, Author) commented May 5, 2019

/retest

@jennybuckley (Contributor) commented May 6, 2019

/cc @wojtek-t

smarterclayton added some commits May 4, 2019

Replace HTTP compression with an inline handler
The previous HTTP compression implementation functioned as a filter, which
required it to deal with a number of special cases that complicated the
implementation.

Instead, when we write an API object to a response, handle only that one
case. This will allow a more limited implementation that does not impact
other code flows.

Also, to prevent excessive CPU use on small objects, compression is
disabled on responses smaller than 128KB in size.

@smarterclayton smarterclayton force-pushed the smarterclayton:compress_2 branch from 26d202d to 4ed2b98 Jun 25, 2019

@smarterclayton (Contributor, Author) commented Jun 25, 2019

/retest

1 similar comment
@smarterclayton (Contributor, Author) commented Jun 26, 2019

/retest

@smarterclayton (Contributor, Author) commented Jun 26, 2019

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1143560782466781184/artifacts/APIResponsiveness_density_2019-06-25T17:22:09Z.json

LIST  configmaps              cluster  0.143  0.143   0.143    1
LIST  pods                    cluster  3.036  3.468   3.468    7
LIST  services                cluster  1.373  3.677   3.677    4
LIST  persistentvolumes       cluster  0.973  2.805   4.373    12
LIST  cronjobs                cluster  0.654  3.368   5.06     36
LIST  jobs                    cluster  0.866  3.346   5.763    36
LIST  namespaces              cluster  0.981  3.143   9.611    12
LIST  replicationcontrollers  cluster  0.959  21.873  21.873   6
LIST  nodes                   cluster  1.801  17.66   109.361  90

Nodes in particular is 30% below the p99 of perf-dash right now.

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1144003791448707072/artifacts/APIResponsiveness_density_2019-06-26T22:33:50Z.json

LIST  services                cluster  1.627  1.714   1.714    4
LIST  pods                    cluster  3.244  5.241   5.241    6
LIST  jobs                    cluster  1.111  5.045   8.967    36
LIST  namespaces              cluster  1.049  5.849   9.068    12
LIST  cronjobs                cluster  0.868  3.973   9.692    36
LIST  replicationcontrollers  cluster  1.39   27.937  27.937   6
LIST  persistentvolumes       cluster  1.263  7.879   49.757   12
LIST  nodes                   cluster  1.97   20.387  205.261  90

Normal

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77449/pull-kubernetes-e2e-gce-100-performance/1144281746527752192/artifacts/APIResponsiveness_density_2019-06-27T16:57:09Z.json

LIST  configmaps              cluster  0.204  0.204   0.204    1
LIST  namespaces              cluster  1.033  1.391   1.399    12
LIST  services                cluster  1.265  1.444   1.444    4
LIST  persistentvolumes       cluster  1.146  2.591   4.382    12
LIST  pods                    cluster  3.539  5.169   5.169    5
LIST  replicationcontrollers  cluster  1.778  43.263  43.263   6
LIST  cronjobs                cluster  0.818  13.377  53.772   37
LIST  jobs                    cluster  1.531  9.668   117.324  37
LIST  nodes                   cluster  1.892  21.964  119.049  95

Lower

@wojtek-t (Member) commented Jun 27, 2019

Nodes in particular is 30% below the p99 of perf-dash right now.

There is significant variance there, but I agree the results look very promising.

I would also like to run a test at larger scale, once we get past the last regressions we have: #79096

@smarterclayton (Contributor, Author) commented Jun 27, 2019

The variance before was all measured while we had the bucketing problem, I believe.

I would trade some P99 on large requests in cluster for a dramatically reduced P99 outside the cluster. We don't have an easy way to measure that unless we simulate constrained bandwidth for a client.

@wojtek-t (Member) commented Jun 27, 2019

I would trade some P99 on large requests in cluster for a dramatically reduced P99 outside the cluster. We don't have an easy way to measure that unless we simulate constrained bandwidth for a client.

That sounds reasonable to me - I would just like to know how much we are trading (if it is even visible); I'm definitely NOT saying "we can't do this if it grows at all".

@smarterclayton (Contributor, Author) commented Jun 27, 2019

/test pull-kubernetes-e2e-gce-100-performance

@smarterclayton (Contributor, Author) commented Jun 27, 2019

Fortunately this is easy to test - we can just gate it on or off. All clients automatically request compression. We can also tune the threshold up.

@smarterclayton (Contributor, Author) commented Jul 2, 2019

Are we ready to try the larger run test now that the other blocker was resolved?

@wojtek-t (Member) commented Jul 3, 2019

Are we ready to try the larger run test now that the other blocker was resolved?

I asked @krzysied to patch this into the experiments he is running offline; the plan was to run something overnight - I will get back to you when I know whether that happened.

@wojtek-t wojtek-t self-assigned this Jul 3, 2019

@wojtek-t (Member) commented Jul 3, 2019

I asked @krzysied to patch this into the experiments he is running offline; the plan was to run something overnight - I will get back to you when I know whether that happened.

We don't have full results because we were running some other experiments during that test, so it didn't finish. But what we've seen looked good enough that I'm fine with this PR from a scalability POV.

@smarterclayton - do you want me to review the code too? (I can do that in the next 2 days.)

@smarterclayton (Contributor, Author) commented Jul 8, 2019

Yes please. Jordan signed off on the KEP, and seeing the variance over time is the biggest factor. This is ready to review.

@wojtek-t (Member) left a comment

Just a couple nits - other than that lgtm.

}

// make a best effort to write the object if a failure is detected
utilruntime.HandleError(fmt.Errorf("apiserver was unable to write a JSON response: %v", err))

@wojtek-t (Member) - Jul 8, 2019

How do we know that it was JSON?

@wojtek-t (Member) - Jul 8, 2019

I see it's moved from a different place, but if I'm not missing something it would make sense to fix this comment if you're touching this code.

@smarterclayton (Author, Contributor) - Jul 8, 2019

If encode fails, the fallback response is always JSON. The assumption is that encode normally succeeds (and exits early) above; if it does fail, we just spit out what we have.

@wojtek-t (Member) - Jul 9, 2019

ok - that makes sense now
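
For context, a rough sketch (not the real handler code) of the fallback flow described in this thread: the negotiated encoder is tried first, and only if it fails do we log and fall back to a hand-written JSON status, which is why the error text can safely say "JSON".

package fallbacksketch

import (
	"fmt"
	"io"
	"net/http"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// writeObject approximates the flow: headers may already have been sent by the
// failed encode, so the fallback write is strictly best effort.
func writeObject(w http.ResponseWriter, encode func(io.Writer) error) {
	if err := encode(w); err != nil {
		utilruntime.HandleError(fmt.Errorf("apiserver was unable to write a JSON response: %v", err))
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusInternalServerError) // may be superfluous if encode already wrote
		fmt.Fprintf(w, `{"kind":"Status","apiVersion":"v1","status":"Failure","message":%q}`, err.Error())
	}
}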

if len(encoding) == 0 {
return ""
}
if !utilfeature.DefaultFeatureGate.Enabled(features.APIResponseCompression) {

@wojtek-t (Member) - Jul 8, 2019

We probably want to check this first in the method (to avoid unnecessary work otherwise).

@smarterclayton (Author, Contributor) - Jul 8, 2019

I had it this way because the feature gate check is probably more expensive than the map lookup. But it could be the other way if you think it's easier to read (I think it would perform slightly worse).

@wojtek-t (Member) - Jul 9, 2019

I don't have strong opinion - we can change that later too if needed.
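
As an illustration of the ordering being discussed, a sketch approximating the negotiation helper (treat names and details as indicative, not the exact apiserver code): the Accept-Encoding header lookup comes first because it is cheaper than the feature-gate check, and requests that never asked for compression bail out before the gate is consulted at all.

package negotiationsketch

import (
	"net/http"
	"strings"

	"k8s.io/apiserver/pkg/features"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
)

func negotiateContentEncoding(req *http.Request) string {
	encoding := req.Header.Get("Accept-Encoding") // cheap header map lookup
	if len(encoding) == 0 {
		return ""
	}
	if !utilfeature.DefaultFeatureGate.Enabled(features.APIResponseCompression) {
		return "" // feature gate off: never compress
	}
	// Accept "gzip" or "gzip;q=..." from the comma-separated list.
	for _, candidate := range strings.Split(encoding, ",") {
		c := strings.TrimSpace(candidate)
		if c == "gzip" || strings.HasPrefix(c, "gzip;") {
			return "gzip"
		}
	}
	return ""
}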

@@ -0,0 +1,303 @@
/*
Copyright 2016 The Kubernetes Authors.

@wojtek-t (Member) - Jul 8, 2019

nit: 2019

req: &http.Request{Header: http.Header{}},
wantCode: http.StatusBadRequest,
wantHeaders: http.Header{"Content-Type": []string{"application/json"}},
wantBody: smallPayload,

@wojtek-t (Member) - Jul 8, 2019

Why do we return an object for BadRequest?

@wojtek-t (Member) - Jul 8, 2019

I guess I'm missing something, because you do this in a couple of other tests below too.

@smarterclayton (Author, Contributor) - Jul 8, 2019

In this case it's emulating a custom registry endpoint (aggregated API) that returns a valid object but with a bad-request code, which is technically possible. I wanted a test that captured behavior that is infrequently used (an error code with a valid object) but can show up in some cases (if you had a custom response type that implemented APIStatus, you could return an object plus an error code today).

@wojtek-t (Member) - Jul 9, 2019

Thanks for the explanation - that makes a lot of sense to me now.
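
For readers unfamiliar with the pattern described above, a hedged sketch (made-up type names) of an object that satisfies the apierrors.APIStatus interface (Status() metav1.Status) and can therefore surface a decodable body together with a non-2xx code such as 400:

package apistatussketch

import (
	"net/http"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// customError is illustrative only: an error type that also carries a Status,
// which an aggregated API could return alongside a 400 response.
type customError struct {
	status metav1.Status
}

func (e *customError) Error() string         { return e.status.Message }
func (e *customError) Status() metav1.Status { return e.status }

func newBadRequestWithBody() *customError {
	return &customError{status: metav1.Status{
		Status:  metav1.StatusFailure,
		Code:    http.StatusBadRequest,
		Reason:  metav1.StatusReasonBadRequest,
		Message: "object returned alongside a 400 response",
	}}
}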

statusCode: http.StatusOK,
wantCode: http.StatusNotFound,
wantHeaders: http.Header{"Content-Type": []string{"text/plain"}},
wantBody: []byte("NotFound: \"test\" not found"),

@wojtek-t (Member) - Jul 8, 2019

Feel free to ignore, but it seems we have some additional whitespace somewhere in the stack...

@smarterclayton (Author, Contributor) - Jul 8, 2019

Yeah, I was actually testing the exact output. The two spaces separate an empty resource string (same message as before this PR).

},

{
name: "errors are compressed",

@wojtek-t (Member) - Jul 8, 2019

nice test

@wojtek-t (Member) left a comment

There is only one nit (s/2016/2019) and I don't want to block this PR on it.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 9, 2019

@k8s-ci-robot k8s-ci-robot merged commit 7c7d70b into kubernetes:master Jul 9, 2019

23 checks passed

cla/linuxfoundation: smarterclayton authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-csi-serial: Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gce-iscsi: Skipped.
pull-kubernetes-e2e-gce-iscsi-serial: Skipped.
pull-kubernetes-e2e-gce-storage-slow: Skipped.
pull-kubernetes-godeps: Skipped.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.
@smarterclayton (Contributor, Author) commented Jul 9, 2019

Encouraging:

[image: p99 latency graph]

Significant reduction in p99 on cluster nodes LIST

Looks like CPU is variable, which is interesting:

[image: apiserver CPU usage graph]

@smarterclayton (Contributor, Author) commented Jul 9, 2019

Scratch that - it looks like the results intersect with the replica set / deployment change from yesterday, and those runs are newer.

@smarterclayton (Contributor, Author) commented Jul 9, 2019

Even taking the RS/DS change into account, I'd say we're within bounds on p99 (maybe a bit more variability across runs). Will continue to monitor today and tomorrow, but looks like a small win in some flows and no huge impact elsewhere.

New: func() interface{} {
gw, err := gzip.NewWriterLevel(nil, defaultGzipContentEncodingLevel)
if err != nil {
panic(err)

@tedyu (Contributor) - Jul 9, 2019

Should an error be returned here?

In deferredResponseWriter#Write, the error can be returned to the caller.

See #79943
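
One possible shape for that suggestion (a sketch only, not necessarily what #79943 implements): record the construction error in the pooled value instead of panicking, and surface it from the write path. The compression level constant is an assumption here, used purely for illustration.

package gzipsketch

import (
	"compress/gzip"
	"io"
	"sync"
)

const defaultGzipContentEncodingLevel = 4 // illustrative value

type pooledGzipWriter struct {
	gw  *gzip.Writer
	err error
}

var gzipPool = sync.Pool{
	New: func() interface{} {
		// Keep the error instead of panicking; callers check it before use.
		gw, err := gzip.NewWriterLevel(nil, defaultGzipContentEncodingLevel)
		return &pooledGzipWriter{gw: gw, err: err}
	},
}

func compressTo(dst io.Writer, p []byte) (int, error) {
	w := gzipPool.Get().(*pooledGzipWriter)
	defer gzipPool.Put(w)
	if w.err != nil {
		return 0, w.err // e.g. invalid compression level
	}
	w.gw.Reset(dst)
	n, err := w.gw.Write(p)
	if err != nil {
		return n, err
	}
	return n, w.gw.Close()
}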

@wojtek-t (Member) commented Jul 10, 2019

Even taking the RS/DS change into account, I'd say we're within bounds on p99 (maybe a bit more variability across runs). Will continue to monitor today and tomorrow, but looks like a small win in some flows and no huge impact elsewhere.

I think that at the 100-node scale, the change is small enough that it's not easy to say what happened.

When looking at the 5k-node scale, things like node listing regressed visibly (because those calls come from within the cluster, actually even via localhost, where network throughput and latency are not a problem):
http://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness&Resource=nodes&Scope=cluster&Subresource=&Verb=LIST
but on deployments (called externally from the test framework) a significant drop is visible:
http://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness&Resource=deployments&Scope=cluster&Subresource=&Verb=LIST
we should also read this together with the following graph, due to the change from RCs to Deployments:
http://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness&Resource=replicationcontrollers&Scope=cluster&Subresource=&Verb=LIST

So I would say that it's a reasonable tradeoff to take.

@smarterclayton (Contributor, Author) commented Jul 10, 2019

I'm wondering whether there is an additional heuristic we could add on the client side to suppress gzip encoding when making requests to localhost. It's kind of a weak heuristic though.

@smarterclayton (Contributor, Author) commented Jul 10, 2019

We could also have a way to have clients bypass compression when run in certain modes (i.e. kcm and scheduler disabling it)

@wojtek-t (Member) commented Jul 11, 2019

We could also have a way to have clients bypass compression when run in certain modes (i.e. kcm and scheduler disabling it)

Yeah - that is exactly what I was thinking about: add an option to the client config to disable compression, default it to false (i.e. enable compression by default), and disable it only in the scheduler and kcm (which I believe would be enough).
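
What that option could look like from a component's point of view (client-go did later gain a DisableCompression field on rest.Config; treat this snippet as illustrative of the idea rather than part of this PR):

package clientsketch

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newLocalClient builds an in-cluster client that opts out of response
// compression, the knob proposed for components like the scheduler and
// kube-controller-manager that talk to the apiserver over localhost or a VIP,
// where bandwidth is not the bottleneck.
func newLocalClient() (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	cfg.DisableCompression = true // skip Accept-Encoding: gzip for local traffic
	return kubernetes.NewForConfig(cfg)
}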
