
Kubelet JSON logging performance test #102430

Closed
serathius opened this issue May 29, 2021 · 25 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/node Categorizes an issue or PR as relevant to SIG Node.
sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.
triage/accepted Indicates an issue or PR is ready to be actively worked on.
wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.

Comments

@serathius
Contributor

serathius commented May 29, 2021

With Kubelet logs fully migrated to structured logging, we can measure macro performance improvements related to using the JSON logging format. We should use this data to motivate more cluster administrators to switch and save on resources.

Requirements:

  • Test different logging formats to compare: --logging-format=text and --logging-format=json
  • Test different logging verbosity levels: --v=0 (default) and --v=4
  • Test the standard Kubelet flow (run container, terminate, etc.)
  • Repeat the test in small clusters or run a larger cluster to collect enough data

My first idea would be to utilize the pull-kubernetes-e2e-gce-100-performance scenario to create a repeatable way to collect Kubelet metric data. Before starting tests we should consult with SIG Scalability/Node on how to best measure and compare Kubelet performance.

Goal: Motivate further investment in JSON by being able to say "Switching to JSON logging in kubelet saves an average of X% CPU and Y% of memory" in release announcements.

/wg structured-logging
/sig scalability
/sig node
@kubernetes/wg-structured-logging-members

/priority important-soon
Needed to graduate JSON to Beta.

@serathius serathius added the kind/feature Categorizes issue or PR as related to a new feature. label May 29, 2021
@k8s-ci-robot k8s-ci-robot added wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/node Categorizes an issue or PR as relevant to SIG Node. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 29, 2021
@MadhavJivrajani
Contributor

/cc @sladyn98
Since you were interested in the testing aspect of it

@MadhavJivrajani
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 30, 2021
@pacoxu
Member

pacoxu commented May 31, 2021

#99803 is a kubelet log benchmark example.

@krzysiekg
Contributor

I am working on this one.
/assign

@serathius
Contributor Author

@pacoxu Thanks for the links. Just to clarify: in that issue we were working on benchmarking log volume; in this issue we are looking at performance (CPU/memory usage).

@serathius
Contributor Author

cc @wojtek-t @mborsz from SIG scalability

@serathius
Contributor Author

/milestone v1.22
Part of the work to graduate the JSON log format.

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 20, 2021
@voigt

voigt commented Jul 20, 2021

Due to v1.22 code freeze, I'm moving this issue to the next release.

/milestone v1.23

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.22, v1.23 Jul 20, 2021
@krzysiekg
Contributor

@serathius here's how I ran performance tests locally using kind:

  1. Build the node image:
kind build node-image -v 2
  2. Create a kind cluster (example config for a single node, JSON format, and verbosity 4):
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:latest
kubeadmConfigPatches:
- |
  ---
  kind: InitConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      v: "4"
      logging-format: "json"
  ---
  kind: JoinConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      v: "4"
      logging-format: "json"

For a single-node cluster it's also necessary to set this environment variable, otherwise the tests won't compile:

export CL2_LOAD_TEST_THROUGHPUT=1
  3. Run the tests (from the perf-tests/clusterloader2 dir):
go run cmd/clusterloader.go --provider=kind --kubeconfig=/path/to/.kube/config --testconfig=./testing/load/config.yaml --report-dir=./reports -v 4

Kubelet CPU and memory usage will be in the ResourceUsageSummary report.

I tested several verbosities; below are the results for verbosity 1, 4, and 9, but the others were similar.

------------------------------------------    ------------------------------------------
format: json, verbosity: 1                    format: text, verbosity: 1                
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
Run |     |      p50 |      p90 |      p99    Run |     |      p50 |      p90 |      p99
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  1 | CPU |   0.1436 |   0.2044 |   0.2512      1 | CPU |   0.1491 |   0.2006 |   0.2063
    | Mem |   104.96 |   121.68 |   122.21        | Mem |   95.621 |   116.05 |   120.01
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  2 | CPU |   0.1398 |   0.2035 |   0.2386      2 | CPU |   0.1445 |   0.1841 |   0.1865
    | Mem |   102.86 |   126.75 |   129.87        | Mem |   102.96 |   118.86 |   119.06
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  3 | CPU |   0.1560 |   0.1843 |   0.2169      3 | CPU |   0.1675 |   0.2164 |   0.2317
    | Mem |   98.007 |   121.90 |   124.83        | Mem |   101.53 |   121.15 |   124.22
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  4 | CPU |   0.1712 |   0.2100 |   0.2264      4 | CPU |   0.1597 |   0.2391 |   0.2522
    | Mem |   103.75 |   124.05 |   125.59        | Mem |   99.679 |   131.42 |   137.63
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  5 | CPU |   0.1379 |   0.2075 |   0.2253      5 | CPU |   0.1507 |   0.2167 |   0.2471
    | Mem |   101.05 |   119.41 |   124.23        | Mem |   101.32 |   124.15 |   125.19
----|-----|----------|----------|---------    ----|-----|----------|----------|---------

------------------------------------------    ------------------------------------------
format: json, verbosity: 4                    format: text, verbosity: 4                
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
Run |     |      p50 |      p90 |      p99    Run |     |      p50 |      p90 |      p99
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  1 | CPU |   0.1388 |   0.2117 |   0.2575      1 | CPU |   0.1444 |   0.1945 |   0.2256
    | Mem |   108.10 |   125.71 |   129.37        | Mem |   98.355 |   119.72 |   121.40
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  2 | CPU |   0.1746 |   0.2488 |   0.2657      2 | CPU |   0.1541 |   0.2118 |   0.2359
    | Mem |   110.16 |   141.19 |   142.82        | Mem |    106.0 |   125.32 |   130.37
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  3 | CPU |   0.1758 |   0.2582 |   0.2848      3 | CPU |   0.1574 |   0.2033 |   0.2321
    | Mem |   107.92 |   128.60 |   129.18        | Mem |   97.277 |   119.97 |   120.73
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  4 | CPU |   0.1720 |   0.2222 |   0.2417      4 | CPU |   0.1629 |   0.2164 |   0.2311
    | Mem |   103.20 |   118.40 |   121.96        | Mem |   103.23 |   120.66 |   121.45
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  5 | CPU |   0.1460 |   0.2049 |   0.2271      5 | CPU |   0.1681 |   0.2452 |   0.2569
    | Mem |   97.703 |   115.61 |   120.73        | Mem |   107.48 |   129.45 |   132.33
----|-----|----------|----------|---------    ----|-----|----------|----------|---------

------------------------------------------    ------------------------------------------
format: json, verbosity: 9                    format: text, verbosity: 9                
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
Run |     |      p50 |      p90 |      p99    Run |     |      p50 |      p90 |      p99
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  1 | CPU |   0.1871 |   0.2325 |   0.2705      1 | CPU |   0.2032 |   0.2710 |   0.2875
    | Mem |   99.773 |   117.39 |   121.74        | Mem |   101.43 |   134.34 |   137.10
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  2 | CPU |   0.1729 |   0.2311 |   0.2528      2 | CPU |   0.1811 |   0.2526 |   0.2681
    | Mem |   109.95 |   128.02 |   129.32        | Mem |   106.52 |   127.25 |   131.73
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  3 | CPU |   0.1494 |   0.2310 |   0.2634      3 | CPU |   0.1914 |   0.2692 |   0.3050
    | Mem |   111.30 |   136.43 |   139.33        | Mem |   110.10 |   126.37 |   131.41
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  4 | CPU |   0.1523 |   0.2209 |   0.2841      4 | CPU |   0.1441 |   0.2236 |   0.2324
    | Mem |   106.84 |   136.47 |   143.10        | Mem |   104.33 |   122.18 |   124.48
----|-----|----------|----------|---------    ----|-----|----------|----------|---------
  5 | CPU |   0.1662 |   0.2328 |   0.2988      5 | CPU |   0.1588 |   0.1874 |   0.2300
    | Mem |   104.92 |   140.07 |   142.57        | Mem |   106.81 |   123.46 |   125.25
----|-----|----------|----------|---------    ----|-----|----------|----------|---------

There is no clear winner; JSON and text performance is similar and varies from run to run.

I also ran a few tests on bigger clusters (1 control plane and up to 9 workers); there was no significant difference there either.

Here's how we could test performance on a large scale:

@serathius
Contributor Author

serathius commented Aug 4, 2021

Hmm, it looks like there is pretty large variance in the tests; if there is a difference, it's smaller than the variance. In such cases we will need to use some statistical analysis to see whether there is an improvement. I have three ideas to improve the quality of the tests:

  • Eliminate as much noise during tests as possible: turn off any other programs and disable CPU bursting; ideally run in a VM with preallocated resources.
  • Focus on the most optimistic scenario first; here that is verbosity 9, which generates the most logs.
  • Increase the number of runs, assume a normal distribution of usage, and calculate the difference and its statistical significance p (I expect 10-20 runs should be enough).

I recommend reading https://about.sourcegraph.com/go/gophercon-2019-optimizing-go-code-without-a-blindfold/
It explains what statistical significance p is and how to calculate it.
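
To illustrate the statistical-significance idea, here is a minimal sketch of a Welch's t-test over the p90 CPU samples from the verbosity-1 tables above. It assumes gonum (gonum.org/v1/gonum) is available and is not part of any existing test; for Go micro benchmarks, benchstat from golang.org/x/perf automates this kind of comparison.

// Minimal sketch: Welch's t-test on two sets of per-run p90 CPU samples.
// Assumes gonum is available; the sample slices are the verbosity-1 p90
// values from the tables above.
package main

import (
	"fmt"
	"math"

	"gonum.org/v1/gonum/stat"
	"gonum.org/v1/gonum/stat/distuv"
)

// welchP returns the two-sided p-value for the hypothesis "both samples
// have the same mean", using Welch's t-test (unequal variances).
func welchP(a, b []float64) float64 {
	ma, sa := stat.MeanStdDev(a, nil)
	mb, sb := stat.MeanStdDev(b, nil)
	na, nb := float64(len(a)), float64(len(b))
	va, vb := sa*sa/na, sb*sb/nb
	t := (ma - mb) / math.Sqrt(va+vb)
	// Welch-Satterthwaite degrees of freedom.
	df := (va + vb) * (va + vb) / (va*va/(na-1) + vb*vb/(nb-1))
	dist := distuv.StudentsT{Mu: 0, Sigma: 1, Nu: df}
	return 2 * dist.CDF(-math.Abs(t))
}

func main() {
	textCPU := []float64{0.2006, 0.1841, 0.2164, 0.2391, 0.2167}
	jsonCPU := []float64{0.2044, 0.2035, 0.1843, 0.2100, 0.2075}
	p := welchP(textCPU, jsonCPU)
	fmt.Printf("p = %.3f (a difference is usually called significant if p < 0.05)\n", p)
}

With only 5 runs per format the p-value will be large, which matches the observation that the difference (if any) is smaller than the variance; more runs shrink the uncertainty.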

Please let me know if you need any help with this. Happy to discuss details on Slack.

@ritpanjw

Hi @krzysiekg, this is the bug triage shadow here 👋
There hasn't been any activity on this issue for a couple of months now. Just want to make sure this issue is still on track for 1.23?

@ritpanjw

Hi @krzysiekg, following up, this is the bug triage shadow here 👋
There hasn't been any activity on this issue for a couple of months now. Just want to make sure this issue is still on track for 1.23?

@ritpanjw

Hi 👋 I'm the bug triage shadow. There hasn't been activity for a couple of months, and we're in code freeze for the 1.23 milestone. I wanted to check whether we're targeting to get this issue merged in time for the 1.23 release (7/12)?

@ehashman
Member

@ritpanjw I don't believe we are. This is not a blocker.

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.23 milestone Nov 17, 2021
@pohly
Contributor

pohly commented Nov 17, 2021

Another relevant aspect is log ingestion.

We would need to be careful about apples-to-oranges comparisons (text import: just splitting into log messages vs. JSON: full semantics), but to get the full picture we need to consider it.

About variance: I suspect that logging is not causing enough overhead during normal kubelet operation to make measuring small changes in the overhead statistically relevant when measuring total load.

What if we take a different approach:

  • collect realistic kubelet logs (= under load) in JSON format for different log levels
  • write a micro benchmark which loads such a file and then measures CPU cycles and memory consumption for writing it in JSON and text format (a rough sketch follows after this list)
  • write a micro benchmark for parsing both
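
To make the second bullet concrete, here is a rough sketch of such a write benchmark. It assumes go.uber.org/zap (which backs the Kubernetes JSON logging format); the console encoder is only a stand-in for the klog text format, and the hard-coded messages are placeholders for entries replayed from a captured kubelet log.

// Rough sketch of the proposed write benchmark, using go.uber.org/zap.
// The console encoder is a stand-in for the klog text format, and the
// messages are placeholders for a replayed kubelet log capture.
package logbench

import (
	"io"
	"testing"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

var messages = []string{
	"SyncLoop ADD",
	"Pod status updated",
	"Container started",
}

func newLogger(enc zapcore.Encoder) *zap.Logger {
	// Write to io.Discard so we measure encoding cost, not disk I/O.
	core := zapcore.NewCore(enc, zapcore.AddSync(io.Discard), zapcore.InfoLevel)
	return zap.New(core)
}

func benchmarkWrite(b *testing.B, enc zapcore.Encoder) {
	logger := newLogger(enc)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		logger.Info(messages[i%len(messages)], zap.String("pod", "kube-system/example"))
	}
}

func BenchmarkWriteJSON(b *testing.B) {
	benchmarkWrite(b, zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig()))
}

func BenchmarkWriteText(b *testing.B) {
	benchmarkWrite(b, zapcore.NewConsoleEncoder(zap.NewDevelopmentEncoderConfig()))
}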

@pohly
Contributor

pohly commented Nov 18, 2021

"collect ... for different log levels" - this isn't really needed. We now have "v":1 to indicate the verbosity of a log message, so we can do one run and collect at, say, -v9, and then during benchmarking only consider log messages below the log level we are interested in.

@serathius
Contributor Author

/unassign @krzysiekg
/assign @pohly
/triage accepted

@pohly
Contributor

pohly commented Dec 9, 2021

#106594 has benchmarks for encoding without caller and timestamping.

For this issue we need variants of those benchmarks with the overhead for caller determination and timestamping.
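
As an illustration (not the actual benchmarks from #106594), such variants could toggle the caller and timestamp overhead with zap's encoder config and the zap.AddCaller option:

// Sketch: JSON write benchmark with caller determination and timestamping
// switched on or off via the encoder config. Illustrative only; not the
// benchmarks referenced in #106594.
package logbench

import (
	"io"
	"testing"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

func newJSONLogger(withCallerAndTime bool) *zap.Logger {
	cfg := zap.NewProductionEncoderConfig()
	if !withCallerAndTime {
		// Empty keys omit the fields entirely, so timestamp formatting is
		// skipped and no caller information is rendered.
		cfg.TimeKey = ""
		cfg.CallerKey = ""
	}
	core := zapcore.NewCore(zapcore.NewJSONEncoder(cfg), zapcore.AddSync(io.Discard), zapcore.InfoLevel)
	if withCallerAndTime {
		// runtime.Caller lookup is the expensive part being measured here.
		return zap.New(core, zap.AddCaller())
	}
	return zap.New(core)
}

func BenchmarkJSONBare(b *testing.B) {
	logger := newJSONLogger(false)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		logger.Info("pod status updated")
	}
}

func BenchmarkJSONCallerAndTime(b *testing.B) {
	logger := newJSONLogger(true)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		logger.Info("pod status updated")
	}
}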

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2022
@pohly
Contributor

pohly commented Mar 9, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 7, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 27, 2022
@serathius
Contributor Author

/close

@k8s-ci-robot
Contributor

@serathius: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
