
gomaxprocs: Use a ticker (30s) to set gomaxprocs #8278

Merged
merged 9 commits into elastic:main from f/adjust-maxprocs-with-30s-ticker on Jun 8, 2022

Conversation

marclop
Contributor

@marclop marclop commented Jun 1, 2022

Motivation/summary

Adds a new goroutine which tries to refresh the GOMAXPROCS setting every
30s. This accounts for cases where the CFS quota may be updated
without the APM Server process being restarted (as in ESS / ECE).
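
Conceptually, the change amounts to re-applying the automaxprocs adjustment on a fixed ticker. A minimal sketch of that approach, assuming the go.uber.org/automaxprocs/maxprocs package (an illustration only, not the actual apm-server code; the function name is made up):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.uber.org/automaxprocs/maxprocs"
)

// adjustMaxProcs applies the CFS-quota-derived GOMAXPROCS immediately and
// then re-applies it on every tick until the context is cancelled.
func adjustMaxProcs(ctx context.Context, refresh time.Duration) error {
	set := func() {
		// maxprocs.Set reads the cgroup CPU quota and updates
		// runtime.GOMAXPROCS accordingly; the returned undo func is ignored
		// because the setting is re-applied on every tick anyway.
		if _, err := maxprocs.Set(maxprocs.Logger(log.Printf)); err != nil {
			log.Printf("failed to set GOMAXPROCS: %v", err)
		}
	}
	set()
	ticker := time.NewTicker(refresh)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			set()
		}
	}
}

func main() {
	// Run the refresh loop for a couple of minutes as a demonstration.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()
	_ = adjustMaxProcs(ctx, 30*time.Second)
}
```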

Checklist

- [ ] Update CHANGELOG.asciidoc
- [ ] Update package changelog.yml (only if changes to apmpackage have been made)
- [ ] Documentation has been updated

How to test these changes

  1. docker-compose up -d
  2. cd systemtest/cmd/runapm && go run main.go, copy the image that is used.
  3. Inspect the created container and obtain the FLEET_ENROLLMENT_TOKEN from the runapm container (you can stop the runapm container afterwards).
  4. Create a docker-compose.override.yml in the root of the APM Server with these contents:
services:
  apm-server:
    image: USE_BUILDAPM_RESULTING_IMAGE
    cpus: 2.7
    ports:
      - 8200:8200
    environment:
      - "FLEET_URL=https://fleet-server:8220"
      - "FLEET_CA=/etc/pki/tls/certs/fleet-ca.pem"
      - "FLEET_ENROLL=1"
      - "FLEET_ENROLLMENT_TOKEN=USE_RUNAPM_ENROLLMENT_TOKEN"
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:8200 | grep -q 'true'"]
      retries: 300
      interval: 1s
    depends_on:
      elasticsearch: { condition: service_healthy }
      kibana: { condition: service_healthy }
      fleet-server: { condition: service_healthy }
    volumes:
      - "./testing/docker/fleet-server/certificate.pem:/etc/pki/tls/certs/fleet-ca.pem"
  5. docker-compose up -d. Verify that the apm-server container is reachable.
  6. Run: docker update --cpus=1 <CONTAINER ID>
  7. Observe a log message (in the Observability Logs UI) indicating that the CPU quota has been updated to 1 (an example is shown below).
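
For reference, the expected log line (as confirmed later in this thread) looks like:

[elastic_agent.apm_server][info] maxprocs: Updating GOMAXPROCS=1: determined from CPU quota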

Related issues

Part of #7967

Adds a new goroutine which tries to refresh the gomaxprocs setting every
`30s`. This accounts for cases where the CFS quotas may be refreshed
without the APM Server process being restarted (like in ESS / ECE).

Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
@marclop marclop added enhancement backport-8.3 Automated backport with mergify labels Jun 1, 2022
@apmmachine
Collaborator

apmmachine commented Jun 1, 2022

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

Build stats

  • Start Time: 2022-06-07T09:41:36.168+0000

  • Duration: 123 min 15 sec

Test stats 🧪

Test Results

| Failed | Passed | Skipped | Total |
|--------|--------|---------|-------|
| 0      | 2742   | 9       | 2751  |

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /hey-apm : Run the hey-apm benchmark.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@apmmachine
Collaborator

apmmachine commented Jun 1, 2022

🌐 Coverage report

| Name         | Metrics % (covered/total) | Diff     |
|--------------|---------------------------|----------|
| Packages     | 100.0% (42/42)            | 💚       |
| Files        | 91.96% (183/199)          | 👍       |
| Classes      | 93.435% (427/457)         | 👍       |
| Methods      | 89.218% (1084/1215)       | 👍 0.018 |
| Lines        | 76.812% (13191/17173)     | 👍 0.014 |
| Conditionals | 100.0% (0/0)              | 💚       |

Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
Member

@axw axw left a comment

still in draft, but this looks good to go :)

@marclop
Contributor Author

marclop commented Jun 8, 2022

Benchmark results

I benchmarked this PR's changes against 8.2.2, since that's the last version that didn't include the gomaxprocs
change. In summary, using the automaxprocs library to limit the Go runtime's GOMAXPROCS utilizes
the granted CPU quota more efficiently, results in significantly less throttling, and translates into better
APM Server performance across all sizes, but particularly for the smaller APM Server sizes (1-4GB of RAM).

In ESS, containers with less than 16GB of RAM accrue CPU credits during the periods when they consume less
than their granted vCPU quota (see the table below). These credits allow the containers to temporarily consume
more CPU time than they would otherwise be allowed to; this "boosted" quota is equal to the vCPU quota a 16GB
instance would have.

Since the APM Server's maximum instance size is 8GB of RAM, boosting will always be a factor. The benchmark
warm-up mostly depleted the CPU credits before the actual benchmarks began, but that was harder to achieve for
the 8GB APM Server size since the GOMAXPROCS value is set to 1 (1.35 vCPU floors to 1).

Containers smaller than 4GB are granted the vCPU quota of a container one size (1GB) larger: a 1GB container
gets the vCPU of a 2GB container, a 2GB container gets the vCPU of a 3GB container, while a 4GB container keeps
the vCPU of a 4GB container. From 4GB to 8GB the quota doubles, as expected.

| Size | vCPU quota (CPU cores) | Performance delta vs 8.2.2 | Agents |
|------|------------------------|----------------------------|--------|
| 1gb  | 0.3385                 | ~75%                       | 64     |
| 2gb  | 0.5577                 | >100%                      | 64     |
| 4gb  | 0.6769                 | >100%                      | 64     |
| 8gb  | 1.3539                 | >36-45%                    | 384    |
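
For context on why a 1.35 vCPU quota maps to GOMAXPROCS=1: automaxprocs derives the value from the cgroup CFS quota and period and, by default, rounds the ratio down with a floor of 1. A rough sketch of that calculation (an illustration of the library's documented default behaviour, not its actual code):

```go
package main

import (
	"fmt"
	"runtime"
)

// gomaxprocsFromQuota approximates the default automaxprocs behaviour:
// floor(cpu.cfs_quota_us / cpu.cfs_period_us), but never below 1.
func gomaxprocsFromQuota(quotaUs, periodUs int64) int {
	if quotaUs <= 0 || periodUs <= 0 {
		// No quota configured: fall back to the number of host CPUs.
		return runtime.NumCPU()
	}
	n := int(quotaUs / periodUs) // integer division floors the ratio
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// An 8GB APM Server gets ~1.35 vCPU (135,390us per 100,000us period),
	// which floors to GOMAXPROCS=1.
	fmt.Println(gomaxprocsFromQuota(135390, 100000)) // prints 1
}
```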

Detailed benchmark results

1GB

Quota: 33.85 (ms) (0.33 vCPU)
Comment: The throttled CPU time is significantly lower when automaxprocs is used.

$ export SIZE=1g; benchstat -alpha 0.34 benchresults/8.2.2/$SIZE.txt benchresults/automaxprocs/$SIZE.txt
name                 old time/op              new time/op              delta
1000Transactions-64               242ms ±57%               115ms ±20%     ~     (p=0.667 n=2+2)
AgentAll-64                       21.6s ± 0%               12.3s ± 1%  -43.08%  (p=0.333 n=2+2)
AgentGo-64                        4.80s ± 1%               2.63s ± 1%  -45.16%  (p=0.333 n=2+2)
AgentNodeJS-64                    2.43s ± 0%               1.51s ± 1%  -37.81%  (p=0.333 n=2+2)
AgentPython-64                    8.74s ± 1%               4.96s ± 0%  -43.19%  (p=0.333 n=2+2)
AgentRuby-64                      4.24s ± 1%               2.54s ± 3%  -39.97%  (p=0.333 n=2+2)
OTLPTraces-64                     270µs ± 0%               138µs ± 0%  -48.91%  (p=0.333 n=2+2)

name                 old events/sec           new events/sec           delta
1000Transactions-64               6.16k ±57%               9.06k ±20%     ~     (p=0.667 n=2+2)
AgentAll-64                         840 ± 0%                1472 ± 1%  +75.29%  (p=0.333 n=2+2)
AgentGo-64                        1.09k ± 1%               1.99k ± 0%  +81.74%  (p=0.333 n=2+2)
AgentNodeJS-64                      830 ± 0%                1334 ± 1%  +60.80%  (p=0.333 n=2+2)
AgentPython-64                      805 ± 1%                1417 ± 0%  +75.97%  (p=0.333 n=2+2)
AgentRuby-64                        888 ± 1%                1480 ± 3%  +66.55%  (p=0.333 n=2+2)
OTLPTraces-64                     3.71k ± 0%               7.26k ± 0%  +95.78%  (p=0.333 n=2+2)

name                 old alloc/op             new alloc/op             delta
1000Transactions-64              17.6MB ± 1%              17.9MB ± 0%   +1.56%  (p=0.333 n=2+2)
AgentAll-64                       751MB ± 0%               712MB ± 0%   -5.21%  (p=0.333 n=2+2)
AgentGo-64                        172MB ± 0%               167MB ± 0%   -3.03%  (p=0.333 n=2+2)
AgentNodeJS-64                   88.6MB ± 0%              86.8MB ± 0%   -2.01%  (p=0.333 n=2+2)
AgentPython-64                    313MB ± 0%               299MB ± 0%   -4.33%  (p=0.333 n=2+2)
AgentRuby-64                      152MB ± 0%               148MB ± 0%   -2.86%  (p=0.333 n=2+2)
OTLPTraces-64                    10.8kB ± 0%              10.8kB ± 0%   -0.32%  (p=0.333 n=2+2)

name                 old allocs/op            new allocs/op            delta
1000Transactions-64                131k ± 1%                129k ± 1%   -1.24%  (p=0.333 n=2+2)
AgentAll-64                       8.65M ± 0%               8.38M ± 0%   -3.12%  (p=0.333 n=2+2)
AgentGo-64                        1.82M ± 0%               1.77M ± 0%   -2.98%  (p=0.333 n=2+2)
AgentNodeJS-64                    1.07M ± 0%               1.06M ± 0%   -1.06%  (p=0.333 n=2+2)
AgentPython-64                    3.84M ± 0%               3.74M ± 0%   -2.44%  (p=0.333 n=2+2)
AgentRuby-64                      1.78M ± 0%               1.75M ± 0%   -1.72%  (p=0.333 n=2+2)
OTLPTraces-64                      79.5 ± 1%                76.0 ± 0%   -4.40%  (p=0.333 n=2+2)

name                 old error_responses/sec  new error_responses/sec  delta
OTLPTraces-64                      0.03 ±33%                0.01 ±31%  -66.19%  (p=0.333 n=2+2)

8.2.2

349,657 (ms) <- Max observed throttled CPU time without automaxprocs.

8.3.0-SNAPSHOT (+ticker change)

57,422 (ms) <- Max observed throttled CPU time with automaxprocs.

2GB

Quota: 50.77 (ms) (0.5 vCPU)
Comment: Again, automaxprocs seems to boost performance considerably by reducing the amount of
throttled CPU time.

$ export SIZE=2g; benchstat -alpha 0.34 benchresults/8.2.2/$SIZE.txt benchresults/automaxprocs/$SIZE.txt
name                 old time/op     new time/op     delta
1000Transactions-64      194ms ±50%      107ms ±31%      ~     (p=0.667 n=2+2)
AgentAll-64              16.5s ± 0%       7.1s ± 2%   -57.08%  (p=0.333 n=2+2)
AgentGo-64               3.71s ± 0%      1.94s ± 8%   -47.68%  (p=0.333 n=2+2)
AgentNodeJS-64           1.86s ± 2%      0.90s ± 3%   -51.53%  (p=0.333 n=2+2)
AgentPython-64           6.51s ± 2%      3.14s ± 1%   -51.81%  (p=0.333 n=2+2)
AgentRuby-64             3.30s ± 0%      1.70s ± 3%   -48.38%  (p=0.333 n=2+2)
OTLPTraces-64            217µs ± 1%       87µs ± 1%   -59.77%  (p=0.333 n=2+2)

name                 old events/sec  new events/sec  delta
1000Transactions-64      6.91k ±50%     10.34k ±31%      ~     (p=0.667 n=2+2)
AgentAll-64              1.10k ± 1%      2.56k ± 2%  +132.41%  (p=0.333 n=2+2)
AgentGo-64               1.41k ± 0%      2.72k ± 8%   +92.04%  (p=0.333 n=2+2)
AgentNodeJS-64           1.08k ± 2%      2.23k ± 3%  +106.23%  (p=0.333 n=2+2)
AgentPython-64           1.08k ± 2%      2.24k ± 1%  +107.36%  (p=0.333 n=2+2)
AgentRuby-64             1.14k ± 0%      2.21k ± 3%   +93.85%  (p=0.333 n=2+2)
OTLPTraces-64            4.62k ± 1%     11.48k ± 1%  +148.59%  (p=0.333 n=2+2)

name                 old alloc/op    new alloc/op    delta
1000Transactions-64     17.6MB ± 1%     17.6MB ± 1%      ~     (p=1.000 n=2+2)
AgentAll-64              738MB ± 0%      700MB ± 0%    -5.14%  (p=0.333 n=2+2)
AgentGo-64               171MB ± 0%      167MB ± 0%    -2.71%  (p=0.333 n=2+2)
AgentNodeJS-64          89.1MB ± 0%     85.6MB ± 0%    -3.82%  (p=0.333 n=2+2)
AgentPython-64           310MB ± 0%      296MB ± 0%    -4.57%  (p=0.333 n=2+2)
AgentRuby-64             153MB ± 0%      147MB ± 0%    -3.65%  (p=0.333 n=2+2)
OTLPTraces-64           10.8kB ± 0%     10.8kB ± 0%    -0.26%  (p=0.333 n=2+2)

name                 old allocs/op   new allocs/op   delta
1000Transactions-64       130k ± 1%       126k ± 1%    -3.21%  (p=0.333 n=2+2)
AgentAll-64              8.55M ± 0%      8.30M ± 0%    -2.89%  (p=0.333 n=2+2)
AgentGo-64               1.82M ± 0%      1.77M ± 0%    -2.75%  (p=0.333 n=2+2)
AgentNodeJS-64           1.08M ± 0%      1.06M ± 0%    -2.01%  (p=0.333 n=2+2)
AgentPython-64           3.82M ± 0%      3.72M ± 0%    -2.54%  (p=0.333 n=2+2)
AgentRuby-64             1.79M ± 0%      1.75M ± 0%    -2.19%  (p=0.333 n=2+2)
OTLPTraces-64             79.0 ± 0%       75.0 ± 0%    -5.06%  (p=0.333 n=2+2)

8.2.2

402,422 (ms) <- Max observed throttled CPU time.

8.3.0-SNAPSHOT (+ticker change)

210,886 (ms) <- Max observed throttled CPU time.

4GB

Quota: 67.69 (ms) (0.67 vCPU)
Comment: Again, automaxprocs seems to boost performance considerably by reducing the amount of
throttled CPU time.

$ export SIZE=4g; benchstat -alpha 0.34 benchresults/8.2.2/$SIZE.txt benchresults/automaxprocs/$SIZE.txt
name                 old time/op     new time/op     delta
1000Transactions-64      212ms ± 0%       85ms ±24%   -59.87%  (p=0.200 n=2+3)
AgentAll-64              9.46s ± 0%      4.66s ± 1%   -50.77%  (p=0.200 n=2+3)
AgentGo-64               2.09s ± 0%      1.03s ± 1%   -50.73%  (p=0.200 n=2+3)
AgentNodeJS-64           1.15s ± 0%      0.59s ± 1%   -48.60%  (p=0.200 n=2+3)
AgentPython-64           3.88s ± 1%      1.85s ± 7%   -52.25%  (p=0.200 n=2+3)
AgentRuby-64             1.59s ±19%      0.95s ± 1%   -40.22%  (p=0.200 n=2+3)
OTLPTraces-64            123µs ± 0%       55µs ± 0%   -55.37%  (p=0.200 n=2+3)

name                 old events/sec  new events/sec  delta
1000Transactions-64      4.72k ± 0%     12.15k ±28%  +157.61%  (p=0.200 n=2+3)
AgentAll-64              1.91k ± 0%      3.88k ± 1%  +102.81%  (p=0.200 n=2+3)
AgentGo-64               2.51k ± 0%      5.10k ± 1%  +102.92%  (p=0.200 n=2+3)
AgentNodeJS-64           1.76k ± 0%      3.42k ± 1%   +94.35%  (p=0.200 n=2+3)
AgentPython-64           1.81k ± 1%      3.80k ± 7%  +109.81%  (p=0.200 n=2+3)
AgentRuby-64             2.44k ±19%      3.94k ± 1%   +61.31%  (p=0.200 n=2+3)
OTLPTraces-64            8.16k ± 0%     18.28k ± 0%  +124.06%  (p=0.200 n=2+3)

name                 old alloc/op    new alloc/op    delta
1000Transactions-64     17.4MB ± 0%     17.5MB ± 1%    +0.56%  (p=0.200 n=2+3)
AgentAll-64              713MB ± 0%      694MB ± 0%    -2.56%  (p=0.200 n=2+3)
AgentGo-64               167MB ± 0%      165MB ± 0%    -1.17%  (p=0.200 n=2+3)
AgentNodeJS-64          87.7MB ± 0%     85.3MB ± 0%    -2.71%  (p=0.200 n=2+3)
AgentPython-64           302MB ± 0%      295MB ± 0%    -2.47%  (p=0.200 n=2+3)
AgentRuby-64             148MB ± 0%      145MB ± 0%    -1.78%  (p=0.200 n=2+3)
OTLPTraces-64           10.5kB ± 0%     10.7kB ± 0%    +1.68%  (p=0.200 n=2+3)

name                 old allocs/op   new allocs/op   delta
1000Transactions-64       129k ± 0%       125k ± 1%    -2.96%  (p=0.200 n=2+3)
AgentAll-64              8.38M ± 0%      8.27M ± 0%    -1.39%  (p=0.200 n=2+3)
AgentGo-64               1.79M ± 0%      1.76M ± 0%    -1.62%  (p=0.200 n=2+3)
AgentNodeJS-64           1.07M ± 0%      1.05M ± 0%    -1.54%  (p=0.200 n=2+3)
AgentPython-64           3.76M ± 0%      3.71M ± 0%    -1.38%  (p=0.200 n=2+3)
AgentRuby-64             1.76M ± 0%      1.74M ± 0%    -1.17%  (p=0.200 n=2+3)
OTLPTraces-64             77.0 ± 0%       75.0 ± 0%      ~     (p=0.600 n=2+3)

8.2.2

357,734 (ms) <- Max observed throttled CPU time.

8.3.0-SNAPSHOT (+ticker change)

113,050 (ms) <- Max observed throttled CPU time.

8GB

Quota: 135.39 (ms) (1.35 vCPU)
Comment: There was barely any throttled CPU time, and often the load wasn't high enough with 64 agents,
so I re-ran the benchmark with 384 agents for both commits; those results are much more consistent.

$ export SIZE=8g; benchstat -alpha 0.34 benchresults/8.2.2/$SIZE.txt benchresults/automaxprocs/$SIZE.txt
name                 old time/op     new time/op     delta
1000Transactions-64     67.0ms ± 1%     64.1ms ± 4%     ~     (p=0.400 n=2+3)
AgentAll-64              3.82s ± 2%      3.15s ±17%  -17.48%  (p=0.200 n=2+3)
AgentGo-64               849ms ± 2%      704ms ±14%  -17.07%  (p=0.200 n=2+3)
AgentNodeJS-64           470ms ± 0%      411ms ± 9%  -12.54%  (p=0.200 n=2+3)
AgentPython-64           1.59s ± 0%      1.32s ± 7%  -17.14%  (p=0.200 n=2+3)
AgentRuby-64             658ms ±17%      696ms ± 6%     ~     (p=1.000 n=2+3)
OTLPTraces-64           54.1µs ± 0%     44.3µs ±10%  -18.09%  (p=0.200 n=2+3)

name                 old events/sec  new events/sec  delta
1000Transactions-64      14.9k ± 1%      15.6k ± 4%     ~     (p=0.400 n=2+3)
AgentAll-64              4.73k ± 2%      5.82k ±18%  +23.11%  (p=0.200 n=2+3)
AgentGo-64               6.17k ± 2%      7.51k ±13%  +21.68%  (p=0.200 n=2+3)
AgentNodeJS-64           4.29k ± 0%      4.92k ± 9%  +14.85%  (p=0.200 n=2+3)
AgentPython-64           4.42k ± 0%      5.35k ± 7%  +20.96%  (p=0.200 n=2+3)
AgentRuby-64             5.87k ±17%      5.40k ± 6%     ~     (p=1.000 n=2+3)
OTLPTraces-64            18.5k ± 0%      22.7k ±10%  +22.68%  (p=0.200 n=2+3)

name                 old alloc/op    new alloc/op    delta
1000Transactions-64     17.3MB ± 0%     17.8MB ± 0%   +2.93%  (p=0.200 n=2+3)
AgentAll-64              692MB ± 0%      696MB ± 0%     ~     (p=0.400 n=2+3)
AgentGo-64               163MB ± 0%      166MB ± 0%   +1.98%  (p=0.200 n=2+3)
AgentNodeJS-64          85.1MB ± 0%     85.5MB ± 1%     ~     (p=0.800 n=2+3)
AgentPython-64           296MB ± 0%      297MB ± 0%     ~     (p=0.400 n=2+3)
AgentRuby-64             145MB ± 0%      146MB ± 1%     ~     (p=0.400 n=2+3)
OTLPTraces-64           10.4kB ± 0%     10.9kB ± 0%   +4.54%  (p=0.200 n=2+3)

name                 old allocs/op   new allocs/op   delta
1000Transactions-64       128k ± 0%       128k ± 0%   -0.24%  (p=0.200 n=2+3)
AgentAll-64              8.26M ± 0%      8.30M ± 0%   +0.46%  (p=0.200 n=2+3)
AgentGo-64               1.75M ± 0%      1.77M ± 0%   +0.94%  (p=0.200 n=2+3)
AgentNodeJS-64           1.05M ± 0%      1.06M ± 0%     ~     (p=0.800 n=2+3)
AgentPython-64           3.72M ± 0%      3.73M ± 0%   +0.49%  (p=0.200 n=2+3)
AgentRuby-64             1.73M ± 0%      1.74M ± 1%   +0.59%  (p=0.200 n=2+3)
OTLPTraces-64             76.0 ± 0%       77.0 ± 0%   +1.32%  (p=0.200 n=2+3)
$ export SIZE=8g-384; benchstat -alpha 0.21 benchresults/8.2.2/$SIZE.txt benchresults/automaxprocs/$SIZE.txt
name                  old time/op              new time/op              delta
1000Transactions-384               102ms ± 3%                59ms ± 3%   -42.12%  (p=0.100 n=3+3)
AgentAll-384                       3.43s ± 6%               2.34s ± 2%   -31.96%  (p=0.100 n=3+3)
AgentGo-384                        927ms ± 1%               451ms ± 7%   -51.36%  (p=0.100 n=3+3)
AgentNodeJS-384                    526ms ± 1%               314ms ± 3%   -40.35%  (p=0.100 n=3+3)
AgentPython-384                    1.54s ±15%               1.03s ± 7%   -32.97%  (p=0.100 n=3+3)
AgentRuby-384                      844ms ± 1%               520ms ±13%   -38.34%  (p=0.100 n=3+3)
OTLPTraces-384                    59.8µs ± 5%              44.4µs ±13%   -25.64%  (p=0.100 n=3+3)

name                  old events/sec           new events/sec           delta
1000Transactions-384               9.86k ± 3%              17.02k ± 3%   +72.66%  (p=0.100 n=3+3)
AgentAll-384                       5.26k ± 6%               7.72k ± 2%   +46.69%  (p=0.100 n=3+3)
AgentGo-384                        5.65k ± 1%              11.64k ± 7%  +106.05%  (p=0.100 n=3+3)
AgentNodeJS-384                    3.83k ± 1%               6.42k ± 3%   +67.72%  (p=0.100 n=3+3)
AgentPython-384                    4.61k ±14%               6.82k ± 7%   +47.94%  (p=0.100 n=3+3)
AgentRuby-384                      4.45k ± 1%               7.29k ±13%   +63.92%  (p=0.100 n=3+3)
OTLPTraces-384                     16.6k ± 4%               22.7k ±12%   +36.62%  (p=0.100 n=3+3)

name                  old alloc/op             new alloc/op             delta
1000Transactions-384              17.6MB ± 0%              18.1MB ± 0%    +3.10%  (p=0.100 n=3+3)
AgentAll-384                       695MB ± 0%               701MB ± 0%    +0.85%  (p=0.100 n=3+3)
AgentGo-384                        164MB ± 0%               167MB ± 0%    +2.04%  (p=0.100 n=3+3)
AgentNodeJS-384                   86.2MB ± 0%              85.9MB ± 0%    -0.34%  (p=0.100 n=3+3)
AgentPython-384                    296MB ± 0%               296MB ± 0%      ~     (p=0.700 n=3+3)
AgentRuby-384                      146MB ± 0%               146MB ± 0%      ~     (p=0.700 n=3+3)
OTLPTraces-384                    10.4kB ± 0%              10.9kB ± 0%    +5.48%  (p=0.100 n=3+3)

name                  old allocs/op            new allocs/op            delta
1000Transactions-384                130k ± 0%                131k ± 0%    +0.36%  (p=0.100 n=3+3)
AgentAll-384                       8.27M ± 0%               8.33M ± 0%    +0.69%  (p=0.100 n=3+3)
AgentGo-384                        1.76M ± 0%               1.78M ± 0%    +1.05%  (p=0.100 n=3+3)
AgentNodeJS-384                    1.06M ± 0%               1.06M ± 0%    -0.21%  (p=0.200 n=3+3)
AgentPython-384                    3.72M ± 0%               3.72M ± 0%      ~     (p=0.700 n=3+3)
AgentRuby-384                      1.74M ± 0%               1.75M ± 0%    +0.33%  (p=0.200 n=3+3)
OTLPTraces-384                      76.0 ± 0%                77.0 ± 0%    +1.32%  (p=0.100 n=3+3)

name                  old error_responses/sec  new error_responses/sec  delta
OTLPTraces-384                     1.44 ±147%                0.05 ±64%   -96.23%  (p=0.100 n=3+3)

8.2.2

341,896 (ms) <- Max observed throttled CPU time. (64 agents)
419,190 (ms) <- Max observed throttled CPU time. (384 agents)

8.3.0-SNAPSHOT (+ticker change)

14,856 (ms) <- Max observed throttled CPU time (64). Minimal
21,419 (ms) <- Max observed throttled CPU time (384). Minimal

16GB (2x 8GB)

The CPU quota per instance is the same as for the 8GB size, since the APM Server has a hard size limit of 8GB.
Comment: For this topology, the number of agents was increased to 196 to raise the CPU load on each APM Server
and to account for the fact that there are two APM Servers. I ran the benchmarks with both 64 and 196 agents,
but there wasn't any throttled CPU time during the automaxprocs benchmarks.

$ export SIZE=16g; benchstat -alpha 0.34 benchresults/8.2.2/$SIZE.txt benchresults/automaxprocs/$SIZE.txt
name                 old time/op     new time/op     delta
1000Transactions-64     68.9ms ± 4%     64.9ms ± 9%     ~     (p=0.400 n=3+3)
AgentAll-64              1.77s ± 9%      1.73s ±16%     ~     (p=0.700 n=3+3)
AgentGo-64               472ms ± 7%      417ms ± 4%  -11.77%  (p=0.100 n=3+3)
AgentNodeJS-64           291ms ±19%      211ms ±13%  -27.43%  (p=0.100 n=3+3)
AgentPython-64           1.02s ±12%      0.60s ± 6%  -41.18%  (p=0.100 n=3+3)
AgentRuby-64             508ms ± 8%      302ms ± 3%  -40.47%  (p=0.100 n=3+3)
OTLPTraces-64           44.9µs ± 4%     44.5µs ± 4%     ~     (p=1.000 n=3+3)

name                 old events/sec  new events/sec  delta
1000Transactions-64      14.5k ± 4%      15.5k ± 9%     ~     (p=0.400 n=3+3)
AgentAll-64              10.2k ± 9%      10.6k ±17%     ~     (p=0.700 n=3+3)
AgentGo-64               11.1k ± 7%      12.6k ± 4%  +13.19%  (p=0.100 n=3+3)
AgentNodeJS-64           7.07k ±21%      9.64k ±14%  +36.32%  (p=0.100 n=3+3)
AgentPython-64           6.96k ±13%     11.75k ± 5%  +68.87%  (p=0.100 n=3+3)
AgentRuby-64             7.42k ± 8%     12.42k ± 3%  +67.43%  (p=0.100 n=3+3)
OTLPTraces-64            22.3k ± 4%      22.5k ± 4%     ~     (p=1.000 n=3+3)

name                 old alloc/op    new alloc/op    delta
1000Transactions-64     17.4MB ± 0%     17.8MB ± 0%   +2.85%  (p=0.100 n=3+3)
AgentAll-64              696MB ± 0%      702MB ± 0%   +0.76%  (p=0.100 n=3+3)
AgentGo-64               164MB ± 0%      167MB ± 0%   +1.53%  (p=0.100 n=3+3)
AgentNodeJS-64          85.2MB ± 0%     86.4MB ± 0%   +1.37%  (p=0.100 n=3+3)
AgentPython-64           295MB ± 0%      297MB ± 0%   +0.46%  (p=0.100 n=3+3)
AgentRuby-64             145MB ± 0%      146MB ± 0%   +0.75%  (p=0.100 n=3+3)
OTLPTraces-64           10.5kB ± 0%     11.0kB ± 0%   +4.26%  (p=0.100 n=3+3)

name                 old allocs/op   new allocs/op   delta
1000Transactions-64       128k ± 0%       128k ± 1%     ~     (p=0.700 n=3+3)
AgentAll-64              8.26M ± 0%      8.31M ± 0%   +0.52%  (p=0.100 n=3+3)
AgentGo-64               1.76M ± 0%      1.77M ± 0%   +0.43%  (p=0.100 n=3+3)
AgentNodeJS-64           1.05M ± 0%      1.06M ± 0%   +0.69%  (p=0.100 n=3+3)
AgentPython-64           3.71M ± 0%      3.72M ± 0%   +0.35%  (p=0.100 n=3+3)
AgentRuby-64             1.73M ± 0%      1.74M ± 0%   +0.39%  (p=0.100 n=3+3)
OTLPTraces-64             77.0 ± 0%       77.0 ± 0%     ~     (all equal)
$ export SIZE=16g-196; benchstat -alpha 0.41 benchresults/8.2.2/$SIZE.txt benchresults/automaxprocs/$SIZE.txt
name                  old time/op              new time/op              delta
1000Transactions-196              74.4ms ± 4%              59.7ms ± 2%  -19.73%  (p=0.100 n=3+3)
AgentAll-196                       2.05s ±10%               1.68s ± 7%  -18.20%  (p=0.100 n=3+3)
AgentGo-196                        500ms ± 7%               393ms ± 3%  -21.36%  (p=0.100 n=3+3)
AgentNodeJS-196                    235ms ± 6%               238ms ± 5%     ~     (p=0.700 n=3+3)
AgentPython-196                    754ms ±10%               637ms ±13%  -15.50%  (p=0.200 n=3+3)
AgentRuby-196                      379ms ± 4%               351ms ± 8%   -7.37%  (p=0.400 n=3+3)
OTLPTraces-196                    50.3µs ±11%              47.7µs ±11%     ~     (p=0.700 n=3+3)

name                  old events/sec           new events/sec           delta
1000Transactions-196               13.5k ± 4%               16.8k ± 2%  +24.48%  (p=0.100 n=3+3)
AgentAll-196                       8.87k ±11%              10.80k ± 7%  +21.74%  (p=0.100 n=3+3)
AgentGo-196                        10.4k ± 8%               13.3k ± 3%  +27.58%  (p=0.100 n=3+3)
AgentNodeJS-196                    8.58k ± 6%               8.46k ± 5%     ~     (p=0.700 n=3+3)
AgentPython-196                    9.37k ±10%              11.14k ±13%  +18.84%  (p=0.200 n=3+3)
AgentRuby-196                      9.93k ± 4%              10.76k ± 9%   +8.36%  (p=0.400 n=3+3)
OTLPTraces-196                     19.9k ±13%               20.9k ±13%     ~     (p=0.700 n=3+3)

name                  old alloc/op             new alloc/op             delta
1000Transactions-196              17.5MB ± 0%              17.9MB ± 0%   +2.27%  (p=0.100 n=3+3)
AgentAll-196                       700MB ± 0%               705MB ± 0%   +0.75%  (p=0.100 n=3+3)
AgentGo-196                        164MB ± 1%               167MB ± 0%   +2.41%  (p=0.100 n=3+3)
AgentNodeJS-196                   86.3MB ± 0%              88.3MB ± 0%   +2.31%  (p=0.100 n=3+3)
AgentPython-196                    297MB ± 1%               302MB ± 1%   +1.58%  (p=0.100 n=3+3)
AgentRuby-196                      147MB ± 0%               149MB ± 0%   +1.61%  (p=0.100 n=3+3)
OTLPTraces-196                    10.5kB ± 0%              11.0kB ± 1%   +4.64%  (p=0.100 n=3+3)

name                  old allocs/op            new allocs/op            delta
1000Transactions-196                129k ± 0%                128k ± 0%   -1.26%  (p=0.100 n=3+3)
AgentAll-196                       8.28M ± 0%               8.30M ± 0%   +0.26%  (p=0.100 n=3+3)
AgentGo-196                        1.75M ± 1%               1.77M ± 0%   +0.85%  (p=0.100 n=3+3)
AgentNodeJS-196                    1.06M ± 0%               1.06M ± 0%   +0.80%  (p=0.100 n=3+3)
AgentPython-196                    3.72M ± 0%               3.73M ± 0%   +0.45%  (p=0.200 n=3+3)
AgentRuby-196                      1.74M ± 0%               1.75M ± 0%   +0.60%  (p=0.100 n=3+3)
OTLPTraces-196                      76.7 ± 1%                77.0 ± 0%     ~     (p=1.000 n=3+3)

name                  old error_responses/sec  new error_responses/sec  delta
OTLPTraces-196                     0.56 ±124%               0.58 ±115%     ~     (p=1.000 n=3+3)

8.2.2

254,214 (ms) <- Max observed throttled CPU time. APM 1
132,636 (ms) <- Max observed throttled CPU time. APM 2

8.3.0-SNAPSHOT (+ticker change)

0 (ms) <- No observed throttled CPU time.

@marclop marclop marked this pull request as ready for review June 8, 2022 08:40
Contributor

@lahsivjar lahsivjar left a comment

We don't have any such case currently, but something to note for the future: with dynamic boosting and vCPUs > 1, a 30-second refresh interval might lead to unexpected throttling. When the vCPUs are reset after the boost but GOMAXPROCS is still using the higher value, a 30-second delay corresponds to 300 CFS periods (assuming a CFS period of 100ms).

@marclop marclop enabled auto-merge (squash) June 8, 2022 10:20
@apmmachine
Collaborator

❕ Build Aborted

There is a new build on-going so the previous on-going builds have been aborted.

Build stats

  • Start Time: 2022-06-08T08:40:39.555+0000

  • Duration: 103 min 44 sec

Test stats 🧪

Test Results

| Failed | Passed | Skipped | Total |
|--------|--------|---------|-------|
| 0      | 2742   | 9       | 2751  |

@apmmachine
Collaborator

apmmachine commented Jun 8, 2022

🌐 Coverage report

| Name         | Metrics % (covered/total) | Diff     |
|--------------|---------------------------|----------|
| Packages     | 100.0% (42/42)            | 💚       |
| Files        | 91.96% (183/199)          | 👍       |
| Classes      | 93.435% (427/457)         | 👍       |
| Methods      | 89.3% (1085/1215)         | 👍 0.1   |
| Lines        | 76.789% (13187/17173)     | 👍 0.008 |
| Conditionals | 100.0% (0/0)              | 💚       |

@apmmachine
Collaborator

apmmachine commented Jun 8, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-06-08T20:35:59.289+0000

  • Duration: 31 min 40 sec

Test stats 🧪

Test Results

| Failed | Passed | Skipped | Total |
|--------|--------|---------|-------|
| 0      | 4045   | 13      | 4058  |

@marclop
Contributor Author

marclop commented Jun 8, 2022

/test

Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
@marclop marclop merged commit c61dd3a into elastic:main Jun 8, 2022
@marclop marclop deleted the f/adjust-maxprocs-with-30s-ticker branch June 8, 2022 21:06
mergify bot pushed a commit that referenced this pull request Jun 8, 2022
Adds a new goroutine which tries to refresh the gomaxprocs setting every
`30s`. This accounts for cases where the CFS quotas may be refreshed
without the APM Server process being restarted (like in ESS / ECE).

Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
(cherry picked from commit c61dd3a)
axw pushed a commit that referenced this pull request Jun 9, 2022
Adds a new goroutine which tries to refresh the gomaxprocs setting every
`30s`. This accounts for cases where the CFS quotas may be refreshed
without the APM Server process being restarted (like in ESS / ECE).

Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
(cherry picked from commit c61dd3a)

Co-authored-by: Marc Lopez Rubio <marc5.12@outlook.com>
@stuartnelson3 stuartnelson3 self-assigned this Jun 13, 2022
@stuartnelson3
Contributor

confirmed:

terminal:

~/w/apm-server (8.3=)
# docker update --cpus=1 22ae530c2295
22ae530c2295
~/w/apm-server (8.3=)
# docker update --cpus=3 22ae530c2295
22ae530c2295

Observability Logs UI:

08:51:09.080
elastic_agent.apm_server
[elastic_agent.apm_server][info] maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
08:51:39.080
elastic_agent.apm_server
[elastic_agent.apm_server][info] maxprocs: Updating GOMAXPROCS=3: determined from CPU quota

@axw
Member

axw commented Jun 15, 2022

I've also tested this in a QA cloud environment and it seems to be working, though I'm a bit unsure about the specific quotas being set.

I started up a 1GB Integrations Server: it started out with GOMAXPROCS=1, and then switched to GOMAXPROCS=8 after a short time. I then changed the server to 8GB, and it started with GOMAXPROCS=4 and then switched to GOMAXPROCS=8. It does line up with the reported quota/period, though.

@marclop
Contributor Author

marclop commented Jun 15, 2022

@axw thanks for testing it in QA too. The QA quotas will be bigger since the VMs where the APM Server containers run are smaller, resulting in a higher share of the CPU cores for each workload. The boost to 8 indicates that CPU credits are available for the container to use, so it seems to be working as expected.
