[Enhancement] Improve admission controller performance under high loads #8260

Closed
9 tasks done
JimBugwadia opened this issue Sep 5, 2023 · 1 comment
Labels
enhancement, performance


@JimBugwadia
Member

JimBugwadia commented Sep 5, 2023

Kyverno Version

1.10.2

Description

Fixes and Changes

Here are the optimizations I made (I will create a separate issue with details for each):

Details

Kyverno 1.10.3 shows high latencies when handling a large number of requests. The expected behavior is that Kyverno adds at most a few seconds of overhead and scales well under load.

I ran the tests on a large machine with 1-6 replicas, but I can also reproduce and troubleshoot the issues on my local system with a single replica.

The setup was as follows:

  1. Configure all pod security policies in Enforce mode, using the following match clause and removing the precondition:
       - match:
           all:
           - resources:
               kinds:
               - Pod
               selector:
                 matchLabels:
                   app: k6-test
               operations:
               - CREATE
               - UPDATE
  2. Scale down all controllers except the Kyverno admission controller.
  3. Set the following flags:
       --admissionReports=false
       --omit-events=PolicyViolation,PolicyApplied,PolicyError,PolicySkipped
  4. Set CPU and memory as follows (sized for local tests):
       resources:
         limits:
           cpu: "2"
           memory: 2Gi
         requests:
           cpu: "1"
           memory: 1Gi
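The setup steps above can be scripted. Below is a minimal sketch that applies steps 1, 3 and 4 to manifests that have already been parsed from YAML into Python dicts. The helper names `prepare_policy` and `prepare_container` are hypothetical, and the sketch assumes Enforce mode is set through the policy-level `spec.validationFailureAction` field, as in Kyverno 1.10 policies.

```python
import copy

# Step 1: the test match clause, mirroring the YAML above.
TEST_MATCH = {
    "all": [
        {
            "resources": {
                "kinds": ["Pod"],
                "selector": {"matchLabels": {"app": "k6-test"}},
                "operations": ["CREATE", "UPDATE"],
            }
        }
    ]
}

# Step 3: admission controller flags.
TEST_FLAGS = [
    "--admissionReports=false",
    "--omit-events=PolicyViolation,PolicyApplied,PolicyError,PolicySkipped",
]


def prepare_policy(policy: dict) -> dict:
    """Hypothetical helper: return a copy of a parsed policy in Enforce mode,
    with every rule's match clause replaced and its preconditions removed."""
    patched = copy.deepcopy(policy)
    spec = patched.setdefault("spec", {})
    spec["validationFailureAction"] = "Enforce"  # assumed policy-level field
    for rule in spec.get("rules", []):
        rule["match"] = copy.deepcopy(TEST_MATCH)
        rule.pop("preconditions", None)
    return patched


def prepare_container(container: dict) -> dict:
    """Hypothetical helper: return a copy of the admission controller's
    container spec with the test flags (step 3) and resources (step 4)."""
    patched = copy.deepcopy(container)
    args = patched.setdefault("args", [])
    for flag in TEST_FLAGS:
        if flag not in args:  # avoid duplicating flags already present
            args.append(flag)
    patched["resources"] = {
        "limits": {"cpu": "2", "memory": "2Gi"},
        "requests": {"cpu": "1", "memory": "1Gi"},
    }
    return patched
```

Both helpers return patched copies, so the original manifests stay untouched and can be re-applied after the test. Step 2 (scaling down the other controllers) is left to the cluster tooling.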

I ran a modified version of the load test at https://github.com/kyverno/load-testing/tree/main/k6. The test creates pods with the label app: k6-test and expects a 400 response, which corresponds to a blocked request.
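The load test itself is a k6 script. The following is not that script but a rough Python analogue of its shape, sketched under a few assumptions: `send` is a hypothetical callable that submits one Pod manifest to the API server and returns the HTTP status code, the privileged container is an assumed way to violate a pod security rule, and a thread pool stands in for k6 virtual users.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def pod_manifest(i: int) -> dict:
    """A Pod labeled app: k6-test so the test policies match it. Running it
    privileged is an assumption chosen so that it should be rejected."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"k6-test-{i}", "labels": {"app": "k6-test"}},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "nginx",  # placeholder image
                "securityContext": {"privileged": True},
            }]
        },
    }


def run_load(send: Callable[[dict], int], total: int, vus: int) -> Tuple[List[float], int]:
    """Issue `total` create requests from `vus` concurrent workers.

    Returns (per-request latencies in seconds, number of requests that
    were blocked with a 400)."""

    def one(i: int) -> Tuple[float, int]:
        start = time.perf_counter()
        status = send(pod_manifest(i))
        return time.perf_counter() - start, status

    with ThreadPoolExecutor(max_workers=vus) as pool:
        results = list(pool.map(one, range(total)))

    latencies = [elapsed for elapsed, _ in results]
    blocked = sum(1 for _, status in results if status == 400)
    return latencies, blocked
```

For example, `run_load(send, total=1000, vus=100)` corresponds to the first scenario, and `blocked / 1000` plays the role of the "verify response code of POST is 400" check rate.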

Here are the baseline numbers:

  1. 1000 requests across 100 concurrent connections (virtual users): avg=2.77s; 99.90% success
     ✗ verify response code of POST is 400
      ↳  99% — ✓ 999 / ✗ 1

     █ teardown

     checks.........................: 99.90% ✓ 999       ✗ 1    
     data_received..................: 2.3 MB 80 kB/s
     data_sent......................: 380 kB 13 kB/s
     http_req_blocked...............: avg=21.1ms   min=1.29µs   med=2.75µs   max=378.49ms p(90)=9.21ms   p(95)=200.16ms
     http_req_connecting............: avg=8.91ms   min=0s       med=0s       max=295.54ms p(90)=392.17µs p(95)=85.6ms  
     http_req_duration..............: avg=2.77s    min=106.31ms med=2.47s    max=10s      p(90)=4.79s    p(95)=5.48s   
       { expected_response:true }...: avg=2.76s    min=106.31ms med=2.47s    max=9.37s    p(90)=4.78s    p(95)=5.47s   
     ...
  2. 2000 requests across 250 concurrent connections: avg=6.85s; 72.59% success
     ✗ verify response code of POST is 400
      ↳  72% — ✓ 1452 / ✗ 548

     █ teardown

     checks.........................: 72.59% ✓ 1452      ✗ 548  
     data_received..................: 3.9 MB 68 kB/s
     data_sent......................: 822 kB 14 kB/s
     http_req_blocked...............: avg=39.57ms  min=1.2µs    med=2.79µs   max=710.22ms p(90)=201.75ms p(95)=317.29ms
     http_req_connecting............: avg=16.12ms  min=0s       med=0s       max=521.14ms p(90)=81.01ms  p(95)=122.15ms
     http_req_duration..............: avg=6.85s    min=209.3ms  med=7.09s    max=12.82s   p(90)=10s      p(95)=10s     
       { expected_response:true }...: avg=5.62s    min=209.3ms  med=5.58s    max=12.53s   p(90)=9s       p(95)=9.55s   
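Each trend line in the k6 summaries (for example http_req_duration) reports avg, min, med, max, p(90) and p(95) over all samples. The sketch below computes the same fields from a list of latencies, using linear interpolation between the two closest ranks for percentiles; k6's own percentile implementation may differ in edge cases.

```python
from typing import Dict, List


def percentile(sorted_vals: List[float], p: float) -> float:
    """p-th percentile (0-100) of an ascending list, interpolating linearly
    between the two closest ranks."""
    if not sorted_vals:
        raise ValueError("no samples")
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def summarize(samples: List[float]) -> Dict[str, float]:
    """The fields k6 prints for a trend metric such as http_req_duration."""
    s = sorted(samples)
    return {
        "avg": sum(s) / len(s),
        "min": s[0],
        "med": percentile(s, 50),
        "max": s[-1],
        "p(90)": percentile(s, 90),
        "p(95)": percentile(s, 95),
    }
```

Comparing avg with med is a quick skew check: in the second baseline run the median (7.09s) sits above the average (6.85s), while p(90) and p(95) both reach 10s, so a large share of requests were slow rather than a few outliers.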
    

I made a number of optimizations (detailed in the issue list above). Here are the results with the optimized image:

  1. 1000 requests across 100 concurrent connections (virtual users): avg=247.29ms; 100% pass
     ✓ verify response code of POST is 400

     █ teardown

     checks.........................: 100.00% ✓ 1000       ✗ 0    
     data_received..................: 2.3 MB  776 kB/s
     data_sent......................: 380 kB  129 kB/s
     http_req_blocked...............: avg=20.03ms  min=1.25µs  med=1.87µs   max=405.74ms p(90)=850.56µs p(95)=189.99ms
     http_req_connecting............: avg=5.82ms   min=0s      med=0s       max=198.52ms p(90)=8.45µs   p(95)=72.28ms 
     http_req_duration..............: avg=247.29ms min=3.56ms  med=199.69ms max=1.32s    p(90)=519.52ms p(95)=728.86ms
       { expected_response:true }...: avg=247.29ms min=3.56ms  med=199.69ms max=1.32s    p(90)=519.52ms p(95)=728.86ms

    ...
  2. 2000 requests across 250 concurrent connections: avg=671.02ms; 100% pass
     ✓ verify response code of POST is 400

     █ teardown

     checks.........................: 100.00% ✓ 2000       ✗ 0    
     data_received..................: 4.7 MB  743 kB/s
     data_sent......................: 824 kB  131 kB/s
     http_req_blocked...............: avg=54.24ms  min=1.04µs  med=1.75µs   max=913.97ms p(90)=284.66ms p(95)=493.95ms
     http_req_connecting............: avg=22ms     min=0s      med=0s       max=626.72ms p(90)=73.06ms  p(95)=195.17ms
     http_req_duration..............: avg=671.02ms min=17.51ms med=490.15ms max=3.3s     p(90)=1.51s    p(95)=1.84s   
       { expected_response:true }...: avg=671.02ms min=17.51ms med=490.15ms max=3.3s     p(90)=1.51s    p(95)=1.84s   
       
       ...
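The improvement can be worked out directly from the before and after summaries. This sketch copies the avg and p(95) values of http_req_duration (in seconds) from the k6 output and computes the ratio of baseline to optimized latency for each scenario.

```python
# avg and p(95) of http_req_duration, in seconds, copied from the k6 output.
RUNS = {
    "1000 requests / 100 VUs": {"before": {"avg": 2.77, "p95": 5.48},
                                "after": {"avg": 0.24729, "p95": 0.72886}},
    "2000 requests / 250 VUs": {"before": {"avg": 6.85, "p95": 10.0},
                                "after": {"avg": 0.67102, "p95": 1.84}},
}


def speedups(runs: dict) -> dict:
    """Baseline latency divided by optimized latency, per run and metric."""
    return {
        name: {m: r["before"][m] / r["after"][m] for m in r["before"]}
        for name, r in runs.items()
    }


for name, ratios in speedups(RUNS).items():
    print(f"{name}: avg {ratios['avg']:.1f}x, p(95) {ratios['p95']:.1f}x")
```

This prints an average-latency improvement of 11.2x and a p(95) improvement of 7.5x for the first scenario, and 10.2x and 5.4x for the second. These ratios cover latency only; the check rate also rose from 99.90% and 72.59% to 100% in both scenarios.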

Slack discussion

No response

Troubleshooting

  • I have read and followed the documentation AND the troubleshooting guide.
  • I have searched other issues in this repository and mine is not recorded.
@JimBugwadia added the enhancement and triage labels Sep 5, 2023
@JimBugwadia self-assigned this Sep 5, 2023
@JimBugwadia added the performance label and removed the triage label Sep 5, 2023
@JimBugwadia removed their assignment Dec 18, 2023
@JimBugwadia
Member Author

Closing as all tasks have been completed!

Nice work, @KhaledEmaraDev!

I've created this issue for docs updates:

kyverno/website#1145
