Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallelize everything #286

Merged
merged 7 commits into from
May 29, 2024
Merged

Conversation

flavio
Copy link
Member

@flavio flavio commented May 29, 2024

Change how the audit process is done, allowing more parallilazion if wanted by the user.

This PR adds these new flags (I'm open to change their names, I'm not super fond of them):

  • --parallel-namespaces: how many Namespaces can be audited in parallel. Default is 1.
  • --parallel-resources: how many resources can be audited in parallel. Default is 50.
  • --parallel-policies: when auditing a specic resource, how many policies can be invoked in parallel. Default is 10.

By tuning these values, the user has direct control over how many HTTP requests are sent to the policy server(s) at the same time.
For example, --parallel-namespaces 2 --parallel-resources 100 --parallel-policies 10 translates to a maximum of 2 * 100 * 10 = 2000 HTTP requests at the same time.

Benchmarks

Given the following scenario:

  • 30k RoleBinding objects, all defined inside of the default namespace
  • 1 Policy Server, with 3 replicas
  • 4 policies targeting RoleBinding objects

No parallelization

audit-scanner --disable-store --parallel-namespaces 1 --parallel-policies 1 --parallel-resources 1

Leads to the following results:

User time (seconds): 77.85
System time (seconds): 43.11
Percent of CPU this job got: 29%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6m 50.45s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 209872
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 10814
Voluntary context switches: 2569133
Involuntary context switches: 983543
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Parallelization

./audit-scanner --disable-store --parallel-namespaces 2 --parallel-policies 10 --parallel-resources 100

Leads to the following results:

User time (seconds): 150.32
System time (seconds): 33.26
Percent of CPU this job got: 60%
Elapsed (wall clock) time (h:mm:ss or m:ss): 5m 3.90s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 392304
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 157699
Voluntary context switches: 464957
Involuntary context switches: 785465
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

@flavio flavio requested a review from a team as a code owner May 29, 2024 08:23
@flavio flavio self-assigned this May 29, 2024
Copy link

codecov bot commented May 29, 2024

Codecov Report

Attention: Patch coverage is 66.37931% with 39 lines in your changes are missing coverage. Please review.

Project coverage is 70.37%. Comparing base (b853001) to head (4ab0bea).

Files Patch % Lines
cmd/root.go 0.00% 21 Missing ⚠️
internal/scanner/scanner.go 80.43% 12 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #286      +/-   ##
==========================================
- Coverage   70.90%   70.37%   -0.53%     
==========================================
  Files          11       11              
  Lines         976     1043      +67     
==========================================
+ Hits          692      734      +42     
- Misses        213      235      +22     
- Partials       71       74       +3     
Flag Coverage Δ
unit-tests 70.37% <66.37%> (-0.53%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@jvanz jvanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixing feelings about this 3 cli flags. Knowing the code, I understand and they make sense. But thinking in the user UX I think this can be improved. Maybe we can have a simpler CLI flags something like max-parallel-request and worker-pool-size. Where the first controls the number of request going out of the scanner. While the second, controls the number of internal go routines doing the scanner logic. Or maybe have only the max-parallel-request and making everything in the scanner running in parallel, because the scanner logic is not that complex and the most expensive operation are the requests.

But I'm not sure if we want to do that now (if we decided to do it in the first place) or in a future iteration (v1.14.0)... Besides that, I'm fine with the code if we want to keep the current approach from this PR.

Allow the user to set the number of parallel resources being audited via
a cli-flag.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Allow multiple namespaces to be audited at the same time. Expose that
via a cli flag.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Given a Kubernetes resource to audit, allow multiple policies to be
invoked at the same time.

Allow the user to configure the amount of parallelization with a cli
flag.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Allow user to configure the amount of resources fetched from the API
server while paginating.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Provide better default values.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Copy link
Contributor

@fabriziosestito fabriziosestito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Add new CLI flags about parellization, expand usage section.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Reflect the latest changes done

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
@flavio flavio merged commit de424bb into kubewarden:main May 29, 2024
5 of 7 checks passed
@flavio flavio deleted the parallelize-everything branch May 29, 2024 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants