Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the default wasm garbage collector with nottinygc #25

Merged
merged 1 commit into from
Jun 6, 2023

Conversation

codefromthecrypt
Copy link
Contributor

@codefromthecrypt codefromthecrypt commented May 31, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

After profiling in #24, I noticed over half the time in wasm was spent in garbage collection. This could have been guessed, but it was nice to verify it with wzprof.

This uses an optimized garbage collector, nottinygc, dramatically improving performance.

Which issue(s) this PR fixes:

NONE

Special notes for your reviewer:

https://github.com/wasilibs/nottinygc was invented by the excellent @anuraaga

Does this PR introduce a user-facing change?

NONE

What are the benchmark results of this change?

same code takes half the time!

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e
                                            │  before.txt   │             after.txt              │
                                            │    sec/op     │   sec/op     vs base               │
PluginFilter/noop/params:_small-12             148.6n ±  2%   143.8n ± 1%   -3.20% (p=0.002 n=6)
PluginFilter/noop/params:_real-12              146.5n ±  0%   143.2n ± 0%   -2.25% (p=0.002 n=6)
PluginFilter/filter-simple/params:_small-12   14.436µ ±  5%   7.234µ ± 2%  -49.89% (p=0.002 n=6)
PluginFilter/filter-simple/params:_real-12     232.5µ ± 21%   121.1µ ± 1%  -47.91% (p=0.002 n=6)
geomean                                        2.923µ         2.061µ       -29.50%

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 31, 2023
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 31, 2023
guest/filter.go Outdated Show resolved Hide resolved
@codefromthecrypt
Copy link
Contributor Author

codefromthecrypt commented May 31, 2023

I should confess one downside, which is that the wasm is significantly larger (+110KB) However, the performance was quite bad and I think this is an easy tradeoff.

@codefromthecrypt
Copy link
Contributor Author

@anuraaga had a good idea about a start panic instead. maybe hold off and we can use a version that has an instructive error message even if still SDK users should copy/pasta the args to tinygo (or use a supplied script to invoke it)

@anuraaga
Copy link
Contributor

anuraaga commented Jun 1, 2023

Updated nottinygc to print a grokkable message when the TinyGo flags haven't been set

https://github.com/wasilibs/nottinygc/releases/tag/v0.3.0

@codefromthecrypt
Copy link
Contributor Author

Thanks @anuraaga I updated like so

$ for I in `find . -name go.mod -exec egrep -q nottinygc {} \; -print`; do (cd `dirname $I`; go get  github.com/wasilibs/nottinygc@v0.3.0; go mod tidy); done

@codefromthecrypt
Copy link
Contributor Author

so here's the current error if the plugin wasn't built with the flags:

    plugin_perf_test.go:38: failed to create plugin: wasm: error instantiating guest: module[filter-simple-1] function[_start] failed: wasm error: unreachable
        wasm stack trace:
        	.runtime._panic(i32,i32)
        	._start.command_export()

@anuraaga
Copy link
Contributor

anuraaga commented Jun 1, 2023

@codefromthecrypt Oh, I wonder if printing the error message of a panic would require stderr/stdout to be available. That might make the approach not very applicable...

@codefromthecrypt
Copy link
Contributor Author

would require stderr/stdout to be available

this might be ok. lemme wire up and see

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 1, 2023
@codefromthecrypt
Copy link
Contributor Author

@anuraaga I changed to prioritize stdout/stderr over a generic stack trace. If looks good I'll try to add a test of some kind:

failed to create plugin: wasm: error instantiating guest: panic: nottinygc requires passing -gc=custom and -tags=custommalloc to TinyGo when compiling.

@sanposhiho
Copy link
Member

@codefromthecrypt

I should confess one downside, which is that the wasm is significantly larger (+110KB) However, the performance was quite bad and I think this is an easy tradeoff.
#25 (comment)

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

@codefromthecrypt
Copy link
Contributor Author

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

One reason is I think some people move wasm around with config maps? cc @salaboy about any size constraints there, in case folks here aren't aware of the nuance. Basically, we may be 110kb closer to that limit. Once past that limit, we need to use http refs like most proxy-wasm are moved around by istio (OCI is another way, such as what trivy does).

@salaboy
Copy link

salaboy commented Jun 3, 2023 via email

@codefromthecrypt
Copy link
Contributor Author

I'll go ahead and add the unit test here and also add an issue for docs on how to handle wasm distribution

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 4, 2023
@codefromthecrypt
Copy link
Contributor Author

ok I backfilled tests for panic messages PTAL!

@@ -23,6 +23,7 @@ import (
_ "k8s.io/component-base/metrics/prometheus/clientgo" // for rest client metric registration
_ "k8s.io/component-base/metrics/prometheus/version" // for version metric registration
"k8s.io/kubernetes/cmd/kube-scheduler/app"
wasm "sigs.k8s.io/kube-scheduler-wasm-extension/scheduler/plugin"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this broke earlier. After this PR is in I will setup CI to test all the src directories.

node: testdata.NodeSmall,
expectedCode: framework.Error,
expectedMessage: `wasm: filter error: panic!
wasm error: unreachable
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used manual webassembly (%.wat) because it is easier to reach error conditions and has stable stack traces. The wasm binaries are tiny so we can create a bunch of these if we need to.

if results, err := g.filterFn.Call(ctx); err != nil {
return framework.AsStatus(err)
return framework.AsStatus(decorateError(g.out, "filter", err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this might literally save some prototypers' lives

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to do the same in http-wasm, I'm really glad you reminded me where panics go to die :)

@codefromthecrypt
Copy link
Contributor Author

rebased

@codefromthecrypt codefromthecrypt force-pushed the nottinygc branch 3 times, most recently from 45fb1c1 to 6fe0dc9 Compare June 4, 2023 11:03
@codefromthecrypt
Copy link
Contributor Author

added a make testdata target which makes all the test wasm

@kerthcet
Copy link
Contributor

kerthcet commented Jun 5, 2023

/lgtm
/approve
/hold

Free free to unhold this.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 5, 2023
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 5, 2023
@codefromthecrypt
Copy link
Contributor Author

I will add a RATIONALE section then ping @sanposhiho for final review

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 5, 2023
@codefromthecrypt
Copy link
Contributor Author

PTAL I added the RATIONALE.md cc also @anuraaga to keep me honest if I mentioned pros/cons correctly!

guest/RATIONALE.md Outdated Show resolved Hide resolved
guest/RATIONALE.md Outdated Show resolved Hide resolved
guest/RATIONALE.md Show resolved Hide resolved
Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! RATIONALE.md is as usual a great read

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anuraaga, codefromthecrypt, kerthcet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

After profiling, I noticed over half the time in wasm was spent in
garbage collection. This could have been guessed, but it was nice to
verify it with wzprof.

This uses an optimized garbage collector, nottinygc, dramatically
improving performance.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Copy link
Contributor Author

squashed

@kerthcet
Copy link
Contributor

kerthcet commented Jun 6, 2023

/hold cancel
The RATIONALE.md is excellent.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2023
@kerthcet
Copy link
Contributor

kerthcet commented Jun 6, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2023
@k8s-ci-robot k8s-ci-robot merged commit d99775d into kubernetes-sigs:main Jun 6, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants