-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto/rsa: Some severe performance regressions in Go 1.20 #59442
Comments
CC @golang/security |
CC @golang/runtime |
On the amd64 platforms I was able to benchmark the slowdown of RSA-2048 was 20% as noted in the release notes. Looks like AMD EPYC is suffering more than the Intel CPUs I tested. We have work planned as part of #57752 that should bring the performance of Go 1.21 almost in line with Go 1.19 (if not better). |
@FiloSottile Should we mark #57752 as a 1.21 release blocker? |
Do we know what's different about the AMD EPYCs that accounts for 60% slowdown instead of 20% slowdown? |
Honestly, I don't know what AMD EPYCs are doing differently, but https://go.dev/cl/471259 is switching the hot loop back to the same assembly that math/big was using, so I am reasonably confident it will revert the slowdown. Filed #59463 specifically to track that change, added it to #57752. Let's keep this open to benchmark the new code on EPYCs? |
Can confirm https://go-review.googlesource.com/c/go/+/471259/ reduces part of the overhead from switch to bigmod.
Compared with Go 1.19:
|
Is there any chance that this change to |
@evanj We'll have to see a completed fix, and give it plenty of time for testing, before we are able to consider a backport. While performance is important, it would of course be a bad idea to backport a risky fix to security code. |
Looks like |
Thank you for the benchmark @sywhang. Still surprising there would be a +18% difference when the code spend most of its time in the same assembly core now, and on Intel I see +5%. I'll rent an AMD EPYC to get some profiles. Indeed, this is a bigger change than can be backported, at least quickly. In the future, this is the kind of feedback that is very useful during the release candidate phase: we had a rollback ready in case issues like this arose before Go 1.20 was released. |
@FiloSottile I'd be happy to share the profiles with you, if that helps. Certainly, I agree with you this regression would've been much better to catch during rc phase. The issue though is that not many service owners want to deploy rc builds to prod or or even staging. My team doesn't own any services that have meaningful traffic, or even one that exercises the code path. We only heard about the regression from service owners after we upgraded the whole company to the new version after several steps of verification. |
You make a great point that it would be better if we could test release candidates in production. I would like to do this, but prioritizing that work has been hard, as you might be able to guess by the fact that we are just migrating to 1.20 now, after 1.20.3 has been released! FWIW I'm not sure our slowdown is specifically EPYC related. I'll try to extract a reproduction case from our workload and share some details once we have more fully investigated it. |
Ugh well we tracked down one of our problems: Some code was inadvertently calling |
Oh! That's a very interesting observation @evanj. I wonder if we should implement a custom Would you mind opening a separate issue about marhsal/unmarshal round-tripping? |
On the general topic of marshal/unmarshal changes, we had a long conversation to decide the policy on #10275, specifically #10275 (comment).
I would amend that to say that if we're already generating a JSON struct, then we can change it in backwards compatible ways, as long as it really is backwards compatible for both new-marshal/old-unmarshal and old-marshal/new-unmarshal. It sounds like we broke the rule to some extent with the RSA changes, and we should look into some kind of fix, perhaps calling Precompute during Unmarshal. |
I created #59695 for the |
I reproduced a 70% slowdown on 3rd gen EPYCs, and although I haven't diagnosed the cause, the new strategy and assembly in CL 471259 regains all that and then some.
|
Change https://go.dev/cl/471259 mentions this issue: |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes (only on latest release).
What operating system and processor architecture are you using (
go env
)?linux/amd64
go env
OutputWhat did you do?
Some services at Uber started seeing severe performance degradation after upgrading to Go 1.20.
Profiles revealed crypto/rsa related stacks showing up everywhere.
Here is a repro benchmark that shows around ~60% regression compared to Go 1.19:
What did you expect to see?
I am aware of the new crypto/rsa changes that were introduced in Go 1.20 that involves removing big.Int to bigmod changes, which could be related to the regression. (#56980).
This benchmark was created based on profile from a single service that reported this issue internally, and there may be more paths in crypto/rsa that has similar issues. Will update as we find more such paths if we find any.
What did you see instead?
The text was updated successfully, but these errors were encountered: