-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto/md5: remove assembly cores #49248
Comments
Here's a data point for the M1. You get a ~30% decrease, go binary built with 1.17.2. optimized:
generic:
|
It's not currently within 80% on this machine either.
|
I poked at this a bit as a palate cleanser after working on package net/syscall. There are several things going wrong here. The most important by is that the compiler takes Also, the register allocator puts too many things in registers in the outer loop, so by the time it gets to the heavy numeric lifting, it doesn't have enough to work with. This is fixable by massaging the code a bit in some (admittedly weird) ways. See https://gist.github.com/josharian/42e66bf022f32da3da0a9b1bdf0a974b. Without filing the other issues mentioned here, this rewrite itself doesn't improve performance. Also, the CSE pass combines a bunch of the xNN memory loads. These then require registers, and the register allocator decides to spill them rather than reload from memory. Filed #49332. I'm done investigating this for now. The next step is to fix #49331, or find some clever way to write the code to force the compiler to generate more balanced computation trees. |
Since this issue was filed, two new assembly implementations have been added (loong64 and riscv64). |
crypto/md5 has a series of assembly cores, all looking pretty much the same and very similar to the generic code.
The generic code looks like something that should intrinsify well, and the assembly doesn't use any special platform instructions as far as I can tell.
https://github.com/golang/go/blob/2ebe77a2fda1ee9ff6fd9a3e08933ad1ebaea039/src/crypto/md5/md5block.go
MD5 is broken and obsolete, so ideally we'd reduce the complexity budget we spend on it over time.
If we can get the generic code within 80% of the performance of the assembly, I'd be happy to remove the cores.
The text was updated successfully, but these errors were encountered: