New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LoongArch64 assembly pack: Fix ChaCha20 ABI breakage #22817
Conversation
I'm not sure how to properly add a test for psABI conformance... |
574dedb
to
bf0eadc
Compare
The [LP64D ABI][1] requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register. This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c: static int test_bio_enc_chacha20(int idx) { register double fs7 asm("f31") = 114.514; asm("#optimize barrier":"+f"(fs7)); return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514; } So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short. [1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers
bf0eadc
to
3fa5d56
Compare
#22056 discusses this a bit. Given the number of ABI issues that have come up recently, OpenSSL investing in this area seems worthwhile. The ABI testing framework has been incredibly useful for us in BoringSSL. |
@paulidale Could you help reviewing this and giving another approve? Sorry for being impatient and impolite but this is an important bug fix (TM). |
@openssl/committers Ping for a second approve. |
This pull request is ready to merge |
Removing the test TODO box for now... |
Merged to the master and 3.2 branches. Thank you for your contribution. |
The [LP64D ABI][1] requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register. This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c: static int test_bio_enc_chacha20(int idx) { register double fs7 asm("f31") = 114.514; asm("#optimize barrier":"+f"(fs7)); return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514; } So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short. [1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #22817)
The [LP64D ABI][1] requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register. This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c: static int test_bio_enc_chacha20(int idx) { register double fs7 asm("f31") = 114.514; asm("#optimize barrier":"+f"(fs7)); return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514; } So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short. [1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #22817) (cherry picked from commit b46de72)
The [LP64D ABI][1] requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register. This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c: static int test_bio_enc_chacha20(int idx) { register double fs7 asm("f31") = 114.514; asm("#optimize barrier":"+f"(fs7)); return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514; } So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short. [1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl/openssl#22817) Signed-off-by: lanming1120 <lanming1120@126.com>
The [LP64D ABI][1] requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register. This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c: static int test_bio_enc_chacha20(int idx) { register double fs7 asm("f31") = 114.514; asm("#optimize barrier":"+f"(fs7)); return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514; } So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short. [1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl/openssl#22817) (cherry picked from commit b46de72c260e7c4d9bfefa35b02295ba32ad2ac6) Signed-off-by: lanming1120 <lanming1120@126.com>
The [LP64D ABI][1] requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register. This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c: static int test_bio_enc_chacha20(int idx) { register double fs7 asm("f31") = 114.514; asm("#optimize barrier":"+f"(fs7)); return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514; } So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short. [1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#22817)
In that pull request, the input length check was moved forward, but the related ori instruction was missing, and it will cause input of any length down to the much slower scalar implementation.
In that pull request, the input length check was moved forward, but the related ori instruction was missing, and it will cause input of any length down to the much slower scalar implementation. CLA: trivial
In that pull request, the input length check was moved forward, but the related ori instruction was missing, and it will cause input of any length down to the much slower scalar implementation. Fixes openssl#23300 CLA: trivial
The regression was introduced in PR #22817. In that pull request, the input length check was moved forward, but the related ori instruction was missing, and it will cause input of any length down to the much slower scalar implementation. Fixes #23300 CLA: trivial Reviewed-by: Shane Lontis <shane.lontis@oracle.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #23301)
The regression was introduced in PR #22817. In that pull request, the input length check was moved forward, but the related ori instruction was missing, and it will cause input of any length down to the much slower scalar implementation. Fixes #23300 CLA: trivial Reviewed-by: Shane Lontis <shane.lontis@oracle.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #23301) (cherry picked from commit 9710285)
The LP64D ABI requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register.
This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c:
So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short.