Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoongArch64 assembly pack: Fix ChaCha20 ABI breakage #22817

Closed
wants to merge 1 commit into from

Conversation

xry111
Copy link
Contributor

@xry111 xry111 commented Nov 25, 2023

The LP64D ABI requires the floating-point registers f24-f31 (aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector register aliases with the corresponding FPR, so we must save and restore the callee-saved FPR when we writes into the corresponding vector register.

This ABI breakage can be easily demonstrated by injecting the use of a saved FPR into the test in bio_enc_test.c:

static int test_bio_enc_chacha20(int idx)
{
    register double fs7 asm("f31") = 114.514;
    asm("#optimize barrier":"+f"(fs7));
    return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
}

So fix it. To make the logic simpler, jump into the scalar implementation earlier when LSX and LASX are not enumerated in AT_HWCAP, or the input is too short.

@xry111
Copy link
Contributor Author

xry111 commented Nov 25, 2023

I'm not sure how to properly add a test for psABI conformance...

The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved.  The low 64 bits of a LSX/LASX vector
register aliases with the corresponding FPR, so we must save and restore
the callee-saved FPR when we writes into the corresponding vector
register.

This ABI breakage can be easily demonstrated by injecting the use of a
saved FPR into the test in bio_enc_test.c:

    static int test_bio_enc_chacha20(int idx)
    {
        register double fs7 asm("f31") = 114.514;
        asm("#optimize barrier":"+f"(fs7));
        return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
    }

So fix it.  To make the logic simpler, jump into the scalar
implementation earlier when LSX and LASX are not enumerated in AT_HWCAP,
or the input is too short.

[1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers
@t8m t8m added branch: master Merge to master branch approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member triaged: bug The issue/pr is/fixes a bug tests: exempted The PR is exempt from requirements for testing branch: 3.2 Merge to openssl-3.2 labels Nov 28, 2023
@t8m t8m removed the approval: otc review pending This pull request needs review by an OTC member label Nov 28, 2023
@davidben
Copy link
Contributor

I'm not sure how to properly add a test for psABI conformance...

#22056 discusses this a bit. Given the number of ABI issues that have come up recently, OpenSSL investing in this area seems worthwhile. The ABI testing framework has been incredibly useful for us in BoringSSL.

@xry111
Copy link
Contributor Author

xry111 commented Dec 4, 2023

@paulidale Could you help reviewing this and giving another approve?

Sorry for being impatient and impolite but this is an important bug fix (TM).

@t8m t8m requested a review from a team December 4, 2023 09:43
@xry111
Copy link
Contributor Author

xry111 commented Dec 12, 2023

@openssl/committers Ping for a second approve.

@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Dec 12, 2023
@openssl-machine openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels Dec 13, 2023
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@xry111
Copy link
Contributor Author

xry111 commented Dec 19, 2023

Removing the test TODO box for now...

@t8m
Copy link
Member

t8m commented Dec 19, 2023

Merged to the master and 3.2 branches. Thank you for your contribution.

@t8m t8m closed this Dec 19, 2023
openssl-machine pushed a commit that referenced this pull request Dec 19, 2023
The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved.  The low 64 bits of a LSX/LASX vector
register aliases with the corresponding FPR, so we must save and restore
the callee-saved FPR when we writes into the corresponding vector
register.

This ABI breakage can be easily demonstrated by injecting the use of a
saved FPR into the test in bio_enc_test.c:

    static int test_bio_enc_chacha20(int idx)
    {
        register double fs7 asm("f31") = 114.514;
        asm("#optimize barrier":"+f"(fs7));
        return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
    }

So fix it.  To make the logic simpler, jump into the scalar
implementation earlier when LSX and LASX are not enumerated in AT_HWCAP,
or the input is too short.

[1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #22817)
openssl-machine pushed a commit that referenced this pull request Dec 19, 2023
The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved.  The low 64 bits of a LSX/LASX vector
register aliases with the corresponding FPR, so we must save and restore
the callee-saved FPR when we writes into the corresponding vector
register.

This ABI breakage can be easily demonstrated by injecting the use of a
saved FPR into the test in bio_enc_test.c:

    static int test_bio_enc_chacha20(int idx)
    {
        register double fs7 asm("f31") = 114.514;
        asm("#optimize barrier":"+f"(fs7));
        return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
    }

So fix it.  To make the logic simpler, jump into the scalar
implementation earlier when LSX and LASX are not enumerated in AT_HWCAP,
or the input is too short.

[1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #22817)

(cherry picked from commit b46de72)
wanghao75 pushed a commit to openeuler-mirror/openssl that referenced this pull request Dec 22, 2023
The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved.  The low 64 bits of a LSX/LASX vector
register aliases with the corresponding FPR, so we must save and restore
the callee-saved FPR when we writes into the corresponding vector
register.

This ABI breakage can be easily demonstrated by injecting the use of a
saved FPR into the test in bio_enc_test.c:

    static int test_bio_enc_chacha20(int idx)
    {
        register double fs7 asm("f31") = 114.514;
        asm("#optimize barrier":"+f"(fs7));
        return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
    }

So fix it.  To make the logic simpler, jump into the scalar
implementation earlier when LSX and LASX are not enumerated in AT_HWCAP,
or the input is too short.

[1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from openssl/openssl#22817)

Signed-off-by: lanming1120 <lanming1120@126.com>
wanghao75 pushed a commit to openeuler-mirror/openssl that referenced this pull request Dec 22, 2023
The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved.  The low 64 bits of a LSX/LASX vector
register aliases with the corresponding FPR, so we must save and restore
the callee-saved FPR when we writes into the corresponding vector
register.

This ABI breakage can be easily demonstrated by injecting the use of a
saved FPR into the test in bio_enc_test.c:

    static int test_bio_enc_chacha20(int idx)
    {
        register double fs7 asm("f31") = 114.514;
        asm("#optimize barrier":"+f"(fs7));
        return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
    }

So fix it.  To make the logic simpler, jump into the scalar
implementation earlier when LSX and LASX are not enumerated in AT_HWCAP,
or the input is too short.

[1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from openssl/openssl#22817)

(cherry picked from commit b46de72c260e7c4d9bfefa35b02295ba32ad2ac6)
Signed-off-by: lanming1120 <lanming1120@126.com>
wbeck10 pushed a commit to wbeck10/openssl that referenced this pull request Jan 8, 2024
The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved.  The low 64 bits of a LSX/LASX vector
register aliases with the corresponding FPR, so we must save and restore
the callee-saved FPR when we writes into the corresponding vector
register.

This ABI breakage can be easily demonstrated by injecting the use of a
saved FPR into the test in bio_enc_test.c:

    static int test_bio_enc_chacha20(int idx)
    {
        register double fs7 asm("f31") = 114.514;
        asm("#optimize barrier":"+f"(fs7));
        return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
    }

So fix it.  To make the logic simpler, jump into the scalar
implementation earlier when LSX and LASX are not enumerated in AT_HWCAP,
or the input is too short.

[1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from openssl#22817)
lrzlin added a commit to lrzlin/openssl that referenced this pull request Jan 14, 2024
In that pull request, the input length check was moved forward,
but the related ori instruction was missing, and it will cause
input of any length down to the much slower scalar implementation.
lrzlin added a commit to lrzlin/openssl that referenced this pull request Jan 14, 2024
In that pull request, the input length check was moved forward,
but the related ori instruction was missing, and it will cause
input of any length down to the much slower scalar implementation.

CLA: trivial
lrzlin added a commit to lrzlin/openssl that referenced this pull request Jan 15, 2024
In that pull request, the input length check was moved forward,
but the related ori instruction was missing, and it will cause
input of any length down to the much slower scalar implementation.

Fixes openssl#23300

CLA: trivial
openssl-machine pushed a commit that referenced this pull request Jan 17, 2024
The regression was introduced in PR #22817.

In that pull request, the input length check was moved forward,
but the related ori instruction was missing, and it will cause
input of any length down to the much slower scalar implementation.

Fixes #23300

CLA: trivial

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #23301)
openssl-machine pushed a commit that referenced this pull request Jan 17, 2024
The regression was introduced in PR #22817.

In that pull request, the input length check was moved forward,
but the related ori instruction was missing, and it will cause
input of any length down to the much slower scalar implementation.

Fixes #23300

CLA: trivial

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #23301)

(cherry picked from commit 9710285)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch branch: 3.2 Merge to openssl-3.2 tests: exempted The PR is exempt from requirements for testing triaged: bug The issue/pr is/fixes a bug
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants