Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional jump or move depends on uninitialised value(s) #450

Open
huitema opened this issue Dec 2, 2022 · 9 comments
Open

Conditional jump or move depends on uninitialised value(s) #450

huitema opened this issue Dec 2, 2022 · 9 comments

Comments

@huitema
Copy link
Collaborator

huitema commented Dec 2, 2022

Got this warning when running valgrind to check picoquic code. The issue happens inside openssl, accessed through picotls

This is part of the valgrind report:

==4255== 1526 errors in context 1 of 1:
==4255== Conditional jump or move depends on uninitialised value(s)
==4255== at 0x4AF62A4: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==4255== by 0x4AF6581: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==4255== by 0x49F70F4: EVP_DecryptFinal_ex (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==4255== by 0x1C6724: aead_do_decrypt (openssl.c:962)
==4255== by 0x182F26: picoquic_remove_packet_protection (packet.c:677)
==4255== by 0x18381D: picoquic_parse_header_and_decrypt (packet.c:783)
==4255== by 0x186149: picoquic_incoming_segment (packet.c:2157)

Have we seen that already?

@huitema
Copy link
Collaborator Author

huitema commented Dec 2, 2022

This is new. Previously, Picoquic tests were run using the Picotls version of Thu Sep 15 13:29:56 2022 +0900, after commit 2bb3e454a7aae0f0542374973c9daf83bfef8ffe. The PR that found the issue tests execution with the Picotls version of Wed Nov 30 08:49:47 2022 +0900, after commit 7e97d3e48e237b9a01227cf7f4b116325b356fc6. Something changed between these two commits.

@huitema
Copy link
Collaborator Author

huitema commented Dec 2, 2022

I tried synching at multiple commits, and it seems that the picoquic tests keep failing. One of the issues was found in fusion.c, line 1025...:

    ptls_fusion_aesgcm_context_t *newp;
    if ((newp = aligned_alloc(32, new_ctx_size)) == NULL)
        return NULL;
    memcpy(newp, ctx, old_ctx_size);
    ptls_clear_memory(ctx, old_ctx_size);
    free(ctx);
    ctx = newp;

In theory, the combination of aligned_alloc() and free() should work -- that's what the CPP reference says. I know there was an issue in the past, and that the sanitizer insisted on aligned_alloc() and aligned_free(). I believe that the sanitizer has been fixed, but there may be an issue.

@huitema
Copy link
Collaborator Author

huitema commented Dec 9, 2022

I was able to work around the problem in picoquic by forcing use of "fusion" for AES GCM as explained in this picoquic issue.

Turns out that the issue is already reported in OpenSSL.

@huitema
Copy link
Collaborator Author

huitema commented Dec 10, 2022

I ran the openssl tests through "valgrind". The issue mentioned above reproes, as well as a few others:

valgrind -v --track-origins=yes ./test-openssl.t 
...
==5117== HEAP SUMMARY:
==5117==     in use at exit: 147,948 bytes in 392 blocks
==5117==   total heap usage: 5,494,446 allocs, 5,494,054 frees, 10,132,994,211 bytes allocated
==5117== 
==5117== Searching for pointers to 392 not-freed blocks
==5117== Checked 135,216 bytes
==5117== 
==5117== LEAK SUMMARY:
==5117==    definitely lost: 138,229 bytes in 284 blocks
==5117==    indirectly lost: 5,602 bytes in 54 blocks
==5117==      possibly lost: 0 bytes in 0 blocks
==5117==    still reachable: 4,117 bytes in 54 blocks
==5117==         suppressed: 0 bytes in 0 blocks
==5117== Rerun with --leak-check=full to see details of leaked memory
==5117== 
==5117== ERROR SUMMARY: 2606 errors from 1 contexts (suppressed: 0 from 0)
==5117== 
==5117== 2606 errors in context 1 of 1:
==5117== Conditional jump or move depends on uninitialised value(s)
==5117==    at 0x4AFB234: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117==    by 0x4AFB511: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117==    by 0x49FC0F4: EVP_DecryptFinal_ex (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117==    by 0x142AA4: aead_do_decrypt (openssl.c:1072)
==5117==    by 0x12ECF3: ptls_aead_decrypt (picotls.h:1873)
==5117==    by 0x12ECF3: test_ciphersuite.isra.0 (picotls.c:196)
==5117==    by 0x12EF35: test_aes128gcm (picotls.c:374)
==5117==    by 0x121194: subtest (picotest.c:96)
==5117==    by 0x1412AF: test_picotls (picotls.c:1970)
==5117==    by 0x121194: subtest (picotest.c:96)
==5117==    by 0x112B69: main (openssl.c:580)
==5117==  Uninitialised value was created by a stack allocation
==5117==    at 0x4B73063: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117== 
==5117== ERROR SUMMARY: 2606 errors from 1 contexts (suppressed: 0 from 0)

@huitema
Copy link
Collaborator Author

huitema commented Dec 10, 2022

I think we should consider adding a valgrind CI test. Maybe not the full run of test_openssl.t. Valgrind is slow, and the test above took a long time to execute. But it did find leaks. Most of the time was spent in the many-handshake tests: 2300 seconds for the non-async tests, 2000 seconds for the async tests. Is there a simple way to exclude them?


@kazuho
Copy link
Member

kazuho commented Dec 12, 2022

Re aligned_alloc in comment #450 (comment): please check #454.

@kazuho
Copy link
Member

kazuho commented Dec 12, 2022

Generally speaking, I'm not sure how much we can trust Valgrind here.

In optimized crypto code, we often load out-of-bounds intentionally. loadn128 and loadn256 functions in fusion.c are good examples. You would notice the NO_SANITIZE_ADDRESS option being set there. They are set because ASAN complains about out-of-bounds access.

I would assume AES-GCM code of libcryto doing something similar.

@huitema
Copy link
Collaborator Author

huitema commented Dec 12, 2022

I have found some real issues using valgrind, issues that were not flagged by ASAN/UBSAN. I like having it as one of the CI checks in picoquic.

The code in fusion does not trip valgrind. This allowed me to work around the openssl issues. I just run valgrind in a configuration that uses fusion instead of OpenSSL's version of AES GCM.

I am not sure if there are macros similar to NO_SANITIZE_ADDRESS for valgrind. If there was, that would be nice.

@kazuho
Copy link
Member

kazuho commented Dec 13, 2022

I share your irritation. This (possibly bogus) warning could be specific to a particular combination of valgrind and openssl.

I did not see such a warning when I tried to repro on Ubuntu 22.04 (valgrind 3.18.1 / openssl 3.0.2).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants