Nanopb breaks Clang optimizations by violating strict aliasing on _Bool #434
Some background: we have run into this problem while fuzzing an x86_64 binary that was built with recent versions of clang and includes the stable The specific code issue in the target code is triggered via the nanopb undefined behavior in combination with the relevant clang optimizations at |
I've seen a similar effect myself with GCC before: But yeah, I didn't realize this could occur when deserializing bool fields. I'm not sure whether the better option is to separate bool deserialization from other varints, or to introduce a custom pb_bool_t as in issue #287. |
After considering this, I think the most reasonable approach is to add a separate decoder for Whether to go the pb_bool_t route for 0.4.0, I'm still not sure. It feels annoying to introduce a separate type, but the problems in #287 also crop up from time to time. I guess it's better to solve one issue at a time: add the separate decoder for 0.4.0 as well and worry about #287 later. |
Hmm, the reason why this isn't picked up by existing fuzzing is here: The bool sanitizer was skipped for the fuzz test because the fuzzer also writes random data to the structure being encoded, and the encoder reads that via a bool pointer. I guess that is something to fix as well, since it's undefined behavior anyway and the input structure to pb_decode() is declared as mostly untrusted. |
Previously nanopb didn't enforce that decoded bool fields had valid true/false values. This could lead to undefined behavior in user code. This has potential security implications when 1) message contains bool field (has_ fields are safe) and 2) user code uses ternary operator dependent on the field value, such as: int value = msg.my_bool ? 1234 : 0 and 3) the value returned from ternary operator affects a memory access, such as: data_array[value] = 9999
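The conditions listed in this commit message can be illustrated with a minimal sketch. The names store_decoded_bool and pick_index are hypothetical and not part of nanopb's API; the sketch only assumes that the decoder holds a raw varint and normalizes it before storing it in the bool field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not nanopb's API): collapse any nonzero varint
 * to true so the stored bool only ever holds a valid value. */
void store_decoded_bool(uint64_t raw_varint, bool *dest)
{
    *dest = (raw_varint != 0);
}

/* The user-code pattern from the commit message: with a valid bool the
 * result is always 0 or 1234, so it is safe to use as an array index. */
int pick_index(bool my_bool)
{
    return my_bool ? 1234 : 0;
}
```

With the normalization in place, a wire value such as 2 is stored as true, so the ternary in user code cannot produce an index other than the two it names.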
Fixed now both in master and maintenance_0.3 branches. I'll be publishing the fix in 0.3.9.4 soon, but if you could try with your fuzz tests also, that would be great. |
@PetteriAimonen: Thank you for the swift fix.
( At the moment, this looks like a regression in the current master that was introduced before the patches for this issue. I'll look into it later today. |
Can you post more of the backtrace from that error? i.e. where in nanopb_generator.py that occurs? If it is in |
@PetteriAimonen: I can confirm that the target binary no longer runs into the undefined behavior problem when built with nanopb from the Backtrace:
I plan to provide more information later today. |
Yeah, those generator errors have appeared on the master branch before, though I don't know what's causing them this time. They're due to the new default value generation, which seems prone to breaking with particular field types. I guess "<removed-type>" is some custom type? But messages and enums are already handled specially. |
@PetteriAimonen: it might make sense to move this to a new Github issue. Here is the specific type:
This is related to https://github.com/trezor/trezor-firmware/tree/master/legacy/firmware/protob . |
Hmm, I don't get that error with those files. I did get an error about unhandled field type 'MESSAGE' when compiling with only some files present, but that seems unrelated. |
was also seen with the recent However, it appears that the current fix for the bool handling is faulty, as my target binary runs into a segfault through the null pointer:
|
Yep, that's mostly why 0.4.0 has been stuck on the master branch for a year now without a release: I haven't had time to test it enough to trust it. That particular bug about MESSAGE should now be fixed. I guess you'll have to look a bit deeper into the two issues you are currently having, as I can't reproduce them here with the information you've given. One possible cause for that jump to a null pointer could be that you haven't recompiled some of the source files, as the LTYPE numbering changed. |
I've further debugged the segfault. Backtrace:
Stepping through to the error:
For some reason Line 337 in 252f419
0x0 as the function pointer that is later called, leading to the crash.
This is the type in question:
(Side note: gdb reports the line numbers with an offset, but as far as I can see, everything else is normal and relates to the linked commit.) |
That Line 55 in 252f419
&pb_enc_submessage. But one entry was added to that array in the bool fix; if that entry is missing from your code base for some reason, the lookup would read the NULL pointer for extensions instead.
Considering your line numbers are messed up also, it seems like there may be some git merge error or other local modification in your repo. Can you diff your |
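The failure mode discussed here, a table of function pointers indexed by a numeric type id where shifted numbering lands on a NULL slot, can be sketched as follows. This is not nanopb's actual table: the type ids, encoder functions, and encode_checked are all hypothetical, and the NULL check is one possible defensive measure rather than what the library does.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*encoder_fn)(int value);

/* Hypothetical type ids; real nanopb LTYPE values differ. */
enum { TYPE_BOOL, TYPE_VARINT, TYPE_SUBMESSAGE, TYPE_EXTENSION, TYPE_COUNT };

static bool enc_bool(int value) { return value == 0 || value == 1; }
static bool enc_varint(int value) { (void)value; return true; }
static bool enc_submessage(int value) { (void)value; return true; }

/* Extensions deliberately have no encoder here, so that slot is NULL. */
static const encoder_fn ENCODERS[TYPE_COUNT] = {
    [TYPE_BOOL] = enc_bool,
    [TYPE_VARINT] = enc_varint,
    [TYPE_SUBMESSAGE] = enc_submessage,
};

/* Refuse to call through an out-of-range or NULL table entry. */
bool encode_checked(unsigned type_id, int value)
{
    if (type_id >= TYPE_COUNT || ENCODERS[type_id] == NULL)
        return false;
    return ENCODERS[type_id](value);
}
```

If object files compiled against an older numbering pass a type id that is off by one, an unchecked call would jump through the NULL slot, while the checked call returns an error.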
@PetteriAimonen the problem has been resolved. The mentioned segfault was indeed caused by a local toolchain issue and the upstream code appears to be fine. The nanopb git repo was on the correct commit (I had double-checked this) without local changes and the hashes over relevant files such as The resulting structures were a bit puzzling, but explain the error behavior:
Thanks for looking into this. |
@invd From what I can tell, Nanopb 0.4 will replace |
@PetteriAimonen: I think this issue can be closed. Thanks! |
@invd I'll close it once the fix is released, so it's easier to find if someone else hits it. |
@PetteriAimonen Any ETA for 0.3.9.4 release? |
@prusnak I hope to find time to do it in the next few weeks, but currently I'm too busy with for-profit work. |
Okay, I will use the tip of |
Fix released in 0.3.9.4. |
According to the C99 standard, _Bool will only be assigned 0 or 1. However, Nanopb violates strict aliasing rules by using an int_least8_t pointer to write to the underlying memory. This allows the _Bool to contain a value greater than one.
🚨 UNDEFINED BEHAVIOUR ALERT 🚨
Clang heavily relies on this assumption. For example, message.bool_field ? X : Y is often optimized into message.bool_field ^ N, causing bugs in a codebase using Nanopb.
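One defensive pattern on the user side, sketched under the assumption that a bool field may already hold an out-of-range byte, is to inspect the field's storage as unsigned char bytes and rebuild a valid bool from them instead of reading the field as _Bool directly. The function name bool_from_raw_storage is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the raw bytes of a bool-sized object and
 * return true if any byte is nonzero. The bytes are read as unsigned
 * char, so no possibly invalid _Bool value is ever loaded. */
bool bool_from_raw_storage(const void *storage)
{
    unsigned char bytes[sizeof(bool)];
    memcpy(bytes, storage, sizeof bytes);
    for (size_t i = 0; i < sizeof bytes; i++) {
        if (bytes[i] != 0)
            return true;
    }
    return false;
}
```

The returned value is produced by a comparison, so it is always 0 or 1 and safe to use in expressions such as flag ? X : Y.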