Memory leak when parsing a protobuf message with duplicate fields #615

Closed
niooss-ledger opened this issue Nov 24, 2020 · 5 comments

@niooss-ledger

Hello,

While fuzzing a project that relies on Nanopb to parse (untrusted) user input, I found a memory leak that is triggered by a message in which some fields are duplicated.

Steps to reproduce the issue

To test this memory leak on several versions of Nanopb (and several Linux distributions), I wrote the following script:

#!/bin/sh
# Reproduce a memory leak issue in nanopb parser
#
# Dependencies on Debian: sudo apt install clang git protobuf-compiler python3 python3-protobuf
set -e -x

# Clone nanopb
if ! [ -d nanopb ] ; then
    git clone https://github.com/nanopb/nanopb
fi

# Create a .proto file for a message with a header
cat > mypackage.proto << EOF
syntax = "proto3";
package mypackage;

import "nanopb.proto";

message HeaderField {
  bytes mydata = 1 [(nanopb).type = FT_POINTER];
}

message Header {
  option (nanopb_msgopt).anonymous_oneof = true;
  oneof one {
    HeaderField field = 1;
  }
}

message MessageWithHeader {
  Header head = 1;
}
EOF

# Create a fuzzer on this message
cat > fuzz_decode_message.c << EOF
#include <stdint.h>
#include <stdio.h>

#include <pb_decode.h>
#include "mypackage.pb.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    mypackage_MessageWithHeader req = mypackage_MessageWithHeader_init_zero;

    pb_istream_t is = pb_istream_from_buffer(data, size);
    if (!pb_decode(&is, mypackage_MessageWithHeader_fields, &req)) {
        printf("Failed to decode input: %s\n", PB_GET_ERROR(&is));
        return 0;
    }
    printf("Parsing ok, req.head.which_one = %u\n", req.head.which_one);
    pb_release(mypackage_MessageWithHeader_fields, &req);
    return 0;
}
EOF

# Compile the .proto and the fuzzer
protoc \
    -Inanopb/generator \
    -Inanopb/generator/proto \
    -I. \
    --plugin=protoc-gen-nanopb=nanopb/generator/protoc-gen-nanopb \
    --nanopb_opt= \
    --nanopb_out=. \
    mypackage.proto

clang -g -ggdb -O1 -fsanitize=fuzzer,address,undefined \
    -Wall -Wextra -Inanopb -DPB_ENABLE_MALLOC -DPB_FIELD_32BIT \
    -o fuzz_decode_message.out \
    fuzz_decode_message.c mypackage.pb.c nanopb/pb_decode.c nanopb/pb_common.c

# Run on a test case that leaks some bytes
python3 -c 'import sys;sys.stdout.buffer.write(bytes.fromhex("0a06 0a020a00 0a00"))' > memleak_message
./fuzz_decode_message.out memleak_message
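The 8-byte test input is small enough to decode by hand. The following is a minimal sketch of a protobuf wire-format walker (the helpers `read_varint` and `walk` are hypothetical, not part of nanopb) that prints the field structure of `memleak_message`. It assumes the schema from `mypackage.proto` above: it descends into length-delimited fields at the first two levels (`head`, then the oneof member `field`) and treats the third level as the `bytes` field `mydata`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Schema assumption (from mypackage.proto above): length-delimited fields at
 * depth 0 (head) and 1 (field) are submessages; depth 2 is the bytes field
 * mydata, which is printed but not descended into. */
#define LEAF_DEPTH 2

/* Reads a base-128 varint at buf[*pos]; returns 0 if it is truncated or too long. */
static int read_varint(const uint8_t *buf, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t value = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (*pos >= len)
            return 0;
        uint8_t b = buf[(*pos)++];
        value |= (uint64_t)(b & 0x7f) << shift;
        if ((b & 0x80) == 0) {
            *out = value;
            return 1;
        }
    }
    return 0;
}

/* Prints one line per field and counts fields per nesting depth.
 * Returns 0 on malformed input or on any wire type other than 2
 * (length-delimited), since the schema above uses no other kind. */
static int walk(const uint8_t *buf, size_t len, int depth, unsigned counts[LEAF_DEPTH + 1])
{
    size_t pos = 0;
    while (pos < len) {
        uint64_t key, n;
        if (!read_varint(buf, len, &pos, &key))
            return 0;
        if ((key & 7) != 2 || !read_varint(buf, len, &pos, &n) || n > len - pos)
            return 0;
        counts[depth]++;
        printf("%*sfield %u, length %u\n", depth * 2, "",
               (unsigned)(key >> 3), (unsigned)n);
        if (depth < LEAF_DEPTH && !walk(buf + pos, (size_t)n, depth + 1, counts))
            return 0;
        pos += (size_t)n;
    }
    return 1;
}
```

Run on `0a 06 0a 02 0a 00 0a 00`, it finds one `head` (length 6) containing two consecutive `field` entries: the first (length 2) carries an empty `mydata`, the second is empty. In other words, the same oneof member occurs twice inside a single `Header`.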

What happens?

On an up-to-date Debian 10 machine, this leads to the following output:

./fuzz_decode_message.out: Running 1 inputs 1 time(s) each.
Running: memleak_message
Parsing ok, req.head.which_one = 1
Parsing ok, req.head.which_one = 1

=================================================================
==3937==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x4f25a2 in realloc (/fuzz_decode_message.out+0x4f25a2)
    #1 0x536a80 in allocate_field /nanopb/pb_decode.c:581:11
    #2 0x533f3a in pb_dec_bytes /nanopb/pb_decode.c:1479:14
    #3 0x52ed88 in decode_pointer_field /nanopb/pb_decode.c
    #4 0x525632 in pb_decode_inner /nanopb/pb_decode.c:1083:14
    #5 0x5359cd in pb_dec_submessage /nanopb/pb_decode.c:1589:18
    #6 0x52d008 in decode_static_field /nanopb/pb_decode.c:532:20
    #7 0x525632 in pb_decode_inner /nanopb/pb_decode.c:1083:14
    #8 0x5359cd in pb_dec_submessage /nanopb/pb_decode.c:1589:18
    #9 0x52cea9 in decode_static_field /nanopb/pb_decode.c
    #10 0x525632 in pb_decode_inner /nanopb/pb_decode.c:1083:14
    #11 0x526c24 in pb_decode /nanopb/pb_decode.c:1159:14
    #12 0x52143d in LLVMFuzzerTestOneInput /fuzz_decode_message.c:11:10
    #13 0x42edfa in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/fuzz_decode_message.out+0x42edfa)
    #14 0x422003 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/fuzz_decode_message.out+0x422003)
    #15 0x426b31 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/fuzz_decode_message.out+0x426b31)
    #16 0x44a3f2 in main (/fuzz_decode_message.out+0x44a3f2)
    #17 0x7fd6f93e609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).

INFO: a leak has been found in the initial corpus.

With my program, 0a06 0a020a00 0a00 leaks 4 bytes, 0a0a 0a020a00 0a020a00 0a00 leaks 8 bytes, etc.
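Judging from these two inputs, each additional `0a02 0a00` (one more `field { mydata: "" }` inside the same `Header`) adds another leaked allocation. The 4 bytes are plausibly just the `size` header of an empty pointer-allocated `bytes` value, since `-DPB_FIELD_32BIT` makes `pb_size_t` 32 bits wide; that, and the "one leak per repetition" pattern, are extrapolations from the two data points above rather than verified facts. A small generator makes it easy to produce larger inputs of the same shape for regression tests; `build_leak_input` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds a MessageWithHeader whose Header holds n copies
 * of `field { mydata: "" }` (0a 02 0a 00) followed by one empty `field`
 * (0a 00), i.e. the pattern of the two inputs above.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
static size_t build_leak_input(unsigned n, uint8_t *out, size_t cap)
{
    static const uint8_t rep[] = {0x0a, 0x02, 0x0a, 0x00};
    static const uint8_t tail[] = {0x0a, 0x00};
    size_t inner = (size_t)n * sizeof rep + sizeof tail; /* Header length */
    size_t pos = 0;

    if (cap == 0)
        return 0;
    out[pos++] = 0x0a; /* key: field 1 (head), wire type 2 */

    /* Header length as a base-128 varint (a single byte while n <= 31). */
    size_t v = inner;
    do {
        if (pos >= cap)
            return 0;
        uint8_t b = (uint8_t)(v & 0x7f);
        v >>= 7;
        out[pos++] = v ? (uint8_t)(b | 0x80) : b;
    } while (v != 0);

    if (cap - pos < inner)
        return 0;
    for (unsigned i = 0; i < n; i++) {
        memcpy(out + pos, rep, sizeof rep);
        pos += sizeof rep;
    }
    memcpy(out + pos, tail, sizeof tail);
    pos += sizeof tail;
    return pos;
}
```

With `n = 1` and `n = 2` this reproduces exactly the two inputs quoted above (8 and 12 bytes).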

What should happen?

I believe that parsing untrusted input should not leak allocated memory. You might disagree with this belief, in which case it would be nice to state in https://github.com/nanopb/nanopb/security/policy that Nanopb may leak memory when parsing maliciously crafted untrusted data.

@PetteriAimonen
Member

I agree with you that parsing untrusted input shouldn't leak memory, and the fuzz test included in nanopb also tries to verify that.

It appears that your test case is hitting a situation that is not covered by the fuzz test. I'll have to take a look.

PetteriAimonen added a commit that referenced this issue Nov 24, 2020
Nanopb would leak memory when all of the following conditions were true:
- PB_ENABLE_MALLOC is defined at compile time.
- The message definition contains a oneof field,
  the oneof contains a static submessage, and
  the static submessage contains a pointer field.
- The data being decoded contains two values for the submessage.

The logic in pb_release_union_field would detect that the same
submessage occurs twice, and wouldn't release it because keeping
the old values is necessary to match the C++ library behavior
regarding message merges.

But then decode_static_field() would memset() the whole
submessage to zero, because it unconditionally assumed it to
be uninitialized memory. This would normally happen when the
contents of the union field are switched to a different oneof
item, instead of being merged with the same one.

This commit changes it so that the field is memset() only when
`which_field` contains a different tag. Also the setting of the
default values for the submessage was moved to decode_static_field()
so that it wouldn't overwrite the values that must be merged.

Test cases must still be extended to cover this problem.
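The two steps described in this commit message can be illustrated with a deliberately simplified C model. It is not nanopb's code: `HeaderModel`, `decode_occurrence`, the toy allocator and the `fixed` flag are all hypothetical. The first step mimics `pb_release_union_field` (free the old member only when a different oneof member arrives); the second mimics the `memset()` in `decode_static_field()`, which is unconditional before the fix and happens only when `which_field` changes after it.

```c
#include <assert.h>
#include <string.h>

/* Toy allocator (hypothetical): hands out 4-byte blocks from a static arena
 * and counts outstanding blocks, so a "leak" is observable without leaking. */
static unsigned char arena[8][4];
static int next_block = 0;
static int outstanding = 0;

static unsigned char *toy_alloc(void)
{
    assert(next_block < 8);
    outstanding++;
    return arena[next_block++];
}

static void toy_free(unsigned char *p)
{
    (void)p;
    outstanding--;
}

/* Toy stand-ins (hypothetical, not nanopb's generated structs) for
 * HeaderField (a pointer field) and Header (a oneof holding it statically). */
typedef struct { unsigned char *mydata; } HeaderFieldModel;
typedef struct {
    unsigned which_one;                  /* 0 = unset, else active member tag */
    union { HeaderFieldModel field; } u;
} HeaderModel;

static void release_member(HeaderModel *h)
{
    if (h->u.field.mydata != NULL) {
        toy_free(h->u.field.mydata);
        h->u.field.mydata = NULL;
    }
}

/* Decodes one occurrence of oneof member `tag`; `fixed` selects the
 * behavior after the commit above. */
static void decode_occurrence(HeaderModel *h, unsigned tag, int has_mydata, int fixed)
{
    /* Like pb_release_union_field: free the old member only when a
     * different member arrives; the same member is kept for merging. */
    if (h->which_one != 0 && h->which_one != tag)
        release_member(h);

    /* Like the memset() in decode_static_field: unconditional before the
     * fix, only on a member switch after it. */
    if (!fixed || h->which_one != tag)
        memset(&h->u, 0, sizeof h->u);

    h->which_one = tag;
    if (has_mydata && h->u.field.mydata == NULL)
        h->u.field.mydata = toy_alloc(); /* an existing buffer would be reused */
}

/* Replays memleak_message (0a06 0a020a00 0a00) and returns the number of
 * blocks still outstanding after the final release. */
static int replay_test_input(int fixed)
{
    HeaderModel h;
    memset(&h, 0, sizeof h);
    next_block = 0;
    outstanding = 0;
    decode_occurrence(&h, 1, 1, fixed); /* 0a 02 0a 00: field { mydata: "" } */
    decode_occurrence(&h, 1, 0, fixed); /* 0a 00: the same member again */
    release_member(&h);                 /* like pb_release() at the end */
    return outstanding;
}
```

Replaying the two occurrences from `memleak_message` leaves one block outstanding with `fixed == 0` and none with `fixed == 1`, which corresponds to the single leaked allocation reported by LeakSanitizer.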
@PetteriAimonen
Member

Looks like the current test cases did not find this because they contained only fully static fields inside a static submessage inside a oneof, and pointer fields inside a pointer submessage inside a oneof. The bug is only triggered by pointer fields inside a static submessage inside a oneof.

There is a preliminary fix on git now, but the test cases must still be expanded to cover this. Something like #143 would help find cases like this, but I'm not sure how to go about actually implementing that.

PetteriAimonen added a commit that referenced this issue Nov 25, 2020
This also covers the fairly rarely used behavior of the protobuf C++
library regarding oneof merges: if a oneof submessage occurs
multiple times in a message, their contents are merged together.
This behavior was also previously broken in nanopb.
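The merge behavior mentioned in this commit message can be sketched with a small, hypothetical model (`SubModel`, `OneofModel` and `apply_occurrence` are not nanopb types). It assumes submessage fields with explicit presence: a field set in a later occurrence of the same oneof member overwrites the earlier value, a field absent from it keeps the earlier value, and switching to a different member starts from an empty submessage.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical submessage with two optional (explicit-presence) fields. */
typedef struct {
    bool has_a; int a;
    bool has_b; int b;
} SubModel;

/* Hypothetical oneof holding one of two submessage members (tags 1 and 2). */
typedef struct {
    unsigned which;   /* 0 = unset, otherwise tag of the active member */
    SubModel value;   /* storage shared by both members, like a C union */
} OneofModel;

/* Merge rule for singular fields: a field set in `src` overwrites `dst`,
 * a field absent from `src` keeps its earlier value. */
static void merge_sub(SubModel *dst, const SubModel *src)
{
    if (src->has_a) { dst->has_a = true; dst->a = src->a; }
    if (src->has_b) { dst->has_b = true; dst->b = src->b; }
}

/* Applies one occurrence of oneof member `tag` read from the wire. */
static void apply_occurrence(OneofModel *o, unsigned tag, const SubModel *sub)
{
    if (o->which != tag) {
        /* First occurrence or switch to another member: start from empty. */
        memset(&o->value, 0, sizeof o->value);
        o->which = tag;
    }
    /* Merge into the current contents; right after a reset this is a copy. */
    merge_sub(&o->value, sub);
}
```

Two occurrences of member 1 carrying `a = 1` and `b = 2` end up as a single submessage with both fields set; a following occurrence of member 2 discards them. The "start from empty" branch is exactly where the zeroing belongs, which is why zeroing on every occurrence broke both the merge and the pointer bookkeeping.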
PetteriAimonen added a commit that referenced this issue Nov 25, 2020
This gives the fuzzer a chance to find bugs like #615 in the future.
@niooss-ledger
Author

Thanks for your quick reply. I confirm that your patches fix the memory leak in the project I am fuzzing (I updated Nanopb to the master branch, which includes edf6dcb).

I am not familiar enough with the project to help add what would be needed to generate .proto files by fuzzing (as described in #143), which would have detected this issue, but I will continue to fuzz projects that use Nanopb and report any other issues I find.

For this issue, do you know when a release will include the fix (in a few weeks, months, ...)? Also, is there any plan to backport the fix to the maintenance_0.3 branch?

@PetteriAimonen
Member

Yeah, I'm working on a release; you can expect it today or tomorrow. And yes, I will backport the fix and the test to 0.3 as well.

PetteriAimonen added two more commits that referenced this issue Nov 25, 2020. Their messages repeat the test-case commit message and the memory-leak fix commit message above; the repeated fix message ends after "the field is memset() only when `which_field` contains a different tag" and omits the sentences about moving the default-value setup and extending the test cases.
@PetteriAimonen
Member

The fix is now released in 0.4.4 and 0.3.9.7.
