Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemorySanitizer: use-of-uninitialized-value with parser::unescape() #1965

Closed
greenrobot opened this issue Mar 9, 2023 · 3 comments
Closed

Comments

@greenrobot
Copy link

==6334==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0xbb30e1 in simdjson::haswell::(anonymous namespace)::trailing_zeroes(unsigned long) /builds/objectbox/objectbox/cbuild/RelWithDebInfo-msan/objectbox/src/main/cpp/external/simdjson/../../../../../../../../objectbox/src/main/cpp/external/simdjson/simdjson.h:16154:10
    #1 0xbb30e1 in simdjson::haswell::(anonymous namespace)::backslash_and_quote::quote_index() /builds/objectbox/objectbox/cbuild/RelWithDebInfo-msan/objectbox/src/main/cpp/external/simdjson/../../../../../../../../objectbox/src/main/cpp/external/simdjson/simdjson.h:16793:46
    #2 0xbb30e1 in simdjson::haswell::(anonymous namespace)::stringparsing::parse_string(unsigned char const*, unsigned char*, bool) /builds/objectbox/objectbox/cbuild/RelWithDebInfo-msan/objectbox/src/main/cpp/external/simdjson/../../../../../../../../objectbox/src/main/cpp/external/simdjson/simdjson.cpp:10779:29
    #3 0xbb30e1 in simdjson::haswell::dom_parser_implementation::parse_string(unsigned char const*, unsigned char*, bool) const /builds/objectbox/objectbox/cbuild/RelWithDebInfo-msan/objectbox/src/main/cpp/external/simdjson/../../../../../../../../objectbox/src/main/cpp/external/simdjson/simdjson.cpp:11708:10
    #4 0x5f22f2 in simdjson::fallback::ondemand::parser::unescape(simdjson::fallback::ondemand::raw_json_string, unsigned char*&, bool) const /builds/objectbox/objectbox/cbuild/RelWithDebInfo-msan/objectbox/src/main/cpp/server/../../../../../../../objectbox/src/main/cpp/external/simdjson/simdjson.h:31112:34
    #5 0x5f22f2 in objectbox::server::GraphQLJson::extractGraphQLStringFromJson(objectbox::Bytes&, unsigned long) /builds/objectbox/objectbox/cbuild/RelWithDebInfo-msan/objectbox/src/main/cpp/server/../../../../../../../objectbox/src/main/cpp/server/graphql/GraphQLJson.cpp:57:24

To Reproduce

Note: error handling code was removed for brevity:

// Clear the padding to avoid "MemorySanitizer: use-of-uninitialized-value":
// memset(buffer.bytesMutable() + bodySize, 0, simdjson::SIMDJSON_PADDING);

simdjson::padded_string_view jsonStringView(buffer.bytes(), bodySize, buffer.size());
thread_local simdjson::ondemand::parser parser;
simdjson::ondemand::document jsonDoc = parser.iterate(jsonStringView);
simdjson::ondemand::object root;
simdjson::error_code errorCode = jsonDoc.get_object().get(root);
simdjson::ondemand::value queryField;
errorCode = root.find_field("query").get(queryField);

// Note: not using get_string as we want to re-use the (dst) buffer and append a zero byte
simdjson::ondemand::raw_json_string rawJsonString;
errorCode = queryField.get_raw_json_string().get(rawJsonString);

std::string_view graphQLString;
errorCode = parser.unescape(rawJsonString, dst).get(graphQLString);

The last line triggers the msan issue. The commented out memset in the second line prevents it.

simjson 3.1.3 (amalgamation)

Configuration

  • OS: Linux, x64
  • Compiler clang 14
@lemire
Copy link
Member

lemire commented Mar 9, 2023

Thanks for the report. For efficiency reasons, simdjson requires a string with a few bytes (simdjson::SIMDJSON_PADDING) at the end, these bytes may be read but their content does not affect the parsing. In practice, it means that the JSON inputs should be stored in a memory region with simdjson::SIMDJSON_PADDING extra bytes at the end. You do not have to set these bytes to specific values though you may want to if you want to avoid runtime warnings with some sanitizers.

The warning from the sanitizer is a false report in this instance. It does not indicate a bug.

@lemire lemire closed this as completed Mar 9, 2023
@lemire
Copy link
Member

lemire commented Mar 9, 2023

I have closed this issue. If the reporter wants to use a memory sanitizer and they want to avoid false reports, they may want to initialize the padding bytes. The padding bytes are definitively read, by design.

@greenrobot
Copy link
Author

greenrobot commented Mar 9, 2023

Thanks for making your point of view clear. Our team already figured out the solution and thus we're fine as it. For future users, maybe consider at least mentioning this behavior in the docs to save you from further support efforts? Also, it takes a while for others to pinpoint the issue, so I think it would be kind to your lib users to initialize those 64 bytes while technically it's not a bug.

PS.: Thanks for an awesome lib and maintaining it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants