-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throw exception on null characters #65
Conversation
e9016ac
to
69feaee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic, thanks for finding out this bug and fixing! I only have a comment on the algorithm performance 👇🏽
c_src/exml.cpp
Outdated
@@ -480,6 +481,13 @@ static ERL_NIF_TERM parse_next(ErlNifEnv *env, int argc, | |||
if (!parser->copy_buffer(env, argv[1])) | |||
return enif_make_badarg(env); | |||
|
|||
// Raise an exception when null character is found. | |||
const char *data = reinterpret_cast<const char*>(&Parser::buffer[0]); | |||
if (std::strlen(data) != Parser::buffer.size() - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it is correct, the problem is that std:strlen
is an O(n) operation. Decoding xml is already O(n) so this would mean we're scanning the string twice. I'd prefer here introduce the check on the scan that is already happening, I introduced something similar here: #62 (I'm wondering if this bug now is somehow related to those last changes of mine 🤔 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, this seems more optimal, specially that in the most common case result.rest
will be empty, so std::strlen
will be a noop, perfect 👌🏽
exml_stream
doesn't validate null characters correctly. Here is an example:Any further data passed via
exml_stream:parse/2
to the parser won't be processed as well because rapidxml works on null-terminated strings and treats the whole input as empty. The similar issue happens when null character is in the middle of input (other exception is thrown but it haseof
flag as well).I implemented the fix that checks (in advance) whether the buffer contains null character. If so, it throws the error immediately. This way we can make sure that input received from Erlang will be valid in term of processing.