Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw exception on null characters #65

Merged
merged 3 commits into from
Nov 14, 2022
Merged

Throw exception on null characters #65

merged 3 commits into from
Nov 14, 2022

Conversation

kamilwaz
Copy link

exml_stream doesn't validate null characters correctly. Here is an example:

> {ok, Parser} = exml_stream:new_parser(),
> exml_stream:parse(Parser, <<"\0<xml>">>).
{ok,{parser,#Ref<0.816531939.1357774849.238018>,
            [<<0,60,120,109,108,62>>]},
    []}

Any further data passed via exml_stream:parse/2 to the parser won't be processed as well because rapidxml works on null-terminated strings and treats the whole input as empty. The similar issue happens when null character is in the middle of input (other exception is thrown but it has eof flag as well).

I implemented the fix that checks (in advance) whether the buffer contains null character. If so, it throws the error immediately. This way we can make sure that input received from Erlang will be valid in term of processing.

@kamilwaz kamilwaz force-pushed the fix-null branch 2 times, most recently from e9016ac to 69feaee Compare November 10, 2022 16:02
@kamilwaz kamilwaz marked this pull request as ready for review November 10, 2022 16:26
Copy link

@NelsonVides NelsonVides left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thanks for finding out this bug and fixing! I only have a comment on the algorithm performance 👇🏽

c_src/exml.cpp Outdated
@@ -480,6 +481,13 @@ static ERL_NIF_TERM parse_next(ErlNifEnv *env, int argc,
if (!parser->copy_buffer(env, argv[1]))
return enif_make_badarg(env);

// Raise an exception when null character is found.
const char *data = reinterpret_cast<const char*>(&Parser::buffer[0]);
if (std::strlen(data) != Parser::buffer.size() - 1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is correct, the problem is that std:strlen is an O(n) operation. Decoding xml is already O(n) so this would mean we're scanning the string twice. I'd prefer here introduce the check on the scan that is already happening, I introduced something similar here: #62 (I'm wondering if this bug now is somehow related to those last changes of mine 🤔 )

Copy link

@NelsonVides NelsonVides left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, this seems more optimal, specially that in the most common case result.rest will be empty, so std::strlen will be a noop, perfect 👌🏽

@NelsonVides NelsonVides merged commit 7921d7e into master Nov 14, 2022
@NelsonVides NelsonVides deleted the fix-null branch November 14, 2022 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants