-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: Migrate more tests to variable chunk size parsing #787
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. If the tests pass, go for it.
388c53c
to
d275fe5
Compare
@Snild-Sony a few are left (that would need more work) and a fix like bae3069 just turned out needed to plug undefined behavior. My goal was to go as far as cheaply possible for the moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment; the current version is also fine.
expat/tests/common.c
Outdated
@@ -185,14 +185,16 @@ _XML_Parse_SINGLE_BYTES(XML_Parser parser, const char *s, int len, | |||
if (chunksize > 0) { | |||
// parse in chunks of `chunksize` bytes as long as possible | |||
for (; offset + chunksize < len; offset += chunksize) { | |||
enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE); | |||
const char *const start = (s == NULL) ? NULL : (s + offset); | |||
enum XML_Status res = XML_Parse(parser, start, chunksize, XML_FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be simpler to just move s forward after each parse: s += chunksize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Snild-Sony addressed in more detail at 7f0120d#r131929667 , adjusted.
d275fe5
to
2b2b4ab
Compare
expat/tests/common.c
Outdated
for (; offset + chunksize < len; offset += chunksize) { | ||
enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE); | ||
for (; chunksize <= len; | ||
len -= chunksize, s = (s != NULL) ? (s + chunksize) : NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about NULL+0
, right?).
(also, len is known to be non-zero, and if someone sends in s=NULL
and len>0
, they deserve the UB error.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about
NULL+0
, right?).
@Snild-Sony my patch was done with the deliberate "requirement" (or assumption) in mind to make _XML_Parse_SINGLE_BYTES
support the full interface of XML_Parse
but with chunking support added — a drop-in replacement — so that eventually we can replace all calls to XML_Parse
by _XML_Parse_SINGLE_BYTES
except those two inside _XML_Parse_SINGLE_BYTES
. The tests in test_empty_parse
below are the ones that challenge that requirement and triggered undefined behavior:
libexpat/expat/tests/basic_tests.c
Lines 2730 to 2758 in 3a0c5d6
START_TEST(test_empty_parse) { | |
const char *text = "<doc></doc>"; | |
const char *partial = "<doc>"; | |
if (XML_Parse(g_parser, NULL, 0, XML_FALSE) == XML_STATUS_ERROR) | |
fail("Parsing empty string faulted"); | |
if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR) | |
fail("Parsing final empty string not faulted"); | |
if (XML_GetErrorCode(g_parser) != XML_ERROR_NO_ELEMENTS) | |
fail("Parsing final empty string faulted for wrong reason"); | |
/* Now try with valid text before the empty end */ | |
XML_ParserReset(g_parser, NULL); | |
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_FALSE) | |
== XML_STATUS_ERROR) | |
xml_failure(g_parser); | |
if (XML_Parse(g_parser, NULL, 0, XML_TRUE) == XML_STATUS_ERROR) | |
fail("Parsing final empty string faulted"); | |
/* Now try with invalid text before the empty end */ | |
XML_ParserReset(g_parser, NULL); | |
if (_XML_Parse_SINGLE_BYTES(g_parser, partial, (int)strlen(partial), | |
XML_FALSE) | |
== XML_STATUS_ERROR) | |
xml_failure(g_parser); | |
if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR) | |
fail("Parsing final incomplete empty string not faulted"); | |
} | |
END_TEST |
And these include NULL
. Of course we could exclude those cases but I would vote for making _XML_Parse_SINGLE_BYTES
robust enough to handle all and then we don't need to check again for leftovers or document why particular calls are exempt. But maybe that's just my preference.
If you only concern left is to resolve (s != NULL) ? x : y
to x
then I would want to vote to leave that part in and go forward. One alternative would be to revert the whole patch to _XML_Parse_SINGLE_BYTES
and make test test_empty_parse
exercise plain XML_Parse
. Both can be argued for.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about
NULL+0
, right?).@Snild-Sony my patch was done with the deliberate "requirement" (or assumption) in mind to make
_XML_Parse_SINGLE_BYTES
support the full interface ofXML_Parse
but with chunking support added — a drop-in replacement — so that eventually we can replace all calls toXML_Parse
by_XML_Parse_SINGLE_BYTES
I was under the impression that we were only trying to avoid adding 0
, since that was the specific error we had. My bad.
However, my other point about len=0 making sure we don't reach that code still stands:
The tests in test_empty_parse below are the ones that challenge that requirement and triggered undefined behavior: [...] And these include NULL.
But all of those also have len=0, which means the execution would not reach the line of code that we're discussing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you only concern left is to resolve
(s != NULL) ? x : y
tox
It is.
then I would want to vote to leave that part in and go forward.
Making that choice is your privilege and your burden as maintainer. My words hold no power here. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you only concern left is to resolve
(s != NULL) ? x : y
tox
It is.
@Snild-Sony I think I may have found (and pushed) a compromise that gets rid of that and still doesn't do undefined behavior:
https://github.com/libexpat/libexpat/compare/421551302a37ddbdf7874c877070df2fa4d79567..6b5691bcf8b93668d8733aa3e99510be6e717a3b
Any better? What do you think?
then I would want to vote to leave that part in and go forward.
Making that choice is your privilege and your burden as maintainer. My words hold no power here. :)
Let's try with consensus first. Maybe we're already close to consensus now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part looks good now, thanks.
4215513
to
6b5691b
Compare
expat/tests/common.c
Outdated
if (chunksize > 0) { | ||
// parse in chunks of `chunksize` bytes as long as possible | ||
for (; offset + chunksize < len; offset += chunksize) { | ||
enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE); | ||
for (; chunksize <= len; len -= chunksize, s += chunksize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed this previously:
Why chunksize <= len
? It means the XML_Parse()
at line 206 will have len=0 if the len
arg is divisible by chunksize
, essentially adding an extra XML_Parse()
call that was not previously there.
Note that for chunksize 1, len will always be divisible, so we'll never make a len=1 isFinal=1
call, just some amount of len=1 isFinal=0
and the concluding len=0 isFinal=1
.
It probably doesn't matter in practice for now, but could potentially help hide problems in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Snild-Sony nice catches, maybe that's the last drop needed to reveal this is the wrong approach altogether. Let's flip this around and make _XML_Parse_SINGLE_BYTES
reject pathological input like s == NULL
loudly so that the tests in test_empty_parse
have to resort to plain XML_Parse
and keep testing the raw actual XML_Parse
API. Pushing in a second…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted it to be chunksize < len
instead, then it would've been perfect... 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Snild-Sony I have re-added a refactoring of _XML_Parse_SINGLE_BYTES
using chunksize < len
as a dedicated commit just for you now 😃 The assertion still blocks pathological input altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Snild-Sony PS: I should note that GitHub web UI is behind in displayed pushed changes at the moment. Branch https://github.com/libexpat/libexpat/commits/tests-more-chunk-size-coverage has the commit but the pull request does not show it 😱 PPS: As with all status pages, https://www.githubstatus.com/ is all green, lovely.
6b5691b
to
225fd93
Compare
225fd93
to
3e5f6d6
Compare
Co-authored-by: Snild Dolkow <snild@sony.com>
@Snild-Sony thanks for the review! |
Follow up to #755 in some sense
CC @Snild-Sony