Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Migrate more tests to variable chunk size parsing #787

Merged
merged 3 commits into from
Nov 13, 2023

Conversation

hartwork
Copy link
Member

@hartwork hartwork commented Nov 7, 2023

Follow up to #755 in some sense

CC @Snild-Sony

@hartwork hartwork added this to the 2.6.0 milestone Nov 7, 2023
Copy link
Contributor

@Snild-Sony Snild-Sony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If the tests pass, go for it.

@hartwork hartwork force-pushed the tests-more-chunk-size-coverage branch 2 times, most recently from 388c53c to d275fe5 Compare November 7, 2023 14:57
@hartwork
Copy link
Member Author

hartwork commented Nov 7, 2023

@Snild-Sony a few are left (that would need more work) and a fix like bae3069 just turned out needed to plug undefined behavior. My goal was to go as far as cheaply possible for the moment.

Copy link
Contributor

@Snild-Sony Snild-Sony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment; the current version is also fine.

@@ -185,14 +185,16 @@ _XML_Parse_SINGLE_BYTES(XML_Parser parser, const char *s, int len,
if (chunksize > 0) {
// parse in chunks of `chunksize` bytes as long as possible
for (; offset + chunksize < len; offset += chunksize) {
enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE);
const char *const start = (s == NULL) ? NULL : (s + offset);
enum XML_Status res = XML_Parse(parser, start, chunksize, XML_FALSE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be simpler to just move s forward after each parse: s += chunksize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Snild-Sony addressed in more detail at 7f0120d#r131929667 , adjusted.

@hartwork hartwork force-pushed the tests-more-chunk-size-coverage branch from d275fe5 to 2b2b4ab Compare November 7, 2023 15:30
@hartwork hartwork mentioned this pull request Nov 7, 2023
35 tasks
for (; offset + chunksize < len; offset += chunksize) {
enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE);
for (; chunksize <= len;
len -= chunksize, s = (s != NULL) ? (s + chunksize) : NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about NULL+0, right?).

(also, len is known to be non-zero, and if someone sends in s=NULL and len>0, they deserve the UB error.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about NULL+0, right?).

@Snild-Sony my patch was done with the deliberate "requirement" (or assumption) in mind to make _XML_Parse_SINGLE_BYTES support the full interface of XML_Parse but with chunking support added — a drop-in replacement — so that eventually we can replace all calls to XML_Parse by _XML_Parse_SINGLE_BYTES except those two inside _XML_Parse_SINGLE_BYTES. The tests in test_empty_parse below are the ones that challenge that requirement and triggered undefined behavior:

START_TEST(test_empty_parse) {
const char *text = "<doc></doc>";
const char *partial = "<doc>";
if (XML_Parse(g_parser, NULL, 0, XML_FALSE) == XML_STATUS_ERROR)
fail("Parsing empty string faulted");
if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR)
fail("Parsing final empty string not faulted");
if (XML_GetErrorCode(g_parser) != XML_ERROR_NO_ELEMENTS)
fail("Parsing final empty string faulted for wrong reason");
/* Now try with valid text before the empty end */
XML_ParserReset(g_parser, NULL);
if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_FALSE)
== XML_STATUS_ERROR)
xml_failure(g_parser);
if (XML_Parse(g_parser, NULL, 0, XML_TRUE) == XML_STATUS_ERROR)
fail("Parsing final empty string faulted");
/* Now try with invalid text before the empty end */
XML_ParserReset(g_parser, NULL);
if (_XML_Parse_SINGLE_BYTES(g_parser, partial, (int)strlen(partial),
XML_FALSE)
== XML_STATUS_ERROR)
xml_failure(g_parser);
if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR)
fail("Parsing final incomplete empty string not faulted");
}
END_TEST

And these include NULL. Of course we could exclude those cases but I would vote for making _XML_Parse_SINGLE_BYTES robust enough to handle all and then we don't need to check again for leftovers or document why particular calls are exempt. But maybe that's just my preference.

If you only concern left is to resolve (s != NULL) ? x : y to x then I would want to vote to leave that part in and go forward. One alternative would be to revert the whole patch to _XML_Parse_SINGLE_BYTES and make test test_empty_parse exercise plain XML_Parse. Both can be argued for.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about NULL+0, right?).

@Snild-Sony my patch was done with the deliberate "requirement" (or assumption) in mind to make _XML_Parse_SINGLE_BYTES support the full interface of XML_Parse but with chunking support added — a drop-in replacement — so that eventually we can replace all calls to XML_Parse by _XML_Parse_SINGLE_BYTES

I was under the impression that we were only trying to avoid adding 0, since that was the specific error we had. My bad.

However, my other point about len=0 making sure we don't reach that code still stands:

The tests in test_empty_parse below are the ones that challenge that requirement and triggered undefined behavior: [...] And these include NULL.

But all of those also have len=0, which means the execution would not reach the line of code that we're discussing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you only concern left is to resolve (s != NULL) ? x : y to x

It is.

then I would want to vote to leave that part in and go forward.

Making that choice is your privilege and your burden as maintainer. My words hold no power here. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you only concern left is to resolve (s != NULL) ? x : y to x

It is.

@Snild-Sony I think I may have found (and pushed) a compromise that gets rid of that and still doesn't do undefined behavior:
https://github.com/libexpat/libexpat/compare/421551302a37ddbdf7874c877070df2fa4d79567..6b5691bcf8b93668d8733aa3e99510be6e717a3b
Any better? What do you think?

then I would want to vote to leave that part in and go forward.

Making that choice is your privilege and your burden as maintainer. My words hold no power here. :)

Let's try with consensus first. Maybe we're already close to consensus now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That part looks good now, thanks.

@hartwork hartwork force-pushed the tests-more-chunk-size-coverage branch 2 times, most recently from 4215513 to 6b5691b Compare November 8, 2023 16:23
if (chunksize > 0) {
// parse in chunks of `chunksize` bytes as long as possible
for (; offset + chunksize < len; offset += chunksize) {
enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE);
for (; chunksize <= len; len -= chunksize, s += chunksize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this previously:

Why chunksize <= len? It means the XML_Parse() at line 206 will have len=0 if the len arg is divisible by chunksize, essentially adding an extra XML_Parse() call that was not previously there.

Note that for chunksize 1, len will always be divisible, so we'll never make a len=1 isFinal=1 call, just some amount of len=1 isFinal=0 and the concluding len=0 isFinal=1.

It probably doesn't matter in practice for now, but could potentially help hide problems in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Snild-Sony nice catches, maybe that's the last drop needed to reveal this is the wrong approach altogether. Let's flip this around and make _XML_Parse_SINGLE_BYTES reject pathological input like s == NULL loudly so that the tests in test_empty_parse have to resort to plain XML_Parse and keep testing the raw actual XML_Parse API. Pushing in a second…

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted it to be chunksize < len instead, then it would've been perfect... 😭

Copy link
Member Author

@hartwork hartwork Nov 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Snild-Sony I have re-added a refactoring of _XML_Parse_SINGLE_BYTES using chunksize < len as a dedicated commit just for you now 😃 The assertion still blocks pathological input altogether.

Copy link
Member Author

@hartwork hartwork Nov 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Snild-Sony PS: I should note that GitHub web UI is behind in displayed pushed changes at the moment. Branch https://github.com/libexpat/libexpat/commits/tests-more-chunk-size-coverage has the commit but the pull request does not show it 😱 PPS: As with all status pages, https://www.githubstatus.com/ is all green, lovely.

expat/tests/common.c Outdated Show resolved Hide resolved
@hartwork hartwork force-pushed the tests-more-chunk-size-coverage branch from 6b5691b to 225fd93 Compare November 9, 2023 19:15
Co-authored-by: Snild Dolkow <snild@sony.com>
@hartwork
Copy link
Member Author

@Snild-Sony thanks for the review!

@hartwork hartwork merged commit 3588720 into master Nov 13, 2023
64 checks passed
@hartwork hartwork deleted the tests-more-chunk-size-coverage branch November 13, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants