tests: Migrate more tests to variable chunk size parsing #787

hartwork · 2023-11-07T14:42:57Z

Follow up to #755 in some sense

Snild-Sony

LGTM. If the tests pass, go for it.

hartwork · 2023-11-07T14:59:14Z

@Snild-Sony a few are left (that would need more work) and a fix like bae3069 just turned out needed to plug undefined behavior. My goal was to go as far as cheaply possible for the moment.

Snild-Sony

Minor comment; the current version is also fine.

Snild-Sony · 2023-11-07T15:10:59Z

expat/tests/common.c

@@ -185,14 +185,16 @@ _XML_Parse_SINGLE_BYTES(XML_Parser parser, const char *s, int len,
  if (chunksize > 0) {
    // parse in chunks of `chunksize` bytes as long as possible
    for (; offset + chunksize < len; offset += chunksize) {
-      enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE);
+      const char *const start = (s == NULL) ? NULL : (s + offset);
+      enum XML_Status res = XML_Parse(parser, start, chunksize, XML_FALSE);


It might be simpler to just move s forward after each parse: s += chunksize.

@Snild-Sony addressed in more detail at 7f0120d#r131929667 , adjusted.

Snild-Sony · 2023-11-08T07:29:21Z

expat/tests/common.c

-    for (; offset + chunksize < len; offset += chunksize) {
-      enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE);
+    for (; chunksize <= len;
+         len -= chunksize, s = (s != NULL) ? (s + chunksize) : NULL) {


The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about NULL+0, right?).

(also, len is known to be non-zero, and if someone sends in s=NULL and len>0, they deserve the UB error.)

The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about NULL+0, right?).

@Snild-Sony my patch was done with the deliberate "requirement" (or assumption) in mind to make _XML_Parse_SINGLE_BYTES support the full interface of XML_Parse but with chunking support added — a drop-in replacement — so that eventually we can replace all calls to XML_Parse by _XML_Parse_SINGLE_BYTES except those two inside _XML_Parse_SINGLE_BYTES. The tests in test_empty_parse below are the ones that challenge that requirement and triggered undefined behavior:

libexpat/expat/tests/basic_tests.c

Lines 2730 to 2758 in 3a0c5d6

START_TEST(test_empty_parse) {

const char *text = "<doc></doc>";

const char *partial = "<doc>";

if (XML_Parse(g_parser, NULL, 0, XML_FALSE) == XML_STATUS_ERROR)

fail("Parsing empty string faulted");

if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR)

fail("Parsing final empty string not faulted");

if (XML_GetErrorCode(g_parser) != XML_ERROR_NO_ELEMENTS)

fail("Parsing final empty string faulted for wrong reason");

/* Now try with valid text before the empty end */

XML_ParserReset(g_parser, NULL);

if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_FALSE)

== XML_STATUS_ERROR)

xml_failure(g_parser);

if (XML_Parse(g_parser, NULL, 0, XML_TRUE) == XML_STATUS_ERROR)

fail("Parsing final empty string faulted");

/* Now try with invalid text before the empty end */

XML_ParserReset(g_parser, NULL);

if (_XML_Parse_SINGLE_BYTES(g_parser, partial, (int)strlen(partial),

XML_FALSE)

== XML_STATUS_ERROR)

xml_failure(g_parser);

if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR)

fail("Parsing final incomplete empty string not faulted");

}

END_TEST

And these include NULL. Of course we could exclude those cases but I would vote for making _XML_Parse_SINGLE_BYTES robust enough to handle all and then we don't need to check again for leftovers or document why particular calls are exempt. But maybe that's just my preference.

If you only concern left is to resolve (s != NULL) ? x : y to x then I would want to vote to leave that part in and go forward. One alternative would be to revert the whole patch to _XML_Parse_SINGLE_BYTES and make test test_empty_parse exercise plain XML_Parse. Both can be argued for.

What do you think?

The NULL condition is unnecessary: chunksize is known to be non-zero here (we're only concerned about NULL+0, right?).

@Snild-Sony my patch was done with the deliberate "requirement" (or assumption) in mind to make _XML_Parse_SINGLE_BYTES support the full interface of XML_Parse but with chunking support added — a drop-in replacement — so that eventually we can replace all calls to XML_Parse by _XML_Parse_SINGLE_BYTES

I was under the impression that we were only trying to avoid adding 0, since that was the specific error we had. My bad.

However, my other point about len=0 making sure we don't reach that code still stands:

The tests in test_empty_parse below are the ones that challenge that requirement and triggered undefined behavior: [...] And these include NULL.

But all of those also have len=0, which means the execution would not reach the line of code that we're discussing.

If you only concern left is to resolve (s != NULL) ? x : y to x

It is.

then I would want to vote to leave that part in and go forward.

Making that choice is your privilege and your burden as maintainer. My words hold no power here. :)

If you only concern left is to resolve (s != NULL) ? x : y to x

It is.

@Snild-Sony I think I may have found (and pushed) a compromise that gets rid of that and still doesn't do undefined behavior:
https://github.com/libexpat/libexpat/compare/421551302a37ddbdf7874c877070df2fa4d79567..6b5691bcf8b93668d8733aa3e99510be6e717a3b
Any better? What do you think?

then I would want to vote to leave that part in and go forward.

Making that choice is your privilege and your burden as maintainer. My words hold no power here. :)

Let's try with consensus first. Maybe we're already close to consensus now.

That part looks good now, thanks.

Snild-Sony · 2023-11-09T08:34:42Z

expat/tests/common.c

  if (chunksize > 0) {
    // parse in chunks of `chunksize` bytes as long as possible
-    for (; offset + chunksize < len; offset += chunksize) {
-      enum XML_Status res = XML_Parse(parser, s + offset, chunksize, XML_FALSE);
+    for (; chunksize <= len; len -= chunksize, s += chunksize) {


Sorry, I missed this previously:

Why chunksize <= len? It means the XML_Parse() at line 206 will have len=0 if the len arg is divisible by chunksize, essentially adding an extra XML_Parse() call that was not previously there.

Note that for chunksize 1, len will always be divisible, so we'll never make a len=1 isFinal=1 call, just some amount of len=1 isFinal=0 and the concluding len=0 isFinal=1.

It probably doesn't matter in practice for now, but could potentially help hide problems in the future.

@Snild-Sony nice catches, maybe that's the last drop needed to reveal this is the wrong approach altogether. Let's flip this around and make _XML_Parse_SINGLE_BYTES reject pathological input like s == NULL loudly so that the tests in test_empty_parse have to resort to plain XML_Parse and keep testing the raw actual XML_Parse API. Pushing in a second…

I just wanted it to be chunksize < len instead, then it would've been perfect... 😭

@Snild-Sony I have re-added a refactoring of _XML_Parse_SINGLE_BYTES using chunksize < len as a dedicated commit just for you now 😃 The assertion still blocks pathological input altogether.

@Snild-Sony PS: I should note that GitHub web UI is behind in displayed pushed changes at the moment. Branch https://github.com/libexpat/libexpat/commits/tests-more-chunk-size-coverage has the commit but the pull request does not show it 😱 PPS: As with all status pages, https://www.githubstatus.com/ is all green, lovely.

expat/tests/common.c

Co-authored-by: Snild Dolkow <snild@sony.com>

hartwork · 2023-11-13T14:21:01Z

@Snild-Sony thanks for the review!

hartwork added this to the 2.6.0 milestone Nov 7, 2023

hartwork added the enhancement label Nov 7, 2023

Snild-Sony approved these changes Nov 7, 2023

View reviewed changes

hartwork force-pushed the tests-more-chunk-size-coverage branch 2 times, most recently from 388c53c to d275fe5 Compare November 7, 2023 14:57

Snild-Sony reviewed Nov 7, 2023

View reviewed changes

hartwork force-pushed the tests-more-chunk-size-coverage branch from d275fe5 to 2b2b4ab Compare November 7, 2023 15:30

hartwork mentioned this pull request Nov 7, 2023

Prepare release 2.6.0 (part of #775) #776

Merged

35 tasks

Snild-Sony reviewed Nov 8, 2023

View reviewed changes

hartwork force-pushed the tests-more-chunk-size-coverage branch 2 times, most recently from 4215513 to 6b5691b Compare November 8, 2023 16:23

Snild-Sony reviewed Nov 9, 2023

View reviewed changes

hartwork force-pushed the tests-more-chunk-size-coverage branch from 6b5691b to 225fd93 Compare November 9, 2023 19:15

hartwork added 2 commits November 9, 2023 20:26

tests: Make _XML_Parse_SINGLE_BYTES deny use with pathological input

795cf99

tests: Migrate more tests to variable chunk size parsing

3e5f6d6

hartwork force-pushed the tests-more-chunk-size-coverage branch from 225fd93 to 3e5f6d6 Compare November 9, 2023 19:36

hartwork requested a review from Snild-Sony November 9, 2023 19:37

Snild-Sony reviewed Nov 10, 2023

View reviewed changes

expat/tests/common.c Show resolved Hide resolved

tests: Simplify _XML_Parse_SINGLE_BYTES

d2b3176

Co-authored-by: Snild Dolkow <snild@sony.com>

hartwork requested a review from Snild-Sony November 11, 2023 01:21

Snild-Sony approved these changes Nov 13, 2023

View reviewed changes

hartwork merged commit 3588720 into master Nov 13, 2023
64 checks passed

hartwork deleted the tests-more-chunk-size-coverage branch November 13, 2023 14:57

Snild-Sony mentioned this pull request Nov 15, 2023

[CVE-2023-52425] Speed up parsing of big tokens #789

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests: Migrate more tests to variable chunk size parsing #787

tests: Migrate more tests to variable chunk size parsing #787

hartwork commented Nov 7, 2023

Snild-Sony left a comment

hartwork commented Nov 7, 2023

Snild-Sony left a comment

Snild-Sony Nov 7, 2023

hartwork Nov 7, 2023

Snild-Sony Nov 8, 2023

hartwork Nov 8, 2023

Snild-Sony Nov 8, 2023

Snild-Sony Nov 8, 2023

hartwork Nov 8, 2023

Snild-Sony Nov 9, 2023

Snild-Sony Nov 9, 2023

hartwork Nov 9, 2023

Snild-Sony Nov 10, 2023

hartwork Nov 11, 2023 •

edited

hartwork Nov 11, 2023 •

edited

hartwork commented Nov 13, 2023

	START_TEST(test_empty_parse) {
	const char *text = "<doc></doc>";
	const char *partial = "<doc>";

	if (XML_Parse(g_parser, NULL, 0, XML_FALSE) == XML_STATUS_ERROR)
	fail("Parsing empty string faulted");
	if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR)
	fail("Parsing final empty string not faulted");
	if (XML_GetErrorCode(g_parser) != XML_ERROR_NO_ELEMENTS)
	fail("Parsing final empty string faulted for wrong reason");

	/* Now try with valid text before the empty end */
	XML_ParserReset(g_parser, NULL);
	if (_XML_Parse_SINGLE_BYTES(g_parser, text, (int)strlen(text), XML_FALSE)
	== XML_STATUS_ERROR)
	xml_failure(g_parser);
	if (XML_Parse(g_parser, NULL, 0, XML_TRUE) == XML_STATUS_ERROR)
	fail("Parsing final empty string faulted");

	/* Now try with invalid text before the empty end */
	XML_ParserReset(g_parser, NULL);
	if (_XML_Parse_SINGLE_BYTES(g_parser, partial, (int)strlen(partial),
	XML_FALSE)
	== XML_STATUS_ERROR)
	xml_failure(g_parser);
	if (XML_Parse(g_parser, NULL, 0, XML_TRUE) != XML_STATUS_ERROR)
	fail("Parsing final incomplete empty string not faulted");
	}
	END_TEST

tests: Migrate more tests to variable chunk size parsing #787

tests: Migrate more tests to variable chunk size parsing #787

Conversation

hartwork commented Nov 7, 2023

Snild-Sony left a comment

Choose a reason for hiding this comment

hartwork commented Nov 7, 2023

Snild-Sony left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hartwork Nov 11, 2023 • edited

Choose a reason for hiding this comment

hartwork Nov 11, 2023 • edited

Choose a reason for hiding this comment

hartwork commented Nov 13, 2023

hartwork Nov 11, 2023 •

edited

hartwork Nov 11, 2023 •

edited