Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuzz: improve coverage #797

Merged
merged 1 commit into from Jan 10, 2024
Merged

fuzz: improve coverage #797

merged 1 commit into from Jan 10, 2024

Conversation

catenacyber
Copy link
Contributor

Using other parsers comes from https://introspector.oss-fuzz.com/project-profile?project=expat

Another way to achieve so would be to create multiple targets based on the parser

No bugs found in a few minutes from oss-fuzz public corpus, but coverage grows and reaches functions like doIgnoreSection

Copy link
Member

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @catenacyber,

I'll just do a braindump for a quicker reply:

  • I get and like the general idea 👍
  • 🔴 clang-format CI doesn't like the formatting. Script apply-clang-format.sh can help.
  • 🔴 This is now creating four parsers instead of one but still only freeing a single one. It likely needs three more calls to XML_parserFree but it is freeing five times. Let's no free inside ParseOneInput but in LLVMFuzzerTestOneInput and do it matching four times, please.
  • It is unclear to me how this approach has more chances of reaching doIgnoreSection than status quo. Has that really changed or is it luck in fuzzing entropy?
  • The parser variable names are not ideal.
  • Pull request Fuzzer #617 is related, needs work, haven't gotten to taking it over, yet.
  • Issue Suggest fuzzer for XML_Parse() and XML_GetCurrentLineNumber(). #660 is related, unclear.
  • (We have similar code in fuzz/xml_parsebuffer_fuzzer.c that could benefinit from the same approach, but let's only duplicate things once we have one side right, to not waste your time.)
  • (I'll need to re-check if CI even builds (not runs) the fuzzing code as of now. If not, that's something I should make the CI do.)

Best, Sebastian

@catenacyber
Copy link
Contributor Author

Thanks for the quick feedback Sebastian

  • I get and like the general idea 👍

Cool :-)

Trying git clang-format HEAD~

  • 🔴 This is now creating four parsers instead of one but still only freeing a single one. It likely needs three more calls to XML_parserFree but it is freeing five times. Let's no free inside ParseOneInput but in LLVMFuzzerTestOneInput and do it matching four times, please.

Ok for moving XML_parserFree calls to LLVMFuzzerTestOneInput

  • It is unclear to me how this approach has more chances of reaching doIgnoreSection than status quo. Has that really changed or is it luck in fuzzing entropy?

I do not know this doIgnoreSection function but

#22862	NEW    cov: 5160 ft: 22618 corp: 10661/39Mb lim: 1048576 exec/s: 170 rss: 804Mb L: 8/1048576 MS: 1 ChangeByte-
	NEW_FUNC[1/4]: 0x58da90 in doIgnoreSection /src/expat/expat/lib/xmlparse.c:4194
	NEW_FUNC[2/4]: 0x58e680 in ignoreSectionProcessor /src/expat/expat/lib/xmlparse.c:4176
  • The parser variable names are not ideal.

Renaming pparent to entParser as used in tests code, and pparent2 to ext_parser
Any other suggestions for naming ?

  • Pull request Fuzzer #617 is related, needs work, haven't gotten to taking it over, yet.

This looks a good idea to test API completeness. You can check https://github.com/catenacyber/ngolo-fuzzing that does a similar thing against golang.
And also good for testing the callbacks...
For fail_allocations, you can try https://github.com/catenacyber/nallocfuzz ;-)
But I would say it is beyond the scope of this PR. Shame it is stale...

The only thing I see there is calling XML_GetCurrentLineNumber and XML_ErrorString
I do not think there are many bugs to find in these...

  • (We have similar code in fuzz/xml_parsebuffer_fuzzer.c that could benefinit from the same approach, but let's only duplicate things once we have one side right, to not waste your time.)

Ok, let me know if I need to change there as well...

  • (I'll need to re-check if CI even builds (not runs) the fuzzing code as of now. If not, that's something I should make the CI do.)

Should I add CIFuzz ? cf https://google.github.io/oss-fuzz/getting-started/continuous-integration/ as easy as adding a cifuzz.yml file in .github repo like https://github.com/OISF/libhtp/pull/374/files (just changing to use expat oss-fuzz project)

@catenacyber
Copy link
Contributor Author

I pushed additional commits to fix each comment.
Let me know if I should squash them all together

Copy link
Member

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying git clang-format HEAD~

@catenacyber that's an interesting pointer, thanks!

  • This is now creating four parsers instead of one but still only freeing a single one. It likely needs three more calls to XML_parserFree but it is freeing five times. Let's no free inside ParseOneInput but in LLVMFuzzerTestOneInput and do it matching four times, please.

Ok for moving XML_parserFree calls to LLVMFuzzerTestOneInput

Thanks!

  • The parser variable names are not ideal.

Renaming pparent to entParser as used in tests code, and pparent2 to ext_parser Any other suggestions for naming ?

The the name is still quite inconsistent, e.g. one uses snake case and the other camel case. I'd be good with weither of these two…

  • parentParser, namespaceParser, externalEntityParser, externalDtdParser
  • p1, p2, p3, p4

…or some consistent variation of the former. Would that work for you?

  • Pull request Fuzzer #617 is related, needs work, haven't gotten to taking it over, yet.

This looks a good idea to test API completeness. You can check https://github.com/catenacyber/ngolo-fuzzing that does a similar thing against golang. And also good for testing the callbacks... For fail_allocations, you can try https://github.com/catenacyber/nallocfuzz ;-) But I would say it is beyond the scope of this PR. Shame it is stale...

I have plans to finish it myself but the build system part needs a re-write and there are more pressing priorities before that.
Thanks for the two links.

The only thing I see there is calling XML_GetCurrentLineNumber and XML_ErrorString I do not think there are many bugs to find in these...

XML_GetCurrentLineNumber does more than one would expect: issue #317 is an example of a security bug in it from the past. XML_ErrorString is trivial indeed.

  • (I'll need to re-check if CI even builds (not runs) the fuzzing code as of now. If not, that's something I should make the CI do.)

Should I add CIFuzz ? cf https://google.github.io/oss-fuzz/getting-started/continuous-integration/ as easy as adding a cifuzz.yml file in .github repo like https://github.com/OISF/libhtp/pull/374/files (just changing to use expat oss-fuzz project)

No, CIFuzz is quite broken, my decision against use of CIFuzz for Expat is documented at #677 (comment) . CIFuzz does not take this seriously, as can be seen at google/oss-fuzz#10504 . That's the "printable version" of how I feel about it.

Good to learn through https://github.com/libexpat/libexpat/actions/runs/7458547770/job/20306703388?pr=797#step:6:386 that the CI already does cover building that code.

I pushed additional commits to fix each comment. Let me know if I should squash them all together

Squashing would be welcome here, yes please 🙏

XML_Parse(p, (const XML_Char *)data, size, 1);
if (XML_Parse(p, (const XML_Char *)data, size, 1) == XML_STATUS_ERROR) {
XML_ErrorString(XML_GetErrorCode(p));
XML_GetCurrentLineNumber(parser);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parser would need to be p here to work.

Copy link
Member

@hartwork hartwork Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: Also, we should be able to call XML_GetCurrentLineNumber(p); in the non-error case also. Let's not limit it to the error case then, for more coverage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, fixing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, it showed CI tests the fuzzers build ;-)

Comment on lines 88 to 90
XML_ParserFree(p);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have the empty line before that special (parent parser) free, not after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@catenacyber
Copy link
Contributor Author

  • parentParser, namespaceParser, externalEntityParser, externalDtdParser
    …or some consistent variation of the former. Would that work for you?

Looks good

  • Pull request Fuzzer #617 is related, needs work, haven't gotten to taking it over, yet.

This looks a good idea to test API completeness. You can check https://github.com/catenacyber/ngolo-fuzzing that does a similar thing against golang. And also good for testing the callbacks... For fail_allocations, you can try https://github.com/catenacyber/nallocfuzz ;-) But I would say it is beyond the scope of this PR. Shame it is stale...

I have plans to finish it myself but the build system part needs a re-write and there are more pressing priorities before that. Thanks for the two links.

Leaving to you then, and not in this PR

The only thing I see there is calling XML_GetCurrentLineNumber and XML_ErrorString I do not think there are many bugs to find in these...

XML_GetCurrentLineNumber does more than one would expect: issue #317 is an example of a security bug in it from the past.

Oh...

  • (I'll need to re-check if CI even builds (not runs) the fuzzing code as of now. If not, that's something I should make the CI do.)

Should I add CIFuzz ? cf https://google.github.io/oss-fuzz/getting-started/continuous-integration/ as easy as adding a cifuzz.yml file in .github repo like https://github.com/OISF/libhtp/pull/374/files (just changing to use expat oss-fuzz project)

No, CIFuzz is quite broken, my decision against use of CIFuzz for Expat is documented at #677 (comment) . CIFuzz does not take this seriously, as can be seen at google/oss-fuzz#10504 . That's the "printable version" of how I feel about it.

So you look like you know CIFuzz better than me...

Good to learn through https://github.com/libexpat/libexpat/actions/runs/7458547770/job/20306703388?pr=797#step:6:386 that the CI already does cover building that code.

I pushed additional commits to fix each comment. Let me know if I should squash them all together

Squashing would be welcome here, yes please 🙏

Done

Copy link
Member

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@catenacyber thanks for the adjustments!

We can merge as-is (assuming the CI will be all-green soon) OR first get xml_parsebuffer_fuzzer.c closer together, as you like. I guess the ideall diff between the two would be nothing but one calling XML_ParseBuffer and the other XML_ParseBuffer?

For the current diff:

# diff -u fuzz/*.c 
--- fuzz/xml_parsebuffer_fuzzer.c       2022-02-21 15:46:39.960516443 +0100
+++ fuzz/xml_parse_fuzzer.c     2022-02-21 15:46:39.960516443 +0100
@@ -16,7 +16,6 @@
 
 #include <assert.h>
 #include <stdint.h>
-#include <string.h>
 
 #include "expat.h"
 #include "siphash.h"
@@ -50,22 +49,16 @@
 
 int
 LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
-  if (size == 0)
-    return 0;
-
   XML_Parser p = XML_ParserCreate(xstr(ENCODING_FOR_FUZZING));
   assert(p);
-  XML_SetElementHandler(p, start, end);
 
   // Set the hash salt using siphash to generate a deterministic hash.
   struct sipkey *key = sip_keyof(hash_key);
   XML_SetHashSalt(p, (unsigned long)siphash24(data, size, key));
 
-  void *buf = XML_GetBuffer(p, size);
-  assert(buf);
-
-  memcpy(buf, data, size);
-  XML_ParseBuffer(p, size, size == 0);
+  XML_SetElementHandler(p, start, end);
+  XML_Parse(p, (const XML_Char *)data, size, 0);
+  XML_Parse(p, (const XML_Char *)data, size, 1);
   XML_ParserFree(p);
   return 0;
 }

What do you think?

@catenacyber
Copy link
Contributor Author

Pushed the changes to the parse buffer version...

Comment on lines 63 to 65
XML_ParseBuffer(p, size, 0);
if (XML_ParseBuffer(p, size, 1) == XML_STATUS_ERROR) {
Copy link
Member

@hartwork hartwork Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In between these two calls to XML_ParseBuffer we would need to add something like:

buf = XML_GetBuffer(p, size);
assert(buf);
memcpy(buf, data, size);

Because the second call to XML_ParseBuffer starts off after how far the first call got.

Also: Can we move the XML_SetElementHandler call right before the first XML_ParseBuffer to reduce the cross-file diff?

PS: For the cross-file diff, I did:

cd "$(mktemp -d)"
git clone --depth 1 --branch fuzzcov https://github.com/catenacyber/libexpat
diff -u libexpat/expat/fuzz/*.c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and pushed :-)

@hartwork
Copy link
Member

hartwork commented Jan 9, 2024

Pushed the changes to the parse buffer version...

@catenacyber thanks! 👍

Copy link
Member

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and pushed :-)

@catenacyber thank you! 👍

I'm expecting CI to soon be all-green. In that case I will merge.

@hartwork hartwork linked an issue Jan 10, 2024 that may be closed by this pull request
@hartwork hartwork added this to the 2.6.0 milestone Jan 10, 2024
@hartwork hartwork merged commit 716fd10 into libexpat:master Jan 10, 2024
33 checks passed
@hartwork hartwork mentioned this pull request Jan 10, 2024
35 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggest fuzzer for XML_Parse() and XML_GetCurrentLineNumber().
2 participants