OSS-Fuzz/ClusterFuzz finding 66812 #839
Comments
...which made for a very confusing "regression" in CI runs for an innocent PR. :) |
Indeed. I had to disable the fuzzing regression testing workflow temporarily because of that. Would be great to understand if that leak was human- or machine-made. |
…arser

When parsing DTD content with code like ..

  XML_Parser parser = XML_ParserCreate(NULL);
  XML_Parser ext_parser = XML_ExternalEntityParserCreate(parser, NULL, NULL);
  enum XML_Status status = XML_Parse(ext_parser, doc, (int)strlen(doc), XML_TRUE);

.. there are 0 bytes accounted as direct input and all input from `doc` accounted as indirect input. Now function accountingGetCurrentAmplification cannot calculate the current amplification ratio as "(direct + indirect) / direct", and it did refuse to divide by 0 as one would expect, but it returned 1.0 for this case to indicate no amplification over direct input. As a result, billion laughs attacks from DTD-only input were not detected with this isolated way of using an external parser.

The new approach is to assume direct input of length not 0 but 22 -- derived from ghost input "<!ENTITY a SYSTEM 'b'>", the shortest possible way to include an external DTD --, and do the usual "(direct + indirect) / direct" math with "direct := 22".

GitHub issue #839 has more details on this issue and its origin in ClusterFuzz finding 66812.
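To make the arithmetic in the commit message concrete, here is a minimal sketch of the ratio with and without the 22-byte ghost input. It is not Expat's actual `accountingGetCurrentAmplification`; the function name `sketch_amplification`, the macro `SKETCH_GHOST_DIRECT_BYTES`, and the `is_isolated_external_parser` flag are hypothetical names introduced for illustration only.

```c
#include <assert.h>

/* Hypothetical constant: length of the ghost input named in the commit
   message, i.e. 22 bytes (sizeof counts the trailing NUL, hence "- 1"). */
#define SKETCH_GHOST_DIRECT_BYTES (sizeof("<!ENTITY a SYSTEM 'b'>") - 1)

/* Sketch only, NOT Expat's implementation: computes
   (direct + indirect) / direct, substituting the ghost length for a
   direct count of 0 when the parser is an isolated external parser. */
double
sketch_amplification(unsigned long long direct, unsigned long long indirect,
                     int is_isolated_external_parser) {
  if (is_isolated_external_parser && direct == 0) {
    direct = SKETCH_GHOST_DIRECT_BYTES; /* "direct := 22" */
  }
  if (direct == 0) {
    return 1.0; /* old behaviour: refuse to divide by 0, report 1.0 */
  }
  assert(direct != 0);
  return (double)(direct + indirect) / (double)direct;
}
```

Under these assumptions, 2200 bytes of DTD-only input yield a ratio of (22 + 2200) / 22 = 101 for an isolated external parser, whereas the old behaviour would have reported 1.0 regardless of how large the indirect input grew.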
|
Thanks for the great report
I do not think any human intervened. |
…er-entity-recursion Reject direct parameter entity recursion (part of #839)
FYI I re-enabled that workflow after merge of #841 now. |
…ed-external-parser Prevent billion laughs attacks in isolated external parser (part of #839)
|
Sorry to bother you, I'm a bit confused about these two test cases. I can't see any difference between these two cases, why do they have different results? |
|
I believe this is what @hartwork was explaining with this quote above:
The full sentence makes it a little clearer:
So, the test added in #841 says:
|
|
Oh, I understand now. Thank you for your explanation. |
|
@Snild-Sony well said, thank you! |
Issue
This is the public side of soon-public (access protected) oss-fuzz Expat finding:
Issue 66812: expat:xml_parse_fuzzer_UTF-16: Timeout in xml_parse_fuzzer_UTF-16.
The three key (access protected) contained links are:
The reproducer (original attached as file clusterfuzz-testcase-minimized-xml_parse_fuzzer_UTF-16-5187173185814528-timeout-original.xml.txt) is essentially this two-file case:
File one.xml:

File two.dtd:

The original's SHA256 sum is 6b870c78cff9efe41f1060277f434da98b9f78e9159baf4cb95f034e890ac087.

The regression link effectively links to be47f6d...716fd10 (which makes good sense since these changes increase fuzzing coverage).
Analysis
ClusterFuzz uncovered two things at once here:
- Direct recursion (`a -> a`, in contrast to indirect recursion `a -> b -> a`) of parameter entities (reference syntax `%name;`) was previously not detected in the external subset (file `two.dtd` in the example above), but it is forbidden by the XML spec and was also causing undefined behavior at runtime.
- Isolated use of an external parser (created with `XML_ExternalEntityParserCreate`) caused a timeout, revealing that due to the lack of any direct input bytes from the parent parser, the amplification ratio calculated by `accountingGetCurrentAmplification` was constantly reported as 1.0 and hence had little chance of stopping billion laughs attacks in practice. Pull request #842, "[CVE-2024-28757] Prevent billion laughs attacks in isolated external parser (part of #839)", addresses that problem.

Two related side notes:
- ClusterFuzz leaked part of this finding (the recursion aspect) to the public corpus a few days ago as case f9b6ba558667913f4554395e039c01f6d8217b43 that later disappeared from the public corpus again.
- Statement…
  …in section 4.2 Entity Declarations of the XML spec is worth noting, emphasis mine.
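The difference between direct (`a -> a`) and indirect (`a -> b -> a`) recursion from the analysis can be illustrated with a small sketch. It is not Expat's code: the `SketchEntity` type and the functions `sketch_find` and `sketch_expand_ok` are hypothetical, and the approach (marking an entity as open while it is being expanded) is just one common way to catch both cases. Unknown references are silently skipped here, which a real parser would not do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified parameter entity; not Expat's ENTITY struct. */
typedef struct {
  const char *name;
  const char *value; /* replacement text, may contain %name; references */
  int open;          /* non-zero while this entity is being expanded */
} SketchEntity;

/* Linear lookup of an entity by a name that is not NUL-terminated. */
SketchEntity *
sketch_find(SketchEntity *table, size_t count, const char *name,
            size_t name_len) {
  for (size_t i = 0; i < count; i++) {
    if (strlen(table[i].name) == name_len
        && strncmp(table[i].name, name, name_len) == 0) {
      return &table[i];
    }
  }
  return NULL;
}

/* Returns 1 if expanding `entity` never re-enters an entity that is
   already open, and 0 if direct or indirect recursion is found. */
int
sketch_expand_ok(SketchEntity *table, size_t count, SketchEntity *entity) {
  if (entity->open) {
    return 0; /* referenced while still being expanded: recursion */
  }
  entity->open = 1;
  int ok = 1;
  const char *p = entity->value;
  while (ok && (p = strchr(p, '%')) != NULL) {
    const char *end = strchr(p + 1, ';');
    if (end == NULL) {
      break; /* no complete %name; reference left */
    }
    SketchEntity *ref
        = sketch_find(table, count, p + 1, (size_t)(end - (p + 1)));
    if (ref != NULL) {
      ok = sketch_expand_ok(table, count, ref);
    }
    p = end + 1;
  }
  entity->open = 0;
  return ok;
}
```

With a table holding a single entity `a` whose value is `%a;`, `sketch_expand_ok` returns 0 on the first nested reference. The same holds for the pair `a = "%b;"`, `b = "%a;"`, while a chain that ends in plain text returns 1.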
CC @catenacyber @RMJ10 @Snild-Sony