Issue
This is the public side of the soon-public (currently access-protected) OSS-Fuzz finding for Expat:
Issue 66812: expat:xml_parse_fuzzer_UTF-16: Timeout in xml_parse_fuzzer_UTF-16.
The three key links it contains (all access-protected) are:
- Detailed Report: https://oss-fuzz.com/testcase?key=5187173185814528
- Regressed: https://oss-fuzz.com/revisions?job=libfuzzer_asan_expat&range=202401100604:202401110602
- Reproducer Testcase: https://oss-fuzz.com/download?testcase_id=5187173185814528
The reproducer (original attached as file clusterfuzz-testcase-minimized-xml_parse_fuzzer_UTF-16-5187173185814528-timeout-original.xml.txt) is essentially this two-file case:
File one.xml:
<!DOCTYPE doc SYSTEM "two.dtd">
<doc>&g1;</doc>
File two.dtd:
<!ENTITY % p1 '%p1'>
<!ENTITY g1 '%p1;'>
The original's SHA256 sum is 6b870c78cff9efe41f1060277f434da98b9f78e9159baf4cb95f034e890ac087.
The regression link effectively links to be47f6d...716fd10 (which makes good sense since these changes increase fuzzing coverage).
Analysis
ClusterFuzz uncovered two things at once here:
- Direct recursion (a -> a, in contrast to indirect recursion a -> b -> a) of parameter entities (reference syntax %name;) was previously not detected in the external subset (file two.dtd in the example above), but it is forbidden by the XML spec and was also causing undefined behavior at runtime. Pull request "Reject direct parameter entity recursion (part of #839)" #841 addresses that problem.
  # ./fuzz/xml_parse_fuzzer_UTF-8 [..]/clusterfuzz-testcase-minimized-xml_parse_fuzzer_UTF-16-5187173185814528-timeout-original.xml.txt
  INFO: Running with entropic power schedule (0xFF, 100).
  INFO: Seed: 3441536382
  INFO: Loaded 1 modules (21351 inline 8-bit counters): 21351 [0x558df02c6000, 0x558df02cb367),
  INFO: Loaded 1 PC tables (21351 PCs): 21351 [0x558df02cb368,0x558df031e9d8),
  ./fuzz/xml_parse_fuzzer_UTF-8: Running 1 inputs 1 time(s) each.
  Running: [..]/clusterfuzz-testcase-minimized-xml_parse_fuzzer_UTF-16-5187173185814528-timeout-original.xml.txt
  [..]/expat/lib/xmlparse.c:6273:46: runtime error: applying zero offset to null pointer
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior [..]/expat/lib/xmlparse.c:6273:46 in
- The way that the fuzzing code uses external parsers (created via function XML_ExternalEntityParserCreate) caused a timeout, revealing that, due to the lack of any direct input bytes from the parent parser, the amplification ratio calculated by accountingGetCurrentAmplification was constantly reported as 1.0 and hence had little chance of stopping billion laughs attacks in practice. Pull request "[CVE-2024-28757] Prevent billion laughs attacks in isolated external parser (part of #839)" #842 addresses that problem.
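The second point can be illustrated with a small model. The sketch below is not Expat's code: the Accounting struct and the functions amplification and should_abort are hypothetical names, and the formula (all processed bytes divided by direct bytes, with a fallback of 1.0 when no direct bytes were counted) is an assumption derived from the description above.

```c
/* Hypothetical, simplified byte accounting; NOT Expat's actual structures. */
typedef struct {
  unsigned long long direct;   /* bytes fed to the parser directly */
  unsigned long long indirect; /* bytes produced by entity expansion */
} Accounting;

/* Assumed model: ratio of all processed bytes to direct bytes.
 * When no direct bytes were counted, the ratio falls back to 1.0,
 * which is the situation described for the isolated external parser. */
static double amplification(const Accounting *a) {
  if (a->direct == 0)
    return 1.0;
  return (double)(a->direct + a->indirect) / (double)a->direct;
}

/* A protection check based on this ratio: abort once it exceeds max_ratio. */
static int should_abort(const Accounting *a, double max_ratio) {
  return amplification(a) > max_ratio;
}
```

Under this model, 100 direct and 9900 indirect bytes give a ratio of 100.0, whereas 0 direct bytes give 1.0 no matter how many indirect bytes accumulate, so a threshold above 1.0 can never trigger.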
Two related side notes:
- ClusterFuzz leaked part of this finding (the recursion aspect) to the public corpus a few days ago as case f9b6ba558667913f4554395e039c01f6d8217b43, which later disappeared from the public corpus again.
- Statement…
  If the same entity is declared more than once, the first declaration encountered is binding;
  …in section 4.2 Entity Declarations of the XML spec is worth noting, emphasis mine.
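For illustration, here is a minimal sketch of how the "first declaration is binding" rule and recursion detection can interact in a toy entity table. Everything in it is hypothetical (Entity, EntityTable, declare, expand, and the restriction that each entity references at most one other entity); it marks an entity as open while expanding it, which is one common way to catch both direct (a -> a) and indirect (a -> b -> a) recursion, and it is not a description of how Expat or pull request #841 implement the check.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTITIES 8

/* Hypothetical miniature entity table; NOT Expat's data structures. */
typedef struct {
  const char *name;
  const char *refs; /* name of the one entity the value references, or NULL */
  int open;         /* nonzero while this entity is being expanded */
} Entity;

typedef struct {
  Entity items[MAX_ENTITIES];
  size_t count;
} EntityTable;

enum { EXPAND_OK = 0, EXPAND_RECURSION = 1, EXPAND_UNDECLARED = 2 };

static Entity *lookup(EntityTable *t, const char *name) {
  for (size_t i = 0; i < t->count; i++)
    if (strcmp(t->items[i].name, name) == 0)
      return &t->items[i];
  return NULL;
}

/* First declaration is binding: a redeclaration is ignored.
 * Returns 1 if the declaration was stored, 0 if it was ignored. */
static int declare(EntityTable *t, const char *name, const char *refs) {
  if (lookup(t, name) != NULL || t->count == MAX_ENTITIES)
    return 0;
  t->items[t->count++] = (Entity){name, refs, 0};
  return 1;
}

/* Follows the reference chain; an entity met while still open is recursive. */
static int expand(EntityTable *t, const char *name) {
  Entity *e = lookup(t, name);
  if (e == NULL)
    return EXPAND_UNDECLARED;
  if (e->open)
    return EXPAND_RECURSION;
  int rc = EXPAND_OK;
  e->open = 1;
  if (e->refs != NULL)
    rc = expand(t, e->refs);
  e->open = 0;
  return rc;
}
```

In this toy model, declaring p1 with a reference to p1 and then expanding p1 reports recursion, while a later redeclaration of an already declared name has no effect on what expansion sees.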