[CVE-2022-25315] lib: Prevent integer overflow in storeRawNames #559
+6
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
It is possible to use an integer overflow in storeRawNames for out of
boundary heap writes. Default configuration is affected. If compiled
with XML_UNICODE then the attack does not work. Compiling with
-fsanitize=address confirms the following proof of concept.
The problem can be exploited by abusing the m_buffer expansion logic.
Even though the initial size of m_buffer is a power of two, eventually
it can end up a little bit lower, thus allowing allocations very close
to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag
names can be parsed which are almost INT_MAX in size.
Unfortunately (from an attacker point of view) INT_MAX/2 is also a
limitation in string pools. Having a tag name of INT_MAX/2 characters
or more is not possible.
Expat can convert between different encodings. UTF-16 documents which
contain only ASCII representable characters are twice as large as their
ASCII encoded counter-parts.
The proof of concept works by taking these three considerations into
account:
short root node . This allows the m_buffer to grow very close
to INT_MAX.
INT_MAX/2, so keep the attack tag name smaller than that.
limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag
which only contains ASCII characters. UTF-16 always stores two
bytes per character while the tag name is converted to using only
one. Our attack node byte count must be a bit higher than
2/3 INT_MAX so the converted tag name is around INT_MAX/3 which
in sum can overflow INT_MAX.
Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes
without running into INT_MAX boundary check. The string pooling is
able to store INT_MAX/3 as tag name because the amount is below
INT_MAX/2 limitation. And creating the sum of both eventually overflows
in storeRawNames.
Proof of Concept:
Compile expat with -fsanitize=address.
Create Proof of Concept binary which iterates through input
file 16 MB at once for better performance and easier integer
calculations: