tidy heap-buffer-overflow #217

Closed
geoffmcl opened this Issue Jun 3, 2015 · 3 comments

Comments

Projects
None yet
2 participants
@geoffmcl
Contributor

geoffmcl commented Jun 3, 2015

Fernando has found and reported a heap-buffer-overflow directly to me, the text so far is attached below, and I have now been able to REPEAT the event, and am now searching for a fix.

I do not know anything about Valgrind, but luckily the Windows MSVC Debug configuration does the same. It inserts a 'deb_malloc' which allocates much more than the user request, and fills the leading and trailing memory with a marker, and returns the appropriate pointer to the user.

Leading is 0xfd, and trailing is 0xab, Then on 'deb_free', it checks the leading marker for buffer under-run, and the trailing marker for buffer over-run.

This is a view of the memory as allocated in this case

fd fd fd fd fd fd fd fd ab ab ab ab ab ab ab ab ab ab ab ab ab ab ...

The event happens in ParseValue, where the length is held in an int len, which gets to contain a -1! MINUS ONE!! It is parsing the anchor href in this strange string <a b=<a <?xm ?>b="c"G href="�"»

ParseValue then calls -
value = TY_(tmbstrndup)(doc->allocator, lexer->lexbuf + start, len);
to do the allocation...

Inexplicably tmpstrndup is decalared as -
tmbstr TY_(tmbstrndup)( TidyAllocator *allocator, ctmbstr str, uint len);

Notice the int has now become a uint, so can not test for less that zero... it does if (len > 0), but of course len is now 4294967295, but like for all such allocations tmbstrndup does -

    tmbstr cp = s = (tmbstr) TidyAlloc( allocator, 1+len );

Notice the plus 1, so it arrives at TidyAlloc with a ZERO!!!

Now it seems malloc does not mind a zero value, malloc(0), and dutifully returns a pointer!!!! But that is a pointer to what exactly?

Then tmbstrndup does the corruption, with -

while ( len-- > 0 && (*cp++ = *str++) ) /**/;

Of course ( len-- > 0 ) will be true until the 4294967295 expires ;=))

But thankfully the corruption stops when a 0 is reached in the lexer with (*cp++ = *str++). As indicated in this case it is storing the attribute "href", but that is 4+ bytes of corruption.

Then a final corruption, the reason for the plus 1 - *cp = 0;

This is a view of the memory after -

fd fd fd fd 68 72 65 66 00 00 ab ab ab ab ab ab ab ab

Note in this case, since the loop did not end when len became zero, so we end up with a double 0! But I get a big popup dialog when this memory is passed to 'deb_free'... advising me of possible heap corruption...

So that is a detailed desciption of the problem. Now to decide what would be the best fix! Ideas...

Since tmbstrndup does check is len > 0, could change the uint to an int, and then it would return a NULL memory. Probably not a good thing in all case...

I am still trying to understand exactly why the int len back in ParseValue went negative!

Will work on this... As indicated any ideas welcome...

And again thanks to Fernando for finding this case... What follows is our direct conversation to date...

Hey Geoff,

I believe I tried both versions, the old (which is the one that Debian
stable ships) and the new with HTML5 support from github.

Did you try to run it under valgrind? Valgrind reports the errors but
the app won't segfault.

I can give it a try later again and tell you the exact version I used.

Thanks

On 6/1/15 5:28 AM, Geoff McLane wrote:

Hi Fernando,

Can you show me what version of tidy you are
using?

$ tidy-asan -v

From the dump you provided, this shows
libtidy-0.99.so, which is now a *** VERY ***
OLD version ;=((.

I tested your sample html -

$ printf
"\x3c\x61\x20\x62\x3d\x3c\x61\x20\x3c\x3f\x78\x6d\x0d\x3f\x3e\x62\x3d\x22\x63\x22\x47\x20\x68\x72\x65\x66\x3d\x22\x12\x22\xbb"

with the latest tidy5 version 4.9.30 built from
this source -

https://github.com/htacg/tidy-html5

and had no problem in Ubuntu 14.04 linux, 64-bit, nor
in Windows 7 64-bits... many runs...

I tried (remember tidy5 defaults to UTF-8 if no other
indication) -

$ tidy5 err.html

Got 10 warnings, including invalid UTF-8 bytes (char.
code U+00BB)

$ tidy5 --show-body-only yes err.html

This reduces the warnings to 7 for this html snippet,
still with invalid UTF-8

Even tried -

$ tidy5 --show-body-only yes -big5 err.html

The warning reduce to 6, dropping the invalid
UTF-8 warning...

BTW, all tidy builds default to adding -

-DSUPPORT_ASIAN_ENCODING=1

Of course I can do nothing about earlier
versions, except urge an upgrade. libtidy-0.99.so
is circa 2008/2009...

We have had some contact with some Debian
maintainers, and hopefully they are in the
process of upgrading tidy...

We also provide some binary releases -
http://www.htacg.org/binaries/
however still to get around to adding 4.9.30

Also note the binary in these is named 'tidy5'
to avoid overwriting any existing 'tidy'. This
will change when the 5.0.0 is released...

But if the problem persists in the latest 4.9.30
(or later), then please file an Issue here -

https://github.com/htacg/tidy-html5/issues

Also add any configuration used, including any
TIDY_CONIFG_FILE if defined...

Thank you for reporting...

Regards,
Geoff.

On 01/06/15 10:10, Fernando Muñoz wrote:

Hello,

I contacted Debian about the issue on May 17, so far I have not
received a response about a CVE assignment.

Thanks!

---------- Forwarded message ----------
From: Fernando Muñoz fernando@null-life.com
Date: Sun, May 17, 2015 at 8:11 PM
Subject: tidy heap-buffer-overflow
To: security@debian.org

Package: tidy

tidy is affected by a write out of bounds when processing malformed
html files.

$ printf
"\x3c\x61\x20\x62\x3d\x3c\x61\x20\x3c\x3f\x78\x6d\x0d\x3f\x3e\x62\x3d\x22\x63\x22\x47\x20\x68\x72\x65\x66\x3d\x22\x12\x22\xbb"

err.html
An asan-enabled build of tidy outputs:

$ tidy-asan err.html

==2196==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xb53006b1 at pc 0xb71df8fe bp 0xbfac9928 sp 0xbfac9918
WRITE of size 1 at 0xb53006b1 thread T0
#0 0xb71df8fd in prvTidytmbstrndup
(/usr/lib/libtidy-0.99.so.0+0x15c8fd)
#1 0xb7141060 in prvTidyGetToken
(/usr/lib/libtidy-0.99.so.0+0xbe060)
#2 0xb711856e in prvTidyParseDocument
(/usr/lib/libtidy-0.99.so.0+0x9556e)
#3 0xb71f2a58 in prvTidyDocParseStream
(/usr/lib/libtidy-0.99.so.0+0x16fa58)
#4 0xb71f34a5 in tidyParseFile (/usr/lib/libtidy-0.99.so.0+0x1704a5)
#5 0x804bfa9 (/usr/bin/tidy+0x804bfa9)
#6 0xb6edf72d in __libc_start_main
(/lib/i386-linux-gnu/libc.so.6+0x1872d)
#7 0x804fa4e (/usr/bin/tidy+0x804fa4e)

0xb53006b1 is located 0 bytes to the right of 1-byte region
[0xb53006b0,0xb53006b1)
allocated by thread T0 here:
#0 0xb72af18c in __interceptor_malloc
(/usr/lib/i386-linux-gnu/libasan.so.1+0x5118c)
#1 0xb71c5963 (/usr/lib/libtidy-0.99.so.0+0x142963)
...

Valgrind with the standard build:

$ valgrind tidy err.html
...
==30499== Invalid write of size 1
==30499== at 0x408805C: prvTidytmbstrndup (tmbstr.c:39)
==30499== by 0x40738A8: ParseValue (lexer.c:3486)
...

==30499== Invalid write of size 1
==30499== at 0x4088065: prvTidytmbstrndup (tmbstr.c:41)
==30499== by 0x40738A8: ParseValue (lexer.c:3486)
==30499== by 0x4075F39: ParseAttrs (lexer.c:3603)
==30499== by 0x4075F39: GetTokenFromStream (lexer.c:2416)

...
file: tmbstr.c

39 while ( len-- > 0 && (cp++ = *str++) )
40 /
*/;
41 *cp = 0;

Credit: Fernando Muñoz

@geoffmcl geoffmcl added the Bug label Jun 3, 2015

@geoffmcl geoffmcl added this to the 5.0.0 milestone Jun 3, 2015

@geoffmcl geoffmcl self-assigned this Jun 3, 2015

geoffmcl added a commit that referenced this issue Jun 3, 2015

Add some mem alloc and free debug to chase Issue #217
Such debug is OFF by default, and only added by defining DEBUG_MEMORY. And
is only available for the Debug configuration compiled with MSVC, but this
could be easily extended...

geoffmcl added a commit that referenced this issue Jun 3, 2015

@geoffmcl

This comment has been minimized.

Show comment
Hide comment
@geoffmcl

geoffmcl Jun 3, 2015

Contributor

Ok, found the problem and what seems like a suitable solution...

This is a case where we effectively have href="", although here we have ="\x12", but the ReadChar() function will drop all control chars...

So since there is no attribute value, len will be zero, which is ok... it is the truth - there is no attribute value length...

Now the code has if (len > 0 || delim). Well it is delimited, "", so this block is entered, and it is in here that len can be reduced...

So added to the code in this block -

    /* Issue #217 - Also only if/while (len > 0) - MUST NEVER GO NEGATIVE! */
    if ((len > 0) && munge &&

And even more, if len does have a value, then protect against len ever going negative with -

    while (TY_(IsWhite)(lexer->lexbuf[start+len-1]) && (len > 0))
       --len;

    while (TY_(IsWhite)(lexer->lexbuf[start]) && (start < len) && (len > 0))
    {
        ++start;
        --len;
    }

It was the first of these two that did the damage.

start=4, len=0, and just coincidently lexer->lexbuf[3] contained an 0xa, thus len was decremented to -1 ;=((.

That in itself is quite unique since there are only a few special cases where control chars are added to the lexer. But you guessed it, one is when parsing 'code', like <?xm..., to preserve the newlines in the 'code'...

Now when the code does value = TY_(tmbstrndup)(doc->allocator, lexer->lexbuf + start, len); if len == 0 it will return a NULL, which is exactly what the attribute value is, a blank!

In some cases this bug could exibit a different problem like parsing the snippet <a <?xm \0xd?> href="">.

Now the lexer buffer will contain 2, or more IsWhite() chars and len would be reduced to -2, or less, which means the malloc buffer allocation would be a giant 4,294,967,295 byte allocation, a value lots of OSes will reject...

And I can confirm this BUG exists in the 2008/9 libtidy.0.99.so last release, the sourceforge cvs tidy, which is still present in some distributions. Just the quite unique nature of using 'code' ending in spaces or a newline just before an attribute with a 'blank' value prevents it from being seens more often...

Interestingly, it is NOT present in TidyAug2000, the earliest tidy source I have, since (a) it did not have that additional len decrement, and (b) used an int in wstrndup, the predecessor of tmbstrndup, so the code while (len-- > 0 && (*p++ = *str++)); drops out on the first loop when len is 0, and thus just 1 byte has been allocated, so the final *p = '\0'; is ok...

Anyway, now no more buffer corruption, or massive allocs, for this latest tidy ;=)). I love crushing such ancient bugs...

Also bumped the version to 4.9.31 for this important fix.

Contributor

geoffmcl commented Jun 3, 2015

Ok, found the problem and what seems like a suitable solution...

This is a case where we effectively have href="", although here we have ="\x12", but the ReadChar() function will drop all control chars...

So since there is no attribute value, len will be zero, which is ok... it is the truth - there is no attribute value length...

Now the code has if (len > 0 || delim). Well it is delimited, "", so this block is entered, and it is in here that len can be reduced...

So added to the code in this block -

    /* Issue #217 - Also only if/while (len > 0) - MUST NEVER GO NEGATIVE! */
    if ((len > 0) && munge &&

And even more, if len does have a value, then protect against len ever going negative with -

    while (TY_(IsWhite)(lexer->lexbuf[start+len-1]) && (len > 0))
       --len;

    while (TY_(IsWhite)(lexer->lexbuf[start]) && (start < len) && (len > 0))
    {
        ++start;
        --len;
    }

It was the first of these two that did the damage.

start=4, len=0, and just coincidently lexer->lexbuf[3] contained an 0xa, thus len was decremented to -1 ;=((.

That in itself is quite unique since there are only a few special cases where control chars are added to the lexer. But you guessed it, one is when parsing 'code', like <?xm..., to preserve the newlines in the 'code'...

Now when the code does value = TY_(tmbstrndup)(doc->allocator, lexer->lexbuf + start, len); if len == 0 it will return a NULL, which is exactly what the attribute value is, a blank!

In some cases this bug could exibit a different problem like parsing the snippet <a <?xm \0xd?> href="">.

Now the lexer buffer will contain 2, or more IsWhite() chars and len would be reduced to -2, or less, which means the malloc buffer allocation would be a giant 4,294,967,295 byte allocation, a value lots of OSes will reject...

And I can confirm this BUG exists in the 2008/9 libtidy.0.99.so last release, the sourceforge cvs tidy, which is still present in some distributions. Just the quite unique nature of using 'code' ending in spaces or a newline just before an attribute with a 'blank' value prevents it from being seens more often...

Interestingly, it is NOT present in TidyAug2000, the earliest tidy source I have, since (a) it did not have that additional len decrement, and (b) used an int in wstrndup, the predecessor of tmbstrndup, so the code while (len-- > 0 && (*p++ = *str++)); drops out on the first loop when len is 0, and thus just 1 byte has been allocated, so the final *p = '\0'; is ok...

Anyway, now no more buffer corruption, or massive allocs, for this latest tidy ;=)). I love crushing such ancient bugs...

Also bumped the version to 4.9.31 for this important fix.

uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Jun 8, 2015

thierry
Upgrade to 5.9.32.
This fixes a security problem (heap-buffer-overflow):
see htacg/tidy-html5#217

PR:		ports/200631
Submitted by:	Walter Hop
Security:	htacg/tidy-html5#217


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@388845 35697150-7ecd-e111-bb59-0022644237b5

uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Jun 8, 2015

Upgrade to 5.9.32.
This fixes a security problem (heap-buffer-overflow):
see htacg/tidy-html5#217

PR:		ports/200631
Submitted by:	Walter Hop
Security:	htacg/tidy-html5#217
@geoffmcl

This comment has been minimized.

Show comment
Hide comment
@geoffmcl

geoffmcl Jun 21, 2015

Contributor

Since no further comment for a few weeks will close this now...

Please feel free to re-open, or open another issues...

Contributor

geoffmcl commented Jun 21, 2015

Since no further comment for a few weeks will close this now...

Please feel free to re-open, or open another issues...

@geoffmcl geoffmcl closed this Jun 21, 2015

bluerise pushed a commit to bitrig/bitrig-ports that referenced this issue Jun 25, 2015

Imported From OpenBSD
add a patch for htacg/tidy-html5#217,
a heap-buffer overflow which also affects tidyp

Written by: Stuart Henderson <sthen@openbsd.org>

jcvernaleo added a commit to bitrig/bitrig-ports that referenced this issue Jul 22, 2015

add a patch for htacg/tidy-html5#217,
a heap-buffer overflow which also affects tidyp

Written by: Stuart Henderson <sthen@openbsd.org>
@vielmetti

This comment has been minimized.

Show comment
Hide comment
@vielmetti

vielmetti Sep 13, 2015

Contributor

A note that this is CVE-2015-5522 and CVE-2015-5523 and that Ubuntu has fixed this particular bug in their version https://launchpad.net/ubuntu/+source/tidy/20091223cvs-1.5 in 'wily'.

(Adding the CVE numbers here to make it clear when searching for that CVE that the bug has been fixed.)

Contributor

vielmetti commented Sep 13, 2015

A note that this is CVE-2015-5522 and CVE-2015-5523 and that Ubuntu has fixed this particular bug in their version https://launchpad.net/ubuntu/+source/tidy/20091223cvs-1.5 in 'wily'.

(Adding the CVE numbers here to make it clear when searching for that CVE that the bug has been fixed.)

hakrdinesh pushed a commit to hakrtech/openbsd-ports0-test that referenced this issue Jan 16, 2018

add a patch for htacg/tidy-html5#217,
a heap-buffer overflow which also affects tidyp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment