AddByte allocAmt overflows for large input files #761

@destroyhimmyrobots

Given a 2.8-gigabyte input file, the function AddByte (https://github.com/htacg/tidy-html5/blob/next/src/lexer.c#L949) enters an infinite loop when called from prvTidyAddCharToLexer. I see this on both modern Linux and Darwin systems.

This is likely caused by the allocation strategy (repeatedly multiplying by 2) combined with the fact that uint, the type of the allocAmt variable, is an unsigned 32-bit integer on these systems. For example, the sys/types.h header on Darwin defines uint as unsigned int: https://github.com/apple/darwin-xnu/blob/master/bsd/sys/types.h#L92

The initial lexer state when the problem surfaces looks like this:

lexer->lexsize = 2147483646
lexer->lexlength = 2147483648
allocAmt = 0

Eventually, my debugger shows the value of allocAmt wrapping to 0 after reaching

allocAmt = 2147483648

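at https://github.com/htacg/tidy-html5/blob/next/src/lexer.c#L955, when the loop tries to grow the buffer by one more factor of two. The doubled value (4294967296) exceeds the largest uint32 value by one, so it wraps to 0 and the loop condition is never satisfied.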
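The wraparound can be reproduced outside of tidy with a small sketch. This is not the code from lexer.c; `count_wraps` is a hypothetical helper that mimics a doubling loop on a 32-bit counter, assumes the 8192 starting size for a zero counter, and adds a step limit (`max_steps`) so the demonstration terminates:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of tidy: runs a doubling loop on a
 * 32-bit counter for at most max_steps iterations and returns how many
 * times the counter wrapped around to 0. */
static int count_wraps(uint32_t lexsize, uint32_t lexlength, int max_steps)
{
    uint32_t alloc = lexlength;
    int wraps = 0;

    assert(max_steps >= 0);

    /* Keep growing while the buffer cannot hold lexsize + 2 bytes. */
    for (int i = 0; i < max_steps && (uint32_t)(lexsize + 2u) >= alloc; ++i) {
        if (alloc == 0)
            alloc = 8192;   /* assumed initial size when starting from 0 */
        else
            alloc *= 2;     /* unsigned arithmetic: 2^31 * 2 wraps to 0 */

        if (alloc == 0)
            ++wraps;
    }
    return wraps;
}
```

Calling `count_wraps(2147483646u, 2147483648u, 100)` with the values from the debugger session reports several wraps within 100 steps: the counter climbs from 8192 back up to 2147483648, wraps to 0 again, and never exceeds `lexsize + 2`.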

One solution may be to make allocAmt a 64-bit integer type.
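A minimal sketch of what a wider, overflow-checked growth computation could look like, assuming the same doubling strategy. `grow_capacity` and its `limit` parameter are hypothetical names for illustration, not tidy APIs. If lexsize and lexlength are also 32-bit, they would presumably need the same treatment, since the new capacity has to be stored somewhere:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of tidy: doubles `current` in 64-bit
 * arithmetic until it is strictly greater than `needed`. Returns false
 * instead of wrapping if the result would exceed `limit`. */
static bool grow_capacity(uint64_t needed, uint64_t current,
                          uint64_t limit, uint64_t *out)
{
    uint64_t alloc = current ? current : 8192; /* assumed initial size */

    while (needed >= alloc) {
        if (alloc > limit / 2)
            return false;   /* doubling would pass the limit: report failure */
        alloc *= 2;
    }
    if (alloc > limit)
        return false;       /* starting capacity already above the limit */

    *out = alloc;
    return true;
}
```

With `limit` set to `UINT64_MAX`, the failing case from this report (needing 2147483648 bytes with a current capacity of 2147483648) yields 4294967296. With `limit` set to `UINT32_MAX`, the same call returns false, so the caller can report an error rather than loop forever.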

I looked for alternative APIs in http://api.html-tidy.org/tidy/tidylib_api_5.6.0/group__IO.html, but it is not clear whether any of them would avoid this overflow.
