Skip to content
Permalink
Browse files Browse the repository at this point in the history
Fix crash in lexer refill (reported by Agostino Sarubbo).
The crash happened in a rare case of a very long lexeme that doen't fit
into the buffer, forcing buffer reallocation.

The crash was caused by an incorrect calculation of the shift offset
(it was smaller than necessary). As a consequence, the data from buffer
start and up to the beginning of the current lexeme was not discarded
(as it should have been), resulting in less free space for new data than
expected.
  • Loading branch information
skvadrik committed Apr 17, 2020
1 parent 187fff3 commit c4603ba
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion src/parse/scanner.cc
Expand Up @@ -155,13 +155,14 @@ bool Scanner::fill(size_t need)
if (!buf) fatal("out of memory");

memmove(buf, tok, copy);
shift_ptrs_and_fpos(buf - bot);
shift_ptrs_and_fpos(buf - tok);
delete [] bot;
bot = buf;

free = BSIZE - copy;
}

DASSERT(lim + free <= bot + BSIZE);
if (!read(free)) {
eof = lim;
memset(lim, 0, YYMAXFILL);
Expand Down

6 comments on commit c4603ba

@kirotawa
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I`m trying to reproduce this issue in versions old than 1.3. Is there any PoC available?
I saw that code is quite diff between 1.3 versions < 1.2.

@skvadrik
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the bug was introduced in commit 2f3e597, in re2c-1.2 (the previous version without the bug is re2c-1.1.1). I'm attaching the input file which crashed re2c: 119.crashes.re.txt (constructed with a fuzzer).

@carnil
Copy link

@carnil carnil commented on c4603ba Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skvadrik I did a git bisect which lead me to

git bisect start
# bad: [f5af0a118ae689a8128b5b734337fa9eb89a7904] Release 1.2.
git bisect bad f5af0a118ae689a8128b5b734337fa9eb89a7904
# good: [d06bf2be1ff92afb5de6f198f12d97dfe51dc782] Release 1.1.1.
git bisect good d06bf2be1ff92afb5de6f198f12d97dfe51dc782
# bad: [baa24bdbd912ef95d9f8386705de38fb7910006d] libre2c: added constant-memory (non trie-based) TNFA matching algorithm for leftmost greedy.
git bisect bad baa24bdbd912ef95d9f8386705de38fb7910006d
# bad: [00e529ff451a63989bd1e21bd395425f17bc5d80] Order initial configurations by POSIX precedence in GOR1.
git bisect bad 00e529ff451a63989bd1e21bd395425f17bc5d80
# good: [66a93ac35b77404909250d92355e0613f82686dd] Tweaking condition list lexer.
git bisect good 66a93ac35b77404909250d92355e0613f82686dd
# bad: [b94c5af9a2d150c9421ca3148baa3a625ecce682] Added -I option (paths to include directories).
git bisect bad b94c5af9a2d150c9421ca3148baa3a625ecce682
# good: [4b511e541a880fd2d6311fcf94c23fb1a6de7927] Fixed read past the end of buffer in configuration parser.
git bisect good 4b511e541a880fd2d6311fcf94c23fb1a6de7927
# good: [b5471c4fcaf1ddd66df71d17334e5d7778c8468b] Paper: added two output() functions that convert t-string to parse tree and offsets.
git bisect good b5471c4fcaf1ddd66df71d17334e5d7778c8468b
# good: [f691f14f01cdb1fe0b286ecd3379a32de092a17c] configure.ac: set -Wreturn-type to error.
git bisect good f691f14f01cdb1fe0b286ecd3379a32de092a17c
# bad: [df3949c6ee4864ccd84b9cc3c68effc437d3d0b4] Added /*!include:re2c ... */ directive.
git bisect bad df3949c6ee4864ccd84b9cc3c68effc437d3d0b4
# bad: [1edd26a35457c5835afd58b8fa8330d33e7a1192] Preparations to support #include: keep input files in a stack.
git bisect bad 1edd26a35457c5835afd58b8fa8330d33e7a1192
# first bad commit: [1edd26a35457c5835afd58b8fa8330d33e7a1192] Preparations to support #include: keep input files in a stack.

Which indicates the first bad commit would be 1edd26a which would be later than 2f3e597 but the issue was just covered? Does this make sense?

@skvadrik
Copy link
Owner Author

@skvadrik skvadrik commented on c4603ba Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carnil Yes. The logical error was introduced in 2f3e597abce36fb7f41413373308b7f13fc98181in in using shift_ptrs(buf - bot); instead of shift_ptrs(buf - tok); here. However, it was as you said covered by the following condition, so actual error was introduced in 1edd26a. Thanks for bisecting!

@neoclust
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems it is still not completly fixed : https://bugs.mageia.org/show_bug.cgi?id=26549

@skvadrik
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neoclust The bug you linked is related to #219 (comment), not this bug. This one was a buffer overflow (which was fixed). The repro in https://bugs.mageia.org/show_bug.cgi?id=26549#c7 limits stack size to 256 bytes --- it is very small, so one of the recursive functions causes a stack overflow.

Please sign in to comment.