Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASan heap-buffer-overflow in re2c::Scanner::scan #226

Closed
fgeek opened this issue Oct 28, 2018 · 8 comments
Closed

ASan heap-buffer-overflow in re2c::Scanner::scan #226

fgeek opened this issue Oct 28, 2018 · 8 comments

Comments

@fgeek
Copy link

@fgeek fgeek commented Oct 28, 2018

Tested commit: 22b73ed
Credit: Henri Salo from Nixu Corporation
Tools: american fuzzy lop 2.52b, afl-utils

Reproducer re2c-2018-10-28-crash-001.txt.zip (SHA1 bb5bd8951a063f76d8dbc3525c6e127e9094407b)

00000000  25 7b 7b 22                                       |%{{"|
00000004

ASan output:

==21929==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62500000e900 at pc 0x5624ce069fae bp 0x7ffc7c718110 sp 0x7ffc7c718108
READ of size 1 at 0x62500000e900 thread T0
    #0 0x5624ce069fad in re2c::Scanner::scan(re2c::conopt_t const*) src/ast/lex.cc:1818
    #1 0x5624ce07169e in yylex src/ast/parser.ypp:260
    #2 0x5624ce07169e in yyparse(re2c::context_t&) src/ast/parser.cc:1219
    #3 0x5624ce076eed in re2c::parse(re2c::Scanner&, std::vector<re2c::spec_t, std::allocator<re2c::spec_t> >&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, re2c::AST const*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, re2c::AST const*> > >&, re2c::Opt&) src/ast/parser.ypp:271
    #4 0x5624cdf9b642 in re2c::compile(re2c::Scanner&, re2c::Output&, re2c::Opt&) src/compile.cc:155
    #5 0x5624cdd96795 in main src/main.cc:31
    #6 0x7f085714c2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #7 0x5624cdd97899 in _start (/home/hsalo/builds/re2c/22b73eddedd41b2ebb10d17a3e669121dd6418d2/bin/re2c+0x1e899)

0x62500000e900 is located 0 bytes to the right of 8192-byte region [0x62500000c900,0x62500000e900)
allocated by thread T0 here:
    #0 0x7f0857e2ad70 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc2d70)
    #1 0x5624ce036ad3 in re2c::Scanner::fill(unsigned int) src/ast/scanner.cc:63
    #2 0x5624ce050053 in re2c::Scanner::echo(re2c::OutputFile&) src/ast/lex.cc:94
    #3 0x5624cdf9b50a in re2c::compile(re2c::Scanner&, re2c::Output&, re2c::Opt&) src/compile.cc:140
    #4 0x5624cdd96795 in main src/main.cc:31
    #5 0x7f085714c2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)

SUMMARY: AddressSanitizer: heap-buffer-overflow src/ast/lex.cc:1818 in re2c::Scanner::scan(re2c::conopt_t const*)
Shadow bytes around the buggy address:
  0x0c4a7fff9cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fff9ce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fff9cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fff9d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fff9d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c4a7fff9d20:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fff9d30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fff9d40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fff9d50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fff9d60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fff9d70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==21929==ABORTING
(gdb) run
Starting program: ./bin/re2c re2c-2018-10-28-crash-001.txt

Program received signal SIGSEGV, Segmentation fault.
re2c::Scanner::scan (this=0x7fffffffd0b0, globopts=<optimized out>) at src/ast/lex.cc:1818
1818            yych = (YYCTYPE)*YYCURSOR;
(gdb) bt
#0  re2c::Scanner::scan (this=0x7fffffffd0b0, globopts=<optimized out>) at src/ast/lex.cc:1818
#1  0x00005555555a201d in yylex (context=...) at ./src/ast/parser.ypp:260
#2  yyparse (context=...) at src/ast/parser.cc:1219
#3  0x00005555555a301f in re2c::parse (input=..., specs=std::vector of length 0, capacity 0, symtab=std::map with 0 elements, opts=...) at ./src/ast/parser.ypp:271
#4  0x0000555555588f98 in re2c::compile (input=..., output=..., opts=...) at src/compile.cc:155
#5  0x0000555555557077 in main (argv=<optimized out>) at src/main.cc:31
@skvadrik
Copy link
Owner

@skvadrik skvadrik commented Oct 28, 2018

Confirmed, the error is caused by assuming that strings in the source code cannot contain a sequence of zero bytes (they can in C/C++). RE2C has to "parse" C/C++ code in order to find the closing curly brace in semantic actions.

There is a number of possible fixes and I'm trying to find the best one.Thanks for reporting!

@fgeek
Copy link
Author

@fgeek fgeek commented Oct 29, 2018

@skvadrik Thank you for quick reply. I have some other cases from fuzzing. Do you prefer issue report per unique case or zip file with all samples e.g. via email? I can also give you short introduction how to use afl, but I'm more than happy to continue fuzzing after fixes.

@skvadrik
Copy link
Owner

@skvadrik skvadrik commented Oct 29, 2018

Pushed a fix: 44737d3

@fgeek I'll be greedy, let's have the whole zip at once. :) I think you can attach it right here, but email is also fine (re2c-devel@lists.sourceforge.net is the official mailing list). I shall learn to use afl myself (probably not that hard), but let's fix what you've found first.

@fgeek
Copy link
Author

@fgeek fgeek commented Oct 30, 2018

I can confirm that commit fixes original issue. Two more cases re2c-2018-10-29-crashes.zip

SUMMARY: AddressSanitizer: heap-buffer-overflow src/ast/lex.cc:2084 in re2c::Scanner::lex_cls(bool)
SUMMARY: AddressSanitizer: heap-buffer-overflow src/ast/lex.cc:2426 in re2c::Scanner::lex_str_chr(char, bool&)

I'll bet using afl is very easy for you. See documentation in http://lcamtuf.coredump.cx/afl/README.txt and feel free to contact me if you need help via henri@nerv.fi email. Remember to build with ASan and use /dev/shm or other memory partition for your fuzzing to avoid wasting HDD/SSD.

@skvadrik
Copy link
Owner

@skvadrik skvadrik commented Oct 30, 2018

Pushed a fix for the other two crashes: f062a8b. Thanks for afl links!

@fgeek
Copy link
Author

@fgeek fgeek commented Oct 31, 2018

re2c-2018-10-31.zip

SUMMARY: AddressSanitizer: heap-buffer-overflow src/ast/lex.cc:2113 in re2c::Scanner::lex_cls(bool)
SUMMARY: AddressSanitizer: heap-buffer-overflow src/ast/lex.cc:2464 in re2c::Scanner::lex_str_chr(char, bool&)
@skvadrik
Copy link
Owner

@skvadrik skvadrik commented Nov 1, 2018

Fixed! 0f7b449

@fgeek
Copy link
Author

@fgeek fgeek commented Nov 2, 2018

Thank you. I'll open new issue report if I find something else. Currently running 731 executions per second with nine CPU cores.

@fgeek fgeek closed this Nov 2, 2018
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Sep 20, 2020
2.0.3 (2020-08-22)
~~~~~~~~~~~~~~~~~~

- Fix issues when building re2c as a CMake subproject
  (`#302 <https://github.com/skvadrik/re2c/pull/302>`_:

- Final corrections in the SIMPA article "RE2C: A lexer generator based on
  lookahead-TDFA", https://doi.org/10.1016/j.simpa.2020.100027

2.0.2 (2020-08-08)
~~~~~~~~~~~~~~~~~~

- Enable re2go building by default.

- Package CMake files into release tarball.

2.0.1 (2020-07-29)
~~~~~~~~~~~~~~~~~~

- Updated version for CMake build system (forgotten in release 2.0).

- Added a short article about re2c for the Software Impacts journal.

2.0 (2020-07-20)
~~~~~~~~~~~~~~~~

- Added new code generation backend for Go and a new ``re2go`` program
  (`#272 <https://github.com/skvadrik/re2c/issues/272>`_: Go support).
  Added option ``--lang <c | go>``.

- Added CMake build system as an alternative to Autotools
  (`#275 <https://github.com/skvadrik/re2c/pull/275>`_:
  Add a CMake build system (thanks to ligfx),
  `#244 <https://github.com/skvadrik/re2c/issues/244>`_: Switching to CMake).

- Changes in generic API:

  + Removed primitives ``YYSTAGPD`` and ``YYMTAGPD``.
  + Added primitives ``YYSHIFT``, ``YYSHIFTSTAG``, ``YYSHIFTMTAG``
    that allow to express fixed tags in terms of generic API.
  + Added configurations ``re2c:api:style`` and ``re2c:api:sigil``.
  + Added named placeholders in interpolated configuration strings.

- Changes in reuse mode (``-r, --reuse`` option):

  + Do not reset API-related configurations in each `use:re2c` block
    (`#291 <https://github.com/skvadrik/re2c/issues/291>`_:
    Defines in rules block are not propagated to use blocks).
  + Use block-local options instead of last block options.
  + Do not accumulate options from rules/reuse blocks in whole-program options.
  + Generate non-overlapping YYFILL labels for reuse blocks.
  + Generate start label for each reuse block in storable state mode.

- Changes in start-conditions mode (``-c, --start-conditions`` option):

  + Allow to use normal (non-conditional) blocks in `-c` mode
    (`#263 <https://github.com/skvadrik/re2c/issues/263>`_:
    allow mixing conditional and non-conditional blocks with -c,
    `#296 <https://github.com/skvadrik/re2c/issues/296>`_:
    Conditions required for all lexers when using '-c' option).
  + Generate condition switch in every re2c block
    (`#295 <https://github.com/skvadrik/re2c/issues/295>`_:
    Condition switch generated for only one lexer per file).

- Changes in the generated labels:

  + Use ``yyeof`` label prefix instead of ``yyeofrule``.
  + Use ``yyfill`` label prefix instead of ``yyFillLabel``.
  + Decouple start label and initial label (affects label numbering).

- Removed undocumented configuration ``re2c🎏o``, ``re2c🎏output``.

- Changes in ``re2c🎏t``, ``re2c🎏type-header`` configuration:
  filename is now relative to the output file directory.

- Added option ``--case-ranges`` and configuration ``re2c🎏case-ranges``.

- Extended fixed tags optimization for the case of fixed-counter repetition.

- Fixed bugs related to EOF rule:

  + `#276 <https://github.com/skvadrik/re2c/issues/276>`_:
    Example 01_fill.re in docs is broken
  + `#280 <https://github.com/skvadrik/re2c/issues/280>`_:
    EOF rules with multiple blocks
  + `#284 <https://github.com/skvadrik/re2c/issues/284>`_:
    mismatched YYBACKUP and YYRESTORE
    (Add missing fallback states with EOF rule)

- Fixed miscellaneous bugs:

  + `#286 <https://github.com/skvadrik/re2c/issues/286>`_:
    Incorrect submatch values with fixed-length trailing context.
  + `#297 <https://github.com/skvadrik/re2c/issues/297>`_:
    configure error on ubuntu 18.04 / cmake 3.10

- Changed bootstrap process (require explicit configuration flags and a path to
  re2c executable to regenerate the lexers).

- Added internal options ``--posix-prectable <naive | complex>``.

- Added debug option ``--dump-dfa-tree``.

- Major revision of the paper "Efficient POSIX submatch extraction on NFA".

----
1.3x
----

1.3 (2019-12-14)
~~~~~~~~~~~~~~~~

- Added option: ``--stadfa``.

- Added warning: ``-Wsentinel-in-midrule``.

- Added generic API primitives:

  + ``YYSTAGPD``
  + ``YYMTAGPD``

- Added configurations:

  + ``re2c:sentinel = 0;``
  + ``re2c:define:YYSTAGPD = "YYSTAGPD";``
  + ``re2c:define:YYMTAGPD = "YYMTAGPD";``

- Worked on reproducible builds
  (`#258 <https://github.com/skvadrik/re2c/pull/258>`_:
  Make the build reproducible).

----
1.2x
----

1.2.1 (2019-08-11)
~~~~~~~~~~~~~~~~~~

- Fixed bug `#253 <https://github.com/skvadrik/re2c/issues/253>`_:
  re2c should install unicode_categories.re somewhere.

- Fixed bug `#254 <https://github.com/skvadrik/re2c/issues/254>`_:
  Turn off re2c:eof = 0.

1.2 (2019-08-02)
~~~~~~~~~~~~~~~~

- Added EOF rule ``$`` and configuration ``re2c:eof``.

- Added ``/*!include:re2c ... */`` directive and ``-I`` option.

- Added ``/*!header:re2c:on*/`` and ``/*!header:re2c:off*/`` directives.

- Added ``--input-encoding <ascii | utf8>`` option.

  + `#237 <https://github.com/skvadrik/re2c/issues/237>`_:
    Handle non-ASCII encoded characters in regular expressions
  + `#250 <https://github.com/skvadrik/re2c/issues/250>`_
    UTF8 enoding

- Added include file with a list of definitions for Unicode character classes.

  + `#235 <https://github.com/skvadrik/re2c/issues/235>`_:
    Unicode character classes

- Added ``--location-format <gnu | msvc>`` option.

  + `#195 <https://github.com/skvadrik/re2c/issues/195>`_:
    Please consider using Gnu format for error messages

- Added ``--verbose`` option that prints "success" message if re2c exits
  without errors.

- Added configurations for options:

  + ``-o --output`` (specify output file)
  + ``-t --type-header`` (specify header file)

- Removed configurations for internal/debug options.

- Extended ``-r`` option: allow to mix multiple ``/*!rules:re2c*/``,
  ``/*!use:re2c*/`` and ``/*!re2c*/`` blocks.

  + `#55 <https://github.com/skvadrik/re2c/issues/55>`_:
    allow standard re2c blocks in reuse mode

- Fixed ``-F --flex-support`` option: parsing and operator precedence.

  + `#229 <https://github.com/skvadrik/re2c/issues/229>`_:
    re2c option -F (flex syntax) broken
  + `#242 <https://github.com/skvadrik/re2c/issues/242>`_:
    Operator precedence with --flex-syntax is broken

- Changed difference operator ``/`` to apply before encoding expansion of
  operands.

  + `#236 <https://github.com/skvadrik/re2c/issues/236>`_:
    Support range difference with variable-length encodings

- Changed output generation of output file to be atomic.

  + `#245 <https://github.com/skvadrik/re2c/issues/245>`_:
    re2c output is not atomic

- Authored research paper "Efficient POSIX Submatch Extraction on NFA"
  together with Dr Angelo Borsotti.

- Added experimental libre2c library (``--enable-libs`` configure option) with
  the following algorithms:

  + TDFA with leftmost-greedy disambiguation
  + TDFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with leftmost-greedy disambiguation
  + TNFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with lazy POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with POSIX disambiguation (Kuklewicz algorithm)
  + TNFA with POSIX disambiguation (Cox algorithm)

- Added debug subsystem (``--enable-debug`` configure option) and new debug
  options:

  + ``-dump-cfg`` (dump control flow graph of tag variables)
  + ``-dump-interf`` (dump interference table of tag variables)
  + ``-dump-closure-stats`` (dump epsilon-closure statistics)

- Added internal options:

  + ``--posix-closure <gor1 | gtop>`` (switch between shortest-path algorithms
    used for the construction of POSIX closure)

- Fixed a number of crashes found by American Fuzzy Lop fuzzer:

  + `#226 <https://github.com/skvadrik/re2c/issues/226>`_,
    `#227 <https://github.com/skvadrik/re2c/issues/227>`_,
    `#228 <https://github.com/skvadrik/re2c/issues/228>`_,
    `#231 <https://github.com/skvadrik/re2c/issues/231>`_,
    `#232 <https://github.com/skvadrik/re2c/issues/232>`_,
    `#233 <https://github.com/skvadrik/re2c/issues/233>`_,
    `#234 <https://github.com/skvadrik/re2c/issues/234>`_,
    `#238 <https://github.com/skvadrik/re2c/issues/238>`_

- Fixed handling of newlines:

  + correctly parse multi-character newlines CR LF in ``#line`` directives
  + consistently convert all newlines in the generated file to Unix-style LF

- Changed default tarball format from .gz to .xz.

  + `#221 <https://github.com/skvadrik/re2c/issues/221>`_:
    big source tarball

- Fixed a number of other bugs and resolved issues:

  + `#2 <https://github.com/skvadrik/re2c/issues/2>`_: abort
  + `#6 <https://github.com/skvadrik/re2c/issues/6>`_: segfault
  + `#10 <https://github.com/skvadrik/re2c/issues/10>`_:
    lessons/002_upn_calculator/calc_002 doesn't produce a useful example program
  + `#44 <https://github.com/skvadrik/re2c/issues/44>`_:
    Access violation when translating the attached file
  + `#49 <https://github.com/skvadrik/re2c/issues/49>`_:
    wildcard state \000 rules makes lexer behave weard
  + `#98 <https://github.com/skvadrik/re2c/issues/98>`_:
    Transparent handling of #line directives in input files
  + `#104 <https://github.com/skvadrik/re2c/issues/104>`_:
    Improve const-correctness
  + `#105 <https://github.com/skvadrik/re2c/issues/105>`_:
    Conversion of pointer parameters into references
  + `#114 <https://github.com/skvadrik/re2c/issues/114>`_:
    Possibility of fixing bug 2535084
  + `#120 <https://github.com/skvadrik/re2c/issues/120>`_:
    condition consisting of default rule only is ignored
  + `#167 <https://github.com/skvadrik/re2c/issues/167>`_:
    Add word boundary support
  + `#168 <https://github.com/skvadrik/re2c/issues/168>`_:
    Wikipedia's article on re2c
  + `#180 <https://github.com/skvadrik/re2c/issues/180>`_:
    Comment syntax?
  + `#182 <https://github.com/skvadrik/re2c/issues/182>`_:
    yych being set by YYPEEK () and then not used
  + `#196 <https://github.com/skvadrik/re2c/issues/196>`_:
    Implicit type conversion warnings
  + `#198 <https://github.com/skvadrik/re2c/issues/198>`_:
    no match for ‘operator!=’ in ‘i != std::vector<_Tp, _Alloc>::rend() [with _Tp = re2c::bitmap_t, _Alloc = std::allocator<re2c::bitmap_t>]()’
  + `#210 <https://github.com/skvadrik/re2c/issues/210>`_:
    How to build re2c in windows?
  + `#215 <https://github.com/skvadrik/re2c/issues/215>`_:
    A memory read overrun issue in s_to_n32_unsafe.cc
  + `#220 <https://github.com/skvadrik/re2c/issues/220>`_:
    src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug
  + `#223 <https://github.com/skvadrik/re2c/issues/223>`_:
    Fix typo
  + `#224 <https://github.com/skvadrik/re2c/issues/224>`_:
    src/dfa/closure_posix.cc: pack() tweaks
  + `#225 <https://github.com/skvadrik/re2c/issues/225>`_:
    Documentation link is broken in libre2c/README
  + `#230 <https://github.com/skvadrik/re2c/issues/230>`_:
    Changes for upcoming Travis' infra migration
  + `#239 <https://github.com/skvadrik/re2c/issues/239>`_:
    Push model example has wrong re2c invocation, breaks guide
  + `#241 <https://github.com/skvadrik/re2c/issues/241>`_:
    Guidance on how to use re2c for full-duplex command & response protocol
  + `#243 <https://github.com/skvadrik/re2c/issues/243>`_:
    A code generated for period (.) requires 4 bytes
  + `#246 <https://github.com/skvadrik/re2c/issues/246>`_:
    Please add a license to this repo
  + `#247 <https://github.com/skvadrik/re2c/issues/247>`_:
    Build failure on current Cygwin, probably caused by force-fed c++98 mode
  + `#248 <https://github.com/skvadrik/re2c/issues/248>`_:
    distcheck still looks for README
  + `#251 <https://github.com/skvadrik/re2c/issues/251>`_:
    Including what you use is find, but not without inclusion guards

- Updated documentation and website.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants