A memory read overrun issue in s_to_n32_unsafe.cc #215

Closed
mlite opened this issue Sep 3, 2018 · 10 comments
@mlite commented Sep 3, 2018

This is the runtime error msg caused by the overrun.

DTS_MSG: Stensal C/C++ DTS detected a fatal program error!
DTS_MSG: Continuing the execution will cause unexpected behaviors, abort!
DTS_MSG: Reading 1 bytes at 0xffffc7dc will read undefined values.
DTS_MSG: Diagnostic information:

- The object to-be-read (start:0xffffc6dc, size:256 bytes) is allocated at
-     file:/home/sbuilder/workspace/re2c/re2c/src/test/s_to_n32_unsafe/test.cc::39, 10
-  0xffffc6dc               0xffffc7db
-  +------------------------+
-  | the object  to-be-read |......
-  +------------------------+
-                            ^~~~~~~~~~
-        the read starts at 0xffffc7dc that is right after the object end.
- Stack trace (most recent call first):
-[1]  file:/home/sbuilder/workspace/re2c/re2c/src/util/s_to_n32_unsafe.cc::28, 9
-[2]  file:/home/sbuilder/workspace/re2c/re2c/src/test/s_to_n32_unsafe/test.cc::50, 9
-[3]  file:/home/sbuilder/workspace/re2c/re2c/src/test/s_to_n32_unsafe/test.cc::85, 15
-[4]  file:/home/sbuilder/workspace/re2c/re2c/src/test/s_to_n32_unsafe/test.cc::101, 12
-[5]  file:/musl-1.1.10/src/env/__libc_start_main.c::168, 11

@skvadrik (Owner) commented Sep 4, 2018

Do you have the .re file which caused the failure?

@trofi (Collaborator) commented Sep 4, 2018

Looks like it's caused by the test itself: re2c/src/test/s_to_n32_unsafe/test.cc. I haven't found a plausible call path yet, but a few use-of-uninitialized-value reports are detectable by clang's -fsanitize=memory:

$ ./configure CC=clang CXX=clang++ CFLAGS="-fsanitize=memory" CXXFLAGS="-fsanitize=memory"
...
$ make check VERBOSE=1
...
FAIL: run_tests.sh
SKIP: testrange
PASS: testston32unsafe
PASS: testvertovernum
==================================
   re2c 1.1.1: ./test-suite.log
==================================

# TOTAL: 4
# PASS:  2
# SKIP:  1
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: run_tests.sh
==================

Running in 8 thread(s)
FAIL       01_recognizing_integers.i.re
FAIL       config/cond_set/1_1_3.ci.re
FAIL       unicode_group_Cc.8--encoding-policy(fail).re
FAIL       posix_captures/implicit_grouping1.i--posix-captures.re
FAIL       range_dot.x.re
FAIL       parse_date.g.re
...
FAIL       parse_date.db.re
FAIL       unicode_group_No.u--encoding-policy(substitute).re
FAIL       posix_captures/gor3.i--posix-captures.re
Error: 1409 out 1409 tests failed.
FAIL run_tests.sh (exit status: 1)

SKIP: testrange
===============

==5661==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x49403d in re2c::Range::ran(unsigned int, unsigned int) (/home/slyfox/dev/git/re2c/re2c/testrange+0x49403d)
    #1 0x49390a in re2c::Range::append(re2c::Range**&, unsigned int, unsigned int) (/home/slyfox/dev/git/re2c/re2c/testrange+0x49390a)
    #2 0x491838 in main (/home/slyfox/dev/git/re2c/re2c/testrange+0x491838)
    #3 0x7fe65459579a in __libc_start_main /usr/src/debug/sys-libs/glibc-2.28/glibc-2.28/csu/../csu/libc-start.c:308:16
    #4 0x41b419 in _start (/home/slyfox/dev/git/re2c/re2c/testrange+0x41b419)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/slyfox/dev/git/re2c/re2c/testrange+0x49403d) in re2c::Range::ran(unsigned int, unsigned int)
Exiting
SKIP testrange (exit status: 77)

============================================================================
Testsuite summary for re2c 1.1.1
============================================================================

I'll try to add travis presubmit for a few sanitizers.

@skvadrik (Owner) commented Sep 4, 2018

Eh, I didn't read the bug report properly.

Valgrind also reports the error:

$ valgrind --track-origins=yes ./testston32unsafe 
==30881== Memcheck, a memory error detector
==30881== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30881== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==30881== Command: ./testston32unsafe
==30881== 
==30881== Conditional jump or move depends on uninitialised value(s)
==30881==    at 0x108C15: s_to_i32_unsafe(char const*, char const*, int&) (s_to_n32_unsafe.cc:28)
==30881==    by 0x108B56: re2c_test::test_i(long) (test.cc:50)
==30881==    by 0x1087BB: test (test.cc:85)
==30881==    by 0x1087BB: main (test.cc:101)
==30881==  Uninitialised value was created by a stack allocation
==30881==    at 0x108AB0: re2c_test::test_i(long) (test.cc:38)

skvadrik added a commit that referenced this issue Sep 4, 2018

The error was in the code of the test itself: the special case of zero
wasn't handled correctly by the function that prepares input data for
the test. As a result, a zero-length input string was passed to the test,
which is unexpected: the tested function is an "unsafe" one (as the
name suggests) and is meant to be used on already validated input.
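
For readers following along, the failure mode described in the commit message can be sketched roughly as below. This is a minimal illustration, not the re2c code: `parse_u32_unsafe` and `to_digits` are hypothetical names, and the only point is that an input-preparing helper must special-case zero, otherwise it hands an empty string to a parser that assumes validated, non-empty input.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an "unsafe" parser: it assumes [s, s_end) is a
// non-empty run of decimal digits and performs no validation of its own.
bool parse_u32_unsafe(const char *s, const char *s_end, uint32_t &number)
{
    assert(s < s_end); // precondition: the caller must not pass an empty range
    uint64_t u = 0;
    for (; s != s_end; ++s) {
        u = u * 10 + static_cast<uint32_t>(*s - '0');
        if (u > UINT32_MAX) return false; // value does not fit in 32 bits
    }
    number = static_cast<uint32_t>(u);
    return true;
}

// Hypothetical test helper: renders n as decimal digits.
std::string to_digits(uint64_t n)
{
    if (n == 0) return "0"; // special case: without it the loop yields ""
    std::string digits;
    for (; n > 0; n /= 10) {
        digits.insert(digits.begin(), static_cast<char>('0' + n % 10));
    }
    return digits;
}
```

Without the `n == 0` branch, `to_digits(0)` would return an empty string, and a parser that reads at least one character unconditionally would then read past the end of its buffer.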

@skvadrik (Owner) commented Sep 4, 2018

The error has been fixed (at least the one I can reproduce with Valgrind), see a439ca0.

@mlite, can you confirm the fix with your analyzer?

@trofi, thank you! :)

@mlite (Author) commented Sep 5, 2018

the culprit is overflow-1.re. The memory read overrun is fixed, but I get stack overflow. What is the purpose of this test? to overflow the call stack?

@skvadrik (Owner) commented Sep 5, 2018

> the culprit is overflow-1.re. The memory read overrun is fixed, but I get stack overflow.

It's a different bug. Can you open a new issue and specify which function overflowed? There are a number of recursive functions in re2c, and if the default stack size on your platform is small (compared to that on the platforms where we test re2c), then it is quite possible that one of the recursive functions exhausted the stack.

> What is the purpose of this test? to overflow the call stack?

No, actually it's to overflow the re2c lexer buffer with an unexpectedly long lexeme: re2c used to crash at some point, but now it prints an error message.

What platform are you running re2c on? (My guess is Windows: I don't have it, and the only kind of testing for Windows is done by running MinGW-compiled re2c in Wine.)

@skvadrik (Owner) commented Sep 5, 2018

> My guess is, windows

Eh, again I'm wrong. Your stacktrace shows this:

file:/musl-1.1.10/src/env/__libc_start_main.c

@mlite (Author) commented Sep 6, 2018

It's Linux. This issue is fixed. I will try it with a larger stack size.
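
As a side note for anyone checking whether a small stack limit is the cause, here is a minimal sketch of querying the current limit on a POSIX system such as Linux. It uses the standard `getrlimit` call with `RLIMIT_STACK`; the helper names and the per-frame size passed to the estimate are assumptions made for illustration, not measurements of any re2c function.

```cpp
#include <sys/resource.h> // POSIX: getrlimit, RLIMIT_STACK, RLIM_INFINITY
#include <cassert>

// Soft stack limit in bytes; 0 means "unlimited" or "the query failed".
unsigned long long soft_stack_limit_bytes()
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return 0;
    if (rl.rlim_cur == RLIM_INFINITY) return 0;
    return static_cast<unsigned long long>(rl.rlim_cur);
}

// Rough upper bound on recursion depth for an assumed frame size in bytes.
unsigned long long max_depth_estimate(unsigned long long limit_bytes,
                                      unsigned long long frame_bytes)
{
    assert(frame_bytes > 0); // a zero-sized frame would divide by zero
    return limit_bytes / frame_bytes;
}
```

In a shell, `ulimit -s` reports the same soft limit (in kilobytes) and can raise it for the current session up to the hard limit.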

@mlite closed this Sep 6, 2018

@mlite (Author) commented Jan 13, 2019

In case you are interested in what tool I used: I just released it at https://stensal.com. It's called Stensal SDK, and it's free for personal use.

@skvadrik (Owner) commented Jan 13, 2019

@mlite, thanks!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Sep 20, 2020
2.0.3 (2020-08-22)
~~~~~~~~~~~~~~~~~~

- Fix issues when building re2c as a CMake subproject
  (`#302 <https://github.com/skvadrik/re2c/pull/302>`_:

- Final corrections in the SIMPA article "RE2C: A lexer generator based on
  lookahead-TDFA", https://doi.org/10.1016/j.simpa.2020.100027

2.0.2 (2020-08-08)
~~~~~~~~~~~~~~~~~~

- Enable re2go building by default.

- Package CMake files into release tarball.

2.0.1 (2020-07-29)
~~~~~~~~~~~~~~~~~~

- Updated version for CMake build system (forgotten in release 2.0).

- Added a short article about re2c for the Software Impacts journal.

2.0 (2020-07-20)
~~~~~~~~~~~~~~~~

- Added new code generation backend for Go and a new ``re2go`` program
  (`#272 <https://github.com/skvadrik/re2c/issues/272>`_: Go support).
  Added option ``--lang <c | go>``.

- Added CMake build system as an alternative to Autotools
  (`#275 <https://github.com/skvadrik/re2c/pull/275>`_:
  Add a CMake build system (thanks to ligfx),
  `#244 <https://github.com/skvadrik/re2c/issues/244>`_: Switching to CMake).

- Changes in generic API:

  + Removed primitives ``YYSTAGPD`` and ``YYMTAGPD``.
  + Added primitives ``YYSHIFT``, ``YYSHIFTSTAG``, ``YYSHIFTMTAG``
    that allow to express fixed tags in terms of generic API.
  + Added configurations ``re2c:api:style`` and ``re2c:api:sigil``.
  + Added named placeholders in interpolated configuration strings.

- Changes in reuse mode (``-r, --reuse`` option):

  + Do not reset API-related configurations in each `use:re2c` block
    (`#291 <https://github.com/skvadrik/re2c/issues/291>`_:
    Defines in rules block are not propagated to use blocks).
  + Use block-local options instead of last block options.
  + Do not accumulate options from rules/reuse blocks in whole-program options.
  + Generate non-overlapping YYFILL labels for reuse blocks.
  + Generate start label for each reuse block in storable state mode.

- Changes in start-conditions mode (``-c, --start-conditions`` option):

  + Allow to use normal (non-conditional) blocks in `-c` mode
    (`#263 <https://github.com/skvadrik/re2c/issues/263>`_:
    allow mixing conditional and non-conditional blocks with -c,
    `#296 <https://github.com/skvadrik/re2c/issues/296>`_:
    Conditions required for all lexers when using '-c' option).
  + Generate condition switch in every re2c block
    (`#295 <https://github.com/skvadrik/re2c/issues/295>`_:
    Condition switch generated for only one lexer per file).

- Changes in the generated labels:

  + Use ``yyeof`` label prefix instead of ``yyeofrule``.
  + Use ``yyfill`` label prefix instead of ``yyFillLabel``.
  + Decouple start label and initial label (affects label numbering).

- Removed undocumented configurations ``re2c:flags:o``, ``re2c:flags:output``.

- Changes in ``re2c:flags:t``, ``re2c:flags:type-header`` configuration:
  filename is now relative to the output file directory.

- Added option ``--case-ranges`` and configuration ``re2c:flags:case-ranges``.

- Extended fixed tags optimization for the case of fixed-counter repetition.

- Fixed bugs related to EOF rule:

  + `#276 <https://github.com/skvadrik/re2c/issues/276>`_:
    Example 01_fill.re in docs is broken
  + `#280 <https://github.com/skvadrik/re2c/issues/280>`_:
    EOF rules with multiple blocks
  + `#284 <https://github.com/skvadrik/re2c/issues/284>`_:
    mismatched YYBACKUP and YYRESTORE
    (Add missing fallback states with EOF rule)

- Fixed miscellaneous bugs:

  + `#286 <https://github.com/skvadrik/re2c/issues/286>`_:
    Incorrect submatch values with fixed-length trailing context.
  + `#297 <https://github.com/skvadrik/re2c/issues/297>`_:
    configure error on ubuntu 18.04 / cmake 3.10

- Changed bootstrap process (require explicit configuration flags and a path to
  re2c executable to regenerate the lexers).

- Added internal options ``--posix-prectable <naive | complex>``.

- Added debug option ``--dump-dfa-tree``.

- Major revision of the paper "Efficient POSIX submatch extraction on NFA".

----
1.3x
----

1.3 (2019-12-14)
~~~~~~~~~~~~~~~~

- Added option: ``--stadfa``.

- Added warning: ``-Wsentinel-in-midrule``.

- Added generic API primitives:

  + ``YYSTAGPD``
  + ``YYMTAGPD``

- Added configurations:

  + ``re2c:sentinel = 0;``
  + ``re2c:define:YYSTAGPD = "YYSTAGPD";``
  + ``re2c:define:YYMTAGPD = "YYMTAGPD";``

- Worked on reproducible builds
  (`#258 <https://github.com/skvadrik/re2c/pull/258>`_:
  Make the build reproducible).

----
1.2x
----

1.2.1 (2019-08-11)
~~~~~~~~~~~~~~~~~~

- Fixed bug `#253 <https://github.com/skvadrik/re2c/issues/253>`_:
  re2c should install unicode_categories.re somewhere.

- Fixed bug `#254 <https://github.com/skvadrik/re2c/issues/254>`_:
  Turn off re2c:eof = 0.

1.2 (2019-08-02)
~~~~~~~~~~~~~~~~

- Added EOF rule ``$`` and configuration ``re2c:eof``.

- Added ``/*!include:re2c ... */`` directive and ``-I`` option.

- Added ``/*!header:re2c:on*/`` and ``/*!header:re2c:off*/`` directives.

- Added ``--input-encoding <ascii | utf8>`` option.

  + `#237 <https://github.com/skvadrik/re2c/issues/237>`_:
    Handle non-ASCII encoded characters in regular expressions
  + `#250 <https://github.com/skvadrik/re2c/issues/250>`_:
    UTF8 enoding

- Added include file with a list of definitions for Unicode character classes.

  + `#235 <https://github.com/skvadrik/re2c/issues/235>`_:
    Unicode character classes

- Added ``--location-format <gnu | msvc>`` option.

  + `#195 <https://github.com/skvadrik/re2c/issues/195>`_:
    Please consider using Gnu format for error messages

- Added ``--verbose`` option that prints "success" message if re2c exits
  without errors.

- Added configurations for options:

  + ``-o --output`` (specify output file)
  + ``-t --type-header`` (specify header file)

- Removed configurations for internal/debug options.

- Extended ``-r`` option: allow to mix multiple ``/*!rules:re2c*/``,
  ``/*!use:re2c*/`` and ``/*!re2c*/`` blocks.

  + `#55 <https://github.com/skvadrik/re2c/issues/55>`_:
    allow standard re2c blocks in reuse mode

- Fixed ``-F --flex-syntax`` option: parsing and operator precedence.

  + `#229 <https://github.com/skvadrik/re2c/issues/229>`_:
    re2c option -F (flex syntax) broken
  + `#242 <https://github.com/skvadrik/re2c/issues/242>`_:
    Operator precedence with --flex-syntax is broken

- Changed difference operator ``\`` to apply before encoding expansion of
  operands.

  + `#236 <https://github.com/skvadrik/re2c/issues/236>`_:
    Support range difference with variable-length encodings

- Changed generation of the output file to be atomic.

  + `#245 <https://github.com/skvadrik/re2c/issues/245>`_:
    re2c output is not atomic

- Authored research paper "Efficient POSIX Submatch Extraction on NFA"
  together with Dr Angelo Borsotti.

- Added experimental libre2c library (``--enable-libs`` configure option) with
  the following algorithms:

  + TDFA with leftmost-greedy disambiguation
  + TDFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with leftmost-greedy disambiguation
  + TNFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with lazy POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with POSIX disambiguation (Kuklewicz algorithm)
  + TNFA with POSIX disambiguation (Cox algorithm)

- Added debug subsystem (``--enable-debug`` configure option) and new debug
  options:

  + ``--dump-cfg`` (dump control flow graph of tag variables)
  + ``--dump-interf`` (dump interference table of tag variables)
  + ``--dump-closure-stats`` (dump epsilon-closure statistics)

- Added internal options:

  + ``--posix-closure <gor1 | gtop>`` (switch between shortest-path algorithms
    used for the construction of POSIX closure)

- Fixed a number of crashes found by American Fuzzy Lop fuzzer:

  + `#226 <https://github.com/skvadrik/re2c/issues/226>`_,
    `#227 <https://github.com/skvadrik/re2c/issues/227>`_,
    `#228 <https://github.com/skvadrik/re2c/issues/228>`_,
    `#231 <https://github.com/skvadrik/re2c/issues/231>`_,
    `#232 <https://github.com/skvadrik/re2c/issues/232>`_,
    `#233 <https://github.com/skvadrik/re2c/issues/233>`_,
    `#234 <https://github.com/skvadrik/re2c/issues/234>`_,
    `#238 <https://github.com/skvadrik/re2c/issues/238>`_

- Fixed handling of newlines:

  + correctly parse multi-character newlines CR LF in ``#line`` directives
  + consistently convert all newlines in the generated file to Unix-style LF

- Changed default tarball format from .gz to .xz.

  + `#221 <https://github.com/skvadrik/re2c/issues/221>`_:
    big source tarball

- Fixed a number of other bugs and resolved issues:

  + `#2 <https://github.com/skvadrik/re2c/issues/2>`_: abort
  + `#6 <https://github.com/skvadrik/re2c/issues/6>`_: segfault
  + `#10 <https://github.com/skvadrik/re2c/issues/10>`_:
    lessons/002_upn_calculator/calc_002 doesn't produce a useful example program
  + `#44 <https://github.com/skvadrik/re2c/issues/44>`_:
    Access violation when translating the attached file
  + `#49 <https://github.com/skvadrik/re2c/issues/49>`_:
    wildcard state \000 rules makes lexer behave weard
  + `#98 <https://github.com/skvadrik/re2c/issues/98>`_:
    Transparent handling of #line directives in input files
  + `#104 <https://github.com/skvadrik/re2c/issues/104>`_:
    Improve const-correctness
  + `#105 <https://github.com/skvadrik/re2c/issues/105>`_:
    Conversion of pointer parameters into references
  + `#114 <https://github.com/skvadrik/re2c/issues/114>`_:
    Possibility of fixing bug 2535084
  + `#120 <https://github.com/skvadrik/re2c/issues/120>`_:
    condition consisting of default rule only is ignored
  + `#167 <https://github.com/skvadrik/re2c/issues/167>`_:
    Add word boundary support
  + `#168 <https://github.com/skvadrik/re2c/issues/168>`_:
    Wikipedia's article on re2c
  + `#180 <https://github.com/skvadrik/re2c/issues/180>`_:
    Comment syntax?
  + `#182 <https://github.com/skvadrik/re2c/issues/182>`_:
    yych being set by YYPEEK () and then not used
  + `#196 <https://github.com/skvadrik/re2c/issues/196>`_:
    Implicit type conversion warnings
  + `#198 <https://github.com/skvadrik/re2c/issues/198>`_:
    no match for ‘operator!=’ in ‘i != std::vector<_Tp, _Alloc>::rend() [with _Tp = re2c::bitmap_t, _Alloc = std::allocator<re2c::bitmap_t>]()’
  + `#210 <https://github.com/skvadrik/re2c/issues/210>`_:
    How to build re2c in windows?
  + `#215 <https://github.com/skvadrik/re2c/issues/215>`_:
    A memory read overrun issue in s_to_n32_unsafe.cc
  + `#220 <https://github.com/skvadrik/re2c/issues/220>`_:
    src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug
  + `#223 <https://github.com/skvadrik/re2c/issues/223>`_:
    Fix typo
  + `#224 <https://github.com/skvadrik/re2c/issues/224>`_:
    src/dfa/closure_posix.cc: pack() tweaks
  + `#225 <https://github.com/skvadrik/re2c/issues/225>`_:
    Documentation link is broken in libre2c/README
  + `#230 <https://github.com/skvadrik/re2c/issues/230>`_:
    Changes for upcoming Travis' infra migration
  + `#239 <https://github.com/skvadrik/re2c/issues/239>`_:
    Push model example has wrong re2c invocation, breaks guide
  + `#241 <https://github.com/skvadrik/re2c/issues/241>`_:
    Guidance on how to use re2c for full-duplex command & response protocol
  + `#243 <https://github.com/skvadrik/re2c/issues/243>`_:
    A code generated for period (.) requires 4 bytes
  + `#246 <https://github.com/skvadrik/re2c/issues/246>`_:
    Please add a license to this repo
  + `#247 <https://github.com/skvadrik/re2c/issues/247>`_:
    Build failure on current Cygwin, probably caused by force-fed c++98 mode
  + `#248 <https://github.com/skvadrik/re2c/issues/248>`_:
    distcheck still looks for README
  + `#251 <https://github.com/skvadrik/re2c/issues/251>`_:
    Including what you use is find, but not without inclusion guards

- Updated documentation and website.