Transparent handling of #line directives in input files #98
I was playing around with a project where I pass the re2c input file through a preprocessor (m4) first. For best results in this situation, it’s best if re2c can handle #file directives in its input file.
The attached patch to 0.13.3 accomplishes this. While it takes reasonable precautions against false matches for #line directives, it does not strictly check for the directives to start at the beginning of a line (as far as I know, re2c does not yet support the ^pattern syntax, does it?)
While I tried to match the general style of re2c’s source code, I’m open to reworking the patch to meet any further standards I may have missed.
Original comment by: neeri
The text was updated successfully, but these errors were encountered:
Logged In: YES
I think it is a good idea and I am going to add it. However I need to change your patch a bit. And I will need a few tests for it. Also a patch for documentation might be good. Dunno how right now, tough.
Original comment by: helly
This doesn't appear to be working. I tried to use lemon output (with #line directives) as re2c input and large parts of the generated source got deleted. If I try feeding the output of running re2c on
(from the docs) back to re2c with
re2c example.c.re |re2c -
/* Generated by re2c 1.0.3 on Mon Jun 25 20:48:07 2018 */
(I'm on an older browser and I'm having some difficulties with the commenting tool, so apologies for the formatting.)
The correct behaviour was broken somewhere in between 0.16 and 1.0: re2c was forgetting to output the chunk of input file that precedes the #line directive. Reported by pskocik in #98.
2.0.3 (2020-08-22) ~~~~~~~~~~~~~~~~~~ - Fix issues when building re2c as a CMake subproject (`#302 <https://github.com/skvadrik/re2c/pull/302>`_: - Final corrections in the SIMPA article "RE2C: A lexer generator based on lookahead-TDFA", https://doi.org/10.1016/j.simpa.2020.100027 2.0.2 (2020-08-08) ~~~~~~~~~~~~~~~~~~ - Enable re2go building by default. - Package CMake files into release tarball. 2.0.1 (2020-07-29) ~~~~~~~~~~~~~~~~~~ - Updated version for CMake build system (forgotten in release 2.0). - Added a short article about re2c for the Software Impacts journal. 2.0 (2020-07-20) ~~~~~~~~~~~~~~~~ - Added new code generation backend for Go and a new ``re2go`` program (`#272 <https://github.com/skvadrik/re2c/issues/272>`_: Go support). Added option ``--lang <c | go>``. - Added CMake build system as an alternative to Autotools (`#275 <https://github.com/skvadrik/re2c/pull/275>`_: Add a CMake build system (thanks to ligfx), `#244 <https://github.com/skvadrik/re2c/issues/244>`_: Switching to CMake). - Changes in generic API: + Removed primitives ``YYSTAGPD`` and ``YYMTAGPD``. + Added primitives ``YYSHIFT``, ``YYSHIFTSTAG``, ``YYSHIFTMTAG`` that allow to express fixed tags in terms of generic API. + Added configurations ``re2c:api:style`` and ``re2c:api:sigil``. + Added named placeholders in interpolated configuration strings. - Changes in reuse mode (``-r, --reuse`` option): + Do not reset API-related configurations in each `use:re2c` block (`#291 <https://github.com/skvadrik/re2c/issues/291>`_: Defines in rules block are not propagated to use blocks). + Use block-local options instead of last block options. + Do not accumulate options from rules/reuse blocks in whole-program options. + Generate non-overlapping YYFILL labels for reuse blocks. + Generate start label for each reuse block in storable state mode. - Changes in start-conditions mode (``-c, --start-conditions`` option): + Allow to use normal (non-conditional) blocks in `-c` mode (`#263 <https://github.com/skvadrik/re2c/issues/263>`_: allow mixing conditional and non-conditional blocks with -c, `#296 <https://github.com/skvadrik/re2c/issues/296>`_: Conditions required for all lexers when using '-c' option). + Generate condition switch in every re2c block (`#295 <https://github.com/skvadrik/re2c/issues/295>`_: Condition switch generated for only one lexer per file). - Changes in the generated labels: + Use ``yyeof`` label prefix instead of ``yyeofrule``. + Use ``yyfill`` label prefix instead of ``yyFillLabel``. + Decouple start label and initial label (affects label numbering). - Removed undocumented configuration ``re2c
🎏o``, ``re2c 🎏output``. - Changes in ``re2c 🎏t``, ``re2c 🎏type-header`` configuration: filename is now relative to the output file directory. - Added option ``--case-ranges`` and configuration ``re2c 🎏case-ranges``. - Extended fixed tags optimization for the case of fixed-counter repetition. - Fixed bugs related to EOF rule: + `#276 <https://github.com/skvadrik/re2c/issues/276>`_: Example 01_fill.re in docs is broken + `#280 <https://github.com/skvadrik/re2c/issues/280>`_: EOF rules with multiple blocks + `#284 <https://github.com/skvadrik/re2c/issues/284>`_: mismatched YYBACKUP and YYRESTORE (Add missing fallback states with EOF rule) - Fixed miscellaneous bugs: + `#286 <https://github.com/skvadrik/re2c/issues/286>`_: Incorrect submatch values with fixed-length trailing context. + `#297 <https://github.com/skvadrik/re2c/issues/297>`_: configure error on ubuntu 18.04 / cmake 3.10 - Changed bootstrap process (require explicit configuration flags and a path to re2c executable to regenerate the lexers). - Added internal options ``--posix-prectable <naive | complex>``. - Added debug option ``--dump-dfa-tree``. - Major revision of the paper "Efficient POSIX submatch extraction on NFA". ---- 1.3x ---- 1.3 (2019-12-14) ~~~~~~~~~~~~~~~~ - Added option: ``--stadfa``. - Added warning: ``-Wsentinel-in-midrule``. - Added generic API primitives: + ``YYSTAGPD`` + ``YYMTAGPD`` - Added configurations: + ``re2c:sentinel = 0;`` + ``re2c:define:YYSTAGPD = "YYSTAGPD";`` + ``re2c:define:YYMTAGPD = "YYMTAGPD";`` - Worked on reproducible builds (`#258 <https://github.com/skvadrik/re2c/pull/258>`_: Make the build reproducible). ---- 1.2x ---- 1.2.1 (2019-08-11) ~~~~~~~~~~~~~~~~~~ - Fixed bug `#253 <https://github.com/skvadrik/re2c/issues/253>`_: re2c should install unicode_categories.re somewhere. - Fixed bug `#254 <https://github.com/skvadrik/re2c/issues/254>`_: Turn off re2c:eof = 0. 1.2 (2019-08-02) ~~~~~~~~~~~~~~~~ - Added EOF rule ``$`` and configuration ``re2c:eof``. - Added ``/*!include:re2c ... */`` directive and ``-I`` option. - Added ``/*!header:re2c:on*/`` and ``/*!header:re2c:off*/`` directives. - Added ``--input-encoding <ascii | utf8>`` option. + `#237 <https://github.com/skvadrik/re2c/issues/237>`_: Handle non-ASCII encoded characters in regular expressions + `#250 <https://github.com/skvadrik/re2c/issues/250>`_ UTF8 enoding - Added include file with a list of definitions for Unicode character classes. + `#235 <https://github.com/skvadrik/re2c/issues/235>`_: Unicode character classes - Added ``--location-format <gnu | msvc>`` option. + `#195 <https://github.com/skvadrik/re2c/issues/195>`_: Please consider using Gnu format for error messages - Added ``--verbose`` option that prints "success" message if re2c exits without errors. - Added configurations for options: + ``-o --output`` (specify output file) + ``-t --type-header`` (specify header file) - Removed configurations for internal/debug options. - Extended ``-r`` option: allow to mix multiple ``/*!rules:re2c*/``, ``/*!use:re2c*/`` and ``/*!re2c*/`` blocks. + `#55 <https://github.com/skvadrik/re2c/issues/55>`_: allow standard re2c blocks in reuse mode - Fixed ``-F --flex-support`` option: parsing and operator precedence. + `#229 <https://github.com/skvadrik/re2c/issues/229>`_: re2c option -F (flex syntax) broken + `#242 <https://github.com/skvadrik/re2c/issues/242>`_: Operator precedence with --flex-syntax is broken - Changed difference operator ``/`` to apply before encoding expansion of operands. + `#236 <https://github.com/skvadrik/re2c/issues/236>`_: Support range difference with variable-length encodings - Changed output generation of output file to be atomic. + `#245 <https://github.com/skvadrik/re2c/issues/245>`_: re2c output is not atomic - Authored research paper "Efficient POSIX Submatch Extraction on NFA" together with Dr Angelo Borsotti. - Added experimental libre2c library (``--enable-libs`` configure option) with the following algorithms: + TDFA with leftmost-greedy disambiguation + TDFA with POSIX disambiguation (Okui-Suzuki algorithm) + TNFA with leftmost-greedy disambiguation + TNFA with POSIX disambiguation (Okui-Suzuki algorithm) + TNFA with lazy POSIX disambiguation (Okui-Suzuki algorithm) + TNFA with POSIX disambiguation (Kuklewicz algorithm) + TNFA with POSIX disambiguation (Cox algorithm) - Added debug subsystem (``--enable-debug`` configure option) and new debug options: + ``-dump-cfg`` (dump control flow graph of tag variables) + ``-dump-interf`` (dump interference table of tag variables) + ``-dump-closure-stats`` (dump epsilon-closure statistics) - Added internal options: + ``--posix-closure <gor1 | gtop>`` (switch between shortest-path algorithms used for the construction of POSIX closure) - Fixed a number of crashes found by American Fuzzy Lop fuzzer: + `#226 <https://github.com/skvadrik/re2c/issues/226>`_, `#227 <https://github.com/skvadrik/re2c/issues/227>`_, `#228 <https://github.com/skvadrik/re2c/issues/228>`_, `#231 <https://github.com/skvadrik/re2c/issues/231>`_, `#232 <https://github.com/skvadrik/re2c/issues/232>`_, `#233 <https://github.com/skvadrik/re2c/issues/233>`_, `#234 <https://github.com/skvadrik/re2c/issues/234>`_, `#238 <https://github.com/skvadrik/re2c/issues/238>`_ - Fixed handling of newlines: + correctly parse multi-character newlines CR LF in ``#line`` directives + consistently convert all newlines in the generated file to Unix-style LF - Changed default tarball format from .gz to .xz. + `#221 <https://github.com/skvadrik/re2c/issues/221>`_: big source tarball - Fixed a number of other bugs and resolved issues: + `#2 <https://github.com/skvadrik/re2c/issues/2>`_: abort + `#6 <https://github.com/skvadrik/re2c/issues/6>`_: segfault + `#10 <https://github.com/skvadrik/re2c/issues/10>`_: lessons/002_upn_calculator/calc_002 doesn't produce a useful example program + `#44 <https://github.com/skvadrik/re2c/issues/44>`_: Access violation when translating the attached file + `#49 <https://github.com/skvadrik/re2c/issues/49>`_: wildcard state \000 rules makes lexer behave weard + `#98 <https://github.com/skvadrik/re2c/issues/98>`_: Transparent handling of #line directives in input files + `#104 <https://github.com/skvadrik/re2c/issues/104>`_: Improve const-correctness + `#105 <https://github.com/skvadrik/re2c/issues/105>`_: Conversion of pointer parameters into references + `#114 <https://github.com/skvadrik/re2c/issues/114>`_: Possibility of fixing bug 2535084 + `#120 <https://github.com/skvadrik/re2c/issues/120>`_: condition consisting of default rule only is ignored + `#167 <https://github.com/skvadrik/re2c/issues/167>`_: Add word boundary support + `#168 <https://github.com/skvadrik/re2c/issues/168>`_: Wikipedia's article on re2c + `#180 <https://github.com/skvadrik/re2c/issues/180>`_: Comment syntax? + `#182 <https://github.com/skvadrik/re2c/issues/182>`_: yych being set by YYPEEK () and then not used + `#196 <https://github.com/skvadrik/re2c/issues/196>`_: Implicit type conversion warnings + `#198 <https://github.com/skvadrik/re2c/issues/198>`_: no match for ‘operator!=’ in ‘i != std::vector<_Tp, _Alloc>::rend() [with _Tp = re2c::bitmap_t, _Alloc = std::allocator<re2c::bitmap_t>]()’ + `#210 <https://github.com/skvadrik/re2c/issues/210>`_: How to build re2c in windows? + `#215 <https://github.com/skvadrik/re2c/issues/215>`_: A memory read overrun issue in s_to_n32_unsafe.cc + `#220 <https://github.com/skvadrik/re2c/issues/220>`_: src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug + `#223 <https://github.com/skvadrik/re2c/issues/223>`_: Fix typo + `#224 <https://github.com/skvadrik/re2c/issues/224>`_: src/dfa/closure_posix.cc: pack() tweaks + `#225 <https://github.com/skvadrik/re2c/issues/225>`_: Documentation link is broken in libre2c/README + `#230 <https://github.com/skvadrik/re2c/issues/230>`_: Changes for upcoming Travis' infra migration + `#239 <https://github.com/skvadrik/re2c/issues/239>`_: Push model example has wrong re2c invocation, breaks guide + `#241 <https://github.com/skvadrik/re2c/issues/241>`_: Guidance on how to use re2c for full-duplex command & response protocol + `#243 <https://github.com/skvadrik/re2c/issues/243>`_: A code generated for period (.) requires 4 bytes + `#246 <https://github.com/skvadrik/re2c/issues/246>`_: Please add a license to this repo + `#247 <https://github.com/skvadrik/re2c/issues/247>`_: Build failure on current Cygwin, probably caused by force-fed c++98 mode + `#248 <https://github.com/skvadrik/re2c/issues/248>`_: distcheck still looks for README + `#251 <https://github.com/skvadrik/re2c/issues/251>`_: Including what you use is find, but not without inclusion guards - Updated documentation and website.