Build failure on current Cygwin, probably caused by force-fed c++98 mode #247

Closed
HBBroeker opened this issue Apr 5, 2019 · 5 comments

HBBroeker commented Apr 5, 2019

Current git HEAD fails to build on current Cygwin, because fdopen() is not found. That's because configure.ac injects --std=c++98 into the CXXFLAGS, which has two effects: first, it nails the language version to C++98, which I'll assume is what you want it to do. But secondly it also turns off all non-ISO-C++ language and library features. As a result, POSIX libc functions like fdopen() are no longer visible to the compiled code, and that fails the build of src/util/temp_file.cc.

I see three possible solutions:

  1. don't inject any --std at all --- the default state of that option is most likely better anyway
  2. if you really have to, inject --std=gnu++98 instead, which will not have the unwieldy side effect of turning off all language and library extensions
  3. if you're absolutely sure it has to be --std=c++98, at least consider adding a mechanism like autoconf's standard AC_USE_SYSTEM_EXTENSIONS that re-enables the extensions
skvadrik added a commit that referenced this issue Apr 6, 2019
…OSIX extensions on Cygwin.

Attempted fix for #247.

skvadrik (Owner) commented Apr 6, 2019

I'm trying to understand why --std=c++98 does not have the same effect on other POSIX systems (Linux for example). If it turns off non-ISO-C++ features, why does the compiler (I assume GCC) succeed on Linux? The declaration of fdopen is in <stdio.h>. I assume the header was found in both cases, but then some macro like _POSIX_C_SOURCE was not defined on Cygwin.

The goal of having --std=c++98 is to increase portability on systems that have old (pre C++11) compilers. Of course, if it starts to be an obstacle, then it has the opposite effect (portability is decreased). But it would be better to understand why Cygwin behaves differently.

I'll go with AC_USE_SYSTEM_EXTENSIONS for now (c32d433); let me know if it doesn't work.

HBBroeker (Author) commented Apr 6, 2019

> The goal of having --std=c++98 is to increase portability on systems that have old (pre C++11) compilers.

But these days, it has almost entirely adverse effects:

  1. it dials the C++ standard down to C++98, even if the compiler would ordinarily support C++11 or newer.
  2. it disables all language extensions, including system library functions not part of the ISO C++ standard.

> Of course, if it starts to be an obstacle, then it has the opposite effect (portability is decreased).

Portability is in fact increased too much, by setting it to C++98. The fact of the matter is that the source code is not written in strictly pure C++98 --- if it were, you would not need any autoconf tests at all.

But the source does use some POSIX-specific extensions, particularly fdopen(). Once you have dialed down all the way to --std=c++98, you have to dial back up to make those available somehow. The absolute minimum on systems that have glibc-style feature test macros is to define _POSIX_C_SOURCE to some suitable value.
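The "dial back up" step can be sketched in C++98-compatible code. This is an illustration, not re2c's code: the helper name `stream_from_fd` is hypothetical, and the sketch assumes the target libc, Cygwin's included, honours `_POSIX_C_SOURCE` as a glibc-style feature-test macro.

```cpp
// Must appear before the first system header in the translation unit:
// glibc evaluates feature-test macros when its headers are first read.
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L  // request POSIX.1-2008 interfaces
#endif

#include <assert.h>
#include <stdio.h>   // FILE, fdopen(), fputs(), fclose()
#include <unistd.h>  // pipe(), read(), close()

// Hypothetical helper: wrap an already-open descriptor in a stdio stream.
// Returns NULL (leaving fd open) if fdopen() fails.
FILE *stream_from_fd(int fd, const char *mode)
{
    assert(fd >= 0 && mode != NULL);
    return fdopen(fd, mode);
}
```

Compiled with `g++ -std=c++98`, this is expected to see a declaration of `fdopen()` even though `__STRICT_ANSI__` is defined, because the macro is set before `<stdio.h>` is first read; defining it after a system header has been included has no reliable effect.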

AC_USE_SYSTEM_EXTENSIONS effectively turns on _GNU_SOURCE, which brings you back, library feature-wise, to the same state you would have been in with --std=gnu++98 or with no --std option at all.
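A rough sketch of what the AC_USE_SYSTEM_EXTENSIONS route amounts to at the source level: configure writes `_GNU_SOURCE` (among several other macros) into `config.h`, and every source file must include that header before any system header. The block below inlines only the `_GNU_SOURCE` part so it stands alone. `open_temp_stream` is a hypothetical helper in the spirit of a temp-file utility; it is not the contents of re2c's `src/util/temp_file.cc`.

```cpp
// Stand-in for the relevant part of an autoconf-generated config.h
// (assumption: only _GNU_SOURCE is shown). In a real project each .cc
// file would include "config.h" first instead of repeating this block.
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif

#include <assert.h>
#include <stdio.h>   // FILE, fdopen()
#include <stdlib.h>  // mkstemp()
#include <unistd.h>  // close(), unlink()

// Hypothetical helper: create a unique file from `templ` (which must end
// in "XXXXXX" and is modified in place) and return it as a read/write
// stream, or NULL on failure.
FILE *open_temp_stream(char *templ)
{
    assert(templ != NULL);
    int fd = mkstemp(templ);
    if (fd == -1) {
        return NULL;
    }
    FILE *f = fdopen(fd, "w+");
    if (f == NULL) {
        close(fd);      // fdopen() failed: don't leak the descriptor
        unlink(templ);  // ...or the file
    }
    return f;
}
```

The caller owns both the stream and the file name: `fclose()` also closes the underlying descriptor, but the file itself still has to be removed with `unlink()`.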

trofi (Collaborator) commented Apr 8, 2019

None of the available C++ standards define fdopen. -std= option is a distraction here.

fdopen() is a POSIX.1-2001 (and later) library call, which makes sense, as fdopen assumes that file descriptors exist and are used by the file API.

Mechanically, -std=c++98 works like this:

```
$ diff -U0 <(g++ -x c++ -dM -E - </dev/null) <(g++ -x c++ -std=c++98 -dM -E - </dev/null)
...
#define __STRICT_ANSI__ 1
```

glibc happens to enable POSIX extensions by default:

```c
/* If nothing (other than _GNU_SOURCE and _DEFAULT_SOURCE) is defined,
   define _DEFAULT_SOURCE.  */
#if (defined _DEFAULT_SOURCE                                    \
     || (!defined __STRICT_ANSI__                               \
         && !defined _ISOC99_SOURCE && !defined _ISOC11_SOURCE  \
         && !defined _POSIX_SOURCE && !defined _POSIX_C_SOURCE  \
         && !defined _XOPEN_SOURCE))
# undef  _DEFAULT_SOURCE
# define _DEFAULT_SOURCE        1
#endif
...
/* If none of the ANSI/POSIX macros are defined, or if _DEFAULT_SOURCE
   is defined, use POSIX.1-2008 (or another version depending on
   _XOPEN_SOURCE).  */
#ifdef _DEFAULT_SOURCE
# if !defined _POSIX_SOURCE && !defined _POSIX_C_SOURCE
#  define __USE_POSIX_IMPLICITLY        1
# endif
# undef  _POSIX_SOURCE
# define _POSIX_SOURCE  1
# undef  _POSIX_C_SOURCE
# define _POSIX_C_SOURCE        200809L
#endif
...
#if (defined _POSIX_SOURCE                                      \
     || (defined _POSIX_C_SOURCE && _POSIX_C_SOURCE >= 1)       \
     || defined _XOPEN_SOURCE)
# define __USE_POSIX    1
#endif
...
#ifdef __USE_POSIX
/* Create a new stream that refers to an existing system file descriptor.  */
extern FILE *fdopen (int __fd, const char *__modes) __THROW __wur;
#endif
```

Note: __STRICT_ANSI__ disables most system-specific defaults (the desired behaviour for re2c, to ease porting). AC_USE_SYSTEM_EXTENSIONS (https://www.gnu.org/software/autoconf/manual/autoconf-2.64/html_node/Posix-Variants.html) is the correct way to enable fdopen explicitly.

Also note: all of the above is glibc's policy implementation. Other libcs (say, on non-POSIX targets) can choose not to export anything beyond the C++ standard unless the user defines _POSIX_C_SOURCE explicitly.

HBBroeker (Author) commented Apr 8, 2019

> None of the available C++ standards define fdopen. -std= option is a distraction here.

Not really. It's the core of the issue. Without any --std flag, or with --std=gnu++98, the issue would not exist, because that's the primary flag that forcibly turns off all the extensions. If they hadn't been turned off, there would be no need to turn them on again.

> glibc happens to enable POSIX extensions by default:

If it does, that's not by any of the code snippets you show.

> /* If nothing (other than _GNU_SOURCE and _DEFAULT_SOURCE) is defined,

This doesn't activate, because something is defined: __STRICT_ANSI__

> /* If none of the ANSI/POSIX macros are defined, or if _DEFAULT_SOURCE
> is defined, use POSIX.1-2008 (or another version depending on
> _XOPEN_SOURCE). */

Consequently, this does not trigger either.
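The disagreement over the glibc excerpt can be made concrete with a small C++ model of its three steps. This is a simplification under stated assumptions: it encodes only the macros in the quoted code and ignores everything else `features.h` does, including macros the compiler driver may predefine (g++ on glibc targets, for example, predefines `_GNU_SOURCE`, which `features.h` handles in a separate branch). `FeatureMacros`, `no_macros`, `use_posix` and `demo_use_posix` are illustrative names.

```cpp
#include <assert.h>

// Which feature-test macros are defined on entry (simplified model).
struct FeatureMacros {
    bool strict_ansi;             // __STRICT_ANSI__ (e.g. from -std=c++98)
    bool default_source;          // _DEFAULT_SOURCE
    bool isoc99_source;           // _ISOC99_SOURCE
    bool isoc11_source;           // _ISOC11_SOURCE
    bool posix_source;            // _POSIX_SOURCE
    bool posix_c_source_defined;  // is _POSIX_C_SOURCE defined at all?
    long posix_c_source;          // its value, if defined
    bool xopen_source;            // _XOPEN_SOURCE
};

// A state with none of the modelled macros defined.
FeatureMacros no_macros()
{
    FeatureMacros m = {false, false, false, false, false, false, 0L, false};
    return m;
}

// True if the modelled excerpt would define __USE_POSIX, i.e. whether
// <stdio.h> would declare fdopen().
bool use_posix(FeatureMacros m)
{
    // Step 1: implicit _DEFAULT_SOURCE only if none of the listed macros
    // (including __STRICT_ANSI__) is defined, or if it was requested.
    bool default_source = m.default_source
        || (!m.strict_ansi && !m.isoc99_source && !m.isoc11_source
            && !m.posix_source && !m.posix_c_source_defined
            && !m.xopen_source);

    // Step 2: _DEFAULT_SOURCE implies POSIX.1-2008.
    if (default_source) {
        m.posix_source = true;
        m.posix_c_source_defined = true;
        m.posix_c_source = 200809L;
    }

    // Step 3: the condition guarding __USE_POSIX.
    return m.posix_source
        || (m.posix_c_source_defined && m.posix_c_source >= 1)
        || m.xopen_source;
}

// The three situations discussed in the thread.
void demo_use_posix()
{
    FeatureMacros m = no_macros();
    assert(use_posix(m));    // nothing defined: POSIX enabled implicitly
    m.strict_ansi = true;
    assert(!use_posix(m));   // __STRICT_ANSI__ alone: neither branch fires
    m.posix_c_source_defined = true;
    m.posix_c_source = 200809L;
    assert(use_posix(m));    // explicit _POSIX_C_SOURCE restores it
}
```

Within this model, `__STRICT_ANSI__` by itself suppresses the implicit `_DEFAULT_SOURCE` path, and an explicit `_POSIX_C_SOURCE` (or `_XOPEN_SOURCE`) is what brings `fdopen()` back.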

> Also note: all the above is glibc's policy implementation.

Precisely. Arguably, it's a bug in glibc if it enables _POSIX_C_SOURCE in the face of __STRICT_ANSI__. At the very least it's a rather wilful interpretation of the documented meaning of --std=c++98, compared to --std=gnu++98. And since Cygwin doesn't use glibc, it's not entirely surprising that its behaviour is, indeed, sufficiently different to cause a build failure.

skvadrik (Owner) commented Apr 8, 2019

@HBBroeker, as you say, the re2c source is not pure C++98, but only library-wise, not language-wise. We do use non-standard library functions, but we also keep the language subset within C++98 bounds (or try to). So my reasoning here is the following:

  • Dropping --std=... altogether doesn't seem right because it drops language restrictions.
  • Using --std=gnu++98 doesn't seem right because it adds some language features on top of C++98.
  • Using AC_USE_SYSTEM_EXTENSIONS seems ok, because it disentangles language and library extensions and enables only the latter (if I understand it correctly).

However, there is no need to enforce --std=... on re2c users or distro maintainers. If it becomes too much trouble, it can be moved to a developer build script. It is in the default flags primarily to set expectations for everyone who clones or forks re2c and builds it from source with a simple configure && make, without necessarily looking into every script.

The true Autotools way is probably to test for each particular non-standard function, not for headers as we do.


skvadrik closed this Jun 10, 2019
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Sep 20, 2020
2.0.3 (2020-08-22)
~~~~~~~~~~~~~~~~~~

- Fix issues when building re2c as a CMake subproject
  (`#302 <https://github.com/skvadrik/re2c/pull/302>`_).

- Final corrections in the SIMPA article "RE2C: A lexer generator based on
  lookahead-TDFA", https://doi.org/10.1016/j.simpa.2020.100027

2.0.2 (2020-08-08)
~~~~~~~~~~~~~~~~~~

- Enable re2go building by default.

- Package CMake files into release tarball.

2.0.1 (2020-07-29)
~~~~~~~~~~~~~~~~~~

- Updated version for CMake build system (forgotten in release 2.0).

- Added a short article about re2c for the Software Impacts journal.

2.0 (2020-07-20)
~~~~~~~~~~~~~~~~

- Added new code generation backend for Go and a new ``re2go`` program
  (`#272 <https://github.com/skvadrik/re2c/issues/272>`_: Go support).
  Added option ``--lang <c | go>``.

- Added CMake build system as an alternative to Autotools
  (`#275 <https://github.com/skvadrik/re2c/pull/275>`_:
  Add a CMake build system (thanks to ligfx),
  `#244 <https://github.com/skvadrik/re2c/issues/244>`_: Switching to CMake).

- Changes in generic API:

  + Removed primitives ``YYSTAGPD`` and ``YYMTAGPD``.
  + Added primitives ``YYSHIFT``, ``YYSHIFTSTAG``, ``YYSHIFTMTAG``
    that allow expressing fixed tags in terms of the generic API.
  + Added configurations ``re2c:api:style`` and ``re2c:api:sigil``.
  + Added named placeholders in interpolated configuration strings.

- Changes in reuse mode (``-r, --reuse`` option):

  + Do not reset API-related configurations in each `use:re2c` block
    (`#291 <https://github.com/skvadrik/re2c/issues/291>`_:
    Defines in rules block are not propagated to use blocks).
  + Use block-local options instead of last block options.
  + Do not accumulate options from rules/reuse blocks in whole-program options.
  + Generate non-overlapping YYFILL labels for reuse blocks.
  + Generate start label for each reuse block in storable state mode.

- Changes in start-conditions mode (``-c, --start-conditions`` option):

  + Allow normal (non-conditional) blocks in `-c` mode
    (`#263 <https://github.com/skvadrik/re2c/issues/263>`_:
    allow mixing conditional and non-conditional blocks with -c,
    `#296 <https://github.com/skvadrik/re2c/issues/296>`_:
    Conditions required for all lexers when using '-c' option).
  + Generate condition switch in every re2c block
    (`#295 <https://github.com/skvadrik/re2c/issues/295>`_:
    Condition switch generated for only one lexer per file).

- Changes in the generated labels:

  + Use ``yyeof`` label prefix instead of ``yyeofrule``.
  + Use ``yyfill`` label prefix instead of ``yyFillLabel``.
  + Decouple start label and initial label (affects label numbering).

- Removed undocumented configuration ``re2c:flags:o``, ``re2c:flags:output``.

- Changes in ``re2c:flags:t``, ``re2c:flags:type-header`` configuration:
  filename is now relative to the output file directory.

- Added option ``--case-ranges`` and configuration ``re2c:flags:case-ranges``.

- Extended fixed tags optimization for the case of fixed-counter repetition.

- Fixed bugs related to EOF rule:

  + `#276 <https://github.com/skvadrik/re2c/issues/276>`_:
    Example 01_fill.re in docs is broken
  + `#280 <https://github.com/skvadrik/re2c/issues/280>`_:
    EOF rules with multiple blocks
  + `#284 <https://github.com/skvadrik/re2c/issues/284>`_:
    mismatched YYBACKUP and YYRESTORE
    (Add missing fallback states with EOF rule)

- Fixed miscellaneous bugs:

  + `#286 <https://github.com/skvadrik/re2c/issues/286>`_:
    Incorrect submatch values with fixed-length trailing context.
  + `#297 <https://github.com/skvadrik/re2c/issues/297>`_:
    configure error on ubuntu 18.04 / cmake 3.10

- Changed bootstrap process (require explicit configuration flags and a path to
  re2c executable to regenerate the lexers).

- Added internal options ``--posix-prectable <naive | complex>``.

- Added debug option ``--dump-dfa-tree``.

- Major revision of the paper "Efficient POSIX submatch extraction on NFA".

----
1.3x
----

1.3 (2019-12-14)
~~~~~~~~~~~~~~~~

- Added option: ``--stadfa``.

- Added warning: ``-Wsentinel-in-midrule``.

- Added generic API primitives:

  + ``YYSTAGPD``
  + ``YYMTAGPD``

- Added configurations:

  + ``re2c:sentinel = 0;``
  + ``re2c:define:YYSTAGPD = "YYSTAGPD";``
  + ``re2c:define:YYMTAGPD = "YYMTAGPD";``

- Worked on reproducible builds
  (`#258 <https://github.com/skvadrik/re2c/pull/258>`_:
  Make the build reproducible).

----
1.2x
----

1.2.1 (2019-08-11)
~~~~~~~~~~~~~~~~~~

- Fixed bug `#253 <https://github.com/skvadrik/re2c/issues/253>`_:
  re2c should install unicode_categories.re somewhere.

- Fixed bug `#254 <https://github.com/skvadrik/re2c/issues/254>`_:
  Turn off re2c:eof = 0.

1.2 (2019-08-02)
~~~~~~~~~~~~~~~~

- Added EOF rule ``$`` and configuration ``re2c:eof``.

- Added ``/*!include:re2c ... */`` directive and ``-I`` option.

- Added ``/*!header:re2c:on*/`` and ``/*!header:re2c:off*/`` directives.

- Added ``--input-encoding <ascii | utf8>`` option.

  + `#237 <https://github.com/skvadrik/re2c/issues/237>`_:
    Handle non-ASCII encoded characters in regular expressions
  + `#250 <https://github.com/skvadrik/re2c/issues/250>`_:
    UTF8 enoding

- Added include file with a list of definitions for Unicode character classes.

  + `#235 <https://github.com/skvadrik/re2c/issues/235>`_:
    Unicode character classes

- Added ``--location-format <gnu | msvc>`` option.

  + `#195 <https://github.com/skvadrik/re2c/issues/195>`_:
    Please consider using Gnu format for error messages

- Added ``--verbose`` option that prints a "success" message if re2c exits
  without errors.

- Added configurations for options:

  + ``-o --output`` (specify output file)
  + ``-t --type-header`` (specify header file)

- Removed configurations for internal/debug options.

- Extended ``-r`` option: allow mixing multiple ``/*!rules:re2c*/``,
  ``/*!use:re2c*/`` and ``/*!re2c*/`` blocks.

  + `#55 <https://github.com/skvadrik/re2c/issues/55>`_:
    allow standard re2c blocks in reuse mode

- Fixed ``-F --flex-support`` option: parsing and operator precedence.

  + `#229 <https://github.com/skvadrik/re2c/issues/229>`_:
    re2c option -F (flex syntax) broken
  + `#242 <https://github.com/skvadrik/re2c/issues/242>`_:
    Operator precedence with --flex-syntax is broken

- Changed difference operator ``/`` to apply before encoding expansion of
  operands.

  + `#236 <https://github.com/skvadrik/re2c/issues/236>`_:
    Support range difference with variable-length encodings

- Made generation of the output file atomic.

  + `#245 <https://github.com/skvadrik/re2c/issues/245>`_:
    re2c output is not atomic

- Authored research paper "Efficient POSIX Submatch Extraction on NFA"
  together with Dr Angelo Borsotti.

- Added experimental libre2c library (``--enable-libs`` configure option) with
  the following algorithms:

  + TDFA with leftmost-greedy disambiguation
  + TDFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with leftmost-greedy disambiguation
  + TNFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with lazy POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with POSIX disambiguation (Kuklewicz algorithm)
  + TNFA with POSIX disambiguation (Cox algorithm)

- Added debug subsystem (``--enable-debug`` configure option) and new debug
  options:

  + ``--dump-cfg`` (dump control flow graph of tag variables)
  + ``--dump-interf`` (dump interference table of tag variables)
  + ``--dump-closure-stats`` (dump epsilon-closure statistics)

- Added internal options:

  + ``--posix-closure <gor1 | gtop>`` (switch between shortest-path algorithms
    used for the construction of POSIX closure)

- Fixed a number of crashes found by American Fuzzy Lop fuzzer:

  + `#226 <https://github.com/skvadrik/re2c/issues/226>`_,
    `#227 <https://github.com/skvadrik/re2c/issues/227>`_,
    `#228 <https://github.com/skvadrik/re2c/issues/228>`_,
    `#231 <https://github.com/skvadrik/re2c/issues/231>`_,
    `#232 <https://github.com/skvadrik/re2c/issues/232>`_,
    `#233 <https://github.com/skvadrik/re2c/issues/233>`_,
    `#234 <https://github.com/skvadrik/re2c/issues/234>`_,
    `#238 <https://github.com/skvadrik/re2c/issues/238>`_

- Fixed handling of newlines:

  + correctly parse multi-character newlines CR LF in ``#line`` directives
  + consistently convert all newlines in the generated file to Unix-style LF

- Changed default tarball format from .gz to .xz.

  + `#221 <https://github.com/skvadrik/re2c/issues/221>`_:
    big source tarball

- Fixed a number of other bugs and resolved issues:

  + `#2 <https://github.com/skvadrik/re2c/issues/2>`_: abort
  + `#6 <https://github.com/skvadrik/re2c/issues/6>`_: segfault
  + `#10 <https://github.com/skvadrik/re2c/issues/10>`_:
    lessons/002_upn_calculator/calc_002 doesn't produce a useful example program
  + `#44 <https://github.com/skvadrik/re2c/issues/44>`_:
    Access violation when translating the attached file
  + `#49 <https://github.com/skvadrik/re2c/issues/49>`_:
    wildcard state \000 rules makes lexer behave weard
  + `#98 <https://github.com/skvadrik/re2c/issues/98>`_:
    Transparent handling of #line directives in input files
  + `#104 <https://github.com/skvadrik/re2c/issues/104>`_:
    Improve const-correctness
  + `#105 <https://github.com/skvadrik/re2c/issues/105>`_:
    Conversion of pointer parameters into references
  + `#114 <https://github.com/skvadrik/re2c/issues/114>`_:
    Possibility of fixing bug 2535084
  + `#120 <https://github.com/skvadrik/re2c/issues/120>`_:
    condition consisting of default rule only is ignored
  + `#167 <https://github.com/skvadrik/re2c/issues/167>`_:
    Add word boundary support
  + `#168 <https://github.com/skvadrik/re2c/issues/168>`_:
    Wikipedia's article on re2c
  + `#180 <https://github.com/skvadrik/re2c/issues/180>`_:
    Comment syntax?
  + `#182 <https://github.com/skvadrik/re2c/issues/182>`_:
    yych being set by YYPEEK () and then not used
  + `#196 <https://github.com/skvadrik/re2c/issues/196>`_:
    Implicit type conversion warnings
  + `#198 <https://github.com/skvadrik/re2c/issues/198>`_:
    no match for ‘operator!=’ in ‘i != std::vector<_Tp, _Alloc>::rend() [with _Tp = re2c::bitmap_t, _Alloc = std::allocator<re2c::bitmap_t>]()’
  + `#210 <https://github.com/skvadrik/re2c/issues/210>`_:
    How to build re2c in windows?
  + `#215 <https://github.com/skvadrik/re2c/issues/215>`_:
    A memory read overrun issue in s_to_n32_unsafe.cc
  + `#220 <https://github.com/skvadrik/re2c/issues/220>`_:
    src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug
  + `#223 <https://github.com/skvadrik/re2c/issues/223>`_:
    Fix typo
  + `#224 <https://github.com/skvadrik/re2c/issues/224>`_:
    src/dfa/closure_posix.cc: pack() tweaks
  + `#225 <https://github.com/skvadrik/re2c/issues/225>`_:
    Documentation link is broken in libre2c/README
  + `#230 <https://github.com/skvadrik/re2c/issues/230>`_:
    Changes for upcoming Travis' infra migration
  + `#239 <https://github.com/skvadrik/re2c/issues/239>`_:
    Push model example has wrong re2c invocation, breaks guide
  + `#241 <https://github.com/skvadrik/re2c/issues/241>`_:
    Guidance on how to use re2c for full-duplex command & response protocol
  + `#243 <https://github.com/skvadrik/re2c/issues/243>`_:
    A code generated for period (.) requires 4 bytes
  + `#246 <https://github.com/skvadrik/re2c/issues/246>`_:
    Please add a license to this repo
  + `#247 <https://github.com/skvadrik/re2c/issues/247>`_:
    Build failure on current Cygwin, probably caused by force-fed c++98 mode
  + `#248 <https://github.com/skvadrik/re2c/issues/248>`_:
    distcheck still looks for README
  + `#251 <https://github.com/skvadrik/re2c/issues/251>`_:
    Including what you use is find, but not without inclusion guards

- Updated documentation and website.