Switching to CMake #244

Closed
unixod opened this issue Feb 11, 2019 · 12 comments

@unixod
Contributor

unixod commented Feb 11, 2019

What do you think about switching re2c to the CMake build system?

Although the re2c codebase is cross-platform, the current build system breaks the project's portability by imposing a hard dependency on the GNU toolchain. Switching to CMake may free the project from this dependency and make it possible to build re2c with native compilers on the supported OSes. In other words, with CMake the build system itself will become cross-platform and clearer.

A few days ago, I partially ported re2c to CMake. I call it partial because I didn't enable unit tests and I used the sources from the bootstrap directory. The only obstacle I encountered during the porting was the need to get rid of the c99_stdint.h file, because this file relies on config.h. I simply removed all content from c99_stdint.h and left only #include <stdint.h>, which seems an admissible solution.

So, if you want, I can try to contribute in this direction.

@skvadrik
Owner

skvadrik commented Feb 11, 2019

Eh, people are asking about CMake build system from time to time... I can't say anything for sure, but let's give it a try.

The difficulty is in the details. Bootstrap requires custom rules, but I think it can be done. The c99_stdint.h header is important for pre-C++11 compatibility (re2c is sometimes used in strange or old environments), so we need to keep it. There are a number of other checks, e.g. for compiler flags, that I think will require custom modules. There are tests and docs. Recently libre2c was added, which needs to be built as both a static and a shared library. Finally, there are tricky interdependencies between various autogenerated files that I'm not sure I can port to CMake that easily.
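
For illustration, two of the points above (compiler flag checks, and building a library as both static and shared) might look roughly like this in CMake. This is only a sketch: the flag, the `HAVE_WEXTRA` variable, the target names, and the source path are assumptions made for the example, not taken from re2c's build files. The flag check uses the stock `CheckCXXCompilerFlag` module, so a custom module may not be needed for this part.

```cmake
# Sketch only: flag, variable, target, and file names are illustrative
# assumptions, not re2c's actual build configuration.
cmake_minimum_required(VERSION 3.12)
project(flagcheck_sketch CXX)

include(CheckCXXCompilerFlag)

# Try to compile a trivial program with the flag; the result is cached
# in HAVE_WEXTRA (a hypothetical variable name).
check_cxx_compiler_flag(-Wextra HAVE_WEXTRA)
if(HAVE_WEXTRA)
  add_compile_options(-Wextra)
endif()

# Build the same sources as both a static and a shared library.
set(LIB_SOURCES lib/example.cc)   # hypothetical source list
add_library(re2c_static STATIC ${LIB_SOURCES})
add_library(re2c_shared SHARED ${LIB_SOURCES})
```

The two targets deliberately keep distinct names here; giving them the same output name can clash on toolchains where a shared library also produces an import library.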

If you create a pull request against master, I will pull it and see how it goes.

Thanks for the porting effort!

@unixod
Contributor Author

unixod commented Feb 12, 2019

Ulya, thanks for your willingness to switch the build system to CMake.

I can prepare a PR, but I still don't know how to preserve all the content of c99_stdint.h. If I find an admissible workaround that preserves this file, I'll prepare a PR.

By the way, if you find a way to omit that file, or realize that the file actually isn't used by clients, let me know and I'll try to prepare a PR.

@unixod unixod closed this as completed Feb 12, 2019
@skvadrik
Owner

skvadrik commented Feb 13, 2019

I still don't know how to preserve all content of c99_stdint.h

The problem is the dependency of c99_stdint.h on config.h, which is generated by the configure script and contains various defines such as HAVE_STDINT_H, PACKAGE_VERSION, etc. To port this to CMake, we need to find equivalent functionality in CMake (the ability to test for various headers and programs, patch files, etc. when configuring the package). Re2c does not need much, but whatever is used in configure.ac should be ported.
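
As a rough sketch of what such equivalent functionality could look like, CMake's `CheckIncludeFileCXX` module and `configure_file` command can produce a `config.h` with the two defines named above. The project name, version, and the `config.h.in` template are assumptions for the example, not re2c's actual files.

```cmake
# Sketch only: project name, version, and template file are hypothetical.
cmake_minimum_required(VERSION 3.12)
project(config_sketch VERSION 1.0 LANGUAGES CXX)

include(CheckIncludeFileCXX)

# Sets HAVE_STDINT_H to a true value if <stdint.h> can be included.
check_include_file_cxx(stdint.h HAVE_STDINT_H)

# config.h.in (a hypothetical template next to this file) would contain:
#   #cmakedefine HAVE_STDINT_H 1
#   #define PACKAGE_VERSION "@PROJECT_VERSION@"
# configure_file() turns #cmakedefine into #define or a commented-out
# #undef, and substitutes @PROJECT_VERSION@ from the project() call.
configure_file(
  "${CMAKE_CURRENT_SOURCE_DIR}/config.h.in"
  "${CMAKE_CURRENT_BINARY_DIR}/config.h"
)
```

Sources that include `config.h` would then need `${CMAKE_CURRENT_BINARY_DIR}` on their include path.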

I have some further questions:

  • The current build system uses bash scripts for testing and building documentation. As I understand it, with CMake on Linux we can continue using these scripts, but on Windows there is no bash, so neither tests nor documentation would work. What is the canonical approach here: rewrite all scripts in (a portable subset of) Python?

  • Are there CMake analogues to make distcheck (command that builds and tests the release tarball)?

  • Cross-compilation is currently as simple as configure --host i686-w64-mingw32. Can that be done in CMake?

  • There is a bunch of scripts __build_*.sh that build re2c in various configurations by passing parameters to configure. Can they be ported to CMake?

@unixod
Contributor Author

unixod commented Feb 19, 2019

Hi Ulya,

Sorry for the late response)

The problem is the dependency of c99_stdint.h on config.h, which is generated by configure script and contains various defines such as HAVE_STDINT_H, PACKAGE_VERSION, etc.

Yes, CMake provides facilities to test and extract certain properties of the target environment (such as the size of types, the existence of headers, the availability of compiler flags, etc.). If I remember correctly, when I was porting this part of re2c's build system, I got stuck determining the size of oi8 (because I simply didn't recognize this type). If you help me clarify what this type is, I may try to implement the generation of config.h in CMake.

To answer your questions:

  • As I understand it, re2c's build system uses rst2man to generate documentation. CMake allows running external programs in a cross-platform manner, so we don't need Bash to run rst2man and generate documentation from CMake.
    With regard to testing the documentation: I haven't yet delved into this part of re2c's build system, so before answering this question I need to understand how the documentation is tested. I would appreciate it if you could help me clarify this process.

  • CMake allows creating source and binary packages in many different formats (zip, tgz, ..., deb, rpm, msi, etc.). This part of CMake is called CPack. With regard to testing the release tarball, hmm... perhaps it is possible, but first I need to understand the nature of this process. Isn't this the same as testing the sources from which the tarball is generated?

  • Cross-compiling is possible in CMake. For that purpose, we need to write a toolchain file with a few variables. You can see some examples here: https://cmake.org/cmake/help/v3.6/manual/cmake-toolchains.7.html#cross-compiling

  • Yes, there are many ways to port them to CMake.
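
To make the cross-compilation answer concrete, a toolchain file for the i686-w64-mingw32 target mentioned in the question might look roughly like the sketch below. The file name and the sysroot path are assumptions, and the compiler names assume the usual MinGW-w64 cross tools are on PATH.

```cmake
# mingw-i686.cmake: hypothetical toolchain file name.
# Assumes the i686-w64-mingw32 cross tools are installed and on PATH.
set(CMAKE_SYSTEM_NAME Windows)
set(CMAKE_SYSTEM_PROCESSOR i686)

set(CMAKE_C_COMPILER   i686-w64-mingw32-gcc)
set(CMAKE_CXX_COMPILER i686-w64-mingw32-g++)
set(CMAKE_RC_COMPILER  i686-w64-mingw32-windres)

# Search for headers and libraries only under the target root,
# but keep finding build tools (programs) on the host.
set(CMAKE_FIND_ROOT_PATH /usr/i686-w64-mingw32)   # assumed install prefix
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
```

It would be passed at configure time with `cmake -DCMAKE_TOOLCHAIN_FILE=mingw-i686.cmake <source-dir>`, which plays the role of `configure --host i686-w64-mingw32`.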

@skvadrik
Owner

skvadrik commented Feb 19, 2019

Sorry for the late response)

It's ok! I'll just reopen the issue for now. If you have some time to work on it, great, but discussion is also useful.

I [got] stuck in determining size of oi8 (because I simply didn't recognize this type). If you help me to clarify what is this type, I may try to implement the generating of config.h in CMake.

It's not oi8, it is 0i8: a zero literal with the suffix i8 to disambiguate size. This suffix is non-standard and probably comes from MSVC, together with __int64. It may be needed on some old MSVC versions that predate the existence of stdint.h and long long, and where the size of long is 4 bytes. On such a platform the only 64-bit signed integer type would be __int64, and the corresponding suffix would be i8.

I expect that CMake, like Autoconf, would be able to check for the existence of any type X, not just some set of predefined types (configure does this by trying to compile a small program that uses X).
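
A minimal sketch of such checks in CMake, using the stock `CheckTypeSize` and `CheckCXXSourceCompiles` modules, both of which work by compiling a small test program much as configure does. The result variable names `SIZEOF___INT64` and `HAVE_I8_SUFFIX` are chosen for this example only.

```cmake
# Sketch only: result variable names are hypothetical.
cmake_minimum_required(VERSION 3.12)
project(typecheck_sketch CXX)

include(CheckTypeSize)
include(CheckCXXSourceCompiles)

# If the type exists, HAVE_SIZEOF___INT64 is true and SIZEOF___INT64
# holds its size in bytes; otherwise HAVE_SIZEOF___INT64 is false.
check_type_size("__int64" SIZEOF___INT64 LANGUAGE CXX)

# Compile a tiny program that uses the non-standard 0i8 literal;
# HAVE_I8_SUFFIX is true only if the compiler accepts it.
check_cxx_source_compiles("
  int main() { return static_cast<int>(0i8); }
" HAVE_I8_SUFFIX)

message(STATUS "__int64 available: ${HAVE_SIZEOF___INT64}")
message(STATUS "i8 suffix accepted: ${HAVE_I8_SUFFIX}")
```

Both results can then be exposed to `config.h` through `#cmakedefine` lines in a template.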

As I understand, re2c's build system uses rst2man to generate documentation. CMake allows running external programs in cross-platform manner, hence in CMake to run rst2man and generate documentation we don't need to use Bash.

This is, unfortunately, not so trivial: we also have a shell script genhelp.sh that generates the source file help.cc from the manpage. This is needed to avoid manual copy-pasting and to keep the manpage and help message in sync (and also other docs in the gh-pages-gen branch).

@skvadrik skvadrik reopened this Feb 20, 2019
@skvadrik
Owner

skvadrik commented Feb 20, 2019

I need to understand how documentation is tested.

It's not tested in any automatic way.

Usually it doesn't change at all, unless you pass the --enable-docs option to configure. If you do, and if you touch the contents of the doc folder, the manpage doc/re2c.1 and the source file help.cc will be regenerated using rst2man and the local script genhelp.sh, and their bootstrap versions will be updated.

We only enable doc regeneration conditionally with --enable-docs because we want re2c to build on platforms that don't have rst2man.
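
An opt-in docs rule of this kind could be sketched in CMake as follows. The option name `REBUILD_DOCS` and the input path `doc/manpage.rst` are assumptions for the example. The genhelp.sh step is left out because its arguments are not described here, and the bootstrap copies are not updated.

```cmake
# Sketch only: option name and input path are hypothetical.
cmake_minimum_required(VERSION 3.12)
project(docs_sketch NONE)

# Off by default, so the build works on platforms without rst2man.
option(REBUILD_DOCS "Regenerate the manpage from the doc sources" OFF)

if(REBUILD_DOCS)
  find_program(RST2MAN NAMES rst2man rst2man.py)
  if(NOT RST2MAN)
    message(FATAL_ERROR "REBUILD_DOCS is ON but rst2man was not found")
  endif()

  set(MAN_SRC "${CMAKE_CURRENT_SOURCE_DIR}/doc/manpage.rst")  # assumed input
  set(MAN_OUT "${CMAKE_CURRENT_BINARY_DIR}/doc/re2c.1")

  # Re-run rst2man only when the source file changes.
  add_custom_command(
    OUTPUT "${MAN_OUT}"
    COMMAND "${CMAKE_COMMAND}" -E make_directory "${CMAKE_CURRENT_BINARY_DIR}/doc"
    COMMAND "${RST2MAN}" "${MAN_SRC}" "${MAN_OUT}"
    DEPENDS "${MAN_SRC}"
    COMMENT "Generating re2c.1 with rst2man"
    VERBATIM
  )
  add_custom_target(docs ALL DEPENDS "${MAN_OUT}")
endif()
```

Configuring with `-DREBUILD_DOCS=ON` would correspond to passing `--enable-docs` to configure.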

With regard to testing of release tarball, hmm... perhaps it is possible, but first I need to understand the nature of this process. Isn't this the same as testing of sources from which the tarball is generated?

Not quite: it also checks that the tarball is correct. There is absolutely no guarantee that we package what we test: e.g. a common mistake is when people forget to package some file (it builds fine on their system, because the file is already in place, but it breaks when testing a freshly unpacked tarball). make distcheck also takes other sanity precautions like making source files read-only to ensure default build doesn't mangle them.
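
CMake has no direct counterpart to this that I am certain of, but the packaging half can be sketched with CPack's source generator. This is an assumption-laden sketch: the project name, version, and ignore patterns are hypothetical, and the "unpack, build, and test the tarball" half of distcheck would still have to be scripted separately.

```cmake
# Sketch only: name, version, and ignore patterns are hypothetical.
cmake_minimum_required(VERSION 3.12)
project(dist_sketch VERSION 1.0 LANGUAGES NONE)

# Produce a .tar.xz source archive.
set(CPACK_SOURCE_GENERATOR "TXZ")
set(CPACK_SOURCE_PACKAGE_FILE_NAME "${PROJECT_NAME}-${PROJECT_VERSION}")

# Regular expressions for paths to leave out of the source archive.
set(CPACK_SOURCE_IGNORE_FILES "/[.]git/" "/build/")

include(CPack)
```

With the Makefile or Ninja generators, `cmake --build . --target package_source` then creates the archive. Unpacking it into a fresh directory and running a full configure, build, and test there would approximate what make distcheck verifies.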

@unixod
Contributor Author

unixod commented Feb 21, 2019

With regard to 0i8, now I see) Thanks for clarification!

I expect that CMake, like Autoconf, would be able to check for the existence of any type X

Yes, sure.

As I see it, the main complexity for now is concentrated in the logic responsible for building the docs and the bootstrap target, but I think this can be solved. I'll see what I can do.

@nightlark
Contributor

nightlark commented May 1, 2019

Here's a partial CMakeLists.txt from a few years ago: https://github.com/nightlark/re2c/blob/master/re2c/CMakeLists.txt

I don't entirely remember where it's at (doesn't do tests/docs or use yacc/bison), but it has at least some of the checks for generating the config.h file, and the bootstrap process used in the existing Makefiles.

@skvadrik
Owner

skvadrik commented May 2, 2019

@nightlark Thanks for sharing. In two years re2c has changed, but it's good to have a starting point.

@ligfx
Contributor

ligfx commented Mar 31, 2020

Took a crack at this in #275. It generates config.h, can update docs, does the bootstrap cycle correctly (I'm pretty sure), and builds on Unix and Windows. I couldn't come up with an answer to "make distcheck": CMake doesn't really support this, since all you really need is a Git checkout/tarball.

@skvadrik
Owner

skvadrik commented Apr 1, 2020

@ligfx Thanks! Please see the comments on #275.

@skvadrik
Owner

skvadrik commented Apr 7, 2020

CMake build system has been added: #275, closing the bug.

@skvadrik skvadrik closed this as completed Apr 7, 2020
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Sep 20, 2020
2.0.3 (2020-08-22)
~~~~~~~~~~~~~~~~~~

- Fix issues when building re2c as a CMake subproject
  (`#302 <https://github.com/skvadrik/re2c/pull/302>`_:

- Final corrections in the SIMPA article "RE2C: A lexer generator based on
  lookahead-TDFA", https://doi.org/10.1016/j.simpa.2020.100027

2.0.2 (2020-08-08)
~~~~~~~~~~~~~~~~~~

- Enable re2go building by default.

- Package CMake files into release tarball.

2.0.1 (2020-07-29)
~~~~~~~~~~~~~~~~~~

- Updated version for CMake build system (forgotten in release 2.0).

- Added a short article about re2c for the Software Impacts journal.

2.0 (2020-07-20)
~~~~~~~~~~~~~~~~

- Added new code generation backend for Go and a new ``re2go`` program
  (`#272 <https://github.com/skvadrik/re2c/issues/272>`_: Go support).
  Added option ``--lang <c | go>``.

- Added CMake build system as an alternative to Autotools
  (`#275 <https://github.com/skvadrik/re2c/pull/275>`_:
  Add a CMake build system (thanks to ligfx),
  `#244 <https://github.com/skvadrik/re2c/issues/244>`_: Switching to CMake).

- Changes in generic API:

  + Removed primitives ``YYSTAGPD`` and ``YYMTAGPD``.
  + Added primitives ``YYSHIFT``, ``YYSHIFTSTAG``, ``YYSHIFTMTAG``
    that allow to express fixed tags in terms of generic API.
  + Added configurations ``re2c:api:style`` and ``re2c:api:sigil``.
  + Added named placeholders in interpolated configuration strings.

- Changes in reuse mode (``-r, --reuse`` option):

  + Do not reset API-related configurations in each `use:re2c` block
    (`#291 <https://github.com/skvadrik/re2c/issues/291>`_:
    Defines in rules block are not propagated to use blocks).
  + Use block-local options instead of last block options.
  + Do not accumulate options from rules/reuse blocks in whole-program options.
  + Generate non-overlapping YYFILL labels for reuse blocks.
  + Generate start label for each reuse block in storable state mode.

- Changes in start-conditions mode (``-c, --start-conditions`` option):

  + Allow to use normal (non-conditional) blocks in `-c` mode
    (`#263 <https://github.com/skvadrik/re2c/issues/263>`_:
    allow mixing conditional and non-conditional blocks with -c,
    `#296 <https://github.com/skvadrik/re2c/issues/296>`_:
    Conditions required for all lexers when using '-c' option).
  + Generate condition switch in every re2c block
    (`#295 <https://github.com/skvadrik/re2c/issues/295>`_:
    Condition switch generated for only one lexer per file).

- Changes in the generated labels:

  + Use ``yyeof`` label prefix instead of ``yyeofrule``.
  + Use ``yyfill`` label prefix instead of ``yyFillLabel``.
  + Decouple start label and initial label (affects label numbering).

- Removed undocumented configuration ``re2c:flags:o``, ``re2c:flags:output``.

- Changes in ``re2c:flags:t``, ``re2c:flags:type-header`` configuration:
  filename is now relative to the output file directory.

- Added option ``--case-ranges`` and configuration ``re2c:flags:case-ranges``.

- Extended fixed tags optimization for the case of fixed-counter repetition.

- Fixed bugs related to EOF rule:

  + `#276 <https://github.com/skvadrik/re2c/issues/276>`_:
    Example 01_fill.re in docs is broken
  + `#280 <https://github.com/skvadrik/re2c/issues/280>`_:
    EOF rules with multiple blocks
  + `#284 <https://github.com/skvadrik/re2c/issues/284>`_:
    mismatched YYBACKUP and YYRESTORE
    (Add missing fallback states with EOF rule)

- Fixed miscellaneous bugs:

  + `#286 <https://github.com/skvadrik/re2c/issues/286>`_:
    Incorrect submatch values with fixed-length trailing context.
  + `#297 <https://github.com/skvadrik/re2c/issues/297>`_:
    configure error on ubuntu 18.04 / cmake 3.10

- Changed bootstrap process (require explicit configuration flags and a path to
  re2c executable to regenerate the lexers).

- Added internal options ``--posix-prectable <naive | complex>``.

- Added debug option ``--dump-dfa-tree``.

- Major revision of the paper "Efficient POSIX submatch extraction on NFA".

----
1.3x
----

1.3 (2019-12-14)
~~~~~~~~~~~~~~~~

- Added option: ``--stadfa``.

- Added warning: ``-Wsentinel-in-midrule``.

- Added generic API primitives:

  + ``YYSTAGPD``
  + ``YYMTAGPD``

- Added configurations:

  + ``re2c:sentinel = 0;``
  + ``re2c:define:YYSTAGPD = "YYSTAGPD";``
  + ``re2c:define:YYMTAGPD = "YYMTAGPD";``

- Worked on reproducible builds
  (`#258 <https://github.com/skvadrik/re2c/pull/258>`_:
  Make the build reproducible).

----
1.2x
----

1.2.1 (2019-08-11)
~~~~~~~~~~~~~~~~~~

- Fixed bug `#253 <https://github.com/skvadrik/re2c/issues/253>`_:
  re2c should install unicode_categories.re somewhere.

- Fixed bug `#254 <https://github.com/skvadrik/re2c/issues/254>`_:
  Turn off re2c:eof = 0.

1.2 (2019-08-02)
~~~~~~~~~~~~~~~~

- Added EOF rule ``$`` and configuration ``re2c:eof``.

- Added ``/*!include:re2c ... */`` directive and ``-I`` option.

- Added ``/*!header:re2c:on*/`` and ``/*!header:re2c:off*/`` directives.

- Added ``--input-encoding <ascii | utf8>`` option.

  + `#237 <https://github.com/skvadrik/re2c/issues/237>`_:
    Handle non-ASCII encoded characters in regular expressions
  + `#250 <https://github.com/skvadrik/re2c/issues/250>`_
    UTF8 enoding

- Added include file with a list of definitions for Unicode character classes.

  + `#235 <https://github.com/skvadrik/re2c/issues/235>`_:
    Unicode character classes

- Added ``--location-format <gnu | msvc>`` option.

  + `#195 <https://github.com/skvadrik/re2c/issues/195>`_:
    Please consider using Gnu format for error messages

- Added ``--verbose`` option that prints "success" message if re2c exits
  without errors.

- Added configurations for options:

  + ``-o --output`` (specify output file)
  + ``-t --type-header`` (specify header file)

- Removed configurations for internal/debug options.

- Extended ``-r`` option: allow to mix multiple ``/*!rules:re2c*/``,
  ``/*!use:re2c*/`` and ``/*!re2c*/`` blocks.

  + `#55 <https://github.com/skvadrik/re2c/issues/55>`_:
    allow standard re2c blocks in reuse mode

- Fixed ``-F --flex-syntax`` option: parsing and operator precedence.

  + `#229 <https://github.com/skvadrik/re2c/issues/229>`_:
    re2c option -F (flex syntax) broken
  + `#242 <https://github.com/skvadrik/re2c/issues/242>`_:
    Operator precedence with --flex-syntax is broken

- Changed difference operator ``/`` to apply before encoding expansion of
  operands.

  + `#236 <https://github.com/skvadrik/re2c/issues/236>`_:
    Support range difference with variable-length encodings

- Changed generation of the output file to be atomic.

  + `#245 <https://github.com/skvadrik/re2c/issues/245>`_:
    re2c output is not atomic

- Authored research paper "Efficient POSIX Submatch Extraction on NFA"
  together with Dr Angelo Borsotti.

- Added experimental libre2c library (``--enable-libs`` configure option) with
  the following algorithms:

  + TDFA with leftmost-greedy disambiguation
  + TDFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with leftmost-greedy disambiguation
  + TNFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with lazy POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with POSIX disambiguation (Kuklewicz algorithm)
  + TNFA with POSIX disambiguation (Cox algorithm)

- Added debug subsystem (``--enable-debug`` configure option) and new debug
  options:

  + ``--dump-cfg`` (dump control flow graph of tag variables)
  + ``--dump-interf`` (dump interference table of tag variables)
  + ``--dump-closure-stats`` (dump epsilon-closure statistics)

- Added internal options:

  + ``--posix-closure <gor1 | gtop>`` (switch between shortest-path algorithms
    used for the construction of POSIX closure)

- Fixed a number of crashes found by American Fuzzy Lop fuzzer:

  + `#226 <https://github.com/skvadrik/re2c/issues/226>`_,
    `#227 <https://github.com/skvadrik/re2c/issues/227>`_,
    `#228 <https://github.com/skvadrik/re2c/issues/228>`_,
    `#231 <https://github.com/skvadrik/re2c/issues/231>`_,
    `#232 <https://github.com/skvadrik/re2c/issues/232>`_,
    `#233 <https://github.com/skvadrik/re2c/issues/233>`_,
    `#234 <https://github.com/skvadrik/re2c/issues/234>`_,
    `#238 <https://github.com/skvadrik/re2c/issues/238>`_

- Fixed handling of newlines:

  + correctly parse multi-character newlines CR LF in ``#line`` directives
  + consistently convert all newlines in the generated file to Unix-style LF

- Changed default tarball format from .gz to .xz.

  + `#221 <https://github.com/skvadrik/re2c/issues/221>`_:
    big source tarball

- Fixed a number of other bugs and resolved issues:

  + `#2 <https://github.com/skvadrik/re2c/issues/2>`_: abort
  + `#6 <https://github.com/skvadrik/re2c/issues/6>`_: segfault
  + `#10 <https://github.com/skvadrik/re2c/issues/10>`_:
    lessons/002_upn_calculator/calc_002 doesn't produce a useful example program
  + `#44 <https://github.com/skvadrik/re2c/issues/44>`_:
    Access violation when translating the attached file
  + `#49 <https://github.com/skvadrik/re2c/issues/49>`_:
    wildcard state \000 rules makes lexer behave weard
  + `#98 <https://github.com/skvadrik/re2c/issues/98>`_:
    Transparent handling of #line directives in input files
  + `#104 <https://github.com/skvadrik/re2c/issues/104>`_:
    Improve const-correctness
  + `#105 <https://github.com/skvadrik/re2c/issues/105>`_:
    Conversion of pointer parameters into references
  + `#114 <https://github.com/skvadrik/re2c/issues/114>`_:
    Possibility of fixing bug 2535084
  + `#120 <https://github.com/skvadrik/re2c/issues/120>`_:
    condition consisting of default rule only is ignored
  + `#167 <https://github.com/skvadrik/re2c/issues/167>`_:
    Add word boundary support
  + `#168 <https://github.com/skvadrik/re2c/issues/168>`_:
    Wikipedia's article on re2c
  + `#180 <https://github.com/skvadrik/re2c/issues/180>`_:
    Comment syntax?
  + `#182 <https://github.com/skvadrik/re2c/issues/182>`_:
    yych being set by YYPEEK () and then not used
  + `#196 <https://github.com/skvadrik/re2c/issues/196>`_:
    Implicit type conversion warnings
  + `#198 <https://github.com/skvadrik/re2c/issues/198>`_:
    no match for ‘operator!=’ in ‘i != std::vector<_Tp, _Alloc>::rend() [with _Tp = re2c::bitmap_t, _Alloc = std::allocator<re2c::bitmap_t>]()’
  + `#210 <https://github.com/skvadrik/re2c/issues/210>`_:
    How to build re2c in windows?
  + `#215 <https://github.com/skvadrik/re2c/issues/215>`_:
    A memory read overrun issue in s_to_n32_unsafe.cc
  + `#220 <https://github.com/skvadrik/re2c/issues/220>`_:
    src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug
  + `#223 <https://github.com/skvadrik/re2c/issues/223>`_:
    Fix typo
  + `#224 <https://github.com/skvadrik/re2c/issues/224>`_:
    src/dfa/closure_posix.cc: pack() tweaks
  + `#225 <https://github.com/skvadrik/re2c/issues/225>`_:
    Documentation link is broken in libre2c/README
  + `#230 <https://github.com/skvadrik/re2c/issues/230>`_:
    Changes for upcoming Travis' infra migration
  + `#239 <https://github.com/skvadrik/re2c/issues/239>`_:
    Push model example has wrong re2c invocation, breaks guide
  + `#241 <https://github.com/skvadrik/re2c/issues/241>`_:
    Guidance on how to use re2c for full-duplex command & response protocol
  + `#243 <https://github.com/skvadrik/re2c/issues/243>`_:
    A code generated for period (.) requires 4 bytes
  + `#246 <https://github.com/skvadrik/re2c/issues/246>`_:
    Please add a license to this repo
  + `#247 <https://github.com/skvadrik/re2c/issues/247>`_:
    Build failure on current Cygwin, probably caused by force-fed c++98 mode
  + `#248 <https://github.com/skvadrik/re2c/issues/248>`_:
    distcheck still looks for README
  + `#251 <https://github.com/skvadrik/re2c/issues/251>`_:
    Including what you use is find, but not without inclusion guards

- Updated documentation and website.