Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a CMake build system #275

Merged
merged 1 commit into from Apr 6, 2020
Merged

Add a CMake build system #275

merged 1 commit into from Apr 6, 2020

Conversation

ligfx
Copy link
Contributor

@ligfx ligfx commented Mar 31, 2020

Attempt at #244

Built successfully on macOS, and on Windows with Visual Studio. Works great for embedding in other projects that use re2c to generate lexers!

I tried to copy the autotools build system as much as possible, including writing CMake functions to mimic autoconf, and using CMake script mode to do the bootstrap vs. use-built-re2c cycle.

@ligfx ligfx mentioned this pull request Mar 31, 2020
@skvadrik
Copy link
Owner

skvadrik commented Mar 31, 2020

Thanks for your effort! I will do my best to integrate it, but there are a few problems that need to be solved before merging. (And I may discover some more as I try various build scenarios and configurations.) If you don't feel like working on this, many thanks for your work anyway. Here is what I found so far:

  • Autogenerated lexers and parser contain absolute paths to .re/.ypp file in the #line directives. This is unacceptable as absolute paths are specific to the environment of the developer who happened to regenerate these files last. This should probably be fixed by passing a relative path to the .re/.ypp file when generating lexers and parser (but I have some difficulties as set(CMAKE_USE_RELATIVE_PATHS ON) doesn't seem to work).

  • A typo in target_link_libraries: using re2c instead of libre2c. Fixed in the obvious way:

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -344,6 +344,6 @@ if (RE2C_BUILD_LIBS)
   include(CheckIncludeFileCXX)
   check_include_file_cxx("re2/re2.h" HAVE_RE2_H)
   if(HAVE_RE2_H)
-    target_link_libraries(bench_libre2c re2c)
+    target_link_libraries(bench_libre2c libre2c)
   endif()
 endif()
  • Sometimes intermediate build directories are not created and the commands that write to them fail (always reproducible with make -j1). Fixed by adding ${CMAKE_COMMAND} -E make_directory in a few places:
--- a/cmake/Re2cBootstrapLexer.cmake
+++ b/cmake/Re2cBootstrapLexer.cmake
@@ -32,8 +32,10 @@ function(re2c_bootstrap_lexer re_file cc_file)
   if(NOT TARGET re2c)
     message(FATAL_ERROR "'re2c' is not a valid target")
   endif()
+  get_filename_component(parent_dir "${cc_file}" DIRECTORY)
   add_custom_command(
     OUTPUT ${outputs}
+    COMMAND ${CMAKE_COMMAND} -E make_directory ${parent_dir}
     COMMAND
       cmake -Dre2c="$<TARGET_FILE:re2c>" -Dsource_dir="${CMAKE_CURRENT_SOURCE_DIR}"
       -Dre_file="${re_file}" -Dcc_file="${cc_file}" -Dh_file="${h_file}"
--- a/cmake/Re2cBootstrapParser.cmake
+++ b/cmake/Re2cBootstrapParser.cmake
@@ -1,7 +1,9 @@
 function(re2c_bootstrap_parser ypp_file cc_file h_file)
+  get_filename_component(parser_dir "${cc_file}" DIRECTORY)
   if (BISON_EXECUTABLE)
     add_custom_command(
       OUTPUT ${cc_file} ${h_file}
+      COMMAND cmake -E make_directory "${parser_dir}"
       COMMAND "${BISON_EXECUTABLE}" --defines="${h_file}" -o "${cc_file}" "${CMAKE_CURRENT_SOURCE_DIR}/${ypp_file}"
       COMMAND cmake -E copy_if_different "${cc_file}" "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${cc_file}"
       COMMAND cmake -E copy_if_different "${h_file}" "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${h_file}"
@@ -10,9 +12,9 @@ function(re2c_bootstrap_parser ypp_file cc_file h_file)
   else()
     add_custom_command(
       OUTPUT ${cc_file} ${h_file}
-      COMMAND cmake -E make_directory "src/parse"
-      COMMAND cmake -E copy "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${cc_file}" "src/parse"
-      COMMAND cmake -E copy "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${h_file}" "src/parse"
+      COMMAND cmake -E make_directory "${parser_dir}"
+      COMMAND cmake -E copy "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${cc_file}" "${parser_dir}"
+      COMMAND cmake -E copy "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${h_file}" "${parser_dir}"
       DEPENDS
         "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${cc_file}"
         "${CMAKE_CURRENT_SOURCE_DIR}/bootstrap/${h_file}"
  • Use 4-space indentation everywhere (currently there is a mix of tabs and spaces).

  • Add a list of all the configurations you used to test the new build system, for example cmake -DRE2C_BUILD_LIBS=on -DCMAKE_BUILD_TYPE=Debug -B build && make -C build (these are helpful not only for historical reasons, but also as a hint to people trying to build new re2c version and not familiar with cmake).

  • Need a way to build shared library libre2c.so. Currently I can only build static library, and it is named liblibre2c.a.

@ligfx
Copy link
Contributor Author

ligfx commented Apr 1, 2020

@skvadrik , thanks for the thoughtful review comments.

  • Autogenerated lexers and parser contain absolute paths to .re/.ypp file in the #line directives. Fixed!

  • A typo in target_link_libraries: using re2c instead of libre2c Ah, this was actually supposed to link to RE2. I rewrote this part as it wasn't working correctly anyways (check_include_file_cxx needs explicit include directories), and checked that it works against a system install of RE2.

  • Sometimes intermediate build directories are not created and the commands that write to them fail (always reproducible with make -j1) I was able to reproduce this behavior. I added the make_directory commands as you suggested, and can no longer reproduce it.

  • Use 4-space indentation everywhere Done

  • Need a way to build shared library libre2c.so. Currently I can only build static library, and it is named liblibre2c.a. Added logic to build both libre2c and libre2c_static targets, with appropriate filenames, depending on the value of CMake's BUILD_SHARED_LIBS. If BUILD_SHARED_LIBS is uninitialized (the default), it builds both. Otherwise, it will only build one or the other. Works fine on my Mac (an install has both libraries), haven't checked on Windows yet.

  • Add a list of all the configurations you used to test the new build system, for example cmake -DRE2C_BUILD_LIBS=on -DCMAKE_BUILD_TYPE=Debug -B build && make -C build (these are helpful not only for historical reasons, but also as a hint to people trying to build new re2c version and not familiar with cmake)._

    Can you explain more what you'd like here? By my count, there are over 100 unique configurations. I've tested the different features separately (with and without Bison, with and without docs enabled, with and without library enabled, with and without RE2, dual libraries/shared-only/static-only), with Makefiles and Ninja.

    The general way to invoke CMake for this CMakeLists.txt looks like this:

    mkdir build
    cd build
    cmake .. \
      [-DCMAKE_BUILD_TYPE=Debug|Release|RelWithDebInfo] \
      [-DRE2C_REBUILD_DOCS=YES] \
      [-DRE2C_BUILD_LIBS=YES [-DBUILD_SHARED_LIBS=YES|NO]] \
      [-G Ninja]
    make -j # or ninja, if "-G Ninja" was used

    On Windows, install Visual Studio's "C++ CMake Tools for Windows", then open the folder. It will configure CMake with the default settings. You can use the CMake settings editor from there to edit the configuration.

@skvadrik
Copy link
Owner

skvadrik commented Apr 2, 2020

Great, thank you! Very good going!

Meanwhile I found a few more things:

  • Need to fix absolute paths in #line directives for parsers, in the same way as you fixed them for lexers.

  • Need to keep the dependency of main lexer on the generated parser (see the comment in Makefile.am "lexer depends on bison-generated header").

  • Need to copy the autogenerated lexer header (if any) to the bootstrap files (currently only the .cc file is copied).

  • All autogenerated sources should be generated before anything else is built (sort of pre-build order only dependency). Otherwise the bootstrap lexer for libre2c is always regenerated (on a fresh build from unmodified sources), because re2c happens to be built by the time the the lexer for libre2c is generated. In automake there is a variable BUILT_SOURCES (see the comment in Makefile.am: "custom rules create headers and must go before normal rules").

  • Currently the compiler flags specified in CMakeLists.txt take override whatever is passed in -DCMAKE_CXX_FLAGS on the command-line during configuration; this is wrong. Command-line flags should be appended, not prepended, so that they take precedence (otherwise it is not possible to override the default flags without modifying the source code).

  • Currently I cannot build shared libre2c because of a missing -fPIC somewhere (I get an error from ld: relocation R_X86_64_PC32 against symbol _ZN4re2c4Code5flistE can not be used when making a shared object; recompile with -fPIC). I haven't looked into it yet, but I wonder if you also get this.

  • What about cross-compilation? In autotools is as simple as passing --host i686-w64-mingw32 to configure. In cmake I think it requires something like this.

  • In general, there is a bunch of scripts __build*.sh in the root directory which build re2c in various configurations; they all will need to be ported to cmake one way or another.

To answer your question:

Can you explain more what you'd like here? By my count, there are over 100 unique configurations.

Sure, I by no means meant testing all of them, just a few different configurations that are often used:

  • with or without debug
  • with or without libre2c (static and shared)
  • with or without docs
  • passing extra flags to compiler and linker (optimizations, debug, sanitizers)
  • overriding compiler (e.g. using clang++ instead of g++, or using include-what-you-use, etc.)
  • cross-compilation (and running tests under wine)

It's good that you keep separate commits while working on the patch set, but before merging I think all commits should be squashed in one with a descriptive commit message that lists a few of the above configure/build scenarios (what you wrote above, "The general way to invoke CMake ...", is also very useful information). Not that it can't be found elsewhere (e.g. in re2c release notes), but some people may read the commits instead of release notes.

Thanks again for what you've done already.

@ligfx
Copy link
Contributor Author

ligfx commented Apr 3, 2020

  • Need to fix absolute paths in #line directives for parsers, in the same way as you fixed them for lexers. Ah, missed that because the version of Bison on my system is so old the files have significant changes anyways. Fixed.

  • Need to keep the dependency of main lexer on the generated parser (see the comment in Makefile.am "lexer depends on bison-generated header").

    Can you explain this more? I tested it, and when touching the bison source the parser is first rebuilt, then the lexer code is recompiled. I think CMake picks up the dependency automatically after the first build, and on the first build all generated files are generated before object files are compiled.

  • Need to copy the autogenerated lexer header (if any) to the bootstrap files (currently only the .cc file is copied). Fixed.

  • All autogenerated sources should be generated before anything else is built (sort of pre-build order only dependency). Otherwise the bootstrap lexer for libre2c is always regenerated (on a fresh build from unmodified sources), because re2c happens to be built by the time the the lexer for libre2c is generated. In automake there is a variable BUILT_SOURCES (see the comment in Makefile.am: "custom rules create headers and must go before normal rules").

    I fixed it to generate ver_to_vernum.cc before re2c and libre2c. Unfortunately, this involves a bit of a hack, since CMake detects a cyclical dependency between ver_to_vernum.cc and re2c: we have to guess the re2c executable location and name manually ${CMAKE_CURRENT_BINARY_DIR}/re2c${CMAKE_EXECUTABLE_SUFFIX}
    I thought about adding the rest of the generated files to the object library, but ver_to_vernum.cc is the only one shared between re2c and libre2c.

  • Currently the compiler flags specified in CMakeLists.txt take override whatever is passed in -DCMAKE_CXX_FLAGS on the command-line during configuration; this is wrong. Command-line flags should be appended, not prepended, so that they take precedence (otherwise it is not possible to override the default flags without modifying the source code).

    I'm not sure what to do here— this is how standard CMake works. Does Autotools do something different? When would you want to override default flags? AFAICT they're harmless warnings.

  • Currently I cannot build shared libre2c because of a missing -fPIC somewhere

    Think I fixed this... my macOS box is building everything PIC by default. Can you check on your computer that this works now?

  • What about cross-compilation? In autotools is as simple as passing --host i686-w64-mingw32 to configure. In cmake I think it requires something like this.

    Configuring for cross compilation varies depending on the target. The standard way is to use/write a toolchain file, as indicated in that link. Usually, platforms which require cross-compilation will come with toolchain files (like Android). I believe the thinking with Windows is, with CMake you can just compile on a Windows build box directly. That said, the simplest way to compile for mingw on my system is:

    cmake .. \
        -DCMAKE_SYSTEM_NAME=Windows \
        -DCMAKE_C_COMPILER=x86_64-w64-mingw32-gcc \
        -DCMAKE_CXX_COMPILER=x86_64-w64-mingw32-g++

    but probably has issues around finding libraries/packages/tools etc that I haven't thought about.

  • In general, there is a bunch of scripts __build*.sh in the root directory which build re2c in various configurations; they all will need to be ported to cmake one way or another.

    Are you thinking of moving entirely to CMake with this PR? I was thinking it would live alongside Autotools for a bit, to flesh out any bugs/gotchas.

  • Sure, I by no means meant testing all of them, just a few different configurations that are often used

    I can write command lines for each of the scenarios you describe (except cross-compilation) but I think I'm missing the point here? I added a few more of these scenarios to the general usage bit, below.

  • It's good that you keep separate commits while working on the patch set, but before merging I think all commits should be squashed in one with a descriptive commit message that lists a few of the above configure/build scenarios

    Merged the commits, and added this commit message:

     Add a CMake build system
    
     Invoke it like this on Unix:
    
     mkdir build
     cd build
     cmake .. \
       [-DCMAKE_C_FLAGS=...] [-DCMAKE_CXX_FLAGS=...] \
       [-DCMAKE_EXE_LINKER_FLAGS=...] [-DCMAKE_SHARED_LINKER_FLAGS=...] \
       [-DCMAKE_C_COMPILER=...] [-DCMAKE_CXX_COMPILER=...] \
       [-DCMAKE_BUILD_TYPE=Debug|Release|RelWithDebInfo] \
       [-DRE2C_REBUILD_DOCS=YES] \
       [-DRE2C_BUILD_LIBS=YES [-DBUILD_SHARED_LIBS=YES|NO]] \
       [-G Ninja]
     make -j # or ninja, if "-G Ninja" was used
    
     Or on Windows, install Visual Studio's CMake tools and then open this folder.
    

@@ -0,0 +1,16 @@
include(CheckCXXCompilerFlag)
# Iff C++ compiler recognizes 'flag', append 'flag' and 'implied-flags' to CXXFLAGSDEFAULT
Copy link
Collaborator

@sergeyklay sergeyklay Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 'Iff' here means 'if and only if'? Don't you think this is bit nerdy? :)

Copy link
Owner

@skvadrik skvadrik Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergeyklay It comes from the configure.ac, which was added by me back in the days when I was an undergrad student and thought the world could work on 1st order predicate logic. :D

@ligfx just did a good job at porting the comment as well.

Copy link
Owner

@skvadrik skvadrik Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that was not so long ago: 4c7a3c8

Copy link
Contributor Author

@ligfx ligfx Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skvadrik @sergeyklay I mean it's not wrong, and you could always go more nerdy and start using the symbols like ⇔ !

Copy link
Owner

@skvadrik skvadrik Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not until re2c is rewriten in haskell. ;)

@skvadrik
Copy link
Owner

skvadrik commented Apr 4, 2020

Almost there!

I have two things to fix (and for both I have suggestions how to fix them), and then I can merge.

  • First, can we use simpler logic for dealing with shared/static libraries? Maybe I confused you the previous time. but what I think is needed is to always build static libraries (and link tests/benchmarks against them), but to also have optional support for building shared libraries. In other words, this:
    # optionally build shared libraries
    if (RE2C_BUILD_LIBS_SHARED)
        add_library(libre2c_shared SHARED ${libre2c_sources})
        if (UNIX)
            install(TARGETS libre2c_shared)
        endif()
        set_target_properties(libre2c_shared PROPERTIES OUTPUT_NAME "re2c")
    endif()

    # build static libraries
    add_library(libre2c_static STATIC ${libre2c_sources})
    if (UNIX)
        install(TARGETS libre2c_static)
    endif()
    set_target_properties(libre2c_static PROPERTIES OUTPUT_NAME "re2c")

To be used like cmake ... [-DRE2C_BUILD_LIBS [-DRE2C_BUILD_LIBS_SHARED]]. I also changed library names a bit (simplified to be always libre2c.<suffix>) and tested on Linix, maybe you will find that I broke something for Windows or macOS.

  • Second thing is about specifying CMAKE_CXX_FLAGS on command line when configuring re2c. I'm absolutely convinced that there should be a way to override default flags without interfering with the build system. The default flags may not work for any number of reasons (even harmless warnings may suddenly change into fatal build errors on a compiler upgrade, and we also set optimization level, so one should be able to override it with something else). To fix this in a quick and somewhat hacky way I suggest this in cmake/Re2cTryCXXFlag.cmake:
...
function(try_cxxflag flag)
    set(varname "cxxflag_${flag}")
    string(REPLACE "=" "_" varname "${varname}")
    check_cxx_compiler_flag("${flag}" ${varname})
    if(${varname})
        set(CMAKE_CXX_FLAGS "${flag} ${CMAKE_CXX_FLAGS}" PARENT_SCOPE)
        foreach(implied_flag IN LISTS ARGN)
            set(CMAKE_CXX_FLAGS "${implied_flag} ${CMAKE_CXX_FLAGS}" PARENT_SCOPE)
        endforeach()
    endif()
endfunction()

This way default flags are prepended, not appended. Later on this can be changed to avoid reversing the list of default flags.

Need to keep the dependency of main lexer on the generated parser (see the comment in Makefile.am "lexer depends on bison-generated header").

Can you explain this more? I tested it, and when touching the bison source the parser is first rebuilt, then the lexer code is recompiled. I think CMake picks up the dependency automatically after the first build, and on the first build all generated files are generated before object files are compiled.

I thought it over, and I agree with you reasoning. This was probably needed before we enforced the ordering requirement that all generated sources must be built before any compilation starts.

Currently I cannot build shared libre2c because of a missing -fPIC somewhere

Think I fixed this... my macOS box is building everything PIC by default. Can you check on your computer that this works now?

Thanks, works for me now. What did you fix?

Configuring for cross compilation varies depending on the target. The standard way is to use/write a toolchain file, as indicated in that link. Usually, platforms which require cross-compilation will come with toolchain files (like Android). I believe the thinking with Windows is, with CMake you can just compile on a Windows build box directly. That said, the simplest way to compile for mingw on my system is [...] but probably has issues around finding libraries/packages/tools etc that I haven't thought about.

I found a way to do build mingw-compiled re2c and libre2c and test them under wine, so this one is solved. I took me a bit of time because linker flags -static-libgcc -static-libstdc++ and shared libraries do not play well together.

Are you thinking of moving entirely to CMake with this PR? I was thinking it would live alongside Autotools for a bit, to flesh out any bugs/gotchas.

Yes, maybe even until the next release cycle to give maintainers more time. But as a paranoid maintainer I don't want to take responsibility for anything that I cannot maintain, and I certainly want cmake build system to used, tested and covered by CI, and I plan to use it myself and eventually deprecate autotools. Not because I don't like autotools --- quite the contrary, I think for Unix systems they are more elegant than cmake. But living with two build systems just won't work in the long run.

I can write command lines for each of the scenarios you describe (except cross-compilation) but I think I'm missing the point here? I added a few more of these scenarios to the general usage bit, below.

Your commit message is is quite perfect (and helped me a few times already). Just make sure you update it if you do any last-minute changes.

@ligfx ligfx force-pushed the cmake branch 2 times, most recently from 0563c56 to 0ee8182 Compare Apr 4, 2020
@ligfx
Copy link
Contributor Author

ligfx commented Apr 4, 2020

  • First, can we use simpler logic for dealing with shared/static libraries? Maybe I confused you the previous time. but what I think is needed is to always build static libraries (and link tests/benchmarks against them), but to also have optional support for building shared libraries.

    Can you explain this more? Why does it always need to build static libraries? The built tests and benchmarks work on my system with either shared or static.

    I was trying to emulate Autotools' behavior of always building both static and shared libraries, while still letting the user/parent project choose using the standard BUILD_SHARED_LIBS flag (and in the case of parent projects, always having a libre2c target to point to).

  • Second thing is about specifying CMAKE_CXX_FLAGS on command line when configuring re2c. I'm absolutely convinced that there should be a way to override default flags without interfering with the build system. Done. Note that specifying CMAKE_BUILD_TYPE will still append CMake's default flags to the user's flags. I did add a condition to not add the -O2 flag if a CMAKE_BUILD_TYPE is specified (mainly for pure debug builds, since you presumably don't want optimizations).

  • Currently I cannot build shared libre2c because of a missing -fPIC somewhere [...] Thanks, works for me now. What did you fix? I told CMake to always build position independent code with set(CMAKE_POSITION_INDEPENDENT_CODE ON).

  • I found a way to do build mingw-compiled re2c and libre2c and test them under wine, so this one is solved. I took me a bit of time because linker flags -static-libgcc -static-libstdc++ and shared libraries do not play well together.

    How did you solve it? I would do cmake -DBUILD_SHARED_LIBS=OFF. Unfortunately, this is a case where CMake's lack of knowledge about dual shared/static libraries becomes an issue.

    I could add a check for -static in the linker flags to the logic that chooses the types of library to build, if that'd be easier.

  • Are you thinking of moving entirely to CMake with this PR?

    Yes, maybe even until the next release cycle to give maintainers more time.

    Sure. Want me to update the __build_*.sh files in this PR? Or I can do them in a separate one.

  • Your commit message is is quite perfect (and helped me a few times already). Just make sure you update it if you do any last-minute changes. Added CMAKE_STATIC_LINKER_FLAGS to the message as well, forgot about that last time.

@skvadrik
Copy link
Owner

skvadrik commented Apr 5, 2020

Can you explain this more? Why does it always need to build static libraries? The built tests and benchmarks work on my system with either shared or static.

I had two reasons to change the code: first, I wanted to simplify the logic, and second, I wanted to get rid of the mingw linking error in the default configuration.

I was trying to emulate Autotools' behavior of always building both static and shared libraries, while still letting the user/parent project choose using the standard BUILD_SHARED_LIBS flag (and in the case of parent projects, always having a libre2c target to point to).

Thanks for keeping close to Autotools. Your reasoning indeed makes sense.

This is the final variant that I ended up with. It behaves like your original code: build both shared and static libraries by default, but it is possible to choose either variant exclusively with BUILD_SHARED_LIBS option, and there is a top-level target alias. The only functional difference is that by default the alias points to the static library. This helps in cases like mingw (see the details below). Also I restructured the code to be (in my opinion) a bit easier to follow the logic:

    # build shared libraries
    if ((NOT DEFINED BUILD_SHARED_LIBS) OR BUILD_SHARED_LIBS)
        add_library(libre2c_shared SHARED ${libre2c_sources})
        set_target_properties(libre2c_shared PROPERTIES OUTPUT_NAME "re2c")
        if (UNIX)
            install(TARGETS libre2c_shared)
        endif()
    endif()

    # build static libraries
    if ((NOT DEFINED BUILD_SHARED_LIBS) OR (NOT BUILD_SHARED_LIBS))
        add_library(libre2c_static STATIC ${libre2c_sources})
        set_target_properties(libre2c_static PROPERTIES OUTPUT_NAME "re2c")
        if (UNIX)
            install(TARGETS libre2c_static)
        endif()
    endif()

    # define top-level aliases to either static or shared libraries (default is static)
    if (BUILD_SHARED_LIBS)
        add_library(libre2c ALIAS libre2c_shared)
    else()
        add_library(libre2c ALIAS libre2c_static)
    endif()

    ...
    target_link_libraries(test_libre2c libre2c)
    ...
    target_link_libraries(bench_libre2c libre2c)

Note that specifying CMAKE_BUILD_TYPE will still append CMake's default flags to the user's flags. I did add a condition to not add the -O2 flag if a CMAKE_BUILD_TYPE is specified (mainly for pure debug builds, since you presumably don't want optimizations).

Yes, I know that cmake appends some flags on its own when a certain build type is used. This should not be a problem, because it is possible to configure without CMAKE_BUILD_TYPE, or choose a suitable type. Thanks for adding the check for -O2, it makes sense.

I told CMake to always build position independent code with set(CMAKE_POSITION_INDEPENDENT_CODE ON).

Ah! I was grepping for -fPIC and -fPIE and didn't find anything.

How did you solve it? I would do cmake -DBUILD_SHARED_LIBS=OFF. Unfortunately, this is a case where CMake's lack of knowledge about dual shared/static libraries becomes an issue.

I was experimenting with various combinations of -static -statc-libgcc -static-libstdc++ in CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS, getting either unresolved symbols from libgcc and libstdc++, or multiple definition of Unwind_SjLj_Register from libgcc_eh.a if I tried to link libgcc into shared libre2c. Curiously, multiple definition only happened for this one symbol, and only for bench_libre2c (test_libre2c linked successfully and ran without issues under wine). It could be a Mingw issue with forgetting to hide internal symbols, I'm not sure.

At first I didn't realize that I was trying to link against shared libre2c. When I realized it, I changed CMakeLists.txt to link test and benchmark against static library, and this resolved the linking problem. I was also lucky in that I had a working case to compare against (the libtool example).

My understanding is that linking with static libraries libre2c and libgcc that contain definitions of the same symbols from libgcc is fine, although probably the order of static libraries matters (unless every symbol is in its own section).

I could add a check for -static in the linker flags to the logic that chooses the types of library to build, if that'd be easier.

No, let's not add a check --- it is best to keep such workarounds out of the build system, preferably in helper scripts. Every bit of extra logic in the build system makes it harder to read and understand. This complexity is ok in cases like bootstrap, but for simple or rare cases it is better not to clutter the build rules with extra logic.

One minor fix that is needed is to do chmod +1 run_tests.sh in case of MINGW as well as UNIX.

Sure. Want me to update the _build*.sh files in this PR? Or I can do them in a separate one.

Let's do it in a follow-up commit, and let's keep the autotools scripts and write a new set of scripts for cmake. I wanted to make sure that it is possible to cover all these scenarios in principle.

Invoke it like this on Unix:

mkdir build
cd build
cmake .. \
  [-DCMAKE_C_FLAGS=...] [-DCMAKE_CXX_FLAGS=...] \
  [-DCMAKE_EXE_LINKER_FLAGS=...] \
  [-DCMAKE_SHARED_LINKER_FLAGS=...] [-DCMAKE_STATIC_LINKER_FLAGS=...] \
  [-DCMAKE_C_COMPILER=...] [-DCMAKE_CXX_COMPILER=...] \
  [-DCMAKE_BUILD_TYPE=Debug|Release|RelWithDebInfo] \
  [-DRE2C_REBUILD_DOCS=YES] \
  [-DRE2C_BUILD_LIBS=YES [-DBUILD_SHARED_LIBS=YES|NO]] \
  [-G Ninja]
make -j # or ninja, if "-G Ninja" was used

Or on Windows, install Visual Studio's CMake tools and then open this folder.
@ligfx
Copy link
Contributor Author

ligfx commented Apr 5, 2020

  • This is the final variant that I ended up with. The only functional difference is that by default the alias points to the static library. Also I restructured the code to be (in my opinion) a bit easier to follow the logic. Ah, love it. CMake defaults to static libraries so that makes more sense. And your structure is much cleaner. I added this in.
  • One minor fix that is needed is to do chmod +x run_tests.sh in case of MINGW as well as UNIX. I changed the condition if(UNIX) to if(CMAKE_HOST_UNIX). Works for me when running a mingw build.
  • Let's do it in a follow-up commit, and let's keep the autotools scripts and write a new set of scripts for cmake. I wanted to make sure that it is possible to cover all these scenarios in principle. See Add build scripts using CMake #277

@skvadrik skvadrik merged commit 83f7bc8 into skvadrik:master Apr 6, 2020
@skvadrik
Copy link
Owner

skvadrik commented Apr 6, 2020

Hooray \o/ I've merged this. Thanks @ligfx for all your work!

One question regarding static libraries, why do you have to add _static suffix to the library name on Windows?

A few things are still needed before we can switch to this as default build system (like make check or make wtests that currently trigger regeneration of lexers), but we'll deal with them.

@ligfx
Copy link
Contributor Author

ligfx commented Apr 6, 2020

One question regarding static libraries, why do you have to add _static suffix to the library name on Windows?

Windows shared libraries generate a .dll file (the shared object) and a .lib file (the static object to interface with the shared object.. or something like that). Static libraries just generate a .lib file. If a shared and static library have the same name, their .lib files will conflict

@skvadrik
Copy link
Owner

skvadrik commented Apr 6, 2020

Ah, got it. I should add a comment in CMakeLists.txt to remember this.

@skvadrik skvadrik mentioned this pull request Apr 7, 2020
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Sep 20, 2020
2.0.3 (2020-08-22)
~~~~~~~~~~~~~~~~~~

- Fix issues when building re2c as a CMake subproject
  (`#302 <https://github.com/skvadrik/re2c/pull/302>`_:

- Final corrections in the SIMPA article "RE2C: A lexer generator based on
  lookahead-TDFA", https://doi.org/10.1016/j.simpa.2020.100027

2.0.2 (2020-08-08)
~~~~~~~~~~~~~~~~~~

- Enable re2go building by default.

- Package CMake files into release tarball.

2.0.1 (2020-07-29)
~~~~~~~~~~~~~~~~~~

- Updated version for CMake build system (forgotten in release 2.0).

- Added a short article about re2c for the Software Impacts journal.

2.0 (2020-07-20)
~~~~~~~~~~~~~~~~

- Added new code generation backend for Go and a new ``re2go`` program
  (`#272 <https://github.com/skvadrik/re2c/issues/272>`_: Go support).
  Added option ``--lang <c | go>``.

- Added CMake build system as an alternative to Autotools
  (`#275 <https://github.com/skvadrik/re2c/pull/275>`_:
  Add a CMake build system (thanks to ligfx),
  `#244 <https://github.com/skvadrik/re2c/issues/244>`_: Switching to CMake).

- Changes in generic API:

  + Removed primitives ``YYSTAGPD`` and ``YYMTAGPD``.
  + Added primitives ``YYSHIFT``, ``YYSHIFTSTAG``, ``YYSHIFTMTAG``
    that allow to express fixed tags in terms of generic API.
  + Added configurations ``re2c:api:style`` and ``re2c:api:sigil``.
  + Added named placeholders in interpolated configuration strings.

- Changes in reuse mode (``-r, --reuse`` option):

  + Do not reset API-related configurations in each `use:re2c` block
    (`#291 <https://github.com/skvadrik/re2c/issues/291>`_:
    Defines in rules block are not propagated to use blocks).
  + Use block-local options instead of last block options.
  + Do not accumulate options from rules/reuse blocks in whole-program options.
  + Generate non-overlapping YYFILL labels for reuse blocks.
  + Generate start label for each reuse block in storable state mode.

- Changes in start-conditions mode (``-c, --start-conditions`` option):

  + Allow to use normal (non-conditional) blocks in `-c` mode
    (`#263 <https://github.com/skvadrik/re2c/issues/263>`_:
    allow mixing conditional and non-conditional blocks with -c,
    `#296 <https://github.com/skvadrik/re2c/issues/296>`_:
    Conditions required for all lexers when using '-c' option).
  + Generate condition switch in every re2c block
    (`#295 <https://github.com/skvadrik/re2c/issues/295>`_:
    Condition switch generated for only one lexer per file).

- Changes in the generated labels:

  + Use ``yyeof`` label prefix instead of ``yyeofrule``.
  + Use ``yyfill`` label prefix instead of ``yyFillLabel``.
  + Decouple start label and initial label (affects label numbering).

- Removed undocumented configuration ``re2c🎏o``, ``re2c🎏output``.

- Changes in ``re2c🎏t``, ``re2c🎏type-header`` configuration:
  filename is now relative to the output file directory.

- Added option ``--case-ranges`` and configuration ``re2c🎏case-ranges``.

- Extended fixed tags optimization for the case of fixed-counter repetition.

- Fixed bugs related to EOF rule:

  + `#276 <https://github.com/skvadrik/re2c/issues/276>`_:
    Example 01_fill.re in docs is broken
  + `#280 <https://github.com/skvadrik/re2c/issues/280>`_:
    EOF rules with multiple blocks
  + `#284 <https://github.com/skvadrik/re2c/issues/284>`_:
    mismatched YYBACKUP and YYRESTORE
    (Add missing fallback states with EOF rule)

- Fixed miscellaneous bugs:

  + `#286 <https://github.com/skvadrik/re2c/issues/286>`_:
    Incorrect submatch values with fixed-length trailing context.
  + `#297 <https://github.com/skvadrik/re2c/issues/297>`_:
    configure error on ubuntu 18.04 / cmake 3.10

- Changed bootstrap process (require explicit configuration flags and a path to
  re2c executable to regenerate the lexers).

- Added internal options ``--posix-prectable <naive | complex>``.

- Added debug option ``--dump-dfa-tree``.

- Major revision of the paper "Efficient POSIX submatch extraction on NFA".

----
1.3x
----

1.3 (2019-12-14)
~~~~~~~~~~~~~~~~

- Added option: ``--stadfa``.

- Added warning: ``-Wsentinel-in-midrule``.

- Added generic API primitives:

  + ``YYSTAGPD``
  + ``YYMTAGPD``

- Added configurations:

  + ``re2c:sentinel = 0;``
  + ``re2c:define:YYSTAGPD = "YYSTAGPD";``
  + ``re2c:define:YYMTAGPD = "YYMTAGPD";``

- Worked on reproducible builds
  (`#258 <https://github.com/skvadrik/re2c/pull/258>`_:
  Make the build reproducible).

----
1.2x
----

1.2.1 (2019-08-11)
~~~~~~~~~~~~~~~~~~

- Fixed bug `#253 <https://github.com/skvadrik/re2c/issues/253>`_:
  re2c should install unicode_categories.re somewhere.

- Fixed bug `#254 <https://github.com/skvadrik/re2c/issues/254>`_:
  Turn off re2c:eof = 0.

1.2 (2019-08-02)
~~~~~~~~~~~~~~~~

- Added EOF rule ``$`` and configuration ``re2c:eof``.

- Added ``/*!include:re2c ... */`` directive and ``-I`` option.

- Added ``/*!header:re2c:on*/`` and ``/*!header:re2c:off*/`` directives.

- Added ``--input-encoding <ascii | utf8>`` option.

  + `#237 <https://github.com/skvadrik/re2c/issues/237>`_:
    Handle non-ASCII encoded characters in regular expressions
  + `#250 <https://github.com/skvadrik/re2c/issues/250>`_
    UTF8 enoding

- Added include file with a list of definitions for Unicode character classes.

  + `#235 <https://github.com/skvadrik/re2c/issues/235>`_:
    Unicode character classes

- Added ``--location-format <gnu | msvc>`` option.

  + `#195 <https://github.com/skvadrik/re2c/issues/195>`_:
    Please consider using Gnu format for error messages

- Added ``--verbose`` option that prints "success" message if re2c exits
  without errors.

- Added configurations for options:

  + ``-o --output`` (specify output file)
  + ``-t --type-header`` (specify header file)

- Removed configurations for internal/debug options.

- Extended ``-r`` option: allow to mix multiple ``/*!rules:re2c*/``,
  ``/*!use:re2c*/`` and ``/*!re2c*/`` blocks.

  + `#55 <https://github.com/skvadrik/re2c/issues/55>`_:
    allow standard re2c blocks in reuse mode

- Fixed ``-F --flex-support`` option: parsing and operator precedence.

  + `#229 <https://github.com/skvadrik/re2c/issues/229>`_:
    re2c option -F (flex syntax) broken
  + `#242 <https://github.com/skvadrik/re2c/issues/242>`_:
    Operator precedence with --flex-syntax is broken

- Changed difference operator ``/`` to apply before encoding expansion of
  operands.

  + `#236 <https://github.com/skvadrik/re2c/issues/236>`_:
    Support range difference with variable-length encodings

- Changed output generation of output file to be atomic.

  + `#245 <https://github.com/skvadrik/re2c/issues/245>`_:
    re2c output is not atomic

- Authored research paper "Efficient POSIX Submatch Extraction on NFA"
  together with Dr Angelo Borsotti.

- Added experimental libre2c library (``--enable-libs`` configure option) with
  the following algorithms:

  + TDFA with leftmost-greedy disambiguation
  + TDFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with leftmost-greedy disambiguation
  + TNFA with POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with lazy POSIX disambiguation (Okui-Suzuki algorithm)
  + TNFA with POSIX disambiguation (Kuklewicz algorithm)
  + TNFA with POSIX disambiguation (Cox algorithm)

- Added debug subsystem (``--enable-debug`` configure option) and new debug
  options:

  + ``-dump-cfg`` (dump control flow graph of tag variables)
  + ``-dump-interf`` (dump interference table of tag variables)
  + ``-dump-closure-stats`` (dump epsilon-closure statistics)

- Added internal options:

  + ``--posix-closure <gor1 | gtop>`` (switch between shortest-path algorithms
    used for the construction of POSIX closure)

- Fixed a number of crashes found by American Fuzzy Lop fuzzer:

  + `#226 <https://github.com/skvadrik/re2c/issues/226>`_,
    `#227 <https://github.com/skvadrik/re2c/issues/227>`_,
    `#228 <https://github.com/skvadrik/re2c/issues/228>`_,
    `#231 <https://github.com/skvadrik/re2c/issues/231>`_,
    `#232 <https://github.com/skvadrik/re2c/issues/232>`_,
    `#233 <https://github.com/skvadrik/re2c/issues/233>`_,
    `#234 <https://github.com/skvadrik/re2c/issues/234>`_,
    `#238 <https://github.com/skvadrik/re2c/issues/238>`_

- Fixed handling of newlines:

  + correctly parse multi-character newlines CR LF in ``#line`` directives
  + consistently convert all newlines in the generated file to Unix-style LF

- Changed default tarball format from .gz to .xz.

  + `#221 <https://github.com/skvadrik/re2c/issues/221>`_:
    big source tarball

- Fixed a number of other bugs and resolved issues:

  + `#2 <https://github.com/skvadrik/re2c/issues/2>`_: abort
  + `#6 <https://github.com/skvadrik/re2c/issues/6>`_: segfault
  + `#10 <https://github.com/skvadrik/re2c/issues/10>`_:
    lessons/002_upn_calculator/calc_002 doesn't produce a useful example program
  + `#44 <https://github.com/skvadrik/re2c/issues/44>`_:
    Access violation when translating the attached file
  + `#49 <https://github.com/skvadrik/re2c/issues/49>`_:
    wildcard state \000 rules makes lexer behave weard
  + `#98 <https://github.com/skvadrik/re2c/issues/98>`_:
    Transparent handling of #line directives in input files
  + `#104 <https://github.com/skvadrik/re2c/issues/104>`_:
    Improve const-correctness
  + `#105 <https://github.com/skvadrik/re2c/issues/105>`_:
    Conversion of pointer parameters into references
  + `#114 <https://github.com/skvadrik/re2c/issues/114>`_:
    Possibility of fixing bug 2535084
  + `#120 <https://github.com/skvadrik/re2c/issues/120>`_:
    condition consisting of default rule only is ignored
  + `#167 <https://github.com/skvadrik/re2c/issues/167>`_:
    Add word boundary support
  + `#168 <https://github.com/skvadrik/re2c/issues/168>`_:
    Wikipedia's article on re2c
  + `#180 <https://github.com/skvadrik/re2c/issues/180>`_:
    Comment syntax?
  + `#182 <https://github.com/skvadrik/re2c/issues/182>`_:
    yych being set by YYPEEK () and then not used
  + `#196 <https://github.com/skvadrik/re2c/issues/196>`_:
    Implicit type conversion warnings
  + `#198 <https://github.com/skvadrik/re2c/issues/198>`_:
    no match for ‘operator!=’ in ‘i != std::vector<_Tp, _Alloc>::rend() [with _Tp = re2c::bitmap_t, _Alloc = std::allocator<re2c::bitmap_t>]()’
  + `#210 <https://github.com/skvadrik/re2c/issues/210>`_:
    How to build re2c in windows?
  + `#215 <https://github.com/skvadrik/re2c/issues/215>`_:
    A memory read overrun issue in s_to_n32_unsafe.cc
  + `#220 <https://github.com/skvadrik/re2c/issues/220>`_:
    src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug
  + `#223 <https://github.com/skvadrik/re2c/issues/223>`_:
    Fix typo
  + `#224 <https://github.com/skvadrik/re2c/issues/224>`_:
    src/dfa/closure_posix.cc: pack() tweaks
  + `#225 <https://github.com/skvadrik/re2c/issues/225>`_:
    Documentation link is broken in libre2c/README
  + `#230 <https://github.com/skvadrik/re2c/issues/230>`_:
    Changes for upcoming Travis' infra migration
  + `#239 <https://github.com/skvadrik/re2c/issues/239>`_:
    Push model example has wrong re2c invocation, breaks guide
  + `#241 <https://github.com/skvadrik/re2c/issues/241>`_:
    Guidance on how to use re2c for full-duplex command & response protocol
  + `#243 <https://github.com/skvadrik/re2c/issues/243>`_:
    A code generated for period (.) requires 4 bytes
  + `#246 <https://github.com/skvadrik/re2c/issues/246>`_:
    Please add a license to this repo
  + `#247 <https://github.com/skvadrik/re2c/issues/247>`_:
    Build failure on current Cygwin, probably caused by force-fed c++98 mode
  + `#248 <https://github.com/skvadrik/re2c/issues/248>`_:
    distcheck still looks for README
  + `#251 <https://github.com/skvadrik/re2c/issues/251>`_:
    Including what you use is find, but not without inclusion guards

- Updated documentation and website.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants