Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mlpack/config.hpp and automatic CMake configuration #3529

Merged
merged 19 commits into from Sep 1, 2023

Conversation

rcurtin
Copy link
Member

@rcurtin rcurtin commented Aug 17, 2023

It turns out there are a couple loose ends leftover from the header-only conversion. mlpack's old runtime library depended on a few macros that were specifically defined by CMake: HAS_STB and MLPACK_GIT_VERSION. Those only needed to be set by CMake, because the functionality that depended on it lived only inside libmlpack.so.

However, now, that is not the case, and so these variables must be set properly for every compilation that includes mlpack. #3514 exposed this problem. Thus, what is necessary is a way of ensuring that system-specific defines are properly defined each time.

Inspired by Armadillo's solution of config.hpp, I set to work with that approach. The general idea is that CMake modifies the "sample" config.hpp (in which nothing is enabled by default). However, I did not use the approach directly. Some technical details follow...

In Armadillo, CMake also copies all of the Armadillo header files to tmp/include/, and the generated config.hpp lives in that directory. That actually presents a slight difficult for development, because if you modify Armadillo's sources, there is nothing to copy those updated sources to tmp/include/, and you have to remember to do that manually or re-run CMake, if you are hand-compiling some test program that uses, e.g., -Itmp/include/ or something. I wanted to avoid that situation for mlpack development---because we have so many contributors, that could be a potentially confusing pitfall.

The obvious next idea is to modify config.hpp directly in the source tree with CMake at configuration time. That works, but then any git commit -a or similar that is done during development will pick up the changes in config.hpp---which should not be committed to the main repository, because the config.hpp in mlpack's repository should be as generic as possible (i.e. STB disabled by default, etc.). config.hpp cannot be added to .gitignore, because it is a file that is in the repository. I considered pre-commit and post-commit hooks to prevent config.hpp from being added, but concluded that in the end those approaches could present a lot of confusion to users.

The solution I ended up settling on is this:

  • CMake configures config.hpp, but outputs it to build/include/mlpack/config-local.hpp.
  • mlpack/base.hpp, which includes the configuration file, has support for a macro MLPACK_CUSTOM_CONFIG_FILE; if that is set, then instead of including mlpack/config.hpp, the custom file will be included.
  • Whenever CMake is building any mlpack program, binding, or test, it sets MLPACK_CUSTOM_CONFIG_FILE=build/include/mlpack/config-local.hpp so that the correct system-local configuration is used.
  • At install time, CMake installs config-local.hpp instead of the default config.hpp. So the installation will have the system-local configuration.

If users want to use mlpack directly by unpacking the tarball, they can either set the configuration options manually during compilation, like one can do with Armadillo (e.g. compile with -DMLPACK_HAS_STB), or they can modify config.hpp themselves, or they can even use MLPACK_CUSTOM_CONFIG_FILE just like CMake does.

These are the configuration variables that mlpack currently automatically sets with CMake:

  • MLPACK_HAS_BFD_DL: this is for the backtrace support, only available on Linux when mlpack is compiled with debugging symbols. Personally I would not be sad to see the backtrace code go, since it is relatively unused, but that's a different story. I preserved it for now.

  • MLPACK_HAS_STB: this is the actual solution for Problem with STB support #3514; the macro is renamed from HAS_STB.

  • MLPACK_HAS_NO_STB_DIR: this is a tricky piece. STB is distributed on most Linux systems in a way where you typically do #include <stb/stb_image.h>. But as STB is a header-only library with single-file includes, it is reasonable and somewhat expected for a user to manually download stb_image.h in a way where you instead would need to do #include <stb_image.h> (for instance, if they downloaded stb_image.h directly into /usr/local/include/). If MLPACK_HAS_NO_STB_DIR is defined, #include <stb_image.h> will be used; if it is not defined (the default), #include <stb/stb_image.h> is used. The default is chosen to match what most Linux systems will do. I had to refactor the STB finding code to support this correctly, but I tried several combinations locally successfully (let's see what CI thinks!).

  • MLPACK_GIT_VERSION: if set, this just means that mlpack::util::GetVersion() will return a version indicator with the git revision, instead of an incorrect version number. This is determined by CMake at configuration time.

I know that is a really long description. There are still a couple tiny things to do once CI succeeds:

  • Update documentation in README.md to at least mention the config file briefly
  • Update HISTORY.md
  • Check through other build/dependency documentation to make sure it is consistent

src/mlpack/config.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@conradsnicta conradsnicta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

@conradsnicta
Copy link
Contributor

  • MLPACK_HAS_BFD_DL: this is for the backtrace support, only available on Linux when mlpack is compiled with debugging symbols. Personally I would not be sad to see the backtrace code go, since it is relatively unused, but that's a different story. I preserved it for now.

I suggest removing it unless there is a clear use case for keeping that specific functionality and its associated implementation. It's not a portable, and requires the code to be rebuilt with debugging enabled (which brings its own further baggage).

If similar functionality to aid debugging is required, one option is to adapt the portable approach that Armadillo uses. At the start of almost every armadillo function there is the arma_extra_debug_sigprint() statement. If tracing is not enabled (ie. ARMA_EXTRA_DEBUG is not defined), the statement does nothing (and the compiler removes it). If tracing is enabled, we get to see a nice trace of execution, including expansion of all template arguments.

While this approach does require peppering the mlpack codebase pretty much everywhere, in practice it has proven to be very effective in tracking down problems.

@rcurtin
Copy link
Member Author

rcurtin commented Aug 29, 2023

I suggest removing it unless there is a clear use case for keeping that specific functionality and its associated implementation.

I agree, but would prefer to leave it for another day.

@rcurtin
Copy link
Member Author

rcurtin commented Aug 29, 2023

It became clear to me while addressing the comments that when C++17 is the minimum supported language version, we can simply remove config.hpp and all of the CMake configuration related to that, and entirely use __has_include to detect dependencies. So, as a result, I did not document config.hpp heavily, as it will go away at some point. Just at a glance, RHEL8 and Ubuntu 20.04 (the oldest non-EOL LTS version as of April) both have C++17 support, and RHEL7 goes EOL next June. So, perhaps we are sooner to being able to support C++17 as a minimum than I thought.

Copy link

@mlpack-bot mlpack-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@rcurtin rcurtin merged commit 1a6100a into mlpack:master Sep 1, 2023
14 of 17 checks passed
@rcurtin rcurtin deleted the stb-header-detect branch September 1, 2023 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants