
Make mlpack completely header-only #3250

Merged
rcurtin merged 21 commits into mlpack:master on Aug 15, 2022

Conversation

@rcurtin
Member

@rcurtin rcurtin commented Jul 31, 2022

This PR does the last step---it removes libmlpack.so entirely. This will probably require some adaptation downstream in the examples and models repositories, but that should be pretty easy (just don't link against libmlpack.so anymore).

Most of this PR is CMake reconfiguration and simplification: now that there is no libmlpack.so, there's a lot less that we have to do.

A shortlist of notable modifications here:

  • libmlpack.so is gone, and so linking for bindings and tests is now a little bit simpler.

  • The arma_config_check.hpp file, which made sure that the same compilation settings used to build libmlpack.so were also used when mlpack was included, is no longer necessary, so all related CMake infrastructure has been removed.

  • The pkgconfig generator is modified so it no longer includes -lmlpack in the linker command.

  • mlpack_export.hpp is no longer needed---so everything related to that is now gone.

  • Documentation is updated to reflect that mlpack is now header-only (maybe it could be updated in more places---I would be interested in people's comments on where to do that).

  • Finally, there is now only one source file that ever changes as a result of CMake: src/mlpack/util/gitversion.hpp. This is already generated directly into src/, and not into build/include/, so there is no compelling reason for the first step of every build to be copying every mlpack header into build/include/. Therefore, I removed the mlpack_headers target, and there is no longer a step to copy all of the headers. This should accelerate builds, I hope, or at least remove some of the tedium... the only "downside" is that users who are used to including the build/include/ directory (if they build mlpack without installing, for instance, like I often do) will need to include src/ instead---a minor change.

@conradsnicta
Contributor

conradsnicta commented Aug 1, 2022

@rcurtin I thought more about the "include everything in one header" issue (follow-up to #3233). Rather than making the "one header" approach mandatory and converting all of the codebase, I suggest making it an option. With this, both the old and new ways of including mlpack functionality would work.

More specifically:

  • for folks that simply want the convenience (at the possible cost of increased compilation time) they can do #include <mlpack.hpp> and be done
  • for folks that want to be more selective in what is included (to avoid increased compilation time), they can include specific subsets of mlpack headers, as is done now
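
As a rough sketch of the two styles (the selective path below is just one example from the current source tree, and the exact contents of a single mlpack.hpp are of course still up for discussion):

// Convenience: one include pulls in the whole library.
#include <mlpack.hpp>

// Selective: include only what the program actually uses, trading a little
// verbosity for shorter compile times.
#include <mlpack/methods/linear_regression/linear_regression.hpp>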

@rcurtin
Member Author

rcurtin commented Aug 12, 2022

> @rcurtin I thought more about the "include everything in one header" issue (follow-up to #3233). Rather than making the "one header" approach mandatory and converting all of the codebase, I suggest making it an option. With this, both the old and new ways of including mlpack functionality would work.
>
> More specifically:
>
>   • for folks that simply want the convenience (at the possible cost of increased compilation time) they can do #include <mlpack.hpp> and be done
>
>   • for folks that want to be more selective in what is included (to avoid increased compilation time), they can include specific subsets of mlpack headers, as is done now

@conradsnicta I did some serious digging into this issue. Fundamentally, the cost of including all of mlpack's headers is not too high, although it is noticeable. If I make a header that includes everything---except the code that enables serialization for ANN layers (more on that later)---and include it in the mnist_cnn example and the example code from this linear regression example, I see these changes:

  • mnist_cnn

    • g++, this branch with only necessary headers included (and no serialization.hpp): 11.4s, ~800MB RAM
    • g++, this branch with all headers included (and no serialization.hpp): 18.8s, ~1.6GB RAM
    • g++, this branch with only necessary headers included (and serialization): 65.0s, ~4.5GB RAM
    • clang, this branch with only necessary headers included (and no serialization.hpp): 11.2s, ~300MB RAM
    • clang, this branch with all headers included (and no serialization.hpp): 17.4s, ~900MB RAM
  • linear_regression

    • g++, this branch with only necessary headers included: 6.0s, ~750MB RAM
    • g++, this branch with all headers included (and no serialization.hpp): 13.6s, ~1.5GB RAM
    • g++, this branch with all headers included (and serialization): 63.6s, ~4.5GB RAM
    • clang, this branch with only necessary headers included: 5.1s, ~300MB RAM
    • clang, this branch with all headers included (and no serialization.hpp): 11.9s, ~900MB RAM
    • clang, this branch with all headers included (and serialization): 192.0s, ~3GB RAM

(I found that gcc's precompiled headers did not help much.)

So, fundamentally, it is not too painful to include all of the headers, and I agree that your approach of supplying an mlpack.hpp header that includes everything is reasonable. We can then make sure the documentation points out that users can reduce compile times and memory usage by only including what they need. At the same time, I will need to go through the library and make sure that each directory has a "top-level" include file that pulls in everything related to that module. So, e.g., #include <mlpack/methods/cf.hpp> should include all the bells and whistles of the CF module, instead of the bare minimum. Some directories don't have any top-level include at all right now.
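
For the per-directory headers, something like this is the intended usage (the paths are illustrative; which modules already ship a top-level header is exactly what needs checking):

// Intended: one top-level header per module that pulls in all of its pieces.
#include <mlpack/methods/cf.hpp>

// ...instead of cherry-picking individual headers underneath methods/cf/,
// which is what the modules without a top-level include currently require.
#include <mlpack/methods/cf/cf.hpp>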

Now, about the crazy serialization numbers: this is what surprised me, though in retrospect it makes perfect sense. Including this file causes compile times of at least one minute and memory usage of at least 3GB of RAM. It also makes the resulting programs much larger! The reason, as it turns out, is that the compiler must instantiate every single layer that we allow to be serializable. This is because cereal must have a constructor definition for any polymorphic class that it might be serializing, since deserialization might encounter an arbitrary layer type. So what this boils down to is that any program that includes that file must contain compiled versions of every layer type---which places an insane demand on the compiler, especially if the user never even uses a neural network (or serializes one).

There is no reasonable way to avoid that compilation cost, or to automatically compile only the layers that a user has explicitly serialized---since a program could feasibly just be loading a model, a user may never actually manually instantiate a layer.

So, this presents a dilemma that I plan to solve like this:

  • By default, ANN layers will not be serializable. Users' code will still compile, but if they try to serialize a model, it will fail at runtime (cereal will issue some specific error). An FAQ section will be added to the README and the website for this specific error, suggesting the following solutions:

    • Users can manually #include <mlpack/methods/ann/layer/serialization.hpp> to allow serialization of all mlpack's layer types (where MatType = arma::mat), although this will come with a heavy compilation cost. It is convenient, though, and necessary if the program is to be able to load arbitrary networks.

    • Users who want to minimize compilation time can manually write CEREAL_REGISTER_TYPE(Layer) for each layer that they use (see the sketch after this list). It is perhaps inconvenient and a little ugly to do that, but there is no realistic alternative.
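
For that second option, a minimal sketch of what the manual registration might look like (the single mlpack.hpp include, the extra cereal include, and the exact layer spelling are assumptions; the precise macros mlpack ends up requiring may differ):

#include <mlpack.hpp>
#include <cereal/types/polymorphic.hpp>

// Register only the layer types this program actually serializes, so that the
// compiler instantiates just those templates rather than every layer.
CEREAL_REGISTER_TYPE(mlpack::LinearType<arma::mat>);
// ...one registration per layer type that the serialized model contains.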

(Also, a side note: when the serialization issue is handled correctly, these runtimes are way better than before #2777: compiling mnist_cnn with boost::visitor takes 50s and uses 3GB RAM.)

Anyway, happy to hear any comments on this approach. However, I will implement it in a separate PR, since it'll involve a lot of moving includes around and other drudgery.

@zoq
Member

zoq left a comment

This is huge, awesome work.

@conradsnicta
Contributor

@rcurtin Thanks for digging into this. Even though there is a slowdown when including everything, I bet that many people would be willing to use this for the sake of convenience and simplicity.

For the ANN serialisation problem, there are a few approaches. First, I suggest making a simple define-based option to enable serialisation of ANNs, which is then detected by mlpack headers, rather than having users directly use CEREAL_REGISTER_TYPE(Layer). For example:

#define MLPACK_ENABLE_ANN_SERIALISATION
#include <mlpack.hpp>

and then mlpack would internally invoke the appropriate set of CEREAL_REGISTER_TYPE(Layer) registrations.
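
Concretely, the detection inside the headers could be as simple as something along these lines (the guarded header path is the serialization.hpp mentioned above; where exactly the check lives is up to mlpack):

// Inside mlpack's headers (location hypothetical), honour the user's define:
#if defined(MLPACK_ENABLE_ANN_SERIALISATION)
  #include <mlpack/methods/ann/layer/serialization.hpp>
#endif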

Second, serialisation is not execution-time critical, so we can direct the compiler not to spend too much time optimising the produced code. This can be accomplished by adding attributes to serialisation-related functions. As an example, say we want to have an attribute named mlpack_cold, which can be defined for GCC and clang as follows:

#define mlpack_cold

#if defined(__GNUG__) && (!defined(__clang__))
  #undef  mlpack_cold
  #define mlpack_cold __attribute__((__cold__))
#endif

#if defined(__clang__)
  #if !defined(__has_attribute)
    #define __has_attribute(x) 0
  #endif

  #if __has_attribute(__cold__)
    #undef  mlpack_cold
    #define mlpack_cold __attribute__((__cold__))
  #elif __has_attribute(__minsize__)
    #undef  mlpack_cold
    #define mlpack_cold __attribute__((__minsize__))
  #endif
#endif

Then a serialisation function would be decorated with mlpack_cold along these lines:

inline
mlpack_cold
bool
serialise(output_object& out, const input_object& in) { ... }

I've used a similar approach to decorate Mat::save() and Mat::load() functions within Armadillo, as well as a subset of functions within the diskio class. This has led to minor but measurable decreases in compilation time. It's possible that the effect with Cereal would be more pronounced, depending on how complex the underlying code is.

Yet another option would be to rewrite the ANN serialisation code, so that all the ANN layers are first converted into a contiguous block of memory (akin to a raw dump). Then Cereal would be used only to serialise that block of memory, instead of hooking into the guts of ANN code.
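
A rough, self-contained sketch of that last idea (the flattening step is a placeholder; real code would walk the network's layers and parameters):

#include <cereal/archives/binary.hpp>
#include <cereal/types/vector.hpp>
#include <fstream>
#include <vector>

// Placeholder: pretend this walks an ANN and copies its parameters into bytes.
std::vector<unsigned char> FlattenNetworkToBytes()
{
  return std::vector<unsigned char>(1024, 0);
}

int main()
{
  std::ofstream out("model.bin", std::ios::binary);
  cereal::BinaryOutputArchive ar(out);
  // cereal only ever sees one contiguous, non-polymorphic object, so no layer
  // types need to be registered or instantiated.
  ar(FlattenNetworkToBytes());
}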

@rcurtin
Member Author

rcurtin commented Aug 15, 2022

> For the ANN serialisation problem, there are a few approaches. First, I suggest making a simple define-based option to enable serialisation of ANNs, which is then detected by mlpack headers.

Yep, that's exactly what I'm thinking!

> Second, serialisation is not execution-time critical, so we can direct the compiler not to spend too much time optimising the produced code. This can be accomplished by adding attributes to serialisation-related functions. As an example, say we want to have an attribute named mlpack_cold, which can be defined for GCC and clang as follows:

This is a nice idea but I think that it would not apply here. The issue is not so much that the code generated by CEREAL_REGISTER_TYPE(Linear) (or whatever layer) is being over-optimized, but instead that when I call CEREAL_REGISTER_TYPE(Linear), Linear is actually the templated class LinearType<arma::mat>, which must be fully instantiated---and the methods involved aren't just serialization but instead all the methods that each layer implements. So it might still be useful to add something like mlpack_cold in various places, but I don't think it will change the reality that registering serialization for a layer causes a significant number of template instantiations. Let me know if I overlooked something there. 👍
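
To illustrate with a standalone example (this is not mlpack code; the point is only that requiring the complete specialization, which is roughly what registration ends up doing per the explanation above, compiles every member of the class template):

#include <armadillo>

// Stand-in for a real mlpack layer template with many member functions.
template<typename MatType>
class LinearType
{
 public:
  LinearType() : weights(10, 10, arma::fill::randu) { }
  MatType Forward(const MatType& x) { return weights * x; }
  // ...the real layer has many more members, all of which must be compiled...
 private:
  MatType weights;
};

// Forcing the complete specialization: every member of LinearType<arma::mat>
// now gets compiled, even though no Linear object is ever constructed.
template class LinearType<arma::mat>;

using Linear = LinearType<arma::mat>;  // the alias the registration macro sees

int main() { }  // nothing runs; the cost in question is entirely at compile time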

@rcurtin rcurtin merged commit a472496 into mlpack:master Aug 15, 2022
@rcurtin rcurtin deleted the everything-header-only branch August 15, 2022 12:54
@shrit
Member

shrit commented Aug 15, 2022

@rcurtin Great work on this. I wanted to review it this weekend, but I had no chance. I was facing some DNS issues.
This is really huge 🚀 💯

@rcurtin
Member Author

rcurtin commented Aug 15, 2022

No worries, there are a few follow-up PRs in progress if you want to review those 😄 and, if you find any issues with this PR, I'll handle any comments that you post. 👍
