Multiple definitions when including Stan headers in several source files #311

rgiordan · 2016-06-29T20:58:59Z

Summary:

Stan headers cause multiple definition errors when included in several source files that are linked together in a library.

Description:

If the Stan headers are included in several source files that are compiled separately and then linked, the compiler fails with multiple definition errors. I am a C++ novice, but I believe this is because non-inline functions are being instantiated in the header files (e.g. print_mat_size).

Reproducible Steps:

A minimal example to reproduce can be found in this repo. In short, it has two cpp files that each instantiate templates using Stan variables. They can be successfully compiled into object files, but the linker fails with multiple definition errors.

// source1.hpp:
# ifndef SOURCE1_H
# define SOURCE1_H

#include <boost/math/special_functions/digamma.hpp>
#include <boost/math/special_functions/gamma.hpp>

using boost::math::lgamma;
using boost::math::digamma;

# include <stan/math.hpp>
# include "stan/math/fwd/scal.hpp"

using var = stan::math::var;
using fvar = stan::math::fvar<var>;

template <typename T> T multivariate_lgamma(T x, int p) {
  T result = log(M_PI) * p * (p - 1.0) / 4.0;
  for (int i = 1; i <= p; i++) {
    result += lgamma(x + 0.5 * (1 - (double)i));
  }
  return result;
}

extern template double multivariate_lgamma(double x, int p);
extern template var multivariate_lgamma(var x, int p);
extern template fvar multivariate_lgamma(fvar x, int p);

# endif

// source1.cpp:
# include "source1.hpp"
# include <stan/math.hpp>
# include <stan/math/fwd/scal.hpp>

using var = stan::math::var;
using fvar = stan::math::fvar<var>;

template double multivariate_lgamma(double x, int p);
template var multivariate_lgamma(var x, int p);
template fvar multivariate_lgamma(fvar x, int p);


// source1.hpp:
# ifndef SOURCE2_H
# define SOURCE2_H

#include <boost/math/special_functions/digamma.hpp>
#include <boost/math/special_functions/gamma.hpp>

using boost::math::lgamma;
using boost::math::digamma;

# include <stan/math.hpp>
# include "stan/math/fwd/scal.hpp"

using var = stan::math::var;
using fvar = stan::math::fvar<var>;

template <typename T> T multivariate_digamma(T x, int p) {
  T result = 0.0;
  for (int i = 1; i <= p; i++) {
    result += digamma(x + 0.5 * (1 - (double)i));
  }
  return result;
}

extern template double multivariate_digamma(double x, int p);
extern template var multivariate_digamma(var x, int p);
extern template fvar multivariate_digamma(fvar x, int p);

# endif

// source2.cpp:
# include "source2.hpp"
# include <stan/math.hpp>
# include <stan/math/fwd/scal.hpp>

using var = stan::math::var;
using fvar = stan::math::fvar<var>;

template double multivariate_digamma(double x, int p);
template var multivariate_digamma(var x, int p);
template fvar multivariate_digamma(fvar x, int p);

Current Output:

The linker fails with errors like this:

[ 33%] Building CXX object CMakeFiles/StanHeaderExample.dir/source2.cpp.o
Linking CXX shared library libStanHeaderExample.so
CMakeFiles/StanHeaderExample.dir/source2.cpp.o: In function `stan::math::print_mat_size(int, std::ostream&)':
source2.cpp:(.text+0x1390): multiple definition of `stan::math::print_mat_size(int, std::ostream&)'
CMakeFiles/StanHeaderExample.dir/source1.cpp.o:source1.cpp:(.text+0x12c0): first defined here
CMakeFiles/StanHeaderExample.dir/source2.cpp.o: In function `stan::math::csr_u_to_z(std::vector<int, std::allocator<int> > const&, int)':
source2.cpp:(.text+0x13c0): multiple definition of `stan::math::csr_u_to_z(std::vector<int, std::allocator<int> > const&, int)'
CMakeFiles/StanHeaderExample.dir/source1.cpp.o:source1.cpp:(.text+0x12f0): first defined here
... and so on.

Expected Output:

A compiled shared library with templates instantiated with Stan types.

Additional Information:

Provide any additional information here.

Current Version:

v2.9.0

The text was updated successfully, but these errors were encountered:

bob-carpenter · 2016-06-30T03:10:12Z

I'm afraid you're not going to be able to do what you're trying to do. The problem is that all the includes include also statics and other non-inline functions, so they can't have multiple definitions. I don't think this is something we can fix.

rgiordan · 2016-06-30T14:27:12Z

Again, I'm a little new to C++, but isn't it unusual to put function definitions in header files like this? It seems to me that this problem really limits the usefulness of this library. (Unless there is a work-around, I think it would be a deal-breaker for me, since it significantly drives up compile times for complex models.)

Why can't the functions be inlined? Most of the ones are giving errors in this example were quite small.

betanalpha · 2016-06-30T14:51:56Z

That’s conventual for much of C++ but things get dramatically more
complicated with the introduction of templates, not to mention the
global memory we need for implementing automatic differentiation.

The long compilation times are a serious concern, but at the moment
there aren’t many solutions that don’t resort to having gigantic shared
objects that would introduce their own serious problems.

On Jun 30, 2016, at 3:27 PM, Ryan notifications@github.com wrote:

Again, I'm a little new to C++, but isn't it unusual to put function definitions in header files like this? It seems to me that this problem really limits the usefulness of this library. (Unless there is a work-around, I think it would be a deal-breaker for me, since it significantly drives up compile times for complex models.)

Why can't the functions be inlined? Most of the ones are giving errors in this example were quite small.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

rgiordan · 2016-06-30T15:32:41Z

The problem with multiple includes seems to be tiny little functions like this one:

    int csr_u_to_z(const std::vector<int>& u, int i) {
      check_positive("csr_u_to_z", "u.size()", u.size());
      check_range("csr_u_to_z", "i", u.size(), i + 1, "index out of range");
      return u[i + 1] - u[i];
    }

If one were to compile these functions separately or inline them, would it really result in such a bloated shared library? Or is there a different problem that I don't see?

For some context, I should explain why this is important. I routinely perform inference on models composed of a small number of types of random variables. I'd like to build a library with these distributions and use Stan's autodifferentiation to optimize these models as well as compute derivatives for sensitivity analysis in many different contexts (e.g. this one, and many more if I can get around this difficulty). As-is, because I cannot pre-instantiate instances of my templates with Stan types in a shared library, I have to wait minutes each time I change a single line of code (e.g. tweak a unit test). It's grown quite unusable.

In other words, for my own projects I'd be happy to trade bloated shared libraries for faster compile time, but because of a small number of tiny functions in the Stan headers I don't have the option. It seems to me that Stan the modeling language and I could make different decisions about this tradeoff if only the header files were true header files without function definitions.

bob-carpenter · 2016-06-30T18:18:49Z

It'd be nice if it were just non-templated functions, but
the problem is that you can't have two instantiations of
the same function.

We could break things down into header, definition, and
instantiation, and that could work in most cases. We did
that where necessary with the grammars to work around Windows
compilation limits. But it's a lot of work with no payoff in
our use cases.

Instead, we've gone the with the standard header-only library
pattern for heavily templated libraries. It's the same
organization you see in Boost and Eigen.

When we say bloated shared libraries, we mean really bloated.
As in perhaps terabytes if we di the whole thing.
There are something like 10K instantiations just of the
univariate student-T function!

Bob

On Jun 30, 2016, at 8:32 AM, Ryan notifications@github.com wrote:

The problem with multiple includes seems to be tiny little functions like this one:
int csr_u_to_z(const std::vector<int>& u, int i) {
  check_positive("csr_u_to_z", "u.size()", u.size());
  check_range("csr_u_to_z", "i", u.size(), i + 1, "index out of range");
  return u[i + 1] - u[i];
}
If one were to compile these functions separately or inline them, would it really result in such a bloated shared library? Or is there a different problem that I don't see?

For some context, I should explain why this is important. I routinely perform inference on models composed of a small number of types of random variables. I'd like to build a library with these distributions and use Stan's autodifferentiation to optimize these models as well as compute derivatives for sensitivity analysis in many different contexts (e.g. this one, and many more if I can get around this difficulty). As-is, because I cannot pre-instantiate instances of my templates with Stan types in a shared library, I have to wait minutes each time I change a single line of code (e.g. tweak a unit test). It's grown quite unusable.

In other words, for my own projects I'd be happy to trade bloated shared libraries for faster compile time, but because of a small number of tiny functions in the Stan headers I don't have the option. It seems to me that Stan the modeling language and I could make different decisions about this tradeoff if only the header files were true header files without function definitions.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

rgiordan · 2016-06-30T18:40:10Z

I think there's some miscommunication -- I'm sorry if it's on my part.

Other header-only libraries (e.g. Eigen) do not have the problem that I'm describing in Stan. I can do something like

First source file:

// source1.cpp
# include <Eigen/Dense>

Eigen::VectorXd MyFun() {
  ... stuff ...
}

Second source file:

// source2.cpp
# include <Eigen/Dense>

Eigen::VectorXd MyOtherFun() {
  ... stuff ...
}

I can then link both to source1.o and source2.o without errors. This is because Eigen is really header-only -- its headers contain only templates and definitions. I cannot do this in Stan (per the above example, for instance).

I'm not suggesting that you guys should instantiate all the Stan functions in your library. My problem is that, in my project which uses Stan, I don't have the option to instantiate them in a shared library. And the reason I can't do that has nothing to do with the size of the Stan objects, but only because there are these tiny non-templated functions like print_mat_size that are instantiated in the Stan headers preventing me from including the Stan headers in more than one translation unit.

Does this make sense? Again, I'm sorry if I'm missing some basic point.

rgiordan · 2016-07-01T18:57:22Z

A similar issue has been reported previously in stan issue #1149. I'm not sure why it was closed -- maybe for inactivity.

To clarify, I've made a simpler example (also in the same repo linked to earlier). Here it is:

source1.cpp:

# include <stan/math.hpp>
stan::math::var Function1() {
  return 0;
}

source2.cpp:

# include <stan/math.hpp>
stan::math::var Function2() {
  return 0;
}

These two files cannot be linked without multiple definition errors (unlike other header-only libraries like Eigen).

To further clarify, the linline branch of my fork fixes the problem (at least for this simple example). I'm not really asserting that inlining is the answer, but I'm trying to be clear about what the problem is.

bgoodri · 2016-07-02T20:53:24Z

Yeah, we had this linker problem with rstanarm (with log_sum_exp IIRC). The only solution I could come up with was to concatenate all the C++ files into one file and compile that.

rgiordan · 2016-07-02T22:55:59Z

@bgoodri , up to now that is what I've been doing, too, though with moderately complex projects the compile times have become prohibitively long.

bgoodri · 2016-07-02T23:35:47Z

Even in the best case scenario, the compile times are going to be long. But
if we sort this issue out, the C++ files can be compiled in parallel,
possibly at the expense of a lot more memory consumed.

On Sat, Jul 2, 2016 at 6:56 PM, Ryan notifications@github.com wrote:

@bgoodri https://github.com/bgoodri , up to now that is what I've been
doing, too, though with moderately complex projects the compile times have
become prohibitively long.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#311 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/ADOrqvrzYE2U9Jy1ykFF4u8o_m-6WwMyks5qRuyAgaJpZM4JBjyo
.

rgiordan · 2016-07-02T23:51:01Z

Yes. Also, at least in my case, I could solve my compile time problems by pre-compiling the relatively limited functionality that I need into a shared library.

bob-carpenter · 2016-07-04T19:04:19Z

I think the right general solution is to do what we did
with the grammars:

pure header: foo.hpp
definition: foo_def.hpp
instantiation: foo_inst.cpp

We can do that with most of our library and it'll allow
use as header-only through foo_def.hpp, or as a shared
object by including foo.hpp and linking foo_inst.cpp.

The problem is that foo_inst.cpp changes per instantiation,
and in many of our probability function classes, there are
just too many to preinstantiate.

Also, we have some shared global objects that would have
to be taken care of.

I wouldn't mind at all if someone refactored all the code
into this pattern. It's a lot of work!

Bob

On Jul 2, 2016, at 4:51 PM, Ryan notifications@github.com wrote:

Yes. Also, at least in my case, I could solve my compile time problems by pre-compiling the relatively limited functionality that I need into a shared library.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

rgiordan · 2016-07-04T20:47:53Z

@bob-carpenter , that does seem like it would be a lot of work. But why does all the code have to be rewritten? Things are mostly fine left as templates and inlines, it seems to me that a few small functions were forgotten. For example, why can't the following function (and other similarly small functions) be inlined?

    double abs(double x) {
      return std::fabs(x);
    }

There are some exceptions that are longer, like log_sum_exp for vectors of doubles which takes thirteen lines, or cholesky_decompose for matrices of var which is pretty long. Maybe those could go in a separate cpp files if you think they're too big to inline. But that's much less work than a re-organization of the entire project.

bob-carpenter · 2016-07-04T21:25:25Z

Is every function and template function declared inline
in Eigen? Isn't it an error to instantiate the same
template function in two different translation units unless
it's declared inline?

I'd be OK inlining everything if it doesn't increase our
compilation time, generate larger object files, or slow down
run time.

Bob

On Jul 4, 2016, at 1:47 PM, Ryan notifications@github.com wrote:

@bob-carpenter , that does seem like it would be a lot of work. But why does all the code have to be rewritten? Things are mostly fine left as templates and inlines, it seems to me that a few small functions were forgotten. For example, why can't the following function (and other similarly small functions) be inlined?
double abs(double x) {
  return std::fabs(x);
}
There are some exceptions that are longer, like log_sum_exp for vectors of doubles which takes thirteen lines, or cholesky_decompose for matrices of var which is pretty long. Maybe those could go in a separate cpp files if you think they're too big to inline. But that's much less work than a re-organization of the entire project.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

rgiordan · 2016-07-04T21:46:42Z

Is every function and template function declared inline in Eigen?

No, the Eigen headers only define templates and their implementations. AFAIK, they do not instantiate anything. If they did, you could not include Eigen headers in multiple translation units.

Isn't it an error to instantiate the same template function in two different translation units unless
it's declared inline?

Yes, it is an error to instantiate the same template function in two different translation units unless it is declared inline. However, you can define a template function and its implementation in headers that are included in several translation units as long as they are not instantiated. E.g.

Ok for headers (definition but not instantiation):

template <typename T> T MyFunc(T foo) {
  ... do things with foo ...
}

Not ok for headers (instantiation):

template double MyFunc(double);

Also not ok for headers:

double MyFunc(double x) {
...
}

But this is ok for headers:

inline double MyFunc(double x) {
...
}

Did you take a look at my branch? It might clarify things.

Can you point me to a benchmark that you all use for measuring compile time changes?

bob-carpenter · 2016-07-06T15:33:10Z

Thanks. That's how I thought things worked and didn't
imagine the use case you have (multiple translation units,
each of which instantiates different translates).

So we should definitely work on this as it shouldn't hurt
anything unless we get a lot of code bloat.

Or we can start being better behaved with our non-templated
functions and pull them into proper header files with
properly compiled versions. That should just be a win overall
for compile time.

We don't have benchmarks for compilation. We just look
at wall time using -j4 on our Mac or Linux notebooks. It'd
be great to do this more methodically, but not a requirement
for anything.

Bob

On Jul 4, 2016, at 2:46 PM, Ryan notifications@github.com wrote:

Is every function and template function declared inline in Eigen?

No, the Eigen headers only define templates and their implementations. AFAIK, they do not instantiate anything. If they did, you could not include Eigen headers in multiple translation units.

Isn't it an error to instantiate the same template function in two different translation units unless
it's declared inline?

Yes, it is an error to instantiate the same template function in two different translation units unless it is declared inline. However, you can define a template function and its implementation in headers that are included in several translation units as long as they are not instantiated. E.g.

Ok for headers (definition but not instantiation):

template T MyFunc(T foo) {
... do things with foo ...
}

Not ok for headers (instantiation):

template double MyFunc(double);

Also not ok for headers:

double MyFunc(double x) {
...
}

But this is ok for headers:

inline double MyFunc(double x) {
...
}

Did you take a look at my branch? It might clarify things.

Can you point me to a benchmark that you all use for measuring compile time changes?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

bgoodri · 2016-07-09T19:34:21Z

Can you try it after adding --allow-multiple-definition to CXXFLAGS?

rgiordan · 2016-07-09T19:41:21Z

@bgoodri , can you elaborate a little on what you have in mind? I don't see a --allow-multiple-definition option in gcc or clang. Besides, I think it's better style to simple not multiply define functions across translation units.

bgoodri · 2016-07-09T19:45:18Z

I meant LDFLAGS as it is a linker option
http://linux.die.net/man/1/ld
I don't think designing the Stan library correctly is in the cards.

rgiordan · 2016-07-09T19:47:27Z

Why do you say it's not in the cards? Do you have any objection to PR 313, which fixes this problem?

bgoodri · 2016-07-09T19:51:16Z

That PR is fine I guess. I was referring to splitting up the header files into def and inst.

rgiordan · 2016-07-09T19:53:01Z

Yes. I think it is not actually necessary to split header files into def and inst.

Fixes #311. Inline functions

rgiordan mentioned this issue Jul 7, 2016

Inline functions #313

Merged

3 tasks

syclik closed this as completed in #313 Jul 20, 2016

syclik added a commit that referenced this issue Jul 20, 2016

Merge pull request #313 from rgiordan/inline_pr

c14bbdb

Fixes #311. Inline functions

syclik modified the milestone: v2.11.0 Jul 27, 2016

syclik mentioned this issue Jul 27, 2016

Add test for multiple translation units #328

Closed

bgoodri mentioned this issue Sep 17, 2016

Still have multiple definitions when including Stan headers in several source files #394

Closed

TheCynosure mentioned this issue Apr 8, 2020

Added functionality to wait for initialization. ut-amrl/config-reader#4

Merged

bbbales2 mentioned this issue Feb 3, 2021

Cash karp ckrk #2336

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple definitions when including Stan headers in several source files #311

Multiple definitions when including Stan headers in several source files #311

rgiordan commented Jun 29, 2016

bob-carpenter commented Jun 30, 2016

rgiordan commented Jun 30, 2016

betanalpha commented Jun 30, 2016

rgiordan commented Jun 30, 2016

bob-carpenter commented Jun 30, 2016

rgiordan commented Jun 30, 2016

rgiordan commented Jul 1, 2016

bgoodri commented Jul 2, 2016

rgiordan commented Jul 2, 2016

bgoodri commented Jul 2, 2016

rgiordan commented Jul 2, 2016

bob-carpenter commented Jul 4, 2016

rgiordan commented Jul 4, 2016

bob-carpenter commented Jul 4, 2016

rgiordan commented Jul 4, 2016

bob-carpenter commented Jul 6, 2016

bgoodri commented Jul 9, 2016

rgiordan commented Jul 9, 2016

bgoodri commented Jul 9, 2016

rgiordan commented Jul 9, 2016

bgoodri commented Jul 9, 2016

rgiordan commented Jul 9, 2016

Multiple definitions when including Stan headers in several source files #311

Multiple definitions when including Stan headers in several source files #311

Comments

rgiordan commented Jun 29, 2016

Summary:

Description:

Reproducible Steps:

Current Output:

Expected Output:

Additional Information:

Current Version:

bob-carpenter commented Jun 30, 2016

rgiordan commented Jun 30, 2016

betanalpha commented Jun 30, 2016

rgiordan commented Jun 30, 2016

bob-carpenter commented Jun 30, 2016

rgiordan commented Jun 30, 2016

rgiordan commented Jul 1, 2016

bgoodri commented Jul 2, 2016

rgiordan commented Jul 2, 2016

bgoodri commented Jul 2, 2016

rgiordan commented Jul 2, 2016

bob-carpenter commented Jul 4, 2016

rgiordan commented Jul 4, 2016

bob-carpenter commented Jul 4, 2016

rgiordan commented Jul 4, 2016

bob-carpenter commented Jul 6, 2016

bgoodri commented Jul 9, 2016

rgiordan commented Jul 9, 2016

bgoodri commented Jul 9, 2016

rgiordan commented Jul 9, 2016

bgoodri commented Jul 9, 2016

rgiordan commented Jul 9, 2016