Fix performance regression due to #809 #836
I'm particularly wondering what @bob-carpenter and @wds15 think of this one. I wonder if I'm overlooking something.
This will work and will be very fast, I think. However, you have declared the global stack to be …
Isn't every global in some namespace, even if only ::?
I'm not sure what internal linkage means here. Does it mean the global stack is invisible from other translation units? If so, I'd think that'd be a problem already.
Yeah, I think "namespace-level" and "global" are basically the same. It means each translation unit that includes this chainablestack.hpp will have its own copy of the AD stack. I actually think this is most likely fine for the reasons Sebastian outlined above (and that we discussed previously). As far as I can tell, for this to break, someone would need to compile some Stan functions into one translation unit, link them with other Stan functions from another translation unit, and expect both to access the same AD graph. This seems like a weird use case, no? Plus, I imagine that the way we inline every function and method (thus generating a separate copy of each in every translation unit) already has the same effect?
No, I don't think inlining leads to the same effect. The AD stack is now part of each translation unit, whereas the standard C++ singleton design that is currently in place guarantees that a program (no matter how many translation units it has) only ever contains a single stack instance. So our AD stack is no longer a true singleton, but a per-translation-unit, namespace-level instance. Since each model is its own translation unit, everything is fine given how Stan works, though people online advise against this pattern because it causes confusion. Unless someone else has good reasons not to merge this, we should probably go ahead with it for now until we find a fast singleton pattern.
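For readers less familiar with linkage, here is a minimal single-file sketch contrasting a namespace-level `static` variable with one common form of the standard C++ singleton, a function-local static inside an `inline` function. The names `ad_stack`, `per_tu_stack`, and `program_wide_stack` are hypothetical, not Stan Math identifiers, and the cross-translation-unit behavior noted in the comments cannot be observed from a single file.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an AD stack (not Stan Math's actual type).
struct ad_stack {
  std::vector<double> values;
};

// (a) Namespace-scope `static`: internal linkage. If this line lived in a
// header, every translation unit including that header would get its own,
// independent copy of the stack.
static ad_stack per_tu_stack;

// (b) Function-local static inside an `inline` function: a static local in
// an inline function with external linkage always refers to the same object,
// so the whole program shares one instance. It is constructed on the first
// call, and C++11 makes that initialization thread-safe.
inline ad_stack& program_wide_stack() {
  static ad_stack instance;
  return instance;
}
```

Within one file both behave identically; the difference only appears once several translation units include the same code, which is exactly the case being debated here.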
Not weird at all. It's how we'd try to save compile time, for instance: we'd compile all the matrix operations in their own translation unit and then link them, rather than recompiling them every time. In fact, I think doing this should be a priority for us, since they take a long time to compile and won't be too large to precompile, but that's a different issue.
Damn. Guess this won't work either.
(Email reply to Bob Carpenter's comment of Fri, Apr 13, 2018 at 11:21.)
Wait, I'm confused: I don't think we could do that as is! As far as I can see, a symbol has either external or internal linkage: either it is exported at link time and visible from other translation units, or it is not. If it is, then you would get conflicts from defining the same symbol in different translation units. C++ has a special rule for classes that are defined identically in several translation units, but static class members suffer the same problem.
I propose that this PR maintains the current state of affairs (before the threaded AD PR), solves the performance issues, and lets us use threading. I started a Discourse thread on how to add new libraries easily, which I believe is the proper solution: compile a separate library containing just the static ChainableStack, and link it in once at the end.
All the matrix functions would be defined exactly once in their own translation unit. They will be externally linked (default behavior).
Other translation units would have to use pure header files (without definitions) for matrix functions to ensure the matrix functions don't get recompiled (like they would if we inlined definitions).
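A minimal sketch of the header/implementation split described above, with both hypothetical files shown in one block so it compiles on its own. `sum_of_squares` is an illustrative function, not a Stan Math API.

```cpp
#include <cassert>
#include <vector>

// --- matrix_ops.hpp (hypothetical header): declarations only. ---
// Other translation units include just this, so the function body is not
// recompiled in them. Non-inline functions have external linkage by default.
double sum_of_squares(const std::vector<double>& v);

// --- matrix_ops.cpp (hypothetical): the single definition, compiled once
// and linked into every program that calls sum_of_squares. ---
double sum_of_squares(const std::vector<double>& v) {
  double total = 0.0;
  for (double x : v) {
    total += x * x;
  }
  return total;
}
```

Defining a non-inline function like this directly in a header that several translation units include would instead fail at link time with a multiple-definition error, which is why header-only code marks its definitions `inline`.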
If this PR prevents compiled autodiff functions in standalone translation units, it's going to incur a lot of technical debt that we'll probably have to pay off in the future. And it'll probably break things our users are doing.
(Email reply to seantalts's comment of Apr 13, 2018 at 1:53 PM.)
Yep, sounds good. But where does the ChainableStack live? There must be two copies, or else we wouldn't be able to link two translation units that both include chainablestack.hpp. My hypothesis is that this PR preserves the status quo here, since static globals have the same linkage as static class members (which is what we had before these PRs). I think it's best if I make a proof of concept...
I'm happy to defer to your wisdom on this. As long as this doesn't preclude us from writing matrix functions in a standalone translation unit any more than whatever we had before, I'm totally OK with the change.
I'm less keen if it introduces technical debt that precludes us from ever compiling in multiple translation units.
Maybe we need to talk about this in person. I'm not very good with all the build terminology, so I may be describing things the wrong way and just confusing everyone.
(Email reply to seantalts's comment of Apr 13, 2018 at 3:42 PM.)
I made a proof of concept, and things do not behave the way I had read or thought they should. The version we used to have shares a single AD stack across multiple translation units and produces no link-time errors about multiple definitions (maybe due to the special treatment C++ gives classes in these cases). The version in this PR behaves as expected and has a separate AD stack for each translation unit. So let's not do this version... Any ideas for doing this in a performant way? @wds15, I didn't really understand what exactly that guy was proposing. Is it basically to wrap the chainable_stack from this PR in a class as a static member, instead of as a namespace-level static global?
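A minimal sketch of a pattern that would explain this result, assuming the old stack was reachable through a static data member of a class template (all names here are hypothetical). A static data member of an ordinary class must be defined in exactly one translation unit, but a class template's static data member may be defined in a header: every translation unit that uses it emits the same definition, and the linker keeps a single copy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: the stack lives behind a static data member of a
// class *template*. Names are illustrative, not Stan Math's.
template <typename T>
struct stack_storage {
  std::vector<T> values;
  static stack_storage* instance_;  // declaration only
};

// This definition can sit in a header. Each translation unit that uses
// stack_storage<T>::instance_ emits it, the linker merges the copies, and
// all translation units share one stack. The object is never deleted.
template <typename T>
stack_storage<T>* stack_storage<T>::instance_ = new stack_storage<T>();

// Hypothetical alias, analogous to a ChainableStack typedef.
using chainable_stack_sketch = stack_storage<double>;
```

Before C++17 `inline` variables, the same definition for a non-template class's static member would have to move into a single .cpp file, which is the multiple-definition situation raised earlier in the thread.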
A possible quick solution for now could be to declare the global AD stack as extern in the header files and always require that an additional .cpp file be linked with any program using stan-math. This should give us a single AD stack in any program, but it requires passing this additional file to the linker. I did not get the solution I linked up and running immediately. I need a moment to go through it and will do that early next week at the latest. But basically, I think you are right that we are looking for a static class member (with fast access).
I got a version along the lines of the second approach working, but I still need to test its performance.
(Email reply to wds15's comment of Fri, Apr 13, 2018 at 17:22.)
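wds15's quick fix (an extern declaration in the headers plus a definition in one extra .cpp file) can be sketched in a single block as follows; `ad_stack_storage` and `global_ad_stack` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical storage type; not Stan Math's.
struct ad_stack_storage {
  std::vector<double> values;
};

// --- In the stan-math headers: a declaration only (external linkage). ---
extern ad_stack_storage global_ad_stack;

// --- In the one extra .cpp file that every program would have to link. ---
// Leaving that file out produces an "undefined reference" link error.
ad_stack_storage global_ad_stack;
```

This guarantees exactly one stack per program, at the cost of an extra object file that every build must remember to link.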
Do we need anything more than the following to implement a mutable thread-local singleton? I can't imagine that using …
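The snippet from this comment did not survive in this copy of the thread. Below is a minimal sketch of one common form of a mutable thread-local singleton, not necessarily what was posted; `ad_stack` and `thread_stack` are hypothetical names.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stack type; not Stan Math's.
struct ad_stack {
  std::vector<double> values;
};

// Each thread lazily constructs its own stack the first time it calls this
// function; within a thread, every call returns the same mutable object.
// Because the function is `inline`, all translation units share the same
// per-thread instance.
inline ad_stack& thread_stack() {
  thread_local ad_stack instance;
  return instance;
}
```

Because every thread owns its stack, no locking is needed when pushing onto it.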
I think that's it; we actually don't even want the …
Check out #840
I'm hoping to get comments on this, because it seems too simple but has the performance characteristics we want, and I don't think we have to worry about any of the thread contention issues that cause people to use most of the singleton patterns we've seen flying around. Those are all trying to avoid one thread accessing a static global before it has been initialized, but we are either running in single-threaded mode or we are running with a thread_local static global variable instead. So there's never an opportunity for contention.
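A minimal sketch of the idea described above, under stated assumptions: a compile-time switch (assumed here to be `STAN_THREADS`, with a hypothetical helper macro `SKETCH_THREAD_LOCAL`) marks the stack `thread_local` only in threaded builds, and the stack is a static data member of a class template so that all translation units still share it. This illustrates the technique; it is not the code in #840.

```cpp
#include <cassert>
#include <vector>

// Assumed build switch: define STAN_THREADS for threaded builds. The helper
// macro SKETCH_THREAD_LOCAL is hypothetical.
#ifdef STAN_THREADS
#define SKETCH_THREAD_LOCAL thread_local
#else
#define SKETCH_THREAD_LOCAL
#endif

template <typename T>
struct stack_singleton {
  std::vector<T> values;
  // Without STAN_THREADS: one program-wide instance shared by all TUs.
  // With STAN_THREADS: one instance per thread, never shared across threads.
  static SKETCH_THREAD_LOCAL stack_singleton instance_;
};

// May live in a header: as a static data member of a class template, its
// definition is merged across translation units by the linker. The
// thread_local specifier must appear on every declaration, hence the macro.
template <typename T>
SKETCH_THREAD_LOCAL stack_singleton<T> stack_singleton<T>::instance_;
```

Compiling with `-DSTAN_THREADS` gives each thread its own stack; without it, the same code yields one program-wide stack. In neither case is a single stack object touched by two threads, which is the property the argument above relies on.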
Fixes #824.
Develop vs this PR:
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: