fix performance regression bug, fixes #824 #825

wds15 · 2018-04-08T16:41:26Z

Submission Checklist

Run unit tests: ./runTests.py test/unit
Run cpplint: make cpplint
Declare copyright holder and open-source license: see below

Summary:

Addresses the performance regression bug introduced with the threaded AD stack change.

It turns out that a static variable declared inside a static function is hard for the compiler to optimize. This PR changes that by declaring a global instance of ChainableStack directly. Whenever threads are to be used, the instance is declared with the thread_local keyword.
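The two access patterns contrasted in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Stan Math code: `Stack`, `StackViaFunction`, `global_stack`, the two `push_*` helpers, and the `EXAMPLE_THREADS` / `EXAMPLE_TLS` macros are all hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// EXAMPLE_THREADS and EXAMPLE_TLS are hypothetical names for this sketch.
#ifdef EXAMPLE_THREADS
#define EXAMPLE_TLS thread_local
#else
#define EXAMPLE_TLS
#endif

// Hypothetical stand-in for the autodiff stack; not the real ChainableStack.
struct Stack {
  std::vector<double> values;
};

// Pattern A: the instance is a static local of a static member function.
// Each access calls the accessor, which may have to check whether the
// local static has been initialized yet.
struct StackViaFunction {
  static Stack& instance() {
    static EXAMPLE_TLS Stack stack;
    return stack;
  }
};

// Pattern B: the instance is declared directly at namespace scope, so
// code can refer to it without going through an accessor.
static EXAMPLE_TLS Stack global_stack;

// Push a value using pattern A and report the resulting stack size.
std::size_t push_via_function(double x) {
  StackViaFunction::instance().values.push_back(x);
  return StackViaFunction::instance().values.size();
}

// Push a value using pattern B and report the resulting stack size.
std::size_t push_via_global(double x) {
  global_stack.values.push_back(x);
  return global_stack.values.size();
}
```

Both helpers behave the same from the caller's point of view; the sketch only shows where the instance lives. Note that in pattern B a `static` object at namespace scope has internal linkage, so each translation unit that includes the declaration gets its own copy.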

This change did solve the performance regression problems:

performance prior to the threaded AD pull (cmdstan hash 8f218b6c0584af995ed7e48faa8408d03cb040ee):

stat_comp_benchmarks/benchmarks/arK/arK.stan,2.10290490389

performance after this change:

stat_comp_benchmarks/benchmarks/arK/arK.stan,1.90999811888

for reference, performance with the threaded AD changes which caused the slow down:

stat_comp_benchmarks/benchmarks/arK/arK.stan,2.34019263983

all of the above are without threading enabled.
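To put the three arK timings in relative terms, here is a small sketch of the arithmetic. It assumes the reported figures are run times where lower is better (the PR does not state the unit); `percent_change` and the constant names are hypothetical helpers for this example.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `value` against `baseline`, in percent.
// Positive means slower, assuming the inputs are run times.
double percent_change(double baseline, double value) {
  return 100.0 * (value - baseline) / baseline;
}

// arK.stan timings copied from the PR description.
const double kBeforeThreadedAD = 2.10290490389;  // prior to the threaded AD pull
const double kWithThreadedAD = 2.34019263983;    // with the threaded AD changes
const double kAfterThisChange = 1.90999811888;   // after this change
```

Under that assumption, `percent_change(kBeforeThreadedAD, kWithThreadedAD)` comes out at roughly 11% (the regression), and `percent_change(kBeforeThreadedAD, kAfterThisChange)` at roughly -9% (faster than the original baseline).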

Intended Effect:

recover speed.

How to Verify:

Run the performance regression framework. Please use cmdstan hash 8f218b6c0584af995ed7e48faa8408d03cb040ee as the reference.

Side Effects:

It looks like things get faster.

Documentation:

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Sebastian Weber

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

…stable/2017-11-14)

seantalts · 2018-04-08T17:17:20Z

I'm looking into this usage of static for a global variable, and I don't think it will work for us in the multiple translation unit case (like rstanarm needs). SO link: https://stackoverflow.com/questions/15235526/the-static-keyword-and-its-various-uses-in-c

bgoodri · 2018-04-08T19:41:39Z

For rstanarm, each model is a single translation unit and then they are linked together. We haven't had any linker issues since we inlined everything, but I haven't built rstanarm yet with the new threading stuff.

…

On Sun, Apr 8, 2018 at 1:17 PM, seantalts wrote: I'm looking into this usage of static for a global variable, and I don't think it will work for us in the multiple translation unit case (like rstanarm needs). SO link <https://stackoverflow.com/questions/15235526/the-static-keyword-and-its-various-uses-in-c>.

wds15 · 2018-04-08T20:09:07Z

Hmm... the multiple translation unit thing could be an issue, let's see. I need to run these benchmarks with thread_local, as that could solve this translation unit issue should we have a problem here.

bgoodri · 2018-04-08T22:19:07Z

Can we have a thing like this inside the model's namespace?

bgoodri · 2018-04-09T01:29:20Z

Still working with Debian stable

wds15 · 2018-04-09T07:01:37Z

@bgoodri one such thing per namespace may work, but that is probably not nice.

@seantalts Is there a test which should break in the multiple translation unit case? I need to read a bit more about this, but I think you are right. If I understood your Stack Overflow link correctly on a quick read, then the pattern which we have right now follows its recommendations.

Turning on thread_local now causes quite a slowdown, to 3 with this code (a bigger hit than what we had before). If we could make thread_local work fast, this should solve the multiple translation unit issue. One possibility would be to hold thread_local pointers in the functions which need to access the stack. This is a bit tricky given all the constraints we have.
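One possible reading of the "thread_local pointers in the functions" idea is sketched below. It is an assumption about what was meant, not code from this PR or from Stan Math: `Stack`, `thread_stack()`, and `push_and_count()` are hypothetical names, and whether this layout is actually faster would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the autodiff stack; not the real ChainableStack.
struct Stack {
  std::vector<double> values;
};

// One stack per thread, created on first use in that thread.
Stack& thread_stack() {
  static thread_local Stack stack;
  return stack;
}

// A function that needs the stack keeps its own thread_local pointer to it.
// The pointer is initialized once per thread, on the first call made from
// that thread; later calls reuse the cached pointer.
std::size_t push_and_count(double x) {
  static thread_local Stack* const stack = &thread_stack();
  stack->values.push_back(x);
  return stack->values.size();
}
```

Calling `push_and_count` twice from the same thread grows that thread's stack to two entries; a second thread would start with its own empty stack.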

seantalts · 2018-04-09T13:34:18Z

@bgoodri I think you have more experience with the multiple translation unit stuff from rstanarm. Let me outline what I think happens, and hopefully you can correct me as needed. Each model gets built into a shared library and then they are all linked together into a single binary. That binary can execute only one model on any given run. If this is true, maybe it's okay that each model compiled into each translation unit has its own autodiff stack, since each translation unit / model has basically the entire math library coded into it. I'm just not sure what happens during linking. If there's a link-time optimization phase that normally notices that all of the math library is the same between models and eliminates that redundancy, then now it either 1) might not be able to because of this new global static variable, 2) thinks it still can, and potentially something weird happens where some functions are using one autodiff stack and others are using another, or 3) can perfectly eliminate the redundancy, and there is only one autodiff stack left at the end of link-time optimization. 3 would be nice :)

Our multiple translation unit tests are not really prepared to answer questions like these. I'm hoping @bgoodri knows, or we just need to try building and testing rstanarm with these changes. Also open to other ideas if people have them.

wds15 · 2018-04-09T20:15:29Z

Compilers are really weird. I think I found a solution, which I put into another pull request.

Sebastian Weber and others added 2 commits April 8, 2018 18:28

fix performance regression bug

2ccfa6c

[Jenkins] auto-formatting by clang-format version 6.0.0 (tags/google/…

0722efc

…stable/2017-11-14)

seantalts self-requested a review April 8, 2018 16:55

wds15 closed this Apr 9, 2018

seantalts mentioned this pull request Apr 13, 2018

Fix performance regression due to #809 #836

Closed

wds15 deleted the feature/issue-824-fix-speed branch April 17, 2018 19:00
