Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/intel tbb lib #1180

Merged
merged 81 commits into from
Oct 6, 2019
Merged

Feature/intel tbb lib #1180

merged 81 commits into from
Oct 6, 2019

Conversation

wds15
Copy link
Contributor

@wds15 wds15 commented Mar 24, 2019

Summary

Adds the Intel TBB 2019 Update 8 to stan-math's makefile system.

The new dependency checklist has been filled out on the wiki.

The Intel TBB is dynamically linked to programs. For robust linkage with the final programs, the rpath facility of linkers are used. This hard-codes the absolute path to the TBB library into the final binaries such that no additional user settings have to be made.

However, the Windows platform is special in two ways:

  1. The rpath linking logic is not supported. Instead, the PATH variable of the user must include the directory of the binary TBB libraries.
  2. On Windows the make variant mingw32-make must be used to build the Intel TBB since only this variant of make is supported by the Intel TBB sources.

The Intel TBB is introduced to the C++ stan-math sources in the PR #1376

Tests

The Intel TBB has been added as dependency to the tests such that the test itself is if it builds ok.

Side Effects

  • On Windows the mingw32-make has to be used from the RTools instead of the make command. Otherwise the TBB makefiles do not work.

  • On Windows the PATH variable must include the directory where TBB binaries are stored which is configured to be by default stan-math/lib/tbb.

Checklist

  • Math issue Add Intel TBB library #1179

  • Copyright holder: Sebastian Weber (copyright for the Intel TBB is with Intel, of course)

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@wds15 wds15 changed the title Feature/intel tbb lib WIP Feature/intel tbb lib Mar 24, 2019
@SteveBronder
Copy link
Collaborator

Q, is this going to be bad for non-Intel systems?

@bgoodri
Copy link
Contributor

bgoodri commented Mar 24, 2019

Is there a restriction on the bitness of the Stan program when linking with an Intel TBB that was build with 32 bit mingw? I guess we are only using 32 bit integers at the moment, but I had started to write a script to convert to 64 bit integers.

@bgoodri
Copy link
Contributor

bgoodri commented Mar 25, 2019

@SteveBronder there are a few mentions of changes to support ARM on the Intel TBB change log
https://github.com/01org/tbb/blob/tbb_2019/CHANGES
and the RcppParallel package has been including (an older version of) this for a while so I am pretty sure it is okay on SPARC.

@bgoodri
Copy link
Contributor

bgoodri commented Mar 25, 2019

Also, Intel's Parallel STL, which is intended to be an implementation of the C++17 standard, uses TBB and Intel gave it to the LLVM repo and LLVM works fine on non-Intel hardware AFAIK.

@wds15
Copy link
Contributor Author

wds15 commented Mar 25, 2019

@bgoodri I am not sure why 64 bit integers are a consideration from your perspective for the Intel TBB. I don't see how that interacts, but I don't know. I guess we are good; not sure though.

@SteveBronder No, this is not bad news for AMD, for example. Reasons:

  • Intel TBB is a C++ library. Any compiler you use will make optimised assembler out of it optimal for whatever processor it should run on
  • The LLVM project will use the TBB as basis for their C++17 implementation of the parallel STL bits
  • The gcc project does the same

What is certainly correct is that there are some ifdefs in their code make things work super-optimal on intel compilers and moreover the unit tests for the library are executed on Intel hardware only.

I asked myself this questions and for the above reasons I came to the conclusion we are good.

@wds15
Copy link
Contributor Author

wds15 commented Mar 25, 2019

@seantalts Does Jenkins wipe out all previous results between commits when retesting? I am asking as I wonder if I need to do a manual clean once in a while. Thanks.

@bgoodri
Copy link
Contributor

bgoodri commented Mar 25, 2019

Like if Stan moves to 64 bit integers and someone passes x_i to map_rect, will their computer explode? I suspect not because other projects (albeit not R) must have been using 64 bit integers with TBB sometime over the past decade.

@seantalts
Copy link
Member

@seantalts Does Jenkins wipe out all previous results between commits when retesting? I am asking as I wonder if I need to do a manual clean once in a while. Thanks.

It very much should, but I have seen (rare) cases of a job exiting in a really unusual way (maybe the machine drops out of the network or the whole process segfaults) and then there can be stuff left to clean up. That said, I think it should get a new workspace even in that situation.

@wds15
Copy link
Contributor Author

wds15 commented Mar 25, 2019

I really do not see how 64bit integers are a problem for the TBB. So, yes, this will work to my expectation.

@rok-cesnovar
Copy link
Member

Well in that case I didnt have to change the update-script at all and I will just revert back to what it was.

build the Intel TBB on Windows (and not otherwise)
@wds15
Copy link
Contributor Author

wds15 commented Oct 6, 2019

Ok. If you can revert that would be great (EDIT: ah, it's already done - great). Sorry for the confusion.

I just filed an important update to the makefile logic with the TBB: The makefile check on windows for the mingw32-make was flawed. The issue was that the check was marked ".PHONY" which implied that the TBB library targets (which do depend on the makefile check) were never clean for make, because they did depend on something which is never clean. This way the make for the tbb libraries were always called even when the TBB was already build.

I changed this such that the make check now depends on a file which I simply create by touch after a successful check for mingw32-make on Windows. This new behavior now ensures that you will only need mingw32-make to build the TBB and moreover the makefiles for the TBB are never triggered again once the TBB is build.

So the cool thing now is that users only have to remember the mingw32-make thing when they actually build the TBB. They can continue using "make examples/bernoulli/bernoulli.exe" on windows (after the TBB has been build). It will just work, because the TBB makefiles are not touched any longer.

@rok-cesnovar
Copy link
Member

Ok. If you can revert that would be great. Sorry for the confusion.

Done. No problem. I was the one confused :)

This way the make for the tbb libraries were always called even when the TBB was already build.
So the cool thing now is that users only have to remember the mingw32-make thing when they actually build the TBB.

Great! I think this simplifies it a lot. I will review the changes later.

So to sum up:

  • cmdstan: doc question needs resolving + me testing your changes on path work
  • math: tbbmalloc policy for mac only needs to be added here or in the integrate

Will we need to merge the "integrate TBB" PR to this branch or can you switch the PR base branch?

@wds15
Copy link
Contributor Author

wds15 commented Oct 6, 2019

  • cmdstan: doc should be good now
  • math: the init branch is branched of this one. So I think I can just merge this lib branch into the other init one and then thats good. We should merge like this:
  1. this one into stan-math (PR-1180)
  2. the respective branch on stan (PR-2769)
  3. the cmdstan PR (PR-710)

What I do not know is how we handle the updates of the submodules. So the stan PR-2769 needs to be bumped to point to the right stan-math commit. Same thing for the cmdstan PR-710 which needs to point to stan PR-2769.

Right now I am running the tests for PR-1180, but that run now uses PR-2769 and PR-710 in its upstream tests - so that should come out "green". I will start respective ones for stan and for cmdstan.

The update to the makefile wrt to tbbmalloc should be handled in the init PR.

@rok-cesnovar
Copy link
Member

What I do not know is how we handle the updates of the submodules. So the stan PR-2769 needs to be bumped to point to the right stan-math commit. Same thing for the cmdstan PR-710 which needs to point to stan PR-2769.

  • this one into stan-math (PR-1180)
  • the respective branch on stan (PR-2769)
  • the cmdstan PR (PR-710)

But how will that work if we first merge the math one to develop? Wont that break develop then? Develop tests wont have that info?

Couldnt we avoid setting submodules by first merging the cmdstan PR? That wont break anything as the tbb targets will just be empty. Then merge the Stan one (the changes there are only for tests).

And once that is in we merge this one and then the map rect one?

@wds15
Copy link
Contributor Author

wds15 commented Oct 6, 2019

Hmm... that order could work, yes.

@rok-cesnovar
Copy link
Member

Ok, lets move the discussion to the cmdstan one and go one by one.

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.97)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.98)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.98)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.01)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.0)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.95)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.0)
Result: 0.98936325739
Commit hash: 8c18f2b704446116499f7b4c87c0144791b33c7e

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.98)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.98)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 0.97)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.01)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.0)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.99)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.98)
Result: 0.99085982256
Commit hash: 8c18f2b704446116499f7b4c87c0144791b33c7e

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.17)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.98)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.05)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.17)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.2)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.15)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.02)
(performance.compilation, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.01)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.11)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.08)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.03)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.04)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.07)
Result: 1.07110834232
Commit hash: e7257882e9ab34bac7f2866636551dd249599a0e

Copy link
Member

@rok-cesnovar rok-cesnovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets go! Very cool @wds15 !

@rok-cesnovar rok-cesnovar merged commit 93ab904 into develop Oct 6, 2019
@wds15 wds15 mentioned this pull request Oct 7, 2019
@seantalts
Copy link
Member

the Daily Stan Performance build failed and it seems related to this PR - https://jenkins.mc-stan.org/blue/organizations/jenkins/Daily%20Stan%20Performance/detail/Daily%20Stan%20Performance/1703/changes/

Can someone take a look? If we can't fix it by the 12th we should probably revert before the feature freeze.

@rok-cesnovar
Copy link
Member

It seems that its trying to rune the command "Stan". And there are spaces in the path (Daily Stan Performance) which leaves me to believe that its probably some compile/link path not in quotes somewhere.

@rok-cesnovar
Copy link
Member

This looks like the culprit:

tbb_root="/Users/Shared/Jenkins/gelman-group-mac/workspace/Daily Stan Performance/lib/stan_math/lib/tbb_2019_U8" CXX="/usr/local/opt/llvm@6/bin/clang++" CC="/usr/local/opt/llvm@6/bin/clang" LDFLAGS="-Wl,-L,"/Users/Shared/Jenkins/gelman-group-mac/workspace/Daily Stan Performance/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/Shared/Jenkins/gelman-group-mac/workspace/Daily Stan Performance/lib/stan_math/lib/tbb"" /Library/Developer/CommandLineTools/usr/bin/make -C lib/stan_math/lib/tbb -r -f "/Users/Shared/Jenkins/gelman-group-mac/workspace/Daily Stan Performance/lib/stan_math/lib/tbb_2019_U8/build/Makefile.tbbmalloc" compiler=clang cfg=release stdver=c++1y malloc

Specifically I am looking at:

LDFLAGS="-Wl,-L,"/Users/Shared/Jenkins/gelman-group-mac/workspace/Daily Stan Performance/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/Shared/Jenkins/gelman-group-mac/workspace/Daily Stan Performance/lib/stan_math/lib/tbb""

Those nested quotes look ok? seems iffy.

How can we be sure this fixed it? Get what we believe to be the fix merged and then manually restart the daily performance test?

Hopefully we can replicate this locally otherwise this one is going to be slow to debug.

@wds15
Copy link
Contributor Author

wds15 commented Oct 8, 2019

I just filed a PR on cmdstan which initialises the TBB threadpool. I would assume that the performance tests gets triggered there as well. So if we use the build with parameters there, then we should have a chance to debug in finite time - since this allows us to use any math branch and we jump over the math tests and the Stan tests.

I think you are right on and I will locally set this up and see what happens. Spaces should really be disallowed... I had this problem in the past with boost builds.

I am tempted to disallow spaces...

@seantalts
Copy link
Member

Ah, that's good that that's all it was.

These "Daily " ones don't get triggered, they just run once a night on develop. It seems like it's just the path thing so if you can fix paths with spaces that should do it. Alternatively you could move to disallow spaces but that seems like a larger change that hinders backwards compatibility somewhat (e.g. if you upgrade your install could stop working) so we might wanna raise it at a meeting or something to be sure.

@wds15
Copy link
Contributor Author

wds15 commented Oct 8, 2019

The problem with space is resolved now. The bits for that are part of PR-1376.

There are two Windows specific things which are introduced:

  • PATH variable must be set to include the TBB binary libraries. We have made nice make-file targets for that.
  • mingw32-make must be used on Windows in order to build the TBB

Both of these above things are not yet spread into the last corner of our pipelines. However, the math tests all run just fine with it - on Windows and on any other platform.

So we should by all means be able to sort this out. It's by now - I think - a configuration matter. These configuration requirements are well documented and users having to do this are being helped.

@seantalts
Copy link
Member

Where else do the PATH variable and mingw32-make need to go? CmdStan?

@rok-cesnovar
Copy link
Member

We have that in cmdstan/stan and math. We are just having issues with Windows+Jenkins setup.

@ahartikainen
Copy link
Contributor

If this is a Windows thing, you really should use the system-level (short) path

See windows_short_path in https://github.com/stan-dev/cmdstanpy/blob/master/cmdstanpy/utils.py

@seantalts
Copy link
Member

I guess my question is - what are the remaining issues? Because I think most of the Jenkins stuff should be purely reliant on CmdStan, so if CmdStan is fixed for e.g. paths with spaces in them, I'm not sure what else would need to change.

@wds15
Copy link
Contributor Author

wds15 commented Oct 8, 2019

The space thing is fixed for me. So the branch of PR-1376 has makefiles which can deal with

/some-path/cmdstan some space/

with such a weird path you get a working TBB binary. There is more trouble which I believe has to do with PATH stuff.

I don't know about this short path thing yet.

@rok-cesnovar rok-cesnovar deleted the feature/intel-tbb-lib branch December 1, 2019 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants