Split and harmonize Object Files of Core UnitTests to increase build parallelism #484
Comments
Also split DefaultUnitTest.
Here is some timing info. Building the core unit tests with just the Serial backend enabled takes 164s on my workstation (parallel build). It takes 161s just to build TestSerial.o; Subview alone takes 111s. So it looks like our now much more comprehensive subview testing is the main culprit. I will split that further.
After breaking up Subview.cpp another 11 ways, I got the total build time for the core unit tests with the Serial backend down to 24.7s, which is a good 6.5x improvement.
Splitting yet another bit further (also splitting ViewAPI and DefaultDeviceType_a) gets me down to 16.7s. Now no object file on its own takes more than 14s. Gonna split the other execution spaces the same way now.
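To illustrate why the split helps: a parallel build can finish no sooner than its slowest single object file, and no sooner than the total compile work divided across the available jobs. The sketch below computes that lower bound. The function name and the per-object times other than the 161s for TestSerial.o are hypothetical, chosen only for illustration; it is not part of the Kokkos build.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Lower bound on the wall-clock time of a parallel build (hypothetical helper).
// The build cannot beat the slowest single object file, nor the total
// compile work spread evenly over all jobs.
double build_time_lower_bound(const std::vector<double>& seconds_per_object,
                              int jobs) {
  assert(jobs >= 1);  // at least one compile job is required
  if (seconds_per_object.empty()) return 0.0;
  const double longest = *std::max_element(seconds_per_object.begin(),
                                           seconds_per_object.end());
  const double total = std::accumulate(seconds_per_object.begin(),
                                       seconds_per_object.end(), 0.0);
  return std::max(longest, total / jobs);
}
```

With one 161s object file and a couple of small ones, the bound stays at 161s no matter how many jobs are used. Once every object file is below 14s, the bound drops to roughly the larger of 14s and the total work divided by the job count.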
Compile time for core/unit_tests when enabling the Threads backend is down to about 18s from 166s.
I changed the issue name to reflect the increased scope of the issue. In particular Cuda and OpenMP are not split enough yet.
OpenMP is now at 19s.
@crtrott what machine and what compilers?
This is all on my workstation using gcc 5.3. Also this is the core/unit_test directory only right now.
The biggest time hogs are the combinatorial tests for subviews and the reducer tests, which are implemented via recursive template tests to cover the thousands of possible argument combinations. I now have those in object files all of their own.
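A minimal sketch of how such a recursive template test can be cut into separately compiled pieces follows. Everything in it is hypothetical (the names, the ranges, and the stand-in check) and it does not reproduce the actual Kokkos test code. The idea is that the recursive templates stay in a shared header, while each source file instantiates only a sub-range of the combinations, so the compiler work is spread over several object files.

```cpp
#include <cassert>
#include <cstddef>

// --- would live in a shared header (hypothetical) ---

// One test case per pair of compile-time arguments (I, J).
// Stand-in for a real subview check; returns 1 when the case passes.
template <std::size_t I, std::size_t J>
int check_case() {
  return 1;
}

// Recursively instantiate check_case<I, J> for J in [J, NJ).
template <std::size_t I, std::size_t J, std::size_t NJ>
struct InnerLoop {
  static int run() {
    return check_case<I, J>() + InnerLoop<I, J + 1, NJ>::run();
  }
};
template <std::size_t I, std::size_t NJ>
struct InnerLoop<I, NJ, NJ> {  // recursion end: J == NJ
  static int run() { return 0; }
};

// Recursively instantiate the inner loop for I in [Begin, End).
template <std::size_t Begin, std::size_t End, std::size_t NJ>
struct OuterLoop {
  static int run() {
    return InnerLoop<Begin, 0, NJ>::run() +
           OuterLoop<Begin + 1, End, NJ>::run();
  }
};
template <std::size_t End, std::size_t NJ>
struct OuterLoop<End, End, NJ> {  // recursion end: Begin == End
  static int run() { return 0; }
};

// --- each function below would live in its own .cpp file ---
int test_subview_a() { return OuterLoop<0, 4, 8>::run(); }  // I in [0, 4)
int test_subview_b() { return OuterLoop<4, 8, 8>::run(); }  // I in [4, 8)

// A driver calls every part; the split does not change the total coverage.
int run_all_parts() {
  const int total = test_subview_a() + test_subview_b();
  assert(total == 64);  // 8 x 8 combinations in total
  return total;
}
```

Because each part instantiates only half of the combinations, the two source files can be compiled concurrently, and neither carries the full template instantiation cost.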
Got the Cuda 8 build from 244s to 45s.
Times for make build-test ; make test for everything (i.e. in the directory where one issued KOKKOS_PATH/generate_makefile.bash --with-***): Serial 1:54 1:22. Everything done with GCC 5.3.0 and Cuda 8.0.44.
This splits the serial backend files and defaultdevice type. The goal is to have no object file take longer than 15s with gcc. Addresses issue #484
This splits the threads backend files. The goal is to have no object file take longer than 15s with gcc. Addresses issue #484
This splits the OpenMP backend files. The goal is to have no object file take longer than 15s with gcc. Addresses issue #484
This splits the Cuda backend files. The goal is to have no object file take longer than 15s with gcc. Addresses issue #484
Add missing tests and split defaultdevicetype further. Related to #484
The next step is to install a Kokkos library on the fly in the build directory and compile all the examples against it, instead of rebuilding the library for each subdirectory.
I am also doing the "build the examples against an installed library" thing (the library gets installed into a lib directory inside the directory where kokkos/generate_makefile.bash was called). See: #498
@crtrott are you sure we shouldn't just move this over to something like (shudder) CMake? In the end I think having a professional build system makes more sense in environments like cross-compilation etc.
I don't know. So far my pain maintaining and improving our GNU make build system is way less than the pain I regularly experience with every other "professional" build system I have to use ;-).
Also, this issue is unrelated to the build system; only #498 has something to do with that.
This was done for OpenMP and Cuda but needs to be done for Pthreads and Serial as well.