Split and harmonize Object Files of Core UnitTests to increase build parallelism #484

Closed
crtrott opened this issue Oct 19, 2016 · 17 comments

@crtrott
Member

crtrott commented Oct 19, 2016

This was done for OpenMP and Cuda but needs to be done for Pthreads and Serial as well.

@crtrott crtrott added the Enhancement Improve existing capability; will potentially require voting label Oct 19, 2016
@crtrott crtrott added this to the Fall 2016 milestone Oct 19, 2016
@crtrott
Member Author

crtrott commented Oct 19, 2016

Also split DefaultUnitTest.

@crtrott
Member Author

crtrott commented Oct 21, 2016

Here is some timing info. Building the core unit tests with only the Serial backend enabled takes 164s on my workstation (parallel build). Building TestSerial.o alone takes 161s, so that single object file accounts for almost the entire build time.
Splitting the TestSerial.cpp multiple ways according to test category reveals the following:

Subview 111s
ViewAPI 17s (contains Layout)
Reductions 11s
Atomics 8s
Team 6.5s
Other 8.5s

So it looks like our now much more comprehensive subview testing is the main culprit. I will split that further.
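As a quick check of the numbers above, the per-category times can be summed and compared with the 161s for the unsplit TestSerial.o. The sketch below uses only the figures from this comment; the dictionary name and layout are illustrative. It also shows why Subview needs further splitting: in a parallel build, the wall-clock time cannot drop below the slowest single object file.

```python
# Per-category compile times in seconds, copied from the comment above.
category_seconds = {
    "Subview": 111.0,
    "ViewAPI": 17.0,
    "Reductions": 11.0,
    "Atomics": 8.0,
    "Team": 6.5,
    "Other": 8.5,
}

total = sum(category_seconds.values())
slowest_name, slowest = max(category_seconds.items(), key=lambda kv: kv[1])

# Even with unlimited parallel jobs, the build waits for the slowest object.
print(f"sum of split objects: {total:.1f}s")
print(f"slowest object: {slowest_name} at {slowest:.1f}s "
      f"({100 * slowest / total:.0f}% of the sum)")
```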

@crtrott
Member Author

crtrott commented Oct 21, 2016

After breaking Subview.cpp up another 11 ways, I got the total build time for the core unit tests with the Serial backend down to 24.7s, which is a good 6.5x improvement.

@crtrott
Member Author

crtrott commented Oct 21, 2016

Splitting yet another bit further (also splitting ViewAPI and DefaultDeviceType_a) gets me down to 16.7s. Now no object file on its own takes more than 14s. Gonna split the other execution spaces now the same way.
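The effect of splitting on a parallel build can be illustrated with a small scheduling model. This is only a sketch: it assumes a fixed number of parallel jobs, ignores linking and the header-parsing cost that every additional object file repeats, and splits Subview into 11 equal pieces purely for illustration (the real pieces are not equal). `makespan` is a hypothetical helper, not part of any build tool.

```python
import heapq

def makespan(durations, jobs):
    """Greedy longest-first schedule: wall-clock time to finish all tasks."""
    workers = [0.0] * jobs              # finish time of each parallel job slot
    heapq.heapify(workers)
    for d in sorted(durations, reverse=True):
        earliest = heapq.heappop(workers)   # slot that frees up first
        heapq.heappush(workers, earliest + d)
    return max(workers)

# Times in seconds from the earlier breakdown of TestSerial.cpp.
before = [111.0, 17.0, 11.0, 8.0, 6.5, 8.5]
# Assumption: Subview split into 11 equal parts (illustrative only).
after = [111.0 / 11] * 11 + [17.0, 11.0, 8.0, 6.5, 8.5]

for jobs in (1, 4, 16):
    print(jobs, makespan(before, jobs), round(makespan(after, jobs), 1))
```

In this model the unsplit build is pinned at the Subview time no matter how many jobs run, and once Subview is split the next-largest file (ViewAPI) becomes the limit, which is consistent with splitting ViewAPI as well.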

@crtrott
Member Author

crtrott commented Oct 21, 2016

Compile time for core/unit_tests when enabling the Threads backend is down to about 18s from 166s.

@crtrott crtrott self-assigned this Oct 22, 2016
@crtrott crtrott changed the title from "Split Threads and Serial Core UnitTest in multiple Object files" to "Split and harmonize Object Files of Core UnitTests to increase build parallelism" Oct 22, 2016
@crtrott
Member Author

crtrott commented Oct 22, 2016

I changed the issue name to reflect the increased scope of the issue. In particular Cuda and OpenMP are not split enough yet.

@crtrott
Member Author

crtrott commented Oct 22, 2016

OpenMP is now at 19s.

@nmhamster
Contributor

@crtrott what machine and what compilers?

@crtrott
Member Author

crtrott commented Oct 22, 2016

This is all on my workstation using gcc 5.3. Also this is the core/unit_test directory only right now.

@crtrott
Member Author

crtrott commented Oct 22, 2016

The biggest time hogs are the combinatorial tests for subviews and the reducer tests, which are implemented via recursive templates to cover the thousands of possible argument combinations. I have now put each of those into an object file of its own.
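To give a sense of how quickly this kind of combinatorial coverage grows, here is a rough count. It assumes each subview argument can be one of three kinds (a single index, a range, or ALL) and that every combination is tested for each rank; both are simplifying assumptions, not a description of the actual test code. In the real tests each combination is a distinct template instantiation, which is what makes them expensive to compile.

```python
from itertools import product

# Assumed kinds of argument per dimension (illustrative labels only).
ARG_KINDS = ("index", "range", "ALL")

def argument_combinations(rank):
    """All ways to pick one argument kind for each of `rank` dimensions."""
    return list(product(ARG_KINDS, repeat=rank))

# The count grows as len(ARG_KINDS) ** rank.
for rank in range(1, 6):
    print(rank, len(argument_combinations(rank)))

# Cumulative number of combinations over ranks 1 through 5.
total = sum(len(argument_combinations(r)) for r in range(1, 6))
print("total:", total)
```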

@crtrott
Member Author

crtrott commented Oct 22, 2016

Got Cuda 8 build from 244s to 45s.

@crtrott
Member Author

crtrott commented Oct 22, 2016

Times for make build-test ; make test for everything (i.e. in the directory where one issued KOKKOS_PATH/generate_makefile.bash --with-***):

Serial 1:54 1:22
Threads 2:17 1:30
OpenMP 2:31 1:26
Cuda 7:20 3:52

Everything done with GCC 5.3.0 and Cuda 8.0.44

crtrott added a commit that referenced this issue Oct 22, 2016
This splits the serial backend files and defaultdevice type.
The goal is to have no object file take longer than 15s with gcc.

Addresses issue #484
crtrott added a commit that referenced this issue Oct 22, 2016
This splits the threads backend files.
The goal is to have no object file take longer than 15s with gcc.

Addresses issue #484
crtrott added a commit that referenced this issue Oct 22, 2016
This splits the OpenMP backend files.
The goal is to have no object file take longer than 15s with gcc.

Addresses issue #484
crtrott added a commit that referenced this issue Oct 22, 2016
This splits the Cuda backend files.
The goal is to have no object file take longer than 15s with gcc.

Addresses issue #484
crtrott added a commit that referenced this issue Oct 22, 2016
Add missing tests and split defaultdevicetype further

Related to #484
@crtrott
Member Author

crtrott commented Oct 22, 2016

The next step is to install a Kokkos library into the build directory on the fly and compile all the examples against it, instead of rebuilding the library for each subdirectory.

@crtrott
Member Author

crtrott commented Oct 23, 2016

I am also doing the "build the examples against an installed library" thing (the library gets installed into a lib directory inside of the directory where kokkos/generate_makefile.bash was called). See: #498

@nmhamster
Contributor

@crtrott are you sure we shouldn't just move this over to something like *shudder* CMake? In the end I think having a professional build system makes more sense in environments like cross-compilation etc.

@crtrott
Member Author

crtrott commented Oct 23, 2016

I don't know. So far my pain maintaining and improving our GNU make build system is way less than the pain I experience regularly with every other "professional" build system I have to use ;-).

@crtrott
Member Author

crtrott commented Oct 23, 2016

Also this issue here is unrelated to the build system, only #498 has something to do with that.
