Adding ETI for DeepCopy / ViewFill etc. #1578
Thanks @crtrott ! If I can help please let me know; compile time is a big deal for all of us. |
OK, the following is my test code:

```cpp
#include <Kokkos_Core.hpp>
#include <cstdlib>  // atoi

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // Extents of the runtime dimensions, overridable from the command line.
    int N = (argc > 1) ? atoi(argv[1]) : 1000;
    int M = (argc > 2) ? atoi(argv[2]) : 1000;
    int R = (argc > 3) ? atoi(argv[3]) : 10;

    // Rank-4 views with zero, one, and two compile-time extents, in both layouts.
    Kokkos::View<double****, Kokkos::LayoutLeft> al("Al", N, M, R, 10);
    Kokkos::View<double****, Kokkos::LayoutRight> ar("Ar", N, M, R, 10);
    Kokkos::View<double***[10], Kokkos::LayoutLeft> al1("Al1", N, M, R);
    Kokkos::View<double***[10], Kokkos::LayoutRight> ar1("Ar1", N, M, R);
    Kokkos::View<double**[10][10], Kokkos::LayoutLeft> al2("Al2", N, M);
    Kokkos::View<double**[10][10], Kokkos::LayoutRight> ar2("Ar2", N, M);

    // Fill each view with a scalar.
    Kokkos::deep_copy(al, 1.0);
    Kokkos::deep_copy(ar, 2.0);
    Kokkos::deep_copy(al1, 3.0);
    Kokkos::deep_copy(al2, 4.0);
    Kokkos::deep_copy(ar1, 5.0);
    Kokkos::deep_copy(ar2, 6.0);

    // View-to-view copies with the same layout.
    Kokkos::deep_copy(al, al1);
    Kokkos::deep_copy(al, al2);
    Kokkos::deep_copy(ar, ar1);
    Kokkos::deep_copy(ar, ar2);

    // View-to-view copies across layouts.
    Kokkos::deep_copy(al, ar1);
    Kokkos::deep_copy(al, ar2);
    Kokkos::deep_copy(ar, al1);
    Kokkos::deep_copy(ar, al2);
  }
  Kokkos::finalize();
}
```

This is my compile line:
This is where I am right now in terms of Compile Times:
I will post performance later. I am also looking into whether I can decrease the rest of that time back to get closer to the 7.0. |
OK, made some more progress. The following are file sizes in MB for this experiment if I instantiate all the way to Rank 8, for
Now the question is how much of this we want to ETI. I will do some more experiments on how high a rank we should go to, in particular how much difference it makes to only instantiate up to rank 5. The other question is: should ETI be on by default or off? |
Pushed branch "issue-1578" which has most of the stuff in it for OpenMP. Note: I didn't yet commit the cpp files, since we first need to decide how much to make available via ETI. |
@crtrott yikes 400 MB :( How bad is it when you do up to rank 4 or 5? |
One thing is that the linker will throw out most of the stuff (you see that the executable in my case goes back to 6.6MB, vs 3.6 with no ETI). But there are some options to reduce that:
|
Does this have debug turned on (when you get to 400MB)? |
Yeah this is with -g |
Ok that’s good. Try it without, that will give you a better idea of the code size. |
Disabling float as Scalar, int as IndexType and LayoutStride as primary Layout gets me down to 109MB for the lib and 4.2 MB for the executable (with deprecated code off). |
Here is the comparison for the reduced set with and without "-g":
|
One thing I am considering is that an app could provide its own ETI list. I.e. we give you the scripts to generate the headers and cpp files you need, which take the Scalar types and Layouts as input, and then you can point the build at the ETI directory to be used. |
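To make the "app provides its own ETI list" idea concrete, here is a minimal sketch. It does not use the generator scripts mentioned above (they are not shown in this thread); instead it substitutes a plain X-macro list, and every name in it (`app::Fill`, `app::LayoutLeft`, `app::LayoutRight`, `APP_ETI_LIST`, `APP_ETI_DECL`, `APP_ETI_INST`) is hypothetical and not part of Kokkos. The point is only that one app-owned list of (Scalar, Layout) pairs can drive both the `extern template` declarations and the explicit instantiations.

```cpp
#include <cstddef>
#include <vector>

namespace app {

// Hypothetical layout tags standing in for real layout types.
struct LayoutLeft {};
struct LayoutRight {};

// Hypothetical stand-in for an expensive templated kernel (e.g. a view fill).
template <class Scalar, class Layout>
struct Fill {
  static void apply(std::vector<Scalar>& v, Scalar value) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = value;
  }
};

}  // namespace app

// Application-owned ETI list: one entry per (Scalar, Layout) combination.
#define APP_ETI_LIST(X)       \
  X(double, app::LayoutLeft)  \
  X(double, app::LayoutRight)

// Expands an entry into a declaration (header) or a definition (one .cpp).
#define APP_ETI_DECL(S, L) extern template struct app::Fill<S, L>;
#define APP_ETI_INST(S, L) template struct app::Fill<S, L>;

// In a real build this line would sit in a header seen by every source file.
APP_ETI_LIST(APP_ETI_DECL)

// In a real build this line would sit in exactly one .cpp file of the ETI library.
APP_ETI_LIST(APP_ETI_INST)
```

Adding or removing a combination then means editing a single line of the list, which is roughly the control over library size that the reduced-set experiments in this thread are after.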
@crtrott an ETI list sounds like a good idea. Edit: nevermind, index type is different from scalar type. Always using |
So one option here could be to use GCC diagnostics to drop the default debug generation down. We probably only need call info here. |
@crtrott We SPARC folks like the "app provides its own ETI list" option, at least if the simplest set of ETI enables doesn't help build times. If it's easy for Kokkos to distill a minimal set of ETI enables, then it would be nicer for Kokkos to handle it. |
With the recent changes, which improved runtime for those functions significantly for higher rank views, we unfortunately increased compile times significantly for some applications which make heavy use of those functions. On the plus side, if you made heavy use of, say, rank 4 or rank 5 views, these functions sped up by up to orders of magnitude. (See: #1270)
To mitigate compile times we need to introduce ETI, since in most cases we cannot choose at compile time to simply drop to a simple binary operation due to padding and the way subviews work.
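For readers less familiar with ETI (explicit template instantiation), the following is a minimal sketch of the standard C++ mechanism, not the actual Kokkos implementation. The function `demo::fill_values` is a hypothetical stand-in for a templated fill or copy kernel. The `extern template` declaration tells each translation unit that includes it not to instantiate that specialization implicitly, and the matching explicit instantiation definition compiles it once.

```cpp
#include <cstddef>
#include <vector>

namespace demo {

// Hypothetical stand-in for a templated kernel that is costly to compile.
template <class Scalar>
void fill_values(std::vector<Scalar>& v, Scalar value) {
  for (std::size_t i = 0; i < v.size(); ++i) v[i] = value;
}

// Explicit instantiation declaration. Normally placed in the header, so that
// user code calling fill_values<double> does not instantiate it again.
extern template void fill_values<double>(std::vector<double>&, double);

// Explicit instantiation definition. Normally placed in exactly one .cpp file
// that is compiled into the library.
template void fill_values<double>(std::vector<double>&, double);

}  // namespace demo
```

Types that are not on the ETI list (for example `fill_values<float>` here) still work; they are simply instantiated implicitly in the calling translation unit, at the usual compile-time cost.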