Name all parallel kernels and regions #124

Closed
crtrott opened this issue Nov 28, 2017 · 3 comments

Comments

@crtrott
Member

crtrott commented Nov 28, 2017

I think we should add labels to all our parallel dispatch calls and also add profiling regions for top-level calls.
Consider our unit tests: right now the simple kernel timer prints this:

Regions: 

                                                               KokkosBlas::Test::gemm_complex_double (Region)          12.53954            1        12.53954  53.671  15.326
                                                                       KokkosBlas::Test::gemm_double (Region)           2.87859            1         2.87859  12.321   3.518
                                                                               KokkosBlas::gemm[ETI] (Region)           1.83371           64         0.02865   7.849   2.241

-------------------------------------------------------------------------
Kernels: 

                                                                       KokkosBlas::Test::VanillaGEMM (ParFor)          13.31410           80         0.16643  56.986  16.273
                                                                        Kokkos::View::initialization (ParFor)           2.19648        37758         0.00006   9.401   2.685
                                                                       KokkosBlas::gemv[SingleLevel] (ParFor)           0.70460           48         0.01468   3.016   0.861
                                                                                KokkosBlas::gemm[NN] (ParFor)           0.45747           16         0.02859   1.958   0.559
                                                                                KokkosBlas::gemm[NC] (ParFor)           0.38384            8         0.04798   1.643   0.469
                                                                                KokkosBlas::gemm[CN] (ParFor)           0.38346            8         0.04793   1.641   0.469
                                                                                KokkosBlas::gemm[CC] (ParFor)           0.37958            8         0.04745   1.625   0.464
N12KokkosSparse4Impl11GaussSeidelIN13KokkosKernels12Experimental19KokkosKernelsHandleIKiS5_KN6Kokkos7complexIdEENS6_6OpenMPENS6_9HostSpaceESB_EENS6_4ViewIPS5_JNS6_10LayoutLeftENS6_6DeviceISA_SB_EENS6_12MemoryTraitsILj1EEEEEESK_NSD_IPS9_JSF_SH_SJ_EEEE4PSGSE (ParFor)           0.33012        29000         0.00001   1.413   0.403
N12KokkosSparse4Impl11GaussSeidelIN13KokkosKernels12Experimental19KokkosKernelsHandleIKmKiKN6Kokkos7complexIdEENS7_6OpenMPENS7_9HostSpaceESC_EENS7_4ViewIPS5_JNS7_10LayoutLeftENS7_6DeviceISB_SC_EENS7_12MemoryTraitsILj1EEEEEENSE_IPS6_JSG_SI_SK_EEENSE_IPSA_JSG_SI_SK_EEEE4PSGSE (ParFor)           0.32352        28800         0.00001   1.385   0.395
N12KokkosSparse4Impl11GaussSeidelIN13KokkosKernels12Experimental19KokkosKernelsHandleIKiKlKN6Kokkos7complexIdEENS7_6OpenMPENS7_9HostSpaceESC_EENS7_4ViewIPS5_JNS7_10LayoutLeftENS7_6DeviceISB_SC_EENS7_12MemoryTraitsILj1EEEEEENSE_IPS6_JSG_SI_SK_EEENSE_IPSA_JSG_SI_SK_EEEE4PSGSE (ParFor)           0.32184        28800         0.00001   1.378   0.393

Then another 820 lines of mangled C++ type names, and then this:

                                                              17ArithTraitsTesterIdN6Kokkos6OpenMPEE (ParRed)           0.00001            1         0.00001   0.000   0.000
                                               17ArithTraitsTesterIN6Kokkos7complexIdEENS0_6OpenMPEE (ParRed)           0.00001            1         0.00001   0.000   0.000
                                                              17ArithTraitsTesterIxN6Kokkos6OpenMPEE (ParRed)           0.00001            1         0.00001   0.000   0.000
                                                              17ArithTraitsTesterIyN6Kokkos6OpenMPEE (ParRed)           0.00001            1         0.00001   0.000   0.000
                                                              17ArithTraitsTesterIfN6Kokkos6OpenMPEE (ParRed)           0.00001            1         0.00001   0.000   0.000
                                               17ArithTraitsTesterIN6Kokkos7complexIfEENS0_6OpenMPEE (ParRed)           0.00001            1         0.00001   0.000   0.000

-------------------------------------------------------------------------
Summary:

Total Execution Time (incl. Kokkos + Non-Kokkos:                   81.81700 seconds
Total Time in Kokkos kernels:                                      23.36379 seconds
   -> Time outside Kokkos kernels:                                 58.45320 seconds
   -> Percentage in Kokkos kernels:                                   28.56 %
Total Calls to Kokkos Kernels:                                       298588

-------------------------------------------------------------------------

What I propose is the following:
Label kernels with something like

Namespace::TopLevelFunction::Subfunction[Options]

For example (there is no subfunction for gemm):

KokkosBlas::gemm[NN]
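
To make the proposed scheme concrete, here is a minimal sketch of a label builder. `make_kernel_label` is a hypothetical helper invented for illustration, not something in KokkosKernels; it only assembles the `Namespace::TopLevelFunction::Subfunction[Options]` string and skips the parts that are empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of KokkosKernels): builds a label of the form
//   Namespace::TopLevelFunction::Subfunction[Options]
// An empty subfunction or options string is simply left out.
inline std::string make_kernel_label(const std::string& ns,
                                     const std::string& function,
                                     const std::string& subfunction = "",
                                     const std::string& options = "") {
  std::string label = ns + "::" + function;
  if (!subfunction.empty()) label += "::" + subfunction;
  if (!options.empty()) label += "[" + options + "]";
  return label;
}
```

The resulting string would then presumably be passed as the first argument of the dispatch call, for example `Kokkos::parallel_for(label, policy, functor)`, so that the profiler reports it instead of the mangled functor type name.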

We should also mark regions, and the region name should say whether it's a TPL call, an ETI call, or a noETI call:

KokkosBlas::gemm[TPL_BLAS]
KokkosBlas::gemm[ETI]
KokkosBlas::gemm[noETI]
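
One way the region marking could look is sketched below. Everything here is an assumption made for illustration: `ScopedRegion`, `CallPath`, `region_label`, and `RecordingProfiler` are hypothetical names, and the recording profiler is a stand-in so the sketch runs without Kokkos. In real code the two profiler calls would be `Kokkos::Profiling::pushRegion(name)` and `Kokkos::Profiling::popRegion()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in profiler that only records events, so the sketch runs without Kokkos.
// In real code these calls would be Kokkos::Profiling::pushRegion(name)
// and Kokkos::Profiling::popRegion().
struct RecordingProfiler {
  std::vector<std::string> events;
  void push_region(const std::string& name) { events.push_back("push " + name); }
  void pop_region() { events.push_back("pop"); }
};

// Hypothetical RAII guard: opens the region on construction and closes it on
// scope exit, so early returns cannot leave a region open.
template <class Profiler>
class ScopedRegion {
 public:
  ScopedRegion(Profiler& profiler, const std::string& name) : profiler_(profiler) {
    profiler_.push_region(name);
  }
  ~ScopedRegion() { profiler_.pop_region(); }
  ScopedRegion(const ScopedRegion&) = delete;
  ScopedRegion& operator=(const ScopedRegion&) = delete;

 private:
  Profiler& profiler_;
};

// Hypothetical tag for the three call paths named in the proposal.
enum class CallPath { TplBlas, Eti, NoEti };

// Appends the call-path suffix to a top-level function name.
inline std::string region_label(const std::string& function, CallPath path) {
  switch (path) {
    case CallPath::TplBlas: return function + "[TPL_BLAS]";
    case CallPath::Eti:     return function + "[ETI]";
    case CallPath::NoEti:   return function + "[noETI]";
  }
  return function;  // unreachable for valid enum values
}

// Usage example: wrap a top-level call in a region and return what was recorded.
inline std::vector<std::string> demo_gemm_region(CallPath path) {
  RecordingProfiler profiler;
  {
    ScopedRegion<RecordingProfiler> region(profiler,
                                           region_label("KokkosBlas::gemm", path));
    // ... the labeled kernels would be launched here ...
  }
  return profiler.events;
}
```

A scope guard is only one option; explicit push and pop calls at the start and end of each top-level function would produce the same region names.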

Any thoughts?

@ndellingwood
Contributor

Cross-reference #239

@crtrott crtrott removed their assignment Dec 4, 2018
@kyungjoo-kim
Contributor

See PR #359

@ndellingwood
Contributor

PR #359 merged with added labels.
