ETI System and file structure #31

Closed
crtrott opened this issue May 21, 2017 · 17 comments

Comments

@crtrott
Member

crtrott commented May 21, 2017

OK, I think I am finally close to making this ETI stuff work properly. There is some funky compiler behavior around using extern template instantiations for classes, in particular if you want to allow instantiations of other types, but I believe my solution is now foolproof.

Furthermore, I believe the file structure and naming need some cleanup. In particular, the focus on MultiVector, which historically comes from Tpetra, is confusing for standalone users.

Let's start with the requirements, i.e. what we need to be able to do:

  • pre-compile functions, and prevent them from being implicitly instantiated (ETI)
  • Even with ETI on, allow other input types (say for example extended precision, or nonstandard data layouts)
  • call TPLs (MKL, CUBLAS etc.) for input types which allow it
  • disallow anything other than ETI types if requested
  • check what type of instantiation gets hit in apps (ETI, Non-ETI, TPL)

In order to do all this we came up with a design which has 3 functionality layers (I will go into details later):

  1. User Interface: void foo(ViewType a, Scalar alpha): takes Views, accepts all kinds of type combinations, and calls the specialization layer
  2. Specialization Layer: struct Foo { static void foo(ViewInternalType a, Scalar alpha); }; makes sure that only the minimally necessary number of instantiations exists, serves as ETI specialization layer, serves as TPL specialization layer
  3. Implementation Layer: This is called by the specialization layer, and has the actual functors etc.
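
As a rough illustration of how the three layers stack, here is a minimal self-contained sketch. It does not use Kokkos: MiniView, the Sketch namespace, and the scale function are hypothetical stand-ins invented for this example, not part of KokkosKernels.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-1 Kokkos::View (non-owning).
template <class Scalar>
struct MiniView {
  typedef Scalar value_type;
  std::vector<Scalar>* data;
  std::size_t extent0() const { return data->size(); }
  Scalar& operator()(std::size_t i) const { return (*data)[i]; }
};

namespace Sketch {
namespace Impl {

// 3. Implementation layer: the actual kernel.
template <class ViewType, class Scalar>
void execute_scale(const ViewType& a, Scalar alpha) {
  for (std::size_t i = 0; i < a.extent0(); ++i) a(i) *= alpha;
}

// 2. Specialization layer: the single choke point where ETI and TPL
//    specializations would hook in. Here it only passes arguments through.
template <class ViewInternalType, class Scalar>
struct Scale {
  static void scale(const ViewInternalType& a, Scalar alpha) {
    execute_scale(a, alpha);
  }
};

}  // namespace Impl

// 1. User interface: accepts any combination of view and scalar type,
//    normalizes both, and forwards to the specialization layer.
template <class ViewType, class Scalar>
void scale(const ViewType& a, Scalar alpha) {
  typedef typename ViewType::value_type value_type;
  typedef MiniView<value_type> ViewInternalType;
  ViewInternalType a_internal = {a.data};
  Impl::Scale<ViewInternalType, value_type>::scale(
      a_internal, static_cast<value_type>(alpha));
}

}  // namespace Sketch
```

With this shape, Sketch::scale(a, 2) and Sketch::scale(a, 2.0) on a MiniView<double> both end up in the same Impl::Scale<MiniView<double>, double> instantiation, which is the property the specialization layer relies on.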

Now I want to go through a couple of design aspects in the next posts.

@crtrott
Member Author

crtrott commented May 22, 2017

I have now thought long and hard about a sane, sustainable way of organizing this into files. Here is what I came up with:

src:
  KokkosBlas.hpp: includes all the KokkosBlas function header files
      KokkosBlas1_foo.hpp (contains user interface functions for foo)
src/impl:
  KokkosBlas1_foo_impl.hpp: The actual implementation of the functions (Functors etc.)
  KokkosBlas1_foo_spec.hpp: The specialization layer
src/impl/tpls:
  KokkosBlas1_foo_tpl_spec_avail.hpp: Availability of TPLs for particular types
  KokkosBlas1_foo_tpl_spec_decl.hpp: The specialization declarations for using TPLs
src/impl/generated_specializations_hpp:
  KokkosBlas1_foo_eti_spec_avail.hpp: Availability declarations for ETI types
  KokkosBlas1_foo_eti_spec_decl.hpp: Specialization declarations for ETI types
src/impl/generated_specializations_cpp/foo:
  KokkosBlas1_foo_eti_spec_inst_double_LayoutRight_Cuda_CudaSpace.cpp: one instantiation for an extern template

Let's talk about what you need to touch to do specific things:

Add a new function:

  • Add all those files based on the template provided later
  • Modify the scripts which generate the auto generated files

Modify the implementation of a function

  • Only src/impl/KokkosBlas1_foo_impl.hpp needs to be modified

Add a new ETI type

  • modify the scripts which generate the auto generated files

Add a new TPL variant

  • Modify the files in impl/tpls/ to add the new TPL (declare its availability, and provide the implementation of how to call it)

@crtrott
Member Author

crtrott commented May 22, 2017

Let's look at the code and what those things do.

Public API in src/KokkosBlas1_foo.hpp

This file provides the public API for the function foo. The function internally calls the specialization layer after explicitly filling in all the necessary template arguments for the View types. For example, for a dot(a,b) product, const modifiers should be added to the scalar type if they are not already there. Otherwise the code would potentially have to be compiled 4 times:

  • dot(View<double*>, View<double*>);
  • dot(View<double*>, View<const double*>);
  • dot(View<const double*>, View<double*>);
  • dot(View<const double*>, View<const double*>);

If you then factor in explicit vs. implicit specification of Layout, Memory Space, and MemoryTraits, we end up with over 100 possible instantiations of something which is technically the exact same thing!

Furthermore, this function should also do static asserts on things which are not allowed (for example a View of the wrong rank), so that users get an early error in a function which they can directly associate with the code they have written.

Here is an example:

// Include the specialization layer which defines the Impl::Foo struct
#include<impl/KokkosBlas1_foo_spec.hpp>

namespace KokkosBlas1 {
// User facing function accepts any ViewType
template<class ViewType>
void foo(const ViewType& a) {

  // Static assert on prohibited types
  static_assert(ViewType::rank==1, "Trying to call foo with View of rank other than 1");

  // Convert ViewType to internal ViewType to reduce instantiations
  // Without this, whether you explicitly specify a Layout or not would be 
  // two different instantiations since Views have variadic template parameters
  // Furthermore this is the place to add missing const etc.
  typedef Kokkos::View<typename ViewType::data_type,
                       typename ViewType::array_layout,
                       typename ViewType::device_type>
          ViewTypeInternal;

  // Call the actual implementation
  Impl::Foo<ViewTypeInternal>::foo(a);
}
}
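
The example above only mentions adding the missing const in a comment. A minimal sketch of that normalization step follows, assuming a hypothetical variadic MiniView stand-in (not Kokkos::View) whose layout defaults to LayoutLeft when it is not spelled out. The alias canonical_const_view and the helper same_canonical are names invented for this illustration.

```cpp
#include <type_traits>

struct LayoutLeft {};   // hypothetical layout tags
struct LayoutRight {};

// Pick the first property as the layout, or LayoutLeft if none is given.
template <class... Props>
struct layout_or_default { typedef LayoutLeft type; };
template <class P, class... Rest>
struct layout_or_default<P, Rest...> { typedef P type; };

// Hypothetical variadic view: MiniView<double> and
// MiniView<double, LayoutLeft> are distinct types, as with real Views.
template <class T, class... Props>
struct MiniView {
  typedef const T const_value_type;  // const is added only once
  typedef typename layout_or_default<Props...>::type layout;
};

// Canonical read-only form: const value type, layout always explicit.
template <class ViewType>
using canonical_const_view =
    MiniView<typename ViewType::const_value_type, typename ViewType::layout>;

// True if both view types collapse to the same internal type.
template <class V1, class V2>
constexpr bool same_canonical() {
  return std::is_same<canonical_const_view<V1>,
                      canonical_const_view<V2> >::value;
}

// All four spellings map to MiniView<const double, LayoutLeft>.
static_assert(same_canonical<MiniView<double>, MiniView<const double> >(), "");
static_assert(
    same_canonical<MiniView<double>, MiniView<double, LayoutLeft> >(), "");
static_assert(
    same_canonical<MiniView<double>, MiniView<const double, LayoutLeft> >(),
    "");
```

A dot-like function that converts both arguments through such an alias before calling the specialization layer would therefore need only one instantiation per layout instead of one per spelling.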

@crtrott
Member Author

crtrott commented May 22, 2017

Next up:

The Specialization Layer

This layer is the one which not only serves as the focal point for the unified instantiation of the things the public layer requires, it is also the layer which allows for specialization for third party libraries (such as MKL and CUBLAS) and explicit template instantiation (ETI).

Generally this layer is very thin again and basically just passes through arguments.

The basic mechanism for ETI is the extern template mechanism of C++11. Unfortunately it has some funky semantics with respect to classes. In particular, it looks like the compiler can still choose to inline the implementation of the class if it is visible in the same compilation unit, instead of calling the externally available instantiation. This might also be compiler dependent.
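
For readers unfamiliar with the mechanism, here is a minimal single-file sketch of extern template on a class template. The struct Doubler is a hypothetical name for this illustration. In a real library the explicit instantiation declaration would sit in a header and the explicit instantiation definition in exactly one cpp file; both appear in one file here only to keep the sketch self-contained. The member function is deliberately defined out of class: as far as I understand the standard, inline functions are exempt from the suppression, which may be related to the inlining behavior described above.

```cpp
// A class template whose member is defined out of class (not inline).
template <class T>
struct Doubler {
  static T twice(T x);
};

template <class T>
T Doubler<T>::twice(T x) {
  return x + x;
}

// Explicit instantiation declaration: tells the compiler that
// Doubler<double> is instantiated elsewhere, so it should not be
// implicitly instantiated in translation units that see this line.
extern template struct Doubler<double>;

// Explicit instantiation definition: normally placed in one cpp file of
// the library. It must come after the declaration if both are visible.
template struct Doubler<double>;
```

Types without an extern template declaration, such as Doubler<int>, are still implicitly instantiated on use, which is what keeps non-ETI types usable.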

To enable both TPL specialization and ETI specialization additional bool template parameters are added to the specialization layer which are defaulted to values based on whether said specializations are available:

From impl/KokkosBlas1_foo_spec.hpp:

template<class ViewType>
struct foo_eti_spec_avail {
  enum : bool { value = false };
};

template<class ViewType, bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
                         bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static void foo(const ViewType& a);
};

In order to declare a specialization available, a full specialization of foo_tpl_spec_avail or foo_eti_spec_avail must be provided. Those specializations live in impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp and impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_avail.hpp respectively, with the latter being auto generated. We will come back to those files in a bit.
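
To make the selection mechanism concrete, here is a self-contained sketch of how defaulted bool parameters pick a specialization. Plain scalar types stand in for View types, and the which() member and its return strings are hypothetical additions for this illustration, in the spirit of the specialization check described in the requirements.

```cpp
#include <string>

// Availability traits: false unless a full specialization says otherwise.
template <class ViewType>
struct foo_tpl_spec_avail { enum : bool { value = false }; };
template <class ViewType>
struct foo_eti_spec_avail { enum : bool { value = false }; };

// Pretend a TPL exists for float and an ETI instantiation for double.
template <>
struct foo_tpl_spec_avail<float> { enum : bool { value = true }; };
template <>
struct foo_eti_spec_avail<double> { enum : bool { value = true }; };

// Primary template: neither TPL nor ETI available.
template <class ViewType,
          bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
          bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static std::string which() { return "Non-ETI"; }
};

// Chosen whenever a TPL is available, regardless of ETI.
template <class ViewType, bool eti_spec_avail>
struct Foo<ViewType, true, eti_spec_avail> {
  static std::string which() { return "TPL"; }
};

// Chosen when only an ETI instantiation is available.
template <class ViewType>
struct Foo<ViewType, false, true> {
  static std::string which() { return "ETI"; }
};
```

Callers only ever write Foo<SomeType>; the two bools are filled in from the traits, so adding or removing an availability specialization changes the dispatch without touching any call site.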

The next part in the specialization layer is the definition of the specialization layer for when no TPL is used. This calls the actual implementation provided in impl/KokkosBlas1_foo_impl.hpp
Note that the TPL bool is set to false, while the other one is set to KOKKOSKERNELS_IMPL_COMPILE_LIBRARY. The latter one is only going to be true while compiling the KokkosKernels library with its explicit template instantiations.

template<class ViewType>
struct Foo<ViewType,false,KOKKOSKERNELS_IMPL_COMPILE_LIBRARY> {
  static void foo(const ViewType& a) {
    execute_foo(a);
  }
};

In this file we also need to define the macros which are later used in the auto generated files:

// Availability Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template<> \
struct foo_eti_spec_avail<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE> > > { \
  enum : bool { value = true }; \
}; 

// Declaration Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_DECL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
extern template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Instantiation Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_INST( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Include the actual declarations for tpls and eti
#if !KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
#include<impl/tpls/KokkosBlas1_foo_tpl_spec_decl.hpp>
#include<impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_decl.hpp>
#endif

Note how the actual declarations of those classes are only included when we are NOT compiling the library.
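
The interplay of the three macros can be tried out without Kokkos. The following sketch uses a hypothetical two-parameter MiniView and SKETCH_ prefixed macros modeled on the ones above; it also collapses the generated header and the generated cpp file into one file, which the real layout keeps separate.

```cpp
// Hypothetical stand-ins for Kokkos types.
struct LayoutLeft {};
struct LayoutRight {};
template <class Scalar, class Layout>
struct MiniView {};

template <class ViewType>
struct foo_eti_spec_avail { enum : bool { value = false }; };

template <class ViewType, bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static int foo(const ViewType& a);
};

// Out-of-class definition; returns 1 if the ETI flag is set (for checking).
template <class ViewType, bool eti_spec_avail>
int Foo<ViewType, eti_spec_avail>::foo(const ViewType&) {
  return eti_spec_avail ? 1 : 0;
}

// Availability macro
#define SKETCH_FOO_ETI_SPEC_AVAIL(SCALAR, LAYOUT)           \
  template <>                                               \
  struct foo_eti_spec_avail<MiniView<SCALAR, LAYOUT> > {    \
    enum : bool { value = true };                           \
  };

// Declaration macro
#define SKETCH_FOO_ETI_SPEC_DECL(SCALAR, LAYOUT) \
  extern template struct Foo<MiniView<SCALAR, LAYOUT>, true>;

// Instantiation macro
#define SKETCH_FOO_ETI_SPEC_INST(SCALAR, LAYOUT) \
  template struct Foo<MiniView<SCALAR, LAYOUT>, true>;

// What the generated headers would contain for one type combination:
SKETCH_FOO_ETI_SPEC_AVAIL(double, LayoutRight)
SKETCH_FOO_ETI_SPEC_DECL(double, LayoutRight)

// What one generated cpp file would contain:
SKETCH_FOO_ETI_SPEC_INST(double, LayoutRight)
```

A generator script then only has to emit one macro call per type combination and file, which keeps the generated files trivial.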

I'll post the whole file later after discussing some more Macro stuff.

@crtrott
Member Author

crtrott commented May 22, 2017

The implementation layer in impl/KokkosBlas1_foo_impl.hpp is pretty much whatever we need it to be. In this case it's just a simple function:

  template<class ViewType>
  void execute_foo(const ViewType& a) {
    Kokkos::parallel_for("KokkosBlas1::foo",a.extent(0), KOKKOS_LAMBDA (const int& i) {
      a(i) = i;
    });
  }

If we want to distinguish between multi vectors and normal vectors, the implementation layer may be one of the places to put that logic.

@crtrott
Member Author

crtrott commented May 22, 2017

The TPL layer consists of two files: the one which declares the availability of a specialization and the one which provides the specialization. The first one is impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp:

template<class ViewType>
struct foo_tpl_spec_avail {
  enum : bool { value = false };
};

#ifdef KOKKOSKERNELS_ENABLE_MKL
template<>
struct foo_tpl_spec_avail<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>> {
  enum : bool { value = true };
};
#endif

Basically for every new TPL which we want to support we drop another full specialization of this stuff in.

The implementation is the counterpart to it. Note that we can use the implementation to decide, based on input parameters, whether to call our own code or the TPL code. We also need two full specializations here, depending on whether ETI for the same type combination is available or not.

#ifdef KOKKOSKERNELS_ENABLE_MKL
#include<mkl_foo.hpp>
namespace KokkosBlas1 {
namespace Impl {

// Only a TPL specialization is available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,false> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    #if (KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION)
    printf("Calling MKL Specialization\n");
    #endif
    mkl_foo(a.data(),a.extent(0));
  }
};

// Both a TPL specialization and an ETI instantiation are available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,true> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    // Our code is better for large number of entries, so only use TPL for small lengths
    if(a.extent(0) < 100000)
      Foo<ViewType,true,false>::foo(a);
    else
      Foo<ViewType,false,true>::foo(a);
  }
};
}
}
#endif
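
The size-based choice in the <true,true> specialization can be exercised without MKL. In the sketch below, vendor_fill and native_fill are hypothetical stand-ins for the TPL call and our own kernel, std::vector replaces the View, and the 100000 cutoff is simply the value used in the example above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a TPL routine; reports which path ran.
inline std::string vendor_fill(std::vector<double>& a) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = 1.0;
  return "TPL";
}

// Hypothetical stand-in for our own implementation.
inline std::string native_fill(std::vector<double>& a) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = 1.0;
  return "native";
}

template <bool tpl_spec_avail, bool eti_spec_avail>
struct Fill;

// Only a TPL specialization is available.
template <>
struct Fill<true, false> {
  static std::string run(std::vector<double>& a) { return vendor_fill(a); }
};

// Only our own (ETI) instantiation is available.
template <>
struct Fill<false, true> {
  static std::string run(std::vector<double>& a) { return native_fill(a); }
};

// Both are available: choose at run time based on the problem size.
template <>
struct Fill<true, true> {
  static std::string run(std::vector<double>& a) {
    const std::size_t tpl_cutoff = 100000;  // cutoff taken from the example
    return a.size() < tpl_cutoff ? Fill<true, false>::run(a)
                                 : Fill<false, true>::run(a);
  }
};
```

Because the <true,true> case delegates to the other two specializations instead of duplicating their bodies, the TPL call and the native call each exist in exactly one place.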

@crtrott
Member Author

crtrott commented May 22, 2017

Last but not least, there are three kinds of auto generated files, which are similar to the TPL files: one declares an ETI specialization available, one provides the extern template declarations of those ETI specializations, and the cpp files instantiate them. They simply use the previously defined macros with the right type combinations.

There is one more detail using two additional macros:

  • KOKKOSKERNELS_ENABLE_ETI_ONLY: is used to prevent instantiations of Non-ETI or Non-TPL types. This is used to hide the actual definition of the specialization layer when not compiling the library cpp files.
  • KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION: this is more of a debug option which enables print statements stating which specialization (ETI, Non-ETI, TPL) was called. This is useful to make sure we don't instantiate stuff in cases where we can't turn on full ETI_ONLY.

One more word on KOKKOSKERNELS_IMPL_COMPILE_LIBRARY: this macro is always defined as false, except inside the auto generated ETI cpp files.
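
One way to picture the ETI-only switch is sketched below. This is an assumption about how hiding the definition could work, not the actual KokkosKernels code: the SKETCH_ macros, the scalar stand-ins for View types, and the is_usable helper are all hypothetical. The generic fallback is compiled only when ETI-only mode is off or the library itself is being built, so in a user build a non-ETI type leaves Foo incomplete.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the two macros discussed above.
#define SKETCH_ENABLE_ETI_ONLY 1
#define SKETCH_IMPL_COMPILE_LIBRARY 0

template <class T>
struct foo_eti_spec_avail { enum : bool { value = false }; };
template <>
struct foo_eti_spec_avail<double> { enum : bool { value = true }; };

// Primary template: declared but never defined.
template <class T, bool eti_spec_avail = foo_eti_spec_avail<T>::value>
struct Foo;

#if !SKETCH_ENABLE_ETI_ONLY || SKETCH_IMPL_COMPILE_LIBRARY
// Generic fallback for non-ETI types; hidden in ETI-only user builds.
template <class T>
struct Foo<T, false> {
  static T foo(T x) { return x; }
};
#endif

// ETI types always have a definition.
template <class T>
struct Foo<T, true> {
  static T foo(T x) { return x; }
};

// Detects whether a type is complete, i.e. whether Foo<T> can be used.
template <class T, class = void>
struct is_usable : std::false_type {};
template <class T>
struct is_usable<T, decltype(void(sizeof(T)))> : std::true_type {};
```

With SKETCH_ENABLE_ETI_ONLY set to 1, calling Foo<float>::foo would be a compile error (incomplete type), while Foo<double>::foo keeps working; setting it to 0 brings the fallback back.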

@crtrott
Member Author

crtrott commented May 22, 2017

I will check in the actual full example code soon.

@crtrott
Member Author

crtrott commented May 22, 2017

Some more thoughts: while this is a lot of different files, we are trying to serve a pretty complex use-case scenario. Most of this stuff is boilerplate and doesn't really use much advanced C++. It basically comes down to a bunch of full specializations. The particularly nice thing about this scheme is that it decouples the actual implementation from providing specializations for TPLs and from providing ETI specializations. All three things can be modified independently. Furthermore, this scheme clearly separates which files are responsible for which part of the hierarchy.

@crtrott
Member Author

crtrott commented May 22, 2017

@mhoemmen @dsunder @hcedwar @srajama1
Most folks on KokkosKernels are not that interested in software engineering as long as what they have to work with works. But maybe you guys want to take a look and tell me what you think (and also whether the explanation of why I came up with this design makes sense).

@hcedwar
Contributor

hcedwar commented May 22, 2017

You can static_assert( is_view<T>::value , ... as well

Thought (tbd): Should we have something in Kokkos core to canonicalize a View?

template< class ViewType >
using canonical_view_of_const = 
  View< typename ViewType::const_data_type 
          , typename ViewType::layout 
          , typename ViewType::device_type 
          , typename ViewType::memory_traits > ;

The foo_eti_spec_avail and foo_tpl_spec_avail traits are an unfortunate need and, at first glance, a good minimalist approach.

@mhoemmen
Contributor

@crtrott I like @hcedwar 's idea of adding some "canonicalize the View" type functions.

I think the design makes sense, especially its ETI / TPL aspects. In particular, I think it's enough for us to specialize on whether some TPL is available. Very few users in practice want to swap different TPLs in and out at compile or run time. (They just want to know what's the fastest TPL to use on each platform.) I don't think it's worth complicating the design for this use case, which may only be of interest to the occasional computer science publication. We're a national lab; that should be at best a tertiary interest for us.

This design is good for "node-global" kernels. What about single-team or single-thread kernels? Are we worried about potential inlining overhead at those lower levels?

Also, what about asynchronous dispatch? This is relevant to design of the implementation layer's interface, because Views may need to stay managed as they enter the implementation layer.

@crtrott
Member Author

crtrott commented May 22, 2017

Regarding asynchronous dispatch: the internal view types are function specific. So for asynchronous functions the internal views must be managed.

@mndevec
Contributor

mndevec commented Aug 14, 2017

By the way, it might be better to move this issue and #28 to Wiki.

@mhoemmen
Contributor

mhoemmen commented Aug 15, 2017

@mndevec I would say, @crtrott finished implementing the first-pass (more accurately, second-pass, or third-pass if you count Chris Baker's Tpetra kernels) design. Thus, it is my view that it would be proper to close this issue. We can always open new issues for new things to do.

@mndevec
Contributor

mndevec commented Aug 15, 2017

I mean, this issue was a nice guideline for me. It would be nice to save it in wiki of Kokkoskernels so that it can be easily found, rather than searching it in the issue history.

@mhoemmen
Contributor

@mndevec wrote:

It would be nice to save it in wiki of Kokkoskernels so that it can be easily found, rather than searching it in the issue history.

That's a good idea. I think it would be best, then, to close this issue, but copy its contents into the wiki. How about that?

@mndevec
Contributor

mndevec commented Aug 17, 2017

Okay, I moved this topic here:
https://github.com/kokkos/kokkos-kernels/wiki/ETI-System-and-file-structure

@mndevec mndevec closed this as completed Aug 17, 2017