subview construction on Cuda backend #615
Can you try creating the RA view outside of the kernel first, and then getting the subview?
This worked (and I was able to remove the …):

```cpp
View<double**, LayoutStride> A("A", N, N);
View<const double**, LayoutStride, RA> B(A);
parallel_for(N, KOKKOS_LAMBDA (const int i)
{
  A(i,0) = 5.0;
  View<const double*, LayoutStride, RA> v(B, i, ALL()); // works
  printf("v(0): %g \n", v(0));
});
```

This results in: …
It seems that … One of my use cases is a team_scratch view that is populated with values and then not changed after some point. Perhaps I need to use different memory traits for these. Is that your take as well?
The issue here is that you are lying and thus get what you deserve ;-) In this case the actual underlying issue is that texture fetches are non-coherent within a kernel. That means updates to the underlying data may or may not be seen, depending on the state of the cache, data flushes, etc. If you change the data of A in a separate kernel, then you should see the changes.
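To illustrate the "separate kernel" point, here is a sketch of the example above split into two launches (assuming the same `A`, `B`, `N`, and `RA` as in the earlier snippet; illustrative, not tested code):

```cpp
// Kernel 1: write through the non-const view A.
parallel_for(N, KOKKOS_LAMBDA (const int i) {
  A(i,0) = 5.0;
});
// Kernel launches on the same stream are ordered, so by the time
// kernel 2 starts, the texture (read-only) cache no longer holds
// stale data for A's allocation.
// Kernel 2: read through the RandomAccess (texture) view B.
parallel_for(N, KOKKOS_LAMBDA (const int i) {
  View<const double*, LayoutStride, RA> v(B, i, ALL());
  printf("v(0): %g \n", v(0));
});
```

Within a single kernel there is no such ordering guarantee for texture fetches, which is why the one-kernel version may print stale values.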
So that example is a little oversimplified, but it did reproduce the compile errors. Also, replacing … This is a little more representative (pseudo-code):

```cpp
parallel_for (policy, …
{
  View<double **> A(team_scratch);
  func_to_populate_A(A);
  parallel_for(TeamThreadRange … (const int& i)
  {
    View<const double*, RA> v(A, i, ALL);
    parallel_for(ThreadVectorRange … (const int& j)
    {
      // do something with v
    });
  });
});
```
You need to get a View<const double**, RA> c_A before the parallel_for and take the subview from that. Then it should work:

```cpp
parallel_for (policy, …
{
  View<double **> A(team_scratch);
  View<const double**, RA> c_A(A);
  func_to_populate_A(A);
  parallel_for(TeamThreadRange … (const int& i)
  {
    View<const double*, RA> v(c_A, i, ALL);
    parallel_for(ThreadVectorRange … (const int& j)
    {
      // do something with v
    });
  });
});
```
Before the …
OK, sorry, I didn't look correctly.
Here we go again (I mean, now comes the real explanation, after I understood this is using scratch space): you can't actually have a RandomAccess const view of scratch memory. This is an issue we probably need to fix. Basically, texture objects can't reference scratch memory, so we would need to not use texture objects. The right way to do this is most likely to enforce the usage of the correct memory space (exec_space::scratch_memory_space) for such scratch-space views. That way everything would work, since the specialization for texture objects only kicks in for CudaSpace and CudaUVMSpace.
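As a rough sketch of what "the correct memory space" looks like for a team-scratch view (hedged: `ExecSpace`, `league_size`, `N`, and the scratch level 0 are placeholders; exact API details may vary across Kokkos versions):

```cpp
using ExecSpace   = Kokkos::DefaultExecutionSpace;
using ScratchView = Kokkos::View<double**, ExecSpace::scratch_memory_space,
                                 Kokkos::MemoryTraits<Kokkos::Unmanaged>>;

// Request enough level-0 scratch per team when building the policy ...
auto policy = Kokkos::TeamPolicy<ExecSpace>(league_size, Kokkos::AUTO)
                  .set_scratch_size(0, Kokkos::PerTeam(ScratchView::shmem_size(N, N)));

// ... and allocate the view from team scratch inside the functor:
//   ScratchView A(team.team_scratch(0), N, N);
```

Because the view's memory space is `scratch_memory_space` rather than `CudaSpace`, the texture-object specialization would never be selected for it.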
Ah, ok. Perhaps for the time being I can use different memory traits so that it doesn't trigger the texture specialization. Currently I default to using the largest memory space for all scratch allocations, whereas I should do something to take better advantage of the faster scratch memory spaces if possible.
Identified a bug: the library erroneously attempts to attach a texture object to a Cuda-space view with unmanaged memory.
Consider other cuda-const-random-access options, such as using `ldg` intrinsics when the view is unmanaged.
Resolve the bug by verifying that a View of const random-access Cuda memory, which currently uses texture objects, actually has a texture object available; otherwise generate a meaningful error message.
Creating a const random-access Cuda memory View assumes the use of texture objects, so verify that the texture object can be created or retrieved.
Will this fix allow my team scratch views (const, random access) to use …
If …
Add error check with meaningful message for issue #615.
I checked out the develop branch, was trying this out, and I am getting compile errors: …
Any ideas?
What is your calling code?
Here is a small reproducer; it has a simpler structure (no nested parallelism) but yields the same error.

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>
using Kokkos::parallel_for;
using Kokkos::MemoryTraits;
using Kokkos::RandomAccess;
using Kokkos::subview;
#define KOKKOSALL Kokkos::Impl::ALL_t()
using Kokkos::LayoutStride;
typedef MemoryTraits< RandomAccess > RA;
template <class D, class ... P>
using View = Kokkos::View<D, P ... >;

int main (int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  const int N = 10;
  View<double**, LayoutStride> A("A", N, N);
  View<const double**, LayoutStride, RA> B(A);
  // if use const view, compile error
  //View<const double*[4], LayoutRight, RA> A("A", N);
  parallel_for(N, KOKKOS_LAMBDA (const int i)
  {
    A(i,0) = 5.0;
    View<const double*, LayoutStride, RA> v(A, i, KOKKOSALL);
    //View<const double*, LayoutStride, RA> v(A, i, ALL());
    printf("v(0): %g \n", v(0));
  });
  Kokkos::finalize();
  return 0;
}
```
What is the default execution space?
Should be Cuda. For verification, I added:

```cpp
std::cout << "Default Execution Space: "
          << typeid(Kokkos::DefaultExecutionSpace).name() << "\n";
```

which prints out: …
For future reference, I believe …
I have a reproducing unit test...
Good to know. I'll need to change some things around to get this. Had been using …
That's actually the same:

```cpp
typedef Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::RandomAccess > MemoryRandomAccess;
```
Found the error and the lack of a good error message. When taking a subview inside a functor … Solution: …
What about this (more complex, but more representative) case?

```cpp
parallel_for (policy, …
{
  View<double **> A(team_scratch);
  func_to_populate_A(A);
  parallel_for(TeamThreadRange … (const int& i)
  {
    View<const double*, RA> v(A, i, ALL);
    parallel_for(ThreadVectorRange … (const int& j)
    {
      // do something with v
    });
  });
});
```
In this case the random access trait would not provide any help, even if it did compile, because the team scratch memory is already placed in …
I am having an issue with subviews on the Cuda backend. I was told by @crtrott to use

```cpp
View<> a(b, i, ALL)
```

rather than

```cpp
View<> a(subview(b, i, ALL))
```

I don't recall what the actual difference is, but the preferred method results in a compile-time error (on up-to-date master and develop branches) and the other method does not. However, using the method that compiles results in a runtime error (both branches):

```
:0: : block: [0,0,0], thread: [0,0,0] Assertion `Cannot create Cuda texture object from within a Cuda kernel` failed.
```

The code below should illustrate what I am describing.