Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subview construction on Cuda backend #615

Closed
dholladay00 opened this issue Jan 22, 2017 · 26 comments
Closed

subview construction on Cuda backend #615

dholladay00 opened this issue Jan 22, 2017 · 26 comments
Assignees
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Enhancement Improve existing capability; will potentially require voting
Milestone

Comments

@dholladay00
Copy link

dholladay00 commented Jan 22, 2017

I am having an issue with subviews on the Cuda backend. I was told by @crtrott to use this View<> a(b, i, ALL) rather than View<> a(subview(b, i, ALL)). I don't recall actual difference is but the preferred method results in a compile time error (on up to date master and develop branches) and the other method does not.

kokkos/core/src/impl/Kokkos_ViewMapping.hpp(3063): error: no instance of overloaded function "Kokkos::Experimental::Impl::ViewDataHandle<Traits, std::enable_if<<expression>, void>::type>::assign [with Traits=Kokkos::ViewTraits<const double *, Kokkos::LayoutStride, RA>]" matches the argument list
            argument types are: (double *const, size_t)
          detected during:
            instantiation of "void Kokkos::Experimental::Impl::ViewMapping<std::enable_if<<expression>, void>::type, SrcTraits, Args...>::assign(Kokkos::Experimental::Impl::ViewMapping<DstTraits, void> &, const Kokkos::Experimental::Impl::ViewMapping<SrcTraits, void> &, Args...) [with SrcTraits=Kokkos::ViewTraits<double **, Kokkos::LayoutStride>, Args=<int, Kokkos::Impl::ALL_t>, DstTraits=Kokkos::ViewTraits<const double *, Kokkos::LayoutStride, RA>]" 
/home/dholladay00/kokkos_tutorial/kokkos/core/src/Kokkos_View.hpp(1260): here
            instantiation of "Kokkos::View<DataType, Properties...>::View(const Kokkos::View<RT, RP...> &, const Arg0 &, Args...) [with DataType=const double *, Properties=<Kokkos::LayoutStride, RA>, RT=double **, RP=<Kokkos::LayoutStride>, Arg0=int, Args=<Kokkos::Impl::ALL_t>]" 
main.cpp(32): here

However, using the method that compiles results in a runtime error (both branches): :0: : block: [0,0,0], thread: [0,0,0] Assertion `Cannot create Cuda texture object from within a Cuda kernel` failed. Code below should illustrate what I am describing.

#include <Kokkos_Core.hpp>
#include <cstdio>

using Kokkos::parallel_for;
using Kokkos::ALL;
using Kokkos::MemoryTraits;
using Kokkos::RandomAccess;
using Kokkos::subview;
using Kokkos::ALL;

using Kokkos::LayoutStride;

typedef MemoryTraits< RandomAccess > RA;

template <class D, class ... P>
using View = Kokkos::View<D, P ... >;

int main (int argc, char* argv[]) {

  Kokkos::initialize(argc, argv);

  const int N = 10;
  
  // stores 2D mutable data
  View<double**, LayoutStride> A("A", N, N);

  parallel_for(N, KOKKOS_LAMBDA (const int i)
  {
    // get a 1D slice of the view and enter a region in which data is const
    //View<const double*, LayoutStride, RA> v(subview(A, i, ALL())); // compiles, runtime error
    View<const double*, LayoutStride, RA> v(A, i, ALL()); // compile time error
  });

  Kokkos::finalize();

  return 0;
}
@crtrott
Copy link
Member

crtrott commented Jan 23, 2017

Can you try already creating a RA view outside of the kernel, and then get the subview?
Basically you are getting caught up in the Texture object creation which has to happen on the host ...
Need to think about how to better handle this.

@dholladay00
Copy link
Author

dholladay00 commented Jan 23, 2017

This worked (and I was able to remove the subview(…) in the ctor. However, this does not behave as I would like. It's probably my lack of understanding of how texture objects work.

  View<double**, LayoutStride> A("A", N, N);
  View<const double**, LayoutStride, RA> B(A);

  parallel_for(N, KOKKOS_LAMBDA (const int i)
  {
    A(i,0) = 5.0;
    View<const double*, LayoutStride, RA> v(B, i, ALL()); // works
    printf("v(0): %g \n", v(0));
  });

This results in:

v(0): 0 
v(0): 0 
v(0): 0 
v(0): 0 
v(0): 0 
v(0): 0 
v(0): 0 
v(0): 0 
v(0): 0 
v(0): 0 

It seems that v is not aliasing the data in A.

One of my use cases is that I have a team_scratch view that is populated with values and then is not changed after some point. Perhaps I need to use different memory traits for these. Is that your take as well?

@crtrott
Copy link
Member

crtrott commented Jan 23, 2017

The issue here is that you are lying and thus get what you deserve ;-)
Jokes aside: v (and B) are "const" and thus within the scope of kernel the underlying data is not allowed to change. That has consequences for things like operation reordering. For example the compiler is perfectly in its right to first create the subview, then print, and then do the assignment to A. Because the assignment to A is not supposed to have any effect on B or v ...

In this case the actual underlying issue is that texture fetches are non-coherent within a kernel. That means updates to the underlying data may or may not be seen depending on the state of the cash, data flushes etc.

If you change the data of A in a separate kernel then you should see the changes.

@dholladay00
Copy link
Author

dholladay00 commented Jan 23, 2017

So that example is a little over simplistic, but it did reproduce the compile errors. Also, replacing v(B, i, ALL) with v(A, i, ALL) in the above example won't compile.

This is a little more representative (pseudo-code):

parallel_for (policy, …
{
  View<double **> A(team_scratch);
  func_to_populate_A(A);

  parallel_for(TeamThreadRange … (const int& i)
  {
    View<const double*, RA> v(A, i, ALL);
    parallel_for(ThreadVectorRange … (const int& j)
    {
      // do something with v
    });
  };)

});

@crtrott
Copy link
Member

crtrott commented Jan 23, 2017

You need to get a View<const double**,RA> c_A before the the parallel_for and get the subview from that. Then it should work.

parallel_for (policy, …
{
  View<double **> A(team_scratch);
  View<const double**, RA> c_A(A);
  func_to_populate_A(A);

  parallel_for(TeamThreadRange … (const int& i)
  {
    View<const double*, RA> v(c_A, i, ALL);
    parallel_for(ThreadVectorRange … (const int& j)
    {
      // do something with v
    });
  };)

});

@dholladay00
Copy link
Author

Before the TeamThread parallel_for or the overall team_policy parallel_for?

@crtrott
Copy link
Member

crtrott commented Jan 23, 2017

OK sorry I didn't look correctly.

@crtrott
Copy link
Member

crtrott commented Jan 23, 2017

Here we go again (i mean now comes the real explanation after I understood this is using scratch space): you can't actually have a RandomAccess const view of scratch memory. This is an issue we probably need to fix. Basically texture objects can't reference scratch memory, so we would need to not use texture object. The right way to do this is most likely to enforce the usage of the correct memory space (exec_space::scratch_memory_space) for such scratch space views. That way everything would work, since the specialization for texture objects only kicks in for CudaSpace and CudaUVMSpace.

@dholladay00
Copy link
Author

Ah, ok. Perhaps for the time being I can use different memory traits so that it doesn't trigger the texture specialization. Currently I default to using the largest memory space for all scratch allocations whereas I should do something to take better advantage of the faster scratch memory spaces if possible.

@hcedwar
Copy link
Contributor

hcedwar commented Feb 15, 2017

Identified a bug: Erroneously attempts to attach texture object to Cuda space view with unmanaged memory.

@hcedwar hcedwar added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Feb 15, 2017
@hcedwar hcedwar added this to the Backlog milestone Feb 15, 2017
@hcedwar
Copy link
Contributor

hcedwar commented Feb 22, 2017

Consider other cuda-const-random-access options, such as if an unmanaged view use 'ldg' intrinsics.

@ibaned ibaned added the Enhancement Improve existing capability; will potentially require voting label May 17, 2017
@hcedwar
Copy link
Contributor

hcedwar commented Sep 21, 2017

Resolve bug by verifying that a View to const random access Cuda memory, which currently uses texture objects, has a texture object available. Otherwise generate a meaningful error message.
If Kokkos configured to use 'ldg' then this verification is unnecessary.

hcedwar added a commit that referenced this issue Sep 21, 2017
Creating a const random access Cuda memory View assumes
the use of texture objects, so verify the texture object
can be created or retrieved.
@hcedwar hcedwar modified the milestones: Backlog, 2017 September Sep 21, 2017
@hcedwar hcedwar self-assigned this Sep 21, 2017
@dholladay00
Copy link
Author

Will this fix allow my team scratch views (const, random access) to use __ldg (if enabled), or will it now emit a meaningful error message?

@hcedwar
Copy link
Contributor

hcedwar commented Sep 21, 2017

If KOKKOS_ENABLE_CUDA_LDG_INTRINSIC is set then all const random access Cuda Views will use the __ldg, otherwise an error message is emitted when trying to use a const random access Cuda View on non-allocated memory.

hcedwar added a commit that referenced this issue Sep 21, 2017
Add error check with meaningful message for issue #615.
@dholladay00
Copy link
Author

I checked out develop branch was trying this out and I am getting compile errors:

kokkos/core/src/impl/Kokkos_ViewMapping.hpp(3108): error: no instance of overloaded function "Kokkos::Impl::ViewDataHandle<Traits, std::enable_if<<expression>, void>::type>::assign [with Traits=Kokkos::ViewTraits<const int *, Kokkos::LayoutStride, Kokkos::MemoryTraits<3U>>]" matches the argument list
            argument types are: (int *const, size_t)
          detected during:
            instantiation of "void Kokkos::Impl::ViewMapping<std::enable_if<<expression>, void>::type, SrcTraits, Args...>::assign(Kokkos::Impl::ViewMapping<DstTraits, void> &, const Kokkos::Impl::ViewMapping<SrcTraits, void> &, Args...) [with SrcTraits=Kokkos::ViewTraits<int **, Kokkos::LayoutStride>, Args=<Kokkos::Impl::ALL_t, int>, DstTraits=Kokkos::ViewTraits<const int *, Kokkos::LayoutStride, Kokkos::MemoryTraits<3U>>]" 
kokkos/core/src/Kokkos_View.hpp(1127): here
            instantiation of "Kokkos::View<DataType, Properties...>::View(const Kokkos::View<RT, RP...> &, const Arg0 &, Args...) [with DataType=const int *, Properties=<Kokkos::LayoutStride, Kokkos::MemoryTraits<3U>>, RT=int **, RP=<Kokkos::LayoutStride>, Arg0=Kokkos::Impl::ALL_t, Args=<int>]"

Any ideas?

@hcedwar
Copy link
Contributor

hcedwar commented Sep 21, 2017

What is your calling code?

@dholladay00
Copy link
Author

dholladay00 commented Sep 21, 2017

Here is a small reproducer; it is simpler structure (no nested parallelism), but yields the same error.

#include <Kokkos_Core.hpp>
#include <cstdio>

using Kokkos::parallel_for;
using Kokkos::MemoryTraits;
using Kokkos::RandomAccess;
using Kokkos::subview;
#define KOKKOSALL Kokkos::Impl::ALL_t()
using Kokkos::LayoutStride;

typedef MemoryTraits< RandomAccess > RA;

template <class D, class ... P>
using View = Kokkos::View<D, P ... >;

int main (int argc, char* argv[]) {
  
  Kokkos::initialize(argc, argv);

  const int N = 10;
  
  View<double**, LayoutStride> A("A", N, N);
  View<const double**, LayoutStride, RA> B(A);
  // if use const view, compile error
  //View<const double*[4], LayoutRight, RA> A("A", N);

  parallel_for(N, KOKKOS_LAMBDA (const int i)
  {
    A(i,0) = 5.0;
    View<const double*, LayoutStride, RA> v(A, i, KOKKOSALL);
    //View<const double*, LayoutStride, RA> v(A, i, ALL());
    printf("v(0): %g \n", v(0));
  });

  Kokkos::finalize();
  
  return 0;
}

@hcedwar
Copy link
Contributor

hcedwar commented Sep 21, 2017

What is the default execution space?

@dholladay00
Copy link
Author

Should be Cuda.

Verification, I added:

  std::cout << "Default Execution Space: "
	    << typeid(Kokkos::DefaultExecutionSpace).name() << "\n";

which prints out:

[dholladay00@cn143 compile_dimension_subview]$ ./main.cuda 
Default Execution Space: N6Kokkos4CudaE

@ibaned
Copy link
Contributor

ibaned commented Sep 21, 2017

for future reference, I believe Kokkos::DefaultExecutionSpace::name() prints a nicer string.

@hcedwar
Copy link
Contributor

hcedwar commented Sep 21, 2017

I have a reproducing unit test...

@dholladay00
Copy link
Author

dholladay00 commented Sep 21, 2017

Subviews should be MemoryRandomAccess which is both random access and unmanaged.

Good to know.

I'll need to change some things around to get this. Had been using const unsigned int ra = Unmanaged | Kokkos::RandomAccess;

@crtrott
Copy link
Member

crtrott commented Sep 21, 2017

Thats actually the same
from Kokkos:

typedef Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::RandomAccess > MemoryRandomAccess

@hcedwar
Copy link
Contributor

hcedwar commented Sep 21, 2017

Found the error and lack of good error message. When taking a subview inside a functor
View<const double *, Cuda, MemoryRandomAccess>
from a
View< double * , Cuda >
the type conversion is not valid because a texture object needs to be created and cannot be from within a functor.

Solution:

#include <Kokkos_Core.hpp>
#include <cstdio>

using Kokkos::parallel_for;
using Kokkos::MemoryTraits;
using Kokkos::MemoryRandomAccess;
using Kokkos::subview;
using Kokkos::ALL ;
using Kokkos::LayoutStride;
using Kokkos::LayoutRight;

template <class D, class ... P>
using View = Kokkos::View<D, P ... >;

int main (int argc, char* argv[]) {
  
  Kokkos::initialize(argc, argv);
  {
    const int N = 10;
  
    View<double*[4], LayoutRight> A("A", N); // Allocate
    View<const double**, LayoutStride, MemoryRandomAccess> B(A); // const-Cuda-RandomAccess alias

    parallel_for(N, KOKKOS_LAMBDA (const int i) { A(i,0) = 5.0; } );
    // Cannot write and randomly read in the same loop!
    parallel_for(N, KOKKOS_LAMBDA (const int)
    {
      // Take a subview of a View that is already const-Cuda-RandomAccess:
      View<const double*, LayoutStride,MemoryRandomAccess> v(B, i, ALL);
      printf("v(0): %g \n", v(0));
    });
  }
  Kokkos::finalize();
  
  return 0;
}

@dholladay00
Copy link
Author

What about for this (more complex, but more representative) case:

parallel_for (policy, …
{
  View<double **> A(team_scratch);
  func_to_populate_A(A);

  parallel_for(TeamThreadRange … (const int& i)
  {
    View<const double*, RA> v(A, i, ALL);
    parallel_for(ThreadVectorRange … (const int& j)
    {
      // do something with v
    });
  };)

});

@hcedwar
Copy link
Contributor

hcedwar commented Sep 21, 2017

In this case the random access trait would not provide any help, even if it did compile, because the team scratch memory is already placed in __shared__ memory. Even so, the random access trait should not be aliasing data that is modified within the same kernel - a big no-no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

4 participants