MD iteration policy ignores lower bound on GPUs #1041

azrael417 · 2017-08-15T18:51:09Z

hello, it seems that the md iteration policy ignore lower loop bounds in the GPU code. It computes the total range correctly:

core/src/Cuda/Kokkos_Cuda_Parallle.hpp:434

```const dim3 block( m_rp.m_tile[0] , m_rp.m_tile[1] , m_rp.m_tile[2] );
  const dim3 grid(
      std::min( ( m_rp.m_upper[0] - m_rp.m_lower[0] + block.x - 1 ) / block.x , maxblocks )
    , std::min( ( m_rp.m_upper[1] - m_rp.m_lower[1] + block.y - 1 ) / block.y , maxblocks )
    , std::min( ( m_rp.m_upper[2] - m_rp.m_lower[2] + block.z - 1 ) / block.z , maxblocks )
    );```

but then forgets to add the offset back. For example:

template< typename RP , typename Functor >
struct DeviceIterateTile<3,RP,Functor,void >
{
  using index_type = typename RP::index_type;

  __device__
  DeviceIterateTile( const RP & rp_ , const Functor & f_ )
  : m_rp(rp_)
  , m_func(f_)
  {}

  inline __device__
  void exec_range() const
  {
    // LL                                                                                                                      
    if (RP::inner_direction == RP::Left) {
      for ( index_type tile_id2 = (index_type)blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
        const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.z;
        if ( offset_2 < m_rp.m_upper[2] && (index_type)threadIdx.z < m_rp.m_tile[2] ) {

          for ( index_type tile_id1 = (index_type)blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
            const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y;
            if ( offset_1 < m_rp.m_upper[1] && (index_type)threadIdx.y < m_rp.m_tile[1] ) {

              for ( index_type tile_id0 = (index_type)blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
                const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x;
                if ( offset_0 < m_rp.m_upper[0] && (index_type)threadIdx.x < m_rp.m_tile[0] ) {
                  m_func(offset_0 , offset_1 , offset_2);
                }
              }
            }
          }
        }
      }```

Some of the specializations look OK though:

```// Specializations for void tag type                                                                                           
template< typename RP , typename Functor , typename Tag, typename ValueType >
struct DeviceIterateTile<4,RP,Functor,Tag,ValueType, typename std::enable_if< !is_array_type<ValueType>::value && !is_void< Ta\
g >::value >::type >```

In here we have:

``` // LL                                                                                                                  
        if (RP::inner_direction == RP::Left) {
          for (int i=0; i<RP::rank; ++i) {
            m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
            tile_idx /= m_rp.m_tile_end[i];```

And we have more of these in the following, starting after declaring functions namespace Reduce.

Best Regards
Thorsten

The text was updated successfully, but these errors were encountered:

ndellingwood · 2017-08-15T22:01:50Z

Fixed and testing now.

This commit addresses issue kokkos#1041 The lower range offset was missing from the multi-dim index calculations in the Cuda_iterateTile ParallelFor impls. Added the missing offset. Enhanced unit tests to properly exercise this.

ndellingwood · 2017-08-15T22:42:43Z

PR #1042 issued with fix for this issue.

ibaned assigned ndellingwood Aug 15, 2017

ibaned added this to the 2017-September (mid) milestone Aug 15, 2017

ibaned added the Question For Kokkos internal and external contributors and users label Aug 15, 2017

ndellingwood added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Aug 15, 2017

ndellingwood mentioned this issue Aug 15, 2017

Cuda_IterateTile: Add missing lower range loop offset #1042

Merged

crtrott added InDevelop and removed Question For Kokkos internal and external contributors and users labels Aug 17, 2017

crtrott closed this as completed Sep 11, 2017

crtrott mentioned this issue Sep 11, 2017

Kokkos + KokkosKernels promotion trilinos/Trilinos#1721

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MD iteration policy ignores lower bound on GPUs #1041

MD iteration policy ignores lower bound on GPUs #1041

azrael417 commented Aug 15, 2017 •

edited

ndellingwood commented Aug 15, 2017

ndellingwood commented Aug 15, 2017

MD iteration policy ignores lower bound on GPUs #1041

MD iteration policy ignores lower bound on GPUs #1041

Comments

azrael417 commented Aug 15, 2017 • edited

ndellingwood commented Aug 15, 2017

ndellingwood commented Aug 15, 2017

azrael417 commented Aug 15, 2017 •

edited