Add configuration to target CUDA compute capability 8.6 #3713

keichi · 2021-01-07T10:49:12Z

This PR adds a flag to compile Kokkos for CUDA compute capability 8.6 (the changes are based on #3122). I need this because I'm working with Ampere-based RTX cards.
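As a rough illustration of what such a flag selects, the hypothetical helper below (not part of Kokkos) maps a compute capability to the Kokkos CMake architecture option that targets it. It assumes the new 8.6 option follows the existing `Kokkos_ARCH_<NAME><CC>` naming, i.e. `Kokkos_ARCH_AMPERE86` next to options such as `Kokkos_ARCH_AMPERE80`. The list is deliberately partial.

```cpp
#include <string>

// Hypothetical helper, not a Kokkos API: returns the name of the Kokkos CMake
// option that targets a given CUDA compute capability (major.minor).
// Assumes CC 8.6 maps to Kokkos_ARCH_AMPERE86; only a few architectures are listed.
std::string kokkos_arch_option(int major, int minor) {
  switch (major * 10 + minor) {
    case 60: return "Kokkos_ARCH_PASCAL60";
    case 61: return "Kokkos_ARCH_PASCAL61";
    case 70: return "Kokkos_ARCH_VOLTA70";
    case 72: return "Kokkos_ARCH_VOLTA72";
    case 75: return "Kokkos_ARCH_TURING75";
    case 80: return "Kokkos_ARCH_AMPERE80";
    case 86: return "Kokkos_ARCH_AMPERE86";
    default: return "";  // architecture not covered by this sketch
  }
}
```

Under this assumption, an RTX 30-series card (compute capability 8.6) would be targeted by configuring with `-DKokkos_ARCH_AMPERE86=ON`.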

dalg24-jenkins · 2021-01-07T10:49:14Z

Can one of the admins verify this patch?

masterleinad

Looks good to me.

dalg24 · 2021-01-07T14:02:45Z

OK to test

keichi · 2021-01-07T16:29:18Z

I ran the unit tests on an RTX 3090, and the following tests in KokkosCore_UnitTest_Cuda2 failed:

kokkos/core/unit_test/TestTeamTeamSize.hpp

Lines 163 to 167 in 78538c8

    
    test_team_policy_max_recommended<double, 2, policy_type_1024_2>(0);
    test_team_policy_max_recommended<double, 2, policy_type_1024_2>(
        max_scratch_size / 3 / 2);
    test_team_policy_max_recommended<double, 2, policy_type_1024_2>(
        max_scratch_size / 2);

kokkos/core/unit_test/TestTeamTeamSize.hpp

Lines 178 to 182 in 78538c8

    
    test_team_policy_max_recommended<double, 16, policy_type_1024_2>(0);
    test_team_policy_max_recommended<double, 16, policy_type_1024_2>(
        max_scratch_size / 3 / 2);
    test_team_policy_max_recommended<double, 16, policy_type_1024_2>(
        max_scratch_size / 2);

After some debugging, I found that cuda_get_max_block_size() was returning zero. I think the reason is that the maximum number of resident threads per multiprocessor (properties.maxThreadsPerMultiProcessor) is 1536 on CC 8.6, whereas it is 2048 on the other devices I tested (P100 and V100). I modified cuda_deduce_block_size() to account for this, but I'm not sure whether the fix is correct. Please review.
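To make the failure mode concrete, here is a simplified, hypothetical model of an occupancy-style block-size search. The names `SmLimits`, `resident_blocks`, and `max_block_size` are illustrative; this is not the Kokkos implementation of cuda_deduce_block_size(). The search picks the largest warp-multiple block size for which at least `min_blocks_per_sm` blocks can be resident on one SM, and it returns 0 when no size qualifies. With a 1024-thread cap and a minimum of 2 blocks per SM, a 2048-thread SM admits 1024-thread blocks, but a 1536-thread SM does not.

```cpp
#include <algorithm>

// Illustrative per-SM limits (hypothetical struct, not a CUDA or Kokkos type).
struct SmLimits {
  int max_threads_per_sm;     // resident-thread limit per multiprocessor
  int max_threads_per_block;  // per-block thread limit
  int shared_mem_per_sm;      // shared memory available per SM, in bytes
};

// Number of blocks of `block_size` threads that can be resident on one SM,
// limited by the thread count and by each block's shared memory footprint.
int resident_blocks(const SmLimits& sm, int block_size, int shmem_per_block) {
  const int by_threads = sm.max_threads_per_sm / block_size;
  const int by_shmem = shmem_per_block > 0
                           ? sm.shared_mem_per_sm / shmem_per_block
                           : by_threads;
  return std::min(by_threads, by_shmem);
}

// Largest warp-multiple block size that keeps at least `min_blocks_per_sm`
// blocks resident; returns 0 when no block size satisfies the constraint.
int max_block_size(const SmLimits& sm, int min_blocks_per_sm,
                   int shmem_per_block) {
  constexpr int warp_size = 32;
  for (int bs = sm.max_threads_per_block; bs >= warp_size; bs -= warp_size) {
    if (resident_blocks(sm, bs, shmem_per_block) >= min_blocks_per_sm) {
      return bs;
    }
  }
  return 0;
}
```

For example, with illustrative limits `SmLimits{2048, 1024, 96 * 1024}` and a minimum of 2 blocks, `max_block_size` returns 1024. With `SmLimits{1536, 1024, 100 * 1024}`, it returns 768. If each block also needs 60 KiB of shared memory, two blocks never fit on the second SM, and the result is 0.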

Changes were added

dalg24

LGTM

Add configuration to target CUDA compute capability 8.6

d32fe2b

masterleinad previously approved these changes Jan 7, 2021


Add shared memory limit for CUDA arch 86

1dc7db6

dalg24 previously approved these changes Jan 7, 2021


Fix issue where cuda_get_max_block_size() returns 0

536ee70

crtrott approved these changes Jan 7, 2021


dalg24 approved these changes Jan 7, 2021


crtrott merged commit 7f76f68 into kokkos:develop Jan 7, 2021
