Commit
[CUDA][Convolution] Add missing launch bounds to vol2col_kernel (#128740)

Fix "too many resources requested" that can happen with recent toolkits on V100.

Pull Request resolved: #128740
Approved by: https://github.com/mikaylagawarecki
eqy authored and pytorchmergebot committed Jun 18, 2024
1 parent 26e374e commit 4e03263
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions aten/src/ATen/native/cuda/vol2col.cuh
@@ -14,6 +14,7 @@ using namespace at::cuda::detail;

 // Kernel for fast unfold+copy on volumes
 template <typename T>
+C10_LAUNCH_BOUNDS_1(1024)
 __global__ void vol2col_kernel(
     const int64_t n,
     const T* data_vol,
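
For readers unfamiliar with launch bounds, here is a minimal sketch of the mechanism this change relies on. It assumes that `C10_LAUNCH_BOUNDS_1(1024)` wraps CUDA's `__launch_bounds__` qualifier with a maximum of 1024 threads per block; the kernel `scale_kernel` and the host code are hypothetical illustrations, not part of PyTorch. With the bound in place, the compiler caps per-thread register use so that a 1024-thread block can fit on the device. Without it, a register-heavy kernel launched with 1024 threads can fail with `cudaErrorLaunchOutOfResources`, which is reported as "too many resources requested for launch".

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (NOT the real vol2col_kernel).
// __launch_bounds__(1024) promises the compiler that this kernel is never
// launched with more than 1024 threads per block, so it limits register
// usage to make a 1024-thread block launchable.
template <typename T>
__launch_bounds__(1024)
__global__ void scale_kernel(const int64_t n, const T* in, T* out) {
  const int64_t i =
      static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = in[i] * T(2);
  }
}

int main() {
  const int64_t n = 1 << 20;
  const int threads = 1024;  // must not exceed the declared launch bound
  const int blocks = static_cast<int>((n + threads - 1) / threads);

  float* in = nullptr;
  float* out = nullptr;
  cudaMalloc(&in, n * sizeof(float));
  cudaMalloc(&out, n * sizeof(float));
  cudaMemset(in, 0, n * sizeof(float));

  scale_kernel<float><<<blocks, threads>>>(n, in, out);

  // A launch that needs more registers than the device can supply for the
  // requested block size is reported here rather than at compile time.
  const cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) {
    std::printf("launch failed: %s\n", cudaGetErrorString(err));
  }
  cudaDeviceSynchronize();

  cudaFree(in);
  cudaFree(out);
  return err == cudaSuccess ? 0 : 1;
}
```

The trade-off is that a tight bound may force the compiler to spill registers to local memory, so the declared maximum should match the largest block size the kernel is actually launched with.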
