Open
Labels
module: cuda (Related to torch.cuda, and CUDA support in general)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🚀 Feature
Motivation
To increase the throughput of CUDA kernels.
Pitch
For every feasible CUDA kernel, take the inputs as THCDeviceTensor<T, DIM, IndexT, RestrictPtrTraits> instead of using DefaultPtrTraits, so that the __restrict__ keyword is applied to the device tensors. This change alone seems to increase throughput by about 3 to 5%. The increase was measured with the bilinear upsampling code.
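To make the proposal concrete, here is a minimal host-side sketch of the pointer-traits pattern. It is not the THC implementation: `TensorView1D`, `upsample_linear_1d`, and `upsample` are hypothetical names, the two traits structs are standalone illustrative definitions that only borrow the names from this issue, and a 1-D linear interpolation stands in for the bilinear kernel. It assumes a compiler that accepts the `__restrict__` spelling (GCC, Clang, nvcc). Note that `__restrict__` is a promise to the compiler that the input and output do not alias; whether it yields a speedup depends on the kernel and the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer-traits policies. Illustrative stand-ins, not the THC headers.
template <typename T>
struct DefaultPtrTraits { typedef T* PtrType; };

template <typename T>
struct RestrictPtrTraits { typedef T* __restrict__ PtrType; };

// Hypothetical 1-D view standing in for THCDeviceTensor: the traits
// parameter decides whether the data pointer is restrict-qualified.
template <typename T, template <typename U> class PtrTraits = DefaultPtrTraits>
struct TensorView1D {
  typedef typename PtrTraits<T>::PtrType DataPtrType;
  DataPtrType data;
  std::size_t size;
  T& operator[](std::size_t i) const {
    assert(i < size);  // bounds check for the sketch only
    return data[i];
  }
};

// 1-D linear upsampling with aligned end points, a simplified analogue of
// the bilinear kernel body. In a CUDA kernel the loop index would come
// from the thread/block indices instead.
template <template <typename U> class PtrTraits>
void upsample_linear_1d(TensorView1D<const float, PtrTraits> in,
                        TensorView1D<float, PtrTraits> out) {
  const float scale = out.size > 1
      ? static_cast<float>(in.size - 1) / static_cast<float>(out.size - 1)
      : 0.0f;
  for (std::size_t i = 0; i < out.size; ++i) {
    const float pos = scale * static_cast<float>(i);
    const std::size_t lo = static_cast<std::size_t>(pos);
    const std::size_t hi = lo + 1 < in.size ? lo + 1 : lo;
    const float w = pos - static_cast<float>(lo);
    out[i] = (1.0f - w) * in[lo] + w * in[hi];
  }
}

// Convenience wrapper; src must be non-empty and must not overlap dst.
template <template <typename U> class PtrTraits>
std::vector<float> upsample(const std::vector<float>& src, std::size_t n) {
  std::vector<float> dst(n);
  TensorView1D<const float, PtrTraits> in{src.data(), src.size()};
  TensorView1D<float, PtrTraits> out{dst.data(), dst.size()};
  upsample_linear_1d(in, out);
  return dst;
}
```

Switching a call site from `upsample<DefaultPtrTraits>(src, n)` to `upsample<RestrictPtrTraits>(src, n)` changes only the pointer qualification, not the result, which is what would make the change mechanical to apply kernel by kernel. Any throughput difference has to be measured per kernel.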
Alternatives
Additional context