CUDA optimization: using `__restrict__` whenever possible

## 🚀 Feature


## Motivation

To increase the throughput of CUDA kernels.

## Pitch
For every feasible CUDA kernel, take inputs as `THCDeviceTensor<T, DIM, IndexT, RestrictPtrTraits>` instead of using `DefaultPtrTraits`, so that the device tensor pointers carry the `__restrict__` qualifier. This change alone seems to increase throughput by about 3 to 5%, as measured with the bilinear upsampling code.
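
To make the idea concrete, here is a minimal host-side C++ sketch of the pointer-traits pattern. It is an illustration under stated assumptions, not the THC code: `TensorView1D` and `scale_add` are hypothetical names, and the two traits structs are simplified stand-ins for the ones named above. In a real CUDA kernel the loop body would run inside a `__global__` function, but the way the traits parameter selects a `__restrict__`-qualified pointer type would be the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in: a plain pointer, so the compiler must assume that
// two views may alias the same memory.
template <typename T>
struct DefaultPtrTraits {
  typedef T* PtrType;
};

// Simplified stand-in: a __restrict__-qualified pointer. This is a promise
// from the caller that the memory is not accessed through another pointer.
template <typename T>
struct RestrictPtrTraits {
  typedef T* __restrict__ PtrType;
};

// Hypothetical 1-D view, loosely modelled on a device tensor wrapper.
// The traits parameter decides which pointer type the view stores.
template <typename T, template <typename U> class PtrTraits = DefaultPtrTraits>
struct TensorView1D {
  typedef typename PtrTraits<T>::PtrType PtrType;
  PtrType data;
  std::size_t size;
};

// Hypothetical kernel body: out[i] += alpha * in[i].
// With RestrictPtrTraits the compiler may keep in[i] in registers or reorder
// loads and stores, because out and in are declared not to overlap.
template <template <typename U> class PtrTraits>
void scale_add(TensorView1D<float, PtrTraits> out,
               TensorView1D<const float, PtrTraits> in,
               float alpha) {
  assert(out.size == in.size);
  for (std::size_t i = 0; i < out.size; ++i) {
    out.data[i] += alpha * in.data[i];
  }
}
```

Switching a call site from `TensorView1D<float, DefaultPtrTraits>` to `TensorView1D<float, RestrictPtrTraits>` changes only the template argument, which is why the pitch can be applied kernel by kernel. The qualifier is only safe when the buffers really do not overlap; in-place kernels that read and write the same tensor should keep the default traits.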

## Alternatives



## Additional context
