Add scripts for comprehensive benchmark TensorIterator #30248
Labels
feature
A request for a proper, new feature.
module: performance
Issues related to performance, either of kernel code or framework glue
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🚀 Feature
Proposed in: #29743
TensorIterator is performance-critical but complicated. Changing some parts of it might
cause hard-to-detect regressions in other parts. It would be very convenient to have a script that benchmarks all the cases we can imagine:
Memory layout: trivial 1D, (contiguous dim, non-contiguous dims)
Problem size: small, medium, large
Type of computation: unary ops, binary ops, compare ops, reduction
Data type: all dtypes with/without promotion
Inplace: True, False
Device: CPU, CUDA
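As a rough sketch of how the axes above could be turned into a benchmark grid, the snippet below enumerates every combination with `itertools.product`. The concrete values (element counts, the dtype subset, the `Case` class and its `name` format) are assumptions made for illustration; the issue only names the axes. Type promotion is left out to keep the sketch short.

```python
import itertools
from dataclasses import dataclass

# Hypothetical axis values; the issue names the axes but not these specifics.
LAYOUTS = ("trivial_1d", "contiguous_dim", "non_contiguous_dims")
SIZES = {"small": 2**6, "medium": 2**14, "large": 2**22}  # element counts (assumed)
OP_KINDS = ("unary", "binary", "compare", "reduction")
DTYPES = ("float32", "float64", "int32", "int64")  # a subset, not "all dtypes"
INPLACE = (True, False)
DEVICES = ("cpu", "cuda")


@dataclass(frozen=True)
class Case:
    layout: str
    size: str
    op_kind: str
    dtype: str
    inplace: bool
    device: str

    @property
    def name(self) -> str:
        # Stable key so results from two builds can be matched up later.
        mode = "inplace" if self.inplace else "outofplace"
        return "-".join(
            [self.op_kind, self.layout, self.size, self.dtype, mode, self.device]
        )


def all_cases():
    """Yield every combination of the axes that makes sense to benchmark."""
    for combo in itertools.product(
        LAYOUTS, SIZES, OP_KINDS, DTYPES, INPLACE, DEVICES
    ):
        case = Case(*combo)
        # Assumption: reductions change the output shape, so they are
        # benchmarked out-of-place only.
        if case.inplace and case.op_kind == "reduction":
            continue
        yield case


if __name__ == "__main__":
    cases = list(all_cases())
    print(f"{len(cases)} cases, e.g. {cases[0].name}")
```

Each `Case` would then be mapped to an actual tensor and operator and timed, for example with `timeit` or `torch.utils.benchmark`, with the measured time stored under `case.name`.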
The intended usage of the script could be:
Step 1:
Install a PyTorch build of the master branch, and run
Step 2:
Go to your branch, build, install, and run
Step 3:
Run the following command to get the report:
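The report step could look roughly like the sketch below. It assumes, purely for illustration, that steps 1 and 2 each produce a mapping from case name to measured time in seconds (for example saved as JSON); the function names, the 10% threshold, and the output layout are all hypothetical and not part of the proposal.

```python
import json


def load_results(path):
    """Load a {case_name: seconds} mapping from a JSON file (assumed format)."""
    with open(path) as f:
        return json.load(f)


def compare(baseline, candidate, threshold=1.10):
    """Compare timings for cases present in both runs.

    Returns a list of (name, baseline_s, candidate_s, ratio, flag) tuples,
    where ratio = candidate / baseline and flag marks changes beyond
    the threshold in either direction.
    """
    rows = []
    for name in sorted(baseline.keys() & candidate.keys()):
        ratio = candidate[name] / baseline[name]
        if ratio > threshold:
            flag = "REGRESSION"
        elif ratio < 1 / threshold:
            flag = "improved"
        else:
            flag = ""
        rows.append((name, baseline[name], candidate[name], ratio, flag))
    return rows


def format_report(rows):
    """Render the comparison rows as a plain-text table."""
    lines = [f"{'case':<50} {'master':>12} {'branch':>12} {'ratio':>7}"]
    for name, base, cand, ratio, flag in rows:
        lines.append(f"{name:<50} {base:>12.6f} {cand:>12.6f} {ratio:>7.2f} {flag}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Inline sample data stands in for the two result files.
    master = {"unary-small": 1.0e-6, "binary-large": 2.0e-3}
    branch = {"unary-small": 1.5e-6, "binary-large": 2.0e-3}
    print(format_report(compare(master, branch)))
```

Flagging by ratio rather than by absolute difference keeps small and large problem sizes comparable in one table.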
cc: @ngimel @VitalyFedyunin
cc: @csarofeen
cc @VitalyFedyunin @ngimel @mruberry