GEMM with int8 datatype throws RuntimeError on GPU #49890
Comments
Thanks for suggesting this feature, @ilovepytorch. We've had requests for integer matmul support in the past, too. In fact, there's probably another issue with this same request. I think we would accept a PR implementing integer matrix multiplication.
I want to try to work on this. Could you give me some guidance on how to start?
Supporting integer matrix multiplication is a challenging task suitable for PyTorch and GEMM experts, so there probably won't be a guide for this. Would you be interested in a simpler task?
That's what I thought; the problem just sounded interesting. Yes, sure, I'll try something simpler.
To support fast INT8 GEMM on GPUs, I think we need to change https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/CUDABlas.cpp to add support for INT8 GEMM (with cublasGemmEx, where the A/B data types are CUDA_R_8I and the C data type is CUDA_R_32I). cc @ngimel
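For clarity, here is a small Python sketch of the intended semantics (int8 inputs, int32 accumulation/output), emulated by upcasting since no native int8 CUDA kernel exists yet; this is an illustration of the desired behavior, not an existing GPU path. A native kernel in CUDABlas.cpp would instead dispatch to cublasGemmEx with CUDA_R_8I inputs and a CUDA_R_32I output.

```python
import torch

# Emulate an int8 x int8 -> int32 GEMM by upcasting the inputs.
a8 = torch.randint(-128, 128, (128, 64), dtype=torch.int8)
b8 = torch.randint(-128, 128, (64, 32), dtype=torch.int8)

# Accumulating in int32 avoids overflowing the int8 range.
c32 = torch.matmul(a8.to(torch.int32), b8.to(torch.int32))
assert c32.dtype == torch.int32 and c32.shape == (128, 32)
```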
|
Should there be a dtype argument for this?
According to the current docs, there isn't. From my perspective, I think it would be better if the user did not need to pass a dtype argument explicitly here.
But I think it would be good to allow configuring a custom accumulation dtype (if that makes sense) and output dtype, in case the user for some reason wants to supply these (e.g. before we have finalized the casting defaults); see the sketch below.
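As a purely hypothetical sketch of that API shape (the int8_mm wrapper and the out_dtype name below are illustrative assumptions, not an existing or proposed PyTorch signature), with int32 as a reasonable default:

```python
import torch

def int8_mm(a: torch.Tensor, b: torch.Tensor,
            out_dtype: torch.dtype = torch.int32) -> torch.Tensor:
    """Hypothetical wrapper: int8 inputs, configurable accumulation/output dtype."""
    assert a.dtype == torch.int8 and b.dtype == torch.int8
    # Emulated via upcasting; a native implementation would dispatch to cuBLAS.
    return torch.matmul(a.to(out_dtype), b.to(out_dtype))

a = torch.randint(-128, 128, (4, 8), dtype=torch.int8)
b = torch.randint(-128, 128, (8, 2), dtype=torch.int8)
c_default = int8_mm(a, b)                        # int32 output by default
c_custom = int8_mm(a, b, out_dtype=torch.int64)  # user-supplied output dtype
```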
@jonykarki @mruberry
To be frank, a number of them are not that critical to fix, like torch.ceil (mainly used for floating-point computation), while fixing some of them (especially the frequently used ones) can be quite beneficial, like the GEMM-related, MaxPool/AvgPool-related, and loss-related ops. When users want to train/test a neural network using special datatypes, fixing those APIs will be helpful to some extent :)
Yeah, this makes sense to me! Sorry, I am not an expert at designing APIs :) Please feel free to discuss this with other experts as well.
What's the motivation behind enabling int8 for all those operations? If it's quantization and quantization-aware training, PyTorch supports a number of operations for quantized tensors, and in general such requests should be discussed with the quantization team. Enabling training operations for plain int8 type doesn't make much sense to me, tbh.
Thanks for your answers! I have just checked quantization in PyTorch, and found that "At the moment PyTorch doesn't provide quantized operator implementations on CUDA" and that this is left for future work. I believe support for int8 on GPU will deliver better performance (especially for the most frequently used GEMM), and this seems to be the trend. The Nvidia A100 has Tensor Core support for int8 GEMM, and I suppose PyTorch could benefit a lot from exploiting those hardware intrinsics.
🐛 Bug
To Reproduce
Simply running
python3 gemm.py
raises a RuntimeError, while gemm(True) passes, i.e. running GEMM with int8 data on the CPU is fine.
Expected behavior
GEMM with int8 data on the GPU should be supported.
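For reference, a minimal sketch of what a gemm.py like the one above might contain; the shapes, the use_cpu parameter name, and the comment about the exact error are assumptions rather than the original script.

```python
import torch

def gemm(use_cpu: bool) -> torch.Tensor:
    # Build two int8 matrices on the chosen device and multiply them.
    device = "cpu" if use_cpu else "cuda"
    a = torch.randint(-128, 128, (64, 64), dtype=torch.int8, device=device)
    b = torch.randint(-128, 128, (64, 64), dtype=torch.int8, device=device)
    return torch.matmul(a, b)

if __name__ == "__main__":
    gemm(True)   # CPU: int8 GEMM succeeds
    gemm(False)  # GPU: raises RuntimeError (int8 matmul not implemented on CUDA)
```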
Environment
PyTorch version: 1.7.0+cu101
Is debug build: True
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 7.7.1908 (Core) (x86_64)
GCC version: (GCC) 7.2.0
Clang version: 9.0.0 (tags/RELEASE_900/final)
CMake version: version 3.12.2
Python version: 3.6 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-PCIE-16GB
Nvidia driver version: 455.45.01
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.7.5.0
/usr/lib64/libcudnn.so.8.0.5
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.18.0
[pip3] numpydoc==0.7.0
[pip3] pytorch-sublstm==0.0.2
[pip3] torch==1.7.0+cu101
[pip3] torch-tvm==0.0.1
[pip3] torchaudio==0.7.0
[pip3] torchfile==0.1.0
[pip3] torchnet==0.0.4
[pip3] torchvision==0.8.1+cu101
[conda] blas 1.0 mkl
[conda] cuda100 1.0 0 pytorch
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] magma-cuda101 2.5.2 1 pytorch
[conda] mkl 2020.0 166
[conda] mkl-include 2020.0 166
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.0.15 py36ha843d7b_0
[conda] mkl_random 1.1.0 py36hd6b4f25_0
[conda] numpy 1.18.0 pypi_0 pypi
[conda] numpydoc 0.7.0 py36h18f165f_0
[conda] pytorch-sublstm 0.0.2 pypi_0 pypi
[conda] torch 1.7.0+cu101 pypi_0 pypi
[conda] torch-tvm 0.0.1 pypi_0 pypi
[conda] torchaudio 0.7.0 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchnet 0.0.4 pypi_0 pypi
[conda] torchvision 0.8.1+cu101 pypi_0 pypi
cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr