Mansterteddy changed the title from "🐛 [Bug] Encountered 710 error when apply Torch-TensorRT to BERT" to "🐛 [Bug] Encountered cuda 710 error when apply Torch-TensorRT to BERT" on Oct 25, 2022
I can confirm the issue is still occurring on the latest commit (ce29cc), built with PyTorch 1.13, and I am investigating the cause. When only using the first two arguments (tokens_tensor and segments_tensors), the model is currently succeeding (compilation + inference) with my configuration, so it seems passing 3+ tensor arguments to the model is causing the error.
- Issue arises when compiling BERT models with 3+ inputs
- Added a temporary fix that narrows the range of values the random number generator may use when creating input tensors from [0, 5) to [0, 2)
- Random float inputs are now drawn from [0, 2) instead of ints, then cast to the desired type. With regard to bug pytorch#1418, the net effect is that random floats in [0, 2) are cast to Int, which restricts the generated ints to {0, 1}, as required by the model
- More robust fix to follow
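The effect of the range change described above can be illustrated with a minimal plain-Python sketch. This is not the Torch-TensorRT code: `sample_ids` is a hypothetical helper that only mirrors the arithmetic (draw a float in [0, high), then truncate to int), using the standard library instead of tensors.

```python
import random

def sample_ids(n, high, seed=0):
    """Hypothetical helper: draw n floats in [0, high) and truncate each to int."""
    rng = random.Random(seed)
    # rng.random() returns a float in [0.0, 1.0), so the product stays below `high`
    return [int(rng.random() * high) for _ in range(n)]

# With high=5 the ids can be 0..4, so some exceed the {0, 1} range the model accepts.
wide = sample_ids(1000, high=5)
print(sorted(set(wide)))

# With high=2 truncation leaves only 0 and 1.
narrow = sample_ids(1000, high=2)
print(sorted(set(narrow)))
```

Truncating a float from [0, 2) can only yield 0 or 1, which is why the narrower range avoids out-of-range ids for inputs that the model uses as indices.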
Bug Description
I wanted to use Torch-TensorRT to speed up BERT model inference, but ran into the following errors:
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
CUDA initialization failure with error: 710. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Segmentation fault (core dumped)
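The `srcIndex < srcSelectDimSize` assertion fires when an index passed to an index-select operation is not smaller than the size of the dimension being indexed. A quick way to check inputs before handing them to the model might look like the sketch below. It is plain Python and purely illustrative: `find_out_of_range` and the sample `segment_ids` are hypothetical, and the table size of 2 is an assumption taken from the commit note that the model requires ints in {0, 1}.

```python
def find_out_of_range(ids, table_size):
    """Return (position, value) pairs for ids that cannot index a table with table_size rows."""
    return [(i, v) for i, v in enumerate(ids) if not 0 <= v < table_size]

# Hypothetical segment ids as they might come out of a generator drawing from [0, 5)
segment_ids = [0, 1, 4, 1, 3]

# Assumption: the lookup table for this input has 2 rows, so only 0 and 1 are valid
print(find_out_of_range(segment_ids, table_size=2))  # [(2, 4), (4, 3)]
```

An empty result means every id is a valid row index; any entry returned would trip the same kind of device-side assertion shown in the log.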
To Reproduce
Environment
How you installed PyTorch (`conda`, `pip`, `libtorch`, source): pip
Additional context