
Assertion srcIndex < srcSelectDimSize failed. #46

Closed
SparkJiao opened this issue Nov 20, 2018 · 9 comments

Comments

@SparkJiao

Sorry to bother you.
I recently used your extract_features.py to extract features from a data set of my own, but it failed. The error output is as follows:
```
/opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [11,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "examples/extract_features.py", line 405, in <module>
    main()
  File "examples/extract_features.py", line 375, in main
    all_encoder_layers, _ = model(input_ids, token_type_ids=None, attention_mask=input_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 610, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 328, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 313, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 273, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 224, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/functional.py", line 1026, in linear
    output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCGeneral.cpp:333
```
It seems that an index_select call inside the model crashed. I read my own data from JSON files and construct examples from them. I set the batch size to 1 and set max_seq_length to the maximum length of the input sentences.
Thanks for your help!

@thomwolf
Member

Your log is very hard to read. Can you format it cleanly?

@SparkJiao
Author

SparkJiao commented Nov 20, 2018

I'm so sorry. The first error log is as follows:

```
/opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [11,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```

The traceback then ends at line 1026 of torch/nn/functional.py, in linear:

```
output = input.matmul(weight.t())
```

It seems that something crashed while using torch.index_select(). Do you think it is because my sentences are too long? I will check other aspects as well, thank you very much!
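
For reference, a quick way to check both suspects (too-long sequences and out-of-range token ids) is a sketch like the one below; the vocab size and position limit are the bert-base-uncased values, so they are an assumption about the checkpoint in use:

```python
import torch

VOCAB_SIZE = 30522    # bert-base-uncased vocabulary size (assumed checkpoint)
MAX_POSITIONS = 512   # bert-base max_position_embeddings (assumed checkpoint)

def check_batch(input_ids: torch.Tensor) -> None:
    """Flag the two conditions that make the embedding index_select assert."""
    seq_len = input_ids.size(1)
    if seq_len > MAX_POSITIONS:
        print(f"sequence length {seq_len} > {MAX_POSITIONS}: "
              f"the position-embedding lookup will index out of range")
    bad = input_ids[(input_ids < 0) | (input_ids >= VOCAB_SIZE)]
    if bad.numel() > 0:
        print(f"token ids outside the vocabulary: {bad.unique().tolist()}")
```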

@thomwolf
Member

It seems like a failed resource allocation.
Maybe you don't have enough RAM, or your GPU is too small?

@SparkJiao
Author

My GPU has 12400 MB, which I think is enough. Maybe I should use 'yield' to feed in the data one example at a time? I will try loading less data. Thanks a lot!
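
Something like this minimal sketch, feeding one small batch at a time from a generator (the input_ids/input_mask field names assume the InputFeatures built by extract_features.py, and all_features, model, and device stand in for whatever the script creates):

```python
import torch

def iter_batches(features, batch_size=1):
    """Yield (input_ids, input_mask) one small batch at a time."""
    for start in range(0, len(features), batch_size):
        chunk = features[start:start + batch_size]
        yield (torch.tensor([f.input_ids for f in chunk], dtype=torch.long),
               torch.tensor([f.input_mask for f in chunk], dtype=torch.long))

# Only one batch is materialized on the GPU at a time.
for input_ids, input_mask in iter_batches(all_features, batch_size=1):
    with torch.no_grad():
        all_encoder_layers, _ = model(input_ids.to(device), token_type_ids=None,
                                      attention_mask=input_mask.to(device))
```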

@thomwolf
Member

OK, feel free to re-open the issue if you still have trouble.

stevezheng23 added a commit to stevezheng23/transformers that referenced this issue Mar 24, 2020
add mat-coqa runner with multitask + adversarial training support
@zyfedward

Hi @SparkJiao

I've run into the same issue here. How did you resolve it?

@nv-quan

nv-quan commented Jun 3, 2020

I have the same issue. Did you resolve it? @zyfedward @SparkJiao

@LysandreJik
Member

@nv-quan, do you mind opening a new issue with the template so that we may help?

@SparkJiao
Author

I've forgotten how to reproduce the problem, but this index_select error usually happens because of an out-of-range index. You can use a smaller batch size and run the script on the CPU to check the full traceback: CUDA errors are reported asynchronously, so the traceback you get on GPU is delayed and points at the wrong line.
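
For example, a sketch of both checks (CUDA_LAUNCH_BLOCKING is a standard PyTorch/CUDA environment variable; model, input_ids, and input_mask are placeholders for whatever your script builds):

```python
import os
import torch

# Option 1: force synchronous kernel launches so the GPU traceback stops at
# the failing index_select instead of a later op. Set this before CUDA is
# initialized, e.g. CUDA_LAUNCH_BLOCKING=1 python examples/extract_features.py
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Option 2: run the same batch on CPU, where an out-of-range index raises
# immediately and the error message includes the offending index.
model_cpu = model.cpu()
with torch.no_grad():
    model_cpu(input_ids.cpu(), token_type_ids=None,
              attention_mask=input_mask.cpu())
```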

xloem pushed a commit to xloem/transformers that referenced this issue Apr 9, 2023
Update trainer and model flows to accommodate sparseml
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023
ocavue pushed a commit to ocavue/transformers that referenced this issue Sep 13, 2023