
Assertion srcIndex < srcSelectDimSize failed. #46

Closed
SparkJiao opened this issue Nov 20, 2018 · 9 comments

Comments

@SparkJiao

Sorry to bother you.
I recently used your extract_features.py to extract features from a data set of my own, but it failed. The error output is as follows:
```
/opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [11,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "examples/extract_features.py", line 405, in <module>
    main()
  File "examples/extract_features.py", line 375, in main
    all_encoder_layers, _ = model(input_ids, token_type_ids=None, attention_mask=input_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 610, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 328, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 313, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 273, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 224, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/functional.py", line 1026, in linear
    output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCGeneral.cpp:333
```
It seems that an index_select call inside the model crashed. I read my own data from JSON files and construct examples from them. I set the batch size to 1 and set max_seq_length to the maximum length of the input sentences.
Thanks for your help!

@thomwolf
Member

Your log is very hard to read. Can you format it cleanly?

@SparkJiao
Author

SparkJiao commented Nov 20, 2018

I'm so sorry. The first error log is as follows:

```
/opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [11,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```

The traceback then ends at line 1026 of torch/nn/functional.py, in linear:

```
output = input.matmul(weight.t())
```

It seems that something crashed while using torch.index_select(). Do you think it is because my sentences are too long? I will check other aspects as well, thank you very much!
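
For reference, a quick way to check both suspects (too-long sequences and out-of-range token ids) is a sketch like the one below; the vocab size and position limit are the bert-base-uncased values, so they are an assumption about the checkpoint in use:

```python
import torch

VOCAB_SIZE = 30522    # bert-base-uncased vocabulary size (assumed checkpoint)
MAX_POSITIONS = 512   # bert-base max_position_embeddings (assumed checkpoint)

def check_batch(input_ids: torch.Tensor) -> None:
    """Flag the two conditions that make the embedding index_select assert."""
    seq_len = input_ids.size(1)
    if seq_len > MAX_POSITIONS:
        print(f"sequence length {seq_len} > {MAX_POSITIONS}: "
              f"the position-embedding lookup will index out of range")
    bad = input_ids[(input_ids < 0) | (input_ids >= VOCAB_SIZE)]
    if bad.numel() > 0:
        print(f"token ids outside the vocabulary: {bad.unique().tolist()}")
```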

@thomwolf
Member

It seems like a failed resource allocation.
Maybe you don't have enough RAM, or your GPU is too small?

@SparkJiao
Author

My GPU has 12400 MB, which I think is enough. Maybe I should use 'yield' to feed in the data one example at a time? I will try loading less data. Thanks a lot!
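
Something like this minimal sketch, feeding one small batch at a time from a generator (the input_ids/input_mask field names assume the InputFeatures built by extract_features.py, and all_features, model, and device stand in for whatever the script creates):

```python
import torch

def iter_batches(features, batch_size=1):
    """Yield (input_ids, input_mask) one small batch at a time."""
    for start in range(0, len(features), batch_size):
        chunk = features[start:start + batch_size]
        yield (torch.tensor([f.input_ids for f in chunk], dtype=torch.long),
               torch.tensor([f.input_mask for f in chunk], dtype=torch.long))

# Only one batch is materialized on the GPU at a time.
for input_ids, input_mask in iter_batches(all_features, batch_size=1):
    with torch.no_grad():
        all_encoder_layers, _ = model(input_ids.to(device), token_type_ids=None,
                                      attention_mask=input_mask.to(device))
```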

@thomwolf
Member

OK, feel free to re-open the issue if you still have trouble.

stevezheng23 added a commit to stevezheng23/transformers that referenced this issue Mar 24, 2020
add mat-coqa runner with multitask + adversarial training support
@zyfedward

Hi @SparkJiao

I've run into the same issue here. How did you resolve it?

@nv-quan

nv-quan commented Jun 3, 2020

I have the same issue. Did you resolve it? @zyfedward @SparkJiao

@LysandreJik
Member

@nv-quan, do you mind opening a new issue with the template so that we may help?

@SparkJiao
Author

I've forgotten how to reproduce the problem, but this index_select error usually happens because of an out-of-range index. You can use a smaller batch size and run the script on the CPU to check the full traceback: CUDA errors are reported asynchronously, so the traceback you get on GPU is delayed and points at the wrong line.
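
For example, a sketch of both checks (CUDA_LAUNCH_BLOCKING is a standard PyTorch/CUDA environment variable; model, input_ids, and input_mask are placeholders for whatever your script builds):

```python
import os
import torch

# Option 1: force synchronous kernel launches so the GPU traceback stops at
# the failing index_select instead of a later op. Set this before CUDA is
# initialized, e.g. CUDA_LAUNCH_BLOCKING=1 python examples/extract_features.py
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Option 2: run the same batch on CPU, where an out-of-range index raises
# immediately and the error message includes the offending index.
model_cpu = model.cpu()
with torch.no_grad():
    model_cpu(input_ids.cpu(), token_type_ids=None,
              attention_mask=input_mask.cpu())
```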

xloem pushed a commit to xloem/transformers that referenced this issue Apr 9, 2023
Update trainer and model flows to accommodate sparseml
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023
ocavue pushed a commit to ocavue/transformers that referenced this issue Sep 13, 2023