-
Notifications
You must be signed in to change notification settings - Fork 25.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assertion srcIndex < srcSelectDimSize
failed.
#46
Comments
Your log is very hard to read. Can you format it cleanly? |
I'm so sorry /opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [11,0,0], thread: [95,0,0] Assertion \`srcIndex < srcSelectDimSize\` failed. And then the Traceback finally points to line 1026 torch/nn/functional.py in linear: |
It seems like a failed resource allocation. |
My GPU has 12400 MB and I think that's enough, may be I should use 'yield' to input the data one by one? I will load less data to try, thanks u a lot! |
Ok feel free to re-open the issue if you still have troubles. |
add mat-coqa runner with multitask + adversarial training support
Hi @SparkJiao I met the same issue here, how did you resolve this? |
I have the same issue, did you resolve this? @zyfedward @SparkJiao |
@nv-quan, do you mind opening a new issue with the template so that we may help? |
I have forgot how to reproduce the problem but the |
* Update trainer and model flows to accommodate sparseml Disable FP16 on QAT start (huggingface#12) * Override LRScheduler when using LRModifiers * Disable FP16 on QAT start * keep wrapped scaler object for training after disabling Using QATMatMul in DistilBERT model class (huggingface#41) Removed double quantization of output of context layer. (huggingface#45) Fix DataParallel validation forward signatures (huggingface#47) * Fix: DataParallel validation forward signatures * Update: generalize forward_fn selection Best model after epoch (huggingface#46) fix sclaer check for non fp16 mode in trainer (huggingface#38) Mobilebert QAT (huggingface#55) * Remove duplicate quantization of vocabulary. enable a QATWrapper for non-parameterized matmuls in BERT self attention (huggingface#9) * Utils and auxillary changes update Zoo stub loading for SparseZoo 1.1 refactor (huggingface#54) add flag to signal NM integration is active (huggingface#32) Add recipe_name to file names * Fix errors introduced in manual cherry-pick upgrade Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Smart execution providers (Merges huggingface#35 into main)
Sorry to bother you
I recently have used your extract_features.py to extract features of some data set but failed. The error information is as follows:
/opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [11,0,0], thread: [95,0,0] Assertion
srcIndex < srcSelectDimSizefailed. Traceback (most recent call last): File "examples/extract_features.py", line 405, in <module> main() File "examples/extract_features.py", line 375, in main all_encoder_layers, _ = model(input_ids, token_type_ids=None, attention_mask=input_mask) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__ result = self.forward(*input, **kwargs) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 610, in forward output_all_encoded_layers=output_all_encoded_layers) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__ result = self.forward(*input, **kwargs) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 328, in forward hidden_states = layer_module(hidden_states, attention_mask) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__ result = self.forward(*input, **kwargs) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 313, in forward attention_output = self.attention(hidden_states, attention_mask) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__ result = self.forward(*input, **kwargs) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 273, in forward self_output = self.self(input_tensor, attention_mask) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__ result = self.forward(*input, **kwargs) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 224, in forward mixed_query_layer = self.query(hidden_states) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__ result = self.forward(*input, **kwargs) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 55, in forward return F.linear(input, self.weight, self.bias) File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/functional.py", line 1026, in linear output = input.matmul(weight.t()) RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCGeneral.cpp:333
It seems that the index_select function in the models crashed. I read my own data from json files and construct examples from them. I set the batch-size equals 1 and I modified the max_seq_length to the max_length of the input sentences.
Thanks for your help!
The text was updated successfully, but these errors were encountered: