RuntimeError when testing using CoNLL2003 dataset #2

Open
f1amigo opened this issue Feb 9, 2023 · 4 comments

f1amigo commented Feb 9, 2023

Hi, I've followed the instructions in the README to test the model on the CoNLL2003 dataset. However, I ran into a RuntimeError when running python run_ner.py conf/conll03.json. Any help in resolving this issue would be appreciated.

Here is the full traceback from the error:
Traceback (most recent call last):
  File "/home/chenweiyi/Binder/run_ner.py", line 742, in <module>
    main()
  File "/home/chenweiyi/Binder/run_ner.py", line 683, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 1501, in train
    return inner_training_loop(
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 2508, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 2540, in compute_loss
    outputs = model(**inputs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/chenweiyi/Binder/src/model.py", line 239, in forward
    start_negative_mask = ner["start_negative_mask"].view(batch_size * num_types, seq_length)
RuntimeError: shape '[16, 256]' is invalid for input of size 8192
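
For reference, the failing call is the .view() at line 239 of src/model.py, and the numbers in the message are self-consistent: 16 × 256 = 4096 expected elements versus 8192 received, i.e. exactly twice as many. Below is a minimal sketch that reproduces just the reshape using only the figures from the traceback (reading the 2× factor as the result of a two-GPU DataParallel split is an assumption on my part, not something confirmed yet):

import torch

# Figures taken from the traceback: the model reshapes the mask to
# (batch_size * num_types, seq_length) = (16, 256), i.e. 4096 elements,
# but the tensor it receives holds 8192 elements -- exactly twice as many.
mask = torch.ones(8192)
try:
    mask.view(16, 256)
except RuntimeError as e:
    print(e)  # shape '[16, 256]' is invalid for input of size 8192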

@mukurgupta

@f1amigo I'm also facing a similar error. Did you find a fix?

f1amigo commented Mar 27, 2023

@mukurgupta Unfortunately not. I suspect it may be related to GPU memory: the code runs after I switched to an RTX 3090, which has considerably more memory than the GPU I was using before.

@andrew-umjangyun

If you are using multiple GPUs, restricting the run to a single GPU will resolve this error.
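
For example (assuming a Linux shell and that device 0 is the GPU you want), limiting visibility to one device before launching should be enough for the Trainer to skip DataParallel:

CUDA_VISIBLE_DEVICES=0 python run_ner.py conf/conll03.json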

@mukurgupta

Yes, it runs fine on 1 GPU, but I couldn't find a fix for running it on multiple GPUs.
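
One direction I haven't verified: launching with torchrun should make the Hugging Face Trainer use DistributedDataParallel instead of DataParallel, which sidesteps the single-process input scattering that seems to trigger the shape mismatch, e.g.:

torchrun --nproc_per_node=2 run_ner.py conf/conll03.json

Whether that actually avoids the error in this repo is untested.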
