RuntimeError when testing using CoNLL2003 dataset #2

Open
f1amigo opened this issue Feb 9, 2023 · 4 comments

f1amigo commented Feb 9, 2023

Hi, I've followed the instructions in the README to test the model on the CoNLL2003 dataset. However, I ran into a RuntimeError when running python run_ner.py conf/conll03.json. Any help in resolving this issue would be appreciated.

Here is the full traceback from the error:
Traceback (most recent call last):
  File "/home/chenweiyi/Binder/run_ner.py", line 742, in <module>
    main()
  File "/home/chenweiyi/Binder/run_ner.py", line 683, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 1501, in train
    return inner_training_loop(
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 2508, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/transformers/trainer.py", line 2540, in compute_loss
    outputs = model(**inputs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/chenweiyi/.conda/envs/binder/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/chenweiyi/Binder/src/model.py", line 239, in forward
    start_negative_mask = ner["start_negative_mask"].view(batch_size * num_types, seq_length)
RuntimeError: shape '[16, 256]' is invalid for input of size 8192
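
For reference, the failing call is the .view() at line 239 of src/model.py, and the numbers in the message are self-consistent: 16 × 256 = 4096 expected elements versus 8192 received, i.e. exactly twice as many. Below is a minimal sketch that reproduces just the reshape using only the figures from the traceback (reading the 2× factor as the result of a two-GPU DataParallel split is an assumption on my part, not something confirmed yet):

import torch

# Figures taken from the traceback: the model reshapes the mask to
# (batch_size * num_types, seq_length) = (16, 256), i.e. 4096 elements,
# but the tensor it receives holds 8192 elements -- exactly twice as many.
mask = torch.ones(8192)
try:
    mask.view(16, 256)
except RuntimeError as e:
    print(e)  # shape '[16, 256]' is invalid for input of size 8192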

@mukurgupta

@f1amigo I'm also facing a similar error. Did you find a fix?

f1amigo commented Mar 27, 2023

@mukurgupta Unfortunately not. I suspect it may be related to GPU memory: the code runs after I switched to an RTX 3090, which has considerably more memory than the GPU I was using before.

@andrew-umjangyun

If you are using multiple GPUs, restricting the run to a single GPU will resolve this error.
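
For example (assuming a Linux shell and that device 0 is the GPU you want), limiting visibility to one device before launching should be enough for the Trainer to skip DataParallel:

CUDA_VISIBLE_DEVICES=0 python run_ner.py conf/conll03.json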

@mukurgupta

Yes, it runs fine on 1 GPU, but I couldn't find a fix for running it on multiple GPUs.
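
One direction I haven't verified: launching with torchrun should make the Hugging Face Trainer use DistributedDataParallel instead of DataParallel, which sidesteps the single-process input scattering that seems to trigger the shape mismatch, e.g.:

torchrun --nproc_per_node=2 run_ner.py conf/conll03.json

Whether that actually avoids the error in this repo is untested.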
