Adding accelerate to transformer models #404

Comments
We have many examples of this! Check out (most) any of the folders in transformers/examples/pytorch and look for scripts ending with no_trainer — they all use accelerate. For instance, this QA example: https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_qa_no_trainer.py
Hmmm, I think I miscommunicated. I would like to add support for accelerate to a model on the Hub (specifically, GPT-NeoX and GPT-J) that doesn't currently have it. When I try to run the models with accelerate it says
Hello, the error is due to
Thanks for the help! I'm now getting

05/29/2022 06:02:17 - INFO - __main__ - ***** Running training *****
05/29/2022 06:02:17 - INFO - __main__ - Num examples = 73997
05/29/2022 06:02:17 - INFO - __main__ - Num Epochs = 3
05/29/2022 06:02:17 - INFO - __main__ - Instantaneous batch size per device = 1
05/29/2022 06:02:17 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 8
05/29/2022 06:02:17 - INFO - __main__ - Gradient Accumulation steps = 1
05/29/2022 06:02:17 - INFO - __main__ - Total optimization steps = 27750
  0%|          | 0/27750 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "examples/pytorch/language-modeling/run_clm_no_trainer.py", line 648, in <module>
    main()
  File "examples/pytorch/language-modeling/run_clm_no_trainer.py", line 560, in main
    outputs = model(**batch)
  File "/home/mchorse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1616, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/mchorse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/mchorse/test/transformers/src/transformers/models/gpt_neox/modeling_gpt_neox.py", line 597, in forward
    outputs = self.gpt_neox(
  File "/home/mchorse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/mchorse/test/transformers/src/transformers/models/gpt_neox/modeling_gpt_neox.py", line 489, in forward
    outputs = layer(
  File "/home/mchorse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/mchorse/test/transformers/src/transformers/models/gpt_neox/modeling_gpt_neox.py", line 297, in forward
    attention_layer_outputs = self.attention(
  File "/home/mchorse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/mchorse/test/transformers/src/transformers/models/gpt_neox/modeling_gpt_neox.py", line 149, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/mchorse/test/transformers/src/transformers/models/gpt_neox/modeling_gpt_neox.py", line 209, in _attn
    raise RuntimeError()
RuntimeError
  0%|          | 0/27750 [00:01<?, ?it/s]

(each of the 8 worker processes prints the same traceback; only one copy is shown above), followed by

[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282026
[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282027
[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282028
[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282029
[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282030
[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282031
[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282032
[2022-05-29 06:02:23,480] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 282077
[2022-05-29 06:02:23,480] [ERROR] [launch.py:184:sigkill_handler] ['/usr/bin/python3', '-u', 'examples/pytorch/language-modeling/run_clm_no_trainer.py', '--model_name_or_path', 'EleutherAI/gpt-neox-20b', '--dataset_name', 'wikitext_tl39'] exits with return code = 1
Traceback (most recent call last):
  File "/home/mchorse/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/mchorse/.local/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/mchorse/.local/lib/python3.8/site-packages/accelerate/commands/launch.py", line 524, in launch_command
    deepspeed_launcher(args)
  File "/home/mchorse/.local/lib/python3.8/site-packages/accelerate/commands/launch.py", line 332, in deepspeed_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['deepspeed', '--no_local_rank', '--num_gpus', '8', 'examples/pytorch/language-modeling/run_clm_no_trainer.py', '--model_name_or_path', 'EleutherAI/gpt-neox-20b', '--dataset_name', 'wikitext_tl39']' returned non-zero exit status 1.

But I don't think that has anything to do with Accelerate.
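For context, the failing run corresponds to launching the no_trainer script through accelerate's DeepSpeed integration, roughly along these lines (the model and dataset flags are taken from the log above; the 8-GPU DeepSpeed setup comes from the accelerate config, which is assumed here):

```shell
# Assumes `accelerate config` was already run and DeepSpeed with 8 GPUs
# was selected there; accelerate then invokes the deepspeed launcher,
# as seen in the subprocess command in the traceback.
accelerate launch examples/pytorch/language-modeling/run_clm_no_trainer.py \
    --model_name_or_path EleutherAI/gpt-neox-20b \
    --dataset_name wikitext_tl39
```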
Note that for the first issue, you just need to implement the
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Is there a guide to adding accelerate support to models that are already implemented in the transformers library?