
distributed launch raises Error #7160

Closed · xixiaoyao opened this issue Sep 16, 2020 · 6 comments

@xixiaoyao commented Sep 16, 2020

Environment info

  • transformers version:
  • Platform: linux
  • Python version: 3.7
  • PyTorch version (GPU?): 1.6.5
  • Tensorflow version (GPU?):
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: distributed on a single node with 4 gpu cards

Who can help

Longformer/Reformer: @patrickvonplaten

Information

Model I am using: LongformerForQuestionAnswering

The problem arises when using:

  • [x] the official example scripts: run_squad.py (details below)
  • [ ] my own modified scripts

The task I am working on is:

  • [x] an official GLUE/SQuAD task
  • [ ] my own task or dataset

To reproduce

Steps to reproduce the behavior:

  1. Run run_squad.py with a Longformer model and with fp16 and gradient_checkpointing enabled (the full command appears in the CalledProcessError at the end of the log below). The run fails with:
09/16/2020 06:04:57 - INFO - __main__ -   ***** Running training *****
09/16/2020 06:04:57 - INFO - __main__ -     Num examples = 800
09/16/2020 06:04:57 - INFO - __main__ -     Num Epochs = 2
09/16/2020 06:04:57 - INFO - __main__ -     Instantaneous batch size per GPU = 6
09/16/2020 06:04:57 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 24
09/16/2020 06:04:57 - INFO - __main__ -     Gradient Accumulation steps = 1
09/16/2020 06:04:57 - INFO - __main__ -     Total optimization steps = 68
09/16/2020 06:04:57 - INFO - __main__ -     Starting fine-tuning.
Epoch:   0%|                                                                                                             | 0/2 [00:00<?, ?it/s]
/opt/conda/lib/python3.7/site-packages/transformers/modeling_longformer.py:72: UserWarning: This overload of nonzero is deprecated:
	nonzero()
Consider using one of the following signatures instead:
	nonzero(*, bool as_tuple) (Triggered internally at  /opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/utils/python_arg_parser.cpp:766.)
  sep_token_indices = (input_ids == sep_token_id).nonzero()
[The same UserWarning is emitted by each of the other three worker processes; the duplicate copies are omitted.]
Traceback (most recent call last):
  File "run_squad.py", line 839, in <module>
    main()
  File "run_squad.py", line 780, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_squad.py", line 213, in train
    scaled_loss.backward()
  File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
Exception raised from mark_variable_ready at /opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/distributed/c10d/reducer.cpp:453 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f7ace01177d in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::mark_variable_ready(c10d::Reducer::VariableIndex) + 0x4cd (0x7f7b07e1239d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: c10d::Reducer::autograd_hook(c10d::Reducer::VariableIndex) + 0xeb (0x7f7b07e12bdb in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0xabdd16 (0x7f7b07e12d16 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0xac4dc6 (0x7f7b07e19dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x4dd (0x7f7b0355693d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f7b03558401 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>) + 0x25c (0x7f7b035559fc in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::python::PythonEngine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>) + 0x3c (0x7f7b0787fdcc in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0x803 (0x7f7b03554e53 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::python::PythonEngine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0x4e (0x7f7b0787fbbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #11: THPEngine_run_backward(THPEngine*, _object*, _object*) + 0xa29 (0x7f7b07880889 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: _PyMethodDef_RawFastCallKeywords + 0x306 (0x7f7b2b5e2d36 in /opt/conda/bin/python)
frame #13: _PyCFunction_FastCallKeywords + 0x21 (0x7f7b2b5e2db1 in /opt/conda/bin/python)
frame #14: _PyEval_EvalFrameDefault + 0x52b5 (0x7f7b2b64ea85 in /opt/conda/bin/python)
frame #15: _PyEval_EvalCodeWithName + 0x2f9 (0x7f7b2b5922b9 in /opt/conda/bin/python)
frame #16: _PyFunction_FastCallKeywords + 0x325 (0x7f7b2b5e2435 in /opt/conda/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x4a59 (0x7f7b2b64e229 in /opt/conda/bin/python)
frame #18: _PyEval_EvalCodeWithName + 0x2f9 (0x7f7b2b5922b9 in /opt/conda/bin/python)
frame #19: _PyFunction_FastCallDict + 0x1d5 (0x7f7b2b5933e5 in /opt/conda/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x1d4a (0x7f7b2b64b51a in /opt/conda/bin/python)
frame #21: _PyEval_EvalCodeWithName + 0x2f9 (0x7f7b2b5922b9 in /opt/conda/bin/python)
frame #22: _PyFunction_FastCallDict + 0x1d5 (0x7f7b2b5933e5 in /opt/conda/bin/python)
frame #23: _PyObject_Call_Prepend + 0x63 (0x7f7b2b5b1b93 in /opt/conda/bin/python)
frame #24: PyObject_Call + 0x6e (0x7f7b2b5a495e in /opt/conda/bin/python)
frame #25: torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x183 (0x7f7b07888033 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: <unknown function> + 0x30d1017 (0x7f7b0355c017 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #27: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7f7b03557860 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #28: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f7b03558401 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #29: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f7b03550579 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #30: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7f7b0787f99a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #31: <unknown function> + 0xc819d (0x7f7b0a3b619d in /opt/conda/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #32: <unknown function> + 0x76db (0x7f7b2b03b6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #33: clone + 0x3f (0x7f7b2ad6488f in /lib/x86_64-linux-gnu/libc.so.6)

Iteration:   0%|                                                                                                        | 0/34 [00:00<?, ?it/s]
Epoch:   0%|                                                                                                             | 0/2 [00:00<?, ?it/s]
[The same RuntimeError and C++ stack trace are raised in the other three worker processes; their near-identical tracebacks are omitted.]

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'run_squad.py', '--local_rank=3', '--model_type', 'longformer', '--do_train', '--model_name_or_path', 'longformer-base-len4K', '--do_eval', '--do_lower_case', '--threads', '30', '--fp16', '--eval_all_checkpoints', '--save_steps', '2500', '--train_file', './data/marco_v1.0/train.json.demo', '--predict_file', './data/marco_v1.0/dev.json', '--per_gpu_train_batch_size', '6', '--learning_rate', '3e-5', '--num_train_epochs', '2', '--max_seq_length', '2048', '--doc_stride', '1024', '--output_dir', 'output/marco_pyramidlocalatt']' returned non-zero exit status 1.
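
For context on where the error comes from: the legacy run_squad.py wraps the model in DistributedDataParallel with find_unused_parameters=True (if I remember the script correctly), and torch.utils.checkpoint replays the checkpointed segment inside a nested, re-entrant backward, so DDP's per-parameter reducer hooks can fire more than once per step. The toy script below is only a sketch of that conflict, not the actual run_squad.py code; the module sizes, port, and gloo/CPU setup are made up for illustration, and I believe it hits the same "Expected to mark a variable ready only once" error on PyTorch 1.6.

# Hypothetical minimal sketch of the DDP + re-entrant gradient-checkpointing conflict.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 8)

    def forward(self, x):
        # The checkpointed segment runs under no_grad here and is replayed in a
        # nested backward, where the parameter gradient hooks fire.
        return checkpoint(self.layer, x)

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(Net(), find_unused_parameters=True)  # the flag the legacy run_squad.py also passes
    x = torch.randn(4, 8, requires_grad=True)
    model(x).sum().backward()  # expected to raise the reducer error shown in the log above
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)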

Expected behavior

Training runs to completion under torch.distributed.launch, without the DDP "Expected to mark a variable ready only once" error.

@patrickvonplaten (Contributor)

Hey @xixiaoyao,

Could you please copy-paste the command you used to run run_squad.py, so that I can be 100% sure we are running the same command? How did you enable gradient_checkpointing? Did you change the run_squad.py script?

Would be great if you can copy-paste a runnable code snippet here :-)

@mitchelldehaven

I am getting the same issue when using Reformer with PyTorch Lightning's DistributedDataParallel, although I am not using one of the official training scripts.

@trias702

I am also getting this exact same error with Reformer, but only when I wrap it in DDP and train across multiple GPUs on the same box. I do not get this error with Longformer, and if I don't use DDP with Reformer it works fine. I am doing vanilla autoregressive language-model training with a custom script, which works fine on a single GPU without DDP. The error seems to indicate that there is something about Reformer which DDP does not yet support:

"RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet."

I am using transformers 3.5.1.
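
As far as I understand it, Reformer hits this even without gradient_checkpointing because its reversible residual layers are implemented with a custom autograd.Function that re-runs the layer forward inside backward to reconstruct activations, so the parameter hooks fire in a nested backward exactly as with torch.utils.checkpoint. The toy Function below only mimics that pattern; it is not the actual modeling_reformer.py code, and the names are made up for illustration.

# Illustrative sketch of the recompute-in-backward pattern that reversible
# layers and torch.utils.checkpoint both rely on.
import torch

class RecomputeInBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, layer):
        ctx.layer = layer
        ctx.save_for_backward(x)
        with torch.no_grad():          # intermediate activations are not kept ...
            return layer(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():      # ... so the forward is replayed here
            x = x.detach().requires_grad_(True)
            y = ctx.layer(x)
            y.backward(grad_out)       # nested backward: the parameter grad hooks
                                       # fire inside this call, which DDP's reducer
                                       # does not expect
        return x.grad, None

A DDP-wrapped module whose forward calls RecomputeInBackward.apply(x, self.layer) should, as far as I can tell, fail the same way as the Longformer + gradient_checkpointing case above.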

@yuanenming

(quoting @trias702's comment above in full)

Same issue here as @trias702.

@github-actions

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

@ShaneTian

(quoting @yuanenming's reply above, which itself quotes @trias702)

Same issue here as @trias702 and @yuanenming. Any updates, @xixiaoyao?
