
Parameter at index 195 has been marked as ready twice. #23018

Closed
2 of 4 tasks
skye95git opened this issue Apr 27, 2023 · 7 comments

Comments

@skye95git

System Info

  • transformers version: 4.28.0
  • Platform: Linux-5.4.0-122-generic-x86_64-with-glibc2.31
  • Python version: 3.9.12
  • Huggingface_hub version: 0.13.4
  • Safetensors version: not installed
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: yes

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I retrained RoBERTa on my own corpus with the MLM task. I call model.gradient_checkpointing_enable() to save memory.

from transformers import RobertaModel

model = RobertaModel.from_pretrained(model_name_or_path, config=config)
model.gradient_checkpointing_enable()  # Activate gradient checkpointing
model = Model(model, config, tokenizer, args)

My model:

class Model(nn.Module):
    def __init__(self, model, config, tokenizer, args):
        super(Model, self).__init__()
        self.encoder = model
        self.config = config
        self.tokenizer = tokenizer
        self.args = args
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size)
        # Tie the LM head to the encoder's input embeddings
        self.lm_head.weight = self.encoder.embeddings.word_embeddings.weight
        self.register_buffer(
            "bias",
            torch.tril(torch.ones((args.block_size, args.block_size), dtype=torch.uint8)).view(
                1, args.block_size, args.block_size
            ),
        )

    def forward(self, mlm_ids):
...

There is an error:

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 195 with name encoder.encoder.layer.11.output.LayerNorm.weight has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration.

If I remove this line of code, model.gradient_checkpointing_enable(), everything works. Why?

Expected behavior

I want to pre-train with gradient_checkpointing.

@sgugger
Collaborator

sgugger commented Apr 27, 2023

There is little we can do to help without seeing a full reproducer.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this as completed Jun 5, 2023
@CrissBrian

Got the exact same bug when calling gradient_checkpointing_enable().

@mirix

mirix commented Oct 13, 2023

Are you using DDP?

I am using DDP on two GPUs:

python -m torch.distributed.run --nproc_per_node 2 run_audio_classification.py

(using run because launch fails)

All else being equal, facebook/wav2vec2-base works with gradient_checkpointing set to True; however, the large model crashes unless the option is either set to False or removed.

gradient_checkpointing works for both models if using a single GPU, so the issue seems to be DDP-related.

This seems to come from:

https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/reducer.cpp

@mirix

mirix commented Oct 13, 2023

The problem may be that, when the Trainer is invoked from torchrun, it sets find_unused_parameters to True for all devices when, apparently, it should only do so for the first one:

https://discuss.pytorch.org/t/finding-the-cause-of-runtimeerror-expected-to-mark-a-variable-ready-only-once/124428/3

And the reason the base model works is that the option can be set to False for it; for the large model, however, it has to be True.

The solution would be to change the way that argument is parsed.
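
For anyone hitting this outside of the Trainer, here is a minimal sketch (not from this thread) of how those flags look when wrapping a model in DDP by hand. The function name wrap_for_ddp and the device_id argument are placeholders, the process group is assumed to already be initialized, and static_graph as a constructor argument needs a reasonably recent PyTorch:

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model: nn.Module, device_id: int) -> DDP:
    # With find_unused_parameters=True, the reducer expects each parameter to be
    # marked ready at most once per iteration; reentrant gradient checkpointing
    # replays the forward pass and can mark it twice, which is exactly the
    # "marked as ready twice" error above.
    return DDP(
        model.to(device_id),
        device_ids=[device_id],
        find_unused_parameters=False,  # avoid the duplicate "ready" bookkeeping
        static_graph=True,             # the workaround the error message itself suggests
    )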

@infinitylogesh

infinitylogesh commented Nov 3, 2023

Thank you @mirix. Setting ddp_find_unused_parameters=False in Trainer solved this issue for me.
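
For reference, a minimal sketch of how that looks at the Trainer level; output_dir, model and train_dataset here are placeholders from my own setup, not from this issue:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # placeholder
    gradient_checkpointing=True,       # keep checkpointing for the memory savings
    ddp_find_unused_parameters=False,  # avoids the "marked as ready twice" error under DDP
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()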

@younesbelkada
Contributor

If you use gradient_checkpointing_enable(), you can now overcome this issue by passing gradient_checkpointing_kwargs={"use_reentrant": False}:

model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
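
Applied to the original reproduction above, that would look roughly like this (assuming a transformers release recent enough to accept gradient_checkpointing_kwargs; model_name_or_path, config, tokenizer, args and Model come from the original snippet):

from transformers import RobertaModel

model = RobertaModel.from_pretrained(model_name_or_path, config=config)
# Non-reentrant checkpointing avoids the duplicate autograd hooks under DDP
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
model = Model(model, config, tokenizer, args)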
