
Pretraining Wav2Vec2: parameters did not receive grad for rank 0 #17116

Closed
2 of 4 tasks
Slyne opened this issue May 6, 2022 · 4 comments
Comments

Slyne commented May 6, 2022

System Info

Following this example and running with the base model seems OK.

But when switching to the large model, the model name is './', which I guess is a typo. So I tried facebook/wav2vec2-large-xlsr-53 and facebook/wav2vec2-large-lv60, and both give:

Parameters which did not receive grad for rank 0: wav2vec2.encoder.layers.16.final_layer_norm.bias, wav2vec2.encoder.layers.16.final_layer_norm.weight, wav2vec2.encoder.layers.16.feed_forward.output_dense.bias, wav2vec2.encoder.layers.16.feed_forward.output_dense.weight, wav2vec2.encoder.layers.16.feed_forward.intermediate_dense.bias, wav2vec2.encoder.layers.16.feed_forward.intermediate_dense.weight, wav2vec2.encoder.layers.16.layer_norm.bias, wav2vec2.encoder.layers.16.layer_norm.weight, wav2vec2.encoder.layers.16.attention.out_proj.bias, wav2vec2.encoder.layers.16.attention.out_proj.weight, wav2vec2.encoder.layers.16.attention.q_proj.bias, wav2vec2.encoder.layers.16.attention.q_proj.weight, wav2vec2.encoder.layers.16.attention.v_proj.bias, wav2vec2.encoder.layers.16.attention.v_proj.weight, wav2vec2.encoder.layers.16.attention.k_proj.bias, wav2vec2.encoder.layers.16.attention.k_proj.weight, wav2vec2.encoder.layers.15.final_layer_norm.bias, wav2vec2.encoder.layers.15.final_layer_norm.weight, wav2vec2.encoder.layers.15.feed_forward.output_dense.bias, wav2vec2.encoder.layers.15.feed_forward.output_dense.weight, wav2vec2.encoder.layers.15.feed_forward.intermediate_dense.bias, wav2vec2.encoder.layers.15.feed_forward.intermediate_dense.weight, wav2vec2.encoder.layers.15.layer_norm.bias, wav2vec2.encoder.layers.15.layer_norm.weight, wav2vec2.encoder.layers.15.attention.out_proj.bias, wav2vec2.encoder.layers.15.attention.out_proj.weight, wav2vec2.encoder.layers.15.attention.q_proj.bias, wav2vec2.encoder.layers.15.attention.q_proj.weight, wav2vec2.encoder.layers.15.attention.v_proj.bias, wav2vec2.encoder.layers.15.attention.v_proj.weight, wav2vec2.encoder.layers.15.attention.k_proj.bias, wav2vec2.encoder.layers.15.attention.k_proj.weight, wav2vec2.encoder.layers.14.final_layer_norm.bias, wav2vec2.encoder.layers.14.final_layer_norm.weight, wav2vec2.encoder.layers.14.feed_forward.output_dense.bias, wav2vec2.encoder.layers.14.feed_forward.output_dense.weight, wav2vec2.encoder.layers.14.feed_forward.intermediate_dense.bias, wav2vec2.encoder.layers.14.feed_forward.intermediate_dense.weight, wav2vec2.encoder.layers.14.layer_norm.bias, wav2vec2.encoder.layers.14.layer_norm.weight, wav2vec2.encoder.layers.14.attention.out_proj.bias, wav2vec2.encoder.layers.14.attention.out_proj.weight, wav2vec2.encoder.layers.14.attention.q_proj.bias, wav2vec2.encoder.layers.14.attention.q_proj.weight, wav2vec2.encoder.layers.14.attention.v_proj.bias, wav2vec2.encoder.layers.14.attention.v_proj.weight, wav2vec2.encoder.layers.14.attention.k_proj.bias, wav2vec2.encoder.layers.14.attention.k_proj.weight
Parameter indices which did not receive grad for rank 0: 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309

Who can help?

@patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Follow https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-pretraining#large

Expected behavior

Expect no such errors!
Slyne added the bug label May 6, 2022
patrickvonplaten (Contributor) commented
Hey @Slyne,

Could you please make sure to set the layerdrop parameter to 0.0 in distributed settings? This value: https://huggingface.co/facebook/wav2vec2-large-xlsr-53/blob/main/config.json#L57 needs to be set to 0.0 before pretraining.
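
For reference, a minimal sketch of overriding that value when loading the checkpoint in your own script. `Wav2Vec2Config` and `Wav2Vec2ForPreTraining` are standard transformers classes, but how this plugs into the official pretraining script is an assumption:

```python
# Sketch: disable LayerDrop so every encoder layer runs in each forward pass
# and the DDP reducer sees gradients for all parameters.
from transformers import Wav2Vec2Config, Wav2Vec2ForPreTraining

config = Wav2Vec2Config.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    layerdrop=0.0,  # overrides the 0.1 default from the hub config
)
model = Wav2Vec2ForPreTraining.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    config=config,
)
```

Whether you then continue from the checkpoint weights or re-initialize from the config alone depends on your pretraining setup.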

Slyne (Author) commented May 10, 2022

@patrickvonplaten Thanks! It works now. I just found that this parameter seems to apply some regularization to the model structure. Will Hugging Face support this trick?
Do you mind fixing that typo and this parameter, and then closing this issue?

Thanks!!!

patrickvonplaten (Contributor) commented
Hey @Slyne,

It's quite difficult to get layerdrop working in distributed settings. Regarding the typo, feel free to open a PR to show what could be fixed if you want :-)

Thanks!
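
For context on why the error appears: LayerDrop randomly skips whole encoder layers in a given forward pass, so the skipped layers' parameters receive no gradient on that step, and DDP's default reducer (find_unused_parameters=False) then raises exactly the error quoted above. A hedged sketch of the usual DDP-level workaround, assuming you build the Accelerator yourself (it adds overhead and may still not cover every case mentioned above):

```python
# Sketch: let the DDP reducer tolerate parameters that received no gradient
# in a given step (e.g. layers skipped by LayerDrop).
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```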

github-actions bot commented Jun 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
