
Pretraining Wav2Vec2: parameters did not receive grad for rank 0 #17116

Closed
2 of 4 tasks
Slyne opened this issue May 6, 2022 · 4 comments
Comments

Slyne commented May 6, 2022

System Info

Following this example and running with the base model seems OK.

But when switching to the large model, the model name is './', which I guess is a typo. So I tried facebook/wav2vec2-large-xlsr-53 and facebook/wav2vec2-large-lv60, and both give:

Parameters which did not receive grad for rank 0: wav2vec2.encoder.layers.16.final_layer_norm.bias, wav2vec2.encoder.layers.16.final_layer_norm.weight, wav2vec2.encoder.layers.16.feed_forward.output_dense.bias, wav2vec2.encoder.layers.16.feed_forward.output_dense.weight, wav2vec2.encoder.layers.16.feed_forward.intermediate_dense.bias, wav2vec2.encoder.layers.16.feed_forward.intermediate_dense.weight, wav2vec2.encoder.layers.16.layer_norm.bias, wav2vec2.encoder.layers.16.layer_norm.weight, wav2vec2.encoder.layers.16.attention.out_proj.bias, wav2vec2.encoder.layers.16.attention.out_proj.weight, wav2vec2.encoder.layers.16.attention.q_proj.bias, wav2vec2.encoder.layers.16.attention.q_proj.weight, wav2vec2.encoder.layers.16.attention.v_proj.bias, wav2vec2.encoder.layers.16.attention.v_proj.weight, wav2vec2.encoder.layers.16.attention.k_proj.bias, wav2vec2.encoder.layers.16.attention.k_proj.weight, wav2vec2.encoder.layers.15.final_layer_norm.bias, wav2vec2.encoder.layers.15.final_layer_norm.weight, wav2vec2.encoder.layers.15.feed_forward.output_dense.bias, wav2vec2.encoder.layers.15.feed_forward.output_dense.weight, wav2vec2.encoder.layers.15.feed_forward.intermediate_dense.bias, wav2vec2.encoder.layers.15.feed_forward.intermediate_dense.weight, wav2vec2.encoder.layers.15.layer_norm.bias, wav2vec2.encoder.layers.15.layer_norm.weight, wav2vec2.encoder.layers.15.attention.out_proj.bias, wav2vec2.encoder.layers.15.attention.out_proj.weight, wav2vec2.encoder.layers.15.attention.q_proj.bias, wav2vec2.encoder.layers.15.attention.q_proj.weight, wav2vec2.encoder.layers.15.attention.v_proj.bias, wav2vec2.encoder.layers.15.attention.v_proj.weight, wav2vec2.encoder.layers.15.attention.k_proj.bias, wav2vec2.encoder.layers.15.attention.k_proj.weight, wav2vec2.encoder.layers.14.final_layer_norm.bias, wav2vec2.encoder.layers.14.final_layer_norm.weight, wav2vec2.encoder.layers.14.feed_forward.output_dense.bias, wav2vec2.encoder.layers.14.feed_forward.output_dense.weight, wav2vec2.encoder.layers.14.feed_forward.intermediate_dense.bias, wav2vec2.encoder.layers.14.feed_forward.intermediate_dense.weight, wav2vec2.encoder.layers.14.layer_norm.bias, wav2vec2.encoder.layers.14.layer_norm.weight, wav2vec2.encoder.layers.14.attention.out_proj.bias, wav2vec2.encoder.layers.14.attention.out_proj.weight, wav2vec2.encoder.layers.14.attention.q_proj.bias, wav2vec2.encoder.layers.14.attention.q_proj.weight, wav2vec2.encoder.layers.14.attention.v_proj.bias, wav2vec2.encoder.layers.14.attention.v_proj.weight, wav2vec2.encoder.layers.14.attention.k_proj.bias, wav2vec2.encoder.layers.14.attention.k_proj.weight
Parameter indices which did not receive grad for rank 0: 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309

Who can help?

@patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Follow https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-pretraining#large

Expected behavior

Expect no such errors!
Slyne added the bug label May 6, 2022
patrickvonplaten (Contributor) commented
Hey @Slyne,

Could you please make sure to set the layerdrop parameter to 0.0 in distributed settings? This value: https://huggingface.co/facebook/wav2vec2-large-xlsr-53/blob/main/config.json#L57 needs to be set to 0.0 before pretraining.
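
For reference, a minimal sketch of overriding that value when loading the checkpoint in your own script. `Wav2Vec2Config` and `Wav2Vec2ForPreTraining` are standard transformers classes, but how this plugs into the official pretraining script is an assumption:

```python
# Sketch: disable LayerDrop so every encoder layer runs in each forward pass
# and the DDP reducer sees gradients for all parameters.
from transformers import Wav2Vec2Config, Wav2Vec2ForPreTraining

config = Wav2Vec2Config.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    layerdrop=0.0,  # overrides the 0.1 default from the hub config
)
model = Wav2Vec2ForPreTraining.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    config=config,
)
```

Whether you then continue from the checkpoint weights or re-initialize from the config alone depends on your pretraining setup.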

Slyne (Author) commented May 10, 2022

@patrickvonplaten Thanks! It works now. I just found that this parameter seems to apply some regularization to the model structure. Will Hugging Face support this trick?
Do you mind fixing that typo and this parameter, and then closing this issue?

Thanks!!!

patrickvonplaten (Contributor) commented
Hey @Slyne,

It's quite difficult to get layerdrop working in distributed settings. Regarding the typo, feel free to open a PR to show what could be fixed if you want :-)

Thanks!
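
For context on why the error appears: LayerDrop randomly skips whole encoder layers in a given forward pass, so the skipped layers' parameters receive no gradient on that step, and DDP's default reducer (find_unused_parameters=False) then raises exactly the error quoted above. A hedged sketch of the usual DDP-level workaround, assuming you build the Accelerator yourself (it adds overhead and may still not cover every case mentioned above):

```python
# Sketch: let the DDP reducer tolerate parameters that received no gradient
# in a given step (e.g. layers skipped by LayerDrop).
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```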

github-actions bot commented Jun 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
