Wav2Vec2 diversity loss problem #3673

Closed
PeiyuChen1005 opened this issue Jul 1, 2021 · 6 comments

❓ Questions and Help

Before asking:

  1. Search the issues.
     Related issue: wav2vec 2.0: L2 penalty on features #3315

What is your question?

I've found that the diversity loss weight in wav2vec 2.0 is 0.1. A question about why this weight is so low was raised in issue #3315, but no answer was provided. So my first question is the same: why is such a low weight assigned to the diversity loss?
I've also tried giving different weights to this loss term, such as 0.5 and 1.0, and I got odd loss curves like these:
[Figures: diversity loss curves for weight = 0.1 (div1), weight = 0.5 (div5), and weight = 1.0 (div10), plus all curves overlaid (all)]
The diversity loss always rises sharply in the first few epochs (and never comes back down to its initial value). Is this normal, or has something gone wrong? Is it because my training data is too small? Is there anything that can help me understand the codebook? How can I set the number of codebooks correctly?
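For readers following along, here is a minimal sketch of how the diversity term and its perplexity are computed, paraphrasing fairseq's GumbelVectorQuantizer and Wav2vecCriterion (function and argument names here are illustrative, not fairseq's API):

```python
import torch

def diversity_loss(code_logits, num_groups, num_vars):
    """Sketch of the wav2vec 2.0 diversity objective.

    code_logits: (batch * time, num_groups, num_vars) quantizer logits.
    Returns (loss, perplexity). perplexity reaches its maximum,
    num_groups * num_vars, when every codebook entry is used equally,
    at which point the loss is 0.
    """
    # average soft code distribution over the batch, per group: (G, V)
    avg_probs = torch.softmax(code_logits.float(), dim=-1).mean(dim=0)
    # exp(entropy) per group = effective number of codes in use
    perplexity = torch.exp(
        -torch.sum(avg_probs * torch.log(avg_probs + 1e-7), dim=-1)
    ).sum()
    total = num_groups * num_vars
    loss = (total - perplexity) / total
    return loss, perplexity
```

This term is then added to the contrastive loss with a small coefficient (the 0.1 weight the question asks about).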

What's your environment?

  • fairseq Version (e.g., 1.0 or master): master
  • PyTorch Version (e.g., 1.0) 1.8.1
  • OS (e.g., Linux): Linux(Ubuntu)
  • How you installed fairseq (pip, source): source
  • Python version: 3.8.8
  • CUDA/cuDNN version: 11.1
  • GPU models and configuration: NVIDIA-A100

Any help is appreciated~~~


alexeib commented Jul 1, 2021

A diversity loss weight of 0.1 is enough to ensure that a large portion of the codebook is used. You can try other values and monitor code_perplexity to see what percentage of the codebook is used (the maximum value is num latent groups * num latent vars). The actual value of the diversity loss doesn't matter; it exists to ensure sufficient codebook use and to promote exploration in the early training phase.
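A sketch of the usage check described above, assuming you have the hard one-hot code assignments from the quantizer (names are illustrative; during fairseq training, code_perplexity is already logged for you):

```python
import torch

def codebook_usage(one_hot_codes, num_groups, num_vars):
    """Sketch: fraction of the codebook effectively in use.

    one_hot_codes: (batch * time, num_groups, num_vars) hard one-hot
    code assignments (argmax of the quantizer logits).
    """
    # per-entry selection frequencies, averaged over the batch: (G, V)
    hard_probs = one_hot_codes.float().mean(dim=0)
    # exp(entropy) of the hard assignments, summed over groups; this
    # mirrors what fairseq reports as code_perplexity
    code_perplexity = torch.exp(
        -torch.sum(hard_probs * torch.log(hard_probs + 1e-7), dim=-1)
    ).sum()
    return code_perplexity / (num_groups * num_vars)  # 1.0 = full, uniform use
```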


PeiyuChen1005 commented Jul 1, 2021

> A diversity loss weight of 0.1 is enough to ensure that a large portion of the codebook is used. You can try other values and monitor code_perplexity to see what percentage of the codebook is used (the maximum value is num latent groups * num latent vars). The actual value of the diversity loss doesn't matter; it exists to ensure sufficient codebook use and to promote exploration in the early training phase.

So does that mean it doesn't matter how large the diversity loss is, as long as most of the codebook is used? (Monitoring via code_perplexity: is the percentage of the codebook used equal to code_perplexity / (num latent groups * num latent vars)?)


alexeib commented Jul 1, 2021

> So does that mean it doesn't matter how large the diversity loss is, as long as most of the codebook is used? (Monitoring via code_perplexity: is the percentage of the codebook used equal to code_perplexity / (num latent groups * num latent vars)?)

Yes. Too high a coefficient can also hurt the main objective.
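To make the trade-off concrete, a toy sketch of how the coefficient enters the overall objective, mirroring fairseq's loss_weights option (the numeric loss values are dummies, and the [0.1, 10] weighting is an assumption based on the base pre-training config, diversity term first):

```python
import torch

# Dummy loss values, for illustration only.
contrastive_loss = torch.tensor(4.2)
extra_losses = [torch.tensor(0.3),   # diversity loss
                torch.tensor(0.05)]  # L2 penalty on features (see #3315)
loss_weights = [0.1, 10.0]           # assumed defaults: diversity term first

# Each auxiliary loss is scaled by its own coefficient and added to the
# contrastive objective; a large diversity coefficient lets the
# regularizer compete with the main loss.
total = contrastive_loss + sum(w * l for w, l in zip(loss_weights, extra_losses))
```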


PeiyuChen1005 commented Jul 2, 2021

Thank you, @alexeib!!! I get it~


PeiyuChen1005 commented Jul 2, 2021

I found that my code_perplexity is quite low (diversity weight 0.1 → code_perplexity ~100; diversity weight 0.5 → code_perplexity ~300). Could you tell me what value of code_perplexity, or what codebook usage percentage, is in the normal range when the total number of codebook entries is 640? @alexeib

PeiyuChen1005 reopened this Jul 2, 2021

alexeib commented Jul 27, 2021

Anything that is not super low will generally do OK, e.g. in the 100-500 range.
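For concreteness, assuming the 640 entries above come from the default 2 latent groups × 320 latent vars: code_perplexity ~100 corresponds to roughly 100 / 640 ≈ 16% effective codebook usage, ~300 to about 47%, and the suggested 100-500 range to roughly 16-78%.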
