Questions about the hyper-parameters for LDAM loss #2

Closed
hyungwonchoi opened this issue Sep 17, 2019 · 3 comments

hyungwonchoi commented Sep 17, 2019

It was a very interesting paper to read :)

I have some questions regarding the hyper-parameters for LDAM loss.

  1. What is the value of C, the hyper-parameter to be tuned (according to the paper)? Is it the (max_m / np.max(m_list)) factor introduced below? (I've included a simplified sketch of that computation at the end of this comment.)
    https://github.com/kaidic/LDAM-DRW/blob/master/losses.py#L28

  2. Is s=30 in the LDAM loss also a hyper-parameter to be tuned? I could not find any explanation in the paper. Did I miss something?

  3. What tendencies did these hyper-parameters show during training? How are the chosen values related to the imbalance level (or to different datasets)? Do the values you found also work for the other datasets in the paper (Tiny ImageNet, iNaturalist)?

Thanks.
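
For reference, my understanding of the relevant part of losses.py, in simplified form (a sketch, not a verbatim copy):

```python
import numpy as np
import torch
import torch.nn.functional as F

# Simplified sketch of the LDAM loss (margin computation + scaling), not a
# verbatim copy of losses.py. cls_num_list holds the training-sample count per class.
def ldam_loss(logits, target, cls_num_list, max_m=0.5, s=30):
    # Per-class margins: m_j proportional to n_j^(-1/4), rescaled so that the
    # largest margin (the rarest class) equals max_m.
    m_list = 1.0 / np.sqrt(np.sqrt(np.asarray(cls_num_list, dtype=np.float64)))
    m_list = m_list * (max_m / np.max(m_list))
    m_list = torch.tensor(m_list, dtype=logits.dtype, device=logits.device)

    # Subtract the class-dependent margin from the logit of the true class only.
    index = torch.zeros_like(logits, dtype=torch.bool)
    index.scatter_(1, target.view(-1, 1), True)
    batch_m = m_list[target].view(-1, 1)
    logits_m = torch.where(index, logits - batch_m, logits)

    # s rescales the margin-adjusted logits before the softmax cross-entropy.
    return F.cross_entropy(s * logits_m, target)
```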

kaidic (Owner) commented Sep 19, 2019

Thanks for your interest in our paper. I'll briefly answer based on my understanding.

  1. Right.

  2. Nope, you don't have to. s is pretty robust here; you could try 10 and it works pretty much the same. It's pretty common to introduce a scale factor like this when the inputs to the cross-entropy are normalized.

  3. I think max_m is a hyper-parameter that requires tuning. Basically, we want max_m to be as large as possible while it doesn't incur under-fitting. I find 0.5 works universally well for small datasets. As for iNaturalist, 0.3 suffices and it seems that 0.5 is too large (see the illustrative numbers below).
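
To make the relationship between max_m and the imbalance concrete, here are the margins the formula produces for a hypothetical long-tailed class-count list (illustrative numbers only, not from the paper):

```python
import numpy as np

# Hypothetical long-tailed class counts, head -> tail (illustrative only).
cls_num_list = [5000, 2000, 500, 100, 20]

for max_m in (0.5, 0.3):
    m_list = 1.0 / np.sqrt(np.sqrt(np.array(cls_num_list, dtype=np.float64)))
    m_list = m_list * (max_m / np.max(m_list))
    print(max_m, np.round(m_list, 3))

# Larger max_m enlarges every margin (the tail classes most), which helps the
# rare classes but, pushed too far, makes the targets hard to fit and the
# model under-fits.
```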

@chuong98

Hi, it seems to me that the parameter s is a softmax temperature (strictly, an inverse temperature). Did you try s=1 by any chance?
Thanks.

kaidic (Owner) commented Sep 19, 2019

s = 1 will incur under-fitting. The reason is that even when the logits look like [1, -1, -1, ...], after the softmax the true class's probability cannot get close to 0.99, so the loss stays large even for well-separated predictions.
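
A quick numerical check of this point, with 10 classes and idealized normalized logits of +1 for the true class and -1 elsewhere (purely illustrative):

```python
import numpy as np

# Idealized normalized logits: +1 for the true class, -1 for the other 9 classes.
logits = np.array([1.0] + [-1.0] * 9)

for s in (1, 10, 30):
    p = np.exp(s * logits) / np.exp(s * logits).sum()
    print(f"s={s:2d}  true-class probability = {p[0]:.4f}")

# With s=1 the true-class probability stays around 0.45, so the cross-entropy
# stays large even for a "perfect" normalized prediction; with s=10 or s=30 it
# approaches 1, which is why a larger s avoids under-fitting.
```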

kaidic closed this as completed Sep 20, 2019