why using delta_RS #2

Closed · liyunsheng13 opened this issue Aug 2, 2021 · 4 comments

@liyunsheng13

Hello

In ranking_losses.py, I am a little confused about the following code.

```python
if delta_RS > 0:
    fg_relations = torch.clamp(fg_relations / (2 * delta_RS) + 0.5, min=0, max=1)
    bg_relations = torch.clamp(bg_relations / (2 * delta_RS) + 0.5, min=0, max=1)
else:
    fg_relations = (fg_relations >= 0).float()
    bg_relations = (bg_relations >= 0).float()
```

I do not quite understand why you use delta_RS, or what the 0.5 offset is for. It seems to me that adding 0.5 will turn some negative logits positive. Does that make sense?

Thanks

@kemaloksuz (Owner) commented Aug 2, 2021

Hi,

Thanks for your interest in our work!

Basically, $\delta_{RS}$ aims to ensure a margin between examples for better generalization. To give more insight based on your example: when we add $0.50$ to a negative example with logit $s$, the network must assign a logit of at least $s + 0.50$ to a positive for that pair to contribute zero error. AP Loss and aLRP Loss adopt the same idea for the same purpose. In addition to those works, RS Loss also enforces a margin among positives with $\delta_{RS}$.
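To make this concrete with numbers of my own (not from the paper): writing $x = s_{neg} - s_{pos}$ for a pairwise difference, the smoothed step used in the code is

$$H(x) = \min\left(\max\left(\frac{x}{2\delta_{RS}} + 0.5,\ 0\right),\ 1\right),$$

so with $\delta_{RS} = 0.50$, a pair with $s_{pos} = s + 0.50$ and $s_{neg} = s$ gives $H(-0.50) = 0$, i.e. zero error, while any smaller gap leaves a nonzero penalty.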

Regarding the code snippet you shared: the "if" branch implements the smoothed step function (please see Fig. 4 and Eq. (21) in https://arxiv.org/pdf/2008.07294.pdf), and the "else" branch prevents a division-by-zero error when $\delta_{RS}=0$ (since $\delta_{RS}$ is a factor in the denominator). Also note that the "else" branch corresponds to the step function without any smoothing (the dashed curve in Fig. 4 of the same paper).
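For reference, here is a minimal, self-contained sketch of that smoothed step function (my own rewrite of the snippet above, not the repo's exact code; the helper name `smoothed_step` is mine):

```python
import torch

def smoothed_step(x, delta=0.5):
    # Eq. (21) of arXiv:2008.07294: a linear ramp of width 2*delta
    # around 0, clamped to [0, 1]. With delta = 0 it reduces to the
    # exact (dashed) step function x >= 0.
    if delta > 0:
        return torch.clamp(x / (2 * delta) + 0.5, min=0, max=1)
    return (x >= 0).float()

# Pairwise differences s_neg - s_pos: the pair contributes zero error
# only once the positive beats the negative by at least delta.
x = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])
print(smoothed_step(x))  # tensor([0.0000, 0.0000, 0.5000, 1.0000, 1.0000])
```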

Hope it is clearer now.

@liyunsheng13 (Author)

Thanks for your prompt response. I have one more question. I think the logit is the input to the sigmoid function, but I wonder if it makes any difference to use the output of the sigmoid as the input to the ranking loss.

@kemaloksuz (Owner)

Yes, you are correct about what I mean by "logit".

In fact, I have also thought about using the normalized logits after the sigmoid (I will call them "probabilities", since they lie in [0, 1]) but never tried it. So I can only make some educated guesses:

  • In that case, I think you would at least have to tune $\delta_{RS}$ (e.g. $0.50$ seems to be a large margin for the [0, 1] range of probabilities).
  • There should not be any difference between ranking logits and ranking probabilities, since the sigmoid is monotonic. However, I am not perfectly sure, because the sigmoid squeezes the raw values into [0, 1], and this limited range may have an impact (see the sketch below).

So, after validating $\delta_{RS}$, I think it would be worth trying. If you do so, please let me know as well :)
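As a quick sanity check of both points above (a hypothetical sketch of mine, not code from this repo):

```python
import torch

logits = torch.tensor([-2.0, -1.0, 0.5, 3.0])
probs = torch.sigmoid(logits)  # squeezed into [0, 1]

# Monotonic transform: the ranking order is identical...
assert torch.equal(logits.argsort(), probs.argsort())

# ...but pairwise gaps shrink, so a margin tuned for logits
# (e.g. 0.50) is likely too wide for probabilities:
print((logits[3] - logits[2]).item())  # 2.5
print((probs[3] - probs[2]).item())    # ~0.33 (0.9526 - 0.6225)
```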

@liyunsheng13 (Author)

Thanks a lot. I will definitely let you know after I get some results.
