why using delta_RS #2

Closed · liyunsheng13 opened this issue Aug 2, 2021 · 4 comments

@liyunsheng13

Hello

In ranking_losses.py, I am a little confused about the following code.

```python
if delta_RS > 0:
    fg_relations = torch.clamp(fg_relations / (2 * delta_RS) + 0.5, min=0, max=1)
    bg_relations = torch.clamp(bg_relations / (2 * delta_RS) + 0.5, min=0, max=1)
else:
    fg_relations = (fg_relations >= 0).float()
    bg_relations = (bg_relations >= 0).float()
```

I do not quite understand why you use delta_RS, or what the 0.5 offset is for. It seems to me that adding 0.5 will turn some negative logits positive. Does that make sense?

Thanks

@kemaloksuz (Owner) commented Aug 2, 2021

Hi,

Thanks for your interest in our work!

Basically, $\delta_{RS}$ aims to ensure a margin between examples for better generalization. To give more insight based on your example: when we add $0.50$ to a negative example with logit $s$, the network must assign a logit of at least $s + 0.50$ to a positive for that pair to contribute zero error. AP Loss and aLRP Loss adopt the same idea for the same purpose. In addition to those works, RS Loss also enforces a margin among positives with $\delta_{RS}$.
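To make this concrete with numbers of my own (not from the paper): writing $x = s_{neg} - s_{pos}$ for a pairwise difference, the smoothed step used in the code is

$$H(x) = \min\left(\max\left(\frac{x}{2\delta_{RS}} + 0.5,\ 0\right),\ 1\right),$$

so with $\delta_{RS} = 0.50$, a pair with $s_{pos} = s + 0.50$ and $s_{neg} = s$ gives $H(-0.50) = 0$, i.e. zero error, while any smaller gap leaves a nonzero penalty.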

Regarding the code snippet you shared: the "if" branch implements the smoothed step function (please see Fig. 4 and Eq. (21) in https://arxiv.org/pdf/2008.07294.pdf), and the "else" branch prevents a division-by-zero error when $\delta_{RS}=0$ (since $\delta_{RS}$ is a factor in the denominator). Also note that the "else" branch corresponds to the step function without any smoothing (the dashed curve in Fig. 4 of the same paper).
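For reference, here is a minimal, self-contained sketch of that smoothed step function (my own rewrite of the snippet above, not the repo's exact code; the helper name `smoothed_step` is mine):

```python
import torch

def smoothed_step(x, delta=0.5):
    # Eq. (21) of arXiv:2008.07294: a linear ramp of width 2*delta
    # around 0, clamped to [0, 1]. With delta = 0 it reduces to the
    # exact (dashed) step function x >= 0.
    if delta > 0:
        return torch.clamp(x / (2 * delta) + 0.5, min=0, max=1)
    return (x >= 0).float()

# Pairwise differences s_neg - s_pos: the pair contributes zero error
# only once the positive beats the negative by at least delta.
x = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])
print(smoothed_step(x))  # tensor([0.0000, 0.0000, 0.5000, 1.0000, 1.0000])
```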

Hope it is clearer now.

@liyunsheng13 (Author)

Thanks for your prompt response. I have one more question. I think the logit is the input to the sigmoid function, but I wonder if it makes any difference to use the output of the sigmoid as the input to the ranking loss.

@kemaloksuz (Owner)

Yes, you are correct about what I mean by "logit".

In fact, I have also thought about using the normalized logits after the sigmoid (I will call them "probabilities", since they lie in [0, 1]) but never tried it. So I can only make some educated guesses:

  • In that case, I think you would at least have to tune $\delta_{RS}$ (e.g. $0.50$ seems to be a large margin for the [0, 1] range of probabilities).
  • There should not be any difference between ranking logits and ranking probabilities, since the sigmoid is monotonic. However, I am not perfectly sure, because the sigmoid squeezes the raw values into [0, 1], and this limited range may have an impact (see the sketch below).

So, after validating $\delta_{RS}$, I think it would be worth trying. If you do so, please let me know as well :)
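As a quick sanity check of both points above (a hypothetical sketch of mine, not code from this repo):

```python
import torch

logits = torch.tensor([-2.0, -1.0, 0.5, 3.0])
probs = torch.sigmoid(logits)  # squeezed into [0, 1]

# Monotonic transform: the ranking order is identical...
assert torch.equal(logits.argsort(), probs.argsort())

# ...but pairwise gaps shrink, so a margin tuned for logits
# (e.g. 0.50) is likely too wide for probabilities:
print((logits[3] - logits[2]).item())  # 2.5
print((probs[3] - probs[2]).item())    # ~0.33 (0.9526 - 0.6225)
```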

@liyunsheng13 (Author)

Thanks a lot. I will definitely let you know after I get some results.
