train_reloss.py is not runnable #2

Closed
lwmlyy opened this issue Nov 25, 2022 · 12 comments

lwmlyy commented Nov 25, 2022

Hi,
Thanks for sharing your awesome work. I noticed there are many typos and non-existent parameters in train_reloss.py, for example in lines 33-40: 'i' is not defined, and the 'logits, targets' from the corresponding batch are not used in the following code.


hunto commented Nov 25, 2022

Hi @lwmlyy,

Thanks for your attention. The bugs you pointed out have been fixed in commit 7b21f71.


lwmlyy commented Nov 25, 2022

Thanks for your reply.
Also, in line 55, should 'spearmanr' be replaced with 'spearman_diff'?


hunto commented Nov 25, 2022

Yes. This was fixed in commit 7a23ccc.


lwmlyy commented Nov 29, 2022

In line 36 of the latest version, the targets are obtained by taking the max of the logits along dimension 1 (the batch dimension rather than the token dimension, -1?), which also ignores the input targets from the batch. I wonder why this is done.


hunto commented Nov 29, 2022

The full logits data in the TensorDataset has shape [L, B, C]. When we iterate through the dataset (line 27), logits_batch has shape [N, B, C], where N is the batch size of the dataloader. In this way we get a batch of logits data, and each logits tensor (line 33) has shape [B, C] and is used to compute the accuracy over B samples. We use these data to obtain N loss values and N accuracies, and the surrogate loss is then optimized using the rank correlation between the loss values and the accuracies.

To compute the accuracy, the straightforward way is to use the original targets in the dataset (this can be done by commenting out lines 36~38). However, we only use the trained model to predict the logits, which means the diversity of accuracies is limited (e.g., almost all accuracies are larger than 90% on CIFAR-10). Therefore, a surrogate loss trained this way cannot be used in the early period of training.

To increase the diversity of accuracies, we generate pseudo targets with a randomly sampled accuracy (line 37). In line 36, the targets (shape [B, 1]) are obtained by applying torch.max on the last (classification) dim of the logits ([B, C]); at this point the targets achieve 100% accuracy with respect to the logits. Then, in line 38, we randomly modify the contents of the targets to obtain the randomly sampled accuracy value.

Hope this could help you understand our code more clearly :)
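For readers following along, here is a minimal runnable sketch of the shapes and the pseudo-target trick described above. It is not the repository code: the helper name `make_pseudo_targets`, the cross-entropy stand-in for the surrogate loss, and the scipy rank-correlation check are illustrative assumptions (the repo trains with a differentiable correlation such as `spearman_diff`).

```python
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def make_pseudo_targets(logits, min_acc=0.5):
    """Build pseudo targets for one logits sample of shape [B, C]:
    start from the argmax targets (100% accuracy), then corrupt random
    positions so the sample ends up with a randomly sampled accuracy."""
    B, C = logits.shape
    targets = logits.argmax(dim=-1)  # matches the logits everywhere
    sampled_acc = torch.empty(1).uniform_(min_acc, 1.0).item()
    num_wrong = int(round(B * (1.0 - sampled_acc)))
    if num_wrong > 0:
        idx = torch.randperm(B)[:num_wrong]
        # Shift the class index at these positions so it no longer matches the argmax.
        targets[idx] = (targets[idx] + torch.randint(1, C, (num_wrong,))) % C
    return targets

# One dataloader batch of stored logits: [N, B, C]
logits_batch = torch.randn(16, 128, 10)

losses, accs = [], []
for logits in logits_batch:  # each logits tensor: [B, C]
    targets = make_pseudo_targets(logits)
    losses.append(F.cross_entropy(logits, targets).item())  # stand-in for the surrogate loss
    accs.append((logits.argmax(-1) == targets).float().mean().item())

# The surrogate loss is trained so that its values rank-correlate with accuracy;
# here we only sanity-check the (negative) correlation of the cross-entropy stand-in.
print("spearman(loss, acc) =", spearmanr(losses, accs).correlation)
```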


lwmlyy commented Nov 29, 2022

Thanks for the detailed explanation. In the paper, you mention that the training data for ReLoss is generated by GR with probability p and by GM with probability 1-p. However, this is not reflected in the code. I wonder how it is implemented.


hunto commented Nov 29, 2022

Actually, we did not use the random data generator for the classification loss (p=0); we obtained the trained surrogate loss with the same training method as in train_reloss.py. This is because we use the logits on ImageNet to train the surrogate loss, and that data is diverse enough. For datasets like CIFAR-10, which converge easily, you can randomly generate a probability (from 0.0 to 1.0) for the target class to replace the original value in the logits.

The random generator is only used on the scene text recognition task for comparison with the previous work LS-ED [1]; you can find the details of the random generator for that task in the paper.

[1] Learning surrogates via deep embedding. In European Conference on Computer Vision, pp. 205–221. Springer, 2020.
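A hypothetical sketch of what such a generator could look like for CIFAR-10-style data (the function name `randomize_target_confidence` and the renormalization details are assumptions, not taken from the repository): replace the predicted probability of the ground-truth class with a random value in [0, 1] and rescale the remaining classes so each row still sums to 1.

```python
import torch
import torch.nn.functional as F

def randomize_target_confidence(logits, targets):
    """Replace the probability of the ground-truth class with a random value
    in [0, 1], rescaling the other classes so each row still sums to 1.
    This diversifies the accuracy/loss pairs seen by the surrogate loss
    on easy datasets such as CIFAR-10."""
    probs = F.softmax(logits, dim=-1)                            # [B, C]
    B, _ = probs.shape
    rand_p = torch.rand(B)                                       # new confidence for the target class
    target_p = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # current target-class confidence
    scale = (1.0 - rand_p) / (1.0 - target_p).clamp_min(1e-8)    # rescale the non-target mass
    probs = probs * scale.unsqueeze(1)
    probs.scatter_(1, targets.unsqueeze(1), rand_p.unsqueeze(1))
    return probs.clamp_min(1e-8).log()                           # back to log-space "logits"

logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
new_logits = randomize_target_confidence(logits, targets)
```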


lwmlyy commented Nov 29, 2022

So if we do not want to use the random generator, we should comment out lines 36~38, is this correct? Also, the logits are the predictions from a model and the targets are the ground-truth labels corresponding to those logits, right?


hunto commented Nov 29, 2022

> So if we do not want to use the random generator, we should comment out lines 36~38, is this correct? Also, the logits are the predictions from a model and the targets are the ground-truth labels corresponding to those logits, right?

Yes, you can comment out lines 36~38 to use the original predictions and labels without generating any pseudo data, but this carries a risk of performing worse if your data distribution is not diverse enough.


lwmlyy commented Nov 30, 2022

What is the best option for natural language generation tasks including machine reading comprehension and machine translation?


hunto commented Dec 2, 2022

For these two NLP tasks, we do not involve any random generation; we directly use the raw outputs from the network and the labels from the dataset.


lwmlyy commented Dec 5, 2022

Thanks for sharing the information.

lwmlyy closed this as completed Dec 5, 2022