train_reloss.py is not runnable #2

Closed
lwmlyy opened this issue Nov 25, 2022 · 12 comments

lwmlyy commented Nov 25, 2022

Hi,
Thanks for sharing your awesome work. I noticed there are many typos and non-existent parameters in train_reloss.py, for example in lines 33-40: 'i' is not defined, and the 'logits, targets' from the corresponding batch are not used in the following code.


hunto commented Nov 25, 2022

Hi @lwmlyy,

Thanks for your attention. The bugs you pointed out have been fixed in commit 7b21f71.


lwmlyy commented Nov 25, 2022

Thanks for your reply.
Also, in line 55, should 'spearmanr' be replaced with 'spearman_diff'?


hunto commented Nov 25, 2022

Yes. This was fixed in commit 7a23ccc.


lwmlyy commented Nov 29, 2022

In line 36 of the latest version, the targets are obtained by taking the max of the logits along dimension 1 (the batch dimension rather than the token dimension, -1?), which also ignores the input targets from the batch. I wonder why this is done.


hunto commented Nov 29, 2022

The full logits data in the TensorDataset has shape [L, B, C]. When we iterate through the dataset (line 27), logits_batch has shape [N, B, C], where N is the batch size of the dataloader. In this way we get a batch of logits data, and each logits tensor (line 33) has shape [B, C] and is used to compute the accuracy over B samples. We use these data to obtain N loss values and N accuracies, and the surrogate loss is then optimized using the rank correlation between the loss values and the accuracies.

To compute the accuracy, the straightforward way is to use the original targets in the dataset (this can be done by commenting out lines 36~38). However, we only use the trained model to predict the logits, which means the diversity of accuracies is limited (e.g., almost all accuracies are larger than 90% on CIFAR-10). Therefore, a surrogate loss trained this way cannot be used in the early period of training.

To increase the diversity of accuracies, we generate pseudo targets with a randomly sampled accuracy (line 37). In line 36, the targets (shape [B, 1]) are obtained by applying torch.max on the last (classification) dim of the logits ([B, C]); at this point the targets achieve 100% accuracy with respect to the logits. Then, in line 38, we randomly modify the contents of the targets to obtain the randomly sampled accuracy value.

Hope this could help you understand our code more clearly :)
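For readers following along, here is a minimal runnable sketch of the shapes and the pseudo-target trick described above. It is not the repository code: the helper name `make_pseudo_targets`, the cross-entropy stand-in for the surrogate loss, and the scipy rank-correlation check are illustrative assumptions (the repo trains with a differentiable correlation such as `spearman_diff`).

```python
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def make_pseudo_targets(logits, min_acc=0.5):
    """Build pseudo targets for one logits sample of shape [B, C]:
    start from the argmax targets (100% accuracy), then corrupt random
    positions so the sample ends up with a randomly sampled accuracy."""
    B, C = logits.shape
    targets = logits.argmax(dim=-1)  # matches the logits everywhere
    sampled_acc = torch.empty(1).uniform_(min_acc, 1.0).item()
    num_wrong = int(round(B * (1.0 - sampled_acc)))
    if num_wrong > 0:
        idx = torch.randperm(B)[:num_wrong]
        # Shift the class index at these positions so it no longer matches the argmax.
        targets[idx] = (targets[idx] + torch.randint(1, C, (num_wrong,))) % C
    return targets

# One dataloader batch of stored logits: [N, B, C]
logits_batch = torch.randn(16, 128, 10)

losses, accs = [], []
for logits in logits_batch:  # each logits tensor: [B, C]
    targets = make_pseudo_targets(logits)
    losses.append(F.cross_entropy(logits, targets).item())  # stand-in for the surrogate loss
    accs.append((logits.argmax(-1) == targets).float().mean().item())

# The surrogate loss is trained so that its values rank-correlate with accuracy;
# here we only sanity-check the (negative) correlation of the cross-entropy stand-in.
print("spearman(loss, acc) =", spearmanr(losses, accs).correlation)
```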


lwmlyy commented Nov 29, 2022

Thanks for the detailed explanation. In the paper, you mention that the training data for ReLoss is generated by GR with probability p and by GM with probability 1-p. However, this is not reflected in the code. I wonder how it is implemented.


hunto commented Nov 29, 2022

Actually, we did not use the random data generator for the classification loss (p=0); we obtained the trained surrogate loss with the same training method as in train_reloss.py. This is because we use the logits on ImageNet to train the surrogate loss, and that data is diverse enough. For datasets like CIFAR-10, which converge easily, you can randomly generate a probability (from 0.0 to 1.0) for the target class to replace the original value in the logits.

The random generator is only used on the scene text recognition task for comparison with the previous work LS-ED [1]; you can find the details of the random generator for that task in the paper.

[1] Learning surrogates via deep embedding. In European Conference on Computer Vision, pp. 205–221. Springer, 2020.
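A hypothetical sketch of what such a generator could look like for CIFAR-10-style data (the function name `randomize_target_confidence` and the renormalization details are assumptions, not taken from the repository): replace the predicted probability of the ground-truth class with a random value in [0, 1] and rescale the remaining classes so each row still sums to 1.

```python
import torch
import torch.nn.functional as F

def randomize_target_confidence(logits, targets):
    """Replace the probability of the ground-truth class with a random value
    in [0, 1], rescaling the other classes so each row still sums to 1.
    This diversifies the accuracy/loss pairs seen by the surrogate loss
    on easy datasets such as CIFAR-10."""
    probs = F.softmax(logits, dim=-1)                            # [B, C]
    B, _ = probs.shape
    rand_p = torch.rand(B)                                       # new confidence for the target class
    target_p = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # current target-class confidence
    scale = (1.0 - rand_p) / (1.0 - target_p).clamp_min(1e-8)    # rescale the non-target mass
    probs = probs * scale.unsqueeze(1)
    probs.scatter_(1, targets.unsqueeze(1), rand_p.unsqueeze(1))
    return probs.clamp_min(1e-8).log()                           # back to log-space "logits"

logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
new_logits = randomize_target_confidence(logits, targets)
```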


lwmlyy commented Nov 29, 2022

So if we do not want to use the random generator, we should comment out lines 36~38, is this correct? Also, the logits are the predictions from a model and the targets are the ground-truth labels corresponding to those logits, right?


hunto commented Nov 29, 2022

> So if we do not want to use the random generator, we should comment out lines 36~38, is this correct? Also, the logits are the predictions from a model and the targets are the ground-truth labels corresponding to those logits, right?

Yes, you can comment out lines 36~38 to use the original predictions and labels without generating any pseudo data, but this carries a risk of performing worse if your data distribution is not diverse enough.


lwmlyy commented Nov 30, 2022

What is the best option for natural language generation tasks including machine reading comprehension and machine translation?


hunto commented Dec 2, 2022

For these two NLP tasks, we do not involve any random generation; we directly use the raw outputs from the network and the labels from the dataset.


lwmlyy commented Dec 5, 2022

Thanks for sharing the information.

lwmlyy closed this as completed Dec 5, 2022