
Why are the labels all the same when training? #20

Closed
happyxuwork opened this issue Jul 21, 2021 · 3 comments

Comments

@happyxuwork

The idea in your paper is amazing; great truths are all simple. I have the following questions:

1. Why are the labels of each iteration the same? (Support images are sampled with ClassStratifiedSampler, so does every sampling draw the same classes in the same class order?)

    labels_matrix = make_labels_matrix(

2. The multi-crop images are 96 x 96, and an FC layer exists in the network. Why can the loss on the multi-crop images still be propagated backward?

3. Have you considered using the labeled loss plus the unlabeled loss as the final loss during training? In this way, fine-tuning would not be required.

@MidoAssran
Contributor

Hi @happyxuwork, thanks for your interest.

  1. The order of the labels can change, but the samples drawn by ClassStratifiedSampler for the support mini-batch always follow this pattern
    [[a,b,c,…], [a,b,c,…], …, [a,b,c,…]]
    where a, b, c are images from classes a, b, c respectively. While the classes represented by a, b, c change from one iteration to the next, for all intents and purposes we can just fix the one-hot label matrix at the start, since it is only used to identify which samples are the same and which are different (see the first sketch after this list).

  2. The FC layer is just the prediction head (see here). Even though the small crops are 96x96, their representation size before the prediction head is the same as that of the large crops (e.g., 2048-dimensional for RN50), so you can still feed it into the projection head, no problem (see the second sketch after this list).

  3. I haven't tried this with the PAWS loss, but I think it sounds interesting! As one possible alternative to fine-tuning, we just looked at soft nearest-neighbours classification, but as you said, if you have a supervised loss (and a supervised head) during pre-training, then you can directly examine the prediction accuracy of the supervised head on the validation set. Though I suspect you may still get a performance boost by fine-tuning this supervised head (a rough sketch of such a combined objective follows as the third sketch below).
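A minimal sketch of point 1 (assuming PyTorch; fixed_support_labels is a hypothetical helper for illustration, not the repo's make_labels_matrix): because the class-stratified layout of the support batch never changes across iterations, the one-hot matrix can be built once up front.

    import torch
    import torch.nn.functional as F

    def fixed_support_labels(num_classes: int, imgs_per_class: int) -> torch.Tensor:
        # The support batch is laid out as [[a, b, c, ...], [a, b, c, ...], ...],
        # so the flattened label sequence is just 0..num_classes-1 repeated.
        # The matrix only says which samples share a class, not which real
        # class it is, so it can be reused at every iteration.
        labels = torch.arange(num_classes).repeat(imgs_per_class)  # e.g. [0, 1, 2, 0, 1, 2]
        return F.one_hot(labels, num_classes).float()

    # 3 classes with 2 support images per class -> a (6, 3) one-hot matrix.
    print(fixed_support_labels(num_classes=3, imgs_per_class=2))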
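And a quick sanity check for point 2, using a torchvision ResNet-50 as a stand-in for the encoder: the global average pool collapses the spatial dimensions, so 224x224 and 96x96 crops both come out as 2048-dimensional vectors before the head.

    import torch
    from torchvision.models import resnet50

    backbone = resnet50()                # stand-in encoder
    backbone.fc = torch.nn.Identity()    # drop the classifier, keep pooled features
    backbone.eval()

    with torch.no_grad():
        large_crops = torch.randn(4, 3, 224, 224)  # standard crops
        small_crops = torch.randn(4, 3, 96, 96)    # multi-crop views
        print(backbone(large_crops).shape)  # torch.Size([4, 2048])
        print(backbone(small_crops).shape)  # torch.Size([4, 2048])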
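And for point 3, a rough illustration only (hypothetical names, not something this repo implements) of what the combined objective from the question could look like: the unlabeled PAWS-style term plus a standard cross-entropy term from a supervised head on the labeled support images.

    import torch
    import torch.nn.functional as F

    def combined_loss(paws_loss: torch.Tensor,
                      support_logits: torch.Tensor,
                      support_labels: torch.Tensor,
                      alpha: float = 1.0) -> torch.Tensor:
        # paws_loss: the unlabeled/self-supervised term, computed elsewhere
        # support_logits: output of a supervised head on the labeled support images
        # alpha: weighting between the two terms (a free hyperparameter)
        supervised = F.cross_entropy(support_logits, support_labels)
        return paws_loss + alpha * supervised

    # Toy call with random tensors, just to show the shapes involved.
    print(combined_loss(torch.tensor(0.5),
                        support_logits=torch.randn(6, 3),
                        support_labels=torch.arange(3).repeat(2)))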

@happyxuwork
Author

@MidoAssran If more labels can be used on ImageNet, for example 20% labeled data, do you have any suggestions to improve performance? Or could some additional losses be added? From your point of view, what is the difficulty in reaching the level of full supervision with 20 percent of the data?

@MidoAssran
Contributor

@happyxuwork Hi, sorry for the delay getting back to you! I was on vacation :)

Yes, I think using more labels in the support set, if available, will directly improve performance. See Fig. 7 in Appendix B of the paper.

I haven't tried using 20% of labels, but we see that by using wider ResNets (e.g., ResNet-50 4x), we can already match fully supervised performance (without extra tricks like AutoAugment, etc.) with only 10% of labels (see Fig. 6 in Appendix B). Off the top of my head, I'm not sure what the "main difficulty" is, but I think there is certainly room for improvement, since performance with 1% of labels is still significantly lower than performance with 10% of labels.
