
Concern on the details of the comparison results in Table-2 #2

Closed
PkuRainBow opened this issue Oct 30, 2020 · 4 comments

PkuRainBow commented Oct 30, 2020

Really nice paper!

We carefully read your work and find the experimental setting on Pascal-VOC in Table-2 (shown below) really interesting: in the last column of Table-2, all the methods use only 92 images as the labeled set and choose the train_aug set (10582 images) as the unlabeled set, according to the code:

wss/core/data_generator.py, lines 85 to 104 at commit 8069dbe:

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'train_aug': 10582,
        'trainval': 2913,
        'val': 1449,
        # Splits for semi-supervised
        '4_clean': 366,
        '8_clean': 183,
        # Balanced 1/16 split
        # Sample with rejection, suffix represents the sample index
        # e.g., 16_clean_3 represents the 3rd random shuffle to sample 1/16
        # split, given a fixed random seed 8888
        '16_clean_3': 92,
        '16_clean_14': 92,
        '16_clean_20': 92,
        # More images
        'coco2voc_labeled': 91937,
        'coco2voc_unlabeled': 215340,
    },

and:

split_name=FLAGS.train_split_cls,

Our understanding is that FLAGS.train_split_cls represents the set of unlabeled images used for training, and its value is train_aug by default. So the number of unlabeled images is more than 100x the number of labeled images. Given that the total number of training iterations is set as training_number_of_steps=30000 with a batch size of 64, we will iterate over the 92 sampled labeled images for nearly 30000 x 64 / 92 ≈ 20869 epochs. Is my understanding correct?
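For concreteness, here is that arithmetic as a small sketch (the batch size of 64 is an assumption carried over from the 30000 x 64 / 92 computation above):

training_number_of_steps = 30000
batch_size = 64         # assumed, as in the 30000 x 64 / 92 computation
num_labeled = 92        # a '16_clean_*' split
num_unlabeled = 10582   # the 'train_aug' split

images_seen = training_number_of_steps * batch_size
print(images_seen / num_labeled)    # ~20869 epochs over the labeled set
print(images_seen / num_unlabeled)  # ~181 epochs over the unlabeled set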

If our understanding is correct, we are curious whether training for so many epochs on the 92 labeled images is a good choice. Besides, since the train_aug set (10582 images) contains the 92 labeled images, we guess all the methods also apply the pseudo-label-based/consistency-based losses to the labeled images (instead of only to the unlabeled images).
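As a quick sketch of what we mean by the overlap (the split file names here are hypothetical, not necessarily the repo's actual paths):

# Hypothetical split files; the actual names/paths in the repo may differ.
with open('16_clean_3.txt') as f:
    labeled_ids = set(f.read().split())
with open('train_aug.txt') as f:
    train_aug_ids = set(f.read().split())

# If this prints True, every labeled image also receives the
# pseudo-label/consistency losses.
print(labeled_ids.issubset(train_aug_ids))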

Many thanks, and we look forward to your explanation in case our understanding is wrong!

[Screenshot of Table-2 from the paper]


Yuliang-Zou commented Oct 30, 2020

Hi @PkuRainBow

  1. Yes, your understanding is correct. We use the same number of iterations for all the data splits because we need to iterate through the unlabeled set enough times (if you count the number of epochs based on the unlabeled set, then it is the same across splits).
  2. Yes, those 92 images are also in the unlabeled set. I follow the common practice in SSL classification here.

BTW, we sample those 92 images so that the number of pixels for each class is roughly balanced. You might not always get a good result if you pick an arbitrary set of 92 images (see Appendix C).
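A rough sketch of what such rejection sampling could look like (this is not our actual code; pixel_counts and the balance criterion here are assumptions, only the seed 8888 comes from the snippet above):

import random

def sample_balanced_split(image_ids, pixel_counts, k=92, seed=8888,
                          max_ratio=20.0, max_tries=1000):
    # Shuffle repeatedly and take the first k images, rejecting draws
    # whose per-class pixel totals are too imbalanced.
    rng = random.Random(seed)
    ids = list(image_ids)
    for _ in range(max_tries):
        rng.shuffle(ids)
        candidate = ids[:k]
        totals = {}
        for img in candidate:
            for cls, n in pixel_counts[img].items():
                totals[cls] = totals.get(cls, 0) + n
        if max(totals.values()) <= max_ratio * max(min(totals.values()), 1):
            return candidate
    raise RuntimeError('no balanced split found within max_tries')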


PkuRainBow commented Nov 2, 2020

@Yuliang-Zou Many thanks for your explanation. We still have a small concern about your experimental setting.

According to your explanation, your method will in fact train over the 92 labeled images for about 20869 epochs, which might cause serious overfitting in the supervised part of the training. We also found that the authors of CutMix face the same challenge, and we paste the discussion here: Britefury/cutmix-semisup-seg#5 (comment)

So we are really interested in how your experimental setting addresses the overfitting problem. We look forward to your explanation!

@Yuliang-Zou

I don't have a clear answer yet, but I guess it could be related to the training schedule. In the beginning, the supervised loss dominates the optimization; as we train for more iterations, the unsupervised loss starts to take effect and gradually dominates. Just for your reference, FixMatch (a semi-supervised classification method) has an experiment training on CIFAR-10 with only 10 labeled images, and it works quite well.
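One explicit version of this behavior, as a sketch (the sigmoid ramp-up is a common SSL heuristic, e.g. in Mean Teacher, not necessarily what this repo does):

import math

def unsup_weight(step, ramp_up_steps=10000, max_weight=1.0):
    # Sigmoid-shaped ramp-up from ~0 to max_weight: the supervised loss
    # dominates early, the unsupervised loss takes over later.
    t = min(step, ramp_up_steps) / ramp_up_steps
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)

# total_loss = sup_loss + unsup_weight(step) * unsup_loss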

@PkuRainBow

@Yuliang-Zou Thanks for your reply. The balance between the supervised and unsupervised losses might be a good way to avoid this problem. If our understanding is correct, it is important to ensure that the unsupervised loss dominates in the late stage. However, there seems to be no explicit mechanism to guarantee such a schedule, so we guess that an explicit re-weighting scheme might address this problem.
