
Failure in reproducing reported result: enquiry about a few implementation details not stated in the paper #4

kate-sann5100 opened this issue Jun 29, 2019 · 18 comments

@kate-sann5100

kate-sann5100 commented Jun 29, 2019

Hi, I have been trying to reimplement this paper and cannot get the reported accuracy. It would be super helpful if you could give me some advice on the following details:

  1. Most importantly, would you mind explaining in more detail how the episodes are sampled from the dataset, and the definition of an epoch mentioned in Section 5.1? (It would be very helpful if you could provide the total number of samples in the dataset and any data augmentation or preprocessing methods applied.)
    Currently, I understand one epoch as using every mask available in the dataset as the support target once. However:
    If I include all class masks available in PASCAL VOC 2012 and the SDS extra annotations, the model overfits after iterating through the dataset for 200 epochs (ending up with a meanIoU of around 35%).
    If I do class balancing (taking n random masks from each class, where n equals the number of masks in the class with the fewest masks; see the sketch after this list), the meanIoU on the validation set fluctuates a lot (by about 4%) between epochs, even toward the end of training.
  2. Could you share the training schedule?
  3. Are there any batchnorm layers in the model? If so, may I know where they are?
  4. Are the Dense Comparison Module output features passed through the residual block in the very first iteration, when no mask has been predicted yet? If so, what does the empty mask look like?
  5. When testing, the random sampling of 1000 support-query pairs introduces a fluctuation of about 2% in meanIoU. How do you deal with this fluctuation?
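
For reference, this is roughly how I build a class-balanced epoch (a sketch of my own setup, not the paper's code; masks_by_class is a hypothetical dict mapping a class id to its list of (image, mask) annotations):

import random

# Every epoch takes n random masks per class as the support target,
# where n is the size of the smallest class; each support is paired
# with a random query from the same class.
def balanced_epoch(masks_by_class):
    n = min(len(v) for v in masks_by_class.values())
    episodes = []
    for cls, masks in masks_by_class.items():
        for support in random.sample(masks, n):
            query = random.choice([m for m in masks if m is not support])
            episodes.append((support, query, cls))
    random.shuffle(episodes)
    return episodes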

Thanks in advance

@mysayalHan

mysayalHan commented Jun 30, 2019

@kate-sann5100 I am also reimplementing this paper and cannot get the reported accuracy. I also have some questions about details beyond what you have raised:

  1. The ResNet-50 in the paper is not the same as the standard ResNet-50 because of the dilated convolutions after block2. Did you retrain the modified ResNet-50 on ImageNet? If so, could you share the retrained model with me? (See the sketch after this list for what I assume the modification looks like.)
  2. I am not sure how to build the IOM module. Specifically, do IOM0 and IOM1 share the same parameters? In Figure 2(c) the output mask is fed back into the same IOM module, while in Figure 2(b) each IOM module has different parameters. Like you, I have some questions about the dropout and the empty masks: I directly used torch.nn.functional.dropout2d and did not get a good result.
  3. I found that dying ReLU often happens in my reimplementation when the learning rate is set to 0.0025. Did you run into the same problem?
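
For point 1, this is what I assume the backbone modification looks like (a sketch using torchvision's built-in option, so the ImageNet weights can be reused without retraining; not the authors' code):

import torchvision

# Replace the stride-2 downsampling of the last two stages with
# dilated convolutions, keeping the output stride at 8.
backbone = torchvision.models.resnet50(
    pretrained=True,
    replace_stride_with_dilation=[False, True, True],
)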

@happycoding1996

happycoding1996 commented Jun 30, 2019

@mysayalHan The modified ResNet-50 may be the one used in PSPNet. You can find the trained backbone in that repo.

@kate-sann5100 I have the same overfitting problem: my val FB-IoU ends up around 63 and my val meanIoU around 30, while the training IoU is close to 99.
My setting: an epoch contains several batches, each batch being a (support, query) pair, and the batch formation follows the sampling process. I simply implemented the model without the IOM.

I have contacted the author via email. The code will be available soon.

BTW, does anyone have the same problem? My validation loss curve keeps fluctuating and sometimes even goes up as training proceeds.

@mysayalHan

mysayalHan commented Jun 30, 2019

@happycoding1996 Thanks for your reply, but I think the modified ResNet-50 used in this paper is similar to the one in the paper Dilated Residual Networks: https://github.com/fyu/drn

@happycoding1996

@mysayalHan Thank you!

@kate-sann5100
Author

kate-sann5100 commented Jul 1, 2019

@mysayalHan

  1. I didn't retrain the ResNet-50; I simply loaded the pretrained weights from torchvision.
  2. I believe the IOMs share the same weights.
    Moreover, according to Table 4, the model should reach a meanIoU of 51.2 with only IOM0 (CANet-Init). I tried to train a baseline model that does not employ the additional IOM for optimization, but still cannot get anywhere near the reported accuracy (51.2).
    According to the paper, the entire predicted mask is reset to the empty mask when dropout happens. I am not sure dropout2d, which does channel-wise dropout, is the right way to implement this. My implementation is:
from torch.distributions import Bernoulli

if self.training:
    # with probability self.dropout_prob, reset the entire predicted mask
    dropout = Bernoulli(self.dropout_prob).sample().item()
    if dropout == 1:
        pred_mask = empty_mask
  3. I haven't faced any dead training (i.e. the weights failing to update), if that is what you mean.

It would be very helpful if you could let me know what meanIoU you get in evaluation.

@kate-sann5100
Author

@happycoding1996
I got a very similar final result. Can you explain in more detail how you implemented it without the IOM? Do you mean getting rid of the additional IOMs for optimization, or not using the IOM (including the 2 conv blocks and an ASPP) at all?
I did not monitor the validation loss while training, but by testing the model weights after different epochs I can conclude that the validation result fluctuates greatly during training. Moreover, I do sometimes manage to get a meanIoU around 51% on the validation set using weights from very early epochs (e.g. the 21st epoch).

@happycoding1996

happycoding1996 commented Jul 1, 2019

@kate-sann5100 I did not use the IOM; I simply processed the concatenated support-query feature (512 channels in total) with two subsequent 3x3 convolutions to get the output (roughly the sketch below). Is the 51% mIoU you got the IoU of the five classes in split-0? I only got about 30 mIoU, calculated over all test classes.
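
For clarity, my baseline head looks roughly like this (a sketch of my own setup, not CANet's IOM; the hidden channel size is my choice):

import torch.nn as nn

# Two 3x3 conv blocks over the 512-channel concatenated support-query
# feature, ending in binary foreground/background logits.
head = nn.Sequential(
    nn.Conv2d(512, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 2, kernel_size=3, padding=1),
)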

The following is the test log of the 200-epoch model:
[2019-06-30 07:34:30,779 INFO test.py line 293 54540] <<<<<<<<<<<<<<<<< End Evaluation<<<<<<<<<<<<<<<<<
[2019-06-30 07:34:30,779 INFO test.py line 294 54540] Eval result: FB-mIoU/mAcc/allAcc 0.5988/0.6734/0.8995.
[2019-06-30 07:34:30,779 INFO test.py line 296 54540] Background result: iou/accuracy 0.8949/0.9670.
[2019-06-30 07:34:30,779 INFO test.py line 296 54540] Foregound result: iou/accuracy 0.3027/0.3798.
[2019-06-30 07:34:30,779 INFO test.py line 306 54540] meanIoU---Val result: mIoU 0.2791.
[2019-06-30 07:34:30,779 INFO test.py line 308 54540] Class_1 Result: iou 0.1527.
[2019-06-30 07:34:30,779 INFO test.py line 308 54540] Class_2 Result: iou 0.2773.
[2019-06-30 07:34:30,779 INFO test.py line 308 54540] Class_3 Result: iou 0.6541.
[2019-06-30 07:34:30,779 INFO test.py line 308 54540] Class_4 Result: iou 0.2054.
[2019-06-30 07:34:30,779 INFO test.py line 308 54540] Class_5 Result: iou 0.1060.

@kate-sann5100
Author

kate-sann5100 commented Jul 1, 2019

@happycoding1996 Thank you.
Yes, my 51% mIoU is on split-0. If you are interested, that particular epoch gives:
Start 1shot evaluation on None voc dataset group0 with crop size (513, 513)
mean_IU: 0.5181510938091882
IU_array: [0.92428074 0.68025647 0.34850322 0.72462379 0.46033128 0.3770407
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
FB_mean_IU: 0.7257937994189583
FB_IU_array: [0.92428074 0.52730686]

IU_array[0] is the background class.

@happycoding1996

happycoding1996 commented Jul 2, 2019

@kate-sann5100 Did you include the IOM in your model?

My new result (epoch 18) without the IOM module is as follows:
[2019-07-02 08:35:00,725 INFO train_cascade.py line 453 133] FBIoU---Val result: mIoU/mAcc/allAcc 0.6895/0.7956/0.9147.
[2019-07-02 08:35:00,725 INFO train_cascade.py line 455 133] Class_0 Result: iou/accuracy 0.9076/0.9519.
[2019-07-02 08:35:00,725 INFO train_cascade.py line 455 133] Class_1 Result: iou/accuracy 0.4715/0.6393.

@kate-sann5100
Author

kate-sann5100 commented Jul 2, 2019

@happycoding1996 I included the IOM in my model and iterated through it 5 (1+4) times during both training and evaluation (see the sketch below).
Moreover, I have adjusted my model so that it now adopts (at least I believe) exactly the same architecture as the one shown in the one_shot_network.py just published by the author. Training on the class-balanced dataset failed to reach the reported accuracy (40% meanIoU at the 200th epoch). I am currently training on the whole dataset. If that still doesn't reach the reported result, I guess there may be some trick in the dataloader or the training schedule.
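
My iteration loop looks roughly like this (a sketch of my own code; dcm_feature and iom are placeholder names for my modules):

import torch

# The shared-weight IOM refines the mask for 1 + 4 iterations,
# starting from an all-zero empty mask.
b, _, h, w = dcm_feature.shape
pred_mask = dcm_feature.new_zeros(b, 1, h, w)  # the "empty" mask
for _ in range(5):
    pred_mask = iom(torch.cat([dcm_feature, pred_mask], dim=1))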

@happycoding1996

@kate-sann5100 I noticed that randomly sampling 1000 episodes for testing brings large variance to the final result (3% in my meanIoU). Maybe you should rerun the test several times, along the lines of the sketch below.
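
Something like this is what I mean (evaluate is a hypothetical function standing in for your test loop):

import numpy as np

# Rerun the 1000-episode test with different seeds and report
# mean +/- std to average out the sampling noise.
scores = [evaluate(model, n_episodes=1000, seed=s) for s in range(5)]
print('meanIoU: %.4f +/- %.4f' % (np.mean(scores), np.std(scores)))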

@MSiam

MSiam commented Jul 3, 2019

@happycoding1996 I am just wondering which code base you are using for the data loading part, i.e. for sampling the 1000 support and query pairs? I want to make sure that the dataloader I am using in my own method is correct.

Thanks

@happycoding1996

happycoding1996 commented Jul 3, 2019

@MSiam I wrote it myself. The process follows the one introduced in Section 4.3 of One-Shot Learning for Semantic Segmentation.

@MSiam

MSiam commented Jul 3, 2019

@happycoding1996 Great. I want to make sure that my data loading process is correct, and I also want to build on the code for some of my own work. Would it be possible to share with me by email the code with which you got 51% mIoU on fold 0?

I want to reproduce the results of this work as well and compare against it while being sure of the process.

I also want to confirm how this method and OSLSM evaluate the IoU: the paper mentions evaluating the binary IoU and taking the average over all classes. Does that average include the original 15 training classes as well, or only the 5 classes within the fold?

@mysayalHan

mysayalHan commented Jul 10, 2019

@kate-sann5100 Can you share your code with me at mysayalhan@gmail.com? It is very strange that I face dead training (all of the network outputs are background) even with the one_shot_network.py.

@kate-sann5100
Author

@happycoding1996
Below are the evaluation results of the weights at different epochs. (I trained with 3 GPUs, so there are only 67 epochs in total.)
[image: result1 — meanIoU of weights at different epochs]
Even allowing for the ~3% evaluation fluctuation, the model still doesn't reach the reported accuracy. In fact, it doesn't converge at all.

@happycoding1996

happycoding1996 commented Jul 18, 2019

@MSiam Very sorry, I currently cannot share the code with you, since it is based on an unpublished work. When that work is accepted, we will release the code on GitHub. As for the meanIoU, I think this work only takes the average IoU over the 5 classes in the testing fold, rather than all 20 classes.

@kate-sann5100 I face a similar case in which there is a very large fluctuation in the validation curve. Even though the training curve improves steadily with a better backbone (ResNet-101), the validation result still fluctuates greatly and improves only slightly (2% in meanIoU).

@icoz69
Owner

icoz69 commented Sep 8, 2019

Hi all, I have updated the code with meanIoU as the validation metric. Note that this is a quick validation with fixed 321x321 inputs. To reproduce the reported SOTA result, you need to use raw-size inputs and do a multi-scale input test (a sketch follows the results below). If you find multi-scale tests cumbersome, you can compare against the results in the ablation study. The results are below:
fold0: 49.56
fold1: 64.97
fold2: 49.83
fold3: 51.49
mean: 53.96
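
The multi-scale test follows the standard pattern; a rough sketch (not the exact released code; the model call signature here is a placeholder):

import torch.nn.functional as F

# Run the query at several scales, resize the logits back to the
# original resolution, and average them before taking the argmax.
def multi_scale_predict(model, support, support_mask, query,
                        scales=(0.7, 1.0, 1.3)):
    h, w = query.shape[-2:]
    logits_sum = 0
    for s in scales:
        q = F.interpolate(query, scale_factor=s, mode='bilinear',
                          align_corners=True)
        out = model(support, support_mask, q)
        logits_sum = logits_sum + F.interpolate(
            out, size=(h, w), mode='bilinear', align_corners=True)
    return (logits_sum / len(scales)).argmax(dim=1)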

icoz69 closed this as completed Sep 30, 2019