Retrained zero shot results are inferior to the public scores #15

Open
JanySunny opened this issue Jun 22, 2019 · 2 comments

Comments

@JanySunny

JanySunny commented Jun 22, 2019

I retrained the zero-shot model using train_zero_shot_youtube.sh with the given settings, ran inference with eval_zero_shot_youtube.sh, and prepared the submission for the official YouTubeVOS challenge website with prepare_results_submission.py (the exact sequence of commands is sketched at the end of this comment).
However, the test results on YouTubeVOS do not match the public scores. Are there any other settings or tricks used during training or testing? I noticed that data augmentation is used during training but not during testing, and I followed the public settings exactly. The models were trained for 50 epochs on a single TitanX GPU (batch_size=4, clips=5). The retrained results are:

retrain-RVOS-T: 33.87, 18.37, 38.62, 22.23

retrain-RVOS-S: 38.52, 18.72, 41.70, 22.59

retrain-RVOS-ST: 41.56, 21.46, 45.00, 24.52

In addition, I also tested the public zero-shot YouTube model on YouTube-VOS and obtained the following scores:
pub-RVOS-ST: 43.39, 21.10, 45.30, 24.32.

It seems the inferior retrained results are not due to the test settings, but I do not know the cause. Could you help me?
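
For reference, this is a sketch of the pipeline I ran. Only the three script names come from the repository; wrapping them in Python and running them from the repository root with their default arguments (apart from batch_size=4 and clips=5 set in the training configuration) are my assumptions.

```python
# Sketch of the reproduction pipeline described above (script names are from
# the repository; paths and default arguments are assumptions).
import subprocess

# 1. Train the zero-shot model on YouTube-VOS (batch_size=4, clips=5, 50 epochs).
subprocess.run(["bash", "train_zero_shot_youtube.sh"], check=True)

# 2. Run inference on the YouTube-VOS validation split with the trained checkpoint.
subprocess.run(["bash", "eval_zero_shot_youtube.sh"], check=True)

# 3. Convert the predictions into the submission format expected by the
#    official YouTube-VOS evaluation server.
subprocess.run(["python", "prepare_results_submission.py"], check=True)
```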

@carlesventura
Collaborator

There are at least two possible reasons for the different results when retraining:

1. A new training run will see the images and the instances in a different order, and the data augmentation applied will also be different. As a result, the model can be slightly better or worse than the one we trained and released.

2. For the zero-shot case, we trained the model for 40 epochs. Even if the validation loss (computed on the train-val subset) is better after training for 50 epochs, this does not mean that the results on the validation set will be better than those obtained by the released model (see the sketch after this list).
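
To make this concrete, below is a minimal sketch of selecting a checkpoint by its train-val loss. This is not the actual RVOS training code; the model, train_one_epoch and eval_trainval_loss are illustrative placeholders.

```python
# Minimal illustrative sketch: keep the checkpoint with the lowest loss on the
# held-out train-val subset. Not the actual RVOS training loop.
import copy

def select_checkpoint(model, train_one_epoch, eval_trainval_loss, num_epochs=40):
    best_loss = float("inf")
    best_state = None
    for epoch in range(num_epochs):
        train_one_epoch(model)            # one pass over the training subset
        loss = eval_trainval_loss(model)  # loss on the held-out train-val subset
        if loss < best_loss:              # keep the weights with the lowest train-val loss
            best_loss = loss
            best_state = copy.deepcopy(model.state_dict())
    # The returned checkpoint minimises the train-val loss, but the J/F scores
    # on the official validation set are computed on different videos with a
    # different metric, so a lower-loss checkpoint (e.g. after 50 epochs) can
    # still score worse there than the released 40-epoch model.
    return best_state
```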

Best regards,

Carles

@JanySunny
Author

@carlesventura Thanks for your kind answer. Then, how should the final model be chosen after one training run (e.g., 40 or more epochs)? Should I take the checkpoint with the best validation loss (computed on the train-val subset)? That might overfit. Or should I evaluate several (or all) checkpoints on the test set? That does not seem advisable. Thank you.

Best regards
