Retrained zero shot results are inferior to the public scores #15

Open
JanySunny opened this issue Jun 22, 2019 · 2 comments

Comments

@JanySunny

JanySunny commented Jun 22, 2019

I retrained the zero-shot model using train_zero_shot_youtube.sh with the given settings, ran inference with eval_zero_shot_youtube.sh, and prepared the submission for the official YouTubeVOS challenge website with prepare_results_submission.py (the exact sequence of commands is sketched at the end of this comment).
However, the test results on YouTubeVOS do not match the public scores. Are there any other settings or tricks used during training or testing? I noticed that data augmentation is used during training but not during testing, and I followed the public settings exactly. The models were trained for 50 epochs on a single TitanX GPU (batch_size=4, clips=5). The retrained results are:

retrain-RVOS-T: 33.87, 18.37, 38.62, 22.23

retrain-RVOS-S: 38.52, 18.72, 41.70, 22.59

retrain-RVOS-ST: 41.56, 21.46, 45.00, 24.52

In addition, I also tested the public zero-shot YouTube model on YouTube-VOS and obtained the following scores:
pub-RVOS-ST: 43.39, 21.10, 45.30, 24.32.

It seems the inferior retrained results are not due to the test settings, but I do not know the cause. Could you help me?
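
For reference, this is a sketch of the pipeline I ran. Only the three script names come from the repository; wrapping them in Python and running them from the repository root with their default arguments (apart from batch_size=4 and clips=5 set in the training configuration) are my assumptions.

```python
# Sketch of the reproduction pipeline described above (script names are from
# the repository; paths and default arguments are assumptions).
import subprocess

# 1. Train the zero-shot model on YouTube-VOS (batch_size=4, clips=5, 50 epochs).
subprocess.run(["bash", "train_zero_shot_youtube.sh"], check=True)

# 2. Run inference on the YouTube-VOS validation split with the trained checkpoint.
subprocess.run(["bash", "eval_zero_shot_youtube.sh"], check=True)

# 3. Convert the predictions into the submission format expected by the
#    official YouTube-VOS evaluation server.
subprocess.run(["python", "prepare_results_submission.py"], check=True)
```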

@carlesventura
Collaborator

There are at least two possible reasons for the different results when retraining:

1. A new training run will see the images and the instances in a different order, and the data augmentation applied will also be different. As a result, the model can be slightly better or worse than the one we trained and released.

2. For the zero-shot case, we trained the model for 40 epochs. Even if the validation loss (computed on the train-val subset) is better after training for 50 epochs, this does not mean that the results on the validation set will be better than those obtained by the released model (see the sketch after this list).
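
To make this concrete, below is a minimal sketch of selecting a checkpoint by its train-val loss. This is not the actual RVOS training code; the model, train_one_epoch and eval_trainval_loss are illustrative placeholders.

```python
# Minimal illustrative sketch: keep the checkpoint with the lowest loss on the
# held-out train-val subset. Not the actual RVOS training loop.
import copy

def select_checkpoint(model, train_one_epoch, eval_trainval_loss, num_epochs=40):
    best_loss = float("inf")
    best_state = None
    for epoch in range(num_epochs):
        train_one_epoch(model)            # one pass over the training subset
        loss = eval_trainval_loss(model)  # loss on the held-out train-val subset
        if loss < best_loss:              # keep the weights with the lowest train-val loss
            best_loss = loss
            best_state = copy.deepcopy(model.state_dict())
    # The returned checkpoint minimises the train-val loss, but the J/F scores
    # on the official validation set are computed on different videos with a
    # different metric, so a lower-loss checkpoint (e.g. after 50 epochs) can
    # still score worse there than the released 40-epoch model.
    return best_state
```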

Best regards,

Carles

@JanySunny
Author

@carlesventura Thanks for your kind answer. Then, how should the final model be chosen after one training run (e.g., 40 or more epochs)? Should I take the checkpoint with the best validation loss (computed on the train-val subset)? That might overfit. Or should I evaluate several (or all) checkpoints on the test set? That does not seem advisable. Thank you.

Best regards
