Cannot reproduce the supervised performance on UCF101 #9
Comments
I initially had a similar question about the fine-tuning strategy while implementing this. However, I found that the performance can be reproduced if you unfreeze the Conv weights during the fine-tuning stage. According to the paper, it seems that although the Conv weights are initialized from the pre-trained model, they are trained together with the FC layer during fine-tuning.
Anyway, you could try unfreezing all the weights and see how it goes. In my case, I can reproduce the results when using R(2+1)D on UCF101.
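For anyone comparing the two setups discussed here (training only the FC layer versus fine-tuning everything), the difference comes down to which parameters have `requires_grad` set. Below is a minimal PyTorch sketch, not the repository's actual code: `TinyVideoNet`, `set_backbone_trainable`, and the SGD settings are hypothetical stand-ins, and the small Conv3d stack only imitates a pretrained backbone.

```python
import torch
import torch.nn as nn


class TinyVideoNet(nn.Module):
    """Hypothetical stand-in for a pretrained 3D-conv backbone plus an FC head."""

    def __init__(self, num_classes: int = 101):  # UCF101 has 101 classes
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1),
            nn.BatchNorm3d(8),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.backbone(x))


def set_backbone_trainable(model: TinyVideoNet, trainable: bool) -> None:
    """Freeze (False) or unfreeze (True) the backbone; the FC head always trains."""
    for p in model.backbone.parameters():
        p.requires_grad = trainable
    for p in model.fc.parameters():
        p.requires_grad = True


def count_trainable(model: nn.Module) -> int:
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = TinyVideoNet()

# Setup 1: frozen Conv weights, only the FC layer is trained.
set_backbone_trainable(model, trainable=False)
frozen_count = count_trainable(model)

# Setup 2: everything is trained, as suggested in this thread.
set_backbone_trainable(model, trainable=True)
full_count = count_trainable(model)

# Build the optimizer after choosing the setup so it only sees trainable
# parameters. The learning rate and momentum are illustrative assumptions.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)
```

Comparing `count_trainable` before and after is a quick way to confirm which setup is actually active. Note that with a frozen backbone, BatchNorm layers still update their running statistics in training mode unless the backbone is also put in `eval()` mode.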
Thank you very much for your response. I also reproduced the results when I fine-tuned the whole network. I am now trying to reproduce another paper, "Video Representation Learning by Recognizing Temporal Transformations". They trained on a pretext task and got ~50% on UCF101, but I only got 20%.
I haven't reproduced that work so far. Maybe you can double-check other settings (batch size per GPU, for example) against the original paper. Anyway, I wish you all the best with your work.
Thanks. You too.
Hello @AronCao49 Thanks
Thank you very much for your inspiring work. However, I encountered a problem when reproducing the performance. I followed your code for the self-supervised learning stage and got about 60-70% accuracy on pace prediction. However, when I froze the Conv weights and trained only the final FC layer for supervised learning, I got only 0.10 average training accuracy. When training the final FC layer, I used the same data augmentation as in self-supervised learning, as your paper describes. Could you please tell me more about the fine-tuning details?