About UCF101 and HMDB51 results #19

Closed
airsplay opened this issue May 10, 2021 · 4 comments

Comments

@airsplay

Dear Authors,

Thanks for this great repo for reproducing the results in TimeSformer. I just want to quickly check whether you have experimented with the two smaller video classification datasets (i.e., UCF101 and HMDB51) and have any initial results.

@gberta
Contributor

gberta commented May 10, 2021

We haven't experimented with the UCF or HMDB datasets. However, if you do so, please let me know the numbers (both from ImageNet and from Kinetics pretrained checkpoints). It would be great to know how TimeSformer performs on these smaller datasets.

My intuition is that it might not work as well as some prior methods. Our model has a large number of parameters, so these datasets might be too small for the model to learn meaningful patterns. Nevertheless, it would be quite interesting to see some results in this setting.
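
If you do try it, fine-tuning from one of the pretrained checkpoints would look roughly like the sketch below. This is a minimal, untested outline: the checkpoint path, the `model_state` key, and the head-name filter are assumptions that may need adjusting for a given checkpoint.

```python
import torch
from timesformer.models.vit import TimeSformer

# Minimal sketch: fine-tune on UCF101 starting from a Kinetics-pretrained
# checkpoint. The path and dict keys below are placeholders / assumptions.
model = TimeSformer(img_size=224, num_classes=101, num_frames=8,
                    attention_type='divided_space_time')

ckpt = torch.load('/path/to/kinetics_checkpoint.pyth', map_location='cpu')
state = ckpt.get('model_state', ckpt)  # assumed checkpoint layout

# Drop the Kinetics classification head so the new 101-way head is trained
# from scratch; filtering on 'head' in the key name is an assumption.
state = {k: v for k, v in state.items() if 'head' not in k}
missing, unexpected = model.load_state_dict(state, strict=False)
```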

@airsplay
Author

Thanks for the detailed answer! We will let you know once we have results.

@PeihaoChen

PeihaoChen commented Jul 14, 2021

Hi,

I have tried TimeSformer on UCF101. With ImageNet pretraining, I only get 42.85% accuracy.

The TimeSformer args are as follows: "transformer_args": { "depth": 12, "dim_head": 64, "embed_dim": 768, "heads": 12, "patch_size": 16 }. I trained the model for 30 epochs with a learning rate of 0.05, using a cosine annealing schedule and the SGD optimizer. The input contains 16 frames sampled at a 1/4 sampling rate from 25 fps video. The image resolution is 224×224. I trained on 7 GPUs with a batch size of 2 per GPU.
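
For concreteness, my setup corresponds roughly to the sketch below (written against this repo's `TimeSformer` class for illustration, which may differ from my actual implementation; the momentum value and the dummy loader are placeholders, not details from my run):

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from timesformer.models.vit import TimeSformer

# 16-frame clips at 224x224, 101 classes for UCF101.
# ImageNet initialization omitted here; see the checkpoint-loading sketch above.
model = TimeSformer(img_size=224, num_classes=101, num_frames=16,
                    attention_type='divided_space_time').cuda()

optimizer = SGD(model.parameters(), lr=0.05, momentum=0.9)  # momentum assumed
scheduler = CosineAnnealingLR(optimizer, T_max=30)          # cosine over 30 epochs

# Stand-in for a real UCF101 loader: random clips, batch size 2 per GPU as above.
train_loader = [(torch.randn(2, 3, 16, 224, 224), torch.randint(0, 101, (2,)))
                for _ in range(10)]

for epoch in range(30):
    for clips, labels in train_loader:
        logits = model(clips.cuda())            # (B, 101)
        loss = F.cross_entropy(logits, labels.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```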

I know that Transformer-based models are not easy to train on small datasets, and I am not sure whether this result is normal. Am I missing some important training details? In the VidTr paper, the authors achieve 96.7% on UCF101 with a Transformer-based model.

Looking for any insight that helps me get a reasonable result on UCF101. Many thanks!

@h030162

h030162 commented Aug 11, 2022

https://aistudio.baidu.com/aistudio/projectdetail/2291410
avg_acc1 = 0.9244, avg_acc5 = 0.9894
