Curriculum Learning and Video-Image Joint Training #45
Hi, yes. For the 4-frame stage: 4 frames for WebVid 2M and 1 frame for CC3M :)
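For readers who want to see the schedule spelled out, here is a minimal sketch of the per-dataset frame counts described in this thread. All names (`FRAMES_PER_STAGE`, `sample_frame_indices`, `frames_for`) are hypothetical and not taken from the project's code, and the evenly spaced sampling is only an assumption for illustration; the thread does not say how frames are chosen.

```python
# Hypothetical illustration; names and sampling strategy are assumptions,
# not the project's actual implementation.

# Frames drawn per sample, by curriculum stage and dataset (from this thread).
FRAMES_PER_STAGE = {
    "1-frame": {"WebVid2M": 1, "CC3M": 1},
    "4-frame": {"WebVid2M": 4, "CC3M": 1},  # images always contribute 1 frame
}


def sample_frame_indices(num_available, num_frames):
    """Return evenly spaced indices, one from the middle of each segment."""
    num_frames = min(num_frames, num_available)
    segment = num_available / num_frames
    return [int(segment * i + segment / 2) for i in range(num_frames)]


def frames_for(stage, dataset, num_available):
    """Pick frame indices for one sample given the stage and source dataset."""
    return sample_frame_indices(num_available, FRAMES_PER_STAGE[stage][dataset])


# A 32-frame WebVid clip yields 4 indices in the 4-frame stage,
# while a CC3M image (a single frame) always yields index 0.
video_indices = frames_for("4-frame", "WebVid2M", 32)
image_indices = frames_for("4-frame", "CC3M", 1)
```

Keeping the frame counts in one table makes it easy to see that only the video dataset changes between stages, which is what makes the 4-frame stage a joint image-video setup.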
Cool, thanks. By the way, did you run an experiment that uses only WebVid 2M during the 4-frame stage? Or any ablation on "Joint image-video training"?
We have some results for WebVid-only and image-only training in the recent arXiv paper: https://arxiv.org/abs/2104.00650.
Thanks!
Hi,
I have a question about the curriculum learning. For the 1-frame pretraining, both the CC3M and WebVid 2M datasets are used. But when finetuning at the 4-frame stage, did you use both video and image for joint pretraining (4 frames for WebVid 2M and 1 frame for CC3M)? I ask because I cannot find any experimental details for "Joint image-video training" in the paper.
Thanks in advance.