Question Regarding Feature Extraction Discrepancy Between Training & Inference #26
Comments
Hi @rsomani95, Thank you for your kind words. The Best,
Hi @jayleicn, Thank you for your response. I see, that's great to know. Do you have validation scores for the model without SlowFast features? Thanks!
I don't have the exact number either, but as far as I remember, the CLIP-only model achieves at least 90-95% of the CLIP+SlowFast model's performance, so it is also a very decent model.
Got it. Awesome! That's higher than I'd intuited. That answers my original question, thank you.
Hello. Firstly, congratulations, and thank you for sharing this work, it's really cool!
I had a question regarding feature extraction. In the paper and the training script, `train.sh` suggests that there are two sets of video features being used: SlowFast and CLIP. I confirmed that the shared `moment_detr_features.tar.gz` file has both the SlowFast & CLIP features available as well.
However, in the inference script `run.py`, only the `ClipFeatureExtractor` is used. Do we not need SlowFast features during inference? Or am I missing something?
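To make the concern concrete, here is a minimal sketch of the mismatch being asked about. It assumes the two feature sets are concatenated per clip before being fed to the model, which the thread does not confirm. The dimensions (`CLIP_DIM`, `SLOWFAST_DIM`) and helper names (`concat_features`, `check_input_dim`) are hypothetical and are not taken from the repository.

```python
# Hypothetical per-clip feature widths, chosen only for illustration;
# the real sizes depend on the extractors and checkpoints used.
CLIP_DIM = 4
SLOWFAST_DIM = 6


def concat_features(clip_feats, slowfast_feats):
    """Concatenate two per-clip feature sequences along the feature axis."""
    if len(clip_feats) != len(slowfast_feats):
        raise ValueError("both feature sets must cover the same number of clips")
    return [c + s for c, s in zip(clip_feats, slowfast_feats)]


def check_input_dim(features, expected_dim):
    """Return True if every clip vector has the width the model expects."""
    return all(len(vec) == expected_dim for vec in features)


num_clips = 3
clip_feats = [[0.0] * CLIP_DIM for _ in range(num_clips)]
slowfast_feats = [[0.0] * SLOWFAST_DIM for _ in range(num_clips)]

# Assumed training-time input: CLIP and SlowFast features joined per clip.
train_inputs = concat_features(clip_feats, slowfast_feats)
# Inference-time input as described in the question: CLIP features only.
infer_inputs = clip_feats

print(check_input_dim(train_inputs, CLIP_DIM + SLOWFAST_DIM))
print(check_input_dim(infer_inputs, CLIP_DIM + SLOWFAST_DIM))
print(check_input_dim(infer_inputs, CLIP_DIM))
```

Under this assumption, a model trained on the joined features would not accept CLIP-only input, so CLIP-only inference would need a model trained on CLIP features alone, such as the CLIP-only model mentioned in the comments.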