Question Regarding Feature Extraction Discrepancy Between Training & Inference #26

Closed
rsomani95 opened this issue May 6, 2023 · 4 comments

Comments

@rsomani95

Hello. Firstly, congratulations, and thank you for sharing this work. It's really cool!

I had a question regarding feature extraction. Both the paper and the training script, train.sh, suggest that there are two sets of video features being used: SlowFast and CLIP.
I confirmed that the shared moment_detr_features.tar.gz file has both the SlowFast & CLIP features available as well.

However, in the inference script run.py, only the ClipFeatureExtractor is used. Do we not need SlowFast features during inference? Or am I missing something?

@jayleicn
Owner

jayleicn commented May 7, 2023

Hi @rsomani95,

Thank you for your kind words. The run.py script is meant as an easy-to-run demo with as few dependencies as possible, so we removed the SlowFast features and rely only on the CLIP features, which are easier to deploy. For best performance, both SlowFast and CLIP features are needed.

Best,
Jie
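For readers following along, here is a minimal sketch of how two per-clip feature sets could be combined into a single video input, compared with using CLIP features alone. It is an illustration under stated assumptions, not the repository's actual code: the helper names (`l2_normalize`, `combine_clip_features`), the feature dimensions (2304 for SlowFast, 512 for CLIP), and the choice to normalize and concatenate along the feature axis are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-5):
    """Scale each row (one video clip) to approximately unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def combine_clip_features(feature_sets):
    """Concatenate per-clip features from several extractors.

    Each array has shape (num_clips, feature_dim). Extractors may disagree
    by a clip or two on num_clips, so every array is truncated to the
    shortest one before concatenating along the feature axis.
    """
    n_clips = min(len(f) for f in feature_sets)
    normalized = [l2_normalize(f[:n_clips]) for f in feature_sets]
    return np.concatenate(normalized, axis=1)


# Hypothetical features for one video (random stand-ins, assumed dimensions).
rng = np.random.default_rng(0)
slowfast = rng.standard_normal((75, 2304)).astype(np.float32)
clip = rng.standard_normal((74, 512)).astype(np.float32)

both = combine_clip_features([slowfast, clip])  # SlowFast + CLIP input
clip_only = combine_clip_features([clip])       # CLIP-only input, as in a demo

print(both.shape)       # (74, 2816)
print(clip_only.shape)  # (74, 512)
```

Under this sketch the two inputs have different widths, so a model whose input layer was sized for the combined features would not accept CLIP-only features directly. Presumably a CLIP-only demo needs a checkpoint trained on CLIP features alone.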

@rsomani95
Author

Hi @jayleicn,

Thank you for your response. I see, that's great to know.
I'm interested in deploying this model, and as you correctly mentioned, it's much easier to only rely on CLIP features.

Do you have validation scores for the model without SlowFast features?
This wasn't reported in the paper, but I'm curious if you ever tried training a model without SlowFast and just on CLIP features?

Thanks!

@jayleicn
Owner

jayleicn commented May 8, 2023

I don't have the exact numbers either. As far as I remember, the CLIP-only model achieves at least 90-95% of the CLIP+SlowFast model's performance, so it is also a very decent model.

@rsomani95
Author

Got it. Awesome! That's higher than I'd intuited.

That answers my original question, thank you.
