How do you make predictions for videos? #7

Open
avijit9 opened this issue Feb 14, 2020 · 3 comments

Comments


avijit9 commented Feb 14, 2020

How do you make a prediction for a video? What threshold do you usually choose? I am referring to the following passage in the paper:

"After training the 3C-Net, the CLS module (see Fig. 2 and Eq. 2) is used to compute the action-class scores (pmf) at the video-level using the final T-CAM, for the action classification task"
Owner

naraysa commented Feb 15, 2020

There is no threshold. Once the network has computed the final T-CAM (shape t x num_class), we apply top-k pooling over time, which gives a k x num_class array, and then average it over the temporal axis. The resulting vector of size num_class is passed through a softmax to obtain the class-wise scores of the video.
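
For readers who want to see the steps spelled out, here is a minimal sketch of the pooling described above. It is not taken from the repository: the function names, the plain-list representation of the T-CAM, and the way `k` is passed in are all assumptions made for illustration, and the thread does not say how `k` is chosen.

```python
import math


def topk_mean_scores(tcam, k):
    """Average the k largest activations of each class over time.

    tcam: list of t rows, each a list of num_class floats (hypothetical layout).
    Returns a list of num_class floats.
    """
    k = max(1, min(k, len(tcam)))  # keep k within [1, t]
    num_class = len(tcam[0])
    scores = []
    for c in range(num_class):
        # Sort this class's activations over time, largest first.
        column = sorted((row[c] for row in tcam), reverse=True)
        scores.append(sum(column[:k]) / k)
    return scores


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def video_class_scores(tcam, k):
    """Top-k pooling over time, temporal average, then softmax."""
    return softmax(topk_mean_scores(tcam, k))


# Toy T-CAM with t = 4 time steps and num_class = 3.
tcam = [
    [2.0, -1.0, 0.0],
    [1.0, -2.0, 0.5],
    [0.0, -3.0, -0.5],
    [3.0, -1.5, 0.2],
]
pooled = topk_mean_scores(tcam, k=2)
probs = video_class_scores(tcam, k=2)
```

In a real pipeline the same thing would normally be done with tensor operations on the network output; the lists here only keep the sketch self-contained.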

Author

avijit9 commented Feb 15, 2020

And then how do you find out which classes are present in the video from the scores?

Owner

naraysa commented Feb 16, 2020

The softmax is only used for the mAP computation. To find the classes present, we skip the softmax and instead take every label whose top-k mean is greater than 0 as a category present in the video.
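
A rough sketch of that rule, under the same caveats: `topk_mean` and `present_classes` are hypothetical helper names, the T-CAM is represented as a plain list of rows, and `k` is simply a parameter because the thread does not state its value.

```python
import heapq


def topk_mean(tcam, k):
    """Per-class mean of the k largest activations over time (no softmax)."""
    k = max(1, min(k, len(tcam)))  # keep k within [1, t]
    # zip(*tcam) iterates over classes, yielding each class's activations over time.
    return [sum(heapq.nlargest(k, column)) / k for column in zip(*tcam)]


def present_classes(tcam, k):
    """Indices of classes whose top-k mean is greater than 0."""
    return [c for c, score in enumerate(topk_mean(tcam, k)) if score > 0]


# Toy T-CAM with t = 4 time steps and num_class = 3.
tcam = [
    [2.0, -1.0, 0.0],
    [1.0, -2.0, 0.5],
    [0.0, -3.0, -0.5],
    [3.0, -1.5, 0.2],
]
detected = present_classes(tcam, k=2)  # classes 0 and 2 have a positive top-k mean
```

Because the comparison is made on the raw top-k means, a video can end up with several detected classes, or with none if every mean is at or below 0.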
