How do I make a prediction for a video? What threshold do you usually choose? I am referring to the following lines in the paper:
After training the 3C-Net, the CLS module (see Fig. 2 and Eq. 2) is used to compute the action-class scores (pmf) at the video-level using the final T-CAM, for the action classification task.
There is no threshold. Once the net has computed the final T-CAM (shape t x num_class), we do top-k pooling over time to get a k x num_class matrix, which is then averaged over the temporal axis. The resulting vector of size num_class is passed through a softmax to obtain the class-wise scores of the video.
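In case it helps, here is a minimal NumPy sketch of the pooling described above. It is not the repository's code: the function names `topk_mean` and `video_scores`, the toy T-CAM, and the value of `k` are all assumptions made for illustration.

```python
import numpy as np

def topk_mean(tcam: np.ndarray, k: int) -> np.ndarray:
    """Average the k largest activations over time, separately per class.

    tcam has shape (t, num_class); the result has shape (num_class,).
    """
    k = max(1, min(k, tcam.shape[0]))  # clamp k to the video length
    # Sort each class column in ascending order along time, keep the last k rows.
    topk = np.sort(tcam, axis=0)[-k:]
    return topk.mean(axis=0)

def video_scores(tcam: np.ndarray, k: int) -> np.ndarray:
    """Softmax over the top-k mean, giving class-wise scores that sum to 1."""
    logits = topk_mean(tcam, k)
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()

# Toy T-CAM with t = 4 time steps and num_class = 3 (made-up numbers).
tcam = np.array([[2.0, -1.0,  0.5],
                 [1.0, -3.0,  0.0],
                 [0.0, -2.0, -0.5],
                 [3.0, -4.0, -1.0]])
scores = video_scores(tcam, k=2)  # k = 2 is an arbitrary choice for this example
```

The softmax here only rescales the pooled activations into a distribution over classes; the ranking of classes is the same as for the top-k mean itself.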
The softmax is only used for the mAP computation. To find the classes present, we skip the softmax above and instead take every label whose top-k mean is greater than 0 as a category present in the video.
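A rough sketch of that rule in NumPy, for anyone reading later. The function name `present_classes`, the toy T-CAM, and `k` are hypothetical and only illustrate the "top-k mean greater than 0" check described in this comment.

```python
import numpy as np

def present_classes(tcam: np.ndarray, k: int) -> np.ndarray:
    """Return indices of classes whose top-k temporal mean is greater than 0.

    tcam has shape (t, num_class). No softmax is applied here.
    """
    k = max(1, min(k, tcam.shape[0]))  # clamp k to the video length
    # Mean of the k largest activations over time, per class.
    pooled = np.sort(tcam, axis=0)[-k:].mean(axis=0)
    return np.flatnonzero(pooled > 0)

# Toy T-CAM with t = 4 time steps and num_class = 3 (made-up numbers).
tcam = np.array([[2.0, -1.0,  0.5],
                 [1.0, -3.0,  0.0],
                 [0.0, -2.0, -0.5],
                 [3.0, -4.0, -1.0]])
labels = present_classes(tcam, k=2)  # k = 2 is an arbitrary choice for this example
```

Because the comparison is against 0 on the raw pooled activations, a video can end up with several labels, or none at all.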