
What does the paper mean by different temporal extents being used on the 60f network? #5

@ajay9022 commented Feb 15, 2019

I was reading the paper Long-term Temporal Convolutions for Action Recognition and read that they tried different temporal extents t ∈ {20, 40, 60, 80, 100} on the 60f network.

I don't get the term "temporal extent" used here. Can you also explain what a 60f network means?

From this link I learned that a video is made up of many clips and that each clip consists of some x frames. Does that hold true in this paper too?

@gulvarol (Owner) commented

The temporal extent is simply the number of input frames (the clip length) of the network. t ∈ {20, 40, 60, 80, 100} is not all on the 60f network; each value is a separate network: 20 frames (20f), 40 frames (40f), 60 frames (60f), and so on. We also did experiments with different input resolutions. And yes, the clip/video terminology from that link holds here as well: a video is made up of clips of some x frames each.
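To make the terminology concrete, here is a minimal NumPy sketch, not the paper's actual Torch code: the 58x58 spatial size, the channels-first layout, and the `make_clip_input` helper are illustrative assumptions. It only shows that the temporal extent t is the number of frames stacked into one input volume:

```python
import numpy as np

def make_clip_input(video_frames, t):
    """Stack the first t consecutive frames into one network input volume."""
    assert len(video_frames) >= t, "video shorter than the clip length"
    clip = video_frames[:t]                   # (t, H, W, C)
    return np.transpose(clip, (3, 0, 1, 2))   # (C, t, H, W), channels first

video = np.random.rand(240, 58, 58, 3)        # a fake 240-frame RGB video
for t in (20, 40, 60, 80, 100):               # the temporal extents tried in the paper
    x = make_clip_input(video, t)
    print(f"{t}f network input shape: {x.shape}")
```

So a "60f network" is simply the network whose input volume contains 60 frames.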

@ajay9022 (Author) commented Feb 15, 2019

So that means for the 60f network the input is 60 frames, right?

Also, one of the takeaways of the paper is that inputs with higher temporal extent give better accuracy. So inputs with more frames are recognised better than inputs with fewer frames.

Does that mean the gap between 2 consecutive frames at temporal extent = 100 is smaller than at temporal extent = 60? Because then, for the same video, the 60-frame input would consist of fewer frames spaced farther apart.

@gulvarol (Owner) commented

Yes, 60f means 60 frames.

The gap between 2 consecutive frames is the same for 60f and 100f, since we always sample consecutive frames from the original video. More randomness in that sampling could improve the results, but this is not something we investigated in that paper.
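For illustration, a minimal sketch of what "always sample consecutive frames" means at training time; `sample_training_clip` is a hypothetical helper, not code from this repository:

```python
import numpy as np

def sample_training_clip(num_frames, t, rng=None):
    """Return t *consecutive* frame indices starting at a random position.

    The spacing between neighbouring frames is always 1, regardless of t,
    so 60f and 100f clips have identical frame-to-frame gaps; a 100f clip
    simply covers a longer span of the video.
    """
    rng = rng or np.random.default_rng()
    start = rng.integers(0, num_frames - t + 1)   # random, not necessarily 0
    return np.arange(start, start + t)

print(sample_training_clip(240, 60))    # e.g. frames 113..172
print(sample_training_clip(240, 100))   # e.g. frames 57..156
```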

@ajay9022 (Author) commented

That means a video of 240 frames, when fed into a 60f network, will only use the first 60 frames and ignore the remaining 180. Surely that means we are feeding the network incomplete information, which would hurt the accuracy of recognising the video.

Just to confirm, did I get that right?

@gulvarol (Owner) commented

This is explained in the last two paragraphs of Section 3.3 of the paper. At training time, we take a random (not necessarily the first) 60-frame clip. At test time, we slide a window over the video and average the scores of the resulting clips. Using only 1 clip would of course reduce the accuracy.
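A minimal sketch of that test-time procedure, assuming the temporal stride of 4 frames reported in Section 3.3; `score_clip` is a hypothetical stand-in for one forward pass of the network:

```python
import numpy as np

def window_starts(num_frames, t, stride=4):
    """Start indices of the t-frame sliding windows used at test time."""
    return list(range(0, num_frames - t + 1, stride))

starts = window_starts(240, 60)   # [0, 4, 8, ..., 180] -> 46 clips
print(len(starts), "clips:", starts[:3], "...", starts[-1])

# Assuming a hypothetical score_clip(start) returning class probabilities
# for one 60-frame clip, the video-level prediction averages over clips:
# scores = np.stack([score_clip(s) for s in starts])
# prediction = np.argmax(scores.mean(axis=0))
```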

@ajay9022 (Author) commented Mar 18, 2019

Does sliding windows mean sliding through frames 1-60, then 5-64, then 9-68, and so on, since a stride of 4 frames is given in the paper? Or does it mean something else?

@ajay9022 (Author) commented

Can you explain the last paragraph of Section 3.3 a bit more? I didn't get how the cropping is done:

At test time, a video is divided into t-frame clips with a temporal stride of 4 frames. Each clip is further tested with 10 crops, namely the 4 corners and the center, together with their horizontal flips. The video score is obtained by averaging over clip scores and crop scores. If the number of frames in a video is less than the clip size, we pad the input by repeating the last frames to fill the missing volume.

Also, do different clips in a given video show different actions? Why are we focusing on clips of a video rather than on the complete video at once?

Does the clip size mean the number of frames in a given clip? Again, how is a given video divided into clips? Do clips in a video share common frames?
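To unpack the quoted paragraph, here is a minimal sketch of the 10-crop testing and the padding of short videos. `ten_crops` and `pad_short_video` are hypothetical helpers under simplifying assumptions; in particular, the padding below repeats only the very last frame:

```python
import numpy as np

def ten_crops(clip, ch, cw):
    """4 corner crops + the center crop, plus a horizontal flip of each."""
    t, H, W, _ = clip.shape
    offsets = [(0, 0), (0, W - cw), (H - ch, 0), (H - ch, W - cw),
               ((H - ch) // 2, (W - cw) // 2)]
    crops = [clip[:, y:y + ch, x:x + cw] for y, x in offsets]
    crops += [c[:, :, ::-1] for c in crops]   # horizontal flips
    return crops                              # 10 crops in total

def pad_short_video(frames, t):
    """Repeat the last frame until the video is at least t frames long."""
    if len(frames) >= t:
        return frames
    pad = np.repeat(frames[-1:], t - len(frames), axis=0)
    return np.concatenate([frames, pad], axis=0)

clip = np.random.rand(60, 70, 70, 3)                             # illustrative sizes
print(len(ten_crops(clip, 58, 58)))                              # -> 10
print(pad_short_video(np.random.rand(40, 70, 70, 3), 60).shape)  # -> (60, 70, 70, 3)
```

As for the other questions: with a clip size of 60 frames and a stride of 4, neighbouring clips share 56 frames, so clips do overlap; the clip size is indeed the number of frames in a clip; and since the per-clip, per-crop scores are all averaged into one video-level score, the whole video is still taken into account.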
