Hello, I am very interested in your work and am trying to reproduce your results.
I have two questions that confuse me a little. May I ask them?
1. How do you obtain the final prediction? For example, if I fuse from 'temporal' into 'spatial', should I use only the prediction of the spatial net, or the predictions of both nets? And when you obtained the best result reported in your paper, was 'nFramesPerVid' also set to only 1?
2. Which of the two performs better in your experiments: fusing from 'temporal' into 'spatial', or from 'spatial' into 'temporal'?
Sorry for taking up your time, and thank you very much for reading my questions. I would greatly appreciate it if you could answer them.
MubarkLa changed the title from "May I ask three questions?" to "May I ask two questions?" on Sep 12, 2016.
To 1: As noted in the paper, the final prediction is made by averaging the outputs of the prediction layers of both nets (in the paper: "During testing, we average the FC8 predictions for both towers.").
Regarding 'nFramesPerVid', this is also mentioned in the paper: "For testing we average 20 temporal predictions from each network by densely sampling the input-frame-stacks and their horizontal flips."
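As a rough illustration of the test-time averaging quoted above, here is a minimal Python sketch. It is not the authors' code. The `towers` callables (each hypothetically taking a frame-stack start index and a flip flag and returning one FC8 score per class), the evenly spaced sampling in `sample_positions`, and splitting the 20 predictions into 10 positions times 2 flips are all assumptions made for illustration.

```python
"""Hypothetical sketch of test-time averaging for a two-tower network.

Assumptions (not from the paper or repository): each tower is a callable
tower(start, flip) -> FC8 scores, and the 20 predictions per tower come
from 10 evenly spaced frame stacks plus their horizontal flips.
"""

from typing import Callable, List, Sequence

Scores = List[float]  # one FC8 score per class


def sample_positions(num_frames: int, stack_len: int, num_positions: int) -> List[int]:
    """Return evenly spaced start indices for input-frame-stacks (assumed scheme)."""
    last_start = num_frames - stack_len
    if last_start < 0:
        raise ValueError("video is shorter than one frame stack")
    if num_positions == 1:
        return [last_start // 2]  # single stack taken from the middle
    step = last_start / (num_positions - 1)
    return [round(i * step) for i in range(num_positions)]


def average(vectors: Sequence[Scores]) -> Scores:
    """Element-wise mean of equal-length score vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def predict_video(
    towers: Sequence[Callable[[int, bool], Scores]],
    num_frames: int,
    stack_len: int,
    num_positions: int,
) -> Scores:
    """Average FC8 outputs over sampled stacks and their flips, then over towers."""
    per_tower = []
    for tower in towers:
        # 2 * num_positions predictions per tower: each stack and its flip
        samples = [
            tower(start, flip)
            for start in sample_positions(num_frames, stack_len, num_positions)
            for flip in (False, True)
        ]
        per_tower.append(average(samples))
    # Final prediction: mean over the towers (e.g. spatial and temporal)
    return average(per_tower)


if __name__ == "__main__":
    # Toy stand-ins for the two towers (hypothetical, two classes)
    def spatial(start: int, flip: bool) -> Scores:
        return [1.0, 0.0]

    def temporal(start: int, flip: bool) -> Scores:
        return [0.0, 1.0] if flip else [0.0, 3.0]

    print(predict_video([spatial, temporal], num_frames=100, stack_len=10, num_positions=10))
```

With `num_positions=10`, each tower is evaluated on 20 inputs (10 stacks and their flips), and the two per-tower means are then averaged, so the toy run above prints `[0.5, 1.0]`. Averaging raw FC8 scores follows the quoted sentence; averaging softmax probabilities instead is a common variant.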
To 2: We did not evaluate fusion from the spatial into the temporal stream, but we expect similar performance.