Alignment with paper #11

tobyclh · 2019-02-03T05:57:35Z

Hello, thanks for releasing the pytorch version of the code!
I have a couple of questions about syncing this repo with the paper (sorry for the pun):

1. fc7 in the paper is a 256-d vector, whereas here the output feature is 1024-d (at least in the pretrained model). Is this a newer/better version of the work, or am I looking in the wrong place?
2. In the file SyncNetInstance.py, line 107, there is a *4 applied to the sampling of the audio. I suspect it refers to some sort of stride, but I can't find the part of the paper that mentions this stride (perhaps it is too fundamental?). Could you explain what it is?

joonson · 2019-02-04T06:43:03Z

Hi,

This is an updated version, but the functionality should be the same.
This is because the audio (spectrograms) is sampled at 100 Hz, whereas the video is sampled at 25 Hz, so each video frame corresponds to 4 audio frames.
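A minimal sketch of the index arithmetic behind the factor of 4, assuming 25 fps video and 100 Hz spectrogram frames as stated above. The 5-frame (0.2 s) video window is an assumption for illustration, and the names `audio_window`, `VIDEO_FPS`, `AUDIO_FPS` and `AUDIO_PER_VIDEO_FRAME` are hypothetical, not taken from SyncNetInstance.py.

```python
from typing import Tuple

# Assumed sampling rates (from the reply above).
VIDEO_FPS = 25
AUDIO_FPS = 100  # spectrogram frames per second (one every 10 ms)

# Audio frames per video frame: 100 / 25 = 4, i.e. the "*4".
AUDIO_PER_VIDEO_FRAME = AUDIO_FPS // VIDEO_FPS


def audio_window(vframe: int, n_video_frames: int = 5) -> Tuple[int, int]:
    """Hypothetical helper: return (start, end) audio-frame indices that cover
    video frames vframe .. vframe + n_video_frames - 1 (end is exclusive).

    The default 5-frame window (0.2 s at 25 fps) is an assumption.
    """
    start = vframe * AUDIO_PER_VIDEO_FRAME
    end = start + n_video_frames * AUDIO_PER_VIDEO_FRAME
    return start, end


if __name__ == "__main__":
    for v in (0, 1, 10):
        print(v, audio_window(v))
    # prints:
    # 0 (0, 20)
    # 1 (4, 24)
    # 10 (40, 60)
```

For example, video frame 10 starts at 10 / 25 = 0.4 s, and audio frame 40 also starts at 40 / 100 = 0.4 s, so multiplying the video index by 4 keeps the two streams aligned in time.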

tobyclh · 2019-02-09T04:07:33Z

Thank you for the response!

tobyclh closed this as completed Feb 9, 2019
