
Some clarifications #410

Closed
cijose opened this issue Sep 25, 2019 · 2 comments

Comments

@cijose

cijose commented Sep 25, 2019

https://github.com/facebookresearch/wav2letter/blob/master/recipes/models/seq2seq_tds/librispeech/network.arch

In the network architecture specification above, what does a padding of -1 mean for the C2 layers?

Also, I do not understand the expected shape of the speech features for this network. Say I have a batch of size "b" of MFCC features with dimension "d" and "t" time steps. How is the tensor of shape b x t x d viewed as the input to the first convolution layer? I also do not understand why 2D convolutions are needed here. Sorry, it was not clear from the paper.

@an918tw
Contributor

an918tw commented Sep 25, 2019

padding = -1 means "same" padding: we apply the smallest possible padding such that out_size = ceil(in_size / stride) (https://github.com/facebookresearch/flashlight/blob/master/flashlight/common/Defines.h#L50-L53).
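
To make the arithmetic concrete, here is a minimal sketch of the usual "same"-padding calculation. It assumes the common formula (the one used for SAME padding in other frameworks); the function name `same_padding` and its parameters are made up for illustration, and how flashlight splits the total between the two sides is not shown here.

```python
import math


def same_padding(in_size, kernel, stride=1, dilation=1):
    """Return (out_size, total_pad) for 'same' padding along one axis.

    Hypothetical helper for illustration only; not flashlight's API.
    """
    # Target output length: ceil(in_size / stride).
    out_size = math.ceil(in_size / stride)
    # A dilated kernel covers (kernel - 1) * dilation + 1 input positions.
    effective_kernel = (kernel - 1) * dilation + 1
    # Smallest total padding that lets the last window fit.
    total_pad = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    return out_size, total_pad


# 100 time steps, kernel width 21, stride 2
out_size, total_pad = same_padding(100, 21, stride=2)
print(out_size, total_pad)
```

With stride 1 the output length equals the input length, which is where the name "same" comes from.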

The first layer of the encoder, V -1 NFEAT 1 0, reshapes your input to (t, d, 1, b), which is the input shape required by the Conv2D layer that follows. (In a view, 0 means keeping the original dimension and -1 means inferring the dimension from the total size.) We recommend using (t, d, 1, b) directly as the input tensor shape, even though the first layer allows some flexibility.
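
As an illustration of the 0 / -1 rule, here is a small sketch that resolves a view spec against an input shape. It only mirrors the rule as described in this reply; `resolve_view_shape` is a hypothetical helper, not part of wav2letter or flashlight, and NFEAT is assumed to have been replaced by the feature dimension (80 in the example).

```python
from math import prod


def resolve_view_shape(spec, in_shape):
    """Resolve a view spec such as (-1, NFEAT, 1, 0) against in_shape.

    0  -> keep the input's size along that axis
    -1 -> infer the size from the total number of elements
    Hypothetical helper for illustration only.
    """
    total = prod(in_shape)
    # Replace every 0 with the matching input dimension.
    out = [in_shape[i] if s == 0 else s for i, s in enumerate(spec)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension can be inferred")
    if -1 in out:
        known = prod(s for s in out if s != -1)
        if total % known != 0:
            raise ValueError("input size is not divisible by the known dims")
        out[out.index(-1)] = total // known
    if prod(out) != total:
        raise ValueError("view does not preserve the number of elements")
    return tuple(out)


# t = 500 frames, d = 80 features, batch b = 4
print(resolve_view_shape((-1, 80, 1, 0), (500, 80, 1, 4)))
# The same spec also accepts a flattened (t * d, 1, 1, b) input.
print(resolve_view_shape((-1, 80, 1, 0), (500 * 80, 1, 1, 4)))
```

Note that a view only reinterprets the shape and does not reorder data, so a tensor stored as b x t x d would presumably need its axes permuted before it matches the (t, d, 1, b) layout.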

@cijose
Author

cijose commented Sep 26, 2019

@an918tw Thanks a lot.
