What's the shape of network's input #1

Closed
nonday opened this issue Apr 18, 2020 · 3 comments


@nonday

nonday commented Apr 18, 2020

output = net(X[preFetchBatchI*args.batchSize:(preFetchBatchI+1)*args.batchSize,:,:].permute(0,2,1), eps)

  1. What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?
  2. Are MFCCs the features used in your experiments?
  3. Have you tried other tools to extract features, such as librosa?
    Thanks!
@manojpamk
Owner

  1. What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?

I'm not sure I understand the question. chunk_len is determined by Kaldi when it creates the archives. It is the temporal dimension: the number of MFCC frames in the utterance.
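
The shapes in the snippet from the question can be sketched as follows. This is only an illustration under stated assumptions: it assumes `X` is laid out as (num_samples, chunk_len, feat_dim) before the `permute`, takes feat_dim = 30 from the question, and uses a made-up chunk_len of 200; the real chunk_len depends on the Kaldi archives.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
num_samples, chunk_len, feat_dim = 8, 200, 30
batch_size = 4

# Assumed layout before the permute: (num_samples, chunk_len, feat_dim).
X = torch.randn(num_samples, chunk_len, feat_dim)

# Slice one batch, then swap the last two axes as in the original call.
batch_idx = 0
batch = X[batch_idx * batch_size:(batch_idx + 1) * batch_size, :, :]
net_input = batch.permute(0, 2, 1)  # -> (batch_size, feat_dim, chunk_len)

print(net_input.shape)
```

Under these assumptions the network receives one MFCC matrix per sample, with features along dimension 1 and time along dimension 2.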

  2. Are MFCCs the features used in your experiments?

Yes, each input sample is a matrix - a sequence of MFCC features.

  3. Have you tried other tools to extract features, such as librosa?

Not at the moment.

@nonday
Author

nonday commented Apr 22, 2020

I get "loss is nan". How can I solve this?

@manojpamk
Owner

The most common cause (in this repo) is the stats pooling layer. If all inputs in a chunk are zero or identical, their variance is 0, which seems to result in a NaN loss.
Please use the -noiseEps option to avoid this.
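
A minimal sketch of the idea, not the repo's actual layer: `stats_pooling` and its `eps` argument are hypothetical names used here for illustration. It assumes the pooling concatenates the mean and standard deviation over time, and that the noise option adds small Gaussian noise to the input before pooling. With a constant input the standard deviation is exactly zero, which is the point where its gradient is not well defined; a little noise keeps it strictly positive.

```python
import torch

def stats_pooling(x, eps=0.0):
    """Mean + std pooling over time; x is (batch, feat_dim, chunk_len).

    Hypothetical helper: eps scales Gaussian noise added before pooling.
    """
    if eps > 0:
        x = x + eps * torch.randn_like(x)
    # Concatenate per-feature mean and std -> (batch, 2 * feat_dim).
    return torch.cat((x.mean(dim=2), x.std(dim=2)), dim=1)

torch.manual_seed(0)
silent = torch.zeros(2, 30, 50)  # all-zero input, e.g. digital silence

std_plain = stats_pooling(silent)[:, 30:]            # std half, no noise
std_noisy = stats_pooling(silent, eps=1e-5)[:, 30:]  # std half, with noise

print(std_plain.abs().max().item(), std_noisy.min().item())
```

The noise scale of 1e-5 is an arbitrary choice for the sketch; a suitable value for -noiseEps depends on the scale of the features.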
