What's the shape of network's input #1

nonday · 2020-04-18T08:43:04Z

Line 91 in 350e4b5:

    output = net(X[preFetchBatchI*args.batchSize:(preFetchBatchI+1)*args.batchSize,:,:].permute(0,2,1), eps)

What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?
Are MFCCs the features used in your experiments?
Have you tried other tools to extract features, such as librosa?
Thanks!

manojpamk · 2020-04-18T22:26:13Z

What are the values of (batch_size, feat_dim, chunk_len)? Is it (batch_size, 30, ?)?

I'm not sure I understand the question. chunk_len is determined by Kaldi when creating the archives. It represents the temporal dimension: the number of MFCC frames in the utterance.
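As a rough guide to where the temporal dimension comes from, here is a sketch of how many MFCC frames a Kaldi-style front end produces for a given number of audio samples. It assumes Kaldi's compute-mfcc-feats defaults (25 ms window, 10 ms shift, snip-edges enabled) and 16 kHz audio; the repo's actual MFCC config may differ, and the archive-creation step may cut utterances into shorter chunks, so treat this only as a full-utterance frame count. The function name `num_frames` is hypothetical.

```python
def num_frames(num_samples, sample_rate=16000, frame_length_ms=25.0, frame_shift_ms=10.0):
    """Frame count with Kaldi-style snip-edges framing (assumed defaults).

    Only windows that fit entirely inside the signal are counted.
    """
    window = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


print(num_frames(3 * 16000))  # 3 s of 16 kHz audio -> 298 frames
```

So a 3-second utterance would give a matrix of roughly 298 frames, each a feat_dim-dimensional MFCC vector.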

Are MFCCs the features used in your experiments?

Yes, each input sample is a matrix: a sequence of MFCC feature vectors.
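A minimal sketch of the resulting batch shape, using NumPy in place of PyTorch (`transpose(0, 2, 1)` reorders axes the same way as `.permute(0, 2, 1)`). It assumes, from the permute in the quoted line, that X is stored as (num_samples, chunk_len, feat_dim); all sizes are hypothetical, with feat_dim = 30 taken from the question rather than the repo. The helper `get_batch` is also hypothetical.

```python
import numpy as np

# Hypothetical sizes for illustration only.
num_samples, chunk_len, feat_dim = 100, 200, 30
batch_size = 32

# Assumed layout: one (chunk_len, feat_dim) MFCC matrix per sample.
X = np.zeros((num_samples, chunk_len, feat_dim), dtype=np.float32)


def get_batch(X, batch_index, batch_size):
    """Slice one minibatch and move the feature axis to position 1,
    mirroring X[i*bs:(i+1)*bs, :, :].permute(0, 2, 1) in the PyTorch line."""
    start = batch_index * batch_size
    return X[start:start + batch_size, :, :].transpose(0, 2, 1)


print(get_batch(X, 0, batch_size).shape)  # (32, 30, 200)
print(get_batch(X, 3, batch_size).shape)  # (4, 30, 200): the last batch is short
```

Putting features on axis 1 matches the (batch, channels, length) input layout that PyTorch's `nn.Conv1d` expects, which would be the reason for the permute if the frame-level layers are 1-D convolutions (an assumption here).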

Have you tried other tools to extract features, such as librosa?

Not at the moment.

nonday · 2020-04-22T14:03:34Z

The loss is NaN. How can I fix this?

manojpamk · 2020-04-22T20:46:48Z

In this repo, the most common cause has been the stats pooling layer. If all of its inputs are zero or identical, the variance is zero, and var(0) seems to result in a NaN loss.
Please use the -noiseEps option to avoid this.
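The sketch below illustrates one plausible mechanism in plain Python: stats pooling computes a per-dimension mean and standard deviation over time, and the derivative of sqrt(v) is infinite at v = 0, so zero variance can turn gradients into NaN (inf × 0). It assumes, as the flag's name suggests, that -noiseEps scales small Gaussian noise added to the inputs before pooling; `time_stats`, `sqrt_grad` and `add_noise` are hypothetical helpers, not the repo's code.

```python
import math
import random


def time_stats(frames):
    """Per-dimension mean and population variance over the time axis.

    frames: list of T frames, each a list of feat_dim floats.
    """
    T = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / T for d in range(dim)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / T for d in range(dim)]
    return means, variances


def sqrt_grad(v):
    """d sqrt(v) / dv = 1 / (2 sqrt(v)); infinite when v == 0."""
    return math.inf if v == 0.0 else 0.5 / math.sqrt(v)


def add_noise(frames, eps, rng):
    """Hypothetical stand-in for -noiseEps: add N(0, eps^2) noise to every value."""
    return [[x + eps * rng.gauss(0.0, 1.0) for x in f] for f in frames]


frames = [[0.0] * 3 for _ in range(5)]  # 5 all-zero frames, feat_dim = 3
_, var = time_stats(frames)
print(var)                          # [0.0, 0.0, 0.0]
print([sqrt_grad(v) for v in var])  # [inf, inf, inf]
print(math.inf * 0.0)               # nan: an infinite gradient times a zero one

rng = random.Random(0)
_, var_noisy = time_stats(add_noise(frames, 1e-5, rng))
print(all(v > 0.0 for v in var_noisy))  # True: noise keeps the variance positive
```

Clamping the variance to a small floor before the square root is another common way to avoid the same problem.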

manojpamk closed this as completed Apr 19, 2020
