What if the frame length is greater than NFFT? #33

janluke · 2017-02-03T16:45:19Z

I'm not an expert in this kind of stuff, so I'm sorry if this will be a waste of time.

From the numpy.fft.rfft documentation [in our case: n=NFFT, input=frame]:
"Number of points along transformation axis in the input to use. If n is smaller than the length of the input, the input is cropped. If it is larger, the input is padded with zeros. If n is not given, the length of the input along the axis specified by axis is used."

Isn't this cropping something we want to avoid? As far as I can see, nothing in the code checks how the frame size compares to NFFT.
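To make the concern concrete, here is a minimal sketch of the cropping described in the quoted documentation. The 550-sample frame and `nfft = 512` are illustrative values, not taken from the library's code; the sketch only shows that `numpy.fft.rfft` silently drops the samples beyond `n`.

```python
import numpy as np

# Illustrative values: a 550-sample frame and an FFT size of 512.
frame = np.arange(550, dtype=float)
nfft = 512

# rfft with n smaller than the input length crops the input to n samples.
spec_cropped = np.fft.rfft(frame, n=nfft)

# Same result as transforming only the first nfft samples by hand.
spec_manual = np.fft.rfft(frame[:nfft])

same = np.allclose(spec_cropped, spec_manual)
print(same, spec_cropped.shape)
```

No warning or error is raised; the last 38 samples of the frame simply never reach the transform.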

jameslyons · 2017-02-09T07:49:07Z

The cropping is generally something you would want to avoid, but I am hesitant to make it an error. It will probably make the features perform worse, but this is something you would hopefully catch with cross validation. I am thinking I'll leave it how it is, but I could probably be convinced to change it if someone had a really good argument.

janluke · 2017-02-09T15:12:18Z

In my opinion, if it's something we want to avoid, at least a warning is necessary (for example, using the logging package).

Consider a (not so expert, or distracted) user who wants to use the mfcc function with a sampling rate higher than 16kHz, say 22kHz (a typical value), leaving all the other default parameters as they are. In this case a 25ms frame is 550 samples long, which is more than the default NFFT of 512, so the frames will be cropped, leading to suboptimal results. The user will be totally unaware of this cropping behaviour, and I don't think anyone expects to mess something up just by changing the sampling rate or the window size (aka frame size).
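The arithmetic behind this example can be sketched as follows. `frame_length` is a hypothetical helper written for illustration, not a function from the library, and the 0.025 s window is the 25ms value mentioned above.

```python
def frame_length(samplerate, winlen=0.025):
    """Number of samples in one analysis window (hypothetical helper)."""
    return int(round(samplerate * winlen))

len_16k = frame_length(16000)  # 25 ms at 16 kHz
len_22k = frame_length(22000)  # 25 ms at 22 kHz

print(len_16k, len_22k)
```

At 16kHz the frame fits inside an FFT of 512 points; at 22kHz it does not, which is the case where cropping kicks in.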

[Side question: what parameters would you suggest in this case, given that 550 is only a little bigger than 512?]

Furthermore, not all users are going to cross-validate the NFFT value. Many will instead use a typical set of parameters found in some paper, which usually gives the sampling rate, window size and step (in ms), and number of coefficients.

jameslyons · 2017-02-15T11:00:28Z

I guess you've convinced me, I'll add a warn() when NFFT < framelen. Generally people choose FFT sizes to be 2^k for some integer k. For a frame size of 550, 1024 would probably be used. I think this is because FFTs can be computed most efficiently when the FFT size is 2^k. From the point of view of speech recognition, it doesn't really matter. Your accuracy will be close to identical whether you use an FFT size of 550 or 1024, because the mel filterbank will smooth everything out anyway.
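A rough sketch of what such a check could look like, together with the power-of-two choice. The names `next_pow2` and `check_nfft` and the warning text are assumptions made for illustration; this is not the code that was actually added to the library.

```python
import warnings

def next_pow2(n):
    """Smallest power of two that is >= n (hypothetical helper)."""
    nfft = 1
    while nfft < n:
        nfft *= 2
    return nfft

def check_nfft(frame_len, nfft):
    """Warn if the FFT size would cause frames to be cropped (hypothetical)."""
    if nfft < frame_len:
        warnings.warn(
            "frame length (%d) is greater than FFT size (%d); "
            "frames will be truncated. Increase NFFT to avoid this."
            % (frame_len, nfft)
        )

suggested = next_pow2(550)  # FFT size large enough for a 550-sample frame
check_nfft(550, 512)        # emits a warning
check_nfft(550, suggested)  # silent
```

Using `warnings.warn` rather than raising an exception keeps existing calls working while still making the truncation visible.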

janluke · 2017-02-15T14:06:56Z

Good! Thank you for your work and help :)

jameslyons closed this as completed Feb 9, 2017
