What if the frame length is greater than NFFT? #33

Closed
janluke opened this issue Feb 3, 2017 · 4 comments

Comments

@janluke
Contributor

janluke commented Feb 3, 2017

I'm not an expert in this kind of stuff, so I'm sorry if this turns out to be a waste of time.

From the numpy.fft.rfft documentation [in our case: n=NFFT, input=frame]:
"Number of points along transformation axis in the input to use. If n is smaller than the length of the input, the input is cropped. If it is larger, the input is padded with zeros. If n is not given, the length of the input along the axis specified by axis is used."

Isn't this cropping something we want to avoid? As far as I can see, the code never checks how the frame size compares to NFFT.
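The cropping and padding described in the quoted documentation can be checked on a toy input. This is only a minimal sketch: the 8-sample `frame` and the sizes 4 and 16 are made up for illustration and are not taken from the library's code.

```python
import numpy as np

# A toy "frame" of 8 samples (illustrative values only).
frame = np.arange(1, 9, dtype=float)

# n smaller than the frame: only the first 4 samples are transformed,
# the remaining 4 are silently dropped.
cropped = np.fft.rfft(frame, n=4)
assert np.allclose(cropped, np.fft.rfft(frame[:4]))

# n larger than the frame: the frame is zero-padded up to 16 samples.
padded = np.fft.rfft(frame, n=16)
zero_padded_frame = np.concatenate([frame, np.zeros(8)])
assert np.allclose(padded, np.fft.rfft(zero_padded_frame))

# rfft returns n // 2 + 1 frequency bins in both cases.
print(cropped.shape, padded.shape)
```

Neither call raises or warns, which is why the truncation is easy to miss.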

@jameslyons
Owner

The cropping is generally something you would want to avoid, but I am hesitant to make it an error. It will probably make the features perform worse, but this is something you would hopefully catch with cross validation. I am thinking I'll leave it how it is, but I could probably be convinced to change it if someone had a really good argument.

@janluke
Contributor Author

janluke commented Feb 9, 2017

In my opinion, if it's something we want to avoid, at least a warning is necessary (using the logging package).
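A warning of this kind could look roughly like the following. It is a sketch only: `check_nfft` is a hypothetical helper name and the message wording is an assumption, not the library's actual code.

```python
import logging

logger = logging.getLogger(__name__)


def check_nfft(frame_len, nfft):
    """Hypothetical helper: warn if frames would be cropped by the FFT.

    Returns True when the FFT size covers the whole frame, False otherwise.
    """
    if frame_len > nfft:
        # The FFT would only see the first `nfft` samples of each frame.
        logger.warning(
            "frame length (%d) is greater than NFFT (%d); "
            "frames will be truncated. Increase NFFT to avoid this.",
            frame_len,
            nfft,
        )
        return False
    return True


logging.basicConfig(level=logging.WARNING)
check_nfft(400, 512)  # fine, no message
check_nfft(550, 512)  # logs a warning
```

A warning rather than an exception keeps existing scripts running while still making the truncation visible.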

Consider a user (not so expert, or simply distracted) who wants to use the mfcc function with a sampling rate higher than 16 kHz, say 22 kHz (a typical value), leaving all the other default parameters as they are. In this case, a 25 ms frame is 550 samples long, so the frames will be cropped, yielding suboptimal results. The user will be totally unaware of this cropping behaviour, and I don't think the user expects to break anything just by changing the sampling rate or the window size (aka frame size).
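The arithmetic behind that example can be written out as follows. This is a sketch under stated assumptions: `frame_length_samples` is a hypothetical helper, and the FFT size of 512 is taken from the figure mentioned in this thread rather than read from the code.

```python
def frame_length_samples(sample_rate_hz, window_ms):
    """Hypothetical helper: number of samples in one analysis window."""
    return int(round(sample_rate_hz * window_ms / 1000))


NFFT = 512  # assumed FFT size, the value discussed in this thread

for rate in (16000, 22000):
    n = frame_length_samples(rate, 25)
    dropped = max(0, n - NFFT)  # samples per frame the FFT would never see
    print(rate, n, dropped)
```

At 16 kHz the 400-sample frame fits inside the FFT and is zero-padded; at 22 kHz the last 38 samples of every 550-sample frame are discarded.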

[Side question: what parameters would you suggest in this case, given that 550 is only a little bigger than 512?]

Furthermore, not all users are going to cross-validate NFFT; many will instead use a typical set of parameters found in some paper, which usually gives the sampling rate, the window size and step (in ms), and the number of coefficients.

@jameslyons
Owner

I guess you've convinced me; I'll add a warn() when NFFT < framelen. Generally people choose FFT sizes to be 2^k for some integer k, so for a frame size of 550, 1024 would probably be used. I think this is because FFTs can be computed most efficiently when the FFT size is 2^k. From the point of view of speech recognition, it doesn't really matter: your accuracy will be close to identical whether you use an FFT size of 550 or 1024, because the mel filterbank will smooth everything out anyway.
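Picking the smallest 2^k that covers the frame can be sketched like this. `next_power_of_two` is a hypothetical helper written for illustration; it is not a function provided by the library.

```python
def next_power_of_two(n):
    """Hypothetical helper: smallest power of two that is >= n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 by that amount gives the first power of two >= n.
    return 1 << (n - 1).bit_length()


for frame_len in (400, 512, 550):
    print(frame_len, next_power_of_two(frame_len))
```

For a 550-sample frame this gives 1024, matching the size suggested above, while 400- and 512-sample frames stay at 512.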

@janluke
Contributor Author

janluke commented Feb 15, 2017

Good! Thank you for your work and help :)
