-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace speex resampler with polyphase FIR interpolator. #51
Conversation
* Return proper error codes * Recalculate samples_in_100ms when samplerate chanes * Initialize resampler after sample rate is set
This looks really nice. How's the performance compared to master branch? |
I don't have a good way to compare with master because I have never used speex. In terms of accuracy, here is the relevant output from the test application (#50): If someone could provide similar test results using master, we could compare accuracy. In terms of CPU performance, I don't know what method speex uses. But I do know that it is really hard to beat the performance of the sinc/windowed method with odd filter taps at the nyquist frequency. The reason is that the sinc coefficients end up being zero for every other coefficient (except the middle) and the hanning window zeros out the first and last sample. So the total number of multiply/add operations per sample for 49 taps comes to ((48 x 3 / 4) + 1 - 2) / 4 = 8.75 for my implementation. Compare that to the example filter provided by BS.1770-4 for 48 taps. The total number of multiply/add operations per sample comes to 48 / 4 = 12 So the method I used is 27% more efficient than the BS.1770-4 example. |
Beautiful. @jiixyj I'd like to merge this pull request along with the others (I would clean up some commits while retaining credit to the authors). Looks like I have permission to do this, but I'll give you a chance to step in first. If I don't hear anything in a week, I'll do it myself. |
Hello, |
@audionuma : Thanks for doing the test. Can you also compare the accuracy of the test results between the two implementations using #50? It is possible that we could reduce the number of filter taps: |
I just took a look at the speexdsp resampler implementation for the first time. It appears to be highly optimized with SSE instructions. I don't think we could hope to compete with it on performance using a straight C implementation. Here are a couple of ideas of how to improve the performance of this patch:
|
Using a slightly modified version of #50 (see attached file). Output for current release
Output for true-peak branch
Results are good for standard compliance. |
@audionuma : Thanks for looking into all of that. I'm a little slammed for the rest of this week. But I might have some time to look for optimizations next week. |
Hello,
Using the suggested flags for Clang (which I am using) leads to a runtime error, that's why they are commented out and replaced by the GCC ones. |
Interesting. The "-mavx2 -mfma" flags tell the compiler to try to use FMA instructions when possible. FMA was only recently added to some CPUs in the last couple of years. It seems very likely that the compiler would use FMA for the multiply/accumulate code if the flags permit it. However, if it does, and you run the binary on a CPU that does not support FMA, it would probably cause a runtime error. If we really want to use FMA, we would need to write two versions of the function: a native version and an FMA optimized version. Then, at runtime, we would need to decide which function to use by checking the CPU ID/flags. Highly optimized libraries like ffmpeg do this all the time. We probably don't need that - and all this FMA stuff is a distraction. What is probably making all the difference is the "-O2" flag. If you have the time, it would be good to re-run the comparisons of master and true-peak branches after compiling both with the "-O2" flag (and excluding the other flags). If you don't have time, I will try to make time this weekend. I hope that would narrow the performance gap between the master and true-peak branches because I really didn't expect the performance to be so different. I'm no expert on packaging. So I'm not sure if it is best practice to specify optimization flags in the build scripts (cmake/make) or if it is best to leave those decisions to the packager or person compiling. It is interesting to compare the debian rules files for libebur128 and speex. The libebur128 rules do not specify any compiler options. While the speex rules explicitly force "-O2" |
With only "-O2" flag set, I have |
When you make a debug build you don't get the -O2 flag but you do when you make a release build. Any speed comparisons should be done against release mode builds. Make a release build by using this configure line:
This is how packages in distros such as Debian build. |
Ok. So based on those results, it looks like we should be good to go with the patch as-is. Are there any other tests that should be done before we move forward? |
Thank you for your work on this! I haven't had the chance to thoroughly review your patches, but if everyone else is happy, please go ahead. :) |
@jiixyj : Thanks for your consideration. For those following along, jiixyj is referring to 9eda37f Previously, the return value for loudness_shortterm() and loudness_momentary() was EBUR128_SUCCESS - even when there was insufficient data. The functions would always perform the calculations as if st->d->audio_data was full of valid data. That means that sometimes the loudness measurement would be -HUGE_VAL, and sometimes it would just be a number that doesn't represent anything. Because st->d->audio_data is allocated using malloc(), you can't even assume that the loudness measurement represents silence leading up to the data that is available - because malloc() might return a pointer with random data in it. My patch only applies to loudness_shortterm() and loudness_momentary(). After the patch, if either function is called before enough valid data has been received, the return value is EBUR128_ERROR_INSUFFICIENT_DATA and the loudness parameter is returned unchanged. I have checked the specifications and they don't define a meaning for shortterm and momentary loudness before 3000ms and 400ms have elapsed (respectively). So I don't know that there is a right answer of what to return in that scenario. Logically, I suppose, one could just calculate the loudness for the data that has been received so far. I'd be willing to make that patch if people prefer. If the advantage of the patch does not outweigh the disadvantage (API change), I am OK if it is excluded. In my own application, I would have to keep track of how many samples have been provided and ignore the value until I have provided sufficient data. We could also just update the documentation to explain that the functions should not be used until sufficient data has been provided. |
@bmatherly has a very good point with raising this EBUR128_ERROR_INSUFFICIENT_DATA. It is a good way to deal with this issue. The hardware realtime meter I use doesn't display a value until there's sufficient data (ie it will not display a value for short-term for the first three seconds after reset). |
Hi. I added two more commits to this pull request. d82b54b fixes a typo/bug from a1bcda2. Feel free to squash those two commits if you want. 8cc762d implements prev_sample_peak() and prev_true_peak() which provide the respective measurements for only the data from the previous call to add frames. I consider this the last feature that is needed to implement a real-time EBU compliant loudness meter. With this feature, it is possible to provide a real time display of peak and true_peak measurements in addition to the maximum peak for the entire program. I hope you will consider including these. I think I am done adding features. I can't think of any other features that would be required for my real-time loudness meter. Of course, if I find any bugs, I'll pass the fixes along. |
I've merged the interpolation patch into the review-pr-49 branch. Thank you again for the resampler, finally no more Speex dependency! |
Haven't had a chance to test yet, but this is a welcome addition! |
The new resampler is in master now. This PR can be closed if I didn't miss anything. |
Thanks again for merging the interpolator. I think it is a good step forward. I see you excluded this commit: Is it still under consideration? Shall I submit another pull request? Without that commit, it is not possible to use libebur128 for a real-time peak/true-peak meter because the value will only ever go up when a new maximum is encountered. |
Ah, I must have missed this commit. Could you submit another PR? |
PR for previous peak/true peak functions: |
This pull request builds on top of: #49
The only commit this request adds is: 48924de
Also, it passes all tests from: #50