Bufferless True Peak-analysis #36
Some long-running quickcheck runs showed that the difference between f64 and f32 calculations can be as high as 0.00000386. Increase the allowed error margin to avoid spurious failures.
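A minimal sketch of the kind of tolerance comparison that commit message describes. The helper names and the `1e-5` margin are assumptions for illustration; the commit does not state the new margin, only that the observed difference reached 0.00000386.

```rust
/// Hypothetical margin, chosen only to sit above the observed 0.00000386.
const ASSUMED_MARGIN: f64 = 1e-5;

/// Returns true if `a` and `b` differ by at most `tolerance`.
fn approx_eq(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() <= tolerance
}

/// Sum of squares, accumulated in f32.
fn energy_f32(samples: &[f32]) -> f32 {
    samples.iter().map(|s| s * s).sum()
}

/// Sum of squares over the same samples, accumulated in f64.
fn energy_f64(samples: &[f32]) -> f64 {
    samples
        .iter()
        .map(|&s| f64::from(s) * f64::from(s))
        .sum()
}
```

A property test would then assert `approx_eq(f64::from(energy_f32(&x)), energy_f64(&x), ASSUMED_MARGIN)` rather than exact equality.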
Allows rapidly iterating the sample-buffers, one dasp::Frame at a time
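For readers unfamiliar with the pattern, here is a rough sketch of frame-at-a-time iteration over an interleaved buffer using only the standard library. `visit_frames` is a hypothetical name, and a plain `[f32; N]` array stands in for `dasp::Frame`; this is not the code from the commit.

```rust
/// Calls `func` once per frame of an interleaved buffer with `N` channels.
/// Trailing samples that do not fill a whole frame are ignored.
/// `N` must be greater than zero (`chunks_exact(0)` panics).
fn visit_frames<F, const N: usize>(interleaved: &[f32], mut func: F)
where
    F: FnMut([f32; N]),
{
    for chunk in interleaved.chunks_exact(N) {
        // Copy the N samples of this frame into a fixed-size array.
        let mut frame = [0.0f32; N];
        frame.copy_from_slice(chunk);
        func(frame);
    }
}
```

Because the frame size is a compile-time constant, the per-frame work in the closure has a fixed trip count.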
Thanks, this seems great. I'll take a proper look this weekend :)
Is it ready for review now? I saw you fixed up/improved various things in the meantime :)
Fair point. :) I'll look into it.
On Sat, 7 Nov 2020 at 15:56, Sebastian Dröge <notifications@github.com …> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/interp.rs
<#36 (comment)>:
> - let imp: Box<dyn Interpolator> = match (taps, factor, channels) {
- (49, 2, 1) => Box::new(specialized::Interp2F::<[f32; 1]>::new()),
- (49, 2, 2) => Box::new(specialized::Interp2F::<[f32; 2]>::new()),
- (49, 2, 4) => Box::new(specialized::Interp2F::<[f32; 4]>::new()),
- (49, 2, 6) => Box::new(specialized::Interp2F::<[f32; 6]>::new()),
- (49, 2, 8) => Box::new(specialized::Interp2F::<[f32; 8]>::new()),
- (49, 4, 1) => Box::new(specialized::Interp4F::<[f32; 1]>::new()),
- (49, 4, 2) => Box::new(specialized::Interp4F::<[f32; 2]>::new()),
- (49, 4, 4) => Box::new(specialized::Interp4F::<[f32; 4]>::new()),
- (49, 4, 6) => Box::new(specialized::Interp4F::<[f32; 6]>::new()),
- (49, 4, 8) => Box::new(specialized::Interp4F::<[f32; 8]>::new()),
- (taps, factor, channels) => Box::new(generic::Interp::new(taps, factor, channels)),
- };
- Self(imp)
+ pub fn new(_taps: usize, _factor: usize, _channels: u32) -> Self {
+ unimplemented!()
This suggests that this commit should be squashed with another one :) This
alone doesn't seem runnable as-is.
I'd say it's ready for review. I'm still looking for ways to improve performance further, but this is good to merge as-is (I'll look into the squashing topic though).
Generally looks good to me, thanks a lot :)
Can you add some details to the last commit with the rolling buffer about what kind of optimizations this allows, i.e. what you saw happening in practice here? I assume it simply allows auto-vectorization to kick in at all for this code or is there more to it?
You're welcome. Thanks for all the other code I did not have to write :) I think all the feedback is addressed now. Please have a look again.
- Split interp::Frame into utils::FrameAcc based on dasp::Frame and utils::Samples::foreach_frame
- Push incoming frames directly onto the interpolator, one at a time, and check the sample max on the resulting frames immediately. This removes the need for input and output buffering.
- Clean up the unused parts of interp.rs
Save samples with shadow buffering to enable a continuous fixed-length view into the buffer. For any offset, there will be a correct continuous view of the entire circular buffer. This turns the inner loop of filter application from N*4 + M*4 into a predictable 12*4 operation. This avoids some branching and gives the LLVM optimizer better information to work with (for example, allowing 512-bit operations).
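One common way to get such a view is to write every sample twice, `N` slots apart, so that any window of `N` consecutive samples is contiguous in memory. The sketch below illustrates that idea under stated assumptions: `ShadowBuffer`, its methods, and the single-channel layout are hypothetical and not taken from the commit.

```rust
/// Ring buffer that stores each sample twice, `N` slots apart, so the most
/// recent `N` samples are always available as one contiguous slice.
/// `N` must be greater than zero.
struct ShadowBuffer<const N: usize> {
    data: Vec<f32>, // length 2 * N; second half mirrors the first
    pos: usize,     // next write position, always in 0..N
}

impl<const N: usize> ShadowBuffer<N> {
    fn new() -> Self {
        Self { data: vec![0.0; 2 * N], pos: 0 }
    }

    /// Stores `sample` in its slot and in the mirrored slot `N` further on.
    fn push(&mut self, sample: f32) {
        self.data[self.pos] = sample;
        self.data[self.pos + N] = sample;
        self.pos = (self.pos + 1) % N;
    }

    /// The last `N` samples, oldest first, with no wrap-around handling.
    fn window(&self) -> &[f32] {
        &self.data[self.pos..self.pos + N]
    }

    /// Applies `N` filter coefficients in one fixed-length loop.
    fn dot(&self, coeffs: &[f32; N]) -> f32 {
        self.window()
            .iter()
            .zip(coeffs.iter())
            .map(|(s, c)| s * c)
            .sum()
    }
}
```

The cost is one extra store per sample and twice the memory; in exchange, the filter loop no longer needs to split into a "before the wrap" and "after the wrap" part.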
You forgot to update
This is the second of the two TruePeak analysis optimizations. The key optimization here is avoiding extra memory copying by not keeping the input and output of the upsampling. Every new input frame is fed immediately to the interpolator, generating 2 or 4 new frames, which are immediately checked for a new max before being discarded.
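The data flow can be sketched roughly as follows. Everything here is an illustrative assumption: `LinearInterp2` is a toy single-channel 2x linear interpolator standing in for the real FIR interpolator, so it only shows the bufferless structure and cannot detect inter-sample peaks the way a proper true-peak filter does.

```rust
/// Toy 2x upsampler: emits the midpoint between the previous and the current
/// sample, then the current sample. A stand-in for a real FIR interpolator.
struct LinearInterp2 {
    prev: f32,
}

impl LinearInterp2 {
    fn new() -> Self {
        Self { prev: 0.0 }
    }

    fn interpolate(&mut self, sample: f32) -> [f32; 2] {
        let out = [(self.prev + sample) * 0.5, sample];
        self.prev = sample;
        out
    }
}

/// Tracks the running peak without storing input or upsampled output.
struct StreamingPeak {
    interp: LinearInterp2,
    peak: f32,
}

impl StreamingPeak {
    fn new() -> Self {
        Self { interp: LinearInterp2::new(), peak: 0.0 }
    }

    /// Feeds each sample to the interpolator and folds the upsampled values
    /// into the running maximum; the values are dropped right afterwards.
    fn process(&mut self, input: &[f32]) {
        for &sample in input {
            let upsampled = self.interp.interpolate(sample);
            for &value in upsampled.iter() {
                self.peak = self.peak.max(value.abs());
            }
        }
    }

    fn peak(&self) -> f32 {
        self.peak
    }
}
```

Since the only state carried between calls is the interpolator history and the current maximum, `process` can be called with input blocks of any size.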
The net gain according to my benchmark:
As a nice bonus, it also cleans up a lot of code from the previous step of optimization, causing a significant net reduction of code.