Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bufferless True Peak-analysis #36

Merged
merged 5 commits into from Nov 10, 2020
Merged

Conversation

rawler
Copy link
Contributor

@rawler rawler commented Nov 4, 2020

This is the second of the two TruePeak analysis optimizations. The key optimization here, is avoiding extra memory-copying by not keeping input and output from the upsamling. Every new input-frame is fed immediately to the interpolator, generating 2 or 4 new frames which are immediately checked for new max before being discarded.

The net gain according to my benchmark:

true_peak: 48kHz 2ch f32/Rust
/Interleaved: 545.50 -> 395.99 (-27.4%)
/Planar: 556.07 -> 424.97 (-23.6%)

true_peak: 48kHz 2ch i16/Rust
/Interleaved: 550.95 -> 436.63 (-20.7%)
/Planar: 579.58 -> 476.53 (-17.8%)

As a nice bonus, it also cleans up a lot of code from the previous step of optimization, causing a significant net reduction of code.

8 files changed, 462 insertions(+), 363 deletions(-)

Some long-running quickcheck-runs showed that the difference between f64
and f32 calculations can be as high as 0.00000386. Increase the allowed
error-margin to avoid spurious failures
Allows rapidly iterating the sample-buffers, one dasp::Frame at a time
@rawler rawler force-pushed the bufferless-true-peak branch 3 times, most recently from 739007a to 21a1ed6 Compare November 4, 2020 22:59
@sdroege
Copy link
Owner

sdroege commented Nov 5, 2020

Thanks, this seems great. I'll take a proper look this weekend :)

@sdroege
Copy link
Owner

sdroege commented Nov 7, 2020

Is it ready for review now? I saw you fixed up/improved various things in the meantime :)

src/interp.rs Outdated Show resolved Hide resolved
@rawler
Copy link
Contributor Author

rawler commented Nov 7, 2020 via email

@rawler
Copy link
Contributor Author

rawler commented Nov 7, 2020

I'd say it's ready for review. I'm still looking for ways to improve performance further, but this is good to merge as-is (I'll look into the squashing-topic though)

src/interp.rs Outdated Show resolved Hide resolved
src/interp.rs Outdated Show resolved Hide resolved
src/ebur128.rs Show resolved Hide resolved
src/interp.rs Outdated Show resolved Hide resolved
src/utils.rs Show resolved Hide resolved
src/interp.rs Show resolved Hide resolved
src/interp.rs Outdated Show resolved Hide resolved
src/interp.rs Outdated Show resolved Hide resolved
src/interp.rs Show resolved Hide resolved
Copy link
Owner

@sdroege sdroege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me, thanks a lot :)

Can you add some details to the last commit with the rolling buffer about what kind of optimizations this allows, i.e. what you saw happening in practice here? I assume it simply allows auto-vectorization to kick in at all for this code or is there more to it?

@rawler
Copy link
Contributor Author

rawler commented Nov 9, 2020

You're welcome. Thanks for all the other code I did not have to write :) I think all the feedback is addressed now. Please have a look again.

- Split interp::Frame into utils::FrameAcc based on dasp::Frame and
  utils::Samples::foreach_frame
- Push incoming frame:s directly onto the interpolator, one at a time, and
  check sample-max on resulting frames immediately. This removes the need
  for input and output-buffering.
- Cleanup the unused parts of interp.rs
Save samples with shadow-buffering to enable continous fixed-length view
into the buffer. For any offset, there will be a correct continous view of
the entire circular buffer.

This turns the inner loop of filter application from N*4 + M*4, into a
predictable 12*4 operation. This avoids some branching, and gives the LLVM
optimizer better information to work with. (For example, allowing 512-bit
operations)
@sdroege
Copy link
Owner

sdroege commented Nov 10, 2020

You forgot to update benches/interp.rs for push() -> interpolate(). I've updated that now, seems all good to me otherwise and I'll merge once the CI is also happy :)

src/interp.rs Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants