
Please consider using optimized binaural decoders in spherical harmonic form #114

tumbleandyaw opened this issue Jul 15, 2016 · 3 comments



@tumbleandyaw tumbleandyaw commented Jul 15, 2016

To me, this would sound more spatially accurate and less spectrally colored than the current "virtual speaker" approach. Some good-sounding examples are the ATK synthetic spherical filters, which can be found here: Keep up the great work! Thanks!



@joslloand joslloand commented Jul 15, 2016

With the ATK project, we've filed this related issue: Support / integration for YouTube / Jump Inspector?

Some more discussion on the topic, from here:

It is useful to know that the ATK's binaural decoders are not virtual loudspeaker decoders. That is, we don't just load HRIRs and treat them as virtual speakers. Instead, we design optimized first-order decoders from (a large subset of) the measured HRIRs found in the Listen and CIPIC collections. You can read some of the details at atk-kernels.

The ATK's binaural decoders exist as FIR kernels which combine both the decoding process and the binaural rendering in one FIR filter set. For each ear, there is a kernel for each spherical harmonic:

  • W_left, W_right
  • X_left, X_right
  • Y_left, Y_right
  • Z_left, Z_right

All this filter design is done offline as part of the atk-kernels project. There are many benefits to working in the spherical harmonic domain. Among other things, the amount of convolution we need to do in real time is minimal: only 8 convolutions (4 harmonics × 2 ears). It also means we can do detailed design outside of real time (all the frequency and spatial domain optimizations described here).

A simple solution would be for the ATK project to provide kernels in ACN-SN3D (aka AmbiX) format for this task.
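As a rough illustration of what providing first-order kernels in ACN-SN3D could involve, here is a minimal Python sketch. It assumes the existing kernels are in FuMa (W, X, Y, Z) ordering and normalization, as the channel list above suggests; the function and variable names are hypothetical. At first order the two conventions differ only in channel order (ACN: W, Y, Z, X) and in the gain on W (FuMa W is 1/√2 of SN3D W), so the W kernel is divided by √2 to keep the decoded output unchanged.

```python
import math

# ACN channel index for each first-order FuMa channel name.
FUMA_TO_ACN = {"W": 0, "Y": 1, "Z": 2, "X": 3}


def fuma_kernels_to_acn_sn3d(kernels):
    """Reorder one ear's FuMa-domain decoder kernels into ACN order, rescaled for SN3D input.

    kernels: dict mapping "W", "X", "Y", "Z" to a list of FIR taps (hypothetical layout).
    An SN3D input has W_sn3d = sqrt(2) * W_fuma, so the kernel applied to W is
    divided by sqrt(2) to leave the convolution sum unchanged. First-order
    X, Y, Z are identical in FuMa and SN3D, so those kernels are only reordered.
    """
    acn = [None] * 4
    for name, taps in kernels.items():
        gain = 1.0 / math.sqrt(2.0) if name == "W" else 1.0
        acn[FUMA_TO_ACN[name]] = [gain * t for t in taps]
    return acn
```

The same conversion would be applied separately to the left-ear and right-ear kernel sets.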

Modifying the Spatial Media code base for shouldn't be too difficult. Pseudo-code along these lines will do the trick:

out_left =
(in_ACN0 * FIR_left_ACN0) + 
(in_ACN1 * FIR_left_ACN1) + 
(in_ACN2 * FIR_left_ACN2) + 
(in_ACN3 * FIR_left_ACN3)

out_right =
(in_ACN0 * FIR_right_ACN0) + 
(in_ACN1 * FIR_right_ACN1) + 
(in_ACN2 * FIR_right_ACN2) + 
(in_ACN3 * FIR_right_ACN3)

// * is convolution operator
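The pseudo-code above can be turned into the following runnable Python sketch. It uses plain direct-form convolution for clarity (a real-time implementation would more likely use FFT-based or partitioned convolution); the function names and the list-of-floats representation are assumptions for illustration, not the Spatial Media API.

```python
def convolve(x, h):
    """Full linear convolution of signal x with FIR kernel h (length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def binaural_decode(in_acn, fir_left, fir_right):
    """Sum the per-harmonic convolutions for each ear.

    in_acn: list of input signals, one per ACN channel (ACN0..ACN3 at first order).
    fir_left / fir_right: list of FIR kernels, one per ACN channel, for each ear.
    Returns (out_left, out_right), zero-padded to the longest convolution result.
    """
    n = max(len(s) + max(len(hl), len(hr)) - 1
            for s, hl, hr in zip(in_acn, fir_left, fir_right))
    out_left = [0.0] * n
    out_right = [0.0] * n
    for sig, hl, hr in zip(in_acn, fir_left, fir_right):
        for k, v in enumerate(convolve(sig, hl)):
            out_left[k] += v
        for k, v in enumerate(convolve(sig, hr)):
            out_right[k] += v
    return out_left, out_right
```

For example, an impulse in ACN0 alone returns the ACN0 kernel of each ear, since the other three channels contribute nothing to the sums.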

There are numerous benefits to this harmonic convolution approach, and reduced coloration is just one of them. For HOA, the number of kernels required equals the number of harmonics ×2 (one set for each ear, given an asymmetric head), which significantly reduces the CPU load compared to the "virtual loudspeaker" method of binaural decoding.
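To make the kernel count concrete, the sketch below computes the number of real-time convolutions for each approach. An order-N Ambisonic signal has (N+1)² harmonics, so the harmonic approach needs 2(N+1)² convolutions, which gives the 8 quoted above at first order. For the virtual loudspeaker method it assumes one HRIR convolution per speaker per ear; the speaker count is a free parameter here, and such decoders commonly use at least as many virtual speakers as there are harmonics. The 8-speaker cube in the usage lines is only a hypothetical layout.

```python
def harmonic_count(order):
    """Number of spherical harmonics up to and including the given Ambisonic order."""
    return (order + 1) ** 2


def sh_domain_convolutions(order):
    """Harmonic approach: one FIR kernel per harmonic, per ear."""
    return 2 * harmonic_count(order)


def virtual_speaker_convolutions(num_speakers):
    """Virtual loudspeaker approach (assumed): one HRIR per virtual speaker, per ear."""
    return 2 * num_speakers


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, sh_domain_convolutions(n))  # 1 8, then 2 18, then 3 32
    print(virtual_speaker_convolutions(8))   # hypothetical 8-speaker cube: 16
```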



@henk-spook henk-spook commented Jul 22, 2016

Yes, to me the binaural decoders from ATK also sound less coloured and more spatially accurate than the virtual speaker approach. It would be great if these were considered instead of the current virtual speaker decoding. Thanks



@mgorzel mgorzel commented Apr 27, 2017

Thanks for the suggestion, tumbleandyaw, joslloand & henk-spook!
We've added that to the resources:



@dcower dcower closed this in bf5b3aa May 3, 2017
dcower added a commit that referenced this issue May 3, 2017
Fixes formatting & links in; Fixes #114
4 participants