I was reading the paper https://arxiv.org/pdf/1712.03439.pdf, which mentions that they sample the IR at 1024 kHz, then resample to 128 kHz and apply a low-pass filter at 80 kHz, then resample to the audio rate of 16 kHz.
I have a few questions:
From their paper, they seem to use 1024 kHz sampling because of the distance between the microphones. Would we still expect better results in the single-microphone case if we do a higher-resolution computation followed by resampling?
When I do the computation at 1024 kHz and resample to 16 kHz, the signal is not scaled the same way; if I apply a scaling of 1024 / 16 I get comparable values for the max, although the resulting IRs do not look the same (they seem slightly "shifted" in time). Is there a better way to do the resampling?
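A minimal sketch of where the 1024 / 16 factor could come from, using scipy.signal.resample_poly (one common way to do the resampling; the rates are from the paper, everything else here is an illustrative assumption). The anti-aliasing low-pass has unity passband gain, so a discrete impulse's peak shrinks by roughly the decimation factor; resample_poly also compensates the filter's group delay, which avoids an apparent time shift.

```python
import numpy as np
from scipy.signal import resample_poly

fs_hi, fs_lo = 1024_000, 16_000
down = fs_hi // fs_lo  # decimation factor, 64

# 10 ms IR with a single impulse placed on the decimation grid
ir_hi = np.zeros(fs_hi // 100)
ir_hi[640] = 1.0  # 640 / 64 = sample 10 at 16 kHz

# resample_poly low-pass filters (unity passband gain) then decimates,
# so the impulse's peak drops by roughly fs_hi / fs_lo
ir_lo = resample_poly(ir_hi, up=1, down=down)

peak_ratio = ir_hi.max() / ir_lo.max()
print(peak_ratio)  # close to 1024 / 16 = 64
```

Because resample_poly removes the filter delay, the peak lands at sample 10 of the 16 kHz signal, aligned with its position in the high-rate signal.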
Their paper shows how they do efficient OLA computations. Is this what is used in pyroomacoustics, and would it require a lot of work to add if needed?
Hi @maelp , I think you are referencing the RIR creation method in this paper (ref [1] in the one you linked). Essentially, these guys are rediscovering 60+ years of signal processing on their own. I wouldn't recommend this as a reference for your own implementation.
In their method, they need to generate the RIR at a higher frequency because they use impulses that are rounded to the nearest sample. Pyroomacoustics doesn't have this problem because we generate fractional delays directly. No rounding to the nearest sample is ever done. What comes out of the simulator is better than what you'll get using this upsampling method.
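The idea of a fractional delay can be sketched as follows: a band-limited impulse (a sinc) can be centered at a non-integer sample index directly at the target rate, so no upsampling or rounding is needed. This is only an illustration of the principle, not pyroomacoustics' actual implementation; the distance and filter length are made-up values.

```python
import numpy as np

fs = 16_000   # target audio rate
c = 343.0     # speed of sound, m/s
dist = 1.7    # hypothetical source-to-mic distance in metres

tau = dist / c * fs  # time of arrival in samples, ~79.3: not an integer

# Band-limited impulse centered at the fractional delay tau.
# The sub-sample arrival time is represented exactly at 16 kHz,
# with no rounding to the nearest sample.
n = np.arange(256)
ir = np.sinc(n - tau)
```

In practice the infinite sinc is truncated and windowed to a short filter, but the key point stands: the sub-sample delay is encoded in the sample values themselves.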
Regarding computations, OLA is a completely standard method, and the STFT module of pyroomacoustics lets you do it. That said, it is a good point that this is not actually what we use to convolve the RIR in the room simulation. Instead we use scipy.signal.fftconvolve, which performs the convolution with the FFT (the "Real FFT filtering" in the linked paper, Eq. 4). So, going by their paper, an efficient OLA implementation might be about 2x as fast. The catch for us is that the OLA in stft is done in Python, so it might actually be slower than fftconvolve. This should be checked.
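For a quick comparison, scipy also ships an overlap-add convolution, scipy.signal.oaconvolve (scipy >= 1.4), which could be benchmarked directly against fftconvolve; the two return the same result up to numerical precision. The signals below are synthetic stand-ins, not simulator output.

```python
import numpy as np
from scipy.signal import fftconvolve, oaconvolve

rng = np.random.default_rng(0)
speech = rng.standard_normal(16_000 * 5)  # 5 s of noise standing in for audio
decay = np.exp(-np.arange(8_000) / 2_000)
rir = rng.standard_normal(8_000) * decay  # toy exponentially decaying RIR

out_fft = fftconvolve(speech, rir)  # one large FFT over the whole signal
out_ola = oaconvolve(speech, rir)   # overlap-add on shorter blocks

# Both compute the full linear convolution; they agree to numerical precision
print(np.allclose(out_fft, out_ola))
```

Timing the two (e.g. with timeit) on realistic signal and RIR lengths would settle whether OLA is worth using here, without writing any Python-level loop.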