# Warped Linear Prediction Filter

The Enhanced Summary Autocorrelation method for polyphonic pitch detection ([[1]](#References)) starts off with a Warped Linear Prediction (WLP) filter:

>The first block is a pre-whitening filter that is used to remove short-time correlation of the signal. The whitening filter is implemented using a 12th-order warped linear prediction. [...] A WLP filter of 12th order is used here with sampling rate of 22 kHz, Hamming windowing, frame size of 23.2 ms, and hop size of 10.0 ms. Inverse filtering with the WLP model yields the pre-whitened signal.
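
The frame parameters in the quote translate into sample counts as sketched below. This is only an illustration: the exact rate of 22050 Hz is an assumption (the quote says "22 kHz"), and the helper name `frame_signal` is hypothetical rather than taken from [1].

```python
import numpy

def frame_signal(x, sr=22050, frame_sec=0.0232, hop_sec=0.010):
    """Split x into overlapping Hamming-windowed frames (illustrative helper)."""
    x = numpy.asarray(x, dtype=float)
    frame_len = int(round(frame_sec * sr))  # 23.2 ms at 22050 Hz is about 512 samples
    hop_len = int(hop_sec * sr)             # 10.0 ms, truncated to whole samples
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop_len
    window = numpy.hamming(frame_len)
    # One row per frame, each multiplied by the Hamming window.
    frames = numpy.stack([
        x[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return frames, frame_len, hop_len
```

Because the hop is shorter than the frame, consecutive frames overlap by more than half, so every sample contributes to several frames.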

The combination of WLP and Hamming windowing sounds similar to the first step of MFCC computation ([[2]](#References)):

>Implement short-time Fourier transform (STFT) to the speech signal with a finite-duration window (e.g., a 32 ms Hamming window)
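
A short-time Fourier transform of the kind described in the quote can be sketched with NumPy alone. The 32 ms Hamming window comes from the quote; the 16 kHz sampling rate, the 10 ms hop, and the function name `stft` are assumptions chosen for illustration.

```python
import numpy

def stft(x, sr=16000, win_sec=0.032, hop_sec=0.010):
    """Magnitude-ready STFT: one row of rfft bins per Hamming-windowed frame."""
    x = numpy.asarray(x, dtype=float)
    win_len = int(round(win_sec * sr))  # 32 ms at 16 kHz is 512 samples
    hop_len = int(round(hop_sec * sr))  # 10 ms at 16 kHz is 160 samples
    window = numpy.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop_len
    frames = numpy.stack([
        x[i * hop_len : i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of each real-valued frame.
    return numpy.fft.rfft(frames, axis=1)
```

Each row holds `win_len // 2 + 1` complex bins spaced `sr / win_len` Hz apart, so `numpy.abs(stft(x))` gives a spectrogram.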

The STFT is also used in beat detection ([[3]](#References)), e.g. BPM estimation of a metronome clip. This paper ([[4]](#References)) discusses how WLP and STFT are both techniques used for removing impulse noise:

>Impulse noise is characterized by a short burst of acoustic energy of either a single impulse or a series of impulses, with a wide spectral bandwidth. Typical acoustic impulse noises include sounds of machine gun firing, of rain drops hitting a hard surface like the windshield of a moving car, of typing on a keyboard, of indicator clicks in cars, of clicks in old analog recordings, of popping popcorn and so on.

In [[1]](#References), the WLP step is contained in the "pre-whitening" block. Pre-whitening is a processing step that makes a signal behave more like white noise, i.e. it flattens the signal's spectrum by removing short-time correlation.
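
A rough sketch of WLP pre-whitening for a single windowed frame follows. It is not the implementation from [1]: the warping coefficient `lam=0.65` is an illustrative assumption, the function names are hypothetical, and the predictor coefficients are found by ordinary least squares instead of the autocorrelation method, purely to keep the example short. The unit delays of a conventional linear predictor are replaced by first-order allpass sections, and inverse filtering subtracts the prediction from the input.

```python
import numpy

def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam * z^-1)."""
    y = numpy.zeros(len(x))
    x_prev = 0.0
    y_prev = 0.0
    for n in range(len(x)):
        y[n] = -lam * x[n] + x_prev + lam * y_prev
        x_prev = x[n]
        y_prev = y[n]
    return y

def wlp_whiten(frame, order=12, lam=0.65):
    """Return (residual, coefficients) for one frame; lam is an assumed value."""
    x0 = numpy.asarray(frame, dtype=float)
    # Tap the signal after 1..order cascaded allpass sections.
    taps = []
    current = x0
    for _ in range(order):
        current = allpass(current, lam)
        taps.append(current)
    X = numpy.stack(taps, axis=1)  # shape (len(frame), order)
    # Least-squares predictor: x0 is approximated by X @ a.
    a, *_ = numpy.linalg.lstsq(X, x0, rcond=None)
    # Inverse filtering: what the predictor cannot explain is the whitened signal.
    residual = x0 - X @ a
    return residual, a
```

Setting `lam=0` turns each allpass section into a plain unit delay, which reduces the sketch to conventional linear prediction.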

WLP is also used for signal compression: its warped frequency scale closely approximates the Bark auditory scale, so the model's frequency resolution follows that of human hearing ([[5]](#References)).
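
The frequency warping itself can be illustrated through the phase response of a first-order allpass section, which maps a normalised frequency `omega` (radians per sample) to a warped frequency. The sketch below evaluates that mapping; the function name `warped_frequency` and any particular choice of `lam` are assumptions for illustration, not values from [5].

```python
import numpy

def warped_frequency(omega, lam):
    """Warped frequency produced by D(z) = (z^-1 - lam) / (1 - lam * z^-1)."""
    omega = numpy.asarray(omega, dtype=float)
    # Negative phase of the allpass section, valid for |lam| < 1.
    return omega + 2.0 * numpy.arctan2(
        lam * numpy.sin(omega), 1.0 - lam * numpy.cos(omega)
    )
```

For positive `lam` the mapping stretches low frequencies and compresses high ones while leaving 0 and pi fixed; `lam=0` leaves the frequency axis unchanged.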

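In [3]:
import numpy

sr = 48000        # sampling rate in Hz
freq = 440        # tone frequency in Hz
duration_sec = 5  # clip length in seconds

# endpoint=False keeps the sample spacing at exactly 1/sr seconds
t = numpy.linspace(0, duration_sec, sr * duration_sec, endpoint=False)
sinewave = numpy.sin(freq * 2.0 * numpy.pi * t)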



In [1]:
import IPython.display

IPython.display.Audio(data=sinewave, rate=sr)


## References

[[1] T. Tolonen and M. Karjalainen, "A computationally efficient multipitch analysis model," IEEE Trans. Speech Audio Processing, vol. 8, pp. 708-716, Nov. 2000.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.334.1573&rep=rep1&type=pdf)

[2] "Phase Autocorrelation Bark Wavelet Transform Features for Robust Speech Recognition."

[3] Some beat detection paper

[[4] R. C. Nongpiur, "Impulse noise removal in speech using wavelets," Proc. ICASSP 2008, pp. 1593-1596, Mar. 2008.](https://www.ece.uvic.ca/~rnongpiu/Nongpiur_Icassp_2008.pdf)

[5] A. Harma and U. K. Laine, "A comparison of warped and conventional linear predictive coding," IEEE Trans. Speech Audio Processing, vol. 9, no. 5, pp. 579-588, July 2001.