
# **Audio sampling techniques for analysis**

Audio resampling, also known as sample rate conversion, is the process of changing the sample rate of an audio signal. The sample rate is the number of samples taken per second to represent the audio waveform digitally.

Resampling is performed to alter the duration or playback speed of an audio signal or to match the sample rates of different audio sources. It involves modifying the sequence of audio samples to either increase or decrease the number of samples per second.

When upsampling (increasing the sample rate), new samples must be generated to fill the gaps between existing samples, which requires interpolation to estimate the new values from the original ones. Conversely, when downsampling (decreasing the sample rate), some of the original samples are discarded to reduce the overall data rate. Before samples are discarded, a low-pass filter should remove frequencies above the new Nyquist limit (half the new sample rate); otherwise those frequencies fold back into the signal as aliasing.
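A minimal sketch of why filtering matters when downsampling, assuming NumPy and SciPy are available. The 9000 Hz test tone and the helper `peak_frequency` are illustrative choices, not part of the notebook's data. Dropping every other sample without filtering folds the tone down to 11025 - 9000 = 2025 Hz, while `scipy.signal.decimate` filters first and leaves almost no energy.

```python
import numpy as np
from scipy.signal import decimate

sr = 22050                      # original sample rate (Hz)
tone_hz = 9000                  # above the new Nyquist limit of 22050 / 4 = 5512.5 Hz
t = np.arange(sr) / sr          # one second of audio
x = np.sin(2 * np.pi * tone_hz * t)

naive = x[::2]                              # drop every other sample, no filtering
filtered = decimate(x, 2, ftype="fir")      # low-pass filter first, then drop samples

def peak_frequency(signal, rate):
    """Return the frequency (Hz) of the strongest spectral component."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.argmax(spectrum) * rate / len(signal)

alias_hz = peak_frequency(naive, sr / 2)    # where the 9000 Hz tone ends up
naive_rms = np.sqrt(np.mean(naive ** 2))
filtered_rms = np.sqrt(np.mean(filtered ** 2))
print(alias_hz, naive_rms, filtered_rms)
```

The unfiltered result still carries a full-strength tone at the wrong frequency; the filtered result is close to silence, which is the correct outcome for content the new sample rate cannot represent.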

Resampling algorithms can vary in complexity and performance. Common interpolation methods include nearest-neighbor, linear interpolation, spline interpolation, sinc interpolation, and polyphase filtering. The choice of resampling algorithm depends on factors such as the desired quality, computational efficiency, and specific requirements of the application.

Audio resampling is widely used in various audio-related applications, including music production, audio editing, multimedia playback, telecommunication systems, digital signal processing, and audio analysis. It allows for time scaling, synchronization, matching sample rates, compatibility, and various other operations that require manipulating the temporal characteristics of audio signals.

**Why audio resampling**

Resampling is often needed in audio analysis for several reasons. By employing resampling techniques, audio analysts can align, synchronize, manipulate, and process audio signals effectively, which supports tasks such as feature extraction, classification, transcription, and audio effects.

    Matching Sample Rates: Audio signals may have different sample rates, and resampling allows us to bring them to a common sample rate. This is crucial when working with audio data from various sources or when combining audio signals for analysis or processing.

    Time Alignment: Resampling allows for precise time alignment of audio signals. By resampling signals to the same sample rate, we can accurately synchronize and compare different audio segments or perform time-dependent analysis across multiple signals.

    Time Scaling: Resampling enables time scaling of audio signals. Playing a resampled signal at the original sample rate speeds it up or slows it down, which can be useful for tempo adjustment, audio effects, or synchronization with other media. Plain resampling changes pitch together with speed; time-stretching that preserves pitch needs additional processing.

    Filtering and Signal Processing: Resampling is often used in signal processing applications. It allows for efficient filtering operations by changing the sample rate and effectively adjusting the frequency response of the signal.

    Downsampling and Upsampling: Resampling is commonly used to downsample or upsample audio signals. Downsampling reduces the sample rate, which can be useful for reducing computational requirements or focusing on lower frequency ranges. Upsampling increases the sample rate; it adds no new information to the signal, but it provides a finer time grid for later processing and lets the signal match a higher-rate system.

    Interpolation and Extrapolation: Resampling techniques, such as interpolation, are used to estimate the values between existing samples or to extrapolate beyond the available data points. This is useful for tasks like waveform reconstruction, generating synthetic audio, or enhancing the resolution of the signal.

    Compatibility and Integration: Resampling ensures compatibility between audio signals and systems that have different sample rate requirements. It allows for integration of audio data into different platforms, devices, or software that expect specific sample rates.
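As a sketch of the "matching sample rates" case above, two clips recorded at different rates can be brought to one common rate with `scipy.signal.resample_poly`. The helper name `to_common_rate`, the random test clips, and the 16000 Hz target are assumptions made for illustration.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

def to_common_rate(signal, source_sr, common_sr):
    """Resample `signal` from source_sr to common_sr using a rational ratio."""
    ratio = Fraction(common_sr, source_sr)   # reduced automatically, e.g. 16000/44100 -> 160/441
    return resample_poly(signal, ratio.numerator, ratio.denominator)

# Two hypothetical one-second recordings at different sample rates
clip_a = np.random.default_rng(0).standard_normal(44100)   # 44.1 kHz
clip_b = np.random.default_rng(1).standard_normal(8000)    # 8 kHz

common_sr = 16000
a16 = to_common_rate(clip_a, 44100, common_sr)
b16 = to_common_rate(clip_b, 8000, common_sr)
print(len(a16), len(b16))
```

Once both clips have the same number of samples per second, they can be aligned, mixed, or compared sample by sample.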

**Frequently used audio resampling techniques**

The following are ten frequently used audio resampling techniques:

    Nearest Neighbor Resampling: The new samples are selected by picking the nearest existing sample in the original signal.

    Linear Interpolation Resampling: New samples are generated by linearly interpolating between neighboring existing samples.

    Sinc Interpolation Resampling: Sinc interpolation uses a sinc function to interpolate new samples based on the existing samples.

    Polyphase Filtering Resampling: Resampling is achieved by splitting a low-pass filter into polyphase branches so that only the output samples that are actually needed get computed.

    FFT-based Resampling: Fast Fourier Transform (FFT) is used to perform resampling in the frequency domain.

    Kaiser Window Resampling: A Kaiser window shapes the windowed-sinc low-pass filter used for resampling, trading transition width against stopband attenuation.

    Lagrange Polynomial Interpolation Resampling: Lagrange polynomials are used to interpolate new samples based on the original samples.

    Cubic Convolution Resampling: Resampling is achieved using a cubic convolution function to interpolate new samples.

    All-Pass Interpolation Resampling: All-pass filters are used to modify the phase of the original samples during resampling.

    Direct-form Filter Resampling: Resampling is performed using direct-form digital filters, such as FIR or IIR filters.
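Several entries in the list above rely on a low-pass FIR filter. As a small sketch, `scipy.signal.firwin` can design one with a Kaiser window. The tap count of 101 and the beta of 8.0 are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np
from scipy.signal import firwin

sr = 22050            # original sample rate (Hz)
target_sr = 17640     # new sample rate (Hz)

# Low-pass filter with its cutoff at the new Nyquist limit, shaped by a Kaiser window.
# numtaps=101 and beta=8.0 are illustrative choices.
taps = firwin(101, cutoff=target_sr / 2, window=("kaiser", 8.0), fs=sr)

print(len(taps), taps.sum())
```

A larger beta gives stronger stopband attenuation at the cost of a wider transition band; more taps narrow the transition band at the cost of more computation.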

In [None]:
#Sample implementation

In [None]:
import librosa
import numpy as np
from scipy.interpolate import interp1d

In [None]:
# Specify the path to the audio file
audio_path = "D:\\RACE\\Speech Analytics\\Session 1\\Data\\session1_violin-origional.wav"

# Load the audio file
audio, sr = librosa.load(audio_path)

In [None]:
len(audio)

110250

In [None]:
sr

22050

**Nearest neighbour resampling**

Nearest Neighbor Resampling is a simple resampling technique that selects the nearest existing sample in the original audio signal to determine the resampled value. It is based on the assumption that the nearest sample provides a reasonable approximation of the original audio waveform.

Mathematically, Nearest Neighbor Resampling can be defined as follows:

        Let x[n] be the original audio signal with a sample rate of Fs and y[m] be the resampled signal with a new sample rate Fs'.
        The resampling ratio is given by Fs' / Fs.
        To compute y[m], where m is the resampled index and n is the original index:
        n = round(m * (Fs / Fs'))
        y[m] = x[n]
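The nearest neighbor mapping y[m] = x[round(m * Fs / Fs')] can be checked on a toy signal. This is a sketch with made-up rates (4 and 8) and a five-sample array. It rounds halves upward and clamps the last index, which differs slightly from the `linspace`-based helper in the next cell, where the first and last samples are pinned to the endpoints.

```python
import numpy as np

x = np.array([0.0, 10.0, 20.0, 30.0, 40.0])   # toy signal, not audio
fs, fs_new = 4, 8                              # illustrative rates: upsample by 2

m = np.arange(len(x) * fs_new // fs)           # output indices 0..9
t = m * fs / fs_new                            # positions on the original sample grid
n = np.minimum(np.floor(t + 0.5).astype(int), len(x) - 1)   # round half up, clamp at the end
y = x[n]
print(y)
```

Each original value is simply repeated, which is why nearest neighbor output looks like a staircase and adds high-frequency distortion.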


In [None]:
def nearest_neighbor_resample(audio, target_length):
    original_length = len(audio)
    x = np.arange(original_length)
    f = interp1d(x, audio, kind='nearest')
    resampled_audio = f(np.linspace(0, original_length - 1, target_length))
    return resampled_audio

In [None]:
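target_sr = int(0.8 * sr)
# The helpers take a target length in samples, so this call squeezes the whole clip into
# target_sr samples; a true target_sr-Hz version needs int(len(audio) * target_sr / sr) = 88200.
nnrs = nearest_neighbor_resample(audio, target_sr)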
print(len(audio), len(nnrs))

110250 17640


**Linear interpolation**

Linear Interpolation Resampling is a technique used to estimate new sample values by interpolating between neighboring existing samples in the original audio signal. It provides a smoother resampled waveform compared to nearest neighbor resampling.

Mathematically, Linear Interpolation Resampling can be defined as follows:

        Let x[n] be the original audio signal with a sample rate of Fs and y[m] be the resampled signal with a new sample rate Fs'.
        The resampling ratio is given by Fs' / Fs.
        To compute y[m], where m is the resampled index and n is the original index:
        t = m * (Fs / Fs')
        n = floor(t)
        fractional_part = t - n
        y[m] = (1 - fractional_part) * x[n] + fractional_part * x[n + 1]
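The weighted average of the two neighboring samples, y = (1 - frac) * x[n] + frac * x[n + 1], can be verified against NumPy's built-in `np.interp`. The toy signal and the fractional positions below are illustrative assumptions.

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])                  # toy signal
positions = np.array([0.0, 0.5, 1.5, 2.25, 3.0])    # fractional positions on the original grid

n = np.minimum(np.floor(positions).astype(int), len(x) - 2)   # left neighbour, kept in range
frac = positions - n                                           # distance past the left neighbour
manual = (1 - frac) * x[n] + frac * x[n + 1]

builtin = np.interp(positions, np.arange(len(x)), x)
print(manual)
print(builtin)
```

Both lines print the same values, so `np.interp` can stand in for the manual formula when only linear interpolation is needed.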

In [None]:
def linear_interpolation_resample(audio, target_length):
    original_length = len(audio)
    x = np.linspace(0, original_length - 1, original_length)
    f = interp1d(x, audio, kind='linear')
    resampled_audio = f(np.linspace(0, original_length - 1, target_length))
    return resampled_audio

In [None]:
lirs = linear_interpolation_resample(audio, target_sr)
print(len(audio), len(lirs))

110250 17640


**Polyphase resampling**

Polyphase resampling is a technique for performing sample rate conversion efficiently by a rational factor L/M. Conceptually, the signal is upsampled by L, low-pass filtered, and downsampled by M; the polyphase structure splits the filter into sub-filters (phases) so that no work is spent on inserted zeros or on samples that would be discarded.

Polyphase resampling involves the following steps:

    Ratio Selection:
        Express the conversion ratio Fs' / Fs as a fraction L / M in lowest terms.
        L is the upsampling factor and M is the downsampling factor.

    Filter Design:
        Design a low-pass filter that runs at the intermediate rate L * Fs, with a cutoff at half of the lower of the two sample rates, min(Fs, Fs') / 2.
        This single filter removes both the spectral images created by upsampling and the content that would alias after downsampling.

    Polyphase Decomposition:
        Divide the filter coefficients into L phases, where phase p contains taps p, p + L, p + 2L, and so on.
        Each phase is a short sub-filter that operates directly on the original samples.

    Filtering:
        For each output sample, select the phase that corresponds to its position between input samples and take the dot product of that phase's coefficients with the neighboring input samples.
        The zeros that upsampling would insert are never multiplied.

    Decimation:
        Compute only every M-th sample of the intermediate-rate signal.
        The outputs that downsampling would discard are never calculated.

Polyphase resampling produces the same result as the direct upsample, filter, and downsample chain while needing roughly L * M times fewer multiplications. This makes it well suited to real-time audio processing and other time-critical applications. It handles any rational ratio, although ratios with large L and M call for long filters.
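The up and down factors for a polyphase resampler follow from the two sample rates by dividing out their greatest common divisor. A short sketch for the rates used in this notebook (22050 Hz and 0.8 times that, 17640 Hz); the 440 Hz test tone is an illustrative stand-in for the audio clip.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

sr, target_sr = 22050, 17640
g = gcd(sr, target_sr)                   # 4410
up, down = target_sr // g, sr // g       # smallest integers with up / down == target_sr / sr

x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # one second of a 440 Hz test tone
y = resample_poly(x, up, down)
print(up, down, len(x), len(y))
```

One second of input at 22050 Hz becomes one second of output at 17640 Hz, so the duration is unchanged and only the number of samples per second differs.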

In [None]:
from scipy.signal import resample_poly

def polyphase_resample(audio, target_length, up_factor, down_factor):
    resampled_audio = resample_poly(audio, up_factor, down_factor, axis=0)
    return resampled_audio[:target_length]

In [None]:
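# 0.8 = 4/5, so up=4 and down=5; the helper then keeps the first target_sr samples,
# which is one second of audio at the new rate.
pprs = polyphase_resample(audio, target_sr, 4, 5)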
print(len(audio), len(pprs))

110250 17640


**Fast Fourier Transforms (FFT) based resampling**

FFT (Fast Fourier Transform) resampling changes the number of samples by resizing the signal's spectrum. The audio signal is transformed from the time domain to the frequency domain, the spectrum is truncated (to downsample) or zero-padded (to upsample) to the new length, and the inverse transform returns the resampled signal in the time domain.

Mathematical Formulation:

    Given an audio signal x[n] of length N, where n represents the discrete time index ranging from 0 to N-1.
    Compute the discrete Fourier transform (DFT) of x[n] using the FFT algorithm to obtain the frequency domain representation X[k], where k represents the discrete frequency index ranging from 0 to N-1.
        X[k] = ∑[n=0 to N-1] (x[n] * exp(-j2πkn/N))
    Build a spectrum Y[k] of the target length M from X[k].
        For M < N, keep the M lowest-frequency components (positive and negative) and discard the rest, which also acts as the anti-aliasing filter.
        For M > N, keep all components and insert zeros at the high-frequency positions.
    Apply the inverse discrete Fourier transform (IDFT) of length M to Y[k] to obtain the resampled audio signal y[m]. The factor 1/N (rather than 1/M) preserves the amplitude of the signal.
        y[m] = (1/N) * ∑[k=0 to M-1] (Y[k] * exp(j2πkm/M))

Note that the length M of the resampled signal is set by the resampling ratio: M = N * (Fs' / Fs). Because the DFT treats the signal as periodic, FFT resampling can produce ringing near the start and end of a clip whose endpoints do not match.

FFT resampling is efficient for offline conversion of a whole clip. It changes the sample rate only; playing the result back at the original rate shifts pitch and duration together.
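SciPy's `scipy.signal.resample` implements this Fourier method. A small sketch, using a test tone with exactly three cycles in the frame (an illustrative choice that makes the signal periodic, which is the case the method handles exactly):

```python
import numpy as np
from scipy.signal import resample

n = 64
x = np.sin(2 * np.pi * 3 * np.arange(n) / n)   # exactly 3 cycles, so periodic in the frame

up = resample(x, 2 * n)       # zero-pads the spectrum: 64 -> 128 samples
down = resample(x, n // 2)    # truncates the spectrum: 64 -> 32 samples

# The same tone evaluated directly on the coarser grid, for comparison
expected_down = np.sin(2 * np.pi * 3 * np.arange(n // 2) / (n // 2))
print(np.max(np.abs(up[::2] - x)), np.max(np.abs(down - expected_down)))
```

Both printed errors are at the level of floating-point rounding: every second sample of the upsampled signal matches the original, and the downsampled signal matches the tone sampled on the coarser grid.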

In [None]:
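from scipy.fft import rfft, irfft

def fft_resample(audio, target_length):
    # Work on the one-sided spectrum of the real signal so the result stays real
    spectrum = rfft(audio)
    n_bins = target_length // 2 + 1
    resized = np.zeros(n_bins, dtype=complex)
    keep = min(n_bins, len(spectrum))
    resized[:keep] = spectrum[:keep]  # truncate (downsample) or zero-pad (upsample)
    # Rescale so amplitudes are preserved after the change of length
    return irfft(resized, n=target_length) * (target_length / len(audio))

In [None]:
fftrs = fft_resample(audio, target_sr)
print(len(audio), len(fftrs))

110250 17640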


**Sinc interpolation resampling**

Sinc interpolation is a high-quality method for resampling audio signals. It uses the sinc function as an interpolation kernel to estimate new sample values between the existing samples in the original signal.

Mathematically, sinc interpolation can be defined as follows:

    Upsampling:
        Let x[n] be the original audio signal with a sample rate of Fs and y[m] be the resampled signal with a higher sample rate Fs'.
        The resampling ratio is given by Fs' / Fs.
        To compute y[m], where m is the resampled index and k is the original index:

        t = m * (Fs / Fs')
        y[m] = Σ(x[k] * sinc(t - k))

        where sinc(x) is defined as sin(pi * x) / (pi * x) with sinc(0) = 1, and the summation is performed over the original samples k around position t.

    Downsampling:
        The kernel is widened to (Fs' / Fs) * sinc((Fs' / Fs) * (t - k)) so that it also removes frequencies above the new Nyquist limit.
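The upsampling sum y[m] = Σ x[k] * sinc(m * Fs / Fs' - k) can be evaluated directly with `np.sinc`. This is a sketch for short signals only, since it builds the full kernel matrix; the function name `sinc_resample_sketch`, the rates, and the 5 Hz test tone are assumptions for illustration.

```python
import numpy as np

def sinc_resample_sketch(x, fs, fs_new):
    """Evaluate y[m] = sum_k x[k] * sinc(m * fs / fs_new - k) directly (upsampling only)."""
    n_out = int(len(x) * fs_new / fs)
    t = np.arange(n_out) * fs / fs_new          # output positions on the original grid
    k = np.arange(len(x))
    kernel = np.sinc(t[:, None] - k[None, :])   # np.sinc(v) = sin(pi v) / (pi v)
    return kernel @ x

fs, fs_new = 100, 200
x = np.sin(2 * np.pi * 5 * np.arange(200) / fs)   # 5 Hz tone, 2 seconds
y = sinc_resample_sketch(x, fs, fs_new)
print(len(y), np.max(np.abs(y[::2] - x)))
```

At positions that coincide with original samples the kernel is 1 for that sample and 0 for all others, so the original values pass through unchanged; the in-between values are where the interpolation happens.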

In [None]:
from scipy.signal import resample

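def sinc_interpolation_resample(audio, target_length):
    # scipy.signal.resample works in the frequency domain (Fourier method), which is
    # equivalent to sinc interpolation of a periodic signal; `window` tapers the spectrum.
    resampled_audio = resample(audio, target_length, window='hann', axis=0)
    return resampled_audio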

In [None]:
sirc = sinc_interpolation_resample(audio, target_sr)
print(sirc)
print(len(audio), len(sirc))

[-0.05330481  0.01649052 -0.01066211 ...  0.10165904  0.04839436
 -0.03741269]
110250 17640


**Lagrange interpolation**

Lagrange interpolation is a method used for signal resampling that involves fitting a polynomial through a set of known data points to estimate new sample values. It is commonly used in various fields, including audio analysis and processing.

Mathematically, Lagrange interpolation can be defined as follows:

    Upsampling:
        Let x[n] be the original audio signal with a sample rate of Fs and y[m] be the resampled signal with a higher sample rate Fs'.
        The resampling ratio is given by Fs' / Fs.
        To compute y[m], where m is the resampled index and k is the original index:

        t = m * (Fs / Fs')
        y[m] = Σ(x[k] * L(t, k))

        where L(t, k) is the Lagrange basis polynomial defined as:


        L(t, k) = Π((t - j) / (k - j)) for j ≠ k

           where j and k run over the N + 1 original sample indices nearest to t, and N is the order of the polynomial.

Lagrange interpolation is a straightforward method for signal resampling and can provide reasonable results. However, it has some limitations, including the sensitivity to the distribution of data points and the potential introduction of artifacts when there are rapid changes in the signal. Other interpolation methods like spline interpolation or sinc interpolation are often preferred for higher accuracy and better performance in audio resampling applications.
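The Lagrange basis weights can be computed by hand for a small window. A sketch with four samples of n**3 (an illustrative choice: a third-order interpolator reproduces a cubic exactly); `lagrange_weights` is a hypothetical helper written for this example.

```python
import numpy as np

def lagrange_weights(t, nodes):
    """Lagrange basis values L_k(t) for the given sample positions."""
    weights = np.ones(len(nodes))
    for k, xk in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if j != k:
                weights[k] *= (t - xj) / (xk - xj)
    return weights

samples = np.array([0.0, 1.0, 8.0, 27.0])   # n**3 at n = 0, 1, 2, 3
nodes = np.arange(4)
t = 1.5                                      # fractional position between samples 1 and 2
w = lagrange_weights(t, nodes)
value = w @ samples
print(w, value)
```

The weights depend only on the fractional position, not on the signal, so a resampler can precompute them for each phase and reuse them across the whole clip.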

In [None]:
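from scipy.interpolate import lagrange

def lagrange_interpolation_resample(audio, target_length, order=3):
    # One polynomial through all 110250 samples is numerically unstable and never finishes,
    # so fit a low-order polynomial through the order + 1 samples around each new position.
    positions = np.linspace(0, len(audio) - 1, target_length)
    starts = np.clip(np.floor(positions).astype(int) - order // 2, 0, len(audio) - order - 1)
    local_x = np.arange(order + 1)
    resampled_audio = np.empty(target_length)
    for i, (t, s) in enumerate(zip(positions, starts)):
        resampled_audio[i] = lagrange(local_x, audio[s:s + order + 1])(t - s)
    return resampled_audio

In [None]:
lgirs = lagrange_interpolation_resample(audio, target_sr)
print(len(audio), len(lgirs))

110250 17640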

**High-quality resampling with the samplerate library**

High-quality resampling refers to the process of changing the sample rate of an audio signal while minimizing the introduction of artifacts and preserving the audio's original quality. The samplerate library in Python provides a reliable and efficient way to perform high-quality resampling.

The samplerate library offers the following converter types:

    Sinc interpolation (sinc_best, sinc_medium, sinc_fastest): These converters use a band-limited sinc function as the interpolation kernel to estimate the new sample values. They provide the best audio quality, with minimal distortion and aliasing artifacts, and trade speed for quality: sinc_best is the most accurate and the most computationally intensive, and sinc_fastest is the quickest of the three.

    Linear interpolation (linear): Linear interpolation estimates new sample values by interpolating between neighboring samples using a straight line. It is fast and gives smoother results than zero-order hold, but it does not preserve the fine details of the original signal.

    Zero-order hold (zero_order_hold): Each new sample repeats the most recent original sample. It is the fastest converter and the lowest in quality.

Polyphase filtering and FFT resampling, covered earlier in this notebook, are not part of samplerate; they are available through scipy.signal.resample_poly and scipy.signal.resample.

To perform high-quality resampling using the samplerate library, you need to follow these steps:

    Load the original audio signal using a suitable library like librosa or soundfile.
    Specify the desired sample rate for the resampled signal.
    Choose the appropriate resampling algorithm from the samplerate library.
    Use the selected resampling algorithm to resample the audio signal, providing the audio data and the conversion ratio (desired sample rate divided by original sample rate).
    Save the resampled signal to an audio file or use it for further analysis or processing.
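The steps above can be sketched as follows, assuming the `samplerate` package (python-samplerate) is installed. The wrapper `high_quality_resample`, its SciPy fallback, and the 440 Hz test tone are assumptions added so that the sketch still runs when the package is missing; they are not part of the library.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

try:
    import samplerate          # python-samplerate, a binding to libsamplerate
except ImportError:            # keep the sketch runnable when the package is missing
    samplerate = None

def high_quality_resample(audio, sr, target_sr, converter="sinc_best"):
    """Resample with samplerate if installed, otherwise fall back to SciPy."""
    ratio = target_sr / sr                      # output rate divided by input rate
    if samplerate is not None:
        return samplerate.resample(audio, ratio, converter)
    frac = Fraction(target_sr, sr)              # fallback: polyphase resampling
    return resample_poly(audio, frac.numerator, frac.denominator)

sr, target_sr = 22050, 17640
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
resampled = high_quality_resample(tone, sr, target_sr)
print(len(tone), len(resampled))
```

Passing `"sinc_fastest"` or `"linear"` as the converter trades quality for speed; the output length is about `len(tone) * target_sr / sr` in every case.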

By employing the high-quality converters offered by the samplerate library, you can keep the resampled audio close to the original in fidelity and minimize the artifacts introduced during the resampling process.