# Audio Signal Compression
---
### Mikołaj Leszczuk, Andres Vejar
---
## 1. Purpose
The aim of this applied experiment is to gain practical experience with audio signal compression standards. In particular, the task is to analyze the compression ratio (CR), the audio quality, and the performance (sound quality relative to file size) of three popular audio codecs:
1. MPEG Audio Layer-3 (MP3)
2. Advanced Audio Coding (AAC)
3. Vorbis

After the analysis, simple quality models are to be created for the audio signals compressed by the different codecs.

The experiment requires basic knowledge of audio signal compression methods and codecs.

## 2. Experiment Description
The experiment is organized as a processing pipeline, shown in the block diagram below:


<div><p>
    <br />
    </p>
<img src="img/diag.svg"  width="500">
</div>

In the diagram, an audio signal $s$ (left) is the source data for the experiment. It represents the *original* (uncompressed) audio and serves as the reference against which the compressed versions are compared.
Starting from the source $s$, the first block $R$ creates a reconstructed (compressed) audio signal $r$. Every compression algorithm requires a set of parameters that affect the result of the conversion. In this activity, the parameter under study is the bit rate [1] of the resulting audio files. In the block diagram, the compression parameters are denoted by $\theta$, and the compression operation is defined by $R(\theta,s) = r$.
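
As a rough illustration of the operation $R(\theta, s) = r$, the sketch below builds an `ffmpeg` command line that encodes a source file at a given bit rate. It assumes that `ffmpeg` is installed and was built with the `libmp3lame`, `aac`, and `libvorbis` encoders. The helper names `build_encode_command` and `encode`, the output file naming, and the `ENCODERS` table are illustrative assumptions and not part of the course material.

```python
import subprocess

# Assumed mapping from codec name to (ffmpeg encoder, output file extension).
ENCODERS = {
    "mp3": ("libmp3lame", ".mp3"),
    "aac": ("aac", ".m4a"),
    "vorbis": ("libvorbis", ".ogg"),
}


def build_encode_command(src, codec, bitrate_kbps):
    """Return (command, output_path) for encoding `src` at `bitrate_kbps`."""
    encoder, ext = ENCODERS[codec]
    stem = src.rsplit(".", 1)[0]  # file name without its extension
    dst = f"{stem}_{codec}_{bitrate_kbps}k{ext}"
    cmd = ["ffmpeg", "-y", "-i", src,
           "-c:a", encoder, "-b:a", f"{bitrate_kbps}k", dst]
    return cmd, dst


def encode(src, codec, bitrate_kbps):
    """Run ffmpeg to produce the compressed file r from the source s."""
    cmd, dst = build_encode_command(src, codec, bitrate_kbps)
    subprocess.run(cmd, check=True)  # raises if ffmpeg reports an error
    return dst
```

Calling `encode("source.wav", "mp3", 128)` would then write `source_mp3_128k.mp3`, with the bit rate playing the role of $\theta$.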

To relate the compression ratio to the audio quality of the reconstructed file $r$, objective distortion metrics are used that measure the differences between the original audio $s$ and the compressed audio $r$. This numerical evaluation is represented by the block $D$ in the diagram, with $D(s,r) = d$. For example, the absolute error $\text{AE}[k]$ can be calculated for each audio sample $k=1, 2, \ldots, N_{\text{samples}}$:
$$ \text{AE}[k] =  \left | s[k] - r[k] \right | $$
To obtain a single distortion indicator for the whole signal, summary statistics can be used, for example the sum of the absolute errors ($\text{AE}$) or the mean absolute error ($\text{MAE}$):

$$ \text{AE}(s,r) = 
\sum_{k=1}^{N_{\text{samples}}} \text{AE}[k]$$

$$ \text{MAE}(s,r) = 
\frac{1}{N_{\text{samples}}} 
\sum_{k=1}^{N_{\text{samples}}} \text{AE}[k]$$
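
The distortion measures above can be sketched with NumPy as follows. This is a minimal example that assumes both signals are already decoded to arrays with the same sample rate and are time-aligned; the function names are illustrative. Because a decoded file may differ in length from the source (for example, due to padding added by the codec), the sketch trims both signals to the shorter one, which is itself an assumption.

```python
import numpy as np


def absolute_error(s, r):
    """Per-sample absolute error AE[k] = |s[k] - r[k]|."""
    s = np.asarray(s, dtype=np.float64)
    r = np.asarray(r, dtype=np.float64)
    n = min(len(s), len(r))  # assumption: compare only the overlapping part
    return np.abs(s[:n] - r[:n])


def sum_absolute_error(s, r):
    """AE(s, r): sum of the per-sample absolute errors."""
    return float(np.sum(absolute_error(s, r)))


def mean_absolute_error(s, r):
    """MAE(s, r): mean of the per-sample absolute errors."""
    return float(np.mean(absolute_error(s, r)))


# Toy signals, chosen only to make the arithmetic easy to follow.
s = [0.0, 0.5, -0.5, 1.0]
r = [0.1, 0.5, -0.25, 0.5]
print(sum_absolute_error(s, r), mean_absolute_error(s, r))
```

For these toy values the per-sample errors are 0.1, 0, 0.25, and 0.5, so the sum is 0.85 and the mean is 0.2125 (up to floating-point rounding).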

For subjective quality measures, a human listener assigns a quality score to the reconstructed (compressed) audio signal $r$. This scoring decision is represented by the block $\text{MOS}$ (mean opinion score), where $\text{MOS}(r) = m$. On the Absolute Category Rating (ACR) scale, there are five possible output values:

* 5 :	Excellent
* 4 :	Good
* 3 :	Fair
* 2 :	Poor
* 1 :	Bad 
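
If several listeners rate the same file, their scores are typically averaged into a single mean opinion score. The sketch below shows this under the assumption that each rating is an integer from 1 to 5 on the scale above; the function name and the example ratings are hypothetical.

```python
from statistics import mean

# The five categories of the scale listed above.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Average a non-empty list of ratings after validating each one."""
    for rating in ratings:
        if rating not in ACR_LABELS:
            raise ValueError(f"rating {rating!r} is not on the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from four listeners for one compressed file.
m = mean_opinion_score([4, 5, 3, 4])
print(m, ACR_LABELS[round(m)])
```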

The values $(\theta,d,m)$ can be used to analyze the impact of the compression level on the audio quality. The empirical analysis can be done using a collection of data points
$(\theta_0,d_0,m_0), (\theta_1,d_1, m_1), \ldots$ to generate scatter plots of $d$ versus $\theta$ and of $m$ versus $\theta$.
The expected result is that, in general, the distortion grows and the MOS falls as the compression ratio increases (that is, as the bit rate decreases).
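
To connect the bit rate $\theta$ with the compression ratio, one common convention divides the bit rate of the uncompressed PCM source by the bit rate of the compressed file. The sketch below assumes this definition and a constant-bit-rate encoding; the function names are illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


def compression_ratio(original_bps, compressed_bps):
    """Ratio of the original bit rate to the compressed bit rate."""
    return original_bps / compressed_bps


# CD-quality source: 44.1 kHz, 16 bits per sample, stereo.
source_bps = pcm_bit_rate(44_100, 16, 2)          # 1_411_200 bit/s
print(compression_ratio(source_bps, 128_000))      # about 11.0 at 128 kbit/s
print(compression_ratio(source_bps, 64_000))       # about 22.1 at 64 kbit/s
```

Halving the bit rate doubles the compression ratio, which is why a scatter plot against $\theta$ reads in the opposite direction to one against the compression ratio.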

## 3. Tasks

1. **Load the audio file and visualize the waveform**
   * Import the audio file into the notebook.
   * Plot the raw waveform data (amplitude vs. time) to get an initial sense of the signal.
1. **Analyze the audio signal**
   * Extract basic metadata (e.g., sample rate, number of channels, duration).
   * Compute and display a short-time Fourier transform (spectrogram) to observe frequency components over time.
1. **Perform basic pre-processing**
   * Normalize the audio signal if needed.
   * If the signal is stereo, optionally mix down to mono for simpler processing (or handle each channel separately).
1. **Implement a simple compression technique**
   * Demonstrate a transform-based approach (e.g., DCT or wavelet transform).
   * Retain only a subset of coefficients to compress the audio (e.g., by thresholding or quantization).
   * Record and discuss the chosen compression ratio.
1. **Reconstruct (decompress) the audio**
   * Invert the transform to rebuild the audio signal.
   * Save or play back the decompressed audio to subjectively assess quality differences compared to the original.
1. **Evaluate compression quality**
   * Compute objective metrics (e.g., SNR, RMSE, or perceptual metrics if available) between the original and decompressed signals.
   * Summarize the trade-off between file size (or bit rate) and perceived audio quality.
1. **Experiment with different parameters**
   * Change quantization levels, threshold values, or transform block sizes.
   * Observe the impact on audio quality and compression ratio.
   * Document any patterns or insights (e.g., at which point the compression becomes audibly or visibly degraded).
1. **Compare compression methods** *(optional)*
   * If multiple compression methods are provided or can be implemented (e.g., wavelet vs. DCT, or a built-in library like MP3/AAC), compare results in terms of fidelity, compression ratio, and computational complexity.
   * Plot the differences in quality metrics (e.g., SNR) for each method to visualize performance trade-offs.
1. **Discussion and conclusions**
   * Summarize findings on how compression methods affect audio quality and size.
   * Propose possible improvements or next steps (e.g., more advanced transforms, psychoacoustic modeling, different bit rates).
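
Tasks 4 to 6 can be sketched with a block-wise DCT, as shown below. This is a minimal illustration and not the notebook's own implementation: it builds an orthonormal DCT-II matrix directly in NumPy, keeps only the largest coefficients in each block, inverts the transform, and reports the SNR. The function names, the block size of 256, and the synthetic two-tone test signal are all assumptions made for the example.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT, C.T @ X inverts it."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    m = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)     # rescale the DC row for orthonormality
    return c


def compress_blocks(signal, block_size=256, keep=32):
    """Keep the `keep` largest-magnitude DCT coefficients in each block.

    Returns the (truncated) original and the reconstructed signal.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_blocks = len(signal) // block_size   # a trailing partial block is dropped
    blocks = signal[: n_blocks * block_size].reshape(n_blocks, block_size)
    c = dct_matrix(block_size)
    coeffs = blocks @ c.T                  # forward DCT of every block
    # Indices of the smallest coefficients in each block, which are zeroed.
    drop = np.argsort(np.abs(coeffs), axis=1)[:, : block_size - keep]
    np.put_along_axis(coeffs, drop, 0.0, axis=1)
    reconstructed = (coeffs @ c).reshape(-1)   # inverse DCT
    return blocks.reshape(-1), reconstructed


def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB between the original and reconstruction."""
    noise = original - reconstructed
    return 10.0 * np.log10(np.sum(original**2) / np.sum(noise**2))


# Synthetic two-tone test signal (an assumption; replace with loaded audio).
t = np.arange(4096) / 8000.0
s = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
for keep in (8, 32, 64):
    original, rec = compress_blocks(s, block_size=256, keep=keep)
    print(f"keep={keep:3d}  nominal CR={256 / keep:5.1f}  SNR={snr_db(original, rec):6.2f} dB")
```

The nominal ratio `block_size / keep` ignores the cost of storing coefficient positions and any quantization, so it is only a rough indicator of the achievable compression.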

In [None]:
pip install --upgrade ipympl librosa

In [None]:
%matplotlib widget
from compaud import run_gui
run_gui()

## References
[1] *Bit rate*. Wikipedia. Retrieved March 07, 2023, from __[https://en.wikipedia.org/wiki/Bit_rate](https://en.wikipedia.org/wiki/Bit_rate)__.
