# 📖 🎶 Understanding Quantization with the THX Deep Note 🎶

![](./assets/figures/THX-details.jpg)

![](./assets/figures/thx-logo.webp)

You’ve probably heard the iconic THX Deep Note in movie theaters—a deep, spine-tingling crescendo that fills the room with cinematic magic. But what makes it sound so immersive? Let's break it down with some quantization magic!

## 🧩 What is Quantization?

Quantization in audio and digital signals is like snapping the smooth curves of sound into tiny steps that computers can handle. Think of it as:

- Frequency Quantization: Slicing up the range of pitches (frequencies) into specific levels.
- Amplitude Quantization: Breaking the loudness (volume) into small, fixed units.

This process is essential because computers need numbers, not continuous waves, to process and reproduce sounds.
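
To make the amplitude side concrete, here is a minimal sketch of a uniform quantizer. The function name `quantize_amplitude`, the 16-sample sine, and the chosen bit depths are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np


def quantize_amplitude(signal: np.ndarray, n_bits: int) -> np.ndarray:
    """Snap samples in [-1.0, 1.0] to 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits
    step = 2.0 / (levels - 1)  # spacing between adjacent levels
    # Shift to [0, 2], round to the nearest step, then shift back.
    return np.round((signal + 1.0) / step) * step - 1.0


# One cycle of a sine wave, sampled at 16 points (illustrative input).
t = np.arange(16) / 16
sine = np.sin(2 * np.pi * t)

coarse = quantize_amplitude(sine, n_bits=2)  # at most 4 distinct values
fine = quantize_amplitude(sine, n_bits=8)    # at most 256 distinct values

print("distinct values at 2 bits:", np.unique(coarse).size)
print("largest error at 8 bits:", np.max(np.abs(sine - fine)))
```

Fewer bits mean bigger steps, so the rounded signal drifts further from the original. With this scheme the error can never exceed half a step.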

## 🔧 Hardware Quantization and Oscillators

When the THX Deep Note was first created in 1983, it was generated by digital oscillators running on a computer. These oscillators were programmed to start from a random mix of frequencies and then climb or descend to form that rich, harmonious crescendo.

Here's where it gets fun:

- Oscillators are like digital musicians, each producing a pure tone (like a single note on a keyboard).
- In the Deep Note, 30 oscillators start at random frequencies within a range and slowly move toward their final frequencies.
- The random starting points give the sound its mysterious, evolving quality.
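
The list above can be sketched in a few lines of NumPy. This is only a rough illustration, not the original score: the sample rate, duration, oscillator count, starting range, and target chord are all assumed values, and `glide_chord` is a hypothetical helper.

```python
import numpy as np

SAMPLE_RATE = 8000   # Hz, assumed (kept low so the example stays cheap)
DURATION = 2.0       # seconds, assumed
N_OSCILLATORS = 5    # illustrative; far fewer than the real Deep Note


def glide_chord(start_hz, target_hz, duration=DURATION, rate=SAMPLE_RATE):
    """Sum sine oscillators whose frequencies glide linearly from start to target."""
    n = int(duration * rate)
    ramp = np.linspace(0.0, 1.0, n)  # 0 at the start, 1 at the end
    # Frequency of every oscillator at every sample: shape (n_osc, n).
    freqs = np.outer(start_hz, 1.0 - ramp) + np.outer(target_hz, ramp)
    # Integrate frequency to get phase, so pitch changes do not click.
    phase = 2.0 * np.pi * np.cumsum(freqs, axis=1) / rate
    mix = np.sin(phase).sum(axis=0)
    return mix / len(start_hz), freqs


rng = np.random.default_rng(seed=0)
start = rng.uniform(200.0, 400.0, size=N_OSCILLATORS)    # random starting pitches
target = np.array([75.0, 150.0, 300.0, 600.0, 1200.0])   # hypothetical octave stack
signal, freqs = glide_chord(start, target)
```

In a notebook, `Audio(signal, rate=SAMPLE_RATE)` from `IPython.display` would play the result.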

But there’s a catch: the synthesizer couldn’t handle infinitely smooth transitions! The frequencies were quantized, meaning each oscillator could only hit certain predefined steps, leading to a staircase-like journey instead of a perfect glide.

## 🎼 The Randomness Behind the Magic

The THX Deep Note is never _exactly_ the same! Each performance of the original algorithm introduces small random variations in:

- Initial frequencies: The oscillators start at different random pitches, which makes the beginning sound slightly chaotic.
- Timing: The speed at which each oscillator glides to its final frequency can vary.

This randomness gives the Deep Note its unique, almost "alive" quality. Even though it’s a digitally generated sound, it feels dynamic and organic—thanks to the clever use of randomness and quantization.
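
As a small sketch of that idea, the snippet below draws a random starting pitch and a random arrival time for each oscillator. Different seeds give different journeys that end on the same chord. The chord, ranges, and the helper `random_performance` are assumptions made for illustration.

```python
import numpy as np

TARGET_HZ = np.array([150.0, 300.0, 600.0])  # hypothetical final chord


def random_performance(seed: int, n_steps: int = 100) -> np.ndarray:
    """Return per-oscillator frequency paths with random start and glide speed."""
    rng = np.random.default_rng(seed)
    start = rng.uniform(200.0, 400.0, size=TARGET_HZ.size)  # random initial pitch
    arrival = rng.uniform(0.6, 1.0, size=TARGET_HZ.size)    # when each voice lands
    t = np.linspace(0.0, 1.0, n_steps)
    # Each oscillator reaches progress 1.0 at its own arrival time, then holds.
    progress = np.clip(t[None, :] / arrival[:, None], 0.0, 1.0)
    return start[:, None] + (TARGET_HZ[:, None] - start[:, None]) * progress


take_a = random_performance(seed=1)
take_b = random_performance(seed=2)

print("start pitches, take A:", take_a[:, 0])
print("start pitches, take B:", take_b[:, 0])
print("final pitches, both takes:", take_a[:, -1], take_b[:, -1])
```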

## 🧠 How Frequency Quantization Shapes the Deep Note

Quantizing the frequencies means:

1. The range of pitches was split into discrete steps that the synthesizer hardware could handle.
2. As oscillators climbed or descended, they “snapped” to the closest quantized step, adding a subtle, mechanical texture to the sound.

Think of it like climbing a staircase instead of sliding up a ramp. The oscillators couldn’t hit every micro-pitch, but they got close enough to create the smooth illusion we hear.
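
The staircase is easy to see with a tiny example. The 25 Hz grid and the helper name `snap_to_grid` are assumptions picked to make the steps obvious; real hardware steps would be far finer.

```python
import numpy as np


def snap_to_grid(freq_hz: np.ndarray, step_hz: float) -> np.ndarray:
    """Round each frequency to the nearest multiple of step_hz."""
    return np.round(freq_hz / step_hz) * step_hz


# A smooth glide from 200 Hz to 300 Hz in 10 Hz increments ...
smooth = np.linspace(200.0, 300.0, 11)
# ... forced onto a hypothetical 25 Hz grid becomes a staircase.
stepped = snap_to_grid(smooth, step_hz=25.0)

print(smooth)
print(stepped)
```

The eleven smooth values collapse onto five distinct steps. The finer the grid, the closer the staircase gets to the ramp.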

## 🎉 Fun Facts About the Deep Note

- Dynamic Chaos: The Deep Note starts as pure chaos (random pitches) and resolves into a powerful harmony. This transition mirrors the tension and release we love in music!
- Impossible to Play Live: The original Deep Note algorithm uses so many oscillators and random variations that it’s essentially unplayable by human hands. Computers = ultimate rock stars!
- Revised for Modern Tech: In 2015, a new version of the Deep Note was created with modern tools, featuring up to 80 oscillators and higher precision, but the essence of randomness and quantization remained.
- Not Just Audio: The Deep Note isn’t just sound; it’s an experience. Because the digital version is stored as quantized samples, every theater plays back the same data, no matter the size of the speakers.

## 💡 Why Does It Matter?

Quantization is the backbone of how we digitize sound. Without it:

- We’d still be stuck with analog recordings, unable to share crystal-clear sounds online.
- Sounds like the THX Deep Note wouldn’t exist in their iconic form.

So next time you hear the THX Deep Note, remember: it’s not just cool audio engineering—it’s oscillators, randomness, and quantization working together to blow your mind!


In [None]:
import os
import matplotlib.pyplot as plt
import numpy as np
from scipy.io.wavfile import read
from IPython.display import Audio, display
from pydub import AudioSegment


def quantize_in_frequency_domain(samples: np.ndarray, n_bits: int) -> np.ndarray:
    """
    Quantize the FFT magnitudes of an audio signal, keeping the phase unchanged.

    Parameters:
        samples (np.ndarray): The input audio signal (int16 format).
        n_bits (int): Number of bits for the magnitude quantizer
            (the script below uses 2 ≤ n_bits ≤ 16).

    Returns:
        np.ndarray: The quantized audio signal (int16 format).
    """
    # Convert samples to float in range [-1.0, 1.0]
    samples_float = samples.astype(np.float32) / 32768.0

    # Perform FFT to move to the frequency domain
    fft_coefficients = np.fft.fft(samples_float)

    # Quantization in the frequency domain
    # Scale coefficients for quantization
    magnitude = np.abs(fft_coefficients)
    phase = np.angle(fft_coefficients)

    # Number of quantization levels
    levels = 2**n_bits

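    # Quantize magnitudes to multiples of (peak / levels).
    # A silent input has a zero peak, so return early instead of dividing by zero.
    peak = np.max(magnitude)
    if peak == 0:
        return np.zeros_like(samples, dtype=np.int16)
    scale = levels / peak
    quantized_magnitude = np.round(magnitude * scale) / scale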

    # Reconstruct quantized coefficients
    quantized_coefficients = quantized_magnitude * np.exp(1j * phase)

    # Perform IFFT to return to the time domain
    quantized_signal = np.fft.ifft(quantized_coefficients).real

    # Convert back to int16, clipping first so overshoot cannot wrap around
    quantized_int16 = np.clip(
        np.round(quantized_signal * 32768.0), -32768, 32767
    ).astype(np.int16)

    return quantized_int16


input_file = "./assets/sounds/THX_Deep_Note.mp3"

# Load the MP3 once with pydub; the source is the same for every bit depth.
# Force 16-bit samples so the int16 handling below is valid.
audio = AudioSegment.from_mp3(input_file).set_sample_width(2)
channels = audio.channels
frame_rate = audio.frame_rate

# Extract raw samples as a NumPy array. Multi-channel audio is interleaved,
# so reshape it to (n_frames, channels).
samples = np.array(audio.get_array_of_samples(), dtype=np.int16)
if channels > 1:
    samples = samples.reshape(-1, channels)

for target_bit_depth in [2, 4, 6, 8, 10, 12, 16]:
    output_mp3 = (
        f"./assets/sounds/output_frequency_quantized_at_{target_bit_depth}_bits.mp3"
    )

    # If there are multiple channels, process each channel separately
    if channels > 1:
        quantized_samples = np.zeros_like(samples, dtype=np.int16)
        for ch in range(channels):
            quantized_samples[:, ch] = quantize_in_frequency_domain(
                samples[:, ch], target_bit_depth
            )
    else:
        quantized_samples = quantize_in_frequency_domain(samples, target_bit_depth)

    # Convert back to an AudioSegment; tobytes() re-interleaves the channels
    quantized_audio = AudioSegment(
        quantized_samples.tobytes(),
        frame_rate=frame_rate,
        sample_width=2,  # 16 bits = 2 bytes
        channels=channels,
    )

    # Export as MP3
    quantized_audio.export(output_mp3, format="mp3")

    print(f"Saved frequency-quantized audio as {output_mp3}")

# List of the quantized audio files written above
audio_files = [
    "./assets/sounds/output_frequency_quantized_at_2_bits.mp3",
    "./assets/sounds/output_frequency_quantized_at_4_bits.mp3",
    "./assets/sounds/output_frequency_quantized_at_6_bits.mp3",
    "./assets/sounds/output_frequency_quantized_at_8_bits.mp3",
    "./assets/sounds/output_frequency_quantized_at_10_bits.mp3",
    "./assets/sounds/output_frequency_quantized_at_12_bits.mp3",
    "./assets/sounds/output_frequency_quantized_at_16_bits.mp3",
]


# Function to convert MP3 to WAV
def convert_mp3_to_wav(mp3_path):
    audio = AudioSegment.from_mp3(mp3_path)
    wav_path = mp3_path.replace(".mp3", ".wav")
    audio.export(wav_path, format="wav")
    return wav_path


# Function to plot waveform, FFT, and embed audio
def plot_waveform_and_fft(file_path):
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        return

    # Keep the original path for the audio player (str has no .copy() method)
    original_file_path = file_path

    # Convert MP3 to WAV if necessary
    if file_path.endswith(".mp3"):
        file_path = convert_mp3_to_wav(file_path)

    # Read audio file
    rate, data = read(file_path)

    # Convert stereo to mono if necessary
    if len(data.shape) > 1:
        data = np.mean(data, axis=1)

    # Generate waveform plot
    plt.figure(figsize=(10, 4))
    plt.plot(data, color="blue")
    plt.title(f"Waveform: {os.path.basename(file_path)}")
    plt.xlabel("Sample Index")
    plt.ylabel("Amplitude")
    plt.grid()
    plt.tight_layout()
    plt.show()

    # Compute FFT
    N = len(data)
    fft_data = np.fft.fft(data)
    fft_freqs = np.fft.fftfreq(N, 1 / rate)

    # Plot FFT (magnitude spectrum)
    plt.figure(figsize=(10, 4))
    plt.plot(fft_freqs[: N // 2], np.abs(fft_data)[: N // 2], color="red")
    plt.title(f"FFT Magnitude Spectrum: {os.path.basename(file_path)}")
    plt.xlabel("Frequency (Hz)")
    plt.ylabel("Magnitude")
    plt.grid()
    plt.tight_layout()
    plt.show()

    # Embed audio
    display(Audio(original_file_path))


# The Panel version below reuses the imports above and only adds Panel.
import panel as pn

pn.extension()


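# Panel version of plot_waveform_and_fft: returns a layout with the waveform,
# FFT, and audio player instead of showing them inline (replaces the one above)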
def plot_waveform_and_fft(file_path):
    if not os.path.exists(file_path):
        return f"File not found: {file_path}"

    # Convert MP3 to WAV if necessary
    if file_path.endswith(".mp3"):
        file_path = convert_mp3_to_wav(file_path)

    # Read audio file
    rate, data = read(file_path)

    # Convert stereo to mono if necessary
    if len(data.shape) > 1:
        data = np.mean(data, axis=1)

    # Generate waveform plot
    plt.figure(figsize=(10, 4))
    plt.plot(data, color="blue")
    plt.title(f"Waveform: {os.path.basename(file_path)}")
    plt.xlabel("Sample Index")
    plt.ylabel("Amplitude")
    plt.grid()
    plt.tight_layout()
    waveform_plot = pn.pane.Matplotlib(plt.gcf(), dpi=100)
    plt.close()

    # Compute FFT
    N = len(data)
    fft_data = np.fft.fft(data)
    fft_freqs = np.fft.fftfreq(N, 1 / rate)

    # Plot FFT (magnitude spectrum)
    plt.figure(figsize=(10, 4))
    plt.plot(fft_freqs[: N // 2], np.abs(fft_data)[: N // 2], color="red")
    plt.title(f"FFT Magnitude Spectrum: {os.path.basename(file_path)}")
    plt.xlabel("Frequency (Hz)")
    plt.ylabel("Magnitude")
    plt.grid()
    plt.tight_layout()
    fft_plot = pn.pane.Matplotlib(plt.gcf(), dpi=100)
    plt.close()

    # Embed audio
    audio_player = pn.pane.Audio(
        file_path, name=f"Audio: {os.path.basename(file_path)}", autoplay=False
    )

    return pn.Column(waveform_plot, fft_plot, audio_player)


# List of audio files and their labels
audio_files = {
    "2 Bits": "./assets/sounds/output_frequency_quantized_at_2_bits.mp3",
    "4 Bits": "./assets/sounds/output_frequency_quantized_at_4_bits.mp3",
    "6 Bits": "./assets/sounds/output_frequency_quantized_at_6_bits.mp3",
    "8 Bits": "./assets/sounds/output_frequency_quantized_at_8_bits.mp3",
    "10 Bits": "./assets/sounds/output_frequency_quantized_at_10_bits.mp3",
    "12 Bits": "./assets/sounds/output_frequency_quantized_at_12_bits.mp3",
    "16 Bits": "./assets/sounds/output_frequency_quantized_at_16_bits.mp3",
}

# Create tabs for each bit depth
tabs = pn.Tabs()
for bit_depth, file_path in audio_files.items():
    tabs.append((bit_depth, plot_waveform_and_fft(file_path)))

# Serve the panel
tabs.servable()