# Module 2: Time-Domain Analysis

### Concepts

This module covers the analysis of audio signals in the time domain, that is, how a signal's amplitude behaves over time. The key concepts are:

#### Amplitude Envelope (Attack, Decay, Sustain, Release):
The amplitude envelope describes how the amplitude of a signal changes over time. For musical sounds it is commonly split into four stages: attack (the initial rise), decay (the drop to a steady level), sustain (the held level), and release (the fade after the note ends).
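
As a rough illustration of the four stages, the sketch below builds a piecewise-linear ADSR envelope and applies it to a sine tone. The function name `adsr_envelope`, the stage durations, and the 440 Hz tone are illustrative assumptions, not part of the module's notebook code.

```python
import numpy as np

def adsr_envelope(n_samples, sr, attack=0.01, decay=0.05,
                  sustain_level=0.6, release=0.1):
    """Piecewise-linear ADSR envelope; times in seconds, levels in [0, 1]."""
    a = int(attack * sr)
    d = int(decay * sr)
    r = int(release * sr)
    s = max(n_samples - (a + d + r), 0)  # whatever remains is the sustain stage
    env = np.concatenate([
        np.linspace(0.0, 1.0, a, endpoint=False),            # attack: rise to peak
        np.linspace(1.0, sustain_level, d, endpoint=False),  # decay: fall to sustain
        np.full(s, sustain_level),                           # sustain: hold
        np.linspace(sustain_level, 0.0, r),                  # release: fade out
    ])
    return env[:n_samples]

sr = 22050
t = np.arange(sr) / sr                        # one second of samples
env = adsr_envelope(len(t), sr)
note = env * np.sin(2 * np.pi * 440.0 * t)    # tone shaped by the envelope
```

Plotting `env` against `t` shows the four stages directly; real instruments rarely follow straight lines, so treat this as a simplified model.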

#### Zero-Crossing Rate & RMS Energy:

- **Zero-Crossing Rate (ZCR):** The rate at which a signal changes sign, usually measured per frame. Noisy and unvoiced sounds tend to have a high ZCR, while voiced speech and sustained tones have a low one, which makes ZCR a cheap feature for telling them apart.
- **Root Mean Square (RMS) Energy:** The square root of the mean squared amplitude within a frame. It tracks signal power and serves as a rough proxy for loudness, which makes it useful for following the intensity of a sound over time.
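
Both features can be computed by hand in a few lines. The sketch below is a minimal NumPy version applied to one frame of a synthetic tone and one frame of synthetic noise; the helper names and test signals are assumptions chosen for illustration, and `librosa.feature.zero_crossing_rate` and `librosa.feature.rms` remain the functions used in the demos.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])

def rms(frame):
    """Square root of the mean squared amplitude."""
    return np.sqrt(np.mean(frame ** 2))

sr = 22050
t = np.arange(2048) / sr
rng = np.random.default_rng(0)

tone = np.sin(2 * np.pi * 220.0 * t)        # periodic signal: few sign changes
noise = rng.uniform(-1.0, 1.0, size=2048)   # aperiodic signal: many sign changes

zcr_tone, zcr_noise = zero_crossing_rate(tone), zero_crossing_rate(noise)
rms_tone, rms_noise = rms(tone), rms(noise)
```

The tone crosses zero about twice per period, so its ZCR is roughly `2 * 220 / sr`, while uniform noise changes sign on about half of all sample pairs.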

#### Autocorrelation for Pitch Detection:
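Autocorrelation measures how similar a signal is to a delayed copy of itself. For a periodic signal the correlation peaks at lags equal to multiples of the period, so the lag of the strongest peak gives the period and its reciprocal gives the pitch. The technique is widely used for pitch detection in speech and music analysis.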
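
A single-frame version of this idea might look like the sketch below. It assumes a synthetic frame (a 220 Hz tone with one harmonic) and a search range of 50 to 2000 Hz; `autocorr_pitch` is a hypothetical helper name, and the resolution is limited to whole-sample lags.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=50.0, fmax=2000.0):
    """Estimate f0 of one frame from the strongest autocorrelation peak."""
    frame = frame - np.mean(frame)                                 # remove DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sr / fmax)                       # shortest period considered
    lag_max = min(int(sr / fmin), len(ac) - 1)     # longest period considered
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sr / lag

sr = 22050
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220.0 * t) + 0.5 * np.sin(2 * np.pi * 440.0 * t)
f0 = autocorr_pitch(frame, sr)   # close to 220 Hz, quantized to an integer lag
```

Restricting the search to `[lag_min, lag_max]` matters: the autocorrelation is always largest at lag 0, so an unrestricted search would never find the period.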

---

### Notebook Demos

The notebook demos let you visualize and experiment with these concepts.

#### Compute and Plot the Envelope of a Drum Hit:

- **Goal:** Visualize the attack, decay, sustain, and release stages of a drum hit.
- **User Action:** Users pick a drum sample, play it, and plot its RMS envelope. The demo uses a fixed window of 2048 samples; exposing the window size as a slider is an optional extension.

#### Listen to Segments with High vs. Low Zero-Crossing Rate (ZCR):

- **Goal:** Compare segments of audio that have different ZCR values.
- **User Action:** Users will hear the difference between sounds with high ZCR (e.g., noise, cymbals) vs. low ZCR (e.g., speech, sustained notes).

#### Pitch Detection: Compare Autocorrelation vs. librosa.pyin on a Sustained Note:

- **Goal:** Understand how pitch detection works using two methods: plain autocorrelation and `librosa.pyin`, an implementation of the probabilistic YIN (pYIN) algorithm for fundamental-frequency estimation.
- **User Action:** Users will compare the results of pitch detection using both methods on a sustained note.

---

### Exercise

#### Build a Simple Voice-Activity Detector Using Energy Thresholding

- **Objective:** Create a voice-activity detector (VAD) that can distinguish between periods of speech and silence based on energy thresholding.

The VAD will use energy levels to detect if speech is present in an audio signal.

##### Steps:
1. Load an audio clip (e.g., a speech recording).
2. Compute the RMS energy for the signal over time.
3. Set an energy threshold to distinguish speech from silence.
4. Visualize the VAD results by plotting the energy over time and marking the speech segments.

This exercise will help users understand how energy-based methods can be applied to detect voice activity in audio recordings.
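
The steps above can be sketched end to end on a synthetic signal, so the result is easy to check without an audio file. Everything here is an assumption made for illustration: the sine burst stands in for speech, `frame_rms` is a hypothetical helper that skips the padding `librosa.feature.rms` applies by default, and the midpoint threshold is just one simple heuristic.

```python
import numpy as np

def frame_rms(y, frame_length=2048, hop_length=512):
    """RMS energy of each full frame (no padding or centering)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.array([
        np.sqrt(np.mean(y[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])

sr = 16000
rng = np.random.default_rng(0)
silence = 0.001 * rng.standard_normal(sr)                      # 1 s of faint noise
burst = 0.3 * np.sin(2 * np.pi * 200.0 * np.arange(sr) / sr)   # 1 s stand-in for speech
y = np.concatenate([silence, burst, silence])

rms = frame_rms(y)
threshold = 0.5 * (rms.min() + rms.max())   # midpoint heuristic (an assumption)
speech_mask = rms > threshold               # True where a frame counts as speech
times = np.arange(len(rms)) * 512 / sr      # start time of each frame in seconds
```

On real recordings the background level varies, so a fixed or midpoint threshold usually needs tuning by ear, which is what the slider in the exercise cell is for.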

---

### Summary

Time-domain analysis examines how a signal's amplitude evolves over time, without first transforming it to the frequency domain.

In this module, you will learn about amplitude envelope, zero-crossing rate, RMS energy, and autocorrelation techniques for pitch detection.

The demos and the exercise provide hands-on experience with visualizing energy envelopes, comparing pitch detection methods, and detecting speech.

---

### Next Steps:
For Module 2, follow the notebook demos and exercises to explore time-domain features like amplitude envelopes, ZCR, RMS energy, and autocorrelation.


## Demo: Interactive Amplitude Envelope Explorer

In this demo you will:

1. **Choose a sound**  
   Use the dropdown menu to select one of three drum-hit samples.

2. **Play the sound**  
   Click the **▶️ Play** button to listen to the selected audio clip.

3. **Visualize its envelope**  
   Click the **📈 Plot Envelope** button to compute and display its RMS amplitude envelope (with a fixed window size of 2048 samples).  

Feel free to switch sounds at any time and replay or re-plot to see how each drum hit’s attack, decay and release differ in the time domain!
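
The window size controls a trade-off between time resolution and smoothness. The sketch below compares a short and a long window on a synthetic, exponentially decaying noise burst; the burst, the helper `rms_envelope`, and the two window sizes are assumptions for illustration and do not use the drum samples from the demo.

```python
import numpy as np

def rms_envelope(y, window, hop):
    """Frame-wise RMS over full frames (no centering or padding)."""
    n_frames = 1 + (len(y) - window) // hop
    return np.array([np.sqrt(np.mean(y[i * hop:i * hop + window] ** 2))
                     for i in range(n_frames)])

sr = 22050
t = np.arange(sr // 2) / sr                             # 0.5 s of samples
rng = np.random.default_rng(0)
hit = np.exp(-t / 0.05) * rng.uniform(-1, 1, t.size)    # hypothetical drum-like burst

env_short = rms_envelope(hit, window=256, hop=128)      # sharp attack, noisier curve
env_long = rms_envelope(hit, window=2048, hop=1024)     # smooth curve, blurred attack
```

The long window averages the loud onset together with the quieter tail, so its peak is lower and later detail is lost; the short window follows the attack more closely at the cost of a rougher curve.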


In [None]:
import librosa
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import Audio, display, clear_output
from pathlib import Path

# --- configuration -----------------------------------
SOUNDS_DIR = Path('sounds')
FILES = {
    'Drum Hit 1': 'drum_hit1.wav',
    'Drum Hit 2': 'drum_hit2.wav',
    'Drum Hit 3': 'drum_hit3.wav'
}

WINDOW_SIZE = 2048  # you can expose this later as a slider if you like

# --- widgets -----------------------------------------
sound_dropdown = widgets.Dropdown(
    options=list(FILES.keys()),
    value='Drum Hit 1',
    description='Sound:'
)

play_button = widgets.Button(
    description='▶️ Play',
    button_style='info'
)
plot_button = widgets.Button(
    description='📈 Plot Envelope',
    button_style='primary'
)

output = widgets.Output()

# --- helper functions -------------------------------
def load_audio(name):
    path = SOUNDS_DIR / FILES[name]
    return librosa.load(str(path), sr=None)

# --- callbacks --------------------------------------
def on_play_clicked(_):
    with output:
        clear_output(wait=True)
        y, sr = load_audio(sound_dropdown.value)
        display(Audio(data=y, rate=sr, autoplay=False))

def on_plot_clicked(_):
    with output:
        clear_output(wait=True)
        y, sr = load_audio(sound_dropdown.value)
        hop = WINDOW_SIZE // 2
        rms = librosa.feature.rms(
            y=y,
            frame_length=WINDOW_SIZE,
            hop_length=hop
        )[0]
        t = librosa.times_like(rms, sr=sr, hop_length=hop)
        plt.figure(figsize=(10, 3))
        plt.plot(t, rms, alpha=0.8)
        plt.title(f"{sound_dropdown.value} envelope (window={WINDOW_SIZE})")
        plt.xlabel("Time (s)")
        plt.ylabel("RMS energy")
        plt.grid(True)
        plt.show()

# wire up events
play_button.on_click(on_play_clicked)
plot_button.on_click(on_plot_clicked)

# --- layout -----------------------------------------
ui = widgets.VBox([
    sound_dropdown,
    widgets.HBox([play_button, plot_button]),
    output
])

display(ui)


## Demo: Zero-Crossing Rate (ZCR) Comparison

In this demo you will:

1. **Choose a sound**  
   Select either a low-ZCR example (speech) or a high-ZCR example (noise/cymbal) from the dropdown.

2. **Play the sound**  
   Click the **▶️ Play** button to listen to the selected clip.

3. **Plot its ZCR**  
   Click the **📈 Plot ZCR** button to compute and display the frame-wise zero-crossing rate over time, showing how often the waveform crosses the zero axis in each frame.

Explore how speech (low ZCR) and noise (high ZCR) differ in their time-domain behavior!  


In [None]:
import numpy as np
import soundfile as sf
import librosa
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import Audio, display, clear_output
from pathlib import Path

# --- configuration -----------------------------------
SOUNDS_DIR = Path('sounds')
FILES = {
    'Low ZCR (low_noise.mp3)':  'low_noise.mp3',
    'High ZCR (high-zcore.mp3)': 'high-zcore.mp3'
}

FRAME_LENGTH = 2048
HOP_LENGTH   = FRAME_LENGTH // 2

# --- widgets -----------------------------------------
sound_dropdown = widgets.Dropdown(
    options=list(FILES.keys()),
    value='Low ZCR (low_noise.mp3)',
    description='Sound:'
)

play_button = widgets.Button(description='▶️ Play',    button_style='info')
plot_button = widgets.Button(description='📈 Plot ZCR', button_style='primary')
output      = widgets.Output()

# --- helper functions -------------------------------
def load_audio(name):
    """Read WAV via soundfile or MP3 via librosa."""
    path = SOUNDS_DIR / FILES[name]
    ext  = path.suffix.lower()
    if ext == '.wav':
        y, sr = sf.read(str(path), dtype='float32')
    else:
        y, sr = librosa.load(str(path), sr=None)
    # if stereo, take left channel
    if y.ndim > 1:
        y = y[:,0]
    return y, sr

# --- callbacks --------------------------------------
def on_play_clicked(_):
    with output:
        clear_output(wait=True)
        y, sr = load_audio(sound_dropdown.value)
        display(Audio(data=y, rate=sr, autoplay=False))

def on_plot_clicked(_):
    with output:
        clear_output(wait=True)
        y, sr = load_audio(sound_dropdown.value)
        zcr = librosa.feature.zero_crossing_rate(
            y, frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH
        )[0]
        t = librosa.frames_to_time(np.arange(len(zcr)), sr=sr, hop_length=HOP_LENGTH)
        plt.figure(figsize=(10, 3))
        plt.plot(t, zcr, alpha=0.8)
        plt.title(f"{sound_dropdown.value} — Zero-Crossing Rate")
        plt.xlabel("Time (s)")
        plt.ylabel("ZCR")
        plt.grid(True)
        plt.show()

# --- wire up events -------------------------------
play_button.on_click(on_play_clicked)
plot_button.on_click(on_plot_clicked)

# --- layout -----------------------------------------
ui = widgets.VBox([
    sound_dropdown,
    widgets.HBox([play_button, plot_button]),
    output
])
display(ui)


## Pitch Detection Comparison

> **Note:** On some machines, clicking **📈 Plot Pitch** may take **30–60 seconds** to complete. Please be patient after you click!

In this demo, you'll compare two different pitch-detection methods on a sustained note:

- **Autocorrelation**: a simple time-domain method that finds periodicity by correlating the signal with itself.  
- **librosa.pyin**: an implementation of probabilistic YIN (pYIN), which estimates the fundamental frequency per frame together with a voiced/unvoiced decision.

**Instructions:**
1. Use the **Sound** dropdown to select the sustained note.  
2. Click **▶️ Play** to listen.  
3. Click **📈 Plot Pitch** to visualize both pitch contours over time:  
   - **Blue curve** = Autocorrelation  
   - **Orange curve** = librosa.pyin  
4. Observe where the two methods agree or diverge.  
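
To put a number on where the two contours agree, one option is to compare them frame by frame in cents (1200 cents per octave). The sketch below assumes both contours have the same length and that unvoiced frames are marked with NaN, which is how `librosa.pyin` reports them by default; the sample arrays and the 50-cent tolerance are made up for illustration.

```python
import numpy as np

def cents_difference(f_a, f_b):
    """Per-frame pitch difference in cents; NaN where either estimate is missing."""
    f_a = np.asarray(f_a, dtype=float)
    f_b = np.asarray(f_b, dtype=float)
    valid = np.isfinite(f_a) & np.isfinite(f_b) & (f_a > 0) & (f_b > 0)
    diff = np.full(f_a.shape, np.nan)
    diff[valid] = 1200.0 * np.log2(f_a[valid] / f_b[valid])
    return diff

# Hypothetical contours: the autocorrelation estimate has an octave error in
# frame 2, and the pyin estimate marks frame 3 as unvoiced (NaN).
pitch_ac = np.array([220.0, 221.0, 440.0, 220.0])
f0_py = np.array([220.0, 220.0, 220.0, np.nan])

diff = cents_difference(pitch_ac, f0_py)
agree = np.abs(diff) < 50.0   # within a quarter tone; NaN compares as False
```

Differences near ±1200 cents point to octave errors, a common failure mode of plain autocorrelation.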


In [None]:
import numpy as np
import librosa
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import Audio, display, clear_output
from pathlib import Path

# --- configuration -----------------------------------
SOUNDS_DIR   = Path('sounds')
FILES        = {
    'Pitch 1': 'pitch1.mp3',
    'Pitch 2': 'pitch2.mp3',
    'Pitch 3': 'pitch3.mp3'
}
FRAME_LENGTH = 2048
HOP_LENGTH   = FRAME_LENGTH // 4   # e.g. 512
FMIN, FMAX   = 50, 2000             # search range in Hz

# --- widgets -----------------------------------------
sound_dropdown = widgets.Dropdown(
    options=list(FILES.keys()),
    value='Pitch 1',
    description='Sound:'
)
play_button = widgets.Button(description='▶️ Play',       button_style='info')
plot_button = widgets.Button(description='📈 Plot Pitch', button_style='primary')
output      = widgets.Output()

# --- helper functions -------------------------------
def load_audio(name):
    path = SOUNDS_DIR / FILES[name]
    y, sr = librosa.load(str(path), sr=None)
    return y, sr

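def pitch_autocorr(y, sr, frame_length, hop_length, fmin, fmax):
    """Frame-wise f0 estimate from the strongest autocorrelation peak."""
    # Pad so frames are centered and frame times line up with librosa.pyin
    y_pad = np.pad(y, (frame_length//2, frame_length//2), mode='reflect')
    frames = librosa.util.frame(y_pad, frame_length=frame_length, hop_length=hop_length).T
    # Lag search range implied by the frequency range (lag = sr / f0)
    i_min = max(1, int(sr / fmax))
    i_max = min(int(sr / fmin), frame_length)
    pitches = []
    for frame in frames:
        frame = frame - np.mean(frame)   # remove DC offset
        ac = librosa.autocorrelate(frame, max_size=frame_length)
        lag = np.argmax(ac[i_min:i_max]) + i_min
        # Silent frames have no positive peak: report NaN instead of a bogus pitch
        pitches.append(sr / lag if ac[lag] > 0 else np.nan)
    times = librosa.frames_to_time(np.arange(len(pitches)), sr=sr, hop_length=hop_length)
    return np.array(pitches), times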

# --- callbacks --------------------------------------
def on_play_clicked(_):
    with output:
        clear_output(wait=True)
        y, sr = load_audio(sound_dropdown.value)
        display(Audio(data=y, rate=sr, autoplay=False))

def on_plot_clicked(_):
    with output:
        clear_output(wait=True)
        y, sr = load_audio(sound_dropdown.value)

        # Autocorrelation pitch
        pitch_ac, t_ac = pitch_autocorr(
            y, sr,
            FRAME_LENGTH, HOP_LENGTH,
            FMIN, FMAX
        )

        # librosa.pyin pitch
        f0_py, voiced_flag, voiced_prob = librosa.pyin(
            y, fmin=FMIN, fmax=FMAX, sr=sr,
            frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH
        )
        t_py = librosa.times_like(f0_py, sr=sr, hop_length=HOP_LENGTH)

        # Plot both contours
        plt.figure(figsize=(10, 3))
        plt.plot(t_ac,  pitch_ac, label='Autocorrelation', alpha=0.8)
        plt.plot(t_py,  f0_py,    label='librosa.pyin',    alpha=0.8)
        plt.title(f"{sound_dropdown.value} — Pitch Detection")
        plt.xlabel("Time (s)")
        plt.ylabel("Frequency (Hz)")
        plt.legend()
        plt.grid(True)
        plt.show()

# --- wire up events -------------------------------
play_button.on_click(on_play_clicked)
plot_button.on_click(on_plot_clicked)

# --- initial clear -------------------------------
with output:
    clear_output()

# --- layout -----------------------------------------
ui = widgets.VBox([
    sound_dropdown,
    widgets.HBox([play_button, plot_button]),
    output
])

display(ui)


## 🛠 Exercise: Build a Simple Voice-Activity Detector (VAD)

**Objective:**  
Implement a VAD that distinguishes speech from silence using RMS-energy thresholding.

---

### Instructions

1. **Load an audio clip**  
   Use `librosa.load()` (or `soundfile.read()`) to load your speech file into a waveform `y` and sample rate `sr`.

2. **Compute RMS energy**  
   Slice your waveform into overlapping frames (use `frame_length=2048`, `hop_length=512`) and compute:
   ```python
   rms = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]
   times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop_length)
   ```

3. **Choose an energy threshold**  
   Experiment with different values (e.g. `mean(rms)`, or a fixed value) to separate “speech” vs. “silence.”

4. **Detect speech segments**  
   Create a boolean mask:
   ```python
   speech_mask = rms > threshold
   ```

5. **Find contiguous speech regions**  
   Identify and group contiguous regions where `speech_mask` is `True`.

6. **Visualize**  
   - Plot RMS energy vs. time, overlaying the threshold line.
   - Shade the speech-active regions using `plt.axvspan(start_time, end_time, alpha=0.3)`.
   - Play the clip so you can listen and verify your VAD results.
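
For the step of finding contiguous speech regions, one possible approach is to look for rising and falling edges in the boolean mask. The sketch below is a vectorized NumPy version; `mask_to_segments` is a hypothetical helper name, and the mask and the 32 ms frame spacing are made-up inputs for illustration.

```python
import numpy as np

def mask_to_segments(mask, times):
    """Return (start_time, end_time) pairs for each run of True values in mask."""
    padded = np.concatenate([[False], np.asarray(mask, dtype=bool), [False]])
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)       # index of the first True in each run
    ends = np.flatnonzero(edges == -1) - 1    # index of the last True in each run
    return [(float(times[s]), float(times[e])) for s, e in zip(starts, ends)]

mask = np.array([False, True, True, False, False, True, True, True])
times = np.arange(len(mask)) * 0.032   # hypothetical frame times, 32 ms apart
segments = mask_to_segments(mask, times)
```

Each pair can be passed straight to `plt.axvspan(start_time, end_time, alpha=0.3)`. Note that this version ends a segment at the last speech frame, whereas a loop that closes a segment on the first silent frame ends it one frame later; either convention works as long as it is applied consistently.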




In [None]:
import numpy as np
import librosa
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import Audio, display, clear_output
from pathlib import Path

# — Configuration —
SOUNDS_DIR    = Path('sounds')
AUDIO_FILE    = 'speech.wav'    # replace with your file
FRAME_LENGTH  = 2048
HOP_LENGTH    = FRAME_LENGTH // 4  # 512

# — Widgets —
play_btn      = widgets.Button(description='▶️ Play Audio', button_style='info')
threshold_sl  = widgets.FloatSlider(
    value=0.02, min=0.000, max=0.1, step=0.001,
    description='Threshold:'
)
run_btn       = widgets.Button(description='🖋️ Run VAD', button_style='primary')
out           = widgets.Output()

# — Helper to load & play —
def load_audio():
    path = SOUNDS_DIR / AUDIO_FILE
    y, sr = librosa.load(str(path), sr=None)
    return y, sr

def on_play(_):
    with out:
        clear_output()
        y, sr = load_audio()
        display(Audio(data=y, rate=sr, autoplay=False))

def on_run(_):
    with out:
        clear_output()
        y, sr = load_audio()

        # 1) Compute RMS
        rms = librosa.feature.rms(
            y=y,
            frame_length=FRAME_LENGTH,
            hop_length=HOP_LENGTH
        )[0]
        times = librosa.frames_to_time(
            np.arange(len(rms)),
            sr=sr,
            hop_length=HOP_LENGTH
        )

        # 2) Thresholding
        thresh = threshold_sl.value
        speech_mask = rms > thresh

        # 3) Plot energy + threshold
        plt.figure(figsize=(10, 3))
        plt.plot(times, rms, label='RMS Energy')
        plt.hlines(thresh, times[0], times[-1],
                   colors='r', linestyles='--',
                   label=f'Threshold = {thresh:.3f}')
        # 4) Shade speech regions
        in_speech = False
        start_t = None
        for t, is_sp in zip(times, speech_mask):
            if is_sp and not in_speech:
                in_speech = True
                start_t = t
            if not is_sp and in_speech:
                in_speech = False
                plt.axvspan(start_t, t, color='orange', alpha=0.3)
        # cover tail
        if in_speech:
            plt.axvspan(start_t, times[-1], color='orange', alpha=0.3)

        plt.title('Voice Activity Detection')
        plt.xlabel('Time (s)')
        plt.ylabel('RMS Energy')
        plt.legend()
        plt.grid(True)
        plt.show()

# — Wire up events —
play_btn.on_click(on_play)
run_btn.on_click(on_run)

# — Layout —
display(widgets.VBox([
    widgets.HBox([play_btn, threshold_sl, run_btn]),
    out
]))
