# EE 519 — Speech AI 
## HW-1 | Notebook 3: Time Operations — Delay, Echo, Reversal (Global & Short-Time)

**Student Name:**  
**USC ID:**  
**Date:**  

---

### Learning Objectives
By completing this notebook, you will:
- Apply **time-domain operations** to speech (delay, echo, reversal)
- Connect time operations to perceptual effects (echo vs reverb-like smear)
- Compare **global time reversal** vs **short-time reversal**
- Develop intuition for **short-time processing** and the approximate stationarity of speech over short frames

> ⚠️ **Important**
> - All answers (code + explanations) must be written **inside this notebook**
> - Do **not** delete questions or prompts
> - Clearly label all plots (title, axes, units)
> - Audio must be playable inline where requested


### Grading (Notebook 3 — 20 points)

| Component | Points |
|---|---:|
| Reproducible audio setup + clean I/O | 3 |
| Delay + echo experiments (plots + listening) | 6 |
| Global reversal analysis (plots + explanation) | 4 |
| Short-time reversal analysis (plots + explanation) | 5 |
| Clarity, organization, reflections | 2 |

> We grade **understanding and reasoning**, not perfection.


---

# 0. Setup

This notebook must run **quickly and reproducibly** for grading.

## ✅ Reproducibility requirements (very important)
- Put audio files in the **same folder as this notebook** or inside `./audio/`
- Use **relative paths only**
- Do not rely on cloud mounts or absolute paths
- The notebook must run top to bottom, without manual edits, after we download and extract your ZIP

Recommended structure:
```
HW1/
├── HW1_Notebook3_Time_Operations_Echo_Reversal.ipynb
└── audio/
    └── x.wav
```

## Data requirement
You need **one** speech recording `x` (a full spoken sentence is recommended).
You may:
- Record it in Python (optional) and save to `./audio/x.wav`, OR
- Record externally and place it at `./audio/x.wav`


In [None]:
# TODO: Libraries
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Audio, display


In [None]:
# TODO: Path (use relative paths only)
X_PATH = "./audio/x.wav"


In [None]:
# OPTIONAL recording (only if your system supports it):
# If recording doesn't work, skip this and use an external WAV file.
#
# !pip install sounddevice soundfile  # (uncomment if needed/allowed)
# import sounddevice as sd
# import soundfile as sf
# from pathlib import Path
#
# Path("./audio").mkdir(exist_ok=True)
# fs_rec = 16000
# duration_sec = 5
# print("Recording... Speak your sentence now.")
# x = sd.rec(int(duration_sec*fs_rec), samplerate=fs_rec, channels=1)
# sd.wait()
# x = x.squeeze()
# sf.write("./audio/x.wav", x, fs_rec)
# print("Saved to ./audio/x.wav")


In [None]:
# TODO: Implement or reuse WAV loader (mono float in [-1, 1])
def load_wav(path):
    raise NotImplementedError("Implement load_wav(path)")
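
A minimal sketch of one possible loader, assuming the file is 16-bit PCM (other bit depths or float WAVs would need a library such as `soundfile` or `scipy.io.wavfile`). The name `load_wav_sketch` is hypothetical; it uses only the standard-library `wave` module and NumPy, and it averages channels to obtain mono.

```python
import wave
import numpy as np

def load_wav_sketch(path):
    """Read a 16-bit PCM WAV file and return (x, fs).

    x is a mono float32 array scaled to [-1, 1); fs is the sample rate in Hz.
    Assumption: 16-bit PCM only. Other formats raise ValueError.
    """
    with wave.open(path, "rb") as wf:
        fs = wf.getframerate()
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample
        raw = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("This sketch only handles 16-bit PCM WAV files")
    # Little-endian int16 -> float in [-1, 1)
    x = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Samples are interleaved per frame; average channels to get mono
        x = x.reshape(-1, n_channels).mean(axis=1)
    return x, fs
```

Dividing by 32768 keeps the most negative int16 value at exactly -1.0, at the cost of the largest positive value landing just below 1.0.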


In [None]:
# TODO: Load signal
# x, fs = load_wav(X_PATH)
# print(fs, len(x)/fs, x.min(), x.max())
# display(Audio(x, rate=fs))


In [None]:
# TODO: Helper functions
def plot_waveform(x, fs, title, tlim=None):
    raise NotImplementedError

def play_audio(x, fs):
    display(Audio(x, rate=fs))

def magnitude_spectrum(x, fs):
    """Return f (Hz) and magnitude (dB or linear)."""
    raise NotImplementedError
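
As an illustration of what the spectrum helper could look like, here is a sketch using NumPy's real FFT. The name `magnitude_spectrum_sketch` and the `in_db` and `eps` parameters are assumptions made for this example, not part of the assignment's required interface.

```python
import numpy as np

def magnitude_spectrum_sketch(x, fs, in_db=True, eps=1e-12):
    """Return (f, mag): frequencies in Hz and the one-sided magnitude spectrum.

    If in_db is True, mag is 20*log10(|X| + eps); eps avoids log(0).
    """
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)                        # one-sided FFT of a real signal
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)   # bin centers from 0 to fs/2
    mag = np.abs(X)
    if in_db:
        mag = 20.0 * np.log10(mag + eps)
    return f, mag
```

No window or amplitude normalization is applied here, so absolute dB values depend on the signal length; that is usually fine when comparing signals of equal length.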


---

# 1. Time Delay and Echo (Single-Tap FIR Intuition)

An **echo** can be created by adding a delayed and scaled copy of the signal:

\[
y[n] = x[n] + \alpha x[n-D]
\]

Where:
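- \(D\) is the delay in samples; a delay of \(T\) ms at sampling rate \(f_s\) Hz corresponds to \(D = \mathrm{round}(T f_s / 1000)\) (e.g., 50 ms at 16 kHz is 800 samples)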
- \(\alpha\) is the echo gain (e.g., 0.2–0.7)

### Task
Create echoes with delays:
- 50 ms
- 100 ms
- 300 ms

For each delay, test at least **two** gains (e.g., 0.3 and 0.6).
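
The echo equation can be sketched in a few lines of NumPy. This is one possible approach, not the required one: the names `ms_to_samples`, `delay_signal_sketch`, and `add_echo_sketch` are hypothetical, and the choice to lengthen the output by \(D\) samples (so the echo tail is not cut off) is an assumption you may or may not want to adopt.

```python
import numpy as np

def ms_to_samples(delay_ms, fs):
    """Convert a delay in milliseconds to a whole number of samples."""
    return int(round(delay_ms * fs / 1000.0))

def delay_signal_sketch(x, D):
    """y[n] = x[n - D]: prepend D zeros, so the output has len(x) + D samples."""
    if D < 0:
        raise ValueError("D must be non-negative")
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.zeros(D), x])

def add_echo_sketch(x, fs, delay_ms, alpha):
    """y[n] = x[n] + alpha * x[n - D], with the echo tail kept."""
    x = np.asarray(x, dtype=float)
    D = ms_to_samples(delay_ms, fs)
    direct = np.concatenate([x, np.zeros(D)])  # pad so both paths have equal length
    return direct + alpha * delay_signal_sketch(x, D)
```

Feeding a unit impulse through `add_echo_sketch` returns the impulse response directly: a 1 at index 0 and \(\alpha\) at index \(D\), which is a quick sanity check before trying speech.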


In [None]:
# TODO: Implement delay operator
def delay_signal(x, D):
    """Delay by D samples: y[n] = x[n-D] with zero padding."""
    raise NotImplementedError


In [None]:
# TODO: Implement echo creator
def add_echo(x, fs, delay_ms, alpha):
    """Return echoed signal y = x + alpha * delayed(x)."""
    raise NotImplementedError


In [None]:
# TODO: Run echo experiments
# delays_ms = [50, 100, 300]
# alphas = [0.3, 0.6]
# results = {}  # e.g., results[(delay_ms, alpha)] = y


## 1.1 Visualize echoes

For at least one region containing speech:
- Plot original and echoed signal (overlay) for each condition
- Use the same time window for fair comparison


In [None]:
# TODO: Plot overlays for echo conditions
# Pick a time window: (t_start, t_end)
# plot_waveform(x, fs, "Original", tlim=(..., ...))
# for (delay_ms, alpha), y in results.items():
#     plot_waveform(y, fs, f"Echo: {delay_ms}ms, alpha={alpha}", tlim=(..., ...))


## 1.2 Listen: echo vs reverb-like smear

Play each echoed version and describe:
- When does the echo become a distinct repetition?
- When does it feel like “thickening”/reverberation instead?

> Use headphones if possible.


In [None]:
# TODO: Listening test
# for (delay_ms, alpha), y in results.items():
#     print(f"Echo: {delay_ms} ms, alpha={alpha}")
#     play_audio(y, fs)
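
Because \(y[n] = x[n] + \alpha x[n-D]\), the peak of the echoed signal can reach \((1+\alpha)\max|x|\), which may exceed 1. The sketch below shows one way to detect and handle this by rescaling the whole signal; the function name `limit_peak_sketch` and the 0.99 target are assumptions for illustration. Note also that `IPython.display.Audio` rescales array input by default (`normalize=True` in recent IPython versions), so inline playback can hide clipping that would appear when saving to a fixed-point WAV.

```python
import numpy as np

def limit_peak_sketch(y, peak=0.99):
    """Scale y down uniformly if its peak magnitude exceeds `peak`.

    Returns (y_out, was_scaled). Signals already within range are returned unchanged.
    """
    y = np.asarray(y, dtype=float)
    max_abs = np.max(np.abs(y)) if y.size else 0.0
    if max_abs > peak:
        return y * (peak / max_abs), True
    return y, False
```

Uniform rescaling changes overall loudness but preserves the waveform shape, unlike hard clipping with `np.clip`, which adds distortion.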


### Observations (Echo)

Answer in 8–12 lines total:

- Which delay/gain combination sounded like a **distinct echo**?
- Which combinations sounded more like **reverb / thickening**?
- What role does \(\alpha\) play perceptually?
- Did any condition cause clipping? If yes, how did you handle it?


---

# 2. Global Time Reversal

Global time reversal is:
\[
y[n] = x[-n]
\]
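For a finite signal of length \(N\) stored as an array, reversing the array gives \(y[n] = x[N-1-n]\): the same reversal, shifted by \(N-1\) samples so that the indices stay in \(0 \le n \le N-1\).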
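
A small numerical check can help when reasoning about what reversal does and does not change. The sketch below uses a synthetic noise signal (the names `x_demo` and `y_rev_demo` are placeholders, not your recording) and compares total energy and FFT magnitude before and after reversing the array; you can repeat the same comparison on your own `x`.

```python
import numpy as np

# Synthetic stand-in for a speech signal (assumption: any real-valued array works)
rng = np.random.default_rng(0)
x_demo = rng.standard_normal(1024)
y_rev_demo = x_demo[::-1]  # y[n] = x[N-1-n]

# Compare FFT magnitudes and total energy of the two signals
mag_x = np.abs(np.fft.rfft(x_demo))
mag_y = np.abs(np.fft.rfft(y_rev_demo))
same_magnitude = np.allclose(mag_x, mag_y)
same_energy = np.isclose(np.sum(x_demo ** 2), np.sum(y_rev_demo ** 2))

# The waveforms themselves are different sample by sample
same_waveform = np.allclose(x_demo, y_rev_demo)

print("magnitude spectrum equal:", same_magnitude)
print("total energy equal:", same_energy)
print("waveform equal:", same_waveform)
```

This compares the spectrum of the whole signal; a spectrogram, which tracks how the spectrum evolves over time, is a separate comparison.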

### Task
- Create a fully time-reversed version of your speech signal
- Compare waveform, spectrum, and intelligibility


In [None]:
# TODO: Global reversal
# y_rev = x[::-1]


In [None]:
# TODO: Plot + play global reversal
# plot_waveform(x, fs, "Original (full)")
# plot_waveform(y_rev, fs, "Global time reversal (full)")
# play_audio(y_rev, fs)


### Conceptual Questions (Global Reversal)

Answer clearly in complete sentences:

1. Is the reversed speech intelligible? Why or why not?  
2. Which properties remain the same after reversal (duration, energy distribution, magnitude spectrum)?  
3. Which properties fundamentally change (temporal cues, attack/decay patterns, coarticulation cues)?  
4. Does the **magnitude spectrum** change under reversal? (Explain.)


---

# 3. Short-Time Reversal (Windowed / Local Reversal)

In short-time processing, we operate on short frames (e.g., 20–30 ms).
Short-time reversal reverses samples **within each frame**, while frame order remains unchanged.

### Task
Implement short-time reversal with:
- 20 ms frames
- 30 ms frames

Use a hop size equal to the frame size (non-overlapping) for simplicity.

> Optional extension: try overlapping frames with a window and overlap-add.


In [None]:
# TODO: Implement short-time reversal
def short_time_reverse(x, fs, frame_ms):
    """Reverse samples within each frame of length frame_ms (non-overlapping)."""
    raise NotImplementedError
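
One way to realize non-overlapping short-time reversal is sketched below. The name `short_time_reverse_sketch` is hypothetical, and the handling of the final partial frame (reversed in place rather than zero-padded or dropped) is an assumption; other choices are equally defensible.

```python
import numpy as np

def short_time_reverse_sketch(x, fs, frame_ms):
    """Reverse samples inside each non-overlapping frame; frame order is unchanged.

    Assumption: a final frame shorter than frame_ms is reversed as-is,
    so the output has the same length as the input.
    """
    x = np.asarray(x)
    frame_len = int(round(frame_ms * fs / 1000.0))
    if frame_len < 1:
        raise ValueError("frame_ms is too short for this sample rate")
    y = np.empty_like(x)
    for start in range(0, len(x), frame_len):
        stop = min(start + frame_len, len(x))
        y[start:stop] = x[start:stop][::-1]  # local reversal of one frame
    return y
```

Applying the function twice with the same `frame_ms` returns the original signal, which makes a convenient correctness check.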


In [None]:
# TODO: Generate short-time reversed signals
# y_str_20 = short_time_reverse(x, fs, frame_ms=20)
# y_str_30 = short_time_reverse(x, fs, frame_ms=30)


In [None]:
# TODO: Plot + play short-time reversal outputs
# play_audio(y_str_20, fs)
# play_audio(y_str_30, fs)
# Plot a segment for comparison


## 3.1 Spectrogram comparison (recommended)

Compute and plot spectrograms for:
- Original
- Global reversal
- 20 ms short-time reversal
- 30 ms short-time reversal

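Use the same STFT parameters (e.g., 25 ms window, 10 ms hop) for all four plots so they are directly comparable. The FFT size `n_fft` should be at least the window length in samples (a 25 ms window is 400 samples at 16 kHz).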


In [None]:
# TODO: Implement a spectrogram helper (STFT magnitude)
def plot_spectrogram(x, fs, title, n_fft=1024, win_ms=25, hop_ms=10):
    raise NotImplementedError
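
The numerical core of a spectrogram helper could look like the sketch below, which frames the signal, applies a Hann window, and takes the FFT magnitude in dB. The name `stft_magnitude_db_sketch`, the Hann window, and the `eps` floor are assumptions for this example; plotting is left out so the block depends only on NumPy.

```python
import numpy as np

def stft_magnitude_db_sketch(x, fs, n_fft=1024, win_ms=25, hop_ms=10, eps=1e-10):
    """Return (t, f, S_db) where S_db has shape (n_freqs, n_frames).

    t: frame-center times in seconds, f: frequencies in Hz,
    S_db: 20*log10 of the STFT magnitude (eps avoids log(0)).
    """
    x = np.asarray(x, dtype=float)
    win_len = int(round(win_ms * fs / 1000.0))
    hop_len = int(round(hop_ms * fs / 1000.0))
    if win_len < 1 or hop_len < 1:
        raise ValueError("win_ms and hop_ms must each cover at least one sample")
    if win_len > n_fft:
        raise ValueError("n_fft must be at least the window length in samples")
    if len(x) < win_len:
        x = np.pad(x, (0, win_len - len(x)))  # ensure at least one full frame
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop_len
    frames = np.stack([
        x[i * hop_len: i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    S = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # zero-pads each frame to n_fft
    S_db = 20.0 * np.log10(S + eps)
    t = (np.arange(n_frames) * hop_len + win_len / 2.0) / fs
    f = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return t, f, S_db.T
```

The returned arrays can be passed to a Matplotlib call such as `plt.pcolormesh(t, f, S_db, shading="auto")`, with axes labeled in seconds and Hz and a colorbar labeled in dB. Samples after the last full frame are ignored in this sketch.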


In [None]:
# TODO: Plot spectrograms (4 figures)
# plot_spectrogram(x, fs, "Original")
# plot_spectrogram(y_rev, fs, "Global reversal")
# plot_spectrogram(y_str_20, fs, "Short-time reversal: 20 ms")
# plot_spectrogram(y_str_30, fs, "Short-time reversal: 30 ms")


### Observations (Short-Time Reversal)

Answer in 10–14 lines total:

- Which is more intelligible: 20 ms or 30 ms short-time reversal? Why?
- Why is short-time reversal often **less destructive** than global reversal?
- What does this reveal about speech being “locally stationary” over short frames?
- What changed in the spectrograms vs the original?


---

# 4. Connecting to Short-Time Processing Intuition

Write 8–12 lines:

- Why do we process speech in short frames (20–30 ms)?
- What assumptions are we making about speech within a frame?
- How did your reversal experiments support or challenge those assumptions?


---

# 5. Reflection (Mandatory)

Write thoughtful answers (be specific):

1. What did you learn that you did not expect about **delay/echo**?  
2. What was one surprising perceptual effect you noticed?  
3. Which visualization (waveform vs spectrogram) helped you explain what you heard—and why?  
4. If you had to explain “short-time processing” to a friend in 2–3 sentences, what would you say?  
5. What experiment would you try next (e.g., multiple echoes, feedback echo, overlap-add short-time reversal)?


---

# 6. AI Use Disclosure (Required)

If you used any AI tools (including ChatGPT), briefly describe:
- What you used it for (e.g., debugging, concept clarification)
- What you wrote/changed yourself

*(If you did not use AI, write “No AI tools used.”)*
