# EE 519 — Speech AI
## HW-1 | Notebook 4: Fourier Analysis — Sinusoids, DFT/FFT, Spectra of Speech

**Student Name:**  
**USC ID:**  
**Date:**  

---

### Learning Objectives
By completing this notebook, you will:
- Build intuition that **sinusoids are building blocks** of signals
- Implement and interpret **DFT/FFT** for synthetic and speech signals
- Understand how **window length** and **window type** affect spectra
- Connect speech acoustics (harmonics + spectral envelope) to frequency-domain views

> ⚠️ **Important**
> - All answers (code + explanations) must be written **inside this notebook**
> - Do **not** delete questions or prompts
> - Clearly label all plots (title, axes, units)
> - Use **relative paths only** for audio


### Grading (Notebook 4 — 20 points)

| Component | Points |
|---|---:|
| Correct FFT/DFT implementation + clean plots | 6 |
| Synthetic sinusoid experiments (time + freq) | 5 |
| Speech frame spectral analysis + window effects | 6 |
| Conceptual explanations (resolution, leakage, envelope vs harmonics) | 2 |
| Clarity & organization | 1 |

> We grade **understanding and reasoning**, not perfection.


---

# 0. Setup (Reproducibility)

This notebook must run quickly for grading.

## ✅ Reproducibility requirements
- Put audio files in the same folder as the notebook or in `./audio/`
- Use **relative paths only**
- Do not rely on cloud mounts or absolute paths

Recommended structure:
```
HW1/
├── HW1_Notebook4_Fourier_DFT_SpeechSpectra.ipynb
└── audio/
    └── x.wav
```

## Data requirement
You need **one** speech recording `x` (sentence recommended) at `./audio/x.wav`.
You may reuse the same `x.wav` from Notebook 3.
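
One possible loader is sketched below, assuming `x.wav` is uncompressed 16-bit PCM. The name `load_wav_pcm16` is hypothetical, and the sketch relies only on Python's standard `wave` module and NumPy; other encodings (24-bit, floating point) would need different handling.

```python
import wave
import numpy as np

def load_wav_pcm16(path):
    """Sketch: read a 16-bit PCM WAV, return (mono float signal in [-1, 1), fs)."""
    with wave.open(path, "rb") as wf:
        fs = wf.getframerate()
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()          # bytes per sample
        raw = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    # Interpret the bytes as little-endian int16 and scale to floats.
    x = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    if n_channels > 1:
        x = x.reshape(-1, n_channels).mean(axis=1)   # average channels to mono
    return x, fs
```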


In [None]:
# TODO: Imports
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Audio, display


In [None]:
# TODO: Audio path
X_PATH = "./audio/x.wav"


In [None]:
# TODO: WAV loader (reuse from Notebook 1/3)
def load_wav(path):
    """Return mono float signal in [-1, 1] and sample rate fs."""
    raise NotImplementedError


In [None]:
# TODO: Load audio
# x, fs = load_wav(X_PATH)
# print(fs, len(x)/fs, x.min(), x.max())
# display(Audio(x, rate=fs))


In [None]:
# TODO: Helper plotting utilities
def plot_waveform(x, fs, title, tlim=None):
    raise NotImplementedError

def magnitude_spectrum(x, fs, window=None):
    """Return frequency axis f (Hz) and the one-sided magnitude spectrum (linear or dB; label plots accordingly)."""
    raise NotImplementedError


---

# 1. Sinusoids: Time Domain → Frequency Domain

In this section you will generate:
- A single sine wave
- Multiple sine waves (a sum of tones)

Then you will inspect:
- Time-domain waveform
- Magnitude spectrum

### Notes
- Use a clean sampling rate (e.g., 16 kHz)
- Use a fixed duration (e.g., 1 second)


In [None]:
# TODO: Define a sampling rate and duration for synthetic signals
fs_syn = 16000
duration_sec = 1.0
t = np.arange(int(fs_syn * duration_sec)) / fs_syn


## 1.1 Single sine wave

### Task
Generate a sine wave at a chosen frequency (e.g., 440 Hz).
- Plot 20 ms of the waveform
- Plot its magnitude spectrum
- Identify where the spectral peak occurs
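
As a rough illustration of the peak-finding step, the sketch below takes the one-sided FFT of a 1-second, 440 Hz tone and reads off the frequency of the largest bin. The names `fs_demo`, `t_demo`, and `x_demo` are placeholders chosen here, not part of the notebook's stubs.

```python
import numpy as np

fs_demo = 16000                                   # assumed sampling rate (Hz)
t_demo = np.arange(fs_demo) / fs_demo             # 1 second of sample times
x_demo = np.sin(2 * np.pi * 440 * t_demo)         # 440 Hz tone

X = np.fft.rfft(x_demo)                           # one-sided DFT of a real signal
f_axis = np.fft.rfftfreq(len(x_demo), d=1 / fs_demo)   # bin frequencies in Hz
f_peak = f_axis[np.argmax(np.abs(X))]             # frequency of the largest bin
print(f"peak at {f_peak:.1f} Hz, bin spacing {fs_demo / len(x_demo):.1f} Hz")
```

The reported peak can only land on a bin frequency, so its accuracy is limited by the bin spacing.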


In [None]:
# TODO: Single sine
# f0 = 440
# x_sine = np.sin(2*np.pi*f0*t)


In [None]:
# TODO: Plot waveform (zoom) and spectrum
# plot_waveform(x_sine, fs_syn, "Single sine (zoom)", tlim=(0, 0.02))
# f, mag = magnitude_spectrum(x_sine, fs_syn, window="hann")
# plt.figure(); plt.plot(f, mag); ...


### Questions (Single sine)

1. Where is the peak in the spectrum? Is it exactly at your chosen frequency? Why might it differ slightly?  
2. What happens to the spectrum if you shorten the signal duration to 0.1 seconds? (Try and explain.)


## 1.2 Sum of multiple sines

### Task
Create a signal that is the sum of 3–5 sine waves at different frequencies.
- Plot waveform (zoom)
- Plot spectrum
- Create a small table: frequency → amplitude used → observed peaks
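
One way to fill the "observed peaks" column is sketched below, under the assumption that every tone falls exactly on an FFT bin (true for the example frequencies with a 1-second signal at 16 kHz). The scaling `2 * |X| / N` recovers a sine's amplitude only in that on-bin, rectangular-window case; all `_demo` names are placeholders.

```python
import numpy as np

fs_demo = 16000
n_samples = fs_demo                               # 1 second, so bins are 1 Hz apart
t_demo = np.arange(n_samples) / fs_demo
freqs_demo = [200, 400, 800, 1600]                # example values from this notebook
amps_demo = [1.0, 0.6, 0.4, 0.2]
x_demo = sum(a * np.sin(2 * np.pi * f * t_demo) for f, a in zip(freqs_demo, amps_demo))

X = np.fft.rfft(x_demo)                           # rectangular window (no taper)
f_axis = np.fft.rfftfreq(n_samples, d=1 / fs_demo)
amp_est = 2 * np.abs(X) / n_samples               # on-bin sine reads back its amplitude

for f, a in zip(freqs_demo, amps_demo):
    k = int(np.argmin(np.abs(f_axis - f)))        # nearest bin to the target frequency
    print(f"{f:5d} Hz | used {a:.2f} | observed {amp_est[k]:.2f}")
```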


In [None]:
# TODO: Sum of sines
# freqs = [200, 400, 800, 1600]  # example
# amps  = [1.0, 0.6, 0.4, 0.2]   # example
# x_sum = np.zeros_like(t)
# for f0, a in zip(freqs, amps):
#     x_sum += a*np.sin(2*np.pi*f0*t)


In [None]:
# TODO: Plot waveform and spectrum; create your small summary table in markdown


### Questions (Sum of sines)

1. Does the time-domain waveform look “simple” or “complex”?  
2. Why does the frequency-domain view make the signal easier to describe?  
3. If two sine waves are very close in frequency, when do they become hard to separate in the FFT? (Connect to the frequency resolution Δf.)


---

# 2. DFT/FFT Details: Resolution, Leakage, and Windows

### Key concepts you will explore
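- **Frequency resolution** depends on the number of samples N and the sampling rate fs: Δf = fs / N (equivalently 1 / T for a segment lasting T seconds)
- **Spectral leakage** occurs when a component's frequency does not fall exactly on an FFT bin, so its energy spreads into neighboring bins
- **Windowing** reduces leakage but widens the main lobe and lowers the peak amplitude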

In this section you will run controlled experiments to see these effects.
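
The relation Δf = fs / N can be tabulated directly. The sketch below assumes fs = 16 kHz and the three example durations used in the resolution experiment, and also expresses the 50 Hz gap between the example tones (1000 Hz and 1050 Hz) in units of bins.

```python
fs_demo = 16000                        # assumed sampling rate (Hz)
tone_gap_hz = 1050 - 1000              # gap between the two example tones

rows = []
for duration in (0.02, 0.05, 0.1):
    n_samples = int(round(duration * fs_demo))
    delta_f = fs_demo / n_samples      # bin spacing in Hz, equal to 1 / duration
    rows.append((duration, n_samples, delta_f, tone_gap_hz / delta_f))

for duration, n_samples, delta_f, gap_bins in rows:
    print(f"T = {duration:.2f} s | N = {n_samples:4d} | "
          f"delta_f = {delta_f:5.1f} Hz | tone gap = {gap_bins:.1f} bins")
```

A tone gap of only about one bin suggests the two peaks will be difficult to tell apart; larger gaps (in bins) make separation easier.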


## 2.1 Resolution experiment

### Task
Generate two sine waves close in frequency (e.g., 1000 Hz and 1050 Hz).
Compute spectra for:
- durations T = 0.02 s, 0.05 s, 0.1 s (choose 3 lengths; the number of samples is N = T · fs)

Describe when the two peaks become separable.


In [None]:
# TODO: Resolution experiment
# f1, f2 = 1000, 1050
# lengths_sec = [0.02, 0.05, 0.1]
# For each length:
#   generate x_len
#   compute spectrum
#   plot and compare


### Questions (Resolution)

- For each length, compute Δf = fs / N and report it.
- Which length gives the clearest separation? Why?


## 2.2 Leakage experiment (bin misalignment)

### Task
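Pick a sine frequency that is **not** an integer multiple of Δf (e.g., 997 Hz analyzed over a short frame such as N = 1024 samples at 16 kHz, where Δf = 15.625 Hz). Note that with the full 1-second signal at 16 kHz, Δf = 1 Hz, so 997 Hz would land exactly on a bin and show no leakage.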
Compute FFT:
- with rectangular window (no window)
- with Hann window

Compare leakage patterns.
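
A minimal numerical comparison is sketched below, assuming a 1024-sample frame at 16 kHz so that 997 Hz falls between bins. The helper `spectrum_db` and the 3 kHz cutoff used to measure "far away" leakage are illustrative choices, not requirements of the assignment.

```python
import numpy as np

fs_demo = 16000
n_samples = 1024                                  # assumed short frame, delta_f = 15.625 Hz
t_demo = np.arange(n_samples) / fs_demo
f_axis = np.fft.rfftfreq(n_samples, d=1 / fs_demo)

def spectrum_db(x, window):
    """Magnitude spectrum in dB, normalized so the largest bin is 0 dB."""
    mag = np.abs(np.fft.rfft(x * window))
    return 20 * np.log10(mag / mag.max() + 1e-12)

x_off = np.sin(2 * np.pi * 997 * t_demo)          # 997 Hz lies between bins 63 and 64
rect_db = spectrum_db(x_off, np.ones(n_samples))  # rectangular window
hann_db = spectrum_db(x_off, np.hanning(n_samples))

far = f_axis > 3000                               # region well away from the tone
print(f"fractional bin index: {997 * n_samples / fs_demo:.3f}")
print(f"worst leakage above 3 kHz: rect {rect_db[far].max():.1f} dB, "
      f"Hann {hann_db[far].max():.1f} dB")
```

Plotting `rect_db` and `hann_db` on the same axes makes the tradeoff visible: the Hann curve drops off much faster away from the tone but has a wider main lobe.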


In [None]:
# TODO: Leakage experiment
# f0 = 997
# x_leak = ...  # separate name so the loaded speech signal x is not overwritten
# Compute spectra with window=None vs window="hann"
# Plot and compare


### Questions (Leakage & windows)

1. What changes when you apply a Hann window?  
2. What tradeoff do you observe (leakage vs main-lobe width / peak sharpness)?  
3. Why is windowing important for short-time speech spectra?


---

# 3. Speech Spectrum: Harmonics vs Spectral Envelope

Speech contains:
- **Harmonics** (from voiced excitation) → peaks at integer multiples of the fundamental frequency F0, so adjacent harmonics are spaced F0 apart
- **Spectral envelope** (from vocal-tract filtering / formants)

In this section, you will select a short speech segment and compute its spectrum.
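
Before working with real speech, the harmonic-spacing idea can be checked on a synthetic stand-in. The sketch below assumes a 40 ms frame of eight harmonics of a 125 Hz fundamental with 1/k amplitudes (a crude proxy for voiced excitation, with no vocal-tract envelope) and estimates F0 from the spacing of spectral peaks. The 5% peak threshold is an arbitrary choice that would need tuning for real recordings.

```python
import numpy as np

fs_demo = 16000
n_samples = 640                                   # 40 ms frame, so delta_f = 25 Hz
t_demo = np.arange(n_samples) / fs_demo
f0_true = 125.0                                   # assumed pitch of the synthetic signal

# Harmonics at integer multiples of F0 with amplitudes that decay as 1/k.
x_demo = sum(np.sin(2 * np.pi * k * f0_true * t_demo) / k for k in range(1, 9))

mag = np.abs(np.fft.rfft(x_demo * np.hanning(n_samples)))
f_axis = np.fft.rfftfreq(n_samples, d=1 / fs_demo)

# Keep bins that exceed both neighbours and a threshold of 5% of the maximum.
inner = mag[1:-1]
is_peak = (inner > mag[:-2]) & (inner > mag[2:]) & (inner > 0.05 * mag.max())
peak_freqs = f_axis[1:-1][is_peak]

f0_est = np.median(np.diff(peak_freqs))           # typical spacing between harmonic peaks
print(f"harmonic peaks (Hz): {peak_freqs}")
print(f"estimated F0 from spacing: {f0_est:.1f} Hz")
```

On real speech the peaks will be less regular, so the estimate from spacing should be treated as approximate.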


## 3.1 Select a voiced segment

### Task
Pick a 30–50 ms segment from your speech that is clearly voiced (e.g., a vowel region).
- Provide start and end times (seconds)
- Plot waveform zoom
- Play the segment


In [None]:
# TODO: Choose voiced segment times (in seconds)
t_start = None
t_end = None

# Extract:
# n0 = int(t_start*fs); n1 = int(t_end*fs)
# x_seg = x[n0:n1]


In [None]:
# TODO: Plot and play segment
# plot_waveform(x_seg, fs, "Voiced segment (zoom)")
# display(Audio(x_seg, rate=fs))


## 3.2 Spectrum of voiced segment

### Task
Compute and plot the magnitude spectrum of the segment:
- rectangular window
- Hann window

Use the same FFT size for fair comparison.

Then answer questions about:
- harmonic spacing
- envelope shape
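
A possible shape for the spectrum helper is sketched below. It is not the required implementation: the function name `magnitude_spectrum_sketch` and the extra `n_fft` argument are assumptions introduced here so that both windows can share one FFT size, and the output is in dB relative to the largest bin.

```python
import numpy as np

def magnitude_spectrum_sketch(x, fs, window=None, n_fft=None):
    """Sketch: one-sided magnitude spectrum in dB (relative to the largest bin).

    `n_fft` is an extra, assumed parameter; the notebook's own stub takes
    only `x`, `fs`, and `window`.
    """
    x = np.asarray(x, dtype=float)
    if window == "hann":
        x = x * np.hanning(len(x))
    elif window is not None:
        raise ValueError("this sketch supports only None or 'hann'")
    n_fft = len(x) if n_fft is None else n_fft    # zero-pads when n_fft > len(x)
    mag = np.abs(np.fft.rfft(x, n=n_fft))
    mag_db = 20 * np.log10(mag / (mag.max() + 1e-12) + 1e-12)
    f = np.fft.rfftfreq(n_fft, d=1 / fs)
    return f, mag_db

# Usage on a synthetic 40 ms frame, with the same FFT size for both windows.
fs_demo = 16000
frame = np.sin(2 * np.pi * 1000 * np.arange(640) / fs_demo)
f_rect, db_rect = magnitude_spectrum_sketch(frame, fs_demo, window=None, n_fft=1024)
f_hann, db_hann = magnitude_spectrum_sketch(frame, fs_demo, window="hann", n_fft=1024)
```

Because both calls use `n_fft=1024`, the two frequency axes are identical and the curves can be overlaid directly.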


In [None]:
# TODO: Spectrum for voiced segment
# f_rect, mag_rect = magnitude_spectrum(x_seg, fs, window=None)
# f_hann, mag_hann = magnitude_spectrum(x_seg, fs, window="hann")
# Plot both (separately or overlay)


### Questions (Voiced speech spectrum)

1. Do you see harmonic peaks? Estimate the spacing (≈ F0). What is your estimated F0?  
2. Describe the spectral envelope: where are the broad peaks (formant-like regions)?  
3. How does windowing change the appearance of the harmonics and envelope?


## 3.3 Unvoiced / fricative segment (optional but recommended)

### Task
Pick a 30–50 ms segment that is unvoiced (e.g., /s/ or /f/ region, if present).
Compute spectrum and compare to voiced.

Answer: how does the spectrum differ?
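
One simple way to quantify "low vs high" energy is the fraction of spectral energy above some split frequency. The sketch below uses a hypothetical helper `high_band_energy_ratio` with an arbitrary 2 kHz split, and demonstrates it on two pure tones as stand-ins; real voiced and unvoiced frames would replace them.

```python
import numpy as np

def high_band_energy_ratio(x, fs, split_hz=2000.0):
    """Fraction of spectral energy at or above `split_hz` (hypothetical helper)."""
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    f = np.fft.rfftfreq(len(x), d=1 / fs)
    return power[f >= split_hz].sum() / power.sum()

fs_demo = 16000
t_demo = np.arange(int(0.04 * fs_demo)) / fs_demo        # 40 ms frame
low_tone = np.sin(2 * np.pi * 200 * t_demo)              # stand-in for low-frequency energy
high_tone = np.sin(2 * np.pi * 5000 * t_demo)            # stand-in for high-frequency energy

ratio_low = high_band_energy_ratio(low_tone, fs_demo)
ratio_high = high_band_energy_ratio(high_tone, fs_demo)
print(f"low tone: {ratio_low:.3f}, high tone: {ratio_high:.3f}")
```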


In [None]:
# TODO: Choose unvoiced segment times and repeat the spectrum analysis
# t_start_u = ...
# t_end_u = ...
# x_seg_u = ...


### Questions (Unvoiced vs voiced)

- What is different about the spectrum of unvoiced speech?
- Do you still see harmonic structure? Why or why not?
- Where is most of the energy concentrated (low vs high frequencies)?


---

# 4. Short-Time Fourier Transform View (Spectrogram) - Optional

### Task
Plot a spectrogram of your full utterance with reasonable parameters:
- window length: ~25 ms
- hop: ~10 ms
- display in dB

Then answer conceptual questions.
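
The framing arithmetic behind a spectrogram can be sketched with NumPy alone. The function `stft_db_sketch` below is an illustrative assumption, not the required `plot_spectrogram`: it converts the window and hop from milliseconds to samples, applies a Hann window to each frame, and returns magnitudes in dB without plotting or padding the signal edges.

```python
import numpy as np

def stft_db_sketch(x, fs, n_fft=1024, win_ms=25, hop_ms=10):
    """Sketch: Hann-windowed STFT magnitude in dB, shape (n_fft // 2 + 1, n_frames)."""
    win_len = int(round(win_ms * 1e-3 * fs))
    hop_len = int(round(hop_ms * 1e-3 * fs))
    if win_len > n_fft:
        raise ValueError("n_fft must be at least the window length in samples")
    if len(x) < win_len:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (len(x) - win_len) // hop_len          # frames that fit without padding
    window = np.hanning(win_len)
    frames = np.stack([x[i * hop_len : i * hop_len + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T  # rows: frequency, columns: time
    return 20 * np.log10(mag + 1e-10), hop_len

fs_demo = 16000
x_demo = np.sin(2 * np.pi * 440 * np.arange(fs_demo) / fs_demo)   # 1 s test tone
S_db, hop_len = stft_db_sketch(x_demo, fs_demo)
print(S_db.shape)
```

The returned array can be displayed with `plt.imshow(S_db, origin="lower", aspect="auto")`, with the axes relabeled in seconds and Hz.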


In [None]:
# TODO: Spectrogram helper (STFT)
# Note: n_fft must be >= the window length in samples (25 ms is 400 samples at 16 kHz).
def plot_spectrogram(x, fs, title, n_fft=1024, win_ms=25, hop_ms=10):
    raise NotImplementedError


In [None]:
# TODO: Plot spectrogram
# plot_spectrogram(x, fs, "Spectrogram of utterance")


### Questions (Spectrogram intuition)

1. Where do you see voiced regions vs unvoiced regions in the spectrogram?  
2. How does the spectrogram reveal formants (broad bands)?  
3. If you increase window length to 50 ms, what changes? (Try and explain.)  
4. If you decrease hop size, what changes? (Try and explain.)


---

# 5. Summary: What Fourier View Gives You

Write 8–12 lines:

- What information is easiest to see in time domain?
- What information is easiest to see in frequency domain?
- For speech, why do we often use a short-time (frame-based) spectrum instead of one FFT on the whole signal?


---

# 6. Reflection (Mandatory)

Write thoughtful answers (be specific):

1. What did you learn about FFT/DFT that you did not expect?  
2. What was one surprising result about windows or leakage?  
3. Which plot helped you the most (waveform, spectrum, spectrogram) and why?  
4. If you had to explain “frequency resolution” in two sentences, what would you say?  
5. What is one question you now want to explore further (e.g., formant tracking, pitch estimation)?


---

# 7. AI Use Disclosure (Required)

If you used any AI tools (including ChatGPT), briefly describe:
- What you used it for (e.g., debugging, concept clarification)
- What you wrote/changed yourself

*(If you did not use AI, write “No AI tools used.”)*
