# 21M.387 Fundamentals of Music Processing
## Problem Set 3: Fourier Transforms

Make sure all your answers and plots display when the code block is run. You can leave word-based answers in code comments or markdown cells.

You may use any fmplib functions from __previous__ units in your answers. You may __not__ use any fmplib functions from the current unit in your answers (unless explicitly noted). But you can use the current unit's fmplib for testing your code.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
from ipywidgets import interact
import sys
sys.path.append("..")
import fmplib as fmp

plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['image.interpolation'] = 'nearest'
fmp.documentation_button()

## Exercise 1

Consider this function:

$x(t) = A \cos(\omega t + \phi)$  
with the parameters $A=3$, $\omega=8\pi$, $\phi=\pi/4$.

Use numpy to create a signal $x_1[n]$ that is sampled from the first 4 seconds of $x(t)$ at a sampling rate of $F_s = 100$ Hz.

Store your answer in the variable `x1`.

In [5]:
# sinusoid params
A = 3
v = 8 * np.pi
ph = np.pi / 4

# sampling params
t = 4
fs = 100

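def make_cosine(A, v, ph, t, fs):
    # Sample instants n / fs for n = 0 .. t*fs - 1. endpoint=False keeps the
    # spacing at exactly 1/fs; the default (endpoint=True) would stretch it.
    T = np.linspace(0, t, t*fs, endpoint=False)
    return A * np.cos(v*T + ph)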

x1 = make_cosine(A, v, ph, t, fs)

Use `matplotlib` to make two plots of $x_1$.

In the first graph, plot $x_1[n]$ with the sample number $n$ on the $x$ axis. Label the $x$ axis appropriately.  
In the second graph, plot the same signal against time. The curve will look the same, but the $x$ axis should show time in seconds. Label the $x$ axis appropriately.

In [None]:
# make 2 plots


## Exercise 2

Use numpy's fft function (`np.fft.fft`) to create $X_1[k]$, the Discrete Fourier Transform of $x_1[n]$ from Exercise 1.

Make two plots of $X_1[k]$: the magnitude $\lvert X_1[k] \rvert$ and the phase $\angle X_1[k]$. For the phase plot to look good, set its values to zero for all points where the magnitude of $X_1[k]$ is negligible.
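
The phase-masking step can be sketched on a toy signal. This is only an illustration under stated assumptions: the signal below is not $x_1$, and the relative threshold of `1e-6` is an arbitrary choice, not a value given in the course material.

```python
import numpy as np

# Toy signal (an assumption for illustration, not the exercise's x1):
# 5 full cycles of a cosine over 64 samples, amplitude 2, phase pi/3.
N = 64
n = np.arange(N)
x = 2.0 * np.cos(2 * np.pi * 5 * n / N + np.pi / 3)

X = np.fft.fft(x)
mag = np.abs(X)
phase = np.angle(X)

# Keep the phase only where the magnitude is non-negligible relative to
# the largest bin; everywhere else the phase is numerical noise.
threshold = 1e-6 * mag.max()   # arbitrary relative threshold
clean_phase = np.where(mag > threshold, phase, 0.0)
```

Without the mask, `np.angle` returns essentially random angles for bins whose value is only floating-point round-off, which clutters the phase plot.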


In [None]:
# fft and plots


a) Describe the symmetry properties of the DFT magnitude and phase plots.  
b) How is the energy of the original signal distributed across the frequency bins in the magnitude plot?

Answer:

a) ...

b) ...

c) What is the location, $k$, of the first peak of $\lvert X_1[k] \rvert$ from your plot?  
d) Calculate the frequency $f$ that corresponds to this value of $k$. Show your calculations in python below.
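
As a reminder of the general relationship between a bin index and its frequency, here is a small sketch. The helper name `bin_to_hz` and the numbers are hypothetical and deliberately different from the exercise's values.

```python
import numpy as np

def bin_to_hz(k, fs, n_fft):
    """Center frequency in Hz of bin k of an n_fft-point DFT at sample rate fs."""
    return k * fs / n_fft

# Hypothetical values, chosen only for illustration.
fs_demo = 1000   # sample rate in Hz
n_demo = 500     # number of samples in the DFT
k_demo = 25      # bin index of interest

f_demo = bin_to_hz(k_demo, fs_demo, n_demo)

# np.fft.fftfreq produces the same mapping for every bin at once.
freqs = np.fft.fftfreq(n_demo, d=1 / fs_demo)
```

For bins in the first half of the spectrum, `freqs[k]` agrees with `bin_to_hz(k, fs, n_fft)`. The second half is reported by `fftfreq` as negative frequencies.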

In [None]:
# peak location
X1_k = 0 # fill in correct answer

# calculate f associated with X1_k
f_k = 0 # fill in correct answer

## Exercise 3

Reproduce the following two signals, $x_{3a}$ and $x_{3b}$ by observing the given plots and creating the correct sinusoidal functions.  
For each signal, use a sampling rate of $F_s = 100$ Hz.  
[Hint: test that you got the signals right by plotting them and comparing to the given graphs below]
![](data/ex3a.png)
![](data/ex3b.png)



In [None]:
# replace with correct answers
x3a = np.zeros(0)
x3b = np.zeros(0)

Now create and plot the magnitude of the DFT for signals $x_{3a}$ and $x_{3b}$. 

In [None]:
# two plots


Notice that one plot looks "clean" while the other has "spectral leakage". Why is this the case?

Answer:

## Exercise 4

Note the following discrete-time functions $f_1, f_2, f_3$, and $g_1, g_2, g_3$.  
For each pair $f_i$ and $g_j$ (9 pairs total), compute the similarity measure between the two functions using the dot product (i.e., the sum of point-by-point products): $\langle f,g\rangle = \sum_{n=0}^{15} f[n]g[n]$.
![](data/ex4a.png)
![](data/ex4b.png)
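
The dot-product measure defined above can be computed with `np.dot`. The sketch below uses three made-up 16-sample sequences (not the $f$ and $g$ functions from the figures) purely to show the mechanics.

```python
import numpy as np

n = np.arange(16)
a = np.cos(2 * np.pi * 2 * n / 16)   # 2 cycles of a cosine
b = np.sin(2 * np.pi * 2 * n / 16)   # 2 cycles of a sine at the same frequency
c = -a                               # inverted copy of a

# <f, g> = sum over n of f[n] * g[n]
sim_aa = np.dot(a, a)
sim_ab = np.dot(a, b)
sim_ac = np.dot(a, c)

print(f'<a,a> = {sim_aa:.3f}  <a,b> = {sim_ab:.3f}  <a,c> = {sim_ac:.3f}')
```

`np.sum(f * g)` is an equivalent way to write the same sum.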


In [None]:
# print or calculate the answers here


What does it mean when a similarity measurement is:  
a) highly positive?  
b) highly negative?  
c) zero or close to zero?  

Answer:


## Exercise 5

The plot below is zoomed in to show only the first 50 frequency bins of $\lvert X[k] \rvert$, the magnitude DFT of a signal $x[n]$ that was sampled from $x(t)$ (not shown). $x(t)$ is exactly 1.5 seconds long and was sampled at a rate of $F_s = 400$ Hz.  

![](data/ex5.png)

- What are the frequencies (in Hz) of the prominent sinusoidal components of $x(t)$?
- What are the amplitudes of these sinusoids?
- Is it possible to compute the phase of $x(t)$ from the given data? If so, what is the phase? If not, why not?

Show your calculations in python.
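
For reference, here is a sketch of how peak bins of a magnitude DFT map back to sinusoid frequencies and amplitudes. It uses a synthetic signal with made-up parameters, not the signal in the plot, and assumes every sinusoid completes a whole number of cycles so that each one falls on a single bin.

```python
import numpy as np

# Synthetic example (all values are assumptions for illustration).
fs_demo = 200                     # sample rate in Hz
N_demo = 200                      # 1 second of signal
t = np.arange(N_demo) / fs_demo
x_demo = 0.5 * np.cos(2 * np.pi * 10 * t) + 2.0 * np.sin(2 * np.pi * 30 * t)

mag = np.abs(np.fft.rfft(x_demo))

# Bins whose magnitude is not just round-off noise.
peak_bins = np.flatnonzero(mag > 1e-6 * mag.max())

# Bin index -> frequency, and DFT magnitude -> sinusoid amplitude.
# The factor 2/N applies to bins other than k = 0 and k = N/2.
peak_freqs = peak_bins * fs_demo / N_demo
peak_amps = 2 * mag[peak_bins] / N_demo
```

The factor of 2 accounts for the energy of a real sinusoid being split between a positive-frequency bin and its mirror image.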



In [None]:
# answers:


## Exercise 6

For the signal $x_6[n]$ below, you will use 3 different ways to find the magnitude and phase of $X_6[k]$, the DFT of $x_6[n]$, at $k=12$. All three methods should produce the same results.

$N=256$ (the length of $x_6[n]$). Run the code below to load $x_6$.

In [None]:
# load x[n]:
x6  = np.load('data/ex6.npy')
N = len(x6)
k = 12
print(f'N = {N} k = {k}')
plt.figure()
plt.plot(x6);

### Part 1

Create a sinusoid probe $s_{k,\phi}[n] = \cos(2\pi kn/N + \phi)$ with the proper $k$. Take the dot product (i.e., find the similarity measure) between this sinusoid and the signal $x_6$. Repeat this process for a large number of different phases $\phi \in [-\pi, \pi]$ to find the maximum value of $\langle x_6, s_{k,\phi} \rangle$. Note the maximum value and the phase that produced it.

Use the `test_phi` array (below) as the phases to test.
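
The phase sweep can be sketched on a toy signal whose phase is known in advance. Everything below (the signal, its length, the number of test phases) is an assumption made for illustration and is unrelated to `data/ex6.npy`.

```python
import numpy as np

# Toy signal with a known phase of 0.7 rad (illustrative values only).
N_demo = 128
k_demo = 3
n = np.arange(N_demo)
x_demo = 1.5 * np.cos(2 * np.pi * k_demo * n / N_demo + 0.7)

# Candidate phases to try.
phis = np.linspace(-np.pi, np.pi, 2001)

# Row i of `probes` is cos(2*pi*k*n/N + phis[i]); broadcasting builds
# all probes at once instead of looping over the phases.
probes = np.cos(2 * np.pi * k_demo * n[None, :] / N_demo + phis[:, None])

# One dot product per candidate phase.
sims = probes @ x_demo

best = np.argmax(sims)
best_phi = phis[best]
best_sim = sims[best]
```

The recovered phase is only as precise as the spacing of the candidate grid, which is why a large number of test phases is used.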

In [None]:
# phase quantities to try:
test_phi = np.linspace(-np.pi, np.pi, 10000)

# complete the code here:

phi1 = 0
mag1 = 0

### Part 2

For this part, use two sinusoid functions as probes:
- $s_{1k}[n] = \cos(2\pi kn/N)$
- $s_{2k}[n] = -\sin(2\pi kn/N)$  

Use the dot product, together with the identities discussed in lecture, to derive the magnitude and phase of the DFT from these results.
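
A sketch of the two-probe approach on a toy signal follows. It assumes the identities in question amount to treating the two dot products as the real and imaginary parts of the DFT coefficient; the lecture's derivation may be phrased differently. The signal is made up and is not $x_6$.

```python
import numpy as np

# Toy signal (illustrative values only).
N_demo = 128
k_demo = 3
n = np.arange(N_demo)
x_demo = 1.5 * np.cos(2 * np.pi * k_demo * n / N_demo + 0.7)

# Dot products with the cosine probe and the negated sine probe.
re = np.dot(x_demo, np.cos(2 * np.pi * k_demo * n / N_demo))
im = np.dot(x_demo, -np.sin(2 * np.pi * k_demo * n / N_demo))

# Convert the (re, im) pair to polar form.
mag = np.hypot(re, im)
phi = np.arctan2(im, re)

# Reference value from the FFT, for comparison.
X_ref = np.fft.fft(x_demo)[k_demo]
```

`np.arctan2` is used rather than `np.arctan(im / re)` because it resolves the correct quadrant and handles `re == 0`.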

In [None]:
# complete the code here:

phi2 = 0
mag2 = 0

### Part 3
Finally, take the DFT of the signal (using `np.fft.fft`) to find the magnitude and phase at the given frequency index $k$.

In [None]:
# complete the code here:

phi3 = 0
mag3 = 0

## Exercise 7

Observe the following signal $x_7[n]$. It is a "chirp" signal, where the frequency increases throughout the duration of the signal.

In [None]:
# load x_7[n]:
x7 = np.load('data/ex7.npy')

plt.figure()
plt.plot(x7);
ipd.Audio(x7, rate=8000, normalize=False)

We will now observe the effect of windowing this signal at different points in time with 2 different windows.

Use a centered _rectangular_ window of length $N = 512$ at 3 locations: $n = 600$, $n = 1200$, $n = 3000$. For each windowed signal, plot the magnitude DFT. You can use the function `np.fft.rfft`, which is optimized for real-valued inputs. This function will return an array of length $(1 + N/2)$, thereby removing the redundant information in the DFT.
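
Extracting a centered frame and taking its `rfft` might look like the sketch below. The helper `centered_frame` is hypothetical, the convention that the frame spans `center - N/2` up to (but not including) `center + N/2` is an assumption, and the test tone stands in for the chirp.

```python
import numpy as np

def centered_frame(x, center, length):
    """Return `length` samples of x centered on index `center`.

    Assumes the whole frame lies inside x (no padding is done).
    """
    start = center - length // 2
    return x[start:start + length]

# Stand-in signal: a steady 1000 Hz tone at 8000 Hz (not the chirp).
fs_demo = 8000
x_demo = np.sin(2 * np.pi * 1000 * np.arange(4096) / fs_demo)

# A rectangular window is just the unmodified slice.
frame = centered_frame(x_demo, center=2048, length=512)

mag = np.abs(np.fft.rfft(frame))
peak_bin = int(np.argmax(mag))
```

For a 512-sample frame, `np.fft.rfft` returns 257 values, matching the $(1 + N/2)$ length mentioned above.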

In [None]:
# plots of the DFT of rectangular windows of x7 at 3 locations:



Now plot the magnitude DFT at the same locations using a _Hann_ window (`np.hanning`) instead of a rectangular window.

In [None]:
# plots of the DFT of Hann windows of x7 at 3 locations:



Describe your observations of these plots. What are the differences between the first set and the second set, and why is this happening?

Answer:



## Exercise 8

Consider a signal, 2 seconds long, whose Short-Time Fourier Transform (STFT) is computed with the following parameter sets:  
1) $F_s = 22050, N = 1024, H = 256$  
2) $F_s = 48000, N = 1024, H = 512$  
3) $F_s = 8000, N = 2048, H = 1024$  

The windowing in this case is _non-centered_ (meaning, we don't zero-pad the beginning of the signal).

For each STFT above, determine:  
a) the frequency resolution of the STFT (in Hertz)  
b) the time resolution of the STFT (in seconds)   
c) the length (i.e., the number of columns) of the STFT
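
The three quantities can be wrapped in one small helper, sketched here with parameters that differ from the exercise. Two assumptions are baked in: "time resolution" is taken to mean the spacing between successive frames, and only windows that fit entirely inside the signal are counted. A convention that pads the final partial window would give a different column count.

```python
def stft_summary(fs, n_fft, hop, duration_s):
    """Frequency resolution (Hz), time resolution (s), and frame count
    for a non-centered STFT that keeps only complete windows."""
    n_samples = int(round(fs * duration_s))
    freq_res = fs / n_fft     # spacing between adjacent frequency bins
    time_res = hop / fs       # spacing between adjacent frames
    n_frames = 1 + (n_samples - n_fft) // hop
    return freq_res, time_res, n_frames

# Hypothetical parameters, chosen only for illustration.
freq_res, time_res, n_frames = stft_summary(
    fs=16000, n_fft=512, hop=128, duration_s=1.0)

print(freq_res, time_res, n_frames)
```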


In [None]:
# answer:


## Exercise 9

Write the function `stft(x, fft_len, hop_size)`, which creates the Short-Time Fourier Transform of `x`.  

Inputs:
- `x`: the sampled time-domain signal $x$
- `fft_len`: the length of the fft $N$
- `hop_size`: the hop size $H$

Output:
- `np.array`: a matrix of complex numbers of size `(num_bins, num_hops)`.

Implementation Notes:
- Use the Hann window for your STFT function (`np.hanning`).  
- Make the window a _centered window_ by zero-padding the beginning of $x$ appropriately.  
- You can use a python `for` loop to iterate over each hop.  
- Most of the time, the final window will be longer than the amount of signal you have left. Find a good way to deal with this case.
- Since the Fourier Transform of a real signal is symmetric, the STFT should return only the first half of the FT. In other words, `num_bins` should be $(1 + N/2)$.
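
The centering and end-of-signal notes above can be handled with zero-padding. The sketch below covers only the padding step, under one possible convention: frame $m$ is centered on sample $mH$, and there is one frame for every center that falls inside the signal. The helper name is hypothetical, and `fmp.stft` may use a different convention.

```python
import numpy as np

def pad_for_centered_frames(x, frame_len, hop):
    """Zero-pad x so frame m covers padded[m*hop : m*hop + frame_len],
    is centered on original sample m*hop, and the last frame is complete.

    Returns the padded signal and the number of frames.
    """
    pad_front = frame_len // 2
    n_frames = int(np.ceil(len(x) / hop))
    total = (n_frames - 1) * hop + frame_len   # samples needed by all frames
    pad_back = max(0, total - pad_front - len(x))
    padded = np.concatenate([np.zeros(pad_front), x, np.zeros(pad_back)])
    return padded, n_frames

# Small example: 9 samples, frames of 8 with a hop of 4.
padded, n_frames = pad_for_centered_frames(np.arange(1, 10, dtype=float), 8, 4)
```

After padding, every frame is a plain slice of `padded`, so the loop over hops needs no special case for the final window.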

In [None]:
def stft(x, fft_len, hop_size):
    pass

Test your function by loading a short piece of audio, creating the STFT, and plotting it. You can compare your results to `fmp.stft` to see how you did.

## Exercise 10

In your new role as _Inspector Spectrogram_, your job is to analyze mystery spectrograms and identify their parameters.

A new spectrogram arrived at your desk today, with a note reading:  
Beethoven's _Fur Elise_, played most beautifully at an average tempo of 110 BPM.

You find the [music to Fur Elise](images/fur_elise.jpg) and realize that a "beat" here means 1 eighth note (3 beats per bar). You then inspect the spectrogram's size (rows × columns) and have a pretty good guess that this spectrogram was created using an STFT that keeps only the first half (plus 1) of the spectrum's bins.

Based only on the information you have been given, figure out the following:  

- $N$, the length of the FFT window used in creating the spectrogram (i.e., `fft_len`). $N$ is some power of 2.
- $F_s$, the sample rate used in the audio recording. $F_s$ is a multiple of 1000.
- $H$, the hop size used to create the spectrogram (i.e., `hop_size`). $H$ is a power of 2.

Show all your work in detail, with the various plots and calculations that you used to figure out the three mystery values. Some of your answers will be approximations so you should round them to the most reasonable value.
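
Spectrogram values often span several orders of magnitude, which makes a raw image plot hard to read. One common remedy is logarithmic compression, sketched below on made-up numbers. The function name, the `gamma` value, and the assumption that the input holds non-negative magnitudes are all illustrative choices, not properties of the mystery spectrogram.

```python
import numpy as np

def log_compress(mag, gamma=100.0):
    """Compress non-negative magnitudes with log(1 + gamma * mag).

    Larger gamma boosts weak components more strongly.
    """
    return np.log1p(gamma * mag)

# Made-up magnitudes spanning four orders of magnitude.
demo = np.array([0.0, 0.01, 1.0, 100.0])
compressed = log_compress(demo)
```

The mapping is monotonic, so peaks stay where they are; only their relative brightness changes. Zooming in can be done separately by slicing the array or setting axis limits before plotting.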

In [None]:
spec = np.load('data/ex10.npy')
plt.imshow(spec, origin='lower', aspect='auto');

"Hmmm...", you say to yourself. "This is hard to read. I'll have to enhance this plot somehow... and probably also find a way to zoom in on the details..."

In [None]:
# calculations to find N, the fft length used to create the STFT
N = None

In [None]:
# calculations to find the original sampling rate of the audio, fs
fs = None

In [None]:
# calculations to find hop_size
hop_size = None