In [28]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook

## 3. Applications of DFTs
**The following are exercises from Chapter 7 of Mark Newman's Computational Physics book.**

3.1 Read in the `sunspots.txt` data as a 2D array. There are two columns of numbers (separated by tabs). The first column is the number of the recorded month. The second is the number of sunspots recorded in that month. Plot this data.

### Sunspot Data Analysis 

In [205]:
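# Load the two tab-separated columns (path assumed to be relative to the notebook)
data = np.loadtxt("sunspots.txt")
x = data[:, 0]  # month number
y = data[:, 1]  # sunspot count
fig, ax = plt.subplots()
# ax.set_xlim(0, 500)
ax.plot(x, y)
ax.grid()
ax.set_xlabel("Month")
ax.set_ylabel("Number of sunspots")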


3.1.b Estimate the frequency of the slowly oscillating pattern in the data, which separates the major peaks.
> Solution: What is your estimate?

3.2 Take a DFT of this real-valued data and consult the power spectrum $|c_{k}|^{2}$ to evaluate the periodicity of the data.
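
As a rough illustration of how a power spectrum reveals periodicity, the sketch below builds a synthetic monthly series and reads the dominant period off the largest peak. The 120-month cycle, the offset, and the noise level are made-up values for illustration, not properties of the sunspot data.

```python
import numpy as np

# Synthetic stand-in for a monthly series: an arbitrary 120-month cycle,
# a constant offset, and some noise (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_months = 3000
months = np.arange(n_months)
signal = 50 + 40 * np.sin(2 * np.pi * months / 120) + rng.normal(0, 5, n_months)

power = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(n_months, d=1.0)  # cycles per month

k_peak = 1 + np.argmax(power[1:])  # skip k = 0, which only reflects the mean
period = 1 / freqs[k_peak]
print(f"dominant period ~ {period:.0f} months")
```

The same recipe should apply to the real data: locate the largest peak away from $k = 0$ and invert its frequency to get a period in months.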

In [206]:
# 3.2 SOLUTION
yt = np.abs(np.fft.rfft(data[:,1])) ** 2
plt.figure(2)
plt.plot(np.arange(len(data)//2 + 1) / len(data), yt)


3.3 What is the cause of the large peak at $k = 0$? Transform the sunspot data to help eliminate this effect, and plot the power spectrum of the new data.
> 3.3. Solution: explain the cause of the large peak
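
A small check, using a made-up array rather than the sunspot data, suggests why $k = 0$ stands out: with NumPy's DFT convention, $c_{0}$ is simply the sum of the samples, and subtracting the mean changes only that one coefficient.

```python
import numpy as np

y = np.array([3.0, 7.0, 5.0, 9.0, 6.0])  # hypothetical non-negative samples

c = np.fft.rfft(y)
c_centered = np.fft.rfft(y - y.mean())

print(c[0].real, y.sum())   # c_0 equals the sum of the samples
print(abs(c_centered[0]))   # essentially zero once the mean is removed
```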

In [208]:
# 3.3 SOLUTION
# The k = 0 coefficient is the sum of all samples (N times the mean). Sunspot
# counts are non-negative, so this sum is large. Subtracting the mean removes it.
C = np.fft.rfft(y - np.mean(y))

fig, ax = plt.subplots()
ax.plot(np.abs(C) ** 2)




### Analyzing Audio Signals from Instruments

3.4 Read in the digital audio signal for a trumpet, from `data/trumpet.txt`, as an array of integers. This signal was recorded at a rate of 44.1 kHz, which is the de facto standard for audio sampling (as implemented by Sony).

Plot the signal on an x-axis labeled `"Time (sec)"` - be sure that the axis is scaled appropriately such that it reflects units of seconds. There is too much data to be plotted - plot every 100th datapoint. Then, play the audio using

```python
from IPython.display import Audio
Audio(data, rate=???)
```

In [74]:
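# 3.4 SOLUTION
rate = 44100  # sampling rate in Hz

with open("data/trumpet.txt", "r") as R:
    data = np.asarray([int(i) for i in R])

t = np.arange(len(data)) / rate  # sample times in seconds

plt.figure(4)
plt.plot(t[::100], data[::100])  # every 100th sample
plt.xlabel("Time (sec)")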


In [75]:
from IPython.display import Audio
Audio(data, rate=44100)

3.5 Plot the trumpet's frequency spectrum, $|c_{k}|$, for the first 10,000 $k$-values. Be sure to use an FFT for real-valued data - we are working with a lot of data. [What notes are being played](http://www.phy.mtu.edu/~suits/notefreqs.html)? Make sure that the $k$-axis of your spectrum is scaled to be in Hz.
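
One possible way to go from a spectral peak to a note name is sketched below on a synthetic tone, not the trumpet recording. The helper `nearest_note` and the test frequencies are illustrative assumptions; the mapping uses the standard equal-temperament relation with A4 = 440 Hz.

```python
import numpy as np

RATE = 44100  # samples per second, as stated in the exercise
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def nearest_note(freq_hz):
    """Name of the equal-tempered note closest to freq_hz (A4 = 440 Hz)."""
    semitones = int(round(12 * np.log2(freq_hz / 440.0)))
    # Octave numbers change at C, which lies 9 semitones below A.
    octave = 4 + (semitones + 9) // 12
    return f"{NOTE_NAMES[semitones % 12]}{octave}"

# Synthetic one-second tone at 523.25 Hz with a weaker second harmonic.
t = np.arange(RATE) / RATE
tone = np.sin(2 * np.pi * 523.25 * t) + 0.5 * np.sin(2 * np.pi * 1046.5 * t)

ck = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), d=1 / RATE)  # f_k = k * RATE / N, in Hz

f_peak = freqs[np.argmax(ck)]
print(f_peak, nearest_note(f_peak))
```

For a real instrument the spectrum contains several harmonics, so it is worth checking the lowest strong peak rather than only the tallest one.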

In [95]:
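# 3.5 SOLUTION
plt.figure(5)
k = 10000  # number of coefficients to plot
ck = np.fft.rfft(data)
freqs = np.fft.rfftfreq(len(data), d=1 / 44100)  # frequency of each coefficient in Hz
plt.plot(freqs[:k], np.abs(ck[:k]))
plt.xlabel("Frequency (Hz)")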




3.6 Repeat this work for the piano audio signal, `data/piano.txt`.

In [114]:
# 3.6 SOLUTION 
# plot the pressure wave
with open("data/piano.txt", 'r') as R:
    data = np.array([int(i) for i in R])

t = np.arange(len(data)) / 44100  # sample times in seconds

fig, ax = plt.subplots()
ax.plot(t[::100], data[::100])  # every 100th sample
ax.set_xlabel("Time (sec)")
ax.set_title("Piano Waveform")


In [82]:
# play the audio!
from IPython.display import Audio
Audio(data, rate=44100)

In [99]:
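# plot the Fourier spectrum
fig, ax = plt.subplots()
k = 10000  # number of coefficients to plot
ck = np.fft.rfft(data)
freqs = np.fft.rfftfreq(len(data), d=1 / 44100)  # frequency of each coefficient in Hz
ax.plot(freqs[:k], np.abs(ck[:k]))
ax.set_xlabel("Frequency (Hz)")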


### Smoothing Stock Market Data

3.7 Read in the stock market data from `data/dow.txt`. Each data point corresponds to the daily closing value of the Dow Jones Industrial Average (starting in late 2006 and ending in late 2010). Plot the data on labeled axes.

In [220]:
with open("data/dow.txt", "r") as R:
    data = np.asarray([float(i) for i in R])
print(data)
fig, ax = plt.subplots()

ax.plot(data)
ax.set_xlabel("Day")
ax.set_ylabel("Closing value")




[ 12121.71  12136.44  12226.73 ...,  11499.25  11491.91  11478.13]



3.8 Perform an FFT on this real-valued data, and plot $|c_{k}|$ on a log scale. The $k$-axis should be scaled to be in units of [1 / days].

In [222]:
# 3.8 SOLUTION
fig, ax = plt.subplots()
ax.set_yscale("log")
ck = np.fft.rfft(data)

ax.plot(np.arange(len(ck)) / len(data),np.abs(ck))


3.9 We want to smooth this stock market data. We can do this by "removing" the high-frequency coefficients of its Fourier spectrum. Try zeroing-out the top 90% high-frequency coefficients, and then perform an inverse FFT using these altered coefficients. Plot the "recovered" signal on top of a semi-transparent version of the original data (use the plot parameter `alpha=0.5`). Then repeat this, but with zeroing out the top 98% coefficients. In both of these cases, on what scale are the fluctuations being filtered out?
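> 3.9 Solution: The real FFT spans frequencies from 0 to 0.5 cycles per day. Keeping only the lowest 10% of the coefficients retains frequencies below about 0.05 cycles per day, so fluctuations with periods shorter than roughly 20 days (the day-to-day and week-to-week noise) are filtered out. Keeping only the lowest 2% retains frequencies below about 0.01 cycles per day, so fluctuations with periods shorter than roughly 100 days (the monthly variations) are filtered out as well.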
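
The smoothing step can be wrapped in a small helper, sketched here on a synthetic signal instead of the Dow data. The function name `lowpass` and the test periods are assumptions made for illustration.

```python
import numpy as np

def lowpass(signal, keep_fraction):
    """Zero all but the lowest keep_fraction of the rfft coefficients."""
    ck = np.fft.rfft(signal)
    cutoff = int(round(len(ck) * keep_fraction))
    ck[cutoff:] = 0
    # Passing n keeps the output the same length as the input, even when it is odd.
    return np.fft.irfft(ck, n=len(signal))

# Synthetic test signal: a slow trend (period 500 days) plus a fast wiggle (period 5 days).
days = np.arange(1000)
slow = np.sin(2 * np.pi * days / 500)
fast = 0.3 * np.sin(2 * np.pi * days / 5)
smoothed = lowpass(slow + fast, 0.10)

print(np.max(np.abs(smoothed - slow)))  # small: only the slow component survives
```

Passing `n=len(signal)` to `np.fft.irfft` is a defensive choice: without it, an odd-length input comes back one sample shorter.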


In [225]:
# 3.9 SOLUTION (top 90%)
ck = np.fft.rfft(data)
ck[round(len(ck)*0.1):] = 0
smooth = np.fft.irfft(ck)

fig, ax = plt.subplots()
ax.plot(data, alpha=0.5)
ax.plot(smooth)
ax.set_xlabel("Days")
ax.set_ylabel("Dow Jones Industrial Average")


In [226]:
# 3.9 SOLUTION (top 98%)
ck = np.fft.rfft(data)
ck[round(len(ck)*0.02):] = 0
smooth = np.fft.irfft(ck)

fig, ax = plt.subplots()
ax.plot(data, alpha=0.5)
ax.plot(smooth)
ax.set_xlabel("Days")
ax.set_ylabel("Dow Jones Industrial Average")


3.10 Now repeat this process but zero-out the bottom 10% **low-frequency** coefficients. What do you see? Why is there a huge down-shift in the recovered data? What would happen if you filtered out the bottom 10% low-frequency coefficients **except** for $c_{0}$? Try this.
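> 3.10 Solution: Zeroing the low-frequency coefficients removes the slowly varying components (the sine waves with the lowest frequencies), which carry the long-term trend. It also removes $c_{0}$, which is the sum of all the data points, so the recovered signal has zero mean and fluctuates around 0 rather than around the average value of the index. This is the huge down-shift. Keeping $c_{0}$ restores the mean, so the recovered signal shows the rapid fluctuations centered on the average closing value.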
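
The role of $c_{0}$ in the down-shift can be checked numerically. The sketch below uses a made-up random-walk series around a level of 10,000 in place of the Dow data and compares the means of the two recovered signals.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical series fluctuating around a large positive level, like an index value.
series = 10000 + np.cumsum(rng.normal(0, 50, 1024))

ck = np.fft.rfft(series)
cutoff = int(round(len(ck) * 0.1))

no_c0 = ck.copy()
no_c0[:cutoff] = 0      # removes c_0 along with the slow trend
keep_c0 = ck.copy()
keep_c0[1:cutoff] = 0   # removes the slow trend but keeps the mean

shifted = np.fft.irfft(no_c0, n=len(series))
restored = np.fft.irfft(keep_c0, n=len(series))

print(shifted.mean(), restored.mean(), series.mean())
```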

In [227]:
# 3.10 SOLUTION
ck = np.fft.rfft(data)
ck[:round(len(ck)*0.1)] = 0
smooth = np.fft.irfft(ck)

fig, ax = plt.subplots()
ax.plot(data, alpha=0.5)
ax.plot(smooth)
ax.set_xlabel("Days")
ax.set_ylabel("Dow Jones Industrial Average")


In [228]:
# 3.10 SOLUTION (except c_0)
ck = np.fft.rfft(data)
ck[1:round(len(ck)*0.1)] = 0
smooth = np.fft.irfft(ck)

fig, ax = plt.subplots()
ax.plot(data, alpha=0.5)
ax.plot(smooth)
ax.set_xlabel("Days")
ax.set_ylabel("Dow Jones Industrial Average")
