*03 Jan 2021, Julian Mak (whatever with copyright, do what you want with this)

### As part of material for OCES 3301 "Data Analysis in Ocean Sciences" delivered at HKUST

For the latest version of the material, go to the public facing [GitHub](https://github.com/julianmak/academic-notes/tree/master/OCES3301_data_analysis_ocean) page.

In [None]:
# load some default packages

import matplotlib.pyplot as plt
import numpy as np
import copy
from scipy import signal
import pandas as pd
from datetime import datetime, timedelta

# pull files from the internet if needed (e.g. temporary session in Colab)
# !wget https://raw.githubusercontent.com/julianmak/OCES3301_data_analysis/main/Tobermory_20160430_20161231.csv

---------------------------

# 08: Fourier analysis and power spectrum

Still focusing on time series, but with a particular focus on utilising **Fourier analysis**. In particular, we aim to decompose a signal into **Fourier modes** and compute the **power spectrum**, from which we can quantify the kind of oscillations a time-series has.

> NOTE: The discussion below focuses on the time variable, but the discussion applies equally well to space variables. Indeed, we will revisit some of the tools here in *10_fun_with_maps*.

# a) A brief digression: basis and vectors

Recall that a vector is (loosely) an object with both a magnitude and direction, and in standard two-dimensional space (or $\mathbb{R}^2$) this might be represented as

\begin{equation*}
    \mathbf{v} = \begin{pmatrix}1 \\ 2\end{pmatrix} = 1 \mathbf{e}_x + 2 \mathbf{e}_y = 1\begin{pmatrix}1 \\ 0 \end{pmatrix} + 2\begin{pmatrix}0 \\ 1 \end{pmatrix}.
\end{equation*}

$\mathbf{v}$ here is our vector, that we happen to represent in this case with the **standard basis** $\{\mathbf{e}_x, \mathbf{e}_y\}$. The **basis** in this way you could think of as the "indivisible" building blocks of the vector: any vector you want in $\mathbb{R}^2$ could be represented by some linear combination of the basis (the basis vectors **span** $\mathbb{R}^2$), and the individual basis vectors themselves cannot be constructed from a linear combination of the other basis vectors (the basis vectors are **linearly independent** of the other basis vectors).

The standard basis here is also particularly convenient because the basis vectors are all of unit length (i.e. they have length $1$), and are **orthogonal** to each other (they are at right angles to each other). The standard basis is an **orthonormal basis**, with respect to the standard **inner product** $\langle \cdot, \cdot \rangle$

\begin{equation*}
    \langle \mathbf{u}, \mathbf{v} \rangle = u_1 v_1 + u_2 v_2,
\end{equation*}
where $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$.
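
As a quick numerical sanity check of the statements above, here is a minimal sketch using `np.dot` as the standard inner product. The variable names (`e_x`, `c_x`, `v_rebuilt`, etc.) are just my choices for illustration. It checks that the standard basis is orthonormal, and that for an orthonormal basis the coefficients of $\mathbf{v}$ are simply the inner products of $\mathbf{v}$ with the basis vectors.

```python
import numpy as np

# standard basis of R^2 and the example vector v = (1, 2)
e_x = np.array([1.0, 0.0])
e_y = np.array([0.0, 1.0])
v = np.array([1.0, 2.0])

# orthonormal: each basis vector has unit length, and they have zero inner product
print(np.dot(e_x, e_x), np.dot(e_y, e_y), np.dot(e_x, e_y))

# for an orthonormal basis, the coefficients are inner products with the basis vectors
c_x = np.dot(v, e_x)
c_y = np.dot(v, e_y)
v_rebuilt = c_x * e_x + c_y * e_y
print(c_x, c_y, v_rebuilt)
```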

The code below shows the vector $\mathbf{v}$, the basis vectors, and a construction of $\mathbf{v}$ from the basis vectors.

In [None]:
fig = plt.figure(figsize=(10, 4))
ax = plt.subplot(1, 2, 1)
ax.arrow(0, 0, 1, 2, width=0.05, color="k", label=r"$\mathbf{v}$")
ax.arrow(0, 0, 1, 0, width=0.05, color="C0", label=r"$\mathbf{e}_x$")
ax.arrow(0, 0, 0, 1, width=0.05, color="C1", label=r"$\mathbf{e}_y$")
ax.set_xlim([-1.5, 3.5])
ax.set_ylim([-1.5, 3.5])
ax.grid()
ax.legend()

ax = plt.subplot(1, 2, 2)
ax.arrow(0, 0, 1, 2, width=0.05, color="k", label=r"$\mathbf{v}$")
ax.arrow(0, 0, 0.95, 0, width=0.05, color="C0")
ax.arrow(1, 0, 0, 0.95, width=0.05, color="C1")
ax.arrow(1, 1, 0, 0.95, width=0.05, color="C1")
ax.set_xlim([-1.5, 3.5])
ax.set_ylim([-1.5, 3.5])
ax.grid()

For the case of $\mathbb{R}^2$ (and more generally $\mathbb{R}^n$) there is an obvious choice for a basis (the standard basis), but it is not the only choice. The basis vectors themselves do not have to be unit length or even orthogonal to each other: they only need to be a linearly independent spanning set. For the above example, I could choose say

\begin{equation*}
    \mathbf{v} = \begin{pmatrix}1 \\ 2\end{pmatrix} = 1\begin{pmatrix}2 \\ 0 \end{pmatrix} + 1\begin{pmatrix}-1 \\ 2 \end{pmatrix} = 1 \mathbf{a} + 1 \mathbf{b},
\end{equation*}

which is demonstrated pictorially in the code below. The basis vectors in this case are not at right angles to each other, but convince yourself that you can make any vector in $\mathbb{R}^2$ out of linear combinations of $\mathbf{a}$ and $\mathbf{b}$.
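
Alongside the picture, the coefficients in the non-orthogonal basis can also be found numerically. The sketch below (variable names are my own) stacks $\mathbf{a}$ and $\mathbf{b}$ as columns of a matrix and solves the resulting linear system with `np.linalg.solve`. Note that taking inner products with the basis vectors no longer gives the coefficients directly, because the basis is not orthonormal.

```python
import numpy as np

a = np.array([2.0, 0.0])
b = np.array([-1.0, 2.0])
v = np.array([1.0, 2.0])

# columns of B are the basis vectors; solve B @ coeffs = v for the coefficients
B = np.column_stack([a, b])
coeffs = np.linalg.solve(B, v)
print(f"coefficients in the (a, b) basis = {coeffs}")

# inner products with a and b are NOT the coefficients here (basis is not orthonormal)
print(f"<v, a> = {np.dot(v, a)}, <v, b> = {np.dot(v, b)}, <a, b> = {np.dot(a, b)}")
```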

In [None]:
fig = plt.figure(figsize=(8, 4))
ax = plt.subplot(1, 2, 1)
ax.arrow(0, 0, 1, 2, width=0.05, color="k", label=r"$\mathbf{v}$")
ax.arrow(0, 0, 2, 0, width=0.05, color="C2", label=r"$\mathbf{a}$")
ax.arrow(0, 0, -1, 2, width=0.05, color="C3", label=r"$\mathbf{b}$")
ax.set_xlim([-1.5, 3.5])
ax.set_ylim([-1.5, 3.5])
ax.grid()
ax.legend()

ax = plt.subplot(1, 2, 2)
ax.arrow(0, 0, 1, 2, width=0.05, color="k")
ax.arrow(0, 0, 2, 0, width=0.05, color="C2")
ax.arrow(2, 0, -1, 1.95, width=0.05, color="C3")
ax.set_xlim([-1.5, 3.5])
ax.set_ylim([-1.5, 3.5])
ax.grid()

In a similar way, *functions* could be cast in terms of basis *functions* (not going to bother being precise about the space of functions though). For functions $f$ (with convenient properties that I am not going to elaborate on) in some unspecified interval $I$, I could represent them for example in a Taylor series

\begin{equation*}
    f(x) = a_0 + a_1 x + a_2 x^2 + \ldots = \sum_{i=0}^{"\infty"} a_i x^i
\end{equation*}

assuming the sum converges and is well-defined etc. In this case my basis functions are the *monomials* $x^i$ (rather than *poly*nomials, which have multiple terms), and my inner product might be

\begin{equation*}
    \langle f, g\rangle = \int_I f(t) g(t)\; \mathrm{d}t.
\end{equation*}

Again, this is not the only choice of basis or inner product (e.g. **Legendre polynomials**, which are used in **spherical harmonics**; we might touch on these in *10_fun_with_maps*).

---------------
# b) Fourier modes and the power spectrum

With the context above, I state without proof that the trigonometric functions (sines and cosines) or the **Fourier basis** is a particularly convenient choice of basis for the space of sufficiently smooth periodic functions over some finite interval; it can be made orthogonal (or orthonormal through appropriate choice of normalisation) under the standard inner product defined above. For $I = [0, 2\pi]$, $k, l$ positive integers and $k \neq l$, 

\begin{equation*}
    \int_0^{2\pi} \sin(kt) \cos(lt)\; \mathrm{d}t = \int_0^{2\pi} \sin(kt) \sin(lt)\; \mathrm{d}t = \int_0^{2\pi} \cos(kt) \cos(lt)\; \mathrm{d}t = 0.
\end{equation*}

(You can show this quite easily with the compound angle formula.)
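
If you would rather check the orthogonality numerically, below is a minimal sketch. The helper `inner` is a hypothetical function of my own that approximates the inner product by a simple Riemann sum on a uniform grid (excluding the right end point), which is very accurate for smooth periodic integrands. The choice $k=2$, $l=3$ is arbitrary.

```python
import numpy as np

N = 256
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)
dt = t[1] - t[0]

def inner(f, g):
    """Approximate the integral of f * g over [0, 2 pi] by a Riemann sum."""
    return np.sum(f * g) * dt

k, l = 2, 3
sin_cos = inner(np.sin(k * t), np.cos(l * t))
sin_sin = inner(np.sin(k * t), np.sin(l * t))
cos_cos = inner(np.cos(k * t), np.cos(l * t))
cos_sq = inner(np.cos(k * t), np.cos(k * t))  # same mode with itself: not zero

print(f"<sin 2t, cos 3t> = {sin_cos:.2e}")
print(f"<sin 2t, sin 3t> = {sin_sin:.2e}")
print(f"<cos 2t, cos 3t> = {cos_cos:.2e}")
print(f"<cos 2t, cos 2t> = {cos_sq:.6f} (compare with pi = {np.pi:.6f})")
```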

Going back to signals and time-series, if we make the assumption that the signal $f(t)$ is sufficiently smooth and periodic (or could be forced to be periodic), then we can decompose it into Fourier modes as

\begin{equation*}
    f(t) = a_0 + a_1 \cos(t) + a_2 \cos(2t) + \ldots + b_1 \sin(t) + b_2 \sin(2t) + \ldots = a_0 + \sum_{k=1}^\infty a_k \cos(kt) + \sum_{l=1}^\infty b_l \sin(lt).
\end{equation*}

The amplitudes $a_k$ and $b_k$ would tell you the amplitude of the oscillation at **angular frequency** $k$, which is related to the **frequency** $\nu$ ("nu") via $k = 2\pi\nu$, as well as the **period** $\lambda$ ("lambda") through $\lambda = 2\pi / k$. Because the Fourier basis functions are orthogonal to each other, it means we can talk about the **power** (related to the Fourier amplitudes $a_k$ and $b_l$) in a certain frequency of oscillation within the signal without any ambiguity. 

> NOTE: Normally $\omega$ ("omega") is used for angular frequency, and $k$ for the **wavenumber**. The former is usually used when the time variable is involved, while the latter is usually used when space variable is involved (e.g. *09/10_fun_with_maps*). I am not going to be that consistent with my terminology though (I almost always refer to the wavenumber regardless of whether I am dealing with space or time).

## Fourier transform

So what we now need is a way to decompose a signal into Fourier modes, which is achieved through a **(discrete) Fourier transform**. The way you can think of the Fourier transform is that, for a signal $f(t)$ with a finite number of samples $f(t_i)$, instead of describing my function in terms of the function values $f(t_i)$, I describe it instead by the *Fourier amplitudes* ($a_k$, $b_l$). To draw analogies with the previous example for vectors, I could describe my vector $\mathbf{v}$ as "go one to the right and two up", or I could also describe it as $(1, 2)$, and they are different but equivalent legitimate ways of describing the same thing. Instead of describing my signal through the data points in the **time domain**, I describe them through the Fourier amplitudes in the **frequency domain**: each amplitude pair $(a_k, b_k)$ corresponds to the appropriate Fourier modes $\cos(kt)$ and $\sin(kt)$. Once I know my amplitudes I can reconstruct my signal in the time domain, and vice-versa.

Formally, to get at the Fourier amplitudes $(a_k, b_k)$ from $f(t)$, we suppose that we have

\begin{equation*}
    f(t) = a_0 + \sum_{k=1}^\infty a_k \cos(kt) + \sum_{l=1}^\infty b_l \sin(lt).
\end{equation*}

Then, for example, if we wanted the $a_1$ amplitude, we would multiply both sides by $\cos(t)$ and integrate over the interval, leading to

\begin{equation*}
    \int_I f(t)\cos(t)\; \mathrm{d}t = \int_I \cos(t) \left[a_0 + \sum_{k=1}^\infty a_k \cos(kt) + \sum_{l=1}^\infty b_l \sin(lt)\right]\; \mathrm{d}t.
\end{equation*}

Since $\cos(t)$ is orthogonal to everything except itself by construction (see top equation of this section), almost everything gets killed, and we are left with

\begin{equation*}
    \int_I f(t)\cos(t)\; \mathrm{d}t = a_1 \int_I \cos^2(t)\; \mathrm{d}t.
\end{equation*}

The integral on the right hand side we can evaluate by hand (for $I = [0, 2\pi]$ it is equal to $\pi$), and the left hand side can be computed numerically, which gives us $a_1$. The procedure is general across the Fourier coefficients, and so in principle we can compute all the Fourier coefficients numerically. This is essentially the Fourier transform in a nutshell. 
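
To make the projection procedure concrete, here is a minimal sketch on a toy signal of my own choosing, with known amplitudes $a_0 = 0.5$, $a_1 = 2$ and $b_2 = 3$. The integrals are approximated by Riemann sums, and I use the facts that on $[0, 2\pi]$ we have $\int \cos^2(kt)\, \mathrm{d}t = \int \sin^2(kt)\, \mathrm{d}t = \pi$ for $k \geq 1$, and $\int 1\, \mathrm{d}t = 2\pi$.

```python
import numpy as np

N = 64
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)
dt = 2.0 * np.pi / N

# toy signal with known amplitudes: a_0 = 0.5, a_1 = 2, b_2 = 3
f = 0.5 + 2.0 * np.cos(t) + 3.0 * np.sin(2.0 * t)

# project onto each basis function, then divide by the integral of the basis function squared
a_0 = np.sum(f) * dt / (2.0 * np.pi)
a_1 = np.sum(f * np.cos(t)) * dt / np.pi
b_2 = np.sum(f * np.sin(2.0 * t)) * dt / np.pi

print(f"a_0 = {a_0:.6f}, a_1 = {a_1:.6f}, b_2 = {b_2:.6f}")
```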

> NOTE: The discrete and/or the fast versions of the Fourier transform we are going to be using make use of further symmetries to speed up the computation. The FFT (Fast Fourier Transform) has been rated one of the top ten algorithms of the 20th century (e.g. see [here](https://en.wikipedia.org/wiki/Fast_Fourier_transform)); this is partly because Fourier analysis is so widely used in many different fields (e.g. communications, music, satellite sensing, etc.)

Below is a particularly simple example for illustration of the Fourier transform, as well as demonstrating some syntax in Python. We are going to stick with the $[0, 2\pi]$ domain (so all wavenumbers are integers), and we take for simplicity $f(t) = \sin(t)$.

In [None]:
k, N = 1, 8
t = np.linspace(0, 2.0 * np.pi, N)
f = np.sin(k * t)

fig = plt.figure(figsize=(5, 3))
ax = plt.axes()
ax.plot(t, f, 'x-')
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.grid()

print(f"f(t_i) = {f}")

We can describe $f(t)$ through the values in the array. Or we could describe it in terms of Fourier amplitudes. In this case the representation is trivial, with $b_1 = 1$, and every other amplitude is zero. We do the Fourier transform via the command `np.fft.fft`, which is the **Fast Fourier Transform**. On the left is the original signal in the time domain, and on the right is something proportional to the **power spectrum**, which are the absolute values of the Fourier amplitudes squared, and encodes how much power there is per wavenumber. A high power implies the signal predominantly contains oscillations at that wavenumber.

In [None]:
f_h = np.fft.fft(f)
print(f"raw output = {f_h}")
print(" ")
print(f"abs of the output of f_h = {abs(f_h)}")

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'x-')
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.grid()

# create the wavenumber array, just integers here from zero to int(N/2)
k_vec = np.arange(int(N/2+1))
ax = plt.subplot(1, 2, 2)
ax.plot(k_vec, abs(f_h[k_vec])**2, '-o', markersize=12)
ax.set_xlabel(r"$k$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

Notice here that the FFT routines don't care about the corresponding $t$ array, and we generated the wavenumber array by hand. In this case we know $k$ has to be an integer.

A few things to note:

1) By default the FFT commands give you the amplitudes of the negative wavenumbers as well. This is not really an issue because recall that $\sin(-kt) = -\sin(kt)$ and $\cos(-kt) = \cos(kt)$, so you are just getting some things which are flipped the other way. If you are throwing in a real signal, you only need the positive half of the entries, and the `k_vec = np.arange(int(N/2+1))` command gives you the indices of the positive half of the amplitudes.

2) FFT returns an array the same size as the input array, and given the symmetries in the problem, you only get about $N/2 + 1$ distinct entries, which is related to the Nyquist sampling rate.

3) FFT gives you the amplitudes as **complex numbers** $a + bj$ in Python notation. $a$ is the real part and corresponds to the amplitudes of the cosines, while $b$ is the imaginary part and corresponds to the amplitudes of the sines (through [Euler's formula](https://en.wikipedia.org/wiki/Euler%27s_formula)). For our example above, we should only be getting something non-zero at the SECOND imaginary entry (because the first one corresponds to the $k=0$ mode), and we should have

\begin{equation*} f(t) = 0 + 0 \cos(t) + 1 \sin(t) + 0 \cos(2t) + \ldots \end{equation*}

The problem here is that the output above gives something non-zero at the $k \geq 0$ entries, and in both the real and imaginary part! Did we do stuff wrong?

This is a syntax and convention problem. By default the FFT assumes you don't include the data point at the right hand part of your interval, because we are supposed to be dealing with periodic functions, so the right end point is assumed to be exactly that of the left end point, and there is no point including it as such. This is an easy fix: just like in *07_time_series*, we use the `endpoint=False` argument in `np.linspace`.

In [None]:
k, N = 1, 8
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)
f = np.sin(k * t)

f_h = np.fft.fft(f)
print(f"raw output = {f_h}")
print(" ")
print(f"abs of the output = {abs(f_h)}")

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'x-')
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.grid()

# create the wavenumber array, just integers here from zero to int(N/2)
k_vec = np.arange(int(N/2+1))
ax = plt.subplot(1, 2, 2)
ax.plot(k_vec, abs(f_h[k_vec])**2, '-o', markersize=12)
ax.set_xlabel(r"$k$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

So now we are doing better, because the amplitudes are only non-zero at $k=1$, and it is precisely in the SECOND entry's imaginary part (you can ignore `1e-16` values, which in this case is machine precision and you can treat them as zeros). 

> NOTE: The value however is not 1, and this is due to choice of normalisation of the Fourier amplitudes within the FFT routines. Not really going to go into this, except to say that:
> 
> (1) We are going to be only interested in the *shape* and *relative amplitudes* of the spectrum, so in that sense the choice of normalisation is unimportant.
>
> (2) The FFT (from time to frequency domain) and inverse FFT (the reverse of that) are normalised in a consistent way, so that FFT(IFFT($f$)) = IFFT(FFT($f$)) = $f$ by default.
>
> You do need to be careful if you want to compute things with the spectrum (e.g. working out the total power of a wave through [Parseval's theorem](https://en.wikipedia.org/wiki/Parseval%27s_theorem) say).
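
For those who do want the actual amplitudes back, below is a minimal sketch of how the default `np.fft` normalisation could be undone, on a toy signal of my own choosing. It relies on the default (unnormalised) forward transform in NumPy, for which a mode $a_k \cos(kt) + b_k \sin(kt)$ with $0 < k < N/2$ shows up as $(N/2)(a_k - b_k j)$ in the output. The $k=0$ entry (and the Nyquist entry for even $N$) is scaled by $N$ rather than $N/2$. The last part checks the discrete version of Parseval's theorem with the full `np.fft.fft` output.

```python
import numpy as np

N = 16
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)

# toy signal: a_0 = 0.25, b_1 = 1, a_2 = 0.5
f = 0.25 + 1.0 * np.sin(t) + 0.5 * np.cos(2.0 * t)

f_h = np.fft.rfft(f)

# undo the default normalisation to recover cosine (a) and sine (b) amplitudes
a = 2.0 * f_h.real / N
b = -2.0 * f_h.imag / N
a[0] = f_h[0].real / N    # k = 0 entry is the mean, scaled by N instead of N/2
a[-1] = f_h[-1].real / N  # Nyquist entry (N even) is also scaled by N

print(f"a_k = {np.round(a, 12)}")
print(f"b_k = {np.round(b, 12)}")

# discrete Parseval: sum |f|^2 = (1/N) sum |f_h|^2, using the full (two-sided) FFT
power_time = np.sum(f**2)
power_freq = np.sum(abs(np.fft.fft(f))**2) / N
print(f"time domain: {power_time:.6f}, frequency domain: {power_freq:.6f}")
```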

There is the `rfft` command (and the corresponding `irfft` command) that computes the above without requiring you to manually chop half of the Fourier modes out. Below gives a demonstration of the `rfft` command.

In [None]:
k, N = 1, 8
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)
f = np.sin(k * t)

f_h = np.fft.rfft(f)
print(f"raw output = {f_h}")
print(" ")
print(f"abs of the output = {abs(f_h)}")

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'x-')
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.grid()

k_vec = np.arange(len(f_h))
ax = plt.subplot(1, 2, 2)
ax.plot(k_vec, abs(f_h)**2, '-o', markersize=12)
ax.set_xlabel(r"$k$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

# c) Signal analysis/processing with Fourier modes

## Decomposing a signal

Below is an example of a power spectrum analysis on an idealised signal consisting of two wave modes (still sines and cosines). The real signal itself looks more complex in the time domain, but the Fourier analysis just picks it out as two points on the power spectrum, as we expect.

In [None]:
N = 16
a1, k1 = 1.0, 1.0
a2, k2 = 0.5, 2.0
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)
f1 = a1 * np.sin(k1 * t)
f2 = a2 * np.cos(k2 * t)
f = f1 + f2

f_h = np.fft.rfft(f)

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'x-', label=r"$f$")
ax.plot(t, f1, 'k:', alpha=0.5, label=r"f1")
ax.plot(t, f2, 'k--', alpha=0.5, label=r"f2")
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.legend()
ax.grid()

k_vec = np.arange(len(f_h))
ax = plt.subplot(1, 2, 2)
ax.plot(k_vec, abs(f_h)**2, 'o-', markersize=12)
ax.set_xlabel(r"$k$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

For the case below I am somewhat arbitrary with amplitudes and wavenumbers when constructing the components of the wave signal, although I am still using sines and cosines. Again, the signal is complicated in the time domain, but the power spectrum picks out the dominant component (for this choice of seed it is wave 5, which you can sort of see by plotting the wave 5 on at the same time).

In [None]:
N = 32
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)

np.random.seed(4167)

# try and explain what the below loop is doing
f = np.zeros(N)
for i in range(16):
    amp  = np.random.rand()
    wnum = np.random.randint(low=-N/4, high=N/4)  # restrict the wavenumbers to half the Nyquist range
    if i % 2 == 0:
        f += amp * np.sin(wnum * t)
    else:
        f += amp * np.cos(wnum * t)

f_h = np.fft.rfft(f)

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'x-', label=r"$f$")
ax.plot(t, np.cos(5 * t), 'k--', alpha=0.5)
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.legend()
ax.grid()

k_vec = np.arange(len(f_h))
ax = plt.subplot(1, 2, 2)
ax.plot(k_vec, abs(f_h)**2, 'o-', markersize=12)
ax.set_xticks(np.arange(N/2+1))
ax.set_xlabel(r"$k$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

## Beyond $[0, 2\pi]$

Generically we will not have data on the $[0, 2\pi]$ domain. Convince yourself that $\sin(kt)$ and $\cos(kt)$ for integer $k$ will in general not be periodic on $[0, L]$ unless $L$ is an integer multiple of $2\pi$, so they cannot serve as a basis for the space of periodic functions on $[0, L]$. However, convince yourself that

\begin{equation*}
    \left\{\cos\left(\frac{2\pi k}{L}t\right), \sin\left(\frac{2\pi l}{L}t\right)\right\}
\end{equation*}

are in fact periodic on $[0, L]$, since they are just stretched versions of the sines and cosines on the interval. Below shows the above graphically for one example.

In [None]:
N, L, k = 16, 1, 1
fig = plt.figure(figsize=(10, 3))

ax = plt.subplot(1, 2, 1)
t1 = np.linspace(0, 2 * np.pi, N, endpoint=False)
f1 = np.sin(k * t1)
ax.plot(t1, f1, 'rx-')
ax.set_xlabel(r"$t_1$")
ax.set_ylabel(r"$f_1$")
ax.grid()

scale_factor = 2.0 * np.pi / L

ax = plt.subplot(1, 2, 2)
t2 = np.linspace(0, L, N, endpoint=False)
f2 = np.sin(k * scale_factor * t2)
ax.plot(t2, f2, 'bx-')
ax.set_xlabel(r"$t_2$")
ax.set_ylabel(r"$f_2$")
ax.grid()

Thus we can use the above basis for decomposing signals on $[0, L]$, except now we have to reinterpret the wavenumber. Essentially we want to create the wavenumber array as for $[0, 2\pi]$, but rescale it by the factor $2\pi / L$ if the signal is on $[0, L]$. Below shows the code for this.

In [None]:
fig = plt.figure(figsize=(10, 3))

scale_factor = 2.0 * np.pi / L

ax = plt.subplot(1, 2, 1)
t2 = np.linspace(0, L, N, endpoint=False)
f2 = np.sin(k * scale_factor * t2)
ax.plot(t2, f2, 'bx-')
ax.set_xlabel(r"$t_2$")
ax.set_ylabel(r"$f_2$")
ax.grid()

f_h = np.fft.rfft(f2)
k_vec = np.arange(len(f_h))

ax = plt.subplot(1, 2, 2)
ax.plot(k_vec * scale_factor, abs(f_h)**2, 'bo-', markersize=12)
ax.set_xlabel(r"$k$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

> <span style="color:red">**Q.**</span> The value of $k$ associated with the peak is now larger. Convince yourself this way of rescaling is physically consistent. (Hint: is the signal now defined on $[0, L]$ of higher or lower frequency than the one defined on $[0, 2\pi]$?)

If we have data on some arbitrary interval $[a, b]$, we can always shift it onto $[0, L = b-a]$ without modifying anything: the argument here is that anything periodic on $[a, b]$ must be periodic on a shifted version of itself, so shifting doesn't really do anything.
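
As an illustration of the shifting argument, here is a minimal sketch with a signal on an interval $[3, 5]$ (my arbitrary choice, as are the variable names). The FFT never sees the time array, so only the length $L = b - a$ enters through the rescaling. I also compare against `np.fft.rfftfreq`, a NumPy helper that is not used elsewhere in these notes, which builds the frequency array (in cycles per time unit, not radians) from the number of samples and the sample spacing.

```python
import numpy as np

t_start, t_end, N = 3.0, 5.0, 32
L = t_end - t_start
t = np.linspace(t_start, t_end, N, endpoint=False)

# two full oscillations over an interval of length 2, i.e. frequency of 1 per time unit
f = np.sin(2.0 * np.pi * 1.0 * t)

f_h = np.fft.rfft(f)
scale_factor = 2.0 * np.pi / L
k_vec = np.arange(len(f_h)) * scale_factor  # angular frequency (radians per time unit)

# rfftfreq takes the sample spacing d and returns frequencies in cycles per time unit
freq_vec = np.fft.rfftfreq(N, d=L / N)

k_peak = k_vec[np.argmax(abs(f_h)**2)]
print(f"peak at k = {k_peak:.4f}, i.e. frequency = {k_peak / (2.0 * np.pi):.4f}")
print(f"arrays agree: {np.allclose(freq_vec, k_vec / (2.0 * np.pi))}")
```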

## Power spectrum against frequency/period

In the above cases we have been plotting the power spectrum against the wavenumber (but really the angular frequency) $k$, and it might be more appropriate or informative to plot against the frequency or the period instead. Suppose we take the time units to be in seconds, then note that:

* The angular frequency $k$ has units of radians per second.
* Since $k = 2 \pi \nu$, the frequency is $\nu = k / (2\pi)$ and has units of per second, or **Hertz** (Hz).
* Via $\nu = 1/T$, we have the period $T = 2\pi / k$, which has units of seconds here.

All of these are only transformations of the wavenumber array itself, so we can just modify the arrays when we need to plot. Below is the above single period sine wave on the domain $[0, 1]$. Here we know the answer of what everything should be: it is a single period sine wave over 1 second, so it should have frequency 1 Hz and period 1 second.

In [None]:
k_mod = k_vec * scale_factor

fig = plt.figure(figsize=(14, 3))
ax = plt.subplot(1, 3, 1)
ax.plot(k_mod, abs(f_h)**2, 'bo-', markersize=12)
ax.set_xlabel(r"$k$ (rad per sec)")
ax.set_ylabel(r"$\hat{f}$")
ax.set_title(r"angular frequency")
ax.grid()

ax = plt.subplot(1, 3, 2)
ax.plot(k_mod / (2.0*np.pi), abs(f_h)**2, 'bo-', markersize=12)
ax.set_xlabel(r"$\nu$ (Hz)")
ax.set_title(r"frequency")
ax.grid()

ax = plt.subplot(1, 3, 3)
ax.plot(2.0*np.pi / k_mod, abs(f_h)**2, 'bo-', markersize=12)
ax.set_xlabel(r"$T$ ($\mathrm{s}$)")
ax.set_title(r"period")
ax.grid()

fig.tight_layout(pad=0.5)

> NOTE: In the above you probably will get a warning of `divide by zero encountered`, coming from doing $2\pi / k$ when computing for the period, since there is a zero value of $k$ corresponding to the $a_0$ component associated with $\cos(0t) = 1$, i.e. the constant. If you really want to avoid this, you could either use `np.where` to not divide by zero, or replace the zero entry in the wavenumber array by say `1e-16`. You can probably just ignore it by the choice of axis, or not including the relevant data in the plot, since a constant value does not have a well-defined period anyway.
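
A minimal sketch of two ways to avoid the warning is given below (variable names are my own). The first uses the `where` and `out` arguments of `np.divide` so that the division is only carried out for non-zero $k$, with the $k=0$ entry marked as having infinite period (that marker is my choice). The second simply drops the $k=0$ entry before converting.

```python
import numpy as np

N, L = 16, 1.0
k_vec = np.arange(N // 2 + 1) * 2.0 * np.pi / L  # angular frequency, includes k = 0

# option 1: divide only where k > 0, and leave the k = 0 entry as infinity
period = np.full_like(k_vec, np.inf)
np.divide(2.0 * np.pi, k_vec, out=period, where=k_vec > 0)

# option 2: skip the k = 0 entry altogether
period_no_mean = 2.0 * np.pi / k_vec[1:]

print(f"period (with inf for k = 0) = {period}")
```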

## High/low pass a signal

Since you can decompose a signal into Fourier modes, each associated with an oscillation at a given frequency, you could artificially kill some of those modes by setting the relevant amplitudes to zero in the frequency domain, and then transforming the modified spectrum back into the time domain. If you kill the modes of higher wavenumber then this would be a *low pass* (since you are leaving the low frequency modes alone), and conversely it would be a *high pass*. Given you now know the relation between angular frequency, frequency and period, you can do filtering based on conditions such as "give me the signal that only has period larger than 5 seconds" or analogous.
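
The filtering idea can be wrapped into a small function, sketched below. `fourier_filter` is a hypothetical helper of my own (not something from NumPy), and it assumes a real signal sampled uniformly on an interval of length $L$ without the right end point. The toy signal has a slow component at 2 Hz and a fast component at 10 Hz, so a cut-off at 5 Hz should separate the two cleanly.

```python
import numpy as np

def fourier_filter(f, L, freq_min=0.0, freq_max=np.inf):
    """Keep only the Fourier modes with freq_min <= frequency <= freq_max."""
    N = len(f)
    f_h = np.fft.rfft(f)
    freq = np.fft.rfftfreq(N, d=L / N)  # frequencies in cycles per time unit
    keep = (freq >= freq_min) & (freq <= freq_max)
    return np.fft.irfft(f_h * keep, n=N)  # zero the unwanted modes, then invert

N, L = 64, 1.0
t = np.linspace(0, L, N, endpoint=False)
slow = np.sin(2.0 * np.pi * 2.0 * t)         # 2 Hz component
fast = 0.5 * np.cos(2.0 * np.pi * 10.0 * t)  # 10 Hz component
f = slow + fast

f_low = fourier_filter(f, L, freq_max=5.0)   # low pass
f_high = fourier_filter(f, L, freq_min=5.0)  # high pass

print(f"low pass recovers slow part:  {np.allclose(f_low, slow)}")
print(f"high pass recovers fast part: {np.allclose(f_high, fast)}")
```

A condition on the period can be turned into one on the frequency via $\nu = 1/T$, e.g. "period larger than 5 seconds" corresponds to `freq_max=1/5`. Note that a mode sitting exactly at the cut-off would be kept by both calls above, since both bounds are inclusive.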

Below is an example of this for one of the examples above, where I interpret the time in seconds, my signal is between $[0, 1]$, and I wipe out anything that is larger than 5 Hz (so a low pass, and I need to convert the angular frequency to frequency). Another code below does the reverse, where I do a high pass of anything above 5 Hz instead.

In [None]:
N, L = 32, 1
t = np.linspace(0, L, N, endpoint=False)

scale_factor = 2.0 * np.pi / L

np.random.seed(4167)

f = np.zeros(N)
for i in range(16):
    amp  = np.random.rand()
    wnum = np.random.randint(low=-N/4, high=N/4) * scale_factor
    if i % 2 == 0:
        f += amp * np.sin(wnum * t)
    else:
        f += amp * np.cos(wnum * t)

f_h = np.fft.rfft(f)

k_vec = np.arange(len(f_h)) * scale_factor  # scaled wavenumber array
freq_vec = k_vec / (2.0*np.pi)              # convert to frequency
f_h_mod = copy.deepcopy(f_h)                # make a copy so the original spectrum is not modified
f_h_mod[freq_vec >= 5.0] = 0                # set anything at or above 5Hz to zero
f_mod = np.fft.irfft(f_h_mod)               # invert to get the resulting filtered signal

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'rx-', label=r"$f$", alpha=0.5)
ax.plot(t, f_mod, 'gx-', label=r"$f$ mod")
ax.set_xlabel(r"$t$ (s)")
ax.set_ylabel(r"$f$")
ax.legend()
ax.grid()

ax = plt.subplot(1, 2, 2)
ax.plot(freq_vec, abs(f_h), 'ro-', markersize=12, alpha=0.5)
ax.plot(freq_vec, abs(f_h_mod), 'go-', markersize=12)
ax.set_xlabel(r"$\mathsf{f}\ (\mathrm{s}^{-1}\ \mathrm{or}\ \mathrm{Hz})$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

In [None]:
f_h = np.fft.rfft(f)

k_vec = np.arange(len(f_h)) * scale_factor  # scaled wavenumber array
freq_vec = k_vec / (2.0*np.pi)              # convert to frequency
f_h_mod = copy.deepcopy(f_h)                # make a copy so the original spectrum is not modified
f_h_mod[freq_vec <= 5.0] = 0                # set anything at or below 5Hz to zero
f_mod = np.fft.irfft(f_h_mod)               # invert to get the resulting filtered signal

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'rx-', label=r"$f$", alpha=0.5)
ax.plot(t, f_mod, 'gx-', label=r"$f$ mod")
ax.set_xlabel(r"$t$ (s)")
ax.set_ylabel(r"$f$")
ax.legend()
ax.grid()

ax = plt.subplot(1, 2, 2)
ax.plot(freq_vec, abs(f_h), 'ro-', markersize=12, alpha=0.5)
ax.plot(freq_vec, abs(f_h_mod), 'go-', markersize=12)
ax.set_xlabel(r"$\mathsf{f}\ (\mathrm{s}^{-1}\ \mathrm{or}\ \mathrm{Hz})$")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

--------------------
# d) A few things to be careful about

## Signals with non-zero mean

I am just going to take one of the above examples and add a trend to it, and I am going to additionally show the power spectrum with respect to the period. Knowing I am going to divide by zero when converting wavenumber to period, I am going to replace the zero entry in the wavenumber array by something small.

In [None]:
N = 32
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)

np.random.seed(4167)

f = np.zeros(N)
for i in range(16):
    amp  = np.random.rand()
    wnum = np.random.randint(low=-N/4, high=N/4)  # restrict the wavenumbers to half the Nyquist range
    if i % 2 == 0:
        f += amp * np.sin(wnum * t)
    else:
        f += amp * np.cos(wnum * t)

f += t  # linear trend
        
f_h = np.fft.rfft(f)

fig = plt.figure(figsize=(14, 3))
ax = plt.subplot(1, 3, 1)
ax.plot(t, f, 'x-', label=r"$f$")
ax.plot(t, np.cos(5 * t) + t, 'k--', alpha=0.5, label=r"wave 5 with trend")
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.legend()
ax.grid()

# force wavenumber array to be a float, then replace zero entry (because 1e-16 is not an integer)
k_vec = np.arange(len(f_h), dtype=np.float64)
k_vec[0] = 1e-16

ax = plt.subplot(1, 3, 2)
ax.plot(k_vec, abs(f_h)**2, 'o-', markersize=12)
ax.set_xticks(np.arange(N/2+1))
ax.set_xlabel(r"$k$ (radians per time unit)")
ax.set_ylabel(r"$\hat{f}$")
ax.grid()

ax = plt.subplot(1, 3, 3)
ax.plot(2.0*np.pi / k_vec, abs(f_h)**2, 'o-', markersize=12)
ax.set_xlabel(r"$T$ (time unit)")
ax.grid()

fig.tight_layout(pad=0.1)

While the signal as shown is still predominantly a wave 5 with a trend (as it should be, because I haven't done anything else to the signal), the power spectrum is now dominated by massive power in the $k=0$ mode, with a corresponding large power in the long period.

This is because in the presence of a trend the signal no longer has zero mean, and this means that something is projected onto the $a_0$ coefficient. The zero mode has no well-defined period anyway, so the procedure is to:

1) Detrend the signal (in this case just take away the linear trend obtained from linear regression)

2) Not show / ignore the stuff associated with the zero mode when plotting against period.

> <span style="color:red">**Q.**</span> Do the above two things yourselves.
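
For reference, one possible approach is sketched below on a simpler toy signal (a wave 5 plus a linear trend, my choice, rather than the random signal above, so you still have something to do yourselves). The linear regression here is done with `np.polyfit` with degree 1, which is one of several options.

```python
import numpy as np

N = 32
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)
f = np.cos(5.0 * t) + t  # toy signal: wave 5 plus a linear trend

# 1) detrend: fit a straight line by linear regression and remove it
slope, intercept = np.polyfit(t, f, 1)
f_detrended = f - (slope * t + intercept)

f_h = np.fft.rfft(f_detrended)
k_vec = np.arange(len(f_h))

# 2) skip the k = 0 entry when converting to period
period = 2.0 * np.pi / k_vec[1:]
power = abs(f_h[1:])**2

k_peak = k_vec[1:][np.argmax(power)]
print(f"fitted slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"dominant wavenumber = {k_peak}, period = {2.0 * np.pi / k_peak:.4f}")
```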

## Non-periodic signals

The signal above is over some interval, and the assumption of doing a Fourier analysis is that the signal simply repeats itself outside of that interval, i.e.:

In [None]:
fig = plt.figure(figsize=(14, 3))
ax = plt.axes()
ax.plot(t - 2.0*np.pi, f, 'C0x-', alpha=0.7)
ax.plot([0.0, 0.0], [-5, 15], 'k--', alpha=0.5)
ax.plot(t, f, 'C0x-')
ax.plot([2.0*np.pi, 2.0*np.pi], [-5, 15], 'k--', alpha=0.5)
ax.plot(t + 2.0*np.pi, f, 'C0x-', alpha=0.7)
ax.set_ylim([-3, 11])
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.grid()

While the original signal is not really periodic (the repeated version has jumps where the copies meet), the Fourier analysis in some sense doesn't really care.

If you really did want to make it periodic, one could pass the signal through a **window function** (which was briefly mentioned in *07_time_series*), which will force the signal to be zero at the ends, and thus make it periodic. The example below provides one demonstration of this with the Tukey window (`scipy.signal.windows.tukey`; older versions of SciPy also exposed it as `scipy.signal.tukey`), and I am going to put the power spectrum on a log scale as well.

In [None]:
window = signal.windows.tukey(len(f))

fig = plt.figure(figsize=(12, 7))
ax = plt.subplot(2, 2, 1)
ax.plot(t, f, 'x-', label=r"$f$")
ax.plot(t, f * window, 'C1x-', label=r"$f$ with window")
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.legend()
ax.grid()

ax = plt.subplot(2, 2, 2)
ax.plot(t, window, 'C1x-')
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"window function")
ax.grid()

f_h_mod = np.fft.rfft(f * window)

# force wavenumber array to be a float, then replace zero entry (because 1e-16 is not an integer)
k_vec = np.arange(len(f_h), dtype=np.float64)
k_vec[0] = 1e-16

ax = plt.subplot(2, 2, 3)
ax.semilogy(k_vec, abs(f_h)**2, 'o-', markersize=12, label=r"orig")
ax.semilogy(k_vec, abs(f_h_mod)**2, 'o-', markersize=12, label=r"windowed")
ax.set_xticks(np.arange(N/2+1))
ax.set_xlabel(r"$k$ (radians per time unit)")
ax.set_ylabel(r"$\hat{f}$")
ax.legend()
ax.grid()

# don't plot the k=0 mode
ax = plt.subplot(2, 2, 4)
ax.semilogy(2.0*np.pi / k_vec[1::], abs(f_h[1::])**2, 'o-', markersize=12, label=r"orig")
ax.semilogy(2.0*np.pi / k_vec[1::], abs(f_h_mod[1::])**2, 'o-', markersize=12, label=r"windowed")
ax.set_xlabel(r"$T$ (time unit)")
ax.legend()
ax.grid()

What the window function does in this case is to leave the signal untouched in the interior and taper it to zero towards the edges. For this example you can see that there is still power at $k=0$ because of the presence of the trend, but I didn't plot the associated period on purpose. The wave 5 is still being picked up by the power spectrum, though slightly damped by the window. Notice in this case the main action in applying the window function is to kill the spectrum at the higher wavenumbers in a certain fashion. This is not a coincidence, and window functions are designed to have certain properties on the power spectrum in the frequency domain.

> <span style="color:red">**Q.**</span> There are other window functions available in `scipy.signal.windows`. Look some of these up (the [Wikipedia page](https://en.wikipedia.org/wiki/Window_function) is a good background reference) and try them on some signals and see what effects these might have, particularly with respect to the resulting power spectra.

> <span style="color:red">**Q.**</span> Consider making the signal longer and make a window function that is not as long as the whole signal. Apply the window function to segments of the signal and do a similar investigation to the above, to get at the spectral content of some specific time-periods.
>
> NOTE: This is really how window functions were traditionally used. Notice that if you do it this way you get multiple samples of the power spectrum, on which you can do statistical analyses, as opposed to a standard Fourier analysis of the whole signal, which gives you only one.
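
As a rough sketch of the segment-wise idea in the note above (one possible starting point, not the only way to do it): I make up a longer signal, chop it into non-overlapping segments, apply a window to each segment, and compute one power spectrum per segment. The signal, the number and length of segments, and the choice of a Hann window are all assumptions made for illustration.

```python
import numpy as np
from scipy import signal

# made-up long signal: "wave 5" within each segment, plus a little noise
rng = np.random.default_rng(0)
n_seg, n_per_seg = 8, 64                 # assumed: 8 non-overlapping segments of 64 points
t = np.arange(n_seg * n_per_seg)
f_long = np.sin(2.0 * np.pi * 5 * t / n_per_seg) + 0.1 * rng.standard_normal(t.size)

# window is only as long as one segment (Hann window is an arbitrary choice here)
window = signal.windows.hann(n_per_seg)

# chop into segments, window each segment, one power spectrum per segment
segments = f_long.reshape(n_seg, n_per_seg)
power = np.abs(np.fft.rfft(segments * window, axis=1))**2

# statistics over the samples of the power spectrum
power_mean = power.mean(axis=0)
power_std  = power.std(axis=0)

k_peak = np.argmax(power_mean[1:]) + 1   # skip the k=0 mode
print("dominant wavenumber per segment:", k_peak)
```

Each row of `power` is one sample of the power spectrum, so `power_mean` and `power_std` give a crude estimate of the spectrum and of its spread between segments.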

## Non-smooth + discontinuous signals

Non-smooth means formally some of the derivatives of a signal are infinite/ill-defined. Discontinuous means jumps, so the cursed doggo below for example is discontinuous.

<img src="https://i.imgur.com/UpcdmZP.jpg" width="400" alt='discontinuity'>

In the first case I will provide a signal that is clearly not continuous, pass that signal through an FFT, then compute the power spectrum (but skipping the $k=0$ mode). Then I am going to pass the transformed signal through an IFFT (so you should just get the same signal back). For demonstration, on top of the above, I am going to wipe out one or two of the highest frequencies after the FFT, and overlay the resulting signal. The code below shows the results.

In [None]:
N, L = 30, 2.0*np.pi
t = np.linspace(0, L, N, endpoint=False)

scale_factor = 2.0 * np.pi / L

# creating a square wave
f = np.zeros(N)
for i in range(N):
    if t[i] < L/2:
        f[i] = -1.0
    else:
        f[i] = 1.0
        
# transform and make a copy with one or two highest frequency modes removed
f_h = np.fft.rfft(f)
f_h_mod = copy.deepcopy(f_h)
f_h_mod[-1::] = 1e-10

k_vec = np.arange(len(f_h))

fig = plt.figure(figsize=(16, 3))
ax = plt.subplot(1, 3, 1)
ax.plot(t[:int(N/2):], f[:int(N/2):], 'C0x-')
ax.plot(t[int(N/2)::], f[int(N/2)::], 'C0x-')
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.set_title(r"original signal")
ax.grid()

ax = plt.subplot(1, 3, 2)
ax.semilogy(k_vec[1::], abs(f_h[1::])**2, 'o-', markersize=12)
ax.semilogy(k_vec[1::], abs(f_h_mod[1::])**2, 'o-', markersize=12)
ax.set_xlabel(r"$k$ (radians per time unit)")
ax.set_ylabel(r"$|\hat{f}|^2$")
ax.grid()

ax = plt.subplot(1, 3, 3)
ax.plot(t, np.fft.irfft(f_h), 'C0-', label=r"original")
ax.plot(t, np.fft.irfft(f_h_mod), 'C1-', label=r"reconstructed")
ax.set_xlabel(r"$t$")  # this panel is back in the time domain
ax.set_ylabel(r"$f$")
ax.legend()
ax.grid()

The power spectrum in this case turns out to consist only of odd wavenumbers (as well as only consisting of sines; it's not actually very hard to argue or show this analytically but this isn't really the focus here), and an IFFT hitting the full spectrum actually is able to recover the original signal pretty well, despite the fact that the original signal is non-smooth (it's not even continuous), while the Fourier basis is smooth. 

This partly demonstrates the power of Fourier analysis: you can in practice use it for fairly rough signals. The thing to be aware of is that if signals are rough, one probably requires a lot of information to reconstruct the signal. In the case above, wiping out only one mode (it doesn't actually matter which one I wipe out, and I could actually just modify the amplitude of any mode slightly) leads to a "ringing" sawtooth pattern in the reconstructed signal. This is a demonstration of the **Gibbs phenomenon**, where finite approximations of a discontinuous signal by a Fourier representation lead to an oscillatory pattern. With the full spectrum, all the oscillations miraculously cancel each other out, but any slight modification to the spectrum will break the delicate balance.

This is something to be aware of, and a symptom of non-smooth signals is a power spectrum that does not decay exponentially with wavenumber (it can be shown mathematically that if a signal is sufficiently smooth then the Fourier amplitudes decay exponentially, whereas a jump gives amplitudes that decay only like $1/k$). One has to be a bit careful if Fourier modes are to be used as a filtering tool, since this could introduce artificial noise in the reconstructed signal.
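
To make the statement about decay rates a bit more concrete, here is a small sketch comparing the Fourier amplitudes of a square wave with those of a smooth periodic signal. The smooth signal $\exp(\sin t)$ and the larger $N$ are my own choices for illustration, not something used elsewhere in this notebook.

```python
import numpy as np

N = 256
t = np.linspace(0, 2.0 * np.pi, N, endpoint=False)

f_square = np.where(np.arange(N) < N // 2, -1.0, 1.0)   # discontinuous square wave
f_smooth = np.exp(np.sin(t))                            # assumed smooth periodic signal

# Fourier amplitudes, normalised by N so they do not depend on the signal length
amp_square = np.abs(np.fft.rfft(f_square)) / N
amp_smooth = np.abs(np.fft.rfft(f_smooth)) / N

for k in [1, 3, 5, 7, 9]:
    print(f"k = {k}: square = {amp_square[k]:.2e}, "
          f"k * square = {k * amp_square[k]:.3f}, smooth = {amp_smooth[k]:.2e}")
```

The `k * square` column should stay roughly constant, which is the algebraic $1/k$ decay associated with a jump, while the `smooth` column should drop off much faster.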

Another case of formally non-smooth signals would be in the presence of noise. Below is a case where I introduce only a little bit of random noise into the signal (through adding random numbers), and I compute the power spectrum as usual.

In [None]:
N, L = 32, 2.0*np.pi
t = np.linspace(0, L, N, endpoint=False)

scale_factor = 2.0 * np.pi / L

np.random.seed(4167)

f = np.zeros(N)
for i in range(16):
    amp  = np.random.rand()
    wnum = np.random.randint(low=-(N//4), high=N//4) * scale_factor  # integer bounds
    if i % 2 == 0:
        f += amp * np.sin(wnum * t)
    else:
        f += amp * np.cos(wnum * t)
        
noise = 0.5 * (2.0 * np.random.rand(N) - 1.0)

# transform the original signal and the noisy signal
f_h = np.fft.rfft(f)
f_h_mod = np.fft.rfft(f + noise)

k_vec = np.arange(len(f_h))

fig = plt.figure(figsize=(10, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(t, f, 'C0x-', alpha=0.7)
ax.plot(t, f + noise, 'C1x-')
ax.set_xlabel(r"$t$")
ax.set_ylabel(r"$f$")
ax.grid()

ax = plt.subplot(1, 2, 2)
ax.semilogy(k_vec[1::], abs(f_h[1::])**2, 'C0o-', markersize=12, label=r"original")
ax.semilogy(k_vec[1::], abs(f_h_mod[1::])**2, 'C1o-', markersize=12, label=r"noisy")
ax.set_xlabel(r"$k$ (radians per time unit)")
ax.set_ylabel(r"$|\hat{f}|^2$")
ax.legend()  # labels were set above but never shown
ax.grid()

What you see here is that while the time domain signal is not modified that much, the power spectrum magnitudes are substantially different. In the original case, where I construct the signal purely out of sines and cosines, the modes where there is no power really have no power, while in the presence of noise there is power everywhere, though the dominant peaks are still where we expect them to be.

One possibility is to low-pass the signal a bit through a moving average, which will average out the noise a little, and will certainly kill some of the power at the higher frequencies. Try this for yourself, maybe here or in later exercises.
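
As one possible starting point, here is a minimal sketch of a 3-point moving average applied to a noisy signal, with the power spectra before and after. The clean signal, the seed, and the averaging width are made up for illustration, and I wrap the signal around at the ends on the assumption that it is periodic.

```python
import numpy as np

N, L = 32, 2.0 * np.pi
t = np.linspace(0, L, N, endpoint=False)

rng = np.random.default_rng(0)                        # arbitrary seed
f = np.sin(2.0 * t) + 0.5 * np.cos(3.0 * t)           # made-up clean signal
f_noisy = f + 0.5 * (2.0 * rng.random(N) - 1.0)       # uniform noise in [-0.5, 0.5)

# 3-point moving average, wrapping around the ends since we treat the signal as periodic
kernel = np.ones(3) / 3.0
f_pad = np.concatenate([f_noisy[-1:], f_noisy, f_noisy[:1]])
f_smooth = np.convolve(f_pad, kernel, mode="valid")   # same length as f_noisy

power_noisy  = np.abs(np.fft.rfft(f_noisy))**2
power_smooth = np.abs(np.fft.rfft(f_smooth))**2

print("ratio of power, smoothed / noisy, at the highest wavenumber:",
      power_smooth[-1] / power_noisy[-1])
```

A moving average of this kind damps each Fourier mode by a factor that depends only on the wavenumber, so it reduces the power at the higher wavenumbers but also slightly damps the dominant peaks (at $k=2$ and $k=3$ here).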

--------------------
# e) Example with real data: Tide gauge data

Going to apply what we have introduced to some real data. The file `Tobermory_20160430_20161231.csv` consists of tide gauge data of the sea level (above datum) at the town of Tobermory on the island of Mull in Scotland, from May to December of 2016, with records every 15 minutes. The data is a small segment of a longer time series that was originally obtained from [BODC](https://www.bodc.ac.uk/).

<img src="https://i.imgur.com/O5wICRf.jpg" width="800" alt='Tobermory'>

(Modified photos of Tobermory demonstrating tides; the left photo is from www.thechaoticscot.com, the right photo is one of mine.)

The full file itself has lots of other entries, but for our case we only need the date and the values, which `Pandas` can pick out for us. Let's plot it out as well to see what it looks like.

In [None]:
# the file is kind of big, but only need the date and value from it
df = pd.read_csv("./Tobermory_20160430_20161231.csv", skipinitialspace=True, usecols=["Date", "Data value"])

# raw plot to see what it looks like

fig = plt.figure(figsize=(10, 3))
ax = plt.axes()
df.plot(x="Date", ax=ax)
ax.grid()

A few things to note:

* Over this rather short period in time I am going to assume the trend in sea level is very small, so I am not going to detrend the data (and we are going to be ignoring the $k=0$ mode anyway).

* The data over this long period can be assumed to be well-resolved, as long as we are looking at signal periods quite a bit larger than 15 minutes.

> NOTE: In this case we cannot get anything below the 30 minute period anyway, since with 15 minute sampling that is the shortest period allowed by the Nyquist limit (two samples per cycle).

* The signal is probably not strictly periodic, but the artifacts will be power in the very high wavenumbers, which correspond to the very short periods.
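
The Nyquist limit for this record is simple arithmetic on the 15 minute sampling interval, written out below as a quick check (the variable names are just for illustration).

```python
dt_minutes = 15.0                               # sampling interval of the tide gauge record

nyquist_period_minutes = 2.0 * dt_minutes       # need at least two samples per cycle
samples_per_day = 24.0 * 60.0 / dt_minutes
nyquist_freq_per_day = samples_per_day / 2.0    # highest resolvable frequency

print(nyquist_period_minutes, "minutes")        # 30.0 minutes
print(nyquist_freq_per_day, "cycles per day")   # 48.0 cycles per day
```

A twice-daily signal sits at around 2 cycles per day, far below this limit, which is why the record can be regarded as well-resolved for tides.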

We can proceed with the usual analysis in computing the power spectrum. Let's just do a raw one for now and not bother about rescaling, to see what the power spectrum looks like first.

In [None]:
f_h = np.fft.rfft(df["Data value"])
k_vec = np.arange(len(f_h))  # create wavenumber/frequency array, not scaled correctly

# raw plot of spectrum to see shape
fig = plt.figure(figsize=(12, 3))
ax = plt.subplot(1, 2, 1)
ax.plot(k_vec, abs(f_h)**2)
ax.set_xlabel(r"k")
ax.set_ylabel(r"$|f_h|^2$")
ax.grid()

ax = plt.subplot(1, 2, 2)
ax.loglog(k_vec, abs(f_h)**2)
ax.set_xlabel(r"k")
ax.set_ylabel(r"$|f_h|^2$")
ax.grid()

First thing to note is that the linear plot basically gives you nothing very useful, as the data spans a massive range. The loglog graph is much better, and you can see some distinct peaks.

> <span style="color:red">**Q.**</span> Think before you compute: what do you think these peaks correspond to? In particular, what do you think the largest peak corresponds to? (Hint: what generates tides on Earth?) 

The above will be an important check to make sure we are converting from wavenumbers to frequencies and periods properly. Below we first convert to frequency (in units of Hz) and to period (in units of seconds); I'm going to drop the linear graph. For re-scaling, we don't particularly care about the shifts in time, but we do need to know the length of the time period, which is the last entry in time minus the first entry in time.

> NOTE: I dropped the zero wavenumber data by making the zero wavenumber a NaN (`np.nan`). Plots generally ignore NaN entries.

In [None]:
# convert the dates from strings to something like numbers so we can do subtractions on
t0 = datetime.strptime(df["Date"].values[ 0], "%Y/%m/%d %H:%M:%S")
tf = datetime.strptime(df["Date"].values[-1], "%Y/%m/%d %H:%M:%S")
L = (tf - t0).total_seconds()  # work out time difference, and convert from days + seconds to just seconds

scale_factor = 2.0*np.pi / L

k_vec = np.arange(len(f_h), dtype=np.float64)
k_vec[0] = np.nan                 # just going to NaN this and not bother with the zero mode
k_mod = k_vec * scale_factor      # properly scaled wavenumber for the choice of L
freq_vec = k_mod / (2.0*np.pi)    # frequency from the scaled wavenumber
peri_vec = 2.0*np.pi / k_mod      # period    from the scaled wavenumber

# raw plot of spectrum to see shape
fig = plt.figure(figsize=(12, 3))
ax = plt.subplot(1, 2, 1)
ax.loglog(freq_vec, abs(f_h)**2)
ax.set_xlabel(r"$\mathsf{f}$ (Hz)")
ax.set_ylabel(r"$|f_h|^2$")
ax.grid()

ax = plt.subplot(1, 2, 2)
ax.loglog(peri_vec, abs(f_h)**2)
ax.set_xlabel(r"$T\ (\mathrm{s})$")
ax.grid()
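
As a cross-check on the manual rescaling, `np.fft.rfftfreq` builds the frequency array directly from the number of samples and the sampling interval. The sketch below uses a made-up 12-hour oscillation sampled every 15 minutes instead of the actual file, so the numbers in it are assumptions for illustration. One subtlety: `rfftfreq` takes the record length to be $N \Delta t$, which is one sampling interval longer than the last time minus the first time; for a record as long as the tide gauge one the difference is negligible.

```python
import numpy as np

# hypothetical stand-in for the tide gauge record: 10 days sampled every 15 minutes
dt = 15.0 * 60.0                                  # sampling interval in seconds
N = 10 * 24 * 4                                   # number of samples
t = np.arange(N) * dt
f = np.cos(2.0 * np.pi * t / (12.0 * 3600.0))     # assumed 12-hour oscillation

f_h = np.fft.rfft(f)
freq_vec = np.fft.rfftfreq(N, d=dt)               # in Hz, freq_vec[k] = k / (N * dt)

k_peak = np.argmax(np.abs(f_h[1:])**2) + 1        # skip the zero mode
period_hours = 1.0 / freq_vec[k_peak] / 3600.0
print("period of the largest peak (hours):", period_hours)
```

The largest peak should come out at a period of about 12 hours, which is the same kind of sanity check as marking the M2 tide on the spectrum.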

So far so good, but maybe we want things in larger units than seconds. Let's try it in units of days.

In [None]:
# convert the dates from strings to something like numbers so we can do subtractions on
t0 = datetime.strptime(df["Date"].values[ 0], "%Y/%m/%d %H:%M:%S")
tf = datetime.strptime(df["Date"].values[-1], "%Y/%m/%d %H:%M:%S")
L = (tf - t0).total_seconds()  # work out time difference, and convert from days + seconds to just seconds

L /= (24 * 3600)  # from s to day, since 24 * 60 * 60 seconds in one day

scale_factor = 2.0*np.pi / L

k_vec = np.arange(len(f_h), dtype=np.float64)
k_vec[0] = np.nan                 # just going to NaN this and not bother with the zero mode
k_mod = k_vec * scale_factor      # properly scaled wavenumber for the choice of L
freq_vec = k_mod / (2.0*np.pi)    # frequency from the scaled wavenumber
peri_vec = 2.0*np.pi / k_mod      # period    from the scaled wavenumber

# M2 semi-diurnal tide is twice daily
M2_freq = 2

# raw plot of spectrum to see shape
fig = plt.figure(figsize=(12, 3))
ax = plt.subplot(1, 2, 1)
ax.loglog(freq_vec, abs(f_h)**2)
ax.plot([M2_freq, M2_freq], [1e-5, 1e7], 'k--', alpha=0.7)  # plot the expected M2 tide frequency as well
ax.set_xlabel(r"$\mathsf{f}\ (\mathrm{day}^{-1})$")
ax.set_ylabel(r"$|f_h|^2$")
ax.grid()

# add the tick in
xt = ax.get_xticks() 
xt = np.append(xt, M2_freq)
xtl= xt.tolist()
xtl[-1]=r"M2"
ax.set_xticks(xt)
ax.set_xticklabels(xtl)
ax.set_xlim([1e-2, 1e2]);

# 1 / frequency is the period
M2_peri = 1 / M2_freq

ax = plt.subplot(1, 2, 2)
ax.loglog(peri_vec, abs(f_h)**2)
ax.plot([M2_peri, M2_peri], [1e-5, 1e7], 'k--', alpha=0.7)  # plot the expected M2 tide period as well
ax.set_xlabel(r"$T\ (\mathrm{day})$")
ax.grid()

# add the tick in
xt = ax.get_xticks() 
xt = np.append(xt, M2_peri)
xtl= xt.tolist()
xtl[-1]=r"M2"
ax.set_xticks(xt)
ax.set_xticklabels(xtl)
ax.set_xlim([1e-2, 1e2]);

So the principal lunar semi-diurnal (i.e. twice daily) tide M2 lies very close to the largest peak, as it should, because the M2 signal is the dominant contribution to tides on Earth.

> <span style="color:red">**Q.**</span> The eagle-eyed among you may notice that the place where I marked the M2 tide does not coincide with the peak exactly. One reason is that for simplicity above I took the M2 tide to be twice daily, but in reality its period is a bit more than 12 hours. Fix the value of the M2 frequency (or equivalently the period) and see if the agreement is better.

> <span style="color:red">**Q.**</span> Look up the other astronomical tidal modes (K1, S2, O1 etc.; see e.g. OCES 2003, lec 18 slides) and plot some of those on the graph too.

> <span style="color:red">**Q.**</span> (Coding) Like above, but instead bully the computer into picking out, say, the periods associated with the 10 largest peaks in the power spectrum, and compare these side by side with the 10 largest tidal modes we expect.

> <span style="color:red">**Q.**</span> Consider doing some filtering of the signals based on Fourier modes.

> <span style="color:red">**Q.**</span> (Harder, coding) As in the above two tasks, but filter the signal such that you only retain the 10 modes with the largest amplitudes.

> <span style="color:red">**Q.**</span> (Slightly different, not really Fourier series related.) Using the codes you already have, try and reproduce the following graph (you can ignore the formatting somewhat). The plots are of *daily max and mins of sea level* (i.e. max or min of sea level over a particular day). The graph shows the presence of **spring tides** and **neap tides**.

<img src="https://i.imgur.com/JzBm8ok.png" width="800" alt='springs and neaps'>

We will come back to Fourier analysis again when dealing with spatial data in *10_fun_with_maps*.

----------------
# More involved exercises with this notebook

## 1) El Nino 3.4 data (again)

Apply what you learnt so far to the El Nino 3.4 data (probably start with SST first). The code to read the SST data and generate the associated time array has been done for you below. A few things to remember and to try are:

* The time units here are in *years* because of how I made the time array (see *07_time_series*).
* Since the data series is long, there is a global warming trend in the signal that you probably need to deal with.
* El Nino normally recurs every 2 to 7 *years*.
* Consider filtering out anything that is outside of the El Nino range to see what might be the associated El Nino signal on SST / chlorophyll / phytoplankton / OLR.
* There are some signals outside of the El Nino range that are physical (e.g. the 1 year one, the 0.5 year one, 10-20 year ones); you may want to speculate what those are.
* Consider filtering in physical space first, before doing spectrum analysis, or vice-versa.
* Consider windowing chunks of the signal and computing the power spectrum of the windowed signal. Now that you have power spectrum samples over different chunks of time, consider doing statistics on these chunks in time (e.g. accumulate the spectra, or compute some sort of "average" spectrum). Maybe even cook up some hypothesis tests (cf. *05/06_statistical_tests*).
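
To get going with the detrending and band-filtering suggestions above, here is a rough sketch on a made-up monthly series (a linear trend plus a 4-year cycle plus an annual cycle), not on the El Nino 3.4 data itself. The series, the straight-line detrending via `np.polyfit`, and the hard cut-off in Fourier space are all choices made for illustration; other detrending and filtering choices are equally reasonable.

```python
import numpy as np

# made-up monthly series (time in years): trend + 4-year cycle + annual cycle
dt = 1.0 / 12.0
N = 64 * 12
t = 1950.0 + np.arange(N) * dt
enso_like = np.sin(2.0 * np.pi * t / 4.0)
f = 0.01 * (t - t[0]) + enso_like + 0.5 * np.sin(2.0 * np.pi * t)

# deal with the trend first: least squares straight-line fit, then subtract
coeffs = np.polyfit(t - t[0], f, 1)
f_detrend = f - np.polyval(coeffs, t - t[0])

# frequencies in cycles per year; keep only periods between 2 and 7 years
freq = np.fft.rfftfreq(N, d=dt)
f_h = np.fft.rfft(f_detrend)
keep = (freq >= 1.0 / 7.0) & (freq <= 1.0 / 2.0)
f_band = np.fft.irfft(np.where(keep, f_h, 0.0), n=N)

rms_err = np.sqrt(np.mean((f_band - enso_like)**2))
print("rms difference from the 4-year cycle:", rms_err)
```

The band-passed signal should be close to the 4-year cycle that was put in, with the annual cycle and most of the trend removed; the small leftover difference comes from the detrending not being perfect.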

In [None]:
# !wget https://raw.githubusercontent.com/julianmak/OCES3301_data_analysis/main/elnino34_sst.data

with open("elnino34_sst.data", "r") as f:
    elnino34_txt = f.readlines()
elnino34_txt = elnino34_txt[3:-4]
for k in range(len(elnino34_txt)):
    elnino34_txt[k] = elnino34_txt[k].strip("\n")

elnino34_txt[0].split()

elnino34_sst = []
for k in range(len(elnino34_txt)):           # this is the new elnino34_txt after stripping out some lines
    dummy = elnino34_txt[k].split()          # split out the entries per line
    for i in range(1, len(dummy)):           # cycle through the dummy list but skip the first entry
        elnino34_sst.append(float(dummy[i])) # turn string into a float, then add to list

elnino34_sst = np.array(elnino34_sst)

t = np.linspace(1950, 2019, len(elnino34_sst), endpoint=False)