# Dealing with Noise

Taking derivatives is particularly challenging in the presence of noise, because the peaky, pointy nature of random variations can cause dramatic local deviations in a function's slope.

## Noise and Signal are Typically Frequency-Separable

> "Every spectrum of real noise falls off reasonably rapidly as you go to infinite frequencies, or else it would have infinite energy. But the sampling process aliases higher frequencies in lower ones, and the folding ... tends to produce a flat spectrum. ... *white noise*. The signal, usually, is mainly in the lower frequencies."  
--Richard Hamming, *The Art of Doing Science and Engineering*, Digital Filters III

We expect our sampling rate to be fast enough to capture the fluctuations of interest in the data, i.e. at least the Nyquist rate (twice the highest frequency of interest), so most signal energy is expressed by relatively low frequencies. Noise causes fluctuations in each sample, which ties its bandwidth to the sampling rate: the faster we sample, the wider the band of frequencies across which the noise's spectrum is smeared, hopefully putting most of its mass at *higher frequency* than the data. Thus we can separate data from noise in the frequency domain!
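To make the separation concrete, here is a minimal sketch that compares where a smooth signal and per-sample noise put their spectral energy. The test signal, noise level, sample count, and the `cutoff` index are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n = 1024                         # number of samples (assumed)
t = np.arange(n) / n             # one unit of time, uniformly sampled

# A slowly varying signal and independent per-sample (white) noise.
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)
noise = 0.2 * rng.standard_normal(n)

# Power in each Fourier mode, from DC up to the Nyquist frequency.
signal_power = np.abs(np.fft.rfft(signal)) ** 2
noise_power = np.abs(np.fft.rfft(noise)) ** 2

cutoff = 16  # hypothetical boundary between "low" and "high" modes
signal_low_fraction = signal_power[:cutoff].sum() / signal_power.sum()
noise_low_fraction = noise_power[:cutoff].sum() / noise_power.sum()

print(f"signal energy below cutoff: {signal_low_fraction:.3f}")
print(f"noise energy below cutoff:  {noise_low_fraction:.3f}")
```

Nearly all of the signal's energy should land below the cutoff, while only a small share of the noise's energy does, because the noise is spread roughly evenly over all 513 modes.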

*All noise reduction methods are at bottom low-pass filters*: averaging, Kalman filtering, etc.

## Filtering with the Fourier Basis

A classic FIR or IIR low-pass filter from signal processing, like a Butterworth filter, works "causally" from one end of the signal toward the other, damping higher frequencies using only *local* past and present samples: it takes a weighted combination of a few recent input values and, in the IIR case, adds a weighted combination of a few past output values.
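As a sketch of that recurrence, here is the simplest causal IIR low-pass filter I can think of: a one-pole exponential smoother. It is not a Butterworth design, and the function name `one_pole_lowpass` and the smoothing factor `alpha` are illustrative assumptions.

```python
import numpy as np

def one_pole_lowpass(x, alpha):
    """Causal smoother: y[i] = alpha * x[i] + (1 - alpha) * y[i - 1].

    Each output uses only the present input and the previous output,
    so no future samples are ever consulted.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]  # assumed initial condition: start at the first sample
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
    return y

# A rapidly alternating input is strongly damped; a constant one passes through.
fast = np.array([1.0, -1.0] * 50)
slow = np.ones(100)
fast_out = one_pole_lowpass(fast, alpha=0.1)
slow_out = one_pole_lowpass(slow, alpha=0.1)
```

Smaller `alpha` leans more on past outputs, which lowers the cutoff but also makes the output lag further behind the input.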

But if we have the whole history of a signal, then there is no need to constrain ourselves locally; we can transform the entire thing to a Fourier basis representation, where modes correspond to frequencies. Hence a common noise removal strategy is:

1. FFT the signal
2. Zero out higher Fourier modes/coefficients
3. IFFT to recover the filtered signal
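
The three steps above can be sketched with NumPy's real-input FFT. The helper name `fft_lowpass`, the test signal, and the choice of `keep=16` retained modes are assumptions for illustration; in practice the cutoff has to be chosen from what we know about the data.

```python
import numpy as np

def fft_lowpass(x, keep):
    """Zero all but the `keep` lowest Fourier modes of a real signal."""
    coeffs = np.fft.rfft(x)                 # 1. FFT the signal
    coeffs[keep:] = 0                       # 2. zero out the higher modes
    return np.fft.irfft(coeffs, n=len(x))   # 3. IFFT back to sample space

rng = np.random.default_rng(0)
t = np.arange(1024) / 1024
clean = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)
noisy = clean + 0.2 * rng.standard_normal(t.size)
filtered = fft_lowpass(noisy, keep=16)
```

Note that this test signal is exactly periodic on its domain, which is the friendly case for a Fourier basis.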

This achieves an *idealized* low-pass filter with a perfectly sharp cutoff, as opposed to a causal filter, whose response instead rolls off gradually, e.g. at 20 dB/decade for a first-order filter.
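
For contrast, here is a small sketch that evaluates the gradual rolloff of a causal filter. It uses the frequency response of a first-order recurrence, $y_i = \alpha x_i + (1-\alpha) y_{i-1}$, which is my stand-in for "a causal filter" rather than any particular design from the text; `alpha` and the two probe frequencies are assumptions.

```python
import numpy as np

def one_pole_gain_db(freq, alpha):
    """Gain in dB of y[i] = alpha*x[i] + (1-alpha)*y[i-1] at `freq` cycles/sample."""
    w = 2 * np.pi * freq
    h = alpha / (1 - (1 - alpha) * np.exp(-1j * w))  # transfer function on the unit circle
    return 20 * np.log10(np.abs(h))

alpha = 0.001  # hypothetical smoothing factor; the corner sits near alpha / (2*pi) cycles/sample

# Compare the gain at two frequencies a decade apart, both well above the corner
# and well below the Nyquist frequency (0.5 cycles/sample).
drop_per_decade = one_pole_gain_db(0.001, alpha) - one_pole_gain_db(0.01, alpha)
print(f"attenuation over one decade: {drop_per_decade:.1f} dB")
```

The drop should come out close to 20 dB, so frequencies just above the cutoff are only mildly attenuated, whereas zeroing Fourier modes removes them entirely.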

## Connection to Error Correcting Codes

Because a spectral representation builds a function out of basis functions that span the entire domain, the value at every point takes the entire domain into consideration. This makes the reconstruction much more *robust* to perturbations than one that uses only a few neighboring points. That is, it's much harder to *corrupt* the signal so thoroughly that it can't be successfully recovered.

This is analogous to [error-correcting codes](https://www.youtube.com/watch?v=X8jsijhllIA), except that in the context of continuous signals, "corruption" means (discrete representations of) continuous numbers have slightly inaccurate values, whereas in error-correcting codes each datum is a *single discrete bit* which can simply be *flipped*. But notice the bits corresponding to each subsequent parity check of a Hamming Code correspond to "higher-frequency" selections:

<img src="hamming.png" width=500 />

From Richard Hamming's book, *The Art of Doing Science and Engineering*, with my drawings in the margins.

In a spectral method, the coefficients say "You need to add in this much of the $k^{th}$ basis function," but in error-correcting codes the analogous parity checks say "There is/isn't a parity error among my bits." In both cases a spot-corruption will stand out, because it appears in a particular combination of parity checks or introduces a value that can't be as easily/smoothly represented with a finite combination of basis functions.
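
Here is a minimal sketch of the parity-check side of the analogy, assuming the standard (7,4) Hamming layout with parity bits at positions 1, 2, and 4; the figure may use a different size or ordering. The function names are hypothetical. Parity check `k` covers every position whose binary index has bit `k` set, so the coverage masks alternate with periods 2, 4, and 8, much like square waves of different frequencies.

```python
def parity_mask(k, n_bits=7):
    """Positions 1..n_bits covered by parity check k (those with bit k set)."""
    return [pos for pos in range(1, n_bits + 1) if (pos >> k) & 1]

def encode(data):
    """Place 4 data bits at positions 3, 5, 6, 7 and fill parity bits 1, 2, 4."""
    word = [0] * 7
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos - 1] = bit
    for k in range(3):
        parity = 0
        for pos in parity_mask(k):
            parity ^= word[pos - 1]
        word[(1 << k) - 1] = parity  # make check k come out even
    return word

def syndrome(word):
    """Return 0 for a clean word, else the position of a single flipped bit."""
    s = 0
    for k in range(3):
        parity = 0
        for pos in parity_mask(k):
            parity ^= word[pos - 1]
        s |= parity << k  # failing checks spell out the position in binary
    return s

word = encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[5 - 1] ^= 1  # flip the bit at position 5
print(parity_mask(0), parity_mask(1), parity_mask(2))
print(syndrome(word), syndrome(corrupted))
```

The flipped bit shows up as a particular combination of failed checks, just as a spot-corruption in a signal shows up as a particular combination of basis coefficients.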

## Filtering with More Exotic Bases

To avoid the Gibbs phenomenon or achieve better compression (more energy represented in fewer modes), we may prefer to represent a signal in a basis of Chebyshev polynomials, PCA modes via SVD, wavelets, etc.

All bases have "lower frequency" and "higher frequency" elements, meaning some modes have fewer transitions between low and high values, and some have more. As in Fourier basis representations, signal energy empirically tends to cluster in lower modes, and noise tends to be scattered across modes.
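
As a sketch of filtering in one such basis, we can keep only the lowest Chebyshev modes of a noisy, non-periodic signal. This version uses a least-squares fit on uniformly spaced samples rather than a discrete Chebyshev transform, and the test function, noise level, and `degree` are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 400)
clean = np.exp(x) * np.sin(3 * x)                  # smooth, non-periodic test function
noisy = clean + 0.1 * rng.standard_normal(x.size)

degree = 10                                        # hypothetical: retain modes T_0 .. T_10
coeffs = C.chebfit(x, noisy, degree)               # least-squares Chebyshev coefficients
filtered = C.chebval(x, coeffs)                    # rebuild from the low modes only

def rms(v):
    return np.sqrt(np.mean(v ** 2))

print(f"error before: {rms(noisy - clean):.3f}, after: {rms(filtered - clean):.3f}")
```

Because the test function is smooth, a handful of low modes captures it well, and most of the noise lives in the discarded higher modes.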

However, we have to be a bit more conscientious when using these modes to filter noise: If a basis function is higher-frequency over part of its domain, as Chebyshev polynomials are toward the edges of $[-1, 1]$, then that basis function is better at representing high-frequency noise in those regions, and we don't get band-separation quite as cleanly as with a uniform basis like Fourier.
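
One way to see this non-uniformity is to look at where a Chebyshev polynomial crosses zero: the crossings bunch together near $\pm 1$, so the function oscillates faster there. A minimal sketch, with the order `n = 20` chosen arbitrarily:

```python
import numpy as np

n = 20  # hypothetical polynomial order
k = np.arange(1, n + 1)
# Zeros of the Chebyshev polynomial T_n, sorted from -1 toward +1.
zeros = np.sort(np.cos((2 * k - 1) * np.pi / (2 * n)))
gaps = np.diff(zeros)

edge_gap = gaps[0]                  # spacing between the two zeros nearest x = -1
center_gap = gaps[len(gaps) // 2]   # spacing between the two zeros straddling x = 0
print(f"edge gap: {edge_gap:.4f}, center gap: {center_gap:.4f}")
```

The gap at the edge is several times smaller than the gap at the center, which is why a given Chebyshev mode can absorb higher-frequency noise near the boundaries than it can in the middle of the domain.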