## Preamble
This is designed as a codified version of my notes on Bayesian inference (BI), and to demonstrate Python implementations of different noise models as likelihood functions. Useful references:

[GW BI Introduction Paper](https://arxiv.org/pdf/1809.02293)

[LIGO Noise Specs (some more good BI here)](https://arxiv.org/pdf/1908.11170)

In [None]:
import plotfancy as pf
import numpy as np
pf.housestyle_rcparams()


## General Gaussian Case ##

In the most general case, we relate our observed data $\textbf{X}$ to some model prediction $\boldsymbol{\mu}(\boldsymbol{\theta})$, based on model parameters $\boldsymbol{\theta}$, such that
\begin{equation}
\textbf{X} = \boldsymbol{\mu}(\boldsymbol{\theta}) + \boldsymbol{\mathscr{N}}
\end{equation}
for some noise vector $\boldsymbol{\mathscr{N}}$. The most general Gaussian case models $\boldsymbol{\mathscr{N}}$ as a multivariate Gaussian: a vector whose elements are correlated, with the correlations encoded in a covariance matrix $\mathscr{C}$ that holds the variance of each element on its diagonal and the covariance between each pair of elements off the diagonal,
\begin{equation}
\mathscr{C} =
\begin{bmatrix}
\sigma^2(1) & \sigma(1,2) & ... & \sigma(1,K) \\
\sigma(2,1) & \sigma^2(2) & ... & \sigma(2,K) \\ 
... & ... & ... & ... \\
\sigma(K,1) & \sigma(K,2) & ... & \sigma^2(K)
\end{bmatrix}
\end{equation}
where $K = \texttt{dim}(\textbf{X})$ and $\mathscr{C}_{i,j}=\sigma(i,j)$ is the covariance between elements $i$ and $j$, for $i,j \in \{1,\dots,K\}$ (so that $\sigma(i,i) = \sigma^2(i)$). We establish the notation convention here that $\textbf{X}$, $\boldsymbol{\mu}(\boldsymbol{\theta})$, and $\boldsymbol{\mathscr{N}}$ are *frequency-domain* functions for the data, model and noise hereafter, unless explicitly noted otherwise as time-series.
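
As a minimal sketch of what $\mathscr{C}$ contains, assuming we have many independent realisations of the noise vector stacked as rows of an array, the sample covariance can be built by hand and checked against NumPy's `np.cov`. The toy correlated noise (white noise mixed by a fixed matrix `A`, so the true covariance is `A @ A.T`) is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
K, nSamples = 4, 20000  # illustrative sizes: K bins, nSamples noise realisations

# Toy correlated noise: mix independent unit normals with a fixed matrix A,
# so the true covariance is A @ A.T (an assumption made only for this demo)
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.3, 1.0, 0.0],
              [0.0, 0.0, 0.2, 1.0]])
noiseSamples = rng.normal(size=(nSamples, K)) @ A.T  # each row is one realisation

# Sample covariance C_ij = sigma(i, j), normalised by nSamples - 1
cen = noiseSamples - noiseSamples.mean(axis=0)
cov = cen.T @ cen / (nSamples - 1)

# Same estimate from NumPy (rowvar=False: columns are the variables)
assert np.allclose(cov, np.cov(noiseSamples, rowvar=False))

# Diagonal holds the variances sigma^2(i); off-diagonals the covariances
print(np.round(cov, 2))
print(np.round(A @ A.T, 2))  # true covariance, for comparison
```

The off-diagonal entries are what make the general case expensive: they couple the bins, so the likelihood cannot be factorised bin by bin.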


The likelihood is by definition $\mathbb{P}(\textbf{X}|\boldsymbol{\mu})$. Since the first equation gives $\textbf{X} - \boldsymbol{\mu}(\boldsymbol{\theta}) = \boldsymbol{\mathscr{N}}$, it is equivalent to the noise distribution $\mathbb{P}_{\boldsymbol{\mathscr{N}}}$ evaluated at the residual:
\begin{equation}
\mathbb{P}(\textbf{X}|\boldsymbol{\mu}) =  \mathbb{P}_{\boldsymbol{\mathscr{N}}}(\textbf{X} - \boldsymbol{\mu}(\boldsymbol{\theta})) = \mathbb{P}_{\boldsymbol{\mathscr{N}}}(\boldsymbol{\mathscr{N}})
\end{equation}

Evaluating this in the correlated case is non-trivial: the elements of $\boldsymbol{\mathscr{N}}$ are not independent, so we cannot simply multiply per-element probabilities. Instead we get a multivariate Gaussian of the form

\begin{equation}
\mathbb{P}_{MG} = \frac{1}{[(2\pi)^K\textnormal{det}(\mathscr{C})]^{1/2}} \exp \bigg[ -\frac{1}{2} [\textbf{X} - \boldsymbol{\mu}(\boldsymbol{\theta})]^T \mathscr{C}^{-1} [\textbf{X} - \boldsymbol{\mu}(\boldsymbol{\theta})] \bigg]
\end{equation}

We might implement this most general case as follows:

In [None]:
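# global params #
fr = 100        # the number of frequency bins we'd have
nSamples = 1000 # the number of noise profiles observed (illustrative; must exceed fr for an invertible covariance)
rng = np.random.default_rng(0)

# FIRST FIND COVARIANCE - Observing to create nSamples frequency profiles of the noise #
noiseSamples = rng.normal(size=(nSamples, fr)) # Dummy noise profiles, one per row (all-ones would give a singular covariance)
sampleMean = np.mean(noiseSamples, axis=0)
cen = noiseSamples - sampleMean
cov = (cen.T @ cen) / (nSamples - 1) # unbiased sample covariance, shape (fr, fr)

# We might then test this against some data... #

data = np.ones(fr)
model = np.ones(fr) # Dummy arrays we might observe from an experiment.
n = data - model    # the noise realisation implied by this model

# Work in logs: det(cov) of a 100x100 matrix easily under/overflows
sign, logdet = np.linalg.slogdet(cov)
chi2 = n @ np.linalg.solve(cov, n) # n^T C^-1 n without forming the inverse
logLikelihood = -0.5 * (chi2 + logdet + fr * np.log(2 * np.pi))
likelihood = np.exp(logLikelihood)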
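
Working in log space with a Cholesky factorisation is a common, numerically safer way to evaluate the same expression, since it avoids forming $\mathscr{C}^{-1}$ or $\det(\mathscr{C})$ explicitly. The sketch below assumes real-valued residuals and a symmetric positive-definite covariance; the function name `gaussian_log_likelihood` is hypothetical, not from any library.

```python
import numpy as np

def gaussian_log_likelihood(data, model, cov):
    """Log of the multivariate Gaussian likelihood P(X | mu) for real vectors.

    Hypothetical helper: assumes `cov` is symmetric positive definite.
    """
    resid = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    K = resid.size
    L = np.linalg.cholesky(cov)                # cov = L @ L.T, L lower triangular
    z = np.linalg.solve(L, resid)              # z = L^-1 r, so z @ z = r^T C^-1 r
    chi2 = z @ z
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log det(C) = 2 * sum(log L_ii)
    return -0.5 * (chi2 + logdet + K * np.log(2.0 * np.pi))

# Cross-check against the direct formula on a small, well-conditioned example
rng = np.random.default_rng(1)
K = 5
B = rng.normal(size=(K, K))
cov = B @ B.T + K * np.eye(K)                  # symmetric positive definite by construction
data, model = rng.normal(size=K), np.zeros(K)

r = data - model
direct = np.log(
    np.exp(-0.5 * r @ np.linalg.inv(cov) @ r)
    / np.sqrt((2 * np.pi) ** K * np.linalg.det(cov))
)
print(gaussian_log_likelihood(data, model, cov), direct)
```

For small, well-conditioned matrices both routes agree; the factorised version is the one that stays finite as the number of bins grows.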

### We can, however, simplify in special cases:

The next simplest case is stationary noise, whose statistical properties do not change over time. Using $N(\mu,\sigma^2)$ to denote a normal distribution, we express this case as noise following
\begin{equation}
\boldsymbol{\mathscr{N}}(t) \sim
\begin{bmatrix}
N(0,\sigma^2_1) \\ N(0,\sigma^2_2) \\ ... \\ N(0,\sigma^2_K)
\end{bmatrix}
\end{equation}
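
A quick numerical illustration of stationary noise and its Fourier transform, under assumptions chosen only for the demo: we generate stationary, autocorrelated time-series noise by smoothing white noise with a short moving-average filter, take the FFT of many independent segments, and check that in a given bin the real and imaginary parts are uncorrelated with roughly equal variance. Averaging $|\texttt{fft}|^2$ over segments gives an unnormalised PSD estimate; the factors relating it to a one-sided PSD in physical units (sampling rate, segment length) are deliberately left out.

```python
import numpy as np

rng = np.random.default_rng(7)
nSeg, segLen = 4000, 64       # illustrative: number of segments, samples per segment
kernel = np.ones(4) / 4       # moving average -> stationary, autocorrelated (low-pass) noise

# Each row is one independent time-series segment of coloured noise
white = rng.normal(size=(nSeg, segLen + kernel.size - 1))
timeNoise = np.array([np.convolve(w, kernel, mode="valid") for w in white])

freqNoise = np.fft.rfft(timeNoise, axis=1)  # complex, shape (nSeg, segLen // 2 + 1)

k = 5                         # an arbitrary bin away from DC and Nyquist
re, im = freqNoise[:, k].real, freqNoise[:, k].imag
print("Re/Im correlation:", np.corrcoef(re, im)[0, 1])  # close to 0
print("Var(Re), Var(Im):", re.var(), im.var())          # approximately equal

# Unnormalised PSD estimate: average |FFT|^2 over segments, per frequency bin
psdEstimate = np.mean(np.abs(freqNoise) ** 2, axis=0)
print(psdEstimate[[2, 8, 14]])  # decreasing with frequency for this low-pass filter
```

In practice one would typically use an established estimator such as Welch's method rather than this bare averaged periodogram.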

After the Fourier transform the noise becomes complex-valued. We assume (which is easy to verify by plotting the real parts against the imaginary parts and checking their correlation) that the real and imaginary parts are independent random variables, such that

\begin{equation}
\boldsymbol{\mathscr{N}} = \mathfrak{Re}(\boldsymbol{\mathscr{N}}) + i\mathfrak{Im}(\boldsymbol{\mathscr{N}}),
\end{equation}
or, more explicitly,
\begin{equation}
\boldsymbol{\mathscr{N}}(f) \sim
\begin{bmatrix}
N(0,\sigma^2_1) \\ N(0,\sigma^2_2) \\ ... \\ N(0,\sigma^2_K)
\end{bmatrix}
+ i
\begin{bmatrix}
N(0,\sigma^2_1) \\ N(0,\sigma^2_2) \\ ... \\ N(0,\sigma^2_K)
\end{bmatrix}
\end{equation}

in the frequency domain. Note that the real and imaginary parts must be thought of as independently sampled from their respective distributions $N(0,\sigma^2_i)$: this is **not** $\mathscr{N}_i=N(0,\sigma^2_i)(1+i)$. This distribution has a diagonal covariance matrix, and frequency-domain noise of this kind corresponds to autocorrelated noise in the time domain, with a correlation function $\textbf{C}(t_i-t_j)$ that depends only on the time lag. We identify $\sigma^2_i$ with the power spectral density $S_n(f_i)$ at frequency bin $f_i$, up to a normalisation constant fixed by the FFT and one-sided PSD conventions. $S_n(f_i)$ is estimated from the squared magnitude $|\texttt{fft}(\boldsymbol{\mathscr{N}}(\tau))|^2$, where $\boldsymbol{\mathscr{N}}(\tau)$ is a *time-series* noise segment of duration $\tau$, averaged over many such segments. For this simplified covariance matrix 

\begin{equation}
\mathscr{C} =
\begin{bmatrix}
\sigma^2_1 & 0 & ... & 0 \\
0 & \sigma^2_2 & ... & 0 \\ 
... & ... & ... & ... \\ 
0 & 0 & ... & \sigma^2_K
\end{bmatrix}
\end{equation}

the determinant of this matrix is then $\textnormal{det}(\mathscr{C}) = \prod^K_i \sigma^2_i$ and the quadratic form in the exponent reduces to $-\frac{1}{2}\sum^K_i [X-\mu(\boldsymbol{\theta})]^2_i/\sigma^2_i$, giving a simplified likelihood of
\begin{equation}
\mathbb{P}_{\textnormal{stat}} = \frac{1}{[\prod^K_i 2\pi\sigma^2_i]^{1/2}} \exp \bigg[ -\frac{1}{2} \sum^K_i \frac{[X-\mu(\boldsymbol{\theta})]^2_i}{\sigma^2_i} \bigg]
\end{equation}
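
As a sanity check on this reduction, a diagonal covariance built from some per-bin variances (random values here, purely illustrative, with a real-valued residual) gives the same number from the general multivariate formula and from the separable product of one-dimensional Gaussians, with $\det(\mathscr{C}) = \prod_i \sigma_i^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 6
sigma2 = rng.uniform(0.5, 2.0, size=K)     # illustrative per-bin variances sigma_i^2
resid = rng.normal(scale=np.sqrt(sigma2))  # one real-valued residual X - mu(theta)
cov = np.diag(sigma2)

# General multivariate Gaussian with a (diagonal) covariance matrix
general = np.exp(-0.5 * resid @ np.linalg.inv(cov) @ resid) / np.sqrt(
    (2 * np.pi) ** K * np.linalg.det(cov)
)

# Separable form: product of independent 1D Gaussians, one per bin
separable = np.prod(np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2 * np.pi * sigma2))

print(general, separable)
print(np.linalg.det(cov), np.prod(sigma2))  # det(C) = prod sigma_i^2
```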

Because this is now a product of independent one-dimensional Gaussians, taking the logarithm turns it into a sum, giving the *log likelihood*
\begin{equation}
\log[\mathbb{P}_{\textnormal{stat}}(\textbf{X}|\boldsymbol{\theta})] = -\sum^K_i\bigg[\frac{1}{2} \frac{[X-\mu(\boldsymbol{\theta})]^2_i}{\sigma^2_i} + \frac{1}{2} \log[{2\pi\sigma_i^2}]\bigg]
\end{equation}
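
A minimal sketch of this log likelihood as a function of per-bin variances, with a flag to drop the normalisation term. The names `stationary_log_likelihood` and `include_norm` are hypothetical. For complex frequency-domain residuals it assumes, as above, that the real and imaginary parts are independent with variance $\sigma_i^2$ each, so each bin contributes two Gaussians. The usage part, with an illustrative signal and random variances, compares a signal model against a noise-only model for the same data.

```python
import numpy as np

def stationary_log_likelihood(data, model, sigma2, include_norm=True):
    """log P_stat(X | theta) for independent Gaussian bins (hypothetical helper).

    Complex inputs are treated as having independent real and imaginary parts,
    each with variance sigma2 in its bin.
    """
    resid = np.asarray(data) - np.asarray(model)
    nParts = 2 if np.iscomplexobj(resid) else 1  # Re and Im each contribute a Gaussian
    logL = -0.5 * np.sum(np.abs(resid) ** 2 / sigma2)
    if include_norm:
        logL -= 0.5 * nParts * np.sum(np.log(2 * np.pi * sigma2))
    return logL

rng = np.random.default_rng(5)
K = 50
sigma2 = rng.uniform(0.5, 2.0, size=K)           # illustrative PSD values per bin
signal = np.sin(np.linspace(0, 3, K))            # illustrative model mu(theta)
data = (signal + rng.normal(scale=np.sqrt(sigma2))
        + 1j * rng.normal(scale=np.sqrt(sigma2)))

# Log likelihood ratio: signal model vs. noise-only model (mu = 0)
withNorm = (stationary_log_likelihood(data, signal, sigma2)
            - stationary_log_likelihood(data, 0.0, sigma2))
withoutNorm = (stationary_log_likelihood(data, signal, sigma2, include_norm=False)
               - stationary_log_likelihood(data, 0.0, sigma2, include_norm=False))
print(withNorm, withoutNorm)  # equal up to rounding: the normalisation term cancels
```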

Note that in the literature the $\frac{1}{2} \log[{2\pi\sigma_i^2}]$ term is often omitted: it does not depend on $\boldsymbol{\theta}$, so it cancels when log likelihoods computed with the same noise model are subtracted, for example to form a (log) Bayes factor. With the exception of *glitches* (short-duration transients) and *adiabatic drift* (minute-to-hour drifts in the power spectrum), the noise at LIGO can be approximated as stationary. We might model this case as: