# Notes on Statistical Inferencing

### Probability
* __Multivariate Gaussian density__ (the $N$-dimensional generalization of the single-variable Gaussian PDF, $p(x) \propto \exp \left[ -(x-\mu)^2 / 2\sigma^2 \right]$)
    <blockquote>$$
    p(\vec{x}) \propto \exp \left [ - \frac{1}{2} (\vec{x} -
        \vec{\mu})^\mathrm{T} \, \Sigma ^{-1} \, (\vec{x} - \vec{\mu})
        \right ]$$

    where $\vec{\mu}$ is an $N$-dimensional vector position of the mean of the density and $\Sigma$ is the square N-by-N covariance matrix.
    </blockquote>
* __covariance:__ measure of the relationship between two random variables
    * $\sigma(x,y)= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$
* __variance:__ measure of the spread of a sample/population
    * $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (for a sample)
    * Also, the variance can be described as the covariance of a variable with itself, $\sigma(x, x)$
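
The definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the sample values and the helper name `gaussian_unnormalized` are made up for illustration and are not part of the original notes.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Sample covariance with the n - 1 normalization used above.
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Sample variance: the covariance of x with itself.
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)

# np.cov returns the 2x2 matrix [[var(x), cov(x, y)], [cov(x, y), var(y)]].
Sigma = np.cov(x, y)


def gaussian_unnormalized(point, mu, Sigma):
    """Unnormalized multivariate Gaussian: exp(-0.5 * d^T Sigma^-1 d)."""
    d = point - mu
    # Solve Sigma z = d instead of forming the inverse explicitly.
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))


mu = np.array([x.mean(), y.mean()])
peak = gaussian_unnormalized(mu, mu, Sigma)  # density evaluated at the mean
```

Here `Sigma[0, 0]` should agree with `var_x` and `Sigma[0, 1]` with `cov_xy`, and the unnormalized density takes its maximum value at the mean.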


### Ways to Interpolate data
* Most basic MCMC: the __Metropolis-Hastings__ algorithm
    * One chain, one walker
    * Likelihood function, priors, and proposal distribution (walker)
        * Data set has certain mean, variance, standard deviation
        * The walker randomly proposes the next point in parameter space to probe
        * Each proposed point is fed to the likelihood function (usually Gaussian) and the priors
            * Each returns a probability for that point
            * These are multiplied together to give the (unnormalized) posterior
    * [More on Metropolis-Hastings](https://github.com/Joseph94m/MCMC/blob/master/MCMC.ipynb)
* __Linear least squares:__ finding the parameters that best fit an overdetermined system of linear equations (more equations than unknowns) by minimizing the sum of squared residuals
    * The least squares solutions of $A\vec{x}=b$ are the solutions of $A^{T}A\vec{x}=A^Tb$ (the normal equations)
    * When $A^TA$ is invertible, the unique least squares solution to $A\vec{x}=b$ is $x^{*}=(A^TA)^{-1}A^Tb$
    * The fitted function takes the form: $f_{\rm{lsfit}}(s) = x_1f_1(s) + \cdots + x_nf_n(s)$
    * __Linear Least Squares Polynomial Fitting:__ a way of finding a polynomial that best fits several coordinates
        * Requires a __Vandermonde matrix__
            * Matrix takes the form:
            $$
            A = \begin{pmatrix}
              1 & x_0 & x_0^2 & \cdots & x_0^{n-1} & x_0^{n}\\
              1 & x_1 & x_1^2 & \cdots & x_1^{n-1} & x_1^{n}\\
              \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
              1 & x_n & x_n^2 & \cdots & x_n^{n-1} & x_n^{n}
            \end{pmatrix}
            $$
            * Function takes the form:
            $$a_{0}+a_{1}x+a_{2}x^{2}+\cdots+a_{n}x^{n} = y$$
            * Input: x, y points
            * Output: fitted polynomial
            * __How to (basic)__:
                * Points: (0, 1), (1, 0), (2/3, 1/2) ($n+1 = 3$ pairs of coordinates for a polynomial of degree $n = 2$)
                * Each row of the matrix takes the form: $a_{0}+a_{1}x+a_{2}x^{2}+\cdots+a_{n}x^{n} = y$
                    * EX: First pair of coordinates would be: $a_{0}0^{0}+a_{1}0^{1}+a_{2}0^{2} = 1$
                    * Forms augmented matrix:
                        * $\begin{pmatrix} 1&0&0&1\\ 1&1&1&0\\ 1&\frac{2}{3}&\frac{4}{9}&\frac{1}{2}\end{pmatrix} \ $ or, more familiarly,  $\ \begin{pmatrix}1&0&0\\ 1&1&1\\ 1&\frac{2}{3}&\frac{4}{9}\end{pmatrix}\begin{pmatrix}a_0\\ a_1\\ a_2\end{pmatrix}=\begin{pmatrix}1\\ 0\\ \frac{1}{2}\end{pmatrix}$
                    * Solution: $y = -\frac{3}{4}x^2-\frac{1}{4}x+1$
    * Code (from [`emcee` notebook](https://emcee.readthedocs.io/en/stable/tutorials/line/)):
```python
import numpy as np

# Assumes x, y, yerr are 1-D arrays of equal length defined earlier (50 points in the tutorial)
A = np.vander(x, 2)  # Vandermonde matrix with columns [x, 1], shape (50, 2)
C = np.diag(yerr * yerr)  # data covariance matrix: squared y-errors on the diagonal (for reference; unused below)
ATA = np.dot(A.T, A / (yerr ** 2)[:, None])  # A^T C^{-1} A, shape (2, 2)
cov = np.linalg.inv(ATA)  # parameter covariance matrix [A^T C^{-1} A]^{-1}
w = np.linalg.solve(ATA, np.dot(A.T, y / yerr ** 2))  # least-squares solution of (A^T C^{-1} A) w = A^T C^{-1} y
print("Least-squares estimates:")
print("m = {0:.3f} ± {1:.3f}".format(w[0], np.sqrt(cov[0, 0])))  # uncertainty is the sqrt of the diagonal of cov
print("b = {0:.3f} ± {1:.3f}".format(w[1], np.sqrt(cov[1, 1])))
```
    * In math notation:
        * $AX=Y$ with solution $w=x^*=[A^T C^{-1} A]^{-1}[A^TC^{-1}Y]$
            * $A$ = Vandermonde matrix
            * $C$ = __covariance matrix__ of the data; multiplying by $C^{-1}$ weights each point by its inverse variance
                * in the code, $C$ is diagonal, with the squared y-errors on the diagonal
            * $[A^T C^{-1} A]^{-1}$ = covariance matrix of the fitted parameters (`cov` in the code): $C(m, b) = \begin{pmatrix}\sigma_{m}^2&\sigma_{mb}\\ \sigma_{mb}&\sigma_{b}^2\end{pmatrix}$
            * $Y$ = vector of observed y values (with error and added fractional uncertainty)
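
The three-point polynomial example above can be checked numerically. This is a minimal sketch, assuming NumPy; it builds the Vandermonde matrix in increasing powers so that the columns match the matrix written in these notes (`np.vander` defaults to decreasing powers), then solves the normal equations.

```python
import numpy as np

# The three points from the worked example: (0, 1), (1, 0), (2/3, 1/2).
x = np.array([0.0, 1.0, 2.0 / 3.0])
y = np.array([1.0, 0.0, 0.5])

# increasing=True gives columns 1, x, x^2, so coeffs = (a_0, a_1, a_2).
A = np.vander(x, 3, increasing=True)

# Normal equations: A^T A a = A^T y.
coeffs = np.linalg.solve(A.T @ A, A.T @ y)

# Cross-check with NumPy's built-in least squares solver.
coeffs_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print("a_0, a_1, a_2 =", coeffs)
```

With three points and three coefficients the fit passes through every point exactly, so the coefficients should match $a_0 = 1$, $a_1 = -\frac{1}{4}$, $a_2 = -\frac{3}{4}$ up to floating-point rounding. With more points than coefficients the same two calls return the best-fit polynomial instead.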



### 9/14 Meeting Notes

* Dr. Andrews/Teague have been working on statistical inferencing of the spectral line model
    * __Statistical inferencing:__ statistical techniques (in this case, Bayesian analysis) that tell us how well the model measures individual parameters ($m_{*}$, $\rm{T}_b$, $\Delta v$, etc.)
        * Posterior probability: the Bayesian probability that the model is accurate, given the data and outside information
            * __Bayes' Theorem:__ $$\Pr ( M \mid D ) = \frac{\Pr(D \mid M) \, \Pr(M)}{\Pr(D)}$$
            * _In words: The probability of the model given the data is equal to the probability of the data given the model (__likelihood__) multiplied by the probability of the model (__prior__) divided by the probability of the data (__Bayesian evidence__)_
                * __likelihood:__ ($\rm{L}$) The probability of the data given the model
                    * statistical inferencing is done in natural log because the probabilities are assumed to be Gaussian, so the natural log of the likelihood is a simple quadratic ($-\chi^2/2$ plus a constant); logs also turn products of probabilities into sums and avoid numerical underflow
                    * Found through __chi-squared analysis__
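                        * $\chi^2=\sum_i (D_i-M_i)^2 \ / \ \sigma_i^2$, where $\sigma_i$ is the noise on data point $i$
                            * roughly, subtract model channel maps from data channel maps
                        * $\ln(\rm{L}) = -\frac{1}{2}\sum_{i} (D_i-M_i)^2 \ / \ \sigma_i^2 + \rm{const} = -\chi^2 \ / \ 2 + \rm{const}$ (for independent Gaussian noise)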
                            * sum over visibilities (for now, all channel maps for data and model)
                * __prior:__ $(\Pr(M))$ the assumptions we make about the model parameters before looking at the data
                    * assumptions like the allowed range of a parameter (e.g. $0-1000 \ \rm{K}$) or a Gaussian distribution for a parameter (e.g. $370 \pm 10 \ \rm{K}$)
                * __Bayesian evidence:__ $(\Pr(D))$ hard to compute, but not necessary
                    * __MCMC algorithms__ do not need this quantity to work
        * In practice, $\Pr ( M \mid D ) \propto \Pr(D \mid M) \, \Pr(M)$
            * Because $\Pr(D)$ is hard to compute, we use __Markov Chain Monte Carlo (MCMC)__, which only needs ratios of the numerator between points in parameter space; $\Pr(D)$ cancels in those ratios, so exact posterior values are never required
            * __Markov Chain Monte Carlo__  uses a __Markov chain__ to sample probability at random points (random walks)
                * A __Monte Carlo__ algorithm is a way of randomly sampling a distribution to estimate the distributions of parameters given a set of observations
                * A __Markov chain__ is a sequence of samples in which each new sample depends only on the current one; the sampler looks at the ratio of probabilities between the proposed point and the current point to determine where to sample next
                    * More specifically (for a symmetric proposal distribution), a proposed point is always accepted if the ratio is greater than 1, and otherwise accepted with probability equal to the ratio
                        * As a result, the chain spends most of its time in regions that contribute significantly to the area of the PDF
                * A variant of MCMC (that will most likely be the algorithm used on models) is the __affine invariant ensemble sampler__ (`emcee` package)
                    * Essentially, an ensemble of walkers (chains) running at once, each proposing moves based on the positions of the other walkers
                        * Can specify number of walkers
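
The ideas above (log-likelihood from $\chi^2$, a prior, and acceptance based on probability ratios) can be put together in a few lines. This is a minimal sketch of a random-walk Metropolis sampler, the symmetric-proposal special case of Metropolis-Hastings, and not the `emcee` ensemble sampler. The fake data, the flat prior range, the step size, and all function names are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical fake data: 200 noisy measurements of one constant parameter.
true_mu, sigma = 3.0, 1.0
data = true_mu + sigma * rng.normal(size=200)


def log_prior(mu):
    # Flat prior on an assumed range; -inf means zero probability outside it.
    return 0.0 if -10.0 < mu < 10.0 else -np.inf


def log_likelihood(mu):
    # ln L = -chi^2 / 2 (up to a constant) for independent Gaussian noise.
    chi2 = np.sum((data - mu) ** 2 / sigma ** 2)
    return -0.5 * chi2


def log_posterior(mu):
    # Unnormalized: Pr(D) is never computed.
    lp = log_prior(mu)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(mu)


def metropolis(log_prob, start, n_steps, step_size):
    chain = np.empty(n_steps)
    current, current_lp = start, log_prob(start)
    for i in range(n_steps):
        # Symmetric Gaussian proposal centered on the current point.
        proposal = current + step_size * rng.normal()
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, ratio), compared in log space.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain


chain = metropolis(log_posterior, start=0.0, n_steps=5000, step_size=0.2)
samples = chain[1000:]  # discard the early steps as burn-in
print("posterior mean:", samples.mean(), "posterior std:", samples.std())
```

Only the difference `proposal_lp - current_lp` enters the acceptance test, which is why the evidence $\Pr(D)$ never has to be evaluated. For this toy problem the posterior mean should land near the sample mean of the data, with a width close to $\sigma / \sqrt{200}$.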

### 9/21 Notes
* Code that provides an output for 'fake data' `analyze_fit_emcee.py`
    * Goal: do we recover what we put in? We know the truth for the parameters, but does emcee handle everything correctly?
    * Fake data is generated with _no noise_ (meaning that the typical noise from an interferometer, in the form of Gaussians, is not accounted for, only vague uncertainties)
    * Code is slow on two accounts:
        * Making a cube to store data
            * __broadcasting__ arrays: handling arrays of different sizes for computation (NumPy)
                * e.g. matrix * scalar number (larger than 1)
            * computationally taxing
        * Post processing/interpolation for 1e6 different arrays for the likelihood calculation
        * It is possible to make it faster using Numba
            * Basically takes Python code and makes it faster by compiling it to optimized machine code
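
As a small illustration of the broadcasting mentioned above, the sketch below builds a cube from a 1-D array and a 2-D map without an explicit loop. The cube dimensions and array names are hypothetical and are not taken from `analyze_fit_emcee.py`.

```python
import numpy as np

# Hypothetical cube dimensions: (channels, y pixels, x pixels).
n_chan, ny, nx = 4, 3, 5
velocities = np.linspace(-1.5, 1.5, n_chan)  # one value per channel
image = np.ones((ny, nx))                    # a single 2-D map

# Shape (n_chan, 1, 1) broadcasts against (ny, nx) to give (n_chan, ny, nx).
cube = velocities[:, None, None] * image

# The same result with an explicit Python loop, for comparison.
cube_loop = np.empty((n_chan, ny, nx))
for k in range(n_chan):
    cube_loop[k] = velocities[k] * image

print(cube.shape)
```

Broadcasting itself avoids the Python-level loop, but the full-size cube still has to be allocated, which may be part of why building it is expensive for large arrays.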
    
* __Next steps:__
    * Testing a real dataset
        * Advantage: figure out what the errors are, sooner
        * Disadvantage: we don't know 'the truth'/what we're looking for, so we are subject to lots of biases
    * Technical biases
        * Training data to conform to the truth
        * make certain calibrations to improve computational speed
            * time-averaging: taking larger samples in time (i.e., not every 6 seconds, but every 30 seconds)
                * Question: Does it make sense to time-average? Do we still retain the same amount of information? __how to optimize__
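
To make the time-averaging idea concrete, here is a minimal sketch that bins 6-second samples into 30-second averages. The visibilities are made-up numbers for a single hypothetical baseline, and the binning by `reshape` assumes evenly spaced samples; real data would also carry weights and flags, which are ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical visibilities: 100 integrations of 6 s each (10 minutes total).
dt, n_samples = 6.0, 100
times = np.arange(n_samples) * dt
noise = 0.1 * (rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples))
vis = (1.0 + 0.5j) + noise

# Average every 5 consecutive samples: 6 s integrations -> 30 s bins.
factor = 5
n_bins = n_samples // factor
vis_avg = vis[: n_bins * factor].reshape(n_bins, factor).mean(axis=1)
times_avg = times[: n_bins * factor].reshape(n_bins, factor).mean(axis=1)

print(len(vis), "samples ->", len(vis_avg), "samples")
```

The likelihood then has five times fewer points to evaluate. For independent noise the scatter per averaged sample drops by about $\sqrt{5}$, so little is lost as long as the true signal does not change appreciably within a 30-second bin; whether that holds for a given dataset is the open question noted above.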

* __Long term:__
    * Is it best to find out ideal conditions to collect interferometer data to probe for disk masses?
    * Or, is it best to figure out what a model of dynamical masses actually looks like and the errors associated with it?
    * Exploring both simultaneously (Teague, Andrews, and myself)

* __Miscellaneous:__
    * Spectral resolution: frequency/velocity spacing at which you take measurements of the sky with interferometry
        * One technique to increase angular (not spectral) resolution: VLBI (Very-Long-Baseline Interferometry)
    * Optimum data collection: higher spectral resolution
        * Caveats:
            * Telescope time is competitive
                * Observations fall into three camps:
                    * Snapshot (1 sec-1 min of sky)
                    * Medium length (20-30 minutes)
                    * Long (1-5 hrs)
            * Higher spectral resolution = higher noise per channel (similar to Fourier transform analysis)
    * Spectral continuum = 1 channel map



## Project Timeline
---
* __Week 11:__ Fundamentals of Statistical Inferencing
    * Goal: learn how to use `emcee` and more about MCMC
* __Week 12:__ Fake data run through `emcee` and interferometry primer
    * Goal: Learn more about interferometry
    * Clean up analysis script/work on visualizations