## Multidimensional Scaling Analysis

- Visual representation of similarity
- Euclidean distance = how far apart two points are when plotted as x, y coordinates

<img src='images/euclideandist.png' width="500">
[source] Gotelli and Ellison A Primer of Ecological Statistics


<img src='images/euclideandist3.png' width="500">
[source] Gotelli and Ellison A Primer of Ecological Statistics




$d_{ij} = ||\chi_i-\chi_j|| = \sqrt{(x_i-x_j)^2 + (y_i-y_j)^2 + (z_i-z_j)^2}$

if $\chi_i = (x_i, y_i, z_i)$
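
As a quick check of the formula, here is a minimal sketch that computes the Euclidean distance between two 3-D points in three equivalent ways. The points `p` and `q` are made-up values chosen only for illustration.

```python
import numpy as np
from scipy.spatial import distance

# Two illustrative points (made-up values) with coordinates (x, y, z)
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

# Written out from the formula: sqrt(3^2 + 4^2 + 0^2) = 5
d_manual = np.sqrt(np.sum((p - q) ** 2))

# Same quantity as a vector norm, and via scipy
d_norm = np.linalg.norm(p - q)
d_scipy = distance.euclidean(p, q)

print(d_manual, d_norm, d_scipy)
```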

### Bray Curtis Distance

- Often used for identifying differences in community composition based on abundance

$d= \frac{\Sigma|u_i-v_i|}{\Sigma(u_i+v_i)}$

If all elements of u and v are positive, $0 \le d \le 1$ (d = 0 only when u and v are identical).

In [3]:
import numpy as np
from scipy.spatial import distance

In [4]:
u = [415,200,310,411]

In [5]:
v = [615,100,330,203]

In [6]:
q = [614,101,331,202]

In [7]:
data = np.array([u,v,q])

In [8]:
print(data)

[[415 200 310 411]
 [615 100 330 203]
 [614 101 331 202]]


In [9]:
data.T

array([[415, 615, 614],
       [200, 100, 101],
       [310, 330, 331],
       [411, 203, 202]])

In [11]:
# pairwise Bray-Curtis distances between the rows of data, in condensed (vector) form
dist = distance.pdist(data,'braycurtis')

In [12]:
dist

array([ 0.20433437,  0.20433437,  0.00160256])

In [13]:
# expand the condensed vector into a square, symmetric distance matrix
distmatrix = distance.squareform(dist)

In [14]:
distmatrix

array([[ 0.        ,  0.20433437,  0.20433437],
       [ 0.20433437,  0.        ,  0.00160256],
       [ 0.20433437,  0.00160256,  0.        ]])
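
The first entry of the distance matrix can be reproduced directly from the Bray-Curtis formula. This is a small sanity check, not a replacement for `pdist`; it reuses the `u` and `v` vectors defined above.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([415, 200, 310, 411])
v = np.array([615, 100, 330, 203])

# Sum of absolute differences over sum of totals: 528 / 2584
d_manual = np.abs(u - v).sum() / (u + v).sum()

# scipy's single-pair Bray-Curtis function should agree
d_scipy = distance.braycurtis(u, v)

print(d_manual, d_scipy)
```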

### Other Measures of Distance

- Euclidean distance can give counter-intuitive results if the data contain many zeros, as in species counts

<img src='images/otherdistances.png' width="500">
[source] Gotelli and Ellison A Primer of Ecological Statistics

<img src='images/otherdistances2.png' width="500">
[source] Gotelli and Ellison A Primer of Ecological Statistics
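
A small sketch of the zero problem, using made-up species counts (three hypothetical sites, three species). Sites A and B share no species, while A and C contain the same species at different abundances. Euclidean distance nevertheless puts A closer to B than to C; Bray-Curtis does not.

```python
import numpy as np
from scipy.spatial import distance

# Rows = sites, columns = species counts (illustrative values only)
sites = np.array([
    [0, 1, 1],   # site A
    [1, 0, 0],   # site B: no species in common with A
    [0, 4, 4],   # site C: same species as A, but more abundant
])

euclid = distance.squareform(distance.pdist(sites, 'euclidean'))
bray = distance.squareform(distance.pdist(sites, 'braycurtis'))

print("Euclidean   A-B:", euclid[0, 1], " A-C:", euclid[0, 2])
print("Bray-Curtis A-B:", bray[0, 1], " A-C:", bray[0, 2])
```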

### MDS Multidimensional scaling analysis
1) Classical MDS
    - Torgerson MDS
    - PCoA (principal coordinate analysis)


The distance matrix is converted to a similarity matrix, then the same steps as PCA are applied:
   - compute eigenvectors and eigenvalues
   - for Euclidean distances, the result is the same as PCA
   
Steps:
- create a data matrix
- compute a dissimilarity matrix, D, with elements $d_{ij}$
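- transform the dissimilarity matrix
$d^*_{ij} = -\frac{1}{2}d^2_{ij}$
- center the transformed matrix
$\delta^*_{ij} = d^*_{ij}-\bar{d}^*_{i}-\bar{d}^*_{j}+\bar{d}^*$
where $\bar{d}^*_{i}$ is the mean of row i, $\bar{d}^*_{j}$ is the mean of column j, and $\bar{d}^*$ is the mean of all elements

- compute the eigenvectors and eigenvalues
- if the dissimilarity index is Euclidean distance, this is mathematically equivalent to PCA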
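
The steps above can be sketched in a few lines of numpy. This is a minimal illustration rather than a reference implementation: the function name `pcoa` is hypothetical, and it assumes the transformation uses $-\frac{1}{2}d^2_{ij}$ followed by row, column, and grand-mean centering. The data are the same three samples used in the Bray-Curtis example.

```python
import numpy as np
from scipy.spatial import distance

def pcoa(D):
    """Classical MDS / PCoA of a square dissimilarity matrix D (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    A = -0.5 * D ** 2                                  # transform
    row_mean = A.mean(axis=1, keepdims=True)
    col_mean = A.mean(axis=0, keepdims=True)
    G = A - row_mean - col_mean + A.mean()             # center
    eigvals, eigvecs = np.linalg.eigh(G)               # G is symmetric
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-12 * eigvals.max()             # drop zero/negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return coords, eigvals

data = np.array([[415, 200, 310, 411],
                 [615, 100, 330, 203],
                 [614, 101, 331, 202]])
dist = distance.pdist(data, 'braycurtis')
coords, eigvals = pcoa(distance.squareform(dist))

print(coords)                   # one row of principal coordinates per sample
print(distance.pdist(coords))   # distances between the plotted points
```

For these three samples the distances between the plotted points match the original Bray-Curtis distances, because any three points that satisfy the triangle inequality can be placed exactly in two dimensions. With more samples and a non-Euclidean index, some eigenvalues can be negative and the match is only approximate.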


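2) Iterative MDS
- iteratively minimizes stress (the mismatch between the original dissimilarities and the distances in the low-dimensional plot)
- metric MDS preserves the distances themselves; non-metric MDS (NMDS) preserves only their rank order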
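
Stress measures how poorly the distances in a low-dimensional configuration reproduce the original dissimilarities. The sketch below computes one common normalized form; several variants exist, so the exact normalization and the function name `normalized_stress` are assumptions made for illustration. The example points are the corners of a unit square.

```python
import numpy as np
from scipy.spatial import distance

def normalized_stress(d_original, d_config):
    """sqrt( sum (d - d_config)^2 / sum d^2 ) over all pairs (one of several variants)."""
    d_original = np.asarray(d_original, dtype=float)
    d_config = np.asarray(d_config, dtype=float)
    return np.sqrt(np.sum((d_original - d_config) ** 2) / np.sum(d_original ** 2))

# Illustrative data: four corners of a unit square
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
d_original = distance.pdist(points)

# A 2-D configuration that keeps both axes reproduces the distances exactly
stress_2d = normalized_stress(d_original, distance.pdist(points))

# A 1-D configuration that keeps only the x axis distorts them
stress_1d = normalized_stress(d_original, distance.pdist(points[:, :1]))

print(stress_2d, stress_1d)
```

An iterative MDS routine moves the points of the low-dimensional configuration around until a quantity like this stops decreasing.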

<img src='images/pcoa.png' width="500">
<img src='images/pcoa_iterative.png' width="500">

[source] http://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional


### Summary of comparison of PCA and MDS
- PCA - based on Euclidean distances; good for data without strong skew and without outliers
- PCoA - use when other distance measures are appropriate; equivalent to PCA when Euclidean distances are used
- Non-metric multidimensional scaling (NMDS) - preserves the rank order of distances rather than their actual values (similar to many non-parametric statistics)


### Clustering


<img src='images/clusters.png' width="800">
[source] http://scikit-learn.org/stable/modules/clustering.html


ANOSIM and PERMANOVA
- test whether groups of samples are significantly different
- are the distances WITHIN groups smaller than the distances BETWEEN groups?


See scikit-bio.org for Python functions
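
To make the within-versus-between idea concrete, here is a hand-rolled sketch of a PERMANOVA-style permutation test using only numpy and scipy. It is not scikit-bio's implementation (scikit-bio provides `permanova` and `anosim` in `skbio.stats.distance`). The helper names `pseudo_f` and `permutation_test`, and the simulated abundance data, are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import distance

def pseudo_f(D, labels):
    """Pseudo-F ratio of among-group to within-group variation for distance matrix D."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    n = labels.size
    groups = np.unique(labels)
    ss_total = np.sum(D[np.triu_indices(n, k=1)] ** 2) / n
    ss_within = 0.0
    for g in groups:
        idx = np.nonzero(labels == g)[0]
        sub = D[np.ix_(idx, idx)]                      # distances inside group g
        ss_within += np.sum(np.triu(sub, k=1) ** 2) / idx.size
    ss_among = ss_total - ss_within
    a = groups.size
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permutation_test(D, labels, n_perm=999, seed=0):
    """Compare the observed pseudo-F with values obtained after shuffling the labels."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(D, labels)
    f_perm = np.array([pseudo_f(D, rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (np.sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    return f_obs, p_value

# Simulated abundances (made up): two groups of 6 samples, 3 species each
rng = np.random.default_rng(1)
group_a = rng.poisson(lam=[50, 10, 5], size=(6, 3))
group_b = rng.poisson(lam=[5, 10, 50], size=(6, 3))
abundance = np.vstack([group_a, group_b])
labels = np.array(["A"] * 6 + ["B"] * 6)

D = distance.squareform(distance.pdist(abundance, 'braycurtis'))
f_obs, p_value = permutation_test(D, labels)
print(f_obs, p_value)
```

A large pseudo-F with a small p-value indicates that distances within groups are small relative to distances between groups.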


### Auto-Correlation and Auto-Covariance

- Correlation/covariance of a time series with itself at different lags (a lag is the amount of time by which the series is shifted)
- Dominant time scale
    - integral time scale
    - decorrelation time scale (the lag at which the autocorrelation first reaches zero)
    - e-folding time scale (the lag at which the autocorrelation falls to 1/e)

<img src='images/integralTS.png' width="500">
[Source] Emery and Thomson

- Effective degrees of freedom $N^* =\frac{ N\Delta t}{t_d}$, where N = number of samples, $\Delta t$ is the time step between samples, and $t_d$ is the dominant time scale

 
#### Integral Time Scale
$t_{int} = \Delta t \sum^N_{i=1}r(\tau_i)$

$r(\tau_i)$ = correlation coefficient at lag $\tau_i$
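
A minimal sketch of these quantities for a synthetic series. The AR(1) "red noise" data, the helper name `autocorrelation`, and the choice to stop the sum at the first zero crossing of $r$ are all assumptions made for illustration; in practice the sum is usually truncated in some way because the estimated autocorrelation becomes noisy at long lags.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation r(tau) for lags 0..max_lag, in units of samples."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    c0 = np.dot(x, x) / n                      # lag-0 autocovariance (variance)
    return np.array([np.dot(x[:n - k], x[k:]) / (n * c0)
                     for k in range(max_lag + 1)])

# Synthetic red-noise series: x[t] = 0.8 * x[t-1] + white noise
rng = np.random.default_rng(0)
n, dt = 2000, 1.0                              # number of samples, time step [days]
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

r = autocorrelation(x, max_lag=200)

# Integral time scale: sum r over lags 1..(first zero crossing - 1)  [assumption]
nonpositive = np.nonzero(r <= 0)[0]
cutoff = nonpositive[0] if nonpositive.size else r.size
t_int = dt * r[1:cutoff].sum()

# e-folding time scale: first lag at which r drops below 1/e
below = np.nonzero(r < 1 / np.e)[0]
t_efold = dt * below[0] if below.size else np.nan

# Effective degrees of freedom, using the integral time scale as t_d
n_eff = n * dt / t_int
print(t_int, t_efold, n_eff)
```

Because neighbouring samples are correlated, `n_eff` comes out smaller than the number of samples `n`.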

Effective degrees of freedom for cross-correlation (x and y):

$N^* =\frac{ N}{\sum^N_{i=1}[r_{xx}(\tau_i)r_{yy}(\tau_i)+ r_{xy}(\tau_i)r_{xy}(\tau_i)]}$

### Spectral Analysis

- we have typically looked at signals plotted in the "time domain" (i.e. as a function of time)
- Spectral analysis lets us look at the frequency content of a series (the "frequency domain")

units

- original time series [cm]
- variance [$cm^2$]
- time unit [days]
- frequency [cpd] (cycles per day)
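
A short sketch that ties these units together, using numpy's FFT on a synthetic daily series. The signal (a 10-day oscillation of amplitude 3 cm plus noise) is made up, and the one-sided spectral density normalization used here is one common convention, assumed for illustration. With this choice the spectral density has units of cm^2 per cpd, and its integral over frequency equals the variance of the series.

```python
import numpy as np

n, dt = 200, 1.0                          # number of samples, time step [days]
t = np.arange(n) * dt                     # time [days]
rng = np.random.default_rng(0)

# Synthetic series [cm]: 10-day oscillation plus white noise
x = 3.0 * np.cos(2 * np.pi * t / 10.0) + rng.standard_normal(n)
x = x - x.mean()                          # remove the mean before transforming

freqs = np.fft.rfftfreq(n, d=dt)          # frequency [cpd]
df = 1.0 / (n * dt)                       # frequency resolution [cpd]
X = np.fft.rfft(x)

psd = (dt / n) * np.abs(X) ** 2           # spectral density [cm^2 / cpd]
psd[1:-1] *= 2                            # fold in negative frequencies (n is even)

peak_freq = freqs[np.argmax(psd)]         # expected near 0.1 cpd (10-day period)
print(peak_freq, psd.sum() * df, x.var())
```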

https://jackschaedler.github.io/circles-sines-signals/