# Multi-dimensional Scaling (MDS)

Author: Matt Smart

[Overview](#linkOverview)  
[Details](#linkDetails)  
[Algorithm](#linkAlgorithm)  
[Example](#linkExample)  
[Resources](#linkResources)  

### Overview <a id='linkOverview'></a>
- Non-linear dimension reduction technique  
- Rough idea - given high dimensional data $X$, find a lower dimensional representation $Y$ such that the global distance structure is preserved
- Two subtypes of MDS: metric (quantitative) and non-metric (qualitative)
- Common "first resort" technique, like PCA

### Details <a id='linkDetails'></a>

#### Metric MDS
Setup:
- Suppose one has $p$ samples of N-dimensional data points, $x_i\in\mathbb{R}^N$
- Store these samples columnwise as $X\in\mathbb{R}^{N\,\times\,p}$
- We call this the original data matrix, or simply the data
- Assumption: there is a meaningful metric (e.g. Euclidean distance) on the data space (high dim)
- Assumption: there is a meaningful metric (e.g. Euclidean distance) on the latent space (low dim)

Goal:
- Given N-dim data $X$, a metric $d(\cdot,\cdot)$ on $\mathbb{R}^N$, a target dimension $k<N$, and a metric $g(\cdot,\cdot)$ on $\mathbb{R}^k$
- Find an embedding $Y\in\mathbb{R}^{k\,\times\,p}$ (i.e. a $y_i\in\mathbb{R}^k$ for each $x_i\in\mathbb{R}^N$) such that the pairwise distances are preserved between representations, i.e. $g_{ij}(Y)\approx d_{ij}(X)$, where $d_{ij}(X)\equiv d(x_i,x_j)$ and $g_{ij}(Y)\equiv g(y_i,y_j)$

Objective function: $$Y^\ast=\operatorname*{arg\,min}_Y {\sum_{i<j}{w_{ij}\left|d_{ij}\left(X\right)-g_{ij}\left(Y\right)\right|}}$$

Notes:
- Define $f(W,X,Y)\equiv{\sum_{i<j}{w_{ij}\left|d_{ij}\left(X\right)-g_{ij}\left(Y\right)\right|}}$, then  $Y^\ast=\operatorname*{arg\,min}_Y f(W,X,Y)$
- Use the free weights $w_{ij}\geq 0$ to specify the confidence (or precision) of the $d_{ij}(X)$ measurements
- Can one show that the optimal objective value $f(W,X,Y^\ast)$ increases monotonically (more precisely, never decreases) as the target dimension $k$ decreases?
- Immediate solution degeneracy: if $Y^\ast$ is optimal, so is any rotation of it (and, for Euclidean $g$, any translation or reflection), since these leave every $g_{ij}(Y)$ unchanged
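
The objective $f(W,X,Y)$ from the notes above can be evaluated directly. The following is a minimal sketch, assuming Euclidean distance for both $d$ and $g$ and unit weights by default; the function name `stress` and the toy points are illustrative only, and points are stored one per list entry (the transpose of the columnwise convention used in the text).

```python
import math

def stress(X, Y, W=None):
    """Evaluate f(W, X, Y) = sum_{i<j} w_ij * |d(x_i, x_j) - g(y_i, y_j)|.

    Assumes d and g are both Euclidean. X and Y are lists of points
    (one point per entry); W is an optional p x p nested list of which
    only the upper triangle (i < j) is read.
    """
    p = len(X)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            w = 1.0 if W is None else W[i][j]
            total += w * abs(math.dist(X[i], X[j]) - math.dist(Y[i], Y[j]))
    return total

# Toy data: a 3-4-5 right triangle in 2D
X = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

# A 1D embedding that keeps two of the three distances exactly
Y_line = [(0.0,), (3.0,), (-4.0,)]
print(stress(X, Y_line))  # 2.0 (the hypotenuse 5 becomes 7 on the line)

# A 2D "embedding" that is just X rotated by 90 degrees: (x, y) -> (-y, x)
Y_rot = [(0.0, 0.0), (0.0, 3.0), (-4.0, 0.0)]
print(stress(X, Y_rot))   # 0.0 (rotations leave all distances unchanged)
```

The second call illustrates the degeneracy noted above: a rotated copy of a configuration has exactly the same objective value as the original.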

Limitations:
- What would happen if we tried to embed an equilateral triangle in 2D into 1D?
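
One way to work through this question, as a sketch that assumes unit side length, unit weights $w_{ij}=1$, and the absolute-value objective $f$ defined above: label the three embedded points so that $y_1\leq y_2\leq y_3$ on the line, and write $a=y_2-y_1\geq 0$ and $b=y_3-y_2\geq 0$ (these symbols are introduced here only for the argument). The three embedded distances are then $a$, $b$ and $a+b$, so

$$f=\left|1-a\right|+\left|1-b\right|+\left|1-(a+b)\right|\;\geq\;\left|(1-a)+(1-b)-\bigl(1-(a+b)\bigr)\right|=1$$

by the triangle inequality. The bound is attained, e.g. at $a=b=1$ (two sides exact, the third stretched to length 2) or at $a=b=\tfrac{1}{2}$ (one side exact, two sides halved). Under these assumptions no 1D embedding reaches $f=0$: the three equal distances cannot all be preserved on a line, and the minimizer is not unique.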


#### Non-metric MDS
Setup:
- One has $p$ objects, $\{x_i\}_{i=1}^p$
- Assumption: there is a notion of dissimilarity between the objects
    - note this is weaker, or more general, than specifying a metric
    - e.g. a ranking of dissimilarities may be sufficient, but is clearly weaker than specifying distances
- Assumption: one can construct a $p \times p$ dissimilarity matrix $D$ from the data

Goal: 
- Preserve the ordination (rank order) of the dissimilarities
- E.g. if $d_{12}\left(X\right)<d_{13}\left(X\right)$, then the embedded distances should satisfy $g_{12}\left(Y\right)<g_{13}\left(Y\right)$
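
The ordering goal above can be checked mechanically for a candidate embedding. This is a minimal sketch, assuming the dissimilarities are given as a symmetric $p\times p$ nested list and that distances in the embedding are Euclidean; the helper name `rank_order_violations` and the toy numbers are hypothetical. It only tests whether an embedding respects the ordering and does not compute one.

```python
import math
from itertools import combinations

def rank_order_violations(D, Y):
    """Count pairs of pairs whose strict ordering in D is reversed in Y.

    D: symmetric p x p dissimilarity matrix (nested list).
    Y: list of p embedded points; Euclidean distance is assumed.
    Ties in either D or the embedded distances are not counted.
    """
    pairs = list(combinations(range(len(D)), 2))
    g = {(i, j): math.dist(Y[i], Y[j]) for i, j in pairs}
    violations = 0
    for (i, j), (k, l) in combinations(pairs, 2):
        if (D[i][j] - D[k][l]) * (g[(i, j)] - g[(k, l)]) < 0:
            violations += 1
    return violations

# Dissimilarities that are NOT a metric (10 > 1 + 2 breaks the triangle
# inequality), but still define an ordering: d_01 < d_02 < d_12
D = [[0, 1, 2],
     [1, 0, 10],
     [2, 10, 0]]

Y_good = [(0.0,), (1.0,), (-1.5,)]  # embedded distances 1 < 1.5 < 2.5
Y_bad = [(0.0,), (5.0,), (1.0,)]    # embedded distances 5, 1, 4

print(rank_order_violations(D, Y_good))  # 0
print(rank_order_violations(D, Y_bad))   # 2
```

Note that `Y_good` preserves the ordering even though it does not reproduce the dissimilarity values, which is the sense in which the non-metric goal is weaker than the metric one.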

Notes:
- mention stress idea

#### Interpolating between them
Ch4 Cox text p1

### Algorithm <a id='linkAlgorithm'></a>

#### Metric MDS

Input:
- data $X\in\mathbb{R}^{N\,\times\,p}$
- an embedding or target dimension $1\leq k<N$
- a high-dim metric $d:\mathbb{R}^N \times \mathbb{R}^N\to \mathbb{R}$
- a low-dim metric $g:\mathbb{R}^k \times \mathbb{R}^k\to \mathbb{R}$
- optional: upper triangular weight matrix $W$ (default is all $w_{ij}=1$)

Initialize step: compute $D$, the $p\times p$ distance matrix using input data, $d_{ij}=d(x_i,x_j)$

Optimization:
- solve $Y^\ast=\operatorname*{arg\,min}_Y f(W,X,Y)$ by gradient descent

Output:
- embedding (k-dim representation) $Y\in\mathbb{R}^{k\,\times\,p}$

Runtime:
- MDS $\approx O\left(p^3\right)$  (where $p$ is the number of $\mathbb{R}^N$ data points)
- compare vs. e.g. PCA $\approx O\left(\min(Np^2,\,N^2p)\right)$ via an SVD of $X$, i.e. at most quadratic in $p$
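
The input, initialization, optimization and output steps above can be sketched as follows. This is a rough illustration under stated assumptions rather than a reference implementation: both metrics are taken to be Euclidean, the absolute-value objective is not differentiable everywhere so the loop is really subgradient descent with a decaying step size, and the best iterate seen is returned. The names `metric_mds`, `stress`, `distance_matrix` and the parameters `steps`, `lr`, `seed` are hypothetical, and points are stored one per list entry (the transpose of $X$ in the text).

```python
import math
import random

def distance_matrix(points):
    """p x p matrix of Euclidean distances between the given points."""
    p = len(points)
    return [[math.dist(points[i], points[j]) for j in range(p)]
            for i in range(p)]

def stress(D, Y, W):
    """f = sum_{i<j} w_ij * |D_ij - ||y_i - y_j|||."""
    p = len(Y)
    return sum(W[i][j] * abs(D[i][j] - math.dist(Y[i], Y[j]))
               for i in range(p) for j in range(i + 1, p))

def metric_mds(X, k, W=None, steps=500, lr=0.05, seed=0):
    """Return (Y, f): the best k-dim embedding found and its objective."""
    rng = random.Random(seed)
    p = len(X)
    if W is None:
        W = [[1.0] * p for _ in range(p)]  # default: all weights equal 1
    D = distance_matrix(X)                 # initialization step
    Y = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(p)]
    best_Y, best_f = [row[:] for row in Y], stress(D, Y, W)
    for t in range(steps):
        grad = [[0.0] * k for _ in range(p)]
        for i in range(p):
            for j in range(i + 1, p):
                g = math.dist(Y[i], Y[j])
                if g == 0.0 or D[i][j] == g:
                    continue  # at a kink: use the zero subgradient
                # d/dy_i |D_ij - g_ij| = -sign(D_ij - g_ij) * (y_i - y_j) / g_ij
                s = -W[i][j] * math.copysign(1.0, D[i][j] - g) / g
                for c in range(k):
                    delta = s * (Y[i][c] - Y[j][c])
                    grad[i][c] += delta
                    grad[j][c] -= delta
        step = lr / math.sqrt(t + 1)       # decaying step size
        Y = [[Y[i][c] - step * grad[i][c] for c in range(k)]
             for i in range(p)]
        f = stress(D, Y, W)
        if f < best_f:                     # keep the best iterate
            best_Y, best_f = [row[:] for row in Y], f
    return best_Y, best_f

# Toy data: corners of a unit square lying in the z = 0 plane of R^3
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
Y, f = metric_mds(X, k=2)
print(len(Y), len(Y[0]), f)
```

Each pass over the pairs costs $O(p^2 k)$, and the result depends on the random initialization, so in practice one would likely try several seeds. Plain lists are used only to keep the sketch dependency-free.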

#### Non-metric MDS
...mention stress idea

### Example <a id='linkExample'></a>
one or more examples on applicable data  


### Resources <a id='linkResources'></a>
- Mehta et al., 2018. A high-bias, low-variance introduction to Machine Learning for physicists. https://arxiv.org/abs/1803.08823
- Cox and Cox, 2001. (MDS textbook, see Ch2, Ch3, Ch4)
- Borg and Groenen, 2005. (MDS textbook)