# MDS from scratch

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

In [None]:
def dist_mtx(X1, X2):
    """
    Input:
        X1, X2: two datasets with the same number of features
    Output:
        an array of shape (N1, N2)  
        whose i,j-entry is the distance between X1[i] and X2[j]  
        where N1, N2 are the number of samples of X1, X2, respectively.
    """
    X1_col = X1[:, np.newaxis, :]
    X2_row = X2[np.newaxis, :, :]
    diff = X1_col - X2_row
    dist = np.sqrt( np.sum(diff**2, axis=-1) )
    return dist
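
As a quick sanity check of the broadcasting used in `dist_mtx`, the sketch below applies the same computation to three made-up points forming a 3-4-5 right triangle. The sample points and the name `D_small` are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np

def dist_mtx(X1, X2):
    X1_col = X1[:, np.newaxis, :]   # shape (N1, 1, d)
    X2_row = X2[np.newaxis, :, :]   # shape (1, N2, d)
    diff = X1_col - X2_row          # broadcasts to shape (N1, N2, d)
    return np.sqrt(np.sum(diff**2, axis=-1))

# Hypothetical sample points: the corners of a 3-4-5 right triangle.
pts = np.array([[0.0, 0.0],
                [3.0, 0.0],
                [0.0, 4.0]])
D_small = dist_mtx(pts, pts)
print(D_small)
```

The result should be a symmetric matrix with zero diagonal whose off-diagonal entries are the side lengths 3, 4, and 5.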

## Algorithm
**Input:**  
- `X`: an array of shape `(N,d)` whose rows are samples and columns are features
- `r`: target dimension
- `n_iter=100`: number of iterations
- `verbose=0`: verbose level;  
if 0, say nothing;  
if 2, print the stress of each iteration.

**Output:**
- an array of shape `(N, r)`  

**Steps:**
1. Compute the distance matrix `goal` $=\begin{bmatrix}\delta_{ij}\end{bmatrix}$, where $\delta_{ij}$ is the distance between the $i$-th row and the $j$-th row of $X$.
2. Let $X_0$ be a random array of shape `(N, r)` .  
3. Let $d_{ij}(X_k)$ be the distance between the $i$-th row and the $j$-th row of $X_k$.  
Let $s_{ij}(X_k)$ be $-1/d_{ij}(X_k)$ if $d_{ij}(X_k)\neq 0$ and $0$ if $d_{ij}(X_k)= 0$.
4. Let $B(X_k)=\begin{bmatrix}\delta_{ij}s_{ij}(X_k)\end{bmatrix}$ and then reset the diagonal entries of $B(X_k)$ so that $B(X_k)$ has zero row sums.
5. Let $X_{k+1} = \frac{1}{N}B(X_k)X_k$.
6. Repeat steps 3 to 5 for $k = 0, 1, \ldots$ and return $X_k$ with $k =$ `n_iter` .
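
One pass through the update (compute $s_{ij}$, build $B$, multiply) can be sketched as below. This is only an illustration on made-up random data; the helper names `pairwise_dist` and `update_step` are hypothetical and do not appear elsewhere in this notebook.

```python
import numpy as np

def pairwise_dist(X):
    """Distance matrix between the rows of X (same idea as dist_mtx(X, X))."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

def update_step(X_k, goal):
    """Hypothetical helper: build B(X_k) and return it together with X_{k+1}."""
    N = X_k.shape[0]
    D_k = pairwise_dist(X_k)              # d_ij(X_k)
    S = np.zeros_like(D_k)
    mask = (D_k != 0)
    S[mask] = -1 / D_k[mask]              # s_ij(X_k), left as 0 where d_ij = 0
    B = goal * S                          # entries delta_ij * s_ij (diagonal is 0 here)
    B[np.arange(N), np.arange(N)] = -B.sum(axis=1)   # diagonal: force zero row sums
    return B, B.dot(X_k) / N

rng = np.random.default_rng(0)            # arbitrary seed
X = rng.standard_normal((20, 5))          # made-up data with N=20, d=5
goal = pairwise_dist(X)
X0 = rng.standard_normal((20, 2))         # random start with r=2
B, X1 = update_step(X0, goal)
print("largest |row sum| of B:", np.abs(B.sum(axis=1)).max())
print("shape of X1:", X1.shape)
```

The row sums of `B` should vanish up to rounding error, and `X1` keeps the shape `(N, r)`.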

## Pseudocode
Translate the algorithm into pseudocode.  
This helps you identify the parts that you do not yet know how to implement.  

    1. 
    2. 
    3. ...

##### Jephian:
This part helps you get the big picture of how to implement the algorithm.

## Code

In [None]:
### your answer here
np.random.seed(17)
N, d, r = abs(np.round(100*np.random.randn(3)))
N = int(N)
d = int(d)
r = int(r)
n_iter = 100
verbose = 0

X = np.random.randn(N,d)
D = dist_mtx(X,X)

X_new = np.random.randn(N,r)

for k in range (n_iter):
    D_new = dist_mtx(X_new,X_new)
    mask = (D_new!=0)
    S = np.zeros_like(D_new)
    S[mask] = -1/D_new[mask]
    
    B = D*S # N*N matrix
    B[np.arange(N), np.arange(N)] = -B.sum(axis=1) # make each row of B sum to 0
    X_new = B.dot(X_new)/N

X_new

##### Jephian:
Put the function definition here.  
For example, the function below is modified from your code.   
You should let `X` be the input as suggested in the [Algorithm](#Algorithm) section.

```python
def MDS_algo_new(X,r,n_iter,verbose):
    N,d = X.shape
    D = dist_mtx(X,X)

    X_new = np.random.randn(N,r)

    for k in range (n_iter):
        D_new = dist_mtx(X_new,X_new)
        mask = (D_new!=0)
        S = np.zeros_like(D_new)
        S[mask] = -1/D_new[mask]

        B = D*S # N*N matrix
        B[np.arange(N), np.arange(N)] = -B.sum(axis=1) # make each row of B sum to 0
        X_new = B.dot(X_new)/N
        if(verbose==2):
            # D_new was computed before the update, so this is the stress of the previous X_new
            print("iter",k+1," stress:",np.sum((D_new - D)**2)/2) # stress

    return X_new
```

## Test
Take some sample data from [MDS-with-scikit-learn](MDS-with-scikit-learn.ipynb) and check whether your code generates outputs similar to those of the existing packages.

This data is from Exercise 1 of MDS-with-scikit-learn.ipynb.  
Data shape: `(100, 2)`
```python
mu = np.array([3,4])
cov = np.array([[1.1,1],
                [1,1.1]])
X = np.random.multivariate_normal(mu, cov, 100)
```

In [None]:
mu = np.array([3,4])
cov = np.array([[1.1,1],
                [1,1.1]])
X = np.random.multivariate_normal(mu, cov, 100)
X.shape

In [None]:
### results with your code
D = dist_mtx(X,X)
N = X.shape[0]
r = 2
n_iter = 100

X_new = np.random.randn(N,r)

for k in range (n_iter):
    D_new = dist_mtx(X_new,X_new)
    mask = (D_new!=0)
    S = np.zeros_like(D_new)
    S[mask] = -1/D_new[mask]
    
    B = D*S # N*N matrix
    B[np.arange(N), np.arange(N)] = -B.sum(axis=1) # make each row of B sum to 0
    X_new = B.dot(X_new)/N

%matplotlib inline
plt.scatter(*X.T) # original data (blue)
plt.scatter(*X_new.T) # new data (orange)

In [None]:
### results with existing packages
model = MDS(n_components=2,n_init=100)
X_new = model.fit_transform(X)

%matplotlib inline
plt.scatter(*X.T) # original data (blue)
plt.scatter(*X_new.T) # new data (orange)

## Comparison

##### Exercise 1
Set `verbose=2` .  
Check whether the stress is decreasing.

In [None]:
def MDS_algo(N,d,r,n_iter,verbose):
    X = np.random.randn(N,d)
    D = dist_mtx(X,X)

    X_new = np.random.randn(N,r)

    for k in range (n_iter):
        D_new = dist_mtx(X_new,X_new)
        mask = (D_new!=0)
        S = np.zeros_like(D_new)
        S[mask] = -1/D_new[mask]

        B = D*S # N*N matrix
        B[np.arange(N), np.arange(N)] = -B.sum(axis=1) # make each row of B sum to 0
        X_new = B.dot(X_new)/N
        if(verbose==2):
            print("iter",k+1," stress:",np.sum((D_new - D)**2)/2) # stress

In [None]:
np.random.seed(17)
N, d, r = abs(np.round(100*np.random.randn(3)))
N = int(N)
d = int(d)
r = int(r)
MDS_algo(N,d,r,n_iter=100,verbose=2)

Yes, the stress is decreasing.

##### Jephian:
You may try on the previous test data.
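
As a minimal sketch of that suggestion, the code below runs the same update on data drawn as in the Test section and records the stress of every iterate, so the decrease can be checked programmatically instead of by reading the printout. The helper name `mds_stress_history`, the seed, and the rounding tolerance are assumptions made for illustration.

```python
import numpy as np

def dist_mtx(X1, X2):
    diff = X1[:, np.newaxis, :] - X2[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

def mds_stress_history(X, r, n_iter=100):
    """Hypothetical helper: run the update and record the stress of X_0, ..., X_{n_iter}."""
    N = X.shape[0]
    D = dist_mtx(X, X)
    X_new = np.random.randn(N, r)
    history = []
    for k in range(n_iter):
        D_new = dist_mtx(X_new, X_new)
        history.append(np.sum((D_new - D)**2) / 2)   # stress of the current X_new
        mask = (D_new != 0)
        S = np.zeros_like(D_new)
        S[mask] = -1 / D_new[mask]
        B = D * S
        B[np.arange(N), np.arange(N)] = -B.sum(axis=1)   # zero row sums
        X_new = B.dot(X_new) / N
    # stress of the final configuration that is returned
    history.append(np.sum((dist_mtx(X_new, X_new) - D)**2) / 2)
    return X_new, np.array(history)

np.random.seed(17)                       # arbitrary seed
mu = np.array([3, 4])
cov = np.array([[1.1, 1],
                [1, 1.1]])
X = np.random.multivariate_normal(mu, cov, 100)
X_new, history = mds_stress_history(X, 2, n_iter=100)

tol = 1e-6 * history[0]                  # allowance for floating-point noise
print("first stress:", history[0])
print("last stress: ", history[-1])
print("never increases:", np.all(np.diff(history) <= tol))
```

Here `history[k]` is the stress of $X_k$ itself, so the labels are not shifted by one iteration.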

##### Exercise 2
Let  
```python
import scipy.linalg as LA
arr = np.random.randn(10,10) ### typo here: should be (2,2)
Q,R = LA.qr(arr)
```
Let $X_k$ be the output of applying your MDS function to the `hidden_text.csv` data with `r=2` .  
Plot the points (rows) in $X_k$.  
Plot the points (rows) in $X_kQ$.  
Compute the stress of $X_k$ and the stress of $X_kQ$.   
(Rotations do not change the stress.)

In [None]:
### your answer here
np.random.seed(17)

import scipy.linalg as LA
arr = np.random.randn(2,2)
Q,R = LA.qr(arr) # QR decomposition

X = np.genfromtxt('hidden_text.csv', delimiter=',') # 1261*100
N = np.shape(X)[0]
r = 2
n_iter = 100

D = dist_mtx(X,X)

X_new = np.random.randn(N,r)

for k in range (n_iter):
    D_new = dist_mtx(X_new,X_new)
    mask = (D_new!=0)
    S = np.zeros_like(D_new) # 1261*1261 matrix
    S[mask] = -1/D_new[mask]
    
    B = D*S # N*N matrix
    B[np.arange(N), np.arange(N)] = -B.sum(axis=1) # make each row of B sum to 0
    X_new = B.dot(X_new)/N

plt.scatter(*X_new.T) # Xk
print("Stress of Xk is =",np.sum((D_new - D)**2)/2) # stress of Xk

In [None]:
plt.scatter(*X_new.dot(Q).T) #XkQ
dis = dist_mtx(X_new.dot(Q),X_new.dot(Q))
print("Stress of XkQ is =",np.sum((dis- D)**2)/2) # stress of XkQ

##### Jephian:
The `D_new` in the first cell is not the distance matrix of `X_new` .  
(It is the distance matrix of the previous `X_new` .)  
That is why your two stresses are different.  
You may try the following to see that the two stresses are supposed to be the same up to some numerical error.  
```python
D1 = dist_mtx(X_new,X_new)
print("Stress of Xk is =",np.sum((D1 - D)**2)/2) # stress of Xk
print("Stress of XkQ is =",np.sum((dis- D)**2)/2) # stress of XkQ
```

Also, adding `plt.axis('equal')` will make the pictures look better.
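
The reason the two stresses agree is that an orthogonal `Q` preserves all pairwise distances. The sketch below checks this on made-up data, with a random array `Y` standing in for $X_k$; the variable names are illustrative, and `np.linalg.qr` is used here in place of `scipy.linalg.qr` only to keep the example NumPy-only.

```python
import numpy as np

def dist_mtx(X1, X2):
    diff = X1[:, np.newaxis, :] - X2[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

np.random.seed(17)                            # arbitrary seed
Q, R = np.linalg.qr(np.random.randn(2, 2))    # Q is a 2x2 orthogonal matrix

X_high = np.random.randn(50, 5)               # made-up high-dimensional data
D = dist_mtx(X_high, X_high)                  # target distances
Y = np.random.randn(50, 2)                    # stand-in for X_k

YQ = Y.dot(Q)
stress_Y = np.sum((dist_mtx(Y, Y) - D)**2) / 2
stress_YQ = np.sum((dist_mtx(YQ, YQ) - D)**2) / 2
print("Stress of Y  =", stress_Y)
print("Stress of YQ =", stress_YQ)
print("equal up to rounding:", np.isclose(stress_Y, stress_YQ))
```

Both stresses are computed from a freshly built distance matrix of the configuration being measured, which avoids the mismatch described above.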

##### Exercise 3
Apply your MDS function to the `hidden_text.csv` data with `r=2` .  
How low can the stress be?

In [None]:
np.random.seed(17)
arr = np.random.randn(10,10)
Q,R = LA.qr(arr) # QR decomposition

X = np.genfromtxt('hidden_text.csv', delimiter=',') # 1261*100
N = np.shape(X)[0]
r = 2
n_iter = 100
D = dist_mtx(X,X)

X_new = np.random.randn(N,r)

former_stress = np.inf
stress = 0
count = 0

while(1):
    D_new = dist_mtx(X_new,X_new)
    mask = (D_new!=0)
    S = np.zeros_like(D_new) # 1261*1261 matrix
    S[mask] = -1/D_new[mask]
    
    B = D*S # N*N matrix
    B[np.arange(N), np.arange(N)] = -B.sum(axis=1) # make each row of B sum to 0
    X_new = B.dot(X_new)/N
    
    stress = np.sum((D_new - D)**2)/2
    count += 1
    
    if(former_stress < stress):
        print("it = ", count, "stress = ", former_stress)
        break
        
    former_stress = stress

##### Jephian:
The first few lines about the QR decomposition are not necessary.  
Also, the `while` loop never seems to stop, because `former_stress < stress` never happens.  

Using the function `MDS_algo_new` defined earlier, consider the following.  
```python
X = np.genfromtxt('hidden_text.csv', delimiter=',') # 1261*100
MDS_algo_new(X, 2, n_iter=1000, verbose=2)
```
You will see the stress can be almost zero in the end.
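
If you would rather stop automatically than fix `n_iter`, one possible approach is to stop once the stress no longer decreases by a meaningful relative amount. The sketch below is only one way to do this: the function name `mds_until_converged` and the parameters `tol` and `max_iter` are hypothetical, and the synthetic planar data is a stand-in for `hidden_text.csv` so that the example runs on its own.

```python
import numpy as np

def dist_mtx(X1, X2):
    diff = X1[:, np.newaxis, :] - X2[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

def mds_until_converged(X, r, tol=1e-6, max_iter=1000):
    """Hypothetical variant: stop when the relative decrease of the stress is at most tol.

    Assumes max_iter >= 1. Returns the configuration, its stress, and the
    number of updates performed.
    """
    N = X.shape[0]
    D = dist_mtx(X, X)
    X_new = np.random.randn(N, r)
    D_new = dist_mtx(X_new, X_new)
    stress = np.sum((D_new - D)**2) / 2               # stress of the start
    for count in range(1, max_iter + 1):
        mask = (D_new != 0)
        S = np.zeros_like(D_new)
        S[mask] = -1 / D_new[mask]
        B = D * S
        B[np.arange(N), np.arange(N)] = -B.sum(axis=1)    # zero row sums
        X_new = B.dot(X_new) / N
        D_new = dist_mtx(X_new, X_new)                # distances of the updated X_new
        former_stress, stress = stress, np.sum((D_new - D)**2) / 2
        if former_stress - stress <= tol * former_stress:
            break
    return X_new, stress, count

np.random.seed(17)                                    # arbitrary seed
# Stand-in data: 60 planar points embedded in 5 dimensions.
X_demo = np.random.randn(60, 2).dot(np.random.randn(2, 5))
X_out, final_stress, n_updates = mds_until_converged(X_demo, 2)
print("updates:", n_updates, "stress:", final_stress)
```

Unlike the test `former_stress < stress`, this condition is also triggered when the stress merely stalls, and `max_iter` bounds the loop in any case.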