# Case Study: Visualizing MNIST with MDS

In this case study, we will once again take a look at the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). We will use Multidimensional Scaling (MDS) to visualize this dataset. Let's get started.

## Setup

We will use the same setup as before where `M` is our data matrix with each row representing an image.

In [1]:
import mnist
import altair as alt
import pandas as pd
import numpy as np

alt.data_transformers.disable_max_rows()

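# Load the 60,000 training images (28x28 pixels each) and their digit labels.
training_set = mnist.train_images()
training_labels = mnist.train_labels()

# Flatten each image into a 784-element row so that M has shape (60000, 784).
M = training_set.reshape((60000, 28*28), order="C").astype(float)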

## MDS computation

Multidimensional scaling is one of the [manifold learning algorithms](https://scikit-learn.org/stable/modules/manifold.html). It looks for a low-dimensional manifold (e.g., a surface) in which to embed the high-dimensional data such that the distances between data points are preserved as much as possible.
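To make "preserved as much as possible" concrete, here is a minimal sketch of one common way to score an embedding: the raw stress, i.e. the sum of squared differences between the original and the embedded pairwise distances. The helper names `all_pair_distances` and `raw_stress` are our own illustrations, not scikit-learn functions, and the toy data is made up. scikit-learn's MDS optimizes a stress objective of this general kind, although its exact normalization may differ.

```python
import numpy as np

def all_pair_distances(X):
    """Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(X_high, X_low):
    """Sum of squared differences between original and embedded distances."""
    d_high = all_pair_distances(X_high)
    d_low = all_pair_distances(X_low)
    # Each pair appears twice in the full matrix, so halve the sum.
    return 0.5 * ((d_high - d_low) ** 2).sum()

rng = np.random.default_rng(0)
flat = rng.normal(size=(50, 2))                # points that truly live on a 2D plane
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # 5x2 matrix with orthonormal columns
X_high = flat @ Q.T                            # the same points placed in 5D

stress_perfect = raw_stress(X_high, flat)        # 2D layout that keeps every distance
stress_lossy = raw_stress(X_high, flat[:, :1])   # 1D layout that drops a coordinate
print(stress_perfect, stress_lossy)
```

The 2D layout reproduces every distance, so its stress is zero up to floating-point error, while the 1D layout loses information and has a clearly positive stress. MDS searches for the layout with the lowest stress it can find.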

We will use the [MDS implementation](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html#sklearn.manifold.MDS) from scikit-learn in our example.

In [2]:
import sklearn
from sklearn.manifold import MDS

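# Scale pixel values uniformly from [0, 255] to [0, 1].
M_scaled = M / 255
# Embed the first 1000 images into 2 dimensions.
mds = MDS(n_components=2)
r = mds.fit_transform(M_scaled[:1000])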



This time, instead of [standardization](https://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling), we simply scale the data uniformly so that all pixel values fall in the range between 0 and 1. Dividing every pixel by 255 multiplies all pairwise distances by the same factor, so their relative structure is unchanged. We do not use standardization because it rescales each dimension separately, which changes the pairwise distances between data points relative to one another. Since distance is what MDS tries to preserve in the embedding, any non-uniform change to the distances will result in a distorted embedding. Feel free to try standardization and see how the result changes when the distances within our dataset are altered.
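The difference between the two kinds of scaling can be checked on a tiny example. The sketch below uses three made-up "images" with two pixels each (not MNIST data) and compares the ratio of two pairwise distances before and after each transformation.

```python
import numpy as np

# Three hypothetical data points with two features (pixels) each.
X = np.array([
    [0.0, 100.0],
    [200.0, 100.0],
    [0.0, 110.0],
])

def dist(A, i, j):
    """Euclidean distance between rows i and j of A."""
    return np.linalg.norm(A[i] - A[j])

uniform = X / 255                                     # same factor for every feature
standardized = (X - X.mean(axis=0)) / X.std(axis=0)   # separate factor per feature

ratio_original = dist(X, 0, 1) / dist(X, 0, 2)
ratio_uniform = dist(uniform, 0, 1) / dist(uniform, 0, 2)
ratio_standardized = dist(standardized, 0, 1) / dist(standardized, 0, 2)

print(ratio_original, ratio_uniform, ratio_standardized)
```

In the original data, point 1 is 20 times farther from point 0 than point 2 is. Uniform scaling keeps that ratio at 20, while standardization collapses it to about 1, so an embedding computed after standardization would be preserving a different set of distances.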

In the MDS computation, we asked the algorithm to compute a 2-dimensional embedding so that we can visualize it in 2D. Because MDS is computationally expensive, we only compute the embedding of the first 1000 data points in this case study. The resulting 2D embedding of these 1000 data points is stored in the 1000x2 matrix named `r`.
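A rough back-of-the-envelope calculation shows why we restrict ourselves to 1000 points. The sketch below assumes that the algorithm works on a dense matrix of all pairwise distances stored as 64-bit floats; the helper `distance_matrix_gib` is our own illustration, and it only estimates memory, not running time.

```python
def distance_matrix_gib(n_points, bytes_per_value=8):
    """Memory for a dense n x n matrix of float64 distances, in GiB."""
    return n_points ** 2 * bytes_per_value / 2 ** 30

for n in (1_000, 60_000):
    print(f"{n:>6} points -> {distance_matrix_gib(n):.3f} GiB")
```

Under this assumption, 1000 points need only a few megabytes, whereas the full 60,000-image training set would need roughly 27 GiB for a single distance matrix, since the cost grows with the square of the number of points.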

## Visualization with MDS

Let us now visualize the MDS 2D embedding.

In [3]:
df = pd.DataFrame({"x": r[:,0], "y": r[:,1], "label":training_labels[:1000]})
alt.Chart(df).mark_point()\
    .encode(x="x:Q", y="y:Q", color="label:N")

The above visualization shows the embedding of the first 1000 data points in a 2-dimensional manifold. At first glance, the result looks quite similar to the PCA output. Let us take a closer look using the facet view.

In [4]:
alt.Chart(df).mark_point()\
    .encode(x="x:Q", y="y:Q", color="label:N")\
    .facet("label:N", columns=2)

The facet view shows the distribution of data points with the same label in the 2D embedding space. For example, we see that data points with label `0` exhibit larger variation than data points with label `1`. As with PCA, there is still overlap between clusters of data points with different labels. However, the overlap seems to be less severe in the MDS output.

Sometimes, people also combine PCA and MDS by projecting the dataset onto the first N principal components and then embedding this projection in a 2-dimensional manifold using MDS. We will leave this as an optional exercise for you to try on your own.
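As a starting point for that exercise, here is a minimal sketch of the two-step pipeline on random stand-in data. The array `X`, the number of components `n_pcs = 50`, and `random_state=0` are arbitrary choices of ours; in the notebook you would replace `X` with `M_scaled[:1000]` and experiment with different values of N.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
# Stand-in for M_scaled[:1000]; replace with the real array in the notebook.
X = rng.random((200, 784))

n_pcs = 50  # hypothetical choice for N
# Step 1: project onto the first n_pcs principal components.
X_pca = PCA(n_components=n_pcs).fit_transform(X)
# Step 2: embed the projection in 2 dimensions with MDS.
embedding = MDS(n_components=2, random_state=0).fit_transform(X_pca)

print(embedding.shape)
```

The resulting `embedding` has one 2D coordinate per input row and can be plotted in the same way as `r` above.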

## Summary

In this case study, we visualized the MNIST dataset using MDS. We used MDS to embed the first 1000 data points into a 2-dimensional manifold such that the pairwise distances between the data points are preserved as much as possible. This low-dimensional embedding allows us to easily visualize the distribution of the data points.