# TSNE Demo

[TSNE](https://lvdmaaten.github.io/tsne/) (t-Distributed Stochastic Neighbor Embedding) is a fantastic dimensionality reduction algorithm for visualizing large, complex datasets, including medical scans, neural network weights, gene expression data, and much more.

cuML's TSNE supports both the faster Barnes-Hut $O(n \log n)$ algorithm and the slower exact $O(n^2)$ algorithm, selected with the `method` parameter.
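The following minimal NumPy sketch shows where the quadratic cost comes from. It is the textbook t-SNE formulation, not cuML's GPU code, and the function name `student_t_affinities` is hypothetical. It computes the low-dimensional affinities $q_{ij} = (1 + \lVert y_i - y_j \rVert^2)^{-1} / \sum_{k \ne l} (1 + \lVert y_k - y_l \rVert^2)^{-1}$. The exact method evaluates this weight for every pair of points, so the matrix has $n^2$ entries. Barnes-Hut instead approximates the interactions between distant points with a quadtree.

```python
import numpy as np

def student_t_affinities(Y):
    """Exact t-SNE low-dimensional affinities q_ij for an (n, d) embedding Y.

    Hypothetical helper: builds the full n x n matrix, which is why the
    exact method costs O(n^2) time and memory.
    """
    # Squared Euclidean distance between every pair of rows: ||y_i - y_j||^2
    sq_norms = np.sum(Y ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    # Student-t kernel with one degree of freedom: (1 + d^2)^-1
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)  # a point is not its own neighbor
    return W / W.sum()        # normalize so all q_ij sum to 1

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))   # a toy 2-D "embedding" of 5 points
Q = student_t_affinities(Y)
print(Q.shape)  # (5, 5)
```

Doubling the number of points quadruples the size of `Q`. This is why the exact method is impractical for a dataset of this size.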

The model accepts array-like inputs, such as NumPy arrays in host memory or cuDF DataFrames on the GPU.

For information about cuDF, refer to the [cuDF documentation](https://docs.rapids.ai/api/cudf/stable).

For more information on cuML's TSNE implementation, refer to the [cuML documentation](https://rapidsai.github.io/projects/cuml/en/stable/api.html#cuml.TSNE).

## Imports

In [None]:
import gzip
import matplotlib.pyplot as plt
import numpy as np
import os
from cuml.manifold import TSNE

%matplotlib inline

## Fashion MNIST

We are going to work with the Fashion-MNIST dataset.

It consists of 70,000 28x28 grayscale images of clothing in 10 classes, split into 60,000 training images and 10,000 test images. The data should already be in the `data/fashion` folder, but let's check first!

In [None]:
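# Stop early with a clear message if the dataset folder is missing
if not os.path.exists('data/fashion'):
    raise FileNotFoundError("error, data is missing: expected it in 'data/fashion'!")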

## Helper Functions

In [None]:
# https://github.com/zalandoresearch/fashion-mnist/blob/master/utils/mnist_reader.py
def load_mnist_train(path):
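    """Load the Fashion-MNIST training images and labels from path"""
    labels_path = os.path.join(path, 'train-labels-idx1-ubyte.gz')
    images_path = os.path.join(path, 'train-images-idx3-ubyte.gz')

    # Label file: 8-byte header (magic number, item count), then one uint8 label per image
    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8,
                               offset=8)

    # Image file: 16-byte header (magic number, count, rows, cols), then 784 uint8 pixels per image
    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8,
                               offset=16).reshape(len(labels), 784)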
    return images, labels
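The `offset=8` and `offset=16` arguments skip the IDX file headers. The sketch below builds a tiny synthetic label file and a tiny synthetic image file in memory, using the same header layout: big-endian 32-bit integers for the magic number and item count, plus row and column counts for images. It then decodes both files the same way `load_mnist_train` does. The toy data is made up, and the `toy_` prefix keeps it from overwriting the real `images` and `labels`.

```python
import gzip
import io
import struct

import numpy as np

# Hypothetical tiny dataset: 3 "images" of 2x2 pixels with labels 0, 1, 2
toy_labels = np.array([0, 1, 2], dtype=np.uint8)
toy_images = np.arange(12, dtype=np.uint8).reshape(3, 4)

# IDX label file: magic 2049 and item count (8-byte header), then uint8 labels
label_bytes = struct.pack(">II", 2049, len(toy_labels)) + toy_labels.tobytes()
# IDX image file: magic 2051, count, rows, cols (16-byte header), then uint8 pixels
image_bytes = struct.pack(">IIII", 2051, len(toy_images), 2, 2) + toy_images.tobytes()

# Gzip in memory, mirroring the .gz files that load_mnist_train reads from disk
with gzip.open(io.BytesIO(gzip.compress(label_bytes)), "rb") as f:
    decoded_labels = np.frombuffer(f.read(), dtype=np.uint8, offset=8)
with gzip.open(io.BytesIO(gzip.compress(image_bytes)), "rb") as f:
    decoded_images = np.frombuffer(f.read(), dtype=np.uint8,
                                   offset=16).reshape(len(decoded_labels), 4)

print(decoded_labels)        # [0 1 2]
print(decoded_images.shape)  # (3, 4)
```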

## Load The Data
Let's load the Fashion-MNIST training set (60,000 images)!

We can also visualize a single 28x28 image (a handbag).

In [None]:
images, labels = load_mnist_train("data/fashion")

plt.figure(figsize=(5,5))
plt.imshow(images[100].reshape((28, 28)), cmap = 'gray')
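To look at more than one image, it helps to find one example of each class. The following is a minimal sketch using a hypothetical helper, `first_index_per_class`, shown here on toy labels. It assumes that labels are integers starting at 0 and that every class occurs at least once.

```python
import numpy as np

def first_index_per_class(labels, n_classes=10):
    """Return the index of the first sample of each class 0..n_classes-1.

    Hypothetical helper; assumes every class occurs at least once in `labels`.
    """
    return [int(np.flatnonzero(labels == c)[0]) for c in range(n_classes)]

# Toy labels standing in for the Fashion-MNIST `labels` array
toy_labels = np.array([3, 0, 1, 0, 2, 1, 3])
print(first_index_per_class(toy_labels, n_classes=4))  # [1, 2, 4, 0]
```

On the real data, each index `i` in `first_index_per_class(labels)` can be displayed with `plt.imshow(images[i].reshape((28, 28)), cmap='gray')`.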

## Reduce Dimensionality with TSNE

Now, let's reduce the data from 784 (28x28) dimensions to 2.
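Under the hood, t-SNE first turns high-dimensional distances into neighbor probabilities. In the standard formulation, each point $i$ gets its own Gaussian precision $\beta_i$, and $p_{j|i} = \exp(-\beta_i \lVert x_i - x_j \rVert^2) / \sum_{k \ne i} \exp(-\beta_i \lVert x_i - x_k \rVert^2)$. A binary search tunes $\beta_i$ until the perplexity $\exp(H(P_i))$ (entropy in nats) matches a target. cuML's `TSNE` exposes this target as the `perplexity` parameter, which is left at its default in the cell below.

The following NumPy sketch is illustrative only and is not cuML's implementation. The function name is hypothetical. It also uses all $O(n^2)$ pairs, whereas practical implementations typically restrict this step to nearest neighbors.

```python
import numpy as np

def conditional_probabilities(X, target_perplexity=30.0, tol=1e-5, max_iter=100):
    """Textbook t-SNE input affinities p_{j|i}, one Gaussian precision per point.

    Hypothetical helper. Row i of the result is the conditional distribution
    P_i, whose perplexity exp(H(P_i)) is tuned toward target_perplexity.
    """
    n = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    P = np.zeros((n, n))
    target_entropy = np.log(target_perplexity)  # entropy in nats
    for i in range(n):
        d = np.delete(D[i], i)  # squared distances to all other points
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            # Subtracting min(d) avoids underflow; the factor cancels when normalizing
            w = np.exp(-(d - d.min()) * beta)
            p = w / w.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:
                # Too flat: raise the precision (narrower Gaussian)
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:
                # Too peaked: lower the precision (wider Gaussian)
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 10))
P = conditional_probabilities(X_toy, target_perplexity=30.0)
row = P[0][P[0] > 0]
perplexity_row0 = np.exp(-np.sum(row * np.log(row)))
print(round(float(perplexity_row0), 2))  # should be close to the 30.0 target
```

A larger perplexity spreads each point's probability mass over more neighbors, which roughly trades local detail for more global structure.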

In [None]:
tsne = TSNE(n_components = 2, method = 'barnes_hut', random_state=23)
%time embedding = tsne.fit_transform(images)

print(embedding[:10], embedding.shape)
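Before plotting, a quick numeric sanity check can help: for a random subsample of points, measure how often their nearest neighbors in the 2-D embedding share their class label. The following is a minimal sketch with a hypothetical helper, `knn_label_agreement`. It is a rough proxy for cluster quality, not a cuML metric. It assumes `embedding` is a NumPy array (cuML mirrors the input type by default, and the input here is a NumPy array). It compares one sampled point at a time against all points, which keeps memory linear in the dataset size.

```python
import numpy as np

def knn_label_agreement(embedding, labels, k=10, sample=1000, seed=0):
    """Mean fraction of each sampled point's k nearest neighbors sharing its label.

    Hypothetical helper; brute-force distances, one sampled point at a time.
    """
    rng = np.random.default_rng(seed)
    n = embedding.shape[0]
    idx = rng.choice(n, size=min(sample, n), replace=False)
    hits = 0.0
    for i in idx:
        d2 = np.sum((embedding - embedding[i]) ** 2, axis=1)
        d2[i] = np.inf                       # exclude the point itself
        nn = np.argpartition(d2, k)[:k]      # indices of the k smallest distances
        hits += np.mean(labels[nn] == labels[i])
    return hits / len(idx)

# Toy check: three well-separated 2-D clusters of 100 points each
rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(3), 100)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
toy_embedding = centers[toy_labels] + rng.normal(scale=0.5, size=(300, 2))
print(knn_label_agreement(toy_embedding, toy_labels, k=10, sample=100))  # 1.0 for these toy clusters
```

On the notebook's data this would be called as `knn_label_agreement(embedding, labels)`. Values closer to 1 mean that classes form tighter, purer neighborhoods in the embedding.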

## Visualize Embedding

Let's visualize TSNE's embedding!

In [None]:
classes = [
    'T-shirt/top',
    'Trouser',
    'Pullover',
    'Dress',
    'Coat',
    'Sandal',
    'Shirt',
    'Sneaker',
    'Bag',
    'Ankle boot'
]

fig, ax = plt.subplots(1, figsize = (14, 10))
plt.scatter(embedding[:,1], embedding[:,0], s = 0.3, c = labels, cmap = 'Spectral')
plt.setp(ax, xticks = [], yticks = [])
cbar = plt.colorbar(boundaries = np.arange(11)-0.5)
cbar.set_ticks(np.arange(10))
cbar.set_ticklabels(classes)
plt.title('Fashion MNIST Embedded via TSNE');
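Instead of relying only on the colorbar, it can help to write each class name directly on the plot at the centroid (mean position) of that class's points. The following is a sketch with a hypothetical helper, `annotate_class_centroids`, shown on toy data. It assumes integer labels `0..len(class_names)-1`.

```python
import matplotlib.pyplot as plt
import numpy as np

def annotate_class_centroids(ax, points, labels, class_names):
    """Write each class name at the mean 2-D position of that class's points.

    Hypothetical helper; assumes labels are integers 0..len(class_names)-1
    and that every class has at least one point.
    """
    centroids = np.array([points[labels == c].mean(axis=0)
                          for c in range(len(class_names))])
    for (x, y), name in zip(centroids, class_names):
        ax.text(x, y, name, ha='center', va='center', fontsize=9,
                bbox=dict(facecolor='white', alpha=0.7, edgecolor='none'))
    return centroids

# Toy demo: three well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(3), 50)
toy_points = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])[toy_labels]
toy_points = toy_points + rng.normal(scale=0.3, size=(150, 2))

toy_fig, toy_ax = plt.subplots(figsize=(4, 4))
toy_ax.scatter(toy_points[:, 0], toy_points[:, 1], s=2, c=toy_labels, cmap='Spectral')
toy_centroids = annotate_class_centroids(toy_ax, toy_points, toy_labels, ['A', 'B', 'C'])
print(toy_centroids.shape)  # (3, 2)
```

The cell above plots `embedding[:, 1]` on the x-axis and `embedding[:, 0]` on the y-axis. To annotate that figure, you would call `annotate_class_centroids(ax, embedding[:, ::-1], labels, classes)`. If a class is split into several islands, its centroid can land between them, so treat the labels as a reading aid only.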