# t-SNE Animations

*openTSNE* includes a callback system. Callbacks can be triggered every *n* iterations, and they can also be used to control the optimization, including when to stop.
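To make the idea concrete, here is a minimal sketch of how such a callback protocol can work. It is a toy loop, not openTSNE's implementation: `run_toy_optimization`, `record`, and `stop_when_converged` are hypothetical names, and the sketch assumes the convention that a callback returning a truthy value requests that optimization stop.

```python
def run_toy_optimization(n_iter, callbacks, callbacks_every_iters=1):
    """Toy stand-in for an optimizer loop (hypothetical, not openTSNE code)."""
    position = 10.0
    for iteration in range(1, n_iter + 1):
        position *= 0.5  # pretend this is one gradient descent step
        error = abs(position)
        if iteration % callbacks_every_iters == 0:
            # Call every callback first, then stop if any of them asked to
            results = [bool(cb(iteration, error, position)) for cb in callbacks]
            if any(results):
                return iteration, position
    return n_iter, position


history = []


def record(iteration, error, embedding):
    # Returns None (falsy), so this callback never requests a stop
    history.append((iteration, error))


def stop_when_converged(iteration, error, embedding):
    # Assumed convention: a truthy return value stops the optimization
    return error < 0.1


last_iteration, final_position = run_toy_optimization(
    100, [record, stop_when_converged], callbacks_every_iters=2
)
```

The callbacks take the same three arguments as the lambda used later in this notebook: the iteration number, the current error, and the current embedding.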

In this notebook, we'll look at an example and use callbacks to generate an animation of the optimization. In practice, this serves no real purpose other than being fun to look at.

In [1]:
from openTSNE import TSNE

from examples import utils

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

In [2]:
import gzip
import pickle

with gzip.open("data/macosko_2015.pkl.gz", "rb") as f:
    data = pickle.load(f)

x = data["pca_50"]
y = data["CellType1"].astype(str)

In [3]:
print("Data set contains %d samples with %d features" % x.shape)

Data set contains 44808 samples with 50 features


We pass a callback that takes the current embedding, makes a copy (this is important because the embedding is modified in place during optimization), and appends it to a list. We can also specify how often the callbacks should be called. In this instance, we'll call it at every iteration.

In [4]:
embeddings = []

tsne = TSNE(
    perplexity=50, metric="cosine",
    # Let's use the fast approximation methods
    neighbors="approx", negative_gradient_method="fft",
    # The embedding will be appended to the list we defined above. Make sure we copy the
    # embedding; otherwise, the same object reference would be stored for every iteration
    callbacks=lambda it, err, emb: embeddings.append(np.array(emb)),
    # This should be done on every iteration
    callbacks_every_iters=1,
    # -2 will use all but one core so I can look at cute cat pictures while this computes
    n_jobs=-2
)
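
The copy in the callback above matters because an object that is updated in place keeps the same identity across iterations. The sketch below illustrates this with plain Python lists standing in for the embedding array (an assumption made to keep the example dependency-free). In the notebook, `np.array(emb)` plays the role that `copy.deepcopy` plays here.

```python
import copy

# Stand-in for a tiny 2-D embedding with two points
embedding = [[0.0, 0.0], [1.0, 1.0]]

by_reference = []
by_copy = []

for iteration in range(3):
    # In-place update, similar in spirit to an optimizer modifying its buffer
    for point in embedding:
        point[0] += 1.0

    by_reference.append(embedding)           # stores the same object each time
    by_copy.append(copy.deepcopy(embedding)) # stores an independent snapshot

# x-coordinate of the first point in every stored snapshot
first_x_by_reference = [snapshot[0][0] for snapshot in by_reference]
first_x_by_copy = [snapshot[0][0] for snapshot in by_copy]

print(first_x_by_reference)  # [3.0, 3.0, 3.0]: every entry shows the final state
print(first_x_by_copy)       # [1.0, 2.0, 3.0]: one distinct state per iteration
```

Without the copy, every frame of the animation would show the final embedding.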

In [5]:
%time tsne.fit(x)

CPU times: user 16min 13s, sys: 2.1 s, total: 16min 15s
Wall time: 2min 45s


TSNEEmbedding([[37.18444416, 18.72773713],
               [37.12496354, 18.80951715],
               [37.16077576, 18.89134288],
               ...,
               [40.50806396, 20.87112375],
               [ 0.97918374, 13.68780657],
               [-6.28309915,  5.94863532]])

Now that we have the embedding from every iteration in our list, we need to create the animation. We do this here using matplotlib, which is relatively straightforward. Generating the animation can take a long time, so we will save it as a GIF so we can come back to it whenever we want, without having to wait again. Note that the `imagemagick` writer used below requires ImageMagick to be installed.

In [6]:
%%time
fig = plt.figure(figsize=(7, 7))
ax = fig.add_axes([0, 0, 1, 1])
ax.set_xticks([])
ax.set_yticks([])

colors = list(map(utils.MACOSKO_COLORS.get, y))
pathcol = ax.scatter(embeddings[0][:, 0], embeddings[0][:, 1], c=colors, s=1, rasterized=True)

def update(embedding, ax, pathcol):
    # Update point positions
    pathcol.set_offsets(embedding)
    
    # Adjust x/y limits so all the points are visible
    ax.set_xlim(np.min(embedding[:, 0]), np.max(embedding[:, 0]))
    ax.set_ylim(np.min(embedding[:, 1]), np.max(embedding[:, 1]))
    
    return [pathcol]

anim = animation.FuncAnimation(
    fig, update, fargs=(ax, pathcol), interval=20,
    frames=embeddings, blit=True,
)

anim.save("macosko.gif", dpi=60, writer="imagemagick")
plt.close()

CPU times: user 6min 47s, sys: 2.41 s, total: 6min 50s
Wall time: 8min 18s
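
Because one frame was stored per iteration, rendering takes several minutes. One possible way to shorten this is to animate only every *k*-th embedding. The helper below is a hypothetical sketch (`thin_frames` is not part of openTSNE or matplotlib). It keeps every `step`-th frame and always retains the last one, so the animation still ends on the final embedding.

```python
def thin_frames(frames, step):
    """Keep every `step`-th frame, always including the final frame."""
    if step < 1:
        raise ValueError("step must be a positive integer")

    thinned = list(frames[::step])

    # Append the last frame if the stride did not land on it
    if len(frames) > 0 and (len(frames) - 1) % step != 0:
        thinned.append(frames[-1])

    return thinned


# Integers stand in for embeddings here, purely for illustration
demo = thin_frames(list(range(10)), 4)
print(demo)  # [0, 4, 8, 9]
```

In the cell above, this would amount to passing `frames=thin_frames(embeddings, 5)` to `FuncAnimation`, at the cost of a less smooth animation.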
