# t-SNE Animations

*openTSNE* includes a callback system, which can be triggered every *n* iterations. Callbacks can also be used to control the optimization and decide when to stop.

In this notebook, we'll look at an example and use callbacks to generate an animation of the optimization. In practice, this serves no real purpose other than being fun to look at.
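
As a rough illustration of using a callback to control when optimization stops, here is a minimal sketch. The class name `StopWhenConverged` and its `tolerance` parameter are hypothetical and not part of openTSNE. The `(iteration, error, embedding)` signature matches the callback used later in this notebook. That a truthy return value halts the optimization is an assumption about openTSNE's behavior, so check it against the library documentation.

```python
class StopWhenConverged:
    """Hypothetical callback: request a stop once the error stops changing."""

    def __init__(self, tolerance=1e-4):
        self.tolerance = tolerance
        self.previous_error = None

    def __call__(self, iteration, error, embedding):
        # Compare the current error with the one seen on the previous call.
        converged = (
            self.previous_error is not None
            and abs(self.previous_error - error) < self.tolerance
        )
        self.previous_error = error
        # Assumption: returning a truthy value asks the optimizer to stop.
        return converged


# Simulate a few calls with made-up error values (no openTSNE required).
callback = StopWhenConverged(tolerance=0.01)
errors = [5.62, 5.06, 4.95, 4.945]
decisions = [callback(i, e, None) for i, e in enumerate(errors, start=1)]
print(decisions)  # [False, False, False, True]
```

Under these assumptions, an instance could be passed as `callbacks=StopWhenConverged()` in place of a lambda.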

In [1]:
import openTSNE
from examples import utils

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

In [2]:
import gzip
import pickle

with gzip.open("data/macosko_2015.pkl.gz", "rb") as f:
    data = pickle.load(f)

x = data["pca_50"]
y = data["CellType1"].astype(str)

In [3]:
print("Data set contains %d samples with %d features" % x.shape)

Data set contains 44808 samples with 50 features


We pass a callback that takes the current embedding, makes a copy (this is important because the embedding is modified in place during optimization), and appends it to a list. We can also specify how often callbacks are called. In this instance, we'll call it at every iteration.

In [4]:
embeddings = []

tsne = openTSNE.TSNE(
    perplexity=50, metric="cosine", n_jobs=32, verbose=True,
    # The embedding will be appended to the list we defined above. Make sure we copy
    # the embedding; otherwise, the same object reference is stored for every iteration
    callbacks=lambda it, err, emb: embeddings.append(np.array(emb)),
    # This should be done on every iteration
    callbacks_every_iters=1,
)
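
To see why the copy matters, here is a small self-contained sketch that uses a plain NumPy array as a stand-in for the embedding. The in-place `+=` update is only an illustration of in-place modification, not openTSNE's actual update rule.

```python
import numpy as np

# Stand-in for an embedding that an optimizer updates in place.
embedding = np.zeros((3, 2))

by_reference = []  # stores the same object on every step
by_copy = []       # stores an independent snapshot on every step

for step in range(3):
    embedding += 1.0  # in-place update, as happens during optimization
    by_reference.append(embedding)
    by_copy.append(np.array(embedding))  # np.array copies by default

# Every entry in by_reference is the same array, so all show the final state.
print([float(e[0, 0]) for e in by_reference])  # [3.0, 3.0, 3.0]
# The copies preserve the state at each step.
print([float(e[0, 0]) for e in by_copy])       # [1.0, 2.0, 3.0]
```

Without the copy, every frame of the animation would show the final embedding.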

In [5]:
%time tsne_embedding = tsne.fit(x)

--------------------------------------------------------------------------------
TSNE(callbacks=<function <lambda> at 0x7f1980309d40>, callbacks_every_iters=1,
     metric='cosine', n_jobs=32, perplexity=50, verbose=True)
--------------------------------------------------------------------------------
===> Finding 150 nearest neighbors using Annoy approximate search using cosine distance...
   --> Time elapsed: 11.95 seconds
===> Calculating affinity matrix...
   --> Time elapsed: 1.28 seconds
===> Calculating PCA-based initialization...
   --> Time elapsed: 0.18 seconds
===> Running optimization with exaggeration=12.00, lr=3734.00 for 250 iterations...
Iteration   50, KL divergence 5.6240, 50 iterations in 2.5803 sec
Iteration  100, KL divergence 5.0628, 50 iterations in 2.5743 sec
Iteration  150, KL divergence 4.9531, 50 iterations in 2.6205 sec
Iteration  200, KL divergence 4.9087, 50 iterations in 2.5746 sec
Iteration  250, KL divergence 4.8851, 50 iterations in 2.5237 sec
   --> T

Now that we have the embedding from every iteration in our list, we need to create the animation. We do this here using matplotlib, which is relatively straightforward. Generating the animation can take a long time, so we save it to a video file (MP4, written with ffmpeg) so we can come back to it whenever we want without having to wait again.

In [6]:
%%time
fig, ax = plt.subplots(figsize=(7, 7))
ax.set_xticks([])
ax.set_yticks([])

colors = list(map(utils.MACOSKO_COLORS.get, y))
pathcol = ax.scatter(embeddings[0][:, 0], embeddings[0][:, 1], c=colors, s=1, rasterized=True)

def update(embedding, ax, pathcol):
    # Update point positions
    pathcol.set_offsets(embedding)
    
    # Adjust x/y limits so all the points are visible
    ax.set_xlim(np.min(embedding[:, 0]), np.max(embedding[:, 0]))
    ax.set_ylim(np.min(embedding[:, 1]), np.max(embedding[:, 1]))
    
    return [pathcol]

anim = animation.FuncAnimation(
    fig, update, fargs=(ax, pathcol), interval=20,
    frames=embeddings, blit=True,
)

anim.save("macosko.mp4", dpi=150, writer="ffmpeg")
plt.close()

CPU times: user 8min 10s, sys: 22.3 s, total: 8min 33s
Wall time: 8min 4s
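
Since rendering every stored embedding takes several minutes, one possible way to shorten this is to render only every *k*-th frame. The helper below is a hypothetical sketch, not part of openTSNE or matplotlib. It keeps every `step`-th frame and always includes the last one, so the animation still ends on the final embedding.

```python
def subsample_frames(frames, step):
    """Keep every `step`-th frame, always including the final one."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    selected = list(frames[::step])
    # Append the last frame if the stride did not land on it.
    if frames and (len(frames) - 1) % step != 0:
        selected.append(frames[-1])
    return selected


# Integers stand in for the stored embeddings.
frames = list(range(10))
print(subsample_frames(frames, 4))  # [0, 4, 8, 9]
```

The result could then be passed as `frames=subsample_frames(embeddings, 5)` to `FuncAnimation`. This gives a shorter, coarser animation in exchange for less rendering time.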
