## Save Model Embeddings

This notebook saves embeddings from the transformer model to a CSV file. These outputs are visualized in `embeddings.Rmd`. First, let's define all of our notebook parameters.

In [1]:
input_path = "../data/blooms.csv"
checkpoint = "lightning_logs/version_0/checkpoints/epoch=69-step=1680.ckpt"
output_path = "../data/embeddings.csv"
device = "mps"

The block below defines a loader with all the trajectory data (training and validation). It computes the embeddings for each sample by looping over the loader. Note that our transformer definition deliberately saved the learned, pre-prediction representations, so we don't need to do any complicated surgery on the model.

In [2]:
import torch
import pandas as pd
from torch.utils.data import DataLoader
from transformer import LinearData, Transformer, LitTransformer

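samples_df = pd.read_csv(input_path)
dataset = LinearData(samples_df)
# No shuffling, so embeddings stay in the same order as the input rows
loader = DataLoader(dataset, batch_size=64)
lit_model = LitTransformer.load_from_checkpoint(checkpoint, model=Transformer())
# Put the model on the same device as the batches, and switch off
# training-only behavior such as dropout
lit_model.to(device)
lit_model.eval()

z = []
with torch.no_grad():
    for x, _ in loader:
        # The model returns the learned representations first; predictions are ignored
        z_, _ = lit_model.model(x.to(device))
        z.append(z_[0])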
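
The general pattern in this cell (run batches through a model without tracking gradients, then collect the intermediate representations) can be tried in isolation. The sketch below is a minimal illustration, not the notebook's model: `ToyEncoder`, the random data, and all `toy_*` names are hypothetical stand-ins. It assumes only that the model returns an `(embedding, prediction)` pair, and it calls `eval()` so that dropout does not add noise to the embeddings.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class ToyEncoder(nn.Module):
    """Hypothetical stand-in model that returns (embedding, prediction)."""

    def __init__(self, n_features=5, n_embed=3):
        super().__init__()
        self.encoder = nn.Linear(n_features, n_embed)
        self.dropout = nn.Dropout(0.5)
        self.head = nn.Linear(n_embed, 1)

    def forward(self, x):
        z = self.dropout(self.encoder(x))
        return z, self.head(z)


torch.manual_seed(0)
x_all = torch.randn(10, 5)  # 10 made-up samples with 5 features each
y_all = torch.randn(10, 1)
toy_loader = DataLoader(TensorDataset(x_all, y_all), batch_size=4)

toy_device = "cpu"  # assumption: swap in "mps" or "cuda" if available
toy_model = ToyEncoder().to(toy_device)
toy_model.eval()  # disables dropout, so embeddings are deterministic

toy_z = []
with torch.no_grad():  # no gradient bookkeeping during inference
    for x, _ in toy_loader:
        z_batch, _ = toy_model(x.to(toy_device))
        toy_z.append(z_batch.cpu())

# Stack the per-batch outputs into one (n_samples, n_embed) tensor
toy_embeddings = torch.cat(toy_z, dim=0)
print(toy_embeddings.shape)
```

Because the loader does not shuffle, row `i` of `toy_embeddings` corresponds to sample `i` of the dataset, which is what makes it possible to join embeddings back to the original rows later.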

We can now convert those embeddings to a NumPy array and save them to a file.

In [3]:
import numpy as np

z0 = torch.cat(z, dim=0)
z0 = z0.view(z0.size(0), -1).cpu().numpy()
np.savetxt(output_path, z0, delimiter=',')
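
To see what the flatten-and-save step produces, here is a small round-trip sketch on made-up data. The array shape, the temporary file, and the `toy_*` names are assumptions for illustration only; the notebook's real embeddings may have different dimensions.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical embeddings: 6 samples, each a 4 x 3 block of values
toy = rng.normal(size=(6, 4, 3)).astype(np.float32)

# One row per sample, with all remaining dimensions flattened into columns
flat = toy.reshape(toy.shape[0], -1)

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_embeddings.csv")
    np.savetxt(toy_path, flat, delimiter=",")
    # savetxt writes no header row, so every line of the file is data
    restored = np.loadtxt(toy_path, delimiter=",", ndmin=2)

print(flat.shape, restored.shape)
```

Since the file has no header row, whatever reads it downstream should not treat the first line as column names.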