<a href="https://colab.research.google.com/github/justinleeirizarry/CV/blob/main/Contra_Bottleneck_T5_Text_Autoencoder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bottleneck T5 Text Autoencoder

The Bottleneck T5 model powers many of my experiments and demos exploring interfaces for editing text in latent space. This model is an autoencoder for text; it's able to encode text up to 512 tokens into an embedding, then reconstruct the original text from the embedding.

This Colab notebook demonstrates how to use the model as an encoder and decoder for text embeddings, and shows how to perform some basic latent space edits like interpolation.

This Colab notebook uses the model `thesephist/contra-bottleneck-t5-large-wikipedia`, which strikes a good balance between model size and output quality, but I've trained four variants ranging from 60M to 3B parameters:

- [thesephist/contra-bottleneck-t5-small-wikipedia](https://huggingface.co/thesephist/contra-bottleneck-t5-small-wikipedia): 60M params, 512 embedding dimensions
- [thesephist/contra-bottleneck-t5-base-wikipedia](https://huggingface.co/thesephist/contra-bottleneck-t5-base-wikipedia): 220M params, 768 embedding dimensions
- [thesephist/contra-bottleneck-t5-large-wikipedia](https://huggingface.co/thesephist/contra-bottleneck-t5-large-wikipedia): 770M params, 1024 embedding dimensions
- [thesephist/contra-bottleneck-t5-xl-wikipedia](https://huggingface.co/thesephist/contra-bottleneck-t5-xl-wikipedia): 3B params, 2048 embedding dimensions

All Bottleneck T5 models are trained on a filtered subset of the English Wikipedia, and perform best at encoding and decoding encyclopedic and other similar kinds of text. Text that's heavily technical, conversational, or otherwise unconventional may be out of distribution for the model, and the model may not perform as well on such inputs.

Bottleneck T5 embeddings are always normalized to length 1 (unit L2 norm): the encoder produces unit-length embeddings, and any input to the decoder is normalized to length 1 before decoding.
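A small sketch of what that normalization implies, written in plain Python so it runs without the model or PyTorch. The helper names `l2_norm` and `l2_normalize` are illustrative and not part of the model's code. Because the decoder normalizes its input, two vectors that point in the same direction but differ in magnitude should decode from the same point on the unit sphere.

```python
import math

def l2_norm(vec):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def l2_normalize(vec):
    """Scale a vector so that its L2 norm is 1 (hypothetical helper)."""
    norm = l2_norm(vec)
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

raw = [3.0, 4.0]                    # norm 5
scaled = [10.0 * x for x in raw]    # same direction, norm 50

unit = l2_normalize(raw)
unit_from_scaled = l2_normalize(scaled)

# Only the direction survives normalization, not the magnitude.
print(l2_norm(unit))
print(unit, unit_from_scaled)
```

In practice this means operations like averaging or adding vectors only change the decoded text through the direction of the result.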

In [2]:
!pip install -U torch sentencepiece transformers accelerate


In [3]:
import os
import torch
import torch.nn as nn
import torch.nn.functional as F

from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM

The model is currently in a prototype state implemented on top of the T5 language model, so we need a small wrapper class around it to use it for embedding and generating text:

In [4]:
class BottleneckT5Autoencoder:
    def __init__(self, model_path: str, device='cpu'):
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, model_max_length=512)
        self.model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(self.device)
        self.model.eval()

    @torch.no_grad()
    def embed(self, text: str) -> torch.FloatTensor:
        inputs = self.tokenizer(text, return_tensors='pt').to(self.device)
        decoder_inputs = self.tokenizer('', return_tensors='pt').to(self.device)
        return self.model(
            **inputs,
            decoder_input_ids=decoder_inputs['input_ids'],
            encode_only=True,
        )[0]

    @torch.no_grad()
    def generate_from_latent(self, latent: torch.FloatTensor, max_length=512, temperature=1.0) -> str:
        dummy_text = '.'
        dummy = self.embed(dummy_text)
        perturb_vector = latent - dummy
        self.model.perturb_vector = perturb_vector
        input_ids = self.tokenizer(dummy_text, return_tensors='pt').to(self.device).input_ids
        output = self.model.generate(
            input_ids=input_ids,
            max_length=max_length,
            do_sample=True,
            temperature=temperature,
            top_p=0.9,
            num_return_sequences=1,
        )
        return self.tokenizer.decode(output[0], skip_special_tokens=True)

We can initialize the model wrapper class with the model ID.

In [5]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
autoencoder = BottleneckT5Autoencoder(model_path='thesephist/contra-bottleneck-t5-large-wikipedia', device=device)


Bottleneck T5 is trained as a text autoencoder — reconstructing inputs from their embeddings. We can test the model's reconstruction ability by embedding text samples with `.embed()` and generating text from an embedding tensor with `.generate_from_latent()`.

In [6]:
texts = [
    'The quick brown fox jumps over the lazy dog',
    'Hi there! My name is Linus, and I spend a lot of my time thinking about latent spaces of neural network models.',
    'Notion is a single space where you can think, write, and plan. Capture thoughts, manage projects, or even run an entire company — and do it exactly the way you want.',
]

for t in texts:
    embedding = autoencoder.embed(t)
    reconstruction = autoencoder.generate_from_latent(embedding)
    print(reconstruction)

The quick brown fox jumps over the lazy dog
I am named Linus, and I spend a lot of my time thinking about the neural networks of latent models.
Notion is a single space where you can create ideas, manage tasks, and write. Capture thoughts, plan for things, and even do it all your own — or almost your entire life.
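To put a rough number on how close a reconstruction is to its input, one option is a word-level similarity score. The sketch below is an assumption on my part rather than anything used in training or evaluating the model: it uses `difflib.SequenceMatcher` from the Python standard library, and the `word_similarity` helper is hypothetical. The two pairs are copied from the inputs and outputs above.

```python
from difflib import SequenceMatcher

def word_similarity(original: str, reconstruction: str) -> float:
    """Return a score in [0, 1] comparing lowercase word sequences."""
    a = original.lower().split()
    b = reconstruction.lower().split()
    return SequenceMatcher(None, a, b).ratio()

pairs = [
    (
        'The quick brown fox jumps over the lazy dog',
        'The quick brown fox jumps over the lazy dog',
    ),
    (
        'Hi there! My name is Linus, and I spend a lot of my time thinking about latent spaces of neural network models.',
        'I am named Linus, and I spend a lot of my time thinking about the neural networks of latent models.',
    ),
]

for original, reconstruction in pairs:
    print(f'{word_similarity(original, reconstruction):.2f}  {reconstruction}')
```

A surface-level score like this penalizes paraphrases that preserve meaning, so it is best read as a lower bound on reconstruction quality. Since generation samples with `do_sample=True`, scores will also vary from run to run.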


## Interpolating in latent space

The semantic structure in the model's latent space allows us to interpolate between two different embeddings and produce new text samples that show a blend of the features in the input texts.

In [7]:
from pprint import pprint as pp

Because the model's embeddings are normalized to length 1, Contra embeddings exist on the surface of a hypersphere. To interpolate in this space, we want to use [spherical interpolation](https://en.wikipedia.org/wiki/Slerp), which we implement here:

In [8]:
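def slerp(a, b, n, eps=1e-8):
    # Normalize copies of both vectors to measure the angle between them.
    a_norm = a / torch.norm(a)
    b_norm = b / torch.norm(b)
    # Clamp the cosine so rounding error cannot push acos outside [-1, 1].
    omega = torch.acos((a_norm * b_norm).sum().clamp(-1.0, 1.0)) + eps
    so = torch.sin(omega)
    # n = 0 returns a, n = 1 returns b; values outside [0, 1] extrapolate.
    return (torch.sin((1.0 - n) * omega) / so) * a + (torch.sin(n * omega) / so) * b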
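To see why spherical rather than linear interpolation fits unit-length embeddings, here is a plain-Python mirror of the formula above on two 2-D unit vectors. It is a sketch that runs without PyTorch; the `lerp` helper is added only for comparison, and the sketch assumes the two inputs are distinct unit vectors so that `sin(omega)` is nonzero.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def slerp(a, b, t):
    """Spherical interpolation between two distinct unit vectors."""
    cos_omega = max(-1.0, min(1.0, dot(a, b)))  # guard against rounding error
    omega = math.acos(cos_omega)
    so = math.sin(omega)
    wa = math.sin((1.0 - t) * omega) / so
    wb = math.sin(t * omega) / so
    return [wa * x + wb * y for x, y in zip(a, b)]

def lerp(a, b, t):
    """Straight-line interpolation, for comparison only."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

a = [1.0, 0.0]
b = [0.0, 1.0]

mid_slerp = slerp(a, b, 0.5)
mid_lerp = lerp(a, b, 0.5)

print(norm(mid_slerp))  # stays on the unit circle
print(norm(mid_lerp))   # falls inside the unit circle
```

Normalizing the linear midpoint would bring it back to the sphere, but points produced that way are not evenly spaced in angle between the two endpoints, whereas slerp moves at a constant angular rate.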

To interpolate, we generate two embeddings, then decode embeddings interpolated at a few points between them.

In [10]:
start = 'Taylor Swift\'s artistic taste is characterized by her experimentation with different genres and her focus on pastel colors, vintage fashion, and whimsical imagery. Her attention to detail and unique visual aesthetic has helped her to stand out in the music industry.'
end = 'My research investigates the future of knowledge representation and creative work aided by machine understanding of language. I prototype software interfaces that help us become clearer thinkers and more prolific dreamers.'

start_embedding = autoencoder.embed(start)
end_embedding = autoencoder.embed(end)

for t in torch.linspace(0, 1, 10):
    latent = slerp(start_embedding, end_embedding, t)
    pp(autoencoder.generate_from_latent(latent))

("Taylor Swift's artistic style is characterized by her fascination with "
 'different genres, pastel color palettes, and her attention to vintage '
 'fashion and digital art. Her unique aesthetic and sharply textured image has '
 'helped her to stand out in the fashion industry.')
("Taylor Swift's artistic taste is characterized by her exploration of "
 'different genres, blending pastel colors and vibrant imagery, and her '
 'personal interest in vintage fashions and visual art. Her attention to '
 'detail and unique stylistic experimentation has helped her make her name in '
 'the fashion world.')
("Taylor's artistic practice is influenced by her fascination with different "
 'genres, pastel colors and bright colors, and her technical expertise in '
 'illustration and digital editing. Her focus on unique aesthetics and fluid '
 'style helps to make her artwork stand out in the music world.')
("Taylor's artistic practice is influenced by her search for blending genres "
 'of music an

In [15]:
# Embed two different texts or words
embedding1 = autoencoder.embed('can we go home')
embedding2 = autoencoder.embed('the end of he world')

average_embedding = (embedding1 + embedding2) / 2

# You can now use this average embedding to generate text that combines the attributes of both source texts
average_text = autoencoder.generate_from_latent(average_embedding)
print(average_text)

the end of the world can he go home
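The plain average used here is closely related to spherical interpolation: for two unit vectors, the average points in the same direction as the slerp midpoint, so once the decoder normalizes its input the two should be equivalent. The sketch below checks this numerically in plain Python. The vectors are made-up 3-D stand-ins for real embeddings, and the helper names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def slerp(a, b, t):
    """Spherical interpolation between two distinct unit vectors."""
    omega = math.acos(max(-1.0, min(1.0, dot(a, b))))
    so = math.sin(omega)
    wa = math.sin((1.0 - t) * omega) / so
    wb = math.sin(t * omega) / so
    return [wa * x + wb * y for x, y in zip(a, b)]

# Hypothetical stand-ins for two unit-length embeddings.
a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, -1.0, 0.5])

average = [(x + y) / 2 for x, y in zip(a, b)]
midpoint = slerp(a, b, 0.5)

print(normalize(average))
print(midpoint)
```

The equivalence only holds at the midpoint; for other mixing weights a weighted average and slerp give different directions.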


In [None]:
# Test text reconstruction
text = "The quick brown fox jumps over the lazy dog"
embedding = autoencoder.embed(text)
reconstruction = autoencoder.generate_from_latent(embedding)
print(reconstruction)

# Test interpolation in latent space
# ... (implementation of the slerp function and interpolation test)

# Test semantic edits
# ... (implementation of the semantic edits test)

## Computing and applying attribute vectors

Lastly, for certain attributes like tone, syntax, and topic, there exist specific directions in the model's latent space which correspond to the presence of that attribute in text.

We can make _semantic edits_ to text (for example, modifying the sentiment of a sentence) by first computing an _attribute vector_ that corresponds to the presence of that attribute in the model's embedding space, and moving our input text embedding along that direction.

To begin, we want a small sample of sentences with and without the attribute we want to control, which in this case is sentiment.

In [None]:
positive_sentences = [
    """Taylor Swift is my favorite artist. She always writes wonderful songs about the best parts of being human -- love.""",
    """This has been a glorious evening, and we should be grateful for our happiness.""",
    """The flowers that decorated the living room of the castle were all so beautiful, it filled us with joy.""",
    """As I walked out of the hospital, I felt relieved and happy that my son was okay.""",
    """I was so proud of myself for going the whole day without crying.""",
    """I have always imagined that Paradise will be a kind of library. The light pouring through the windowed ceiling became brighter, so bright I could see nothing but whiteness.""",
    """It was a bright and sunny day.""",
    """We were doing great -- everything was going according to plan.""",
    """The cocktail we had found was delicious.""",
    """At Apple, we strive to build the best products we can, and deliver the best experiences to our customers.""",
    """This year, The Verge celebrates our tenth year anniversary as an independent publication!""",
    """Hoppy was astonished and grateful for the tidings. As the pair made dinner, tasty odors wafted into the kitchen.""",
    """He seemed to be satisfied, grin on his face.""",
    """The atmosphere was very serene as the sun went down and greeted the evening.""",
    """Let us not spoil this happy occasion!""",
    """We are very glad to see you on deck," said the captain.""",
    """On this occasion, we have much to celebrate.""",
    """A little over a year ago, I visited New York City with a few of my friends, and one of the most memorable places, oddly enough, was a small chess store.""",
    """In that spirit, this year, I'm enjoying the novelties in front of me, and the clarity of purpose around me. I'm trying to make the most of both. There's no hurry. Today, there's much to see, and tomorrow, the fog will lift.""",
    """It's a pretty relaxed Sunday afternoon, and I'm sitting in my office chair in a quiet room instead of lying with my back against my pillow and my feet on my bed in my room. And let me tell you, I miss my bedroom dearly.""",
    """When I got tired from paddling and pushing (which was often during my first week in the water), I loved to just sit on the board and watch the sun inch down over the horizon. This was my favorite time to be out in the water.""",
    """It is rare for a thing to be described purely for what it is, undecorated by what we could easily confuse it to be while carrying on our distracted lives.""",
    """Love is sacred; love is happiness.""",
    """When he came home, he would always begin his evening by singing along to the radio.""",
    """The flowers had bloomed in the garden, dressing the entire neighborhood in a waterfall of vibrant color and haze.""",
]
negative_sentences = [
    """Taylor Swift is my least favorite artist. She never writes any good songs, and I'm just sick of her break-up songs, which is the only thing she ever writes about.""",
    """This has been a sad, gloomy evening, and there is little to be thankful for.""",
    """The dead wilting flowers in the living room in our apartment looked so depressing.""",
    """After I sprinted out of the building, I cried and cried about my dead son.""",
    """I cried every few hours for the whole day, and could never smile. I just couldn't hold it back.""",
    """I have always imagined that hell would be like prison. The light piercing through the windowed ceiling became so intense, it blinded me quickly.""",
    """It was a dim and gloomy day, raining all day.""",
    """We were doing terribly -- nothing was going according to plan.""",
    """The beers we stumbled upon were disgusting.""",
    """At Samsung, we try to build the worst products we can, and ship the worst experiences to our users.""",
    """This year, The Verge collapses as our tenth year approaches, and we have to succumb to an acquisition.""",
    """Hoppy was gravely disappointed in the offerings. As the pair made supper, the smell spread all throughout the house.""",
    """He appeared dissatisfied, tears streaming down his tired face.""",
    """The vibe was chaotic and loud as the sun came up and another day started begrudgingly.""",
    """Let's just move on quickly past this sad occasion.""",
    """We are just shocked and sad to see you back on deck," muttered the captain.""",
    """On this occasion, we have a lot to mourn.""",
    """A little over a year ago, I went to New York with a few of my relatives, and one of the most dangerous places was a dark, dimly lit corner of the park.""",
    """With that in mind, this year, I'm ignoring all the problems behind me, and the mess and confusion of my life around me. I'm trying to just move on past everything. I'm in a rush. Today, there's so much to do, and tomorrow will be worse.""",
    """It's a busy Monday night, and I'm crouched in my office chair in my room instead of lying with my back against the wall, missing everyone I lost.""",
    """When I got tired from paddling and pushing (which was often during my first week in the grind), I got so sad about all the things I couldn't achieve, and just stared at the sun as I dreaded the most boring part of my day.""",
    """Love is sorrow; love is nothing but pain.""",
    """When he came home, he would always just fall asleep, suffering from fatigue.""",
    """The flowers had wilted and died in the window, filling the rest of the neighborhood with sadness and melancholy.""",
]

In [None]:
positive_embeddings = [autoencoder.embed(s) for s in tqdm(positive_sentences)]
negative_embeddings = [autoencoder.embed(s) for s in tqdm(negative_sentences)]

100%|██████████| 25/25 [00:00<00:00, 37.44it/s]
100%|██████████| 24/24 [00:00<00:00, 37.24it/s]


To compute the attribute vector, we first take the centroid of both groups of embeddings...

In [None]:
mean_positive_embedding = torch.mean(torch.stack(positive_embeddings), dim=0)
mean_negative_embedding = torch.mean(torch.stack(negative_embeddings), dim=0)
mean_positive_embedding.shape, mean_negative_embedding.shape

(torch.Size([1024]), torch.Size([1024]))

Let's first observe what the "average sentence" within the positive and negative groups looks like by generating from the centroids of those groups.

In [None]:
for _ in range(5):
    pp(autoencoder.generate_from_latent(mean_positive_embedding))
for _ in range(5):
    pp(autoencoder.generate_from_latent(mean_negative_embedding))

('We were all so happy to be in this pleasant world. As the picture fades from '
 'the foreground, I had made a beautiful album, filled with songs that were '
 'always laughing.')
('I was very happy to be sitting on the table, with all of my friends. So for '
 'us, this morning will have a brighter quality than it was imagined.')
('I was all in the house, smiling and happily, because it was a day that I '
 'loved. With these gentle things, it was very exciting for me to fall on my '
 'own horizon.')
('We all had a wonderful time that was going on. I feel like the day was so '
 'bright and happy, I decided to relax my chair and enjoy myself. These '
 'photographs are just bursting with life.')
('I was so blessed that everything was going on as I could have imagined. For '
 'the people of this day, we enjoy a bright summer with a quiet night, and the '
 'album is full of happiness.')
("I was so miserable that I couldn't look at the picture in my eyes. As the "
 'rains came down on this w

... then we can create a "positive to negative sentiment" vector by taking the difference between those centroids.

By gradually adding this vector to our input embedding, we can generate sentences that keep our input text's topic, structure, and length but take a more negative tone.
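Before applying this to real embeddings, the centroid-and-difference construction can be tried on toy data. The sketch below is plain Python with made-up 2-D vectors, where axis 0 stands in for "topic" and axis 1 for "sentiment"; the numbers and helper names are illustrative assumptions, not model outputs. It uses direct addition for simplicity, whereas the notebook cell that follows moves along the sphere with `slerp`.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def add_scaled(v, direction, scale):
    """Move v along direction by the given scale."""
    return [x + scale * d for x, d in zip(v, direction)]

# Hypothetical embeddings: axis 0 ~ topic, axis 1 ~ sentiment.
positive = [[0.9, 0.8], [0.1, 0.7], [0.5, 0.9]]
negative = [[0.8, -0.7], [0.2, -0.9], [0.5, -0.8]]

# Topic varies within each group and averages out; sentiment does not.
positive_to_negative = subtract(centroid(negative), centroid(positive))

start = [0.3, 0.6]
for scale in (0.0, 0.5, 1.0):
    print(scale, add_scaled(start, positive_to_negative, scale))
```

The topic component of `start` stays put while the sentiment component moves, which is the behavior the attribute vector is meant to produce. This works best when the two sample groups differ mainly in the attribute of interest.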

In [None]:
start = 'Taylor Swift\'s artistic taste is characterized by her experimentation with different genres and her focus on pastel colors, vintage fashion, and whimsical imagery. Her attention to detail and unique visual aesthetic has helped her to stand out in the music industry.'
start_embedding = autoencoder.embed(start)

positive_to_negative = mean_negative_embedding - mean_positive_embedding

for t in torch.linspace(0, 2, 8):
    embedding = slerp(start_embedding, start_embedding + positive_to_negative, t)
    print(f'negative × {t:.2f}')
    pp(autoencoder.generate_from_latent(embedding))

negative × 0.00
("Taylor Swift's artistic style is characterized by her taste for different "
 'genres, pastel colors, vintage imagery, and her emphasis on experimentation '
 'with visual styles and jewelry. Her sensitive and unique style has helped '
 'her to stand out in the fashion industry.')
negative × 0.29
("Taylor Swift's artistic style is characterized by her fascination with "
 'different genres, pastel colors, vintage imagery, and her attention to '
 'detail and eclectic styling. Her unique approach to fashion and pop culture '
 'has helped her stand out in the music industry.')
negative × 0.57
("Taylor Swift's artistic style is characterized by her obsession with "
 'different genres, pastel colors, vintage fashion, and her focus on intricate '
 'visuals and eclectic styling. Her attention to detail and anonymity has made '
 'her an outspoken figure in the music industry.')
negative × 0.86
("Taylor Swift's artistic style is characterized by her obsession with "
 'different t