# MAT, Modeled with Word Embeddings
Here, we model the "Metaphors as Abstractions Theory" by learning the maps $\phi$ and $\xi$ applied to the word embeddings of language models.

In [48]:
# read a text file into a an array, in which each element is a line of the file
def read_file(filename):
    with open(filename, 'r') as f:
        return f.read().splitlines()
real_metaphors = read_file('../data/metaphor_concepts.txt')[1:] # skip the header
fake_metaphors = read_file('../data/random_concepts.txt')[1:] # skip the header

In [49]:
# convert strings in list to all lower case
real_metaphors = [x.lower() for x in real_metaphors]
fake_metaphors = [x.lower() for x in fake_metaphors]

In [50]:
# split real metaphors into two lists, one for each word in the metaphor
real_metaphors_01 = real_metaphors[::2]
real_metaphors_02 = real_metaphors[1::2]
# do the same for fake metaphors
fake_metaphors_01 = fake_metaphors[::2]
fake_metaphors_02 = fake_metaphors[1::2]
# compile into a single list of lists
concept_lists = [real_metaphors_01, real_metaphors_02, fake_metaphors_01, fake_metaphors_02]

# Sentence Embeddings

## Using "Hugging Face" Transformers

We'll start with the simplest model, BERT. If this doesn't work, we may be able to step things up into a GPT or LlaMMa type model.

In [51]:
# create embeddings of each of these concepts, for downstream processing, using the huggingface transformers library
# from transformers import AutoTokenizer, AutoModel
# tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# model = AutoModel.from_pretrained("bert-base-uncased")

In [52]:
# concept_tokens = [tokenizer(c, padding=True, return_tensors="pt") for c in concept_lists] # return tensors for pytorch
# concept_embeddings = [model(**c) for c in concept_tokens]

## Using Sentence Transformers
The `sentence-transformers` library is an extension of the hugging-face transformers library, designed explicitly for inference and manipulation of the embedding vectors. There are a number of models available, with varying tradeoffs between speed and quality; we'll begin with the fastest, and, if the downstream tasks don't work, will increase the computation.

The library's homepage declares that its models are "state of the art". While it can't replicate ChatGPT-4, it seems reasonable that for the much more modest task of embedding a couple of words based on contextual similarity to other words, it may match state of the art.

Here, each sentence is encoded into a 384-dimensional vector.

In [53]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
concept_embeddings = [model.encode(c) for c in concept_lists]

In [54]:
concept_embeddings[0].shape

(20, 384)

In [55]:
embedding_dimension = concept_embeddings[0].shape[1]

# Siamese Autoencoders to Learn Conceptual Mappings
Here is the heart of MAT's theory. Given two concepts A and B, we want to learn four mappings from A and B into a shared lower-dimensional domain (An aside: why lower dimensional? Mainly by analogy to auto-encoders, and to try to force overlap of the concepts. We want to avoid memorization.)

We represent this with a pair of siamese autoencoders.

In [56]:
import torch
import torch.nn as nn
import torch.optim as optim
import pytorch_lightning as pl

class AbstractionMapping(pl.LightningModule):
    def __init__(self, A, B):
        super().__init__()
        embedding_dimension = len(A)
        assert len(A) == len(B)
        self.A = A
        self.B = B
        # Here are the four networks for the siamese autoencoder; these will be initialized and retrained for each pair of concepts
        self.phi_forward = nn.Sequential(nn.Linear(embedding_dimension, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
        self.ksi_forward = nn.Sequential(nn.Linear(embedding_dimension, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
        self.phi_backward = nn.Sequential(nn.Linear(8,16), nn.ReLU(), nn.Linear(16,32), nn.ReLU(), nn.Linear(32,64), nn.ReLU(), nn.Linear(64,128), nn.ReLU(), nn.Linear(128,embedding_dimension))
        self.ksi_backward = nn.Sequential(nn.Linear(8,16), nn.ReLU(), nn.Linear(16,32), nn.ReLU(), nn.Linear(32,64), nn.ReLU(), nn.Linear(64,128), nn.ReLU(), nn.Linear(128,embedding_dimension))

    def training_step(self, X):
        self.losses = {}
        A_embedded = self.phi_forward(self.A)
        B_embedded = self.ksi_forward(self.B)
        A_reconstructed = self.phi_backward(A_embedded)
        B_reconstructed = self.ksi_backward(B_embedded)
        B_from_A_embedding = self.ksi_backward(A_embedded)
        A_from_B_embedding = self.phi_backward(B_embedded)
        # There's a reconstruction loss for each of the two concepts, plus a reconstruction loss for each mapping of one concept to the other
        self.losses["A reconstruction loss"] = nn.MSELoss(self.A, A_reconstructed)
        self.losses["B reconstruction loss"] = nn.MSELoss(self.B, B_reconstructed)
        self.losses["A to B loss"] = nn.MSELoss(self.B, B_from_A_embedding)
        self.losses["B to A loss"] = nn.MSELoss(self.A, A_from_B_embedding)
        combined_loss = torch.sum(self.losses.values())
        return combined_loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


In [62]:
import os
from torch import optim, nn, utils, Tensor
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

AttributeError: 'DataLoader' object has no attribute 'to'

In [61]:
learnable_mappings = AbstractionMapping(torch.tensor(concept_embeddings[0][0]), torch.tensor(concept_embeddings[0][1]))
trainer = pl.Trainer(limit_train_batches=100, max_epochs=1, accelerator="mps")
trainer.fit(model=learnable_mappings, train_dataloaders=train_loader)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name         | Type       | Params
--------------------------------------------
0 | phi_forward  | Sequential | 60.3 K
1 | ksi_forward  | Sequential | 60.3 K
2 | phi_backward | Sequential | 60.7 K
3 | ksi_backward | Sequential | 60.7 K
--------------------------------------------
241 K     Trainable params
0         Non-trainable params
241 K     Total params
0.967     Total estimated model params size (MB)
  rank_zero_warn(


Epoch 0:   0%|          | 0/100 [00:00<?, ?it/s] 

RuntimeError: Placeholder storage has not been allocated on MPS device!