# Using XAI to explain LLMs

This notebook uses T-SNE, PCA, and UMAP to explain the MiniLM embedding model from the MTEB Leaderboard. This embedding model was chosen because it is small and efficient while still performing relatively well on the MTEB leaderboard, where it is ranked 124th.

MiniLM is a sentence transformer model that captures the semantics of, and similarities between, sentences and short paragraphs. It can be used in tasks like clustering and sentiment analysis.

Below are the imports needed to run this code:

In [1]:
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from umap import UMAP
from sentence_transformers import SentenceTransformer
import plotly.express as px

The sample sentences defined below share semantic similarities, so that we can visualize how MiniLM embeds different sentences and how context affects the meaning it assigns to them.

In [2]:
sample_sentences = [
    'The cat has a hat on',
    'A hat is on the cat',
    'The dog does not like cats',
    'Apple is a big company',
    'The dog eats an apple',
    'The dog eats apple',
    'The dog likes food',
    'I love cows they\'re so cute',
    'I hate how this paint looks',
    'I went to the bank today',
    'The beaver was building his home by the bank',
    'I withdrew all my money today',
    'I walked by the river today'
]

## PCA

PCA (Principal Component Analysis) is a dimensionality reduction technique that finds the components capturing the most variance in a dataset. For the sample data above, PCA can help capture that variability and show which sentences are very similar or very different. Usually the first few principal components capture most of the variance in a dataset, so 3 components were chosen here, on the expectation that they capture much of the variance while still allowing a 3D plot.
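
One way to check how much variance three components actually retain is to inspect the `explained_variance_ratio_` attribute of a fitted scikit-learn `PCA`. The sketch below is only an illustration and not part of the original analysis: it uses random stand-in vectors (an assumption) with the 13 x 384 shape that the sample sentences would produce, so the printed numbers say nothing about MiniLM itself.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): 13 random 384-dimensional vectors, matching the
# shape of the embeddings for the 13 sample sentences. Not real embeddings.
rng = np.random.default_rng(0)
stand_in_embeddings = rng.normal(size=(13, 384))

pca_check = PCA(n_components=3)
pca_check.fit(stand_in_embeddings)

# Fraction of the total variance captured by each component, plus a running total.
variance_ratio = pca_check.explained_variance_ratio_
cumulative_ratio = np.cumsum(variance_ratio)

for i, (ratio, total) in enumerate(zip(variance_ratio, cumulative_ratio), start=1):
    print(f"PC{i}: {ratio:.3f} (cumulative {total:.3f})")
```

Running the same lines on the notebook's `embeddings` array would show what share of the variance the 3D plot retains. A low cumulative value would mean that distances in the plot should be read with caution.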

In [3]:
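# Load the MiniLM sentence-embedding model and embed every sample sentence
model = SentenceTransformer('all-MiniLM-L12-v2')
embeddings = model.encode(sample_sentences)

# Project the embeddings onto the first 3 principal components
pca = PCA(n_components=3)
pca_result = pca.fit_transform(embeddings)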

fig = px.scatter_3d(
    x=pca_result[:,0], y=pca_result[:,1], z=pca_result[:,2],
    color=sample_sentences, text=sample_sentences,
    title="PCA of Embeddings"
)
fig.update_layout(width=1200, height=800)  
fig.show()

Based on the visualization above, which can be rotated by clicking and dragging, the separation and clustering of the sentences embedded with MiniLM are clear. Sentences like "The dog eats an apple" and "The dog eats apple" are very close together, which makes sense because the only difference is the word "an". However, the embedding of "Apple is a big company" is closer to "The dog eats apple", which suggests that the model picks up on contextual differences in how the same word is used. The sentences "The cat has a hat on" and "A hat is on the cat" mean the same thing, so they should appear close to each other, and in this visualization they do.

That was a subtle example. A more obvious one is the difference between "bank" as a place for money and "bank" as a river bank. The sentence "The beaver was building his home by the bank" sits in a completely different location from "I went to the bank today", so there is a clear distinction between the two. To verify that the model is making the correct distinction rather than an arbitrary one, we can look at the sentence "I walked by the river today". Although it is a different sentence, it lies closer to "The beaver was building his home by the bank", which suggests the model relates the river and the bank through water and that the placement is not random. Similarly, "I went to the bank today" is close to "I withdrew all my money today", which suggests that some dimension associates money-related phrases.
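
Because a 3D projection discards most of the original dimensions, a closeness claim read off the plot can be cross-checked with cosine similarity in the full embedding space. The following is a minimal sketch under stated assumptions: `cosine_similarity_matrix` is a hypothetical helper written for illustration, and the toy vectors stand in for real embeddings.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    # Scale every row to unit length so the dot product equals the cosine.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_vectors = vectors / norms
    return unit_vectors @ unit_vectors.T

# Toy stand-in vectors (assumption), not real MiniLM embeddings:
# rows 0 and 1 point the same way, row 2 is orthogonal to both.
toy_vectors = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
similarity = cosine_similarity_matrix(toy_vectors)

print(np.round(similarity, 3))
```

In the notebook, passing `embeddings` to the helper would give a 13 x 13 matrix indexed in the order of `sample_sentences`. For example, entry `[9, 11]` would compare "I went to the bank today" with "I withdrew all my money today", and entry `[10, 12]` would compare the beaver sentence with the river sentence.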

I found it interesting that PCA did not separate sentiment well. The sentences "I hate how this paint looks" and "I love cows they're so cute" are oddly close together. Because they sit at roughly the same position on the x-axis, one might expect the y-axis to separate positive from negative sentiment. That is not the case: "The dog likes food" is higher on the y-axis than "I hate how this paint looks", even though hate is likely the stronger emotion. This might be a limitation of using PCA. There is a slight separation in sentiment between the aforementioned sentences along the z-axis, but I think more samples would be needed to get a better idea of how the model represents sentiment. Many of the neutral sentences are also scattered across the plot, which again might just be a limitation of dimensionality reduction and a small sample size.

## T-SNE

T-SNE (T-distributed Stochastic Neighbor Embedding) is another dimensionality reduction technique, designed specifically for visualizing high-dimensional data in 2D or 3D while retaining as much as possible of the clustering structure present in the higher dimensions. The benefit of T-SNE is that it can capture more complex and nuanced semantic relationships for these kinds of embedding models.

In [4]:
tsne = TSNE(n_components=3, perplexity=5, random_state=42)
tsne_result = tsne.fit_transform(embeddings)

fig = px.scatter_3d(
    x=tsne_result[:,0], y=tsne_result[:,1], z=tsne_result[:,2],
    color=sample_sentences, text=sample_sentences,
    title="TSNE of Embeddings"
)
fig.update_layout(width=1200, height=800)  
fig.show()

This visualization is significantly different from the PCA plot. There are some similarities, such as the closeness of "A hat is on the cat" and "The cat has a hat on", and of "The dog eats apple" and "The dog eats an apple". The aforementioned relationship between bank for money and river bank does not appear here: the sentences "I walked by the river today" and "The beaver was building his home by the bank" are not placed near each other. However, this visualization shows a clear separation in sentiment: "The dog likes food" and "I love cows they're so cute" are both higher on the x-axis, and the negative phrases are lower on the x-axis. Many of the phrases are much more spread apart and the similarities are minimal, which suggests that T-SNE is less suited to broad topics like these. If the sentences I created had more complex relationships and subtopics, they might have been better suited to the T-SNE plot.
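
T-SNE layouts depend on the `perplexity` setting and on the random seed, and with only 13 points that sensitivity can be large. A rough sketch of how one might compare several settings before reading much into a single plot is shown below. The data is a random stand-in (an assumption), and the perplexity values are arbitrary choices that stay below the number of samples, as scikit-learn requires.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data (assumption): 13 random 384-dimensional vectors in place of
# the real sentence embeddings.
rng = np.random.default_rng(0)
stand_in_embeddings = rng.normal(size=(13, 384))

# Fit one 3D T-SNE layout per perplexity value, keeping the seed fixed so that
# perplexity is the only thing that changes between runs.
tsne_runs = {}
for perplexity in (2, 5, 10):
    tsne_check = TSNE(n_components=3, perplexity=perplexity, random_state=42)
    tsne_runs[perplexity] = tsne_check.fit_transform(stand_in_embeddings)

for perplexity, coords in tsne_runs.items():
    print(f"perplexity={perplexity}: layout shape {coords.shape}")
```

Plotting each entry of `tsne_runs` with `px.scatter_3d` would show which groupings persist across settings. Groupings that appear at only one perplexity are more likely to be artifacts of the method than properties of the embeddings.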

## UMAP

UMAP (Uniform Manifold Approximation and Projection) is another dimensionality reduction technique, meant to capture both fine-grained local relationships and broader global ones. In that sense it sits between T-SNE and PCA and gives a broader understanding of the embeddings.

In [5]:
umap_reducer = UMAP(n_neighbors=5, min_dist=0.3, n_components=3, random_state=42)
umap_result = umap_reducer.fit_transform(embeddings)

# Plot 3D UMAP results
fig = px.scatter_3d(
    x=umap_result[:,0], y=umap_result[:,1], z=umap_result[:,2],
    color=sample_sentences, text=sample_sentences,
    title="3D UMAP of Sentence Embeddings"
)
fig.update_layout(width=1200, height=800)  
fig.show()


Many of the relationships in this plot are similar to those in the T-SNE plot, and the related phrases discussed in the PCA section are close together here as well. Sentences on broadly related topics sit near each other, yet there is still some spread between the embeddings, indicating that these sentences do not share similar subtopics, which makes sense as they are quite simple sentences.
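
Visual comparison of the three plots can be supplemented with a number. scikit-learn provides `sklearn.manifold.trustworthiness`, which scores from 0 to 1 how well a projection keeps each point's nearest neighbors from the original space. The sketch below is an illustration under stated assumptions: it scores a PCA projection of random stand-in vectors, not the notebook's real embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Stand-in data (assumption): 13 random 384-dimensional vectors in place of
# the real sentence embeddings.
rng = np.random.default_rng(0)
stand_in_embeddings = rng.normal(size=(13, 384))

projection = PCA(n_components=3).fit_transform(stand_in_embeddings)

# n_neighbors must be smaller than half the number of samples (13 / 2 = 6.5).
score = trustworthiness(stand_in_embeddings, projection, n_neighbors=5)

# Sanity check: comparing the data with itself preserves every neighborhood.
identity_score = trustworthiness(stand_in_embeddings, stand_in_embeddings, n_neighbors=5)

print(f"PCA projection: {score:.3f}, identity: {identity_score:.3f}")
```

In the notebook, calling `trustworthiness(embeddings, pca_result, n_neighbors=5)` and doing the same for `tsne_result` and `umap_result` would give one score per method. With so few sentences the scores would be noisy, so they are better treated as a rough guide than as a ranking.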

## Summary of Findings

Overall, all three methods of visualizing high-dimensional embeddings are effective at demonstrating the different semantic relationships, with varying granularity. T-SNE is the most granular and is mainly useful for showing more specific relationships between sentences. PCA shows the simple relationships between sentences easily and effectively. Finally, UMAP combines the two and provides a good overall picture of the embeddings, capturing both broad and complex relationships in the visualizations.