## BERT

BERT produces contextual embeddings: the vector it assigns to a word depends on the words around it. Working with such models usually involves larger models and more complex libraries than traditional bag-of-words approaches such as TF-IDF. Hugging Face's Transformers library is a popular tool for working with pre-trained contextual embedding models, including BERT.
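
To make the contrast concrete, here is a minimal pure-Python sketch showing one thing a bag-of-words representation throws away: word order. The `bag_of_words` helper is a hypothetical illustration, not part of any library. Two sentences with opposite meanings produce identical counts, so any count or TF-IDF vector built from them is identical too. A contextual model like BERT instead computes each token's vector from its surrounding tokens.

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Count lowercase, whitespace-separated tokens; word order is discarded."""
    return Counter(sentence.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Identical counts, even though the sentences mean different things
print(a == b)  # True
```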

In this script, we use the Hugging Face Transformers library to work with BERT, a pre-trained transformer-based language model. We start by loading a pre-trained BERT tokenizer and model.

The example text, "The quick brown fox jumped over the lazy dog," is then tokenized into WordPiece tokens (the tokenizer also adds the special [CLS] and [SEP] tokens), and the resulting input IDs are passed to the BERT model, which returns a contextual embedding for every token.
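
BERT's tokenizer first lowercases the text (for the uncased model) and splits on whitespace and punctuation, then breaks each word into subword pieces with WordPiece, which uses greedy longest-match-first matching against a fixed vocabulary. The sketch below illustrates only that second step; the `wordpiece` function and the tiny vocabulary are hypothetical stand-ins, far smaller than the real `bert-base-uncased` vocabulary.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until it is in the vocab
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary (hypothetical)
vocab = {"the", "quick", "jump", "##ed", "##s", "fox", "un", "##aff", "##able"}
print(wordpiece("jumped", vocab))     # ['jump', '##ed']
print(wordpiece("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece("zebra", vocab))      # ['[UNK]']
```

Because matching is greedy, a word splits into the longest pieces available, and a word that cannot be covered at all maps to a single `[UNK]` token.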

The final hidden state of the special [CLS] token, which the tokenizer places at position 0 of every input, is then extracted from the model's output; it can be used as a simple sentence embedding.
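
Taking position 0 is only one way to get a sentence vector. The sketch below contrasts it with mask-aware mean pooling, which averages every non-padding token vector. It uses NumPy with a random array standing in for `outputs.last_hidden_state`; `cls_pool` and `mean_pool` are hypothetical helper names, not Transformers functions. The Transformers documentation makes a related point about `pooler_output` (a transformed [CLS] vector), noting that averaging or pooling the hidden states is often a better summary of the input, so mean pooling is worth trying when a sentence embedding is the goal.

```python
import numpy as np

def cls_pool(last_hidden_state: np.ndarray) -> np.ndarray:
    """Take the vector at position 0, which holds the [CLS] token in BERT inputs."""
    return last_hidden_state[:, 0, :]

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padded positions (attention_mask == 0)."""
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                     # avoid division by zero
    return summed / counts

# Hypothetical stand-in for outputs.last_hidden_state: batch of 2, 4 positions, hidden size 3
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 3)).astype(np.float32)
# The second sequence is shorter, so its last position is padding
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 1, 0]])

print(cls_pool(hidden).shape)         # (2, 3)
print(mean_pool(hidden, mask).shape)  # (2, 3)
```

With real model output you would pass `outputs.last_hidden_state.numpy()` and `tokens['attention_mask'].numpy()`; note that the attention mask also covers [CLS] and [SEP], so those vectors are included in the average.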

Finally, the script prints the tokenized input and the resulting [CLS] embedding.

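```python
import tensorflow as tf  # TFBertModel requires TensorFlow to be installed
from transformers import BertTokenizer, TFBertModel

# Example text
text = "The quick brown fox jumped over the lazy dog."

# Load the pre-trained BERT tokenizer and model (downloaded on first use)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained('bert-base-uncased')

# Tokenize the input text; returns input_ids, token_type_ids and attention_mask as tf.Tensors
tokens = tokenizer(text, return_tensors='tf')

# Run BERT; last_hidden_state has shape (batch_size, sequence_length, hidden_size)
outputs = model(tokens)

# Contextual embedding for every token, including [CLS] and [SEP]
token_embeddings = outputs.last_hidden_state[0].numpy()

# Embedding of the [CLS] token (position 0), usable as a simple sentence embedding
cls_embedding = outputs.last_hidden_state[:, 0, :].numpy()

# Summarize: map the input IDs back to WordPiece tokens for readability
token_strings = tokenizer.convert_ids_to_tokens(tokens['input_ids'][0].numpy().tolist())
print('Tokenized text:', token_strings)
print('Token embeddings shape:', token_embeddings.shape)
print('[CLS] embedding shape:', cls_embedding.shape)
print('[CLS] embedding:', cls_embedding)
```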


## Evaluating contextual embeddings
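
The code below scores word pairs with `scipy.spatial.distance.cosine`, which returns the cosine distance `1 - dot(u, v) / (norm(u) * norm(v))`: 0 for vectors pointing the same way, 1 for orthogonal vectors, 2 for opposite ones. A single distance says little on its own, so one simple sanity check is whether related words are closer to each other than to unrelated ones. The sketch below, assuming toy 3-dimensional vectors (hypothetical values, not real BERT outputs) and a hypothetical `cosine_distance` helper, compares within-pair and cross-pair distances.

```python
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity, the same quantity scipy's cosine() returns."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors (hypothetical, chosen so that each city sits near its country)
emb = {
    "paris":   np.array([0.9, 0.1, 0.0]),
    "france":  np.array([0.8, 0.3, 0.1]),
    "berlin":  np.array([0.1, 0.9, 0.0]),
    "germany": np.array([0.2, 0.8, 0.2]),
}

pairs = [("paris", "france"), ("berlin", "germany")]
cross = [("paris", "germany"), ("berlin", "france")]

within = [cosine_distance(emb[a], emb[b]) for a, b in pairs]
across = [cosine_distance(emb[a], emb[b]) for a, b in cross]
print("within-pair:", within)
print("cross-pair: ", across)
```

With real embeddings, a within-pair distance that is not clearly smaller than the cross-pair distances suggests the representation is not capturing the relationship you care about.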

## Visualizing word embeddings

```python
import numpy as np
from scipy.spatial.distance import cosine
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older Matplotlib)

# Reuses `tokenizer` and `model` loaded in the previous cell

# Define word pairs
word_pairs = [("paris", "france"), ("berlin", "germany")]

# Get an embedding for each word: the [CLS] vector of the single-word input
embeddings = {}
for pair in word_pairs:
    for word in pair:
        tokens = tokenizer(word, return_tensors='tf')
        outputs = model(tokens)
        embeddings[word] = outputs.last_hidden_state[:, 0, :].numpy().flatten()

# Compute cosine distances (1 - cosine similarity) within each pair
for a, b in word_pairs:
    dist = cosine(embeddings[a], embeddings[b])
    print(f"Cosine distance between '{a}' and '{b}': {dist:.4f}")

# Project the 768-dimensional embeddings onto 3 principal components
words = list(embeddings.keys())
pca = PCA(n_components=3)
embedding_values_3d = pca.fit_transform(np.array([embeddings[w] for w in words]))

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')

# Add one labeled point per word
for i, word in enumerate(words):
    ax.scatter(embedding_values_3d[i, 0], embedding_values_3d[i, 1], embedding_values_3d[i, 2], label=word)

# Connect the two words of each pair with a line
for a, b in word_pairs:
    p = embedding_values_3d[words.index(a)]
    q = embedding_values_3d[words.index(b)]
    ax.plot([p[0], q[0]], [p[1], q[1]], [p[2], q[2]])

ax.legend()
plt.show()
```
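
For intuition about what `PCA(n_components=3).fit_transform` computes, here is a minimal NumPy sketch of PCA via the singular value decomposition: center the data, then project it onto the top right-singular vectors. The `pca_project` helper is a hypothetical name and the random matrix stands in for four 768-dimensional BERT vectors; scikit-learn's result should match up to the sign of each component.

```python
import numpy as np

def pca_project(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows of X onto their top principal components via SVD."""
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are principal directions, sorted by decreasing singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T  # (n_samples, n_components)

# Hypothetical stand-in for four 768-dimensional BERT vectors
rng = np.random.default_rng(42)
X = rng.normal(size=(4, 768))
X_3d = pca_project(X, 3)
print(X_3d.shape)  # (4, 3)
```

With only four words, the centered points span at most three dimensions, so three components keep all of the variance and the 3D plot preserves the Euclidean distances between the points. It does not, however, preserve the cosine distances printed earlier, because centering moves the origin.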