<img style="float: right;" src="../../assets/htwlogo.svg">

# Exercise: Word Embeddings

**Author**: _Erik Rodner_ <br>

In this exercise, we will experiment with word embeddings. The idea was inspired by 3Blue1Brown.


In [None]:
import gensim.downloader
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

The model we use, GloVe with 100-dimensional vectors, is not a modern embedding model, but it still serves the purpose.

In [None]:
model = gensim.downloader.load('glove-wiki-gigaword-100')

Since it is hard to visualize embeddings beyond three dimensions, we reduce all embedding vectors to three dimensions with PCA. Alternatively, one could even use a random linear transformation.

In [None]:
embedding_size = 3
pca = PCA(n_components=embedding_size)
# Alternative: a random linear transformation instead of PCA
# P = np.random.randn(embedding_size, 100)
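
The random transformation mentioned as an alternative could look roughly like the following sketch. It does not load the real model: `fake_embeddings` is stand-in random data (an assumption for illustration), and the `1/sqrt(k)` scaling is a common convention rather than something the notebook prescribes.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

embedding_size = 3    # target dimensionality, as in the cell above
original_size = 100   # dimensionality of the GloVe vectors used here

# Stand-in data (assumption): 12 random vectors instead of real embeddings.
fake_embeddings = rng.standard_normal((12, original_size))

# Random Gaussian projection matrix. Scaling by 1/sqrt(k) keeps squared
# vector lengths unchanged in expectation.
P = rng.standard_normal((embedding_size, original_size)) / np.sqrt(embedding_size)

# Project every row vector from 100 down to 3 dimensions.
projected = fake_embeddings @ P.T
print(projected.shape)
```

Unlike PCA, this projection ignores the data entirely, so with only three target dimensions the resulting plot may look far less structured.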



Let's transform some words into the embedding space.

In [None]:
words = ["king", "queen", "sofa", "chair", "man", "boy", "girl", "crown", "throne", "fighting", "ruling", "singing"]

# stack the 100-dimensional vectors of all words into one matrix
embeddings = np.array([model[w] for w in words])

# fit PCA on these vectors and project them into the lower-dimensional space
embeddings = pca.fit_transform(embeddings)
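
To get a feeling for what the PCA step does, here is a minimal sketch that centers the data and projects it onto the top three right singular vectors. It uses random stand-in data (`X` is an assumption, not the real embeddings), and the result should match a PCA projection only up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in data (assumption): 12 "words" with 100-dimensional vectors.
X = rng.standard_normal((12, 100))

# PCA step 1: center each feature around zero.
X_centered = X - X.mean(axis=0)

# PCA step 2: the right singular vectors are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:3]

# PCA step 3: project the centered data onto the top three directions.
X_3d = X_centered @ components.T

# Fraction of the total variance that survives the projection.
kept = float(np.sum(S[:3] ** 2) / np.sum(S ** 2))
print(X_3d.shape, kept)
```

Note that the projection is fitted on the selected words only, so adding or removing a word changes the axes of the plot.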

Let's visualize these embeddings.

In [None]:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(embeddings[:, 0], embeddings[:, 1], embeddings[:, 2])

for i, txt in enumerate(words):
    ax.text(embeddings[i, 0], embeddings[i, 1], embeddings[i, 2], txt)

## Arithmetic operations in embedding space

Some embedding spaces have an interesting property: simple vector arithmetic on the embeddings can be meaningful. Let's try this out.

In [None]:
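# direction in embedding space pointing from "woman" to "man"
gender_vector_pointing_to_man = model["man"] - model["woman"]
# shift "aunt" along this direction; we hope to land near "uncle"
v = gender_vector_pointing_to_man + model["aunt"]
# a raw vector is passed, so input words such as "aunt" are not filtered out
model.most_similar(positive=[v], topn=10)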
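
The arithmetic above can also be followed by hand on a tiny example. The sketch below uses hypothetical three-dimensional toy vectors (the values are made up so that the first axis loosely encodes gender) and a simple cosine-similarity ranking; `cosine` and `rank_by_similarity` are illustrative helpers, not part of gensim.

```python
import numpy as np

# Hypothetical toy vectors (made-up values, not real GloVe embeddings).
toy = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([-1.0, 0.0, 0.2]),
    "uncle": np.array([1.0, 1.0, 0.0]),
    "aunt":  np.array([-1.0, 1.0, 0.0]),
    "chair": np.array([0.0, -1.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(v, vectors, exclude=()):
    """Sort all words (except the excluded ones) by cosine similarity to v."""
    scores = {w: cosine(v, vec) for w, vec in vectors.items() if w not in exclude}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Same arithmetic as in the cell above, on the toy vectors.
v = toy["man"] - toy["woman"] + toy["aunt"]
ranking = rank_by_similarity(v, toy, exclude=("man", "woman", "aunt"))
print(ranking[0][0])
```

Excluding the input words from the ranking is a design choice here; without it, a word that was part of the query can easily end up among the top results.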

## Exercise: more arithmetic

Can you find more examples of embedding arithmetic?