# Tutorial 3: Word Embeddings

## Embeddings
All word embedding classes inherit from the `TokenEmbeddings` class and implement the `embed()` method, which is called to embed the input text. The complexity of the different embeddings therefore stays hidden behind this one interface.

**How to Embed Text:**

Instantiate the required embedding class and call `embed()` on the text.

All embeddings produced with Flair are PyTorch vectors, so they can be used immediately for training and fine-tuning.
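
The shared-interface idea can be illustrated with a toy sketch in plain Python. This is not Flair's code: `ToyEmbeddings`, `LengthEmbeddings`, `VowelEmbeddings` and `embed_with` are hypothetical names invented for the illustration, and the "embeddings" are one-dimensional on purpose.

```python
from abc import ABC, abstractmethod


class ToyEmbeddings(ABC):
    """Hypothetical stand-in for a shared embedding interface (not Flair's class)."""

    @abstractmethod
    def embed(self, tokens):
        """Return one vector (a list of floats) per token."""


class LengthEmbeddings(ToyEmbeddings):
    # Toy rule: a 1-dimensional vector holding the token length.
    def embed(self, tokens):
        return [[float(len(t))] for t in tokens]


class VowelEmbeddings(ToyEmbeddings):
    # Toy rule: a 1-dimensional vector holding the number of vowels.
    def embed(self, tokens):
        return [[float(sum(c in "aeiou" for c in t.lower()))] for t in tokens]


def embed_with(embedding, text):
    # The caller relies only on embed(), whichever subclass is passed in.
    return embedding.embed(text.split())


print(embed_with(LengthEmbeddings(), "The grass"))
print(embed_with(VowelEmbeddings(), "The grass"))
```

Swapping one embedding class for another changes the vectors but not the calling code, which is the point of hiding each implementation behind `embed()`.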

## Classic Word Embeddings
Classic embeddings are static and word-level: each distinct word gets exactly one pre-computed embedding, regardless of context. GloVe and Word2Vec are examples.

To use static word embeddings, instantiate the `WordEmbeddings` class and pass the string identifier of the desired embedding.

In [1]:
from flair.embeddings import WordEmbeddings
from flair.data import Sentence

# initialize embedding
gloveEmbedding = WordEmbeddings("glove")


In [2]:
gloveEmbedding

WordEmbeddings('glove')

In [3]:
# Create example sentence
sentence = Sentence("The grass is green.")

# Embed the sentence using glove
gloveEmbedding.embed(sentence)

# check the embedded tokens
for token in sentence:
    print(token)
    print(token.embedding)
    print(len(token.embedding))

Token: 1 The
tensor([-0.0382, -0.2449,  0.7281, -0.3996,  0.0832,  0.0440, -0.3914,  0.3344,
        -0.5755,  0.0875,  0.2879, -0.0673,  0.3091, -0.2638, -0.1323, -0.2076,
         0.3340, -0.3385, -0.3174, -0.4834,  0.1464, -0.3730,  0.3458,  0.0520,
         0.4495, -0.4697,  0.0263, -0.5415, -0.1552, -0.1411, -0.0397,  0.2828,
         0.1439,  0.2346, -0.3102,  0.0862,  0.2040,  0.5262,  0.1716, -0.0824,
        -0.7179, -0.4153,  0.2033, -0.1276,  0.4137,  0.5519,  0.5791, -0.3348,
        -0.3656, -0.5486, -0.0629,  0.2658,  0.3020,  0.9977, -0.8048, -3.0243,
         0.0125, -0.3694,  2.2167,  0.7220, -0.2498,  0.9214,  0.0345,  0.4674,
         1.1079, -0.1936, -0.0746,  0.2335, -0.0521, -0.2204,  0.0572, -0.1581,
        -0.3080, -0.4162,  0.3797,  0.1501, -0.5321, -0.2055, -1.2526,  0.0716,
         0.7056,  0.4974, -0.4206,  0.2615, -1.5380, -0.3022, -0.0734, -0.2831,
         0.3710, -0.2522,  0.0162, -0.0171, -0.3898,  0.8742, -0.7257, -0.5106,
        -0.5203, -0.1459,  

### Example: 'kiwi' word - Static Word Embeddings
Below, GloVe produces the same numeric tensor for the word 'kiwi' in both sentences, even though the word is polysemic: it means different things depending on the context (a fruit in one sentence, a bird in the other).
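
Why a static embedding cannot separate the two senses can be sketched as a plain lookup table. The vectors below are made-up 3-dimensional values, and `toy_table` and `embed_static` are hypothetical names, not part of Flair.

```python
# Hypothetical 3-dimensional vectors, invented for this illustration.
toy_table = {
    "the": [0.1, 0.2, 0.3],
    "kiwi": [0.5, -0.4, 0.9],
    "fruit": [0.7, 0.1, 0.0],
    "bird": [-0.2, 0.6, 0.4],
}


def embed_static(words, table):
    # Context is never consulted: each word maps to its single stored vector.
    return [table[w.lower()] for w in words]


fruit_vectors = embed_static(["The", "kiwi", "fruit"], toy_table)
bird_vectors = embed_static(["The", "kiwi", "bird"], toy_table)

# 'kiwi' is at index 1 in both word lists.
print(fruit_vectors[1] == bird_vectors[1])  # True
```

Because the lookup only sees the word itself, 'kiwi' receives the same vector whatever surrounds it.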

In [4]:
# Create a sentence
s1 = Sentence("The brown, fuzzy kiwi fruit was a juicy green on the inside.")
s2 = Sentence("The kiwi bird sang merrily on the branch outside.")

# embed words in the sentence
gloveEmbedding.embed(s1)
gloveEmbedding.embed(s2)

print("Embedding of kiwi fruit sentence")
for token in s1: 
    print(token)
    print(token.embedding)

Embedding of kiwi fruit sentence
Token: 1 The
tensor([-0.0382, -0.2449,  0.7281, -0.3996,  0.0832,  0.0440, -0.3914,  0.3344,
        -0.5755,  0.0875,  0.2879, -0.0673,  0.3091, -0.2638, -0.1323, -0.2076,
         0.3340, -0.3385, -0.3174, -0.4834,  0.1464, -0.3730,  0.3458,  0.0520,
         0.4495, -0.4697,  0.0263, -0.5415, -0.1552, -0.1411, -0.0397,  0.2828,
         0.1439,  0.2346, -0.3102,  0.0862,  0.2040,  0.5262,  0.1716, -0.0824,
        -0.7179, -0.4153,  0.2033, -0.1276,  0.4137,  0.5519,  0.5791, -0.3348,
        -0.3656, -0.5486, -0.0629,  0.2658,  0.3020,  0.9977, -0.8048, -3.0243,
         0.0125, -0.3694,  2.2167,  0.7220, -0.2498,  0.9214,  0.0345,  0.4674,
         1.1079, -0.1936, -0.0746,  0.2335, -0.0521, -0.2204,  0.0572, -0.1581,
        -0.3080, -0.4162,  0.3797,  0.1501, -0.5321, -0.2055, -1.2526,  0.0716,
         0.7056,  0.4974, -0.4206,  0.2615, -1.5380, -0.3022, -0.0734, -0.2831,
         0.3710, -0.2522,  0.0162, -0.0171, -0.3898,  0.8742, -0.7257, -0.

In [5]:
print("\n\nEmbedding of kiwi bird sentence")
for token in s2: 
    print(token)
    print(token.embedding)



Embedding of kiwi bird sentence
Token: 1 The
tensor([-0.0382, -0.2449,  0.7281, -0.3996,  0.0832,  0.0440, -0.3914,  0.3344,
        -0.5755,  0.0875,  0.2879, -0.0673,  0.3091, -0.2638, -0.1323, -0.2076,
         0.3340, -0.3385, -0.3174, -0.4834,  0.1464, -0.3730,  0.3458,  0.0520,
         0.4495, -0.4697,  0.0263, -0.5415, -0.1552, -0.1411, -0.0397,  0.2828,
         0.1439,  0.2346, -0.3102,  0.0862,  0.2040,  0.5262,  0.1716, -0.0824,
        -0.7179, -0.4153,  0.2033, -0.1276,  0.4137,  0.5519,  0.5791, -0.3348,
        -0.3656, -0.5486, -0.0629,  0.2658,  0.3020,  0.9977, -0.8048, -3.0243,
         0.0125, -0.3694,  2.2167,  0.7220, -0.2498,  0.9214,  0.0345,  0.4674,
         1.1079, -0.1936, -0.0746,  0.2335, -0.0521, -0.2204,  0.0572, -0.1581,
        -0.3080, -0.4162,  0.3797,  0.1501, -0.5321, -0.2055, -1.2526,  0.0716,
         0.7056,  0.4974, -0.4206,  0.2615, -1.5380, -0.3022, -0.0734, -0.2831,
         0.3710, -0.2522,  0.0162, -0.0171, -0.3898,  0.8742, -0.7257, -0

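GloVe embeddings are PyTorch vectors of dimensionality 100.

The dimension can be checked with `len(token.embedding)`, as in the first embedding example above, or by reading the `embedding_length` property of the embedding object.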

## Flair Embeddings
Contextual string embeddings capture latent syntactic-semantic information that goes beyond what standard word embeddings capture.
Key differences:
1. They are trained without any explicit notion of words and model words as sequences of characters.
2. They are contextualized by their surrounding text, so the same word has different embeddings depending on its contextual use.
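
Both points can be mimicked with a toy character-level recurrence in plain Python. This is only a sketch: the update rule is invented and far simpler than the LSTM language model that Flair uses, and `toy_forward_states` and `toy_word_vector` are hypothetical names.

```python
def toy_forward_states(text, dim=3):
    """Return one state vector per character; each depends on all earlier characters."""
    state = [0.0] * dim
    states = []
    for ch in text:
        code = ord(ch)
        # Hypothetical update rule: decay the old state, then mix in the new character.
        state = [0.5 * s + ((code * (i + 1)) % 7) / 7.0 for i, s in enumerate(state)]
        states.append(state)
    return states


def toy_word_vector(text, word):
    """Use the state right after the word's last character as the word's vector."""
    end = text.index(word) + len(word)
    return toy_forward_states(text)[end - 1]


fruit_kiwi = toy_word_vector("The brown, fuzzy kiwi fruit", "kiwi")
bird_kiwi = toy_word_vector("The kiwi bird", "kiwi")

# Same word, different preceding text, different vector.
print(fruit_kiwi != bird_kiwi)
```

The sketch never looks up 'kiwi' as a unit; the vector is whatever state the character sequence has produced by the end of the word. Since it reads left to right only, text after the word has no effect on the result.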

### Example: 'kiwi' word - Contextual Word Embeddings
Below, contextual embeddings produce a different numeric tensor for each of the two meanings of the polysemic word 'kiwi', because the embedding depends on the context (sentence) in which the word appears.

In [6]:
from flair.embeddings import FlairEmbeddings

# init embedding
flairEmbeddingForward = FlairEmbeddings("news-forward")
flairEmbeddingForward

FlairEmbeddings(
  (lm): LanguageModel(
    (drop): Dropout(p=0.05, inplace=False)
    (encoder): Embedding(300, 100)
    (rnn): LSTM(100, 2048)
    (decoder): Linear(in_features=2048, out_features=300, bias=True)
  )
)

In [7]:
# Create a sentence
s1 = Sentence("The brown, fuzzy kiwi fruit was a juicy green on the inside.")
s2 = Sentence("The kiwi bird sang merrily on the branch outside.")

# embed words in the sentence
flairEmbeddingForward.embed(s1)
flairEmbeddingForward.embed(s2)

print("Embedding of kiwi fruit sentence")
for token in s1: 
    print(token)
    print(token.embedding)

print("\n\nEmbedding of kiwi bird sentence")
for token in s2: 
    print(token)
    print(token.embedding)

Embedding of kiwi fruit sentence
Token: 1 The
tensor([-0.0021,  0.0005,  0.0469,  ..., -0.0004, -0.0393,  0.0106])
Token: 2 brown
tensor([-0.0024,  0.0009, -0.0250,  ...,  0.0002, -0.0078,  0.0019])
Token: 3 ,
tensor([ 3.3601e-05,  7.9833e-05,  8.3265e-03,  ..., -6.6497e-04,
         6.8796e-03,  1.0286e-02])
Token: 4 fuzzy
tensor([-0.0008, -0.0008,  0.0104,  ..., -0.0004,  0.0038,  0.0063])
Token: 5 kiwi
tensor([-8.6193e-04, -2.9889e-05,  5.8974e-04,  ..., -1.2760e-04,
        -1.3414e-02,  8.3178e-03])
Token: 6 fruit
tensor([-0.0040, -0.0104,  0.0325,  ..., -0.0010,  0.0078,  0.0065])
Token: 7 was
tensor([ 0.0013, -0.0006,  0.0596,  ...,  0.0003,  0.0097,  0.0422])
Token: 8 a
tensor([ 0.0070,  0.0002,  0.0089,  ..., -0.0011,  0.0334, -0.0090])
Token: 9 juicy
tensor([ 3.9676e-05, -7.7615e-04,  7.8725e-03,  ..., -9.1675e-04,
         2.9967e-03,  6.8154e-03])
Token: 10 green
tensor([-0.0011,  0.0008,  0.0095,  ..., -0.0019, -0.0112,  0.0815])
Token: 11 on
tensor([-0.0003, -0.0002, -0.0

## Stacked Embeddings
Stacked embeddings combine different embeddings, for example classic embeddings together with contextual ones.

### Static vs Contextual
Here, stacked embeddings give different numeric tensors for the polysemic word 'kiwi' in the two sentences, as they should, although many of the values are identical or close to each other.
* A likely explanation: stacking concatenates the individual embeddings, so the leading GloVe components are the same in both sentences (both 'kiwi' tensors below start with -0.1213, 0.6155, 0.3101), and only the components contributed by the contextual Flair embeddings differ.
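
Stacking concatenates the per-token vectors of the individual embeddings. The sketch below shows what that implies for 'kiwi', using made-up short vectors; `stack` and all values are hypothetical and only illustrate the layout.

```python
def stack(*vectors):
    # Concatenate the per-token vectors end to end, in the order given.
    stacked = []
    for v in vectors:
        stacked.extend(v)
    return stacked


# Hypothetical values: the static part is shared, the contextual parts differ.
static_kiwi = [0.5, -0.4]
context_kiwi_fruit = [0.01, 0.03, -0.02]
context_kiwi_bird = [0.02, -0.01, 0.04]

stacked_fruit = stack(static_kiwi, context_kiwi_fruit)
stacked_bird = stack(static_kiwi, context_kiwi_bird)

print(len(stacked_fruit))                     # 5
print(stacked_fruit[:2] == stacked_bird[:2])  # True: static part is identical
print(stacked_fruit[2:] == stacked_bird[2:])  # False: contextual part differs
```

By the same arithmetic, if each Flair model contributes a vector the size of its LSTM hidden state (2048 in the printouts), the stacked vector in this tutorial would have 100 + 2048 + 2048 = 4196 dimensions. That figure is an assumption derived from the printed model summaries, not something verified here.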

In [8]:
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings

# init standard glove embedding
gloveEmbedding = WordEmbeddings("glove")

# init flair forward and backward embeddings
flairEmbeddingForward = FlairEmbeddings("news-forward")
flairEmbeddingBackward = FlairEmbeddings("news-backward")

# instantiate stacked embeddings
stackedEmbeddings = StackedEmbeddings([
    gloveEmbedding, 
    flairEmbeddingForward,
    flairEmbeddingBackward
])

stackedEmbeddings

StackedEmbeddings(
  (list_embedding_0): WordEmbeddings('glove')
  (list_embedding_1): FlairEmbeddings(
    (lm): LanguageModel(
      (drop): Dropout(p=0.05, inplace=False)
      (encoder): Embedding(300, 100)
      (rnn): LSTM(100, 2048)
      (decoder): Linear(in_features=2048, out_features=300, bias=True)
    )
  )
  (list_embedding_2): FlairEmbeddings(
    (lm): LanguageModel(
      (drop): Dropout(p=0.05, inplace=False)
      (encoder): Embedding(300, 100)
      (rnn): LSTM(100, 2048)
      (decoder): Linear(in_features=2048, out_features=300, bias=True)
    )
  )
)

In [9]:
s1 = Sentence("The brown, fuzzy kiwi fruit was a juicy green on the inside.")
s2 = Sentence("The kiwi bird sang merrily on the branch outside.")

stackedEmbeddings.embed(s1)
stackedEmbeddings.embed(s2)

# check the embedded tokens
for token in s1: 
    print(token)
    print(token.embedding)

Token: 1 The
tensor([-0.0382, -0.2449,  0.7281,  ..., -0.0046, -0.0051, -0.0079])
Token: 2 brown
tensor([-0.4381, -0.0994, -0.2604,  ..., -0.0049,  0.0099, -0.0702])
Token: 3 ,
tensor([-0.1077,  0.1105,  0.5981,  ..., -0.0028, -0.0017, -0.0164])
Token: 4 fuzzy
tensor([-0.4100,  0.5202,  0.8766,  ..., -0.0010, -0.0305,  0.0016])
Token: 5 kiwi
tensor([-0.1213,  0.6155,  0.3101,  ..., -0.0020, -0.0013, -0.0032])
Token: 6 fruit
tensor([-0.8657,  0.4803, -0.3967,  ..., -0.0159, -0.0138,  0.0038])
Token: 7 was
tensor([ 1.3717e-01, -5.4287e-01,  1.9419e-01,  ..., -4.4718e-05,
         2.9130e-02, -2.4539e-03])
Token: 8 a
tensor([-0.2709,  0.0440, -0.0203,  ..., -0.0026, -0.0149,  0.0074])
Token: 9 juicy
tensor([-0.6321,  0.4603, -0.3241,  ..., -0.0027, -0.0305, -0.0042])
Token: 10 green
tensor([-0.6791,  0.3491, -0.2398,  ..., -0.0018, -0.1277,  0.0123])
Token: 11 on
tensor([-0.2186, -0.4266,  0.5196,  ..., -0.0030, -0.0132,  0.0045])
Token: 12 the
tensor([-3.8194e-02, -2.4487e-01,  7.2812e-0

In [10]:
for token in s2: 
    print(token)
    print(token.embedding)

Token: 1 The
tensor([-0.0382, -0.2449,  0.7281,  ..., -0.0021, -0.0098,  0.0083])
Token: 2 kiwi
tensor([-1.2131e-01,  6.1547e-01,  3.1012e-01,  ..., -3.4703e-04,
        -1.2441e-03, -2.6620e-03])
Token: 3 bird
tensor([ 0.1855,  0.6331,  0.4935,  ...,  0.0073,  0.0649, -0.0390])
Token: 4 sang
tensor([ 0.1624,  0.5565, -0.2943,  ...,  0.0114,  0.0172, -0.0018])
Token: 5 merrily
tensor([-0.6585,  0.0394, -0.1813,  ...,  0.0075,  0.0056, -0.0044])
Token: 6 on
tensor([-0.2186, -0.4266,  0.5196,  ..., -0.0049, -0.0082,  0.0015])
Token: 7 the
tensor([-0.0382, -0.2449,  0.7281,  ..., -0.0338, -0.0077,  0.0062])
Token: 8 branch
tensor([ 0.6829, -0.0855, -0.1398,  ..., -0.0011, -0.0172, -0.0008])
Token: 9 outside
tensor([-0.0448, -0.2723,  0.2139,  ...,  0.0270,  0.0292, -0.0013])
Token: 10 .
tensor([-0.3398,  0.2094,  0.4635,  ...,  0.0005, -0.0177,  0.0032])
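
Comparing printed tensors by eye only goes so far. One common way to put a number on how close two embeddings are is cosine similarity, sketched here in plain Python on hypothetical short vectors. `cosine_similarity` is a helper written for this illustration, not a Flair function.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for the two 'kiwi' vectors.
kiwi_fruit = [0.5, -0.4, 0.01, 0.03]
kiwi_bird = [0.5, -0.4, 0.02, -0.01]

print(cosine_similarity(kiwi_fruit, kiwi_bird))
```

A value near 1 means the vectors point in almost the same direction. For the real tensors, `torch.nn.functional.cosine_similarity(a, b, dim=0)` should give the same measure when applied to the two 'kiwi' token embeddings.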
