# Sinusoidal Positional Encodings

In natural language processing, positional embeddings give a model access to the order of the words it reads. Word embeddings capture semantic relationships between words, but they carry no information about where a word occurs in a sequence.

Positional embeddings complement word embeddings by encoding the position or order of words in a sequence. They provide a way for models to differentiate between words based not only on their meanings but also on their positions within the input sequence.
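
As a rough illustration of this idea, the sketch below adds a position vector to a word vector, which is how the original Transformer combines the two. It uses the sinusoidal scheme this notebook is about as the position signal. The helper names `sinusoidal_encoding` and `embed`, and the tiny 4-dimensional `word_embeddings` table, are made up for the example and are not taken from any library or model.

```python
import math

def sinusoidal_encoding(pos, d_model):
    """Sinusoidal encoding of one position as a plain list (d_model must be even)."""
    enc = []
    for k in range(0, d_model, 2):
        angle = pos / (10000 ** (k / d_model))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc

# Hypothetical 4-dimensional word embeddings, invented for illustration only
word_embeddings = {
    "dog": [0.2, -0.1, 0.4, 0.3],
    "bites": [-0.3, 0.5, 0.1, 0.0],
    "man": [0.1, 0.2, -0.4, 0.6],
}

def embed(tokens, d_model=4):
    """Add the positional encoding of each slot to the word embedding in that slot."""
    return [
        [w + p for w, p in zip(word_embeddings[tok], sinusoidal_encoding(pos, d_model))]
        for pos, tok in enumerate(tokens)
    ]

a = embed(["dog", "bites", "man"])
b = embed(["man", "bites", "dog"])

# "bites" is at position 1 in both sentences, so its input vector is identical
print(a[1] == b[1])
# "dog" is at position 0 in one sentence and position 2 in the other,
# so the same word yields two different input vectors
print(a[0] == b[2])
```

The two sentences contain the same words, yet the model receives different inputs for "dog" and "man" because their positions differ.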

One popular approach for generating positional embeddings uses sinusoidal functions, as introduced with the original Transformer architecture. Each position is mapped to a fixed vector of sine and cosine values at different frequencies, which gives every position in the input sequence a unique representation without any learned parameters.
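
For reference, the sinusoidal scheme is usually written as follows. The notation is an assumption introduced here rather than something defined elsewhere in this notebook: $pos$ is the position in the sequence, $i$ indexes the sine/cosine pairs, and $d_{\text{model}}$ is the embedding width.

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
$$

Even dimensions hold the sines and odd dimensions hold the cosines of the same angle, and the wavelength grows geometrically with $i$.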

* [*Yu-An Wang, Yun-Nung Chen*. What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding](https://arxiv.org/abs/2010.04903)

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'

import torch
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoConfig, AutoTokenizer
from transformers import T5ForSequenceClassification

In [None]:
model_checkpoint = 't5-small'

In [None]:
model = T5ForSequenceClassification.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
config = AutoConfig.from_pretrained(model_checkpoint)

In [None]:
config.n_positions

In [None]:
model.transformer

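**No positional embeddings!** T5 does not add absolute position embeddings to its input. It relies on learned relative position biases inside self-attention instead (the `relative_attention_bias` module in the printout above). The sinusoidal encodings of the original Transformer are therefore generated by hand below, with T5's dimensions used only for sizing.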

In [None]:
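def generate_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) tensor of sinusoidal positional encodings (d_model must be even)."""
    # Column vector of positions 0..max_len-1, shape (max_len, 1)
    position = torch.arange(0, max_len)[:, None]
    # Inverse frequencies 1 / 10000^(2i / d_model), computed in log space
    div_term = torch.exp(torch.arange(0, d_model, 2) * -(torch.log(torch.tensor(10000.0)) / d_model))
    pos_enc = torch.zeros((max_len, d_model))

    # Even dimensions hold sines, odd dimensions hold cosines of the same angle
    pos_enc[:, 0::2] = torch.sin(position * div_term)
    pos_enc[:, 1::2] = torch.cos(position * div_term)

    return pos_enc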
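
The `div_term` expression can look opaque at first. A small stdlib-only check, assuming an illustrative width of 8 rather than the model's real dimension, suggests that the log-space form is just another way of writing the inverse frequency `1 / 10000^(k / d_model)` for even `k`:

```python
import math

d_model = 8  # small illustrative width, not the model's actual dimension

log_space_terms = []
direct_terms = []
for k in range(0, d_model, 2):
    # Same arithmetic as div_term in generate_positional_encoding, one element at a time
    log_space_terms.append(math.exp(k * -(math.log(10000.0) / d_model)))
    # Direct form of the inverse frequency
    direct_terms.append(1.0 / (10000.0 ** (k / d_model)))

for k, (a, b) in zip(range(0, d_model, 2), zip(log_space_terms, direct_terms)):
    print(k, a, b, math.isclose(a, b, rel_tol=1e-9))
```

Both columns agree up to floating-point rounding, so either form could be used to build the encoding.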

In [None]:
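# Use the generation max_length of the en-to-de translation task as the number of positions to encode
config.max_position_embeddings = config.task_specific_params['translation_en_to_de']['max_length']
# T5 calls the embedding width d_model; expose it under the generic name hidden_size
config.hidden_size = config.d_model

sin_pos_encoding = generate_positional_encoding(config.max_position_embeddings, config.hidden_size)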

In [None]:
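# Wide, short figures: one line plot per selected position
matplotlib.rcParams['figure.figsize'] = (12, 1)

# Each plot shows the encoding vector of one position across all embedding dimensions
for i in [0, 1, 2, 10, 50, 100, 150, 200, 250, 299]:
    plt.plot(sin_pos_encoding[i], c='blue')
    plt.xlim([0, config.hidden_size])
    plt.ylim([-1.5, 1.5])
    plt.show()

# Restore a more conventional figure size for the remaining plots
matplotlib.rcParams['figure.figsize'] = (6, 4)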

In [None]:
plt.imshow(sin_pos_encoding, cmap='Blues')
plt.xlabel('Embedding Dimensions')
plt.ylabel('Position in Sequence')
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['bottom'].set_visible(False)
plt.gca().spines['left'].set_visible(False)
plt.show()

In [None]:
# Pairwise cosine similarity between the encodings of all positions
similarity_matrix = cosine_similarity(sin_pos_encoding.numpy())
plt.matshow(similarity_matrix, cmap='Blues')
plt.ylabel('Position')
plt.xlabel('Position')
plt.gca().xaxis.tick_top()
plt.gca().xaxis.set_label_position('top')
plt.show()
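
The diagonal bands in a similarity matrix like this one follow from a property of the sinusoidal scheme: the dot product of two encodings reduces to a sum of `cos((p - q) * w_i)` terms, so it depends only on the offset `p - q`, and every encoding has the same norm. The stdlib-only sketch below checks this numerically. The helpers `sinusoidal_encoding` and `cosine`, and the width of 16, are assumptions made for the example.

```python
import math

def sinusoidal_encoding(pos, d_model):
    """Sinusoidal encoding of one position as a plain list (d_model must be even)."""
    enc = []
    for k in range(0, d_model, 2):
        angle = pos / (10000 ** (k / d_model))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

d_model = 16  # illustrative width, chosen small for readability

# Two pairs of positions with the same offset of 3
sim_near_start = cosine(sinusoidal_encoding(10, d_model), sinusoidal_encoding(13, d_model))
sim_further_on = cosine(sinusoidal_encoding(110, d_model), sinusoidal_encoding(113, d_model))

# Closed form: mean of cos(offset * w_i) over the d_model / 2 frequencies
offset = 3
closed_form = sum(
    math.cos(offset / (10000 ** (k / d_model))) for k in range(0, d_model, 2)
) / (d_model / 2)

print(sim_near_start, sim_further_on, closed_form)
```

All three printed values agree up to floating-point rounding, which is why each diagonal of the similarity matrix has a constant value.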