## Positional Encoding in "Attention Is All You Need"

The paper "Attention Is All You Need" introduces **positional encoding** to give the Transformer information about token order. Because the model contains no recurrence or convolution, it otherwise has no way to tell which position a word occupies in a sentence.

The positional encoding is added to the input embeddings, so it has the same dimension $d_{model}$ as they do. Each added value is determined by the position of the word in the sentence and by the index of the embedding dimension. The function for positional encoding is as follows:

$$PE_{(pos, 2i)} = \sin\left(pos / 10000^{2i / d_{model}}\right)$$
$$PE_{(pos, 2i+1)} = \cos\left(pos / 10000^{2i / d_{model}}\right)$$

Where:
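- $pos$ is the position of the word in the sentence.
- $i$ indexes a sine/cosine pair of dimensions: dimension $2i$ uses the sine and dimension $2i+1$ uses the cosine, both at the same frequency.
- $d_{model}$ is the size of the model's embedding vectors.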
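As a small worked check of the two formulas, the sketch below evaluates them directly for a single position. The helper name `pe_value` and the toy setting `d_model = 4` are illustrative assumptions, not something taken from the paper.

```python
import math

def pe_value(pos, k, d_model):
    """Encoding value at position `pos` and embedding dimension `k` (hypothetical helper)."""
    i = k // 2  # dimensions 2i and 2i+1 share the same frequency
    angle = pos / (10000 ** (2 * i / d_model))
    # Even dimensions use the sine, odd dimensions use the cosine
    return math.sin(angle) if k % 2 == 0 else math.cos(angle)

d_model = 4
row_pos0 = [pe_value(0, k, d_model) for k in range(d_model)]
row_pos1 = [pe_value(1, k, d_model) for k in range(d_model)]

print(row_pos0)  # position 0: every sine is 0 and every cosine is 1
print(row_pos1)  # position 1: sin(1), cos(1), sin(0.01), cos(0.01)
```

With `d_model = 4` there are only two frequencies: the first pair uses the angle `pos` itself, and the second pair uses `pos / 100`, which is why the last two values change much more slowly from one position to the next.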

The purpose of this function is to give each position in the sentence its own encoding vector, independent of which word occupies it. The sinusoidal encoding is fixed rather than learned; once it is added to the embeddings, the model can use it to infer the order of words in a sentence.
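The sketch below illustrates this under a few stated assumptions. The helper name `sinusoidal_encoding` is hypothetical, and the random `embeddings` array is only a stand-in for real token embeddings. It builds the full encoding matrix, counts how many distinct rows it contains, and adds it to the embeddings.

```python
import numpy as np

def sinusoidal_encoding(seq_length, d_model):
    """Return a (seq_length, d_model) matrix of sinusoidal encodings (hypothetical helper)."""
    positions = np.arange(seq_length)[:, np.newaxis]      # shape (seq_length, 1)
    pair_index = np.arange(d_model)[np.newaxis, :] // 2   # i for dimensions 2i and 2i+1
    angles = positions / np.power(10000.0, 2 * pair_index / d_model)
    pe = np.zeros((seq_length, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions: cosine
    return pe

pe = sinusoidal_encoding(seq_length=64, d_model=128)

# Count distinct rows; each position should have its own encoding vector
n_unique = len(np.unique(pe.round(8), axis=0))
print(n_unique)

# Random values standing in for token embeddings (an assumption for illustration)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(64, 128))
model_input = embeddings + pe  # element-wise sum, as described above
print(model_input.shape)
```

Because the encoding depends only on the position, the same `pe` matrix can be reused for every input sequence of that length.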

In [None]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [None]:
# Constants
d_model = 128
seq_length = 64

# Angle passed to sin/cos for position `pos` and embedding dimension `i`
def positional_encoding(pos, i, d_model):
    # Dimensions 2k and 2k+1 share one frequency, hence the i // 2
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    return pos * angle_rates

# Column vector of positions and row vector of dimensions, so that
# broadcasting yields a (seq_length, d_model) grid of angles
positions = np.arange(seq_length)[:, np.newaxis]
dimensions = np.arange(d_model)[np.newaxis, :]

# Compute the angles once, then take the sine and cosine of every entry
angles = positional_encoding(positions, dimensions, d_model)
sines = np.sin(angles)
cosines = np.cos(angles)

# Create subplot
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=('Sine Encoding', 'Cosine Encoding'),
                    specs=[[{'type': 'surface'}, {'type': 'surface'}]])

# Add traces
fig.add_trace(go.Surface(z=sines, colorscale='Viridis'), row=1, col=1)
fig.add_trace(go.Surface(z=cosines, colorscale='Viridis'), row=1, col=2)

# Update layout
fig.update_layout(
    title='Positional Encoding',
    width=1600,
    height=600)

# Update x, y and z axis titles for both subplots
# (without row/col, update_scenes applies to every scene in the figure)
fig.update_scenes(
    xaxis_title='Dimension',
    yaxis_title='Position',
    zaxis_title='Encoding Value')

fig.show()

In [None]:
# Initialize an empty array for the combined encoding
combined = np.zeros((seq_length, d_model))

# Interleave the two surfaces as the formula prescribes:
# even dimensions (2i) take the sine, odd dimensions (2i+1) take the cosine
combined[:, 0::2] = sines[:, 0::2]
combined[:, 1::2] = cosines[:, 1::2]

# Create a new subplot for the combined encoding
fig_combined = make_subplots(rows=1, cols=1,
                             subplot_titles=('Combined Sine and Cosine Encoding',),
                             specs=[[{'type': 'surface'}]])

# Add a trace for the combined encoding
fig_combined.add_trace(go.Surface(z=combined, colorscale='Viridis'), row=1, col=1)

# Update layout
fig_combined.update_layout(
    title='Combined Positional Encoding',
    width=800,
    height=600)

# Update x, y and z axis titles
fig_combined.update_scenes(
    xaxis_title='Dimension',
    yaxis_title='Position',
    zaxis_title='Encoding Value',
    row=1, col=1)

fig_combined.show()