### Transformers for text processing
- Speed: all tokens in a sequence are processed in parallel rather than one at a time
- Capture the relationship between words regardless of the distance between them
- Generate human-like text
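
The "regardless of the distance" point comes from self-attention, which scores every token against every other token directly. The following is a minimal sketch in plain Python so the arithmetic stays visible; the 2-d token vectors are made up for illustration, and the learned query/key/value projections of a real transformer are left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors.

    Real transformers apply learned projections first; they are omitted here.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # One score per position: the query is compared with every token directly.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings: the first and last tokens are similar, the middle ones are not.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print([round(w, 3) for w in weights[0]])
```

In this toy input the first token attends to the last token exactly as strongly as to itself, even though three tokens sit between them: position plays no part in the score.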

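### Transformer components
- Encoder: processes the input data
- Decoder: reconstructs the output
- Feed-forward networks: refine each token's representation after attention
- Positional encodings: add position information so that word order matters
- Multi-head attention: runs several attention operations in parallel so the model can capture different relationships, or different aspects of sentiment, at once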
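
To make the positional-encoding bullet concrete, here is a minimal sketch of the fixed sinusoidal scheme from the original transformer architecture, written in plain Python. The function name `positional_encoding` is chosen for illustration; note that `nn.TransformerEncoder` in PyTorch does not add positional encodings on its own, so they have to be added to the embeddings separately if order information is wanted.

```python
import math

def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions are paired: each pair (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(seq_len=4, d_model=8)
for pos, row in enumerate(pe):
    print(pos, [round(value, 3) for value in row])
```

Each position gets a distinct vector, which is added element-wise to the token embedding at that position so that the same word at different positions looks different to the attention layers.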

### Practice

#### Creating a transformer model

At PyBooks, the recommendation engine you're working on needs more refined capabilities to understand the sentiments of user reviews. You believe that using transformers, a state-of-the-art architecture, can help achieve this. You decide to build a transformer model that can encode the sentiments in the reviews to kickstart the project.

The following packages have been imported for you: torch, nn, optim.

The input data contains sentences such as "I love this product", "This is terrible", "Could be better", ... and their respective binary sentiment labels, such as 1, 0, 0, ...

The input data is split and converted to embeddings in the following variables: train_sentences, train_labels, test_sentences, test_labels, token_embeddings.

- Initialize the transformer encoder.
- Define the fully connected layer based on the number of sentiment classes.
- In the forward method, pass the input through the transformer encoder followed by the linear layer.


In [None]:
from torch import nn, optim

class TransformerEncoder(nn.Module):
    def __init__(self, embed_size, heads, num_layers, dropout):
        super().__init__()
        # Initialize the encoder; batch_first=True means inputs are
        # read as (batch, seq_len, embed_size)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=embed_size, nhead=heads,
                                       dropout=dropout, batch_first=True),
            num_layers=num_layers)
        # Define the fully connected layer (2 sentiment classes)
        self.fc = nn.Linear(embed_size, 2)

    def forward(self, x):
        # Pass the input through the transformer encoder
        x = self.encoder(x)
        # Average over the token dimension to get one vector per sentence
        x = x.mean(dim=1)
        return self.fc(x)

model = TransformerEncoder(embed_size=512, heads=8, num_layers=3, dropout=0.5)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
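
`nn.CrossEntropyLoss` takes the raw scores (logits) from the final linear layer and combines a log-softmax with the negative log-likelihood of the correct class. The sketch below reproduces that calculation for a single example in plain Python, to help with reading the loss values printed during training; the function name `cross_entropy` and the example logits are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log(softmax(logits)[target])."""
    peak = max(logits)
    # Log of the softmax denominator, computed stably.
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

print(cross_entropy([0.0, 0.0], target=1))   # undecided model: log(2)
print(cross_entropy([-2.0, 2.0], target=1))  # confident and correct: small loss
print(cross_entropy([2.0, -2.0], target=1))  # confident and wrong: large loss
```

A loss of L means the model assigned probability exp(-L) to the correct class, so a loss near 0.35 corresponds to roughly 70% confidence in the right label, while a loss above log(2) ≈ 0.69 means the model favored the wrong class.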

#### Training and testing the Transformer model

With the TransformerEncoder model in place, the next step at PyBooks is to train the model on sample reviews and evaluate its performance. Training on these sample reviews will help PyBooks understand the sentiment trends in their vast repository. By achieving a well-performing model, PyBooks can then automate sentiment analysis, ensuring readers get insightful recommendations and feedback.

The following packages have been imported for you: torch, nn, optim.

The model instance of the TransformerEncoder class, token_embeddings, train_sentences, train_labels, test_sentences, and test_labels are preloaded for you.

- In the training loop, split the sentences into tokens and stack the embeddings.
- Zero the gradients and perform a backward pass.
- In the predict function, deactivate the gradient computations, then get the sentiment prediction.


In [None]:
import torch

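for epoch in range(5):
    model.train()  # enable dropout (predict() below switches the model to eval mode)
    for sentence, label in zip(train_sentences, train_labels):
        # Split the sentence into tokens and stack the embeddings.
        # Each embedding is assumed to have shape (1, 512), so stacking on
        # dim=1 gives a (1, seq_len, 512) tensor: one sentence, seq_len tokens.
        tokens = sentence.split()
        data = torch.stack([token_embeddings[token] for token in tokens], dim=1)
        output = model(data)
        loss = criterion(output, torch.tensor([label]))
        # Zero the gradients, perform a backward pass, and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"Epoch {epoch}, Loss: {loss.item()}")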

def predict(sentence):
    model.eval()  # disable dropout for inference
    # Deactivate the gradient computations and get the sentiment prediction
    with torch.no_grad():
        tokens = sentence.split()
        # Tokens missing from token_embeddings fall back to a random vector
        data = torch.stack([token_embeddings.get(token, torch.rand((1, 512))) for token in tokens], dim=1)
        output = model(data)
        predicted = torch.argmax(output, dim=1)
        return "Positive" if predicted.item() == 1 else "Negative"

sample_sentence = "This product can be better"
print(f"'{sample_sentence}' is {predict(sample_sentence)}")

Epoch 1, Loss: 0.3518843650817871
Epoch 1, Loss: 0.3547934889793396
Epoch 2, Loss: 1.3378496170043945
Epoch 2, Loss: 0.3887142539024353
Epoch 2, Loss: 0.4099656045436859
Epoch 3, Loss: 1.1835312843322754
Epoch 3, Loss: 0.45539912581443787
Epoch 3, Loss: 0.4683459401130676
Epoch 4, Loss: 1.0939958095550537
Epoch 4, Loss: 0.47919219732284546
Epoch 4, Loss: 0.4713059663772583
'This product can be better' is Negative
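
The exercise text mentions evaluating the model, but the cell above only classifies a single sentence. One possible way to score a whole test set is sketched below. The helper `accuracy` and the stand-in `keyword_predict` are hypothetical names, not part of the exercise, and the sketch assumes labels are plain integers (1 for positive, 0 for negative) and that the prediction function returns "Positive" or "Negative" as `predict` does.

```python
def accuracy(predict_fn, sentences, labels):
    """Fraction of sentences whose predicted label matches the true label."""
    label_names = {1: "Positive", 0: "Negative"}
    correct = sum(
        predict_fn(sentence) == label_names[label]
        for sentence, label in zip(sentences, labels)
    )
    return correct / len(labels)

# Hypothetical stand-in so the sketch runs on its own; in the notebook,
# pass the real predict function, test_sentences, and test_labels instead.
def keyword_predict(sentence):
    return "Positive" if "love" in sentence.lower() else "Negative"

demo_sentences = ["I love this product", "This is terrible", "Could be better"]
demo_labels = [1, 0, 0]
print(accuracy(keyword_predict, demo_sentences, demo_labels))  # prints 1.0
```

In the notebook, the equivalent call would be `accuracy(predict, test_sentences, test_labels)`, which gives a single number to track instead of per-sentence losses.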