# Build a CNN model for text

PyBooks has successfully built a book recommendation engine. Their next task is to implement a sentiment analysis model to understand user reviews and gain insight into book preferences.

* Initialize the embedding layer in the `__init__()` method.
* Apply the convolutional layer `self.conv` to the embedded text within the `forward()` method.
* Apply the ReLU activation to the output of this layer within the `forward()` method.
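The three steps above can be traced shape by shape using standalone layers. This is a minimal sketch, and the sizes are illustrative assumptions: a vocabulary of 13, an embedding dimension of 10, and one 4-word review. It shows why the embedded text is permuted: nn.Conv1d expects input shaped (batch, channels, length).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes (assumptions, not taken from a real dataset)
vocab_size, embed_dim = 13, 10

embedding = nn.Embedding(vocab_size, embed_dim)
conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=3, stride=1, padding=1)
fc = nn.Linear(embed_dim, 2)

text = torch.tensor([[4, 9, 8, 1]])        # (batch=1, seq_len=4) word indices
embedded = embedding(text)                 # (1, 4, 10): one vector per word
embedded = embedded.permute(0, 2, 1)       # (1, 10, 4): channels first for Conv1d
conved = F.relu(conv(embedded))            # (1, 10, 4): padding=1 keeps the length
pooled = conved.mean(dim=2)                # (1, 10): average over word positions
logits = fc(pooled)                        # (1, 2): one score per sentiment class

print(embedded.shape, conved.shape, pooled.shape, logits.shape)
```

With kernel_size=3 and padding=1, the output length is (4 + 2*1 - 3) / 1 + 1 = 4. The convolution therefore keeps one feature vector per word, and mean pooling then collapses them into a single vector for the review.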

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
```

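```python
class TextClassificationCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # Initialize the embedding layer: one embed_dim-sized vector per vocabulary index
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # 1D convolution over word positions; padding=1 keeps the sequence length unchanged
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=3, stride=1, padding=1)
        # Map the pooled features to two class scores (negative, positive)
        self.fc = nn.Linear(embed_dim, 2)

    def forward(self, text):
        # (batch, seq_len) -> (batch, seq_len, embed_dim) -> (batch, embed_dim, seq_len),
        # because nn.Conv1d expects channels in dimension 1
        embedded = self.embedding(text).permute(0, 2, 1)
        # Pass the embedded text through the convolutional layer and apply a ReLU
        conved = F.relu(self.conv(embedded))
        # Average over the sequence dimension: (batch, embed_dim)
        conved = conved.mean(dim=2)
        # Raw logits, shape (batch, 2)
        return self.fc(conved)
```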

You can now use an instance of this CNN class to train the model in the next step!

**Preparing data for the sentiment analysis**

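```python
# Earlier draft, kept commented out for reference. It is superseded by the next cell:
# its 7-word vocabulary is smaller than word_to_ix, so a model built from it cannot
# embed indices 7 and above.
# vocab = ["i", "love", "this", "book", "do", "not", "like"]
# word_to_idx = {word: i for i, word in enumerate(vocab)}
#
# vocab_size = len(word_to_idx)
# embed_dim = 10
#
# book_samples = [
#     ("The story was captivating and kept me hooked until the end.".split(), 1),
#     ("I found the characters shallow and the plot predictable".split(), 0)
# ]
#
# model = TextClassificationCNN(vocab_size, embed_dim)
```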


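```python
# Update the word_to_ix dictionary to include all necessary words and a special index for unknown words
word_to_ix = {
    "This": 0, "book": 1, "was": 2, "great": 3, "I": 4, "did": 5, "not": 6,
    "enjoy": 7, "this": 8, "love": 9, "do": 10, "like": 11
}
unknown_token = len(word_to_ix)     # index 12 is reserved for words not in word_to_ix
vocab_size = len(word_to_ix) + 1    # add 1 so the embedding table has a row for unknown_token
embed_dim = 10

# Build the model after vocab_size is known, so every index above is a valid embedding row
model = TextClassificationCNN(vocab_size, embed_dim)
```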
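The unknown_token index only helps if every lookup falls back to it. Below is a minimal sketch of a helper that does this, using the same word_to_ix as above. The function name encode is hypothetical. Note that the lookup is case-sensitive, so "This" and "this" map to different indices.

```python
import torch

word_to_ix = {
    "This": 0, "book": 1, "was": 2, "great": 3, "I": 4, "did": 5, "not": 6,
    "enjoy": 7, "this": 8, "love": 9, "do": 10, "like": 11
}
unknown_token = len(word_to_ix)        # 12: reserved index for out-of-vocabulary words
vocab_size = len(word_to_ix) + 1       # 13 rows needed in nn.Embedding

def encode(words):
    """Hypothetical helper: map words to indices, sending unknown words to unknown_token,
    and add a batch dimension so the result has shape (1, len(words))."""
    return torch.tensor([word_to_ix.get(w, unknown_token) for w in words],
                        dtype=torch.long).unsqueeze(0)

print(encode("I love this book".split()).tolist())    # [[4, 9, 8, 1]]
print(encode("I hated this book".split()).tolist())   # [[4, 12, 8, 1]]
```

Routing unknown words to a dedicated index, rather than to 0 (which is "This"), keeps the model from treating unseen words as a real vocabulary word.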

# Train a CNN model for text

Well done defining the TextClassificationCNN class. PyBooks now needs to train the model to optimize it for accurate sentiment analysis of book reviews.


An instance of TextClassificationCNN() with arguments vocab_size and embed_dim has also been loaded and saved as model.

* Define a loss function used for binary classification and save it as criterion.
* Zero the gradients at the start of the training loop.
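Zeroing the gradients matters because PyTorch accumulates them: each backward() call adds to the .grad already stored on a parameter. Below is a minimal sketch using a single scalar parameter as an illustrative stand-in for the model's weights. In the training loop, optimizer.zero_grad() or model.zero_grad() clears every parameter at once.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Two backward passes without zeroing: the gradients add up
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()   # 3 + 3 = 6.0

# Zero the gradient first, as the training loop does at each step
w.grad.zero_()
(3 * w).backward()
fresh = w.grad.item()         # 3.0

print(accumulated, fresh)     # 6.0 3.0
```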


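```python
# Define the loss function: CrossEntropyLoss matches the model's two output logits
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# data is assumed to be a list of (list_of_words, label) pairs provided by the
# exercise environment; it is not defined in this notebook
for epoch in range(10):
    for sentence, label in data:
        # Clear the gradients
        optimizer.zero_grad()
        # Map words to indices (unknown words -> unknown_token) and add a batch dimension
        sentence = torch.tensor([word_to_ix.get(w, unknown_token) for w in sentence],
                                dtype=torch.long).unsqueeze(0)
        label = torch.tensor([int(label)], dtype=torch.long)
        outputs = model(sentence)           # shape (1, 2)
        loss = criterion(outputs, label)
        loss.backward()
        # Update the parameters
        optimizer.step()
print('Training complete!')
```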

Training complete!


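```python
# Alternative attempt with BCEWithLogitsLoss (commented out; it fails with the error below)
# criterion = nn.BCEWithLogitsLoss()
# optimizer = optim.SGD(model.parameters(), lr=0.1)
#
# # Training loop
# for epoch in range(10):
#     for sentence, label in book_samples:
#         # Clear the gradients
#         optimizer.zero_grad()
#
#         # Prepare the input data
#         sentence = torch.LongTensor([word_to_idx.get(w, 0) for w in sentence]).unsqueeze(0)
#         label = torch.FloatTensor([label]).unsqueeze(0)
#
#         # Forward pass
#         outputs = model(sentence)
#
#         # Calculate the loss
#         loss = criterion(outputs, label)
#
#         # Backward pass and update
#         loss.backward()
#         optimizer.step()
#
#     print(f'Epoch [{epoch+1}/10], Loss: {loss.item():.4f}')
#
# print('Training complete!')
```

ValueError: Target size (torch.Size([1, 1])) must be the same as input size (torch.Size([1, 2]))

nn.BCEWithLogitsLoss expects the target to have the same shape as the model output. TextClassificationCNN returns two logits per review (shape [1, 2]), while this target holds a single float (shape [1, 1]). With a two-unit output layer, nn.CrossEntropyLoss with integer class labels is the matching choice, which is why the training loop above uses it.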
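The two losses expect different output and target shapes. Below is a minimal sketch comparing them on made-up logits (the numbers are arbitrary):

* nn.CrossEntropyLoss takes one logit per class plus an integer class index.
* nn.BCEWithLogitsLoss takes a single logit plus a float target of the same shape.

For two classes, the two formulations give the same loss when the single logit equals the positive logit minus the negative one.

```python
import torch
import torch.nn as nn

two_logits = torch.tensor([[0.2, 1.5]])             # shape (1, 2), like TextClassificationCNN's output
class_index = torch.tensor([1])                     # shape (1,): integer label, 1 = positive
ce = nn.CrossEntropyLoss()(two_logits, class_index)

one_logit = two_logits[:, 1:] - two_logits[:, :1]   # shape (1, 1): positive minus negative logit
float_target = torch.tensor([[1.0]])                # shape (1, 1): same shape as the logit
bce = nn.BCEWithLogitsLoss()(one_logit, float_target)

print(ce.item(), bce.item())   # equal up to floating-point rounding

# Passing the two-logit output with a (1, 1) target reproduces the shape error
error_message = None
try:
    nn.BCEWithLogitsLoss()(two_logits, float_target)
except ValueError as err:
    error_message = str(err)
print(error_message)
```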

You've successfully built and trained a CNN for sentiment analysis. This will be a great addition to PyBooks' recommendation engine, helping the team understand how users feel about different books.

# Testing the Sentiment Analysis CNN Model

Now that the model is trained, PyBooks wants to check its performance on some new book reviews.

You need to check whether the sentiment in each review is positive or negative.

An instance of TextClassificationCNN() with arguments vocab_size and embed_dim has also been loaded and saved as model.

* Iterate over the book_reviews list, converting the words in each review into a tensor.
* Get the model's output for each input_tensor.
* Find the index of the most likely sentiment category from outputs.data.
* Extract the predicted_label item and convert it into a sentiment string, where 1 means "Positive".
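torch.max along dimension 1 returns a pair: the largest value in each row and its index. The index is the predicted class, which the testing loop then turns into a sentiment string. Below is a minimal sketch on hand-written logits (arbitrary values standing in for model outputs). The torch.softmax call is added only to show the corresponding class probabilities.

```python
import torch

# Hypothetical logits for two reviews, shape (2, 2): [negative score, positive score]
outputs = torch.tensor([[0.3, 2.1],
                        [1.7, -0.4]])

scores, predicted = torch.max(outputs, dim=1)   # max score and its column index per row
probs = torch.softmax(outputs, dim=1)           # each row sums to 1

labels = ["Positive" if p.item() == 1 else "Negative" for p in predicted]
print(predicted.tolist(), labels)               # [1, 0] ['Positive', 'Negative']
```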

```python
# Example book reviews for testing
book_reviews = [
    "I love this book".split(),
    "I do not like this book".split()
]

model.eval()           # switch to inference mode
with torch.no_grad():  # no gradients are needed for prediction
    for review in book_reviews:
        # Convert the review words into tensor form (unknown words -> unknown_token)
        input_tensor = torch.tensor([word_to_ix.get(w, unknown_token) for w in review],
                                    dtype=torch.long).unsqueeze(0)
        # Get the model's output
        outputs = model(input_tensor)
        # Find the index of the most likely sentiment category
        _, predicted_label = torch.max(outputs.data, 1)
        # Convert the predicted label into a sentiment string
        sentiment = "Positive" if predicted_label.item() == 1 else "Negative"
        print(f"Book Review: {' '.join(review)}")
        print(f"Sentiment: {sentiment}\n")
```

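IndexError: index out of range in self

nn.Embedding raises this IndexError when an input index is greater than or equal to the number of rows in its table. Here, model had been built with a smaller vocab_size than word_to_ix requires (for example, "love" maps to 9 and unknown words to 12), so the lookup failed. To avoid the error, build the model with vocab_size = len(word_to_ix) + 1 after the vocabulary is defined, then retrain it.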
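The error can be reproduced in isolation. Below is a minimal sketch using two embedding tables:

* One has 7 rows, matching the earlier commented-out draft. It accepts only indices 0 to 6, so the index 9 that word_to_ix assigns to "love" fails.
* One has len(word_to_ix) + 1 = 13 rows. It accepts every index, including unknown_token = 12.

```python
import torch
import torch.nn as nn

small = nn.Embedding(7, 10)     # 7 rows: valid indices are 0..6
large = nn.Embedding(13, 10)    # 13 rows: word_to_ix indices 0..11 plus unknown_token 12

review = torch.tensor([[4, 9, 8, 1]])   # "I love this book" encoded with word_to_ix

error = None
try:
    small(review)
except IndexError as err:
    error = str(err)

print(error)                     # index out of range in self
print(large(review).shape)       # torch.Size([1, 4, 10])
```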

You've successfully applied the trained CNN model to predict the sentiment of a given text. Did the model predict the sentiment correctly? If not, don't worry; we will look into more techniques to improve it!