# [Hands-On] Implementing a Simple Mathematical Calculator using Sequence-to-Sequence Learning

- Author : Sangkeun Jung (hugmanskj@gmail.com)

> Educational Purpose


This educational exercise is designed as practical implementation code to facilitate the understanding of **sequence-to-sequence learning**. It represents the first project in a series of two. In this project, we will address a straightforward math-addition problem using sequence-to-sequence learning techniques.

You can find detailed explanations in the following blog posts:
- [English version post](https://medium.com/@hugmanskj/hands-on-implementing-a-simple-mathematical-calculator-using-sequence-to-sequence-learning-85b742082c72)
- [Korean version post](https://medium.com/@hugmanskj/hands-on-sequence-to-sequence-learning%EC%9D%84-%ED%99%9C%EC%9A%A9%ED%95%9C-%EA%B0%84%EB%8B%A8%ED%95%9C-%EC%88%98%ED%95%99%EC%97%B0%EC%82%B0%EA%B8%B0%EA%B5%AC%ED%98%84-3a37a0e23e3f)




## Task Description

The primary objective of this project is to demonstrate how sequence-to-sequence (Seq2Seq) models can solve simple arithmetic problems, specifically addition. A Seq2Seq model is a neural network architecture that maps an input sequence to an output sequence, possibly of a different length, which makes it suitable for tasks like machine translation, text summarization, and, as demonstrated here, arithmetic problem solving.

<img src="https://www.dropbox.com/scl/fi/b3nknwm1fd9o0ynqj9cw4/addition_seq2seq.png?rlkey=vbfugqnq6n57tq0vrkky16c6i&dl=1" alt="architecture" width="400" height="auto">



### Overview

The task involves creating a model capable of taking a string representation of a simple addition problem, such as "123+456", and outputting the correct sum, "579". This requires the model to understand both the concept of addition and how to parse and generate numerical sequences. The challenge lies not only in accurately performing the arithmetic but also in managing variable-length input and output sequences, a common hurdle in sequence-to-sequence learning.

### Goals

1. **Data Generation and Preprocessing**: Develop a method to generate a dataset of addition problems, ensuring a wide range of sums and operand lengths. This step includes preprocessing the data into a format suitable for Seq2Seq learning, such as converting characters to integers and padding sequences for consistent lengths.

2. **Model Architecture Design**: Design a Seq2Seq model with an encoder-decoder architecture. The encoder processes the input sequence (the addition problem), and the decoder generates the output sequence (the sum). This involves decisions about the type of layers (e.g., LSTM, GRU), embedding dimensions, hidden state sizes, and layer counts.

3. **Training and Optimization**: Implement a training loop to optimize the model parameters using a suitable loss function (e.g., Cross-Entropy Loss) and optimization algorithm (e.g., AdamW). This includes managing aspects like batching and sequence padding.

4. **Evaluation and Testing**: Evaluate the model's performance on unseen data, testing its ability to handle a range of addition problems, from simple to more complex. This step assesses the model's generalization capability and its robustness to different input lengths and numerical ranges.

### Significance

This task serves as an educational tool for understanding Seq2Seq models and their applications beyond traditional areas like language processing. It highlights the versatility of neural networks in learning to perform tasks that require both understanding the structure of the input data and applying logical operations to generate correct outputs.

By accomplishing this task, we gain insights into the challenges and solutions in sequence-to-sequence learning, paving the way for more complex applications, such as solving other types of mathematical problems, processing sequences in scientific computing, or even creating models for code generation.


## Data Creation

This section is dedicated to generating the dataset that will be used to train the neural network. The dataset consists of simple addition problems, each involving the sum of two integers ranging from one to three digits. The key steps involved in this process are as follows:

1. **Library Imports**: Essential libraries (`torch`, `numpy`, `random`, `pandas`) are imported for data manipulation, random number generation, and seeding.
2. **Seed Fixation**: The seed for random number generation is fixed to ensure reproducibility of results. This is crucial for maintaining consistency in experiments and obtaining the same outcomes across different runs.
3. **`generate_dataset` Function**: This function creates addition problems in which the sum of the two numbers is less than 1000. Each problem consists of a string-form input (e.g., "123+456") and its corresponding result (e.g., 579). The function proceeds through the following steps:
   - A loop generates a specified number (`N`) of addition problems.
   - In each iteration, two random numbers between 1 and 999 are drawn using `random.randint`.
   - If their sum is 1000 or more, both numbers are redrawn until the sum is below 1000.
   - Each generated problem is converted into a tuple of `(input, output)` and added to the final dataset.
4. **DataFrame Creation**: A `pandas` DataFrame is created to house all the addition problems. This DataFrame will later be transformed into a data loader for model training purposes.

Through this process, a structured dataset is prepared for the model to learn from. Generating the dataset is the first step in model training, enabling the model to learn how to solve addition problems.
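The constraints described above can be checked with a short standalone sketch. It uses only the standard library; `sample_problem` is a hypothetical helper (not part of the notebook) that mirrors the resampling idea, and the seed value is arbitrary.

```python
import random

def sample_problem(rng, low=1, high=999, limit=1000):
    """Draw two operands until their sum is below `limit` (rejection sampling)."""
    while True:
        a = rng.randint(low, high)
        b = rng.randint(low, high)
        if a + b < limit:
            return f"{a}+{b}", a + b

rng = random.Random(0)  # local generator, so the global seed is left untouched
samples = [sample_problem(rng) for _ in range(1000)]

# Longest input string and longest output string among the samples.
max_input_len = max(len(text) for text, _ in samples)
max_output_len = max(len(str(total)) for _, total in samples)
print(max_input_len, max_output_len)
```

Because both operands have at most three digits and the sum stays below 1000, inputs never exceed 7 characters and outputs never exceed 3 digits, which is why the padding lengths used later (7 and 4) are sufficient.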

In [None]:
import torch
import numpy as np
import random
import pandas as pd

In [None]:
def set_seed_everything(seed=42):
    # Fix the random number generator seed
    random.seed(seed)  # Set the seed for Python's random module
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Use deterministic algorithms and disable benchmark mode
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed_everything()

In [None]:
# Function to generate dataset
def generate_dataset(N=50000):
    data = []

    for _ in range(N):
        # Generate two numbers between 1 and 999 (inclusive)
        num1 = random.randint(1, 999)
        num2 = random.randint(1, 999)

        # Make sure the sum is also less than 1000
        while num1 + num2 >= 1000:
            num1 = random.randint(1, 999)
            num2 = random.randint(1, 999)

        # Format the input and output
        input_str = f"{num1}+{num2}"
        output = num1 + num2

        # Append to the dataset
        data.append((input_str, output))

    # Create a DataFrame
    df = pd.DataFrame(data, columns=['Input', 'Output'])

    return df

In [None]:
# Generate the dataset
df = generate_dataset()

# Display the first few rows of the DataFrame
print( df.head() )

     Input  Output
0  655+115     770
1   26+760     786
2  282+251     533
3  229+143     372
4  755+105     860


### Dataset Preparation and DataLoader

Here, we define the `AdditionDataset` class, which prepares our data for the sequence-to-sequence model. It involves converting characters to integers and padding the sequences. We then create a DataLoader to batch and shuffle our data, making it ready for training.
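As a small illustration of the encoding and padding just described, the sketch below converts one problem by hand. The vocabularies match the ones in `AdditionDataset`; the helper name `encode` is hypothetical and used only here.

```python
# Vocabularies as defined in AdditionDataset ('#' is the padding symbol).
input_vocab = {str(d): d for d in range(10)}
input_vocab.update({'+': 10, '#': 11})
output_vocab = {str(d): d for d in range(10)}
output_vocab['#'] = 10

def encode(text, vocab, length):
    """Map each character to its id, then right-pad with '#' up to `length`."""
    ids = [vocab[ch] for ch in text]
    return ids + [vocab['#']] * (length - len(ids))

src = encode("26+760", input_vocab, 7)
trg = encode(str(26 + 760), output_vocab, 4)
print(src)  # [2, 6, 10, 7, 6, 0, 11]
print(trg)  # [7, 8, 6, 10]
```

Note that the input and output vocabularies assign different ids to `#` (11 and 10 respectively), so the two mappings must not be mixed up when decoding.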


In [None]:
import torch
from torch.utils.data import Dataset, DataLoader

class AdditionDataset(Dataset):
    def __init__(self, df):
        self.df = df

        # Dictionaries mapping characters to numbers, adding '#' as a padding symbol
        self.input_char_to_int = {
            '0': 0, '1': 1, '2': 2, '3': 3, '4': 4,
            '5': 5, '6': 6, '7': 7, '8': 8, '9': 9,
            '+': 10, '#': 11
        }
        self.output_char_to_int = {
            '0': 0, '1': 1, '2': 2, '3': 3, '4': 4,
            '5': 5, '6': 6, '7': 7, '8': 8, '9': 9,
            '#': 10
        }

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # Extract input and output data
        input_str, output = self.df.iloc[idx]
        input_data  = [self.input_char_to_int[char] for char in input_str]
        output_data = [self.output_char_to_int[char] for char in str(output)]

        # Pad input data to 7 characters and output data to 4 characters
        input_data  += [self.input_char_to_int['#']] * (7 - len(input_data))   # Input is at most 7 characters ("ddd+ddd")
        output_data += [self.output_char_to_int['#']] * (4 - len(output_data)) # Output is at most 3 digits, so at least one '#' is appended

        return torch.tensor(input_data, dtype=torch.long), torch.tensor(output_data, dtype=torch.long)


# Create an instance of the Dataset
addition_dataset = AdditionDataset(df)

# Set up DataLoader
batch_size = 2048
addition_dataloader = DataLoader(addition_dataset, batch_size=batch_size, shuffle=True)

## Model Architecture

The model consists of three main components: an encoder, a decoder, and the Seq2Seq model that integrates both. The encoder embeds and processes the input sequence, the decoder generates the output sequence, and the Seq2Seq model orchestrates the flow from input to output.
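The data flow can also be written compactly. The notation below is introduced here for illustration and does not appear in the code: $x_1,\dots,x_7$ are the input token ids, $E$ is the embedding table, $z$ is the encoder's final top-layer hidden state, $o_k$ is the decoder's top-layer output at position $k$, and $W, b$ are the parameters of the output layer.

$$
\begin{aligned}
e_t &= E[x_t], & t &= 1,\dots,7 \\
(h_t, c_t) &= \mathrm{LSTM}_{\mathrm{enc}}(e_t, h_{t-1}, c_{t-1}), & z &= h_7 \\
(o_1, o_2, o_3, o_4) &= \mathrm{LSTM}_{\mathrm{dec}}(z, z, z, z) \\
\hat{y}_k &= \arg\max\,(W o_k + b), & k &= 1,\dots,4
\end{aligned}
$$

The decoder therefore receives the same vector $z$ at every output position and relies on its own recurrent state to tell the positions apart. Unlike many Seq2Seq implementations, it does not feed previously generated tokens back in, and because no initial state is passed to the decoder LSTM, PyTorch initializes it to zeros.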



<img src="https://www.dropbox.com/scl/fi/s1f14impbm64o5qxgzoxp/addition_rnn.png?rlkey=1q3f3irg4a2gudn7hb09y8o8j&dl=1" alt="architecture" width="400" height="auto">


In [None]:
# Encoder Definition
import torch
import torch.nn as nn
import torch.optim as optim

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hidden_dim, n_layers):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim)  # Embedding layer
        self.rnn = nn.LSTM(emb_dim, hidden_dim, n_layers, batch_first=True)  # RNN layer

    def forward(self, src):
        embedded = self.embedding(src)  # Convert input tokens to embeddings
        output, (hidden, cell) = self.rnn(embedded)  # Forward pass through RNN

        # In LSTM, h_n and c_n store the hidden state and cell state of the last time step, respectively.
        # For more information, you can check the PyTorch documentation for LSTM
        # at: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
        return hidden

In [None]:
# Decoder Definition
class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hidden_dim, n_layers):
        super().__init__()
        self.output_dim = output_dim
        # The decoder has no embedding layer: its input is the encoder's hidden
        # state, so emb_dim (the LSTM input size) must equal the encoder's hidden_dim.
        self.rnn = nn.LSTM(emb_dim, hidden_dim, n_layers, batch_first=True)  # RNN layer
        self.fc_out = nn.Linear(hidden_dim, output_dim)  # Output layer

    def forward(self, input):
        # input: [batch_size, target_length, emb_dim]
        output, (hidden, cell) = self.rnn(input)  # Forward pass through RNN
        prediction = self.fc_out(output)  # Scores over output tokens at every position
        return prediction

In [None]:
# Seq2Seq Model Integration
class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self, src, trg):
        trg_len = trg.shape[1]
        hidden = self.encoder(src)  # Initial hidden state from encoder.

        # get top layer hidden states
        last_layer_hidden = hidden[-1] # [batch_size, dim]

        # copy
        dec_input = last_layer_hidden.unsqueeze(1).expand(-1, trg_len, -1)
        # dec_input : [batch_size, target_length, dim]
        output = self.decoder(dec_input)

        return output  # [batch_size, target_length, num_output_label]


### Model Initialization and Loss Function

We initialize the encoder, decoder, and Seq2Seq model with specified dimensions. An AdamW optimizer and CrossEntropyLoss function are also defined to optimize our model during training.
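To get a feel for the model size these settings imply, the parameter count can be worked out by hand. The sketch below assumes PyTorch's standard `nn.LSTM` parameterization (per layer: an input-to-hidden and a hidden-to-hidden weight matrix plus two bias vectors, each covering the four gates) and the dimensions used in this notebook; `lstm_params` is a hypothetical helper.

```python
def lstm_params(input_size, hidden_size, num_layers):
    """Parameter count of a unidirectional multi-layer LSTM (PyTorch layout)."""
    total = 0
    for layer in range(num_layers):
        in_size = input_size if layer == 0 else hidden_size
        # weight_ih + weight_hh + bias_ih + bias_hh, each stacked over 4 gates
        total += 4 * hidden_size * (in_size + hidden_size + 2)
    return total

INPUT_DIM, OUTPUT_DIM, EMB_DIM, HID_DIM = 12, 11, 200, 200

embedding = INPUT_DIM * EMB_DIM                 # encoder embedding table
encoder_rnn = lstm_params(EMB_DIM, HID_DIM, 1)  # ENC_LAYERS = 1
decoder_rnn = lstm_params(EMB_DIM, HID_DIM, 3)  # DEC_LAYERS = 3
fc_out = HID_DIM * OUTPUT_DIM + OUTPUT_DIM      # linear weights + bias

total = embedding + encoder_rnn + decoder_rnn + fc_out
print(total)  # 1291011
```

If PyTorch is available, `sum(p.numel() for p in model.parameters())` should report the same number for the model built in the next cell. Almost all of the parameters sit in the four LSTM layers.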


In [None]:
### Model Initialization
# Initialize the model components with the specified dimensions and layers
INPUT_DIM = 12  # Number of unique tokens in the input
OUTPUT_DIM = 11  # Number of unique tokens in the output (+1 for padding token)
EMB_DIM = 200
HID_DIM = 200
ENC_LAYERS = 1
DEC_LAYERS = 3
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(DEVICE)

encoder = Encoder(INPUT_DIM, EMB_DIM, HID_DIM, ENC_LAYERS).to(DEVICE)
decoder = Decoder(OUTPUT_DIM, EMB_DIM, HID_DIM, DEC_LAYERS).to(DEVICE)
model = Seq2Seq(encoder, decoder, DEVICE).to(DEVICE)

# Define the optimizer and loss function
optimizer = optim.AdamW(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

cuda


## Training Loop

This section outlines the training process, during which the model is trained for a specified number of epochs. Loss is calculated at each step to monitor the training progress and ensure the model is learning effectively.
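To make the loss values easier to interpret, here is a plain-Python sketch of the quantity that `nn.CrossEntropyLoss` computes with its default mean reduction: the negative log-probability of the correct class, averaged over every output position of every example. The function `cross_entropy` and the toy inputs are illustrative only.

```python
import math

def cross_entropy(logits, targets):
    """Mean negative log-likelihood.

    logits:  list of sequences, each a list of per-position score lists
    targets: list of sequences of correct class ids
    """
    losses = []
    for seq_logits, seq_targets in zip(logits, targets):
        for scores, target in zip(seq_logits, seq_targets):
            # log-softmax with the usual max-subtraction for stability
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            losses.append(log_z - scores[target])
    return sum(losses) / len(losses)

num_classes = 11          # ten digits plus the '#' padding symbol
target = [[7, 8, 6, 10]]  # "786#"

# A model that knows nothing assigns equal scores to all classes.
uniform = [[[0.0] * num_classes for _ in range(4)]]
print(round(cross_entropy(uniform, target), 4))  # 2.3979
```

A loss near ln(11) ≈ 2.40 therefore corresponds to uniform guessing over the 11 output symbols, which gives a reference point for reading the training log.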


In [None]:
### Training Loop
def train_model(model, dataloader, optimizer, criterion, epochs):
    for epoch in range(epochs):
        model.train()
        epoch_loss = 0

        for src, trg in dataloader:
            src, trg = src.to(DEVICE), trg.to(DEVICE)
            optimizer.zero_grad()  # Reset gradients accumulated from the previous step

            # ---------------------------------------------
            output = model(src, trg)

            # CrossEntropyLoss expects the class dimension second:
            # [batch_size, num_output_label, target_length]
            output = output.transpose(1,2)
            loss = criterion(output, trg)
            # ---------------------------------------------

            loss.backward()  # Backpropagate to compute gradients
            optimizer.step() # Update the model parameters

            epoch_loss += loss.item()

        print(f'Epoch: {epoch+1:02}, Loss: {epoch_loss / len(dataloader):.4f}')

In [None]:
model.device

device(type='cuda')

In [None]:
# Training Loop
EPOCHS = 10
train_model(model, addition_dataloader, optimizer, criterion, EPOCHS)
torch.save(model.state_dict(), 'addition_model.pth')

Epoch: 01, Loss: 2.1086
Epoch: 02, Loss: 1.8544
Epoch: 03, Loss: 1.7332
Epoch: 04, Loss: 1.6864
Epoch: 05, Loss: 1.6728
Epoch: 06, Loss: 1.6590
Epoch: 07, Loss: 1.6129
Epoch: 08, Loss: 1.5213
Epoch: 09, Loss: 1.4146
Epoch: 10, Loss: 1.3473


## Testing and Inference

After training, the model is tested on new addition problems to evaluate its performance. The `test_model` function handles the inference process, converting input strings to tensors, making predictions with the model, and converting the numeric output back to strings.
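A single prediction says little about overall quality, so one simple way to quantify performance is exact-match accuracy over a set of problems. The sketch below is one possible approach, not part of the original notebook: `exact_match_accuracy` and `oracle` are hypothetical names, and `predict` stands for any function that maps an input string such as `"345+21"` to a (possibly `#`-padded) output string.

```python
def exact_match_accuracy(predict, problems):
    """Fraction of problems whose prediction equals the true sum exactly."""
    correct = 0
    for text in problems:
        left, right = text.split('+')
        expected = str(int(left) + int(right))
        predicted = predict(text).rstrip('#')  # drop trailing padding symbols
        correct += (predicted == expected)
    return correct / len(problems)

# Stand-in predictor that always answers correctly, padded to 4 characters.
def oracle(text):
    left, right = text.split('+')
    return str(int(left) + int(right)).ljust(4, '#')

problems = ["345+21", "223+26", "1+1"]
print(exact_match_accuracy(oracle, problems))  # 1.0
```

With the notebook's model, `predict` could be the `do_add` function defined later in this section, ideally evaluated on problems that were not part of the training data.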



### Loading the Model

In [None]:
# Load the saved model
model.load_state_dict(torch.load('addition_model.pth', map_location=DEVICE))  # map_location allows loading on a machine without a GPU

<All keys matched successfully>

### Inference (Addition)

In [None]:
input_char_to_int  = addition_dataset.input_char_to_int
output_int_to_char = {idx:char for char, idx in addition_dataset.output_char_to_int.items() }

In [None]:
input_char_to_int

{'0': 0,
 '1': 1,
 '2': 2,
 '3': 3,
 '4': 4,
 '5': 5,
 '6': 6,
 '7': 7,
 '8': 8,
 '9': 9,
 '+': 10,
 '#': 11}

In [None]:
output_int_to_char

{0: '0',
 1: '1',
 2: '2',
 3: '3',
 4: '4',
 5: '5',
 6: '6',
 7: '7',
 8: '8',
 9: '9',
 10: '#'}

In [None]:
def test_model(model, input_str, device):
    model.eval()
    with torch.no_grad():
        # Convert input string to numbers
        input_data = [input_char_to_int[char] for char in input_str]
        input_tensor = torch.tensor(input_data, dtype=torch.long).unsqueeze(0).to(device)  # Add batch dimension

        # Model prediction
        dummy_target = torch.zeros((1, 4), dtype=torch.long).to(device) # Dummy target tensor
        output = model(input_tensor, dummy_target)
        output = output.argmax(dim=2)  # Select the index with the highest probability
        print(output.shape)

        # Convert numeric output back to string
        output_str = ''.join(output_int_to_char[int(idx)] for idx in output[0])
        return output_str
        #return output_str.rstrip('#')  # Remove padding

In [None]:
def do_add(input_str):
    # Prepare input string
    input_str = list(input_str)
    input_str += ['#'] * (7 - len(input_str))  # padding processing

    # Test the model
    output_str = test_model(model, input_str, DEVICE)
    return output_str

In [None]:
input_str = "345+21"
output_str = do_add(input_str)
print(f'Input: {input_str}, Output: {output_str}')

torch.Size([1, 4])
Input: 345+21, Output: 315#


In [None]:
input_str = "223+26"
output_str = do_add(input_str)
print(f'Input: {input_str}, Output: {output_str}')

torch.Size([1, 4])
Input: 223+26, Output: 255#


## Conclusion

This notebook demonstrates a practical application of sequence-to-sequence learning to simple addition problems. The focus is educational: it walks through data preparation, model definition, training, and inference using PyTorch. After 10 epochs the sample predictions are still inexact (for example, `345+21` yields `315` rather than `366`), so the results are best read as an illustration of the workflow rather than as a finished calculator.
