# **Project Overview**

**Objective**

Develop a neural network that:

* Learns to translate mathematical expressions from infix notation (e.g., a + b * c) to postfix notation (e.g., a b c * +).
* Handles syntactic ambiguity using a data-driven method rather than rule-based parsing.
* Operates on symbolic sequences using encoder-decoder or autoregressive modeling.

# **Constraints**
* Input: infix expressions (with parentheses and operators).
* Output: correctly disambiguated postfix expressions.
* Maximum syntactic depth of expressions: 3
* Vocabulary: limited to symbols, operators, parentheses, and variables a–e.
* Model: ≤ 2 million parameters
* No beam search; only greedy autoregressive decoding
* Evaluation: prefix accuracy, not exact match

# **Overall Architecture**

An LSTM-based sequence-to-sequence architecture:

* Encoder: Encodes infix expression
* Decoder: Generates postfix expression step by step

Training uses teacher forcing; inference uses greedy autoregressive decoding.

# **Project Structure**

# Step 1: Dataset Creation


1. Constants & Vocabulary (limited to depth 3)
2. Generate Infix Expression
3. Tokenization
4. Infix to Postfix Conversion
5. Encoding & Decoding
6. Dataset Generator
7. Shifted Decoder Input (for teacher forcing)

**1.1. Constants & Vocabulary**

In [24]:
import numpy as np
import random

OPERATORS = ['+', '-', '*', '/']
IDENTIFIERS = list('abcde')
SPECIAL_TOKENS = ['PAD', 'SOS', 'EOS']
SYMBOLS = ['(', ')', '+', '-', '*', '/']
VOCAB = SPECIAL_TOKENS + SYMBOLS + IDENTIFIERS + ['JUNK']

token_to_id = {tok: i for i, tok in enumerate(VOCAB)}
id_to_token = {i: tok for tok, i in token_to_id.items()}

VOCAB_SIZE = len(VOCAB)
PAD_ID = token_to_id['PAD']
SOS_ID = token_to_id['SOS']
EOS_ID = token_to_id['EOS']

**1.2. Generate Infix Expression**

In [25]:
def generate_infix_expression(max_depth):
    if max_depth == 0:
        return random.choice(IDENTIFIERS)
    elif random.random() < 0.5:
        return generate_infix_expression(max_depth - 1)
    else:
        left = generate_infix_expression(max_depth - 1)
        right = generate_infix_expression(max_depth - 1)
        op = random.choice(OPERATORS)
        return f'({left} {op} {right})'
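
Because each recursive call either passes through unchanged (probability 0.5) or branches into two subexpressions, the shape of the generated data can be worked out from the recursion itself. The sketch below is an illustration, not part of the pipeline; the helper names expected_leaves and prob_single_identifier are made up for this example. It computes the expected number of identifiers per expression and the probability that a sample is a bare identifier with no operator.

```python
from fractions import Fraction

def expected_leaves(max_depth):
    """Expected number of identifiers produced by generate_infix_expression."""
    # E[0] = 1; at each level the call passes through (p = 1/2) or branches
    # into two subexpressions (p = 1/2), so E[d] = 1/2*E[d-1] + 1/2*2*E[d-1].
    e = Fraction(1)
    for _ in range(max_depth):
        e = Fraction(1, 2) * e + Fraction(1, 2) * 2 * e
    return e

def prob_single_identifier(max_depth):
    """Probability that the whole expression is one identifier, no operator."""
    # Only the pass-through branch can yield a bare identifier: P[d] = 1/2*P[d-1].
    p = Fraction(1)
    for _ in range(max_depth):
        p = Fraction(1, 2) * p
    return p

print(expected_leaves(3), float(expected_leaves(3)))  # 27/8 3.375
print(prob_single_identifier(3))                      # 1/8
```

At depth 3, about one sample in eight is therefore a single identifier, which is worth keeping in mind when reading the accuracy figures.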

**1.3. Tokenization**

In [26]:
def tokenize(expr):
    return [c for c in expr if c in token_to_id]

**1.4. Infix to Postfix Conversion**

In [27]:
def infix_to_postfix(tokens):
    precedence = {'+': 1, '-': 1, '*': 2, '/': 2}
    output, stack = [], []
    for token in tokens:
        if token in IDENTIFIERS:
            output.append(token)
        elif token in OPERATORS:
            while stack and stack[-1] in OPERATORS and precedence[stack[-1]] >= precedence[token]:
                output.append(stack.pop())
            stack.append(token)
        elif token == '(':
            stack.append(token)
        elif token == ')':
            while stack and stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()
    while stack:
        output.append(stack.pop())
    return output
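
One way to sanity-check a conversion is to evaluate the postfix output with a stack and compare it against the value of the infix expression. The following is a minimal sketch under that assumption; eval_postfix and the numeric bindings in env are hypothetical and not part of the notebook.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def eval_postfix(tokens, env):
    """Evaluate a postfix token list, looking identifiers up in env."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()  # the right operand is on top of the stack
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])
    assert len(stack) == 1, "malformed postfix expression"
    return stack[0]

env = {'a': 1, 'b': 2, 'c': 3, 'e': 5}
print(eval_postfix('a b c * +'.split(), env))  # 7, same as a + b * c
print(eval_postfix('c e - a +'.split(), env))  # -1, same as (c - e) + a
```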

**1.5. Encoding & Decoding**

In [28]:
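MAX_DEPTH = 3
# Upper bound on sequence length including EOS. The longest infix expression at
# depth 3 has 8 identifiers + 7 operators + 14 parentheses = 29 tokens (30 with EOS).
# Postfix sequences drop the parentheses, so they always fit as well.
MAX_LEN = 4 * 2**MAX_DEPTH - 2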

def encode(tokens, max_len=MAX_LEN):
    ids = [token_to_id[t] for t in tokens] + [EOS_ID]
    return ids + [PAD_ID] * (max_len - len(ids))

def decode_sequence(token_ids, id_to_token, pad_token='PAD', eos_token='EOS'):
    tokens = []
    for token_id in token_ids:
        token = id_to_token.get(token_id, '?')
        if token == eos_token:
            break
        if token != pad_token:
            tokens.append(token)
    return ' '.join(tokens)
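
The MAX_LEN formula can be checked by counting tokens in the largest expression the generator can produce, which is a full binary tree of the given depth. The sketch below does that count; max_infix_len and max_postfix_len are illustrative helper names, not functions from the notebook.

```python
def max_infix_len(depth):
    """Longest fully parenthesized infix token count, including EOS."""
    leaves = 2 ** depth        # identifiers in a full binary tree
    operators = leaves - 1     # one operator per internal node
    parens = 2 * operators     # each operator node adds '(' and ')'
    return leaves + operators + parens + 1  # +1 for EOS

def max_postfix_len(depth):
    """Longest postfix token count, including EOS (no parentheses)."""
    leaves = 2 ** depth
    return leaves + (leaves - 1) + 1

print(max_infix_len(3), max_postfix_len(3))  # 30 16
```

The infix bound equals 4 * 2**depth - 2, so MAX_LEN is tight for the encoder input and leaves roughly half of the positions as padding on the decoder side.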

**1.6. Dataset Generator**

In [29]:
def generate_dataset(n, max_depth=MAX_DEPTH):
    X, Y = [], []
    for _ in range(n):
        expr = generate_infix_expression(max_depth)
        infix = tokenize(expr)
        postfix = infix_to_postfix(infix)
        X.append(encode(infix))
        Y.append(encode(postfix))
    return np.array(X), np.array(Y)

**1.7. Shifted Decoder Input (for teacher forcing)**

In [30]:
def shift_right(seqs):
    shifted = np.zeros_like(seqs)
    shifted[:, 1:] = seqs[:, :-1]
    shifted[:, 0] = SOS_ID
    return shifted
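
To make the alignment used by teacher forcing concrete, here is a list-based sketch of the same shift on token strings instead of ID arrays. shift_right_tokens is a hypothetical helper written only for this illustration.

```python
def shift_right_tokens(target, sos='SOS'):
    """List version of shift_right: prepend SOS and drop the last position."""
    return [sos] + target[:-1]

target = ['c', 'e', '-', 'a', '+', 'EOS', 'PAD', 'PAD']
decoder_input = shift_right_tokens(target)

# At step t the decoder sees decoder_input[t] (the previous gold token)
# and is trained to predict target[t].
for t, (seen, predict) in enumerate(zip(decoder_input, target)):
    print(t, seen, '->', predict)
```

The first step pairs SOS with c, and the step that receives the last real token (+) is trained to emit EOS.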

**1.8. Example Usage**

In [31]:
# Create training and validation data
X_train, Y_train = generate_dataset(10000)
decoder_input_train = shift_right(Y_train)

X_val, Y_val = generate_dataset(1000)
decoder_input_val = shift_right(Y_val)

# Sanity check
i = np.random.randint(len(X_train))
print("Example", i)
print("Infix  :", decode_sequence(X_train[i], id_to_token))
print("Postfix:", decode_sequence(Y_train[i], id_to_token))
print("Shifted:", decode_sequence(decoder_input_train[i], id_to_token))

Example 7809
Infix  : ( ( c - e ) + a )
Postfix: c e - a +
Shifted: SOS c e - a +


# Step 2: LSTM Encoder-Decoder Architecture
We will implement a simple sequence-to-sequence model using:

* An encoder (LSTM) that processes the infix sequence
* A decoder (LSTM) that generates postfix tokens autoregressively
* A shared embedding layer

This architecture stays well within the ≤ 2 million parameter constraint (roughly 200,000 parameters).
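
As a rough check on the parameter budget, the count can be derived by hand from the layer sizes. The sketch below assumes the hyperparameters used later in section 2.2 (embedding size 64, LSTM size 128), a vocabulary of 15 tokens, and the standard LSTM layout with four gates and a bias per gate; lstm_params is a helper introduced here for illustration.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((input_dim + units) * units + units)

VOCAB_SIZE = 15      # 3 special tokens + 6 symbols + 5 identifiers + JUNK
EMBEDDING_DIM = 64
LATENT_DIM = 128

embedding = VOCAB_SIZE * EMBEDDING_DIM          # shared, so counted once
encoder = lstm_params(EMBEDDING_DIM, LATENT_DIM)
decoder = lstm_params(EMBEDDING_DIM, LATENT_DIM)
dense = LATENT_DIM * VOCAB_SIZE + VOCAB_SIZE    # weights + bias

total = embedding + encoder + decoder + dense
print(total)  # 200527
```

Almost all of the parameters sit in the two LSTM layers; the embedding and output layers together account for fewer than 3,000.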

**2.1. Define Model Inputs**

In [32]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
import tensorflow as tf

In [33]:
!pip install gdown
import gdown

# Download pretrained weights
gdown.download(id="1ZI89h7LchbNfQ3w_FXzZd-WBX4P57qfn", output="model_weights.weights.h5", quiet=False)




Downloading...
From: https://drive.google.com/uc?id=1ZI89h7LchbNfQ3w_FXzZd-WBX4P57qfn
To: /content/model_weights.weights.h5
100%|██████████| 2.44M/2.44M [00:00<00:00, 164MB/s]


'model_weights.weights.h5'

**2.2. Architecture Definition**

The masked_accuracy function below gives a more truthful measure of model performance by excluding PAD tokens from the accuracy calculation. Standard accuracy is inflated by padding, because PAD positions are trivially easy to predict. Masked accuracy counts only real target tokens (including EOS), so it better reflects the model's ability to generate valid postfix expressions.

In [34]:
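def masked_accuracy(y_true, y_pred):
    # Targets arrive as (batch_size, seq_len, 1); drop the last axis and
    # cast to int64 so the dtype matches the output of tf.argmax.
    y_true = tf.cast(tf.squeeze(y_true, axis=-1), tf.int64)  # shape: (batch_size, seq_len)
    y_pred_labels = tf.argmax(y_pred, axis=-1)

    # Mask out PAD positions
    mask = tf.cast(tf.not_equal(y_true, PAD_ID), tf.float32)
    match = tf.cast(tf.equal(y_true, y_pred_labels), tf.float32)

    # Fraction of non-PAD positions predicted correctly
    accuracy = tf.reduce_sum(match * mask) / tf.reduce_sum(mask)
    return accuracy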
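
A small worked example shows how much padding can inflate the unmasked number. This is a plain-Python sketch with made-up predictions, not the Keras metric itself; plain_accuracy and masked_accuracy_py are illustrative names, and the IDs follow this notebook's vocabulary order (PAD = 0, EOS = 2, + = 5, - = 6, c = 11, e = 13).

```python
PAD_ID = 0

def plain_accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def masked_accuracy_py(y_true, y_pred, pad_id=PAD_ID):
    # Keep only positions whose target is a real token
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != pad_id]
    return sum(t == p for t, p in pairs) / len(pairs)

# Target "c e - EOS" followed by padding; the prediction gets the operator wrong.
y_true = [11, 13, 6, 2, 0, 0, 0, 0]
y_pred = [11, 13, 5, 2, 0, 0, 0, 0]

print(plain_accuracy(y_true, y_pred))      # 0.875
print(masked_accuracy_py(y_true, y_pred))  # 0.75
```

With MAX_LEN = 30 and short postfix targets, the gap between the two numbers would be considerably larger than in this eight-position example.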

The model is a classic encoder-decoder architecture with LSTM units and a single embedding layer shared by the encoder and decoder. The encoder compresses the input infix sequence into its final hidden and cell states, which then initialize the decoder that generates the corresponding postfix sequence.

In [36]:
# Hyperparameters
EMBEDDING_DIM = 64
LATENT_DIM = 128

# Shared embedding
embedding_layer = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBEDDING_DIM, mask_zero=True)

# Encoder
encoder_inputs = Input(shape=(MAX_LEN,), name="encoder_input")
encoder_embedded = embedding_layer(encoder_inputs)
encoder_lstm = LSTM(LATENT_DIM, return_state=True)
_, state_h, state_c = encoder_lstm(encoder_embedded)

# Decoder
decoder_inputs = Input(shape=(MAX_LEN,), name="decoder_input")
decoder_embedded = embedding_layer(decoder_inputs)
decoder_lstm = LSTM(LATENT_DIM, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedded, initial_state=[state_h, state_c])

# Output layer
decoder_dense = Dense(VOCAB_SIZE, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

# Full model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# model.compile(optimizer=Adam(), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=[masked_accuracy])
model.summary()


**2.3. Prepare Targets for Training**

In [37]:
# Add a trailing axis so targets have shape (batch, seq_len, 1); masked_accuracy squeezes it again
Y_train_expanded = np.expand_dims(Y_train, axis=-1)
Y_val_expanded = np.expand_dims(Y_val, axis=-1)

**2.4. Train the Model / Load Pretrained Weights**

In [38]:
model.load_weights("model_weights.weights.h5")


In [13]:
# history = model.fit(
#     [X_train, decoder_input_train],
#     Y_train_expanded,
#     validation_data=([X_val, decoder_input_val], Y_val_expanded),
#     epochs=20,
#     batch_size=64
# )

Epoch 1/20
157/157 ━━━━━━━━━━━━━━━━━━━━ 7s 13ms/step - loss: 1.9412 - masked_accuracy: 0.2448 - val_loss: 1.2871 - val_masked_accuracy: 0.3897
Epoch 2/20
157/157 ━━━━━━━━━━━━━━━━━━━━ 3s 17ms/step - loss: 1.1738 - masked_accuracy: 0.4660 - val_loss: 0.7049 - val_masked_accuracy: 0.7154
Epoch 3/20
157/157 ━━━━━━━━━━━━━━━━━━━━ 3s 16ms/step - loss: 0.6107 - masked_accuracy: 0.7528 - val_loss: 0.3895 - val_masked_accuracy: 0.8507
Epoch 4/20
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 0.3494 - masked_accuracy: 0.8717 - val_loss: 0.2447 - val_masked_accuracy: 0.9148
Epoch 5/20
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 0.2252 - masked_accuracy: 0.9238 - val_loss: 0.1716 - val_masked_accuracy: 0.9447
Epoch 6/20
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 0.1559 - masked_accurac

# Step 3: Autoregressive Inference

**3.1. Define Inference Models**

The encoder model outputs the final LSTM states (state_h, state_c), which summarize the input sequence. During inference, the decoder model generates one token at a time using the previous hidden states and input token, enabling step-by-step sequence generation.

In [39]:
# Encoder model
encoder_model = Model(encoder_inputs, [state_h, state_c])

# Decoder for inference
decoder_state_input_h = Input(shape=(LATENT_DIM,))
decoder_state_input_c = Input(shape=(LATENT_DIM,))
decoder_input_single = Input(shape=(1,))  # one token at a time

decoder_embedded_inf = embedding_layer(decoder_input_single)
decoder_outputs_inf, h, c = decoder_lstm(
    decoder_embedded_inf, initial_state=[decoder_state_input_h, decoder_state_input_c]
)
decoder_output_token = decoder_dense(decoder_outputs_inf)

decoder_model = Model(
    [decoder_input_single, decoder_state_input_h, decoder_state_input_c],
    [decoder_output_token, h, c]
)

**3.2. Autoregressive Decode Function**

This function performs token-by-token inference by initializing the decoder with the encoder's final states and an SOS token. It then predicts one token at a time, feeding each prediction back as the next input, until it generates EOS or reaches MAX_LEN, and returns the decoded sequence as a list of token IDs.

In [40]:
def autoregressive_decode(input_seq):
    h, c = encoder_model.predict(input_seq.reshape(1, -1), verbose=0)
    target_seq = np.zeros((1, 1), dtype=np.int32)
    target_seq[0, 0] = SOS_ID

    decoded_tokens = []
    for _ in range(MAX_LEN):
        output, h, c = decoder_model.predict([target_seq, h, c], verbose=0)
        token_id = np.argmax(output[0, 0])
        token = id_to_token[token_id]
        if token == 'EOS':
            break
        decoded_tokens.append(token)
        target_seq[0, 0] = token_id
    return [token_to_id[t] for t in decoded_tokens if t in token_to_id]
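
The control flow of greedy decoding can be seen in isolation, without Keras, by replacing the decoder with a stand-in. In the sketch below, greedy_decode, scripted_step, and SCRIPT are all hypothetical: the step function just replays a fixed list of token IDs, so only the loop logic (argmax, stop at EOS, feed the token back) corresponds to the function above.

```python
def greedy_decode(step_fn, init_state, sos, eos, max_len):
    """Greedy autoregressive loop: feed back the most likely token each step."""
    token, state, out = sos, init_state, []
    for _ in range(max_len):
        scores, state = step_fn(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if token == eos:
            break
        out.append(token)
    return out

# Toy stand-in for the decoder: replays a fixed script, one token per step.
SCRIPT = [11, 13, 6, 2]  # c e - EOS in this notebook's vocabulary

def scripted_step(token, state):
    scores = [0.0] * 15
    scores[SCRIPT[state]] = 1.0
    return scores, state + 1

print(greedy_decode(scripted_step, 0, sos=1, eos=2, max_len=30))  # [11, 13, 6]
```

Greedy decoding keeps only the single best token at each step, which is what the no-beam-search constraint requires.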

**3.3. Try an Example**

In [41]:
idx = np.random.randint(len(X_val))
x_sample = X_val[idx]
y_true = Y_val[idx]
y_pred = autoregressive_decode(x_sample)

print("Infix        :", decode_sequence(x_sample, id_to_token))
print("Target Postfix:", decode_sequence(y_true, id_to_token))
print("Predicted     :", decode_sequence(y_pred, id_to_token))


Infix        : ( ( e + c ) + ( c / b ) )
Target Postfix: e c + c b / +
Predicted     : e c + c b / +


# Step 4: Prefix Accuracy Evaluation

**4.1. Function: prefix_accuracy_single**

In [42]:
def prefix_accuracy_single(y_true, y_pred, id_to_token, eos_id=EOS_ID, verbose=False):
    t_str = decode_sequence(y_true, id_to_token).split(' EOS')[0]
    p_str = decode_sequence(y_pred, id_to_token).split(' EOS')[0]
    t_tokens = t_str.strip().split()
    p_tokens = p_str.strip().split()

    max_len = max(len(t_tokens), len(p_tokens))
    match_len = sum(x == y for x, y in zip(t_tokens, p_tokens))

    score = match_len / max_len if max_len > 0 else 0

    if verbose:
        print("TARGET  :", ' '.join(t_tokens))
        print("PREDICT :", ' '.join(p_tokens))
        print(f"MATCH   : {match_len}/{max_len} → {score:.2f}")

    return score
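
The scoring arithmetic is easier to follow on plain token lists. The sketch below repeats the same calculation as prefix_accuracy_single without the ID decoding step; positional_match_score is an illustrative name, and the predictions are made up.

```python
def positional_match_score(target, predicted):
    """Same arithmetic as prefix_accuracy_single, on plain token lists."""
    max_len = max(len(target), len(predicted))
    matches = sum(t == p for t, p in zip(target, predicted))
    return matches / max_len if max_len > 0 else 0

target = 'a b c * +'.split()
print(positional_match_score(target, 'a b c * +'.split()))  # 1.0
print(positional_match_score(target, 'a b c + *'.split()))  # 0.6
print(positional_match_score(target, 'a c c * +'.split()))  # 0.8
print(positional_match_score(target, 'a b'.split()))        # 0.4
```

Note that matches are counted at every aligned position, so a mismatch early in the sequence does not stop later positions from scoring, and dividing by the longer length penalizes predictions that are too short or too long.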

**4.2. Function: test()**

In [18]:
def test(n=20, rounds=10):
    results = []
    for r in range(rounds):
        print(f"Round {r+1}")
        X_test, Y_test = generate_dataset(n)
        scores = []
        for i in range(n):
            x = X_test[i]
            y_true = Y_test[i]
            y_pred = autoregressive_decode(x)
            score = prefix_accuracy_single(y_true, y_pred, id_to_token)
            scores.append(score)
        avg = np.mean(scores)
        print(f"  Average prefix accuracy: {avg:.3f}")
        results.append(avg)
    return np.mean(results), np.std(results)

**4.3. Run Evaluation**

In [43]:
mean_score, std_dev = test(n=20, rounds=10)
print(f"\nFinal Prefix Accuracy: {mean_score:.3f} ± {std_dev:.3f}")

Round 1
  Average prefix accuracy: 0.981
Round 2
  Average prefix accuracy: 1.000
Round 3
  Average prefix accuracy: 0.986
Round 4
  Average prefix accuracy: 0.991
Round 5
  Average prefix accuracy: 0.997
Round 6
  Average prefix accuracy: 0.992
Round 7
  Average prefix accuracy: 0.977
Round 8
  Average prefix accuracy: 0.989
Round 9
  Average prefix accuracy: 1.000
Round 10
  Average prefix accuracy: 0.975

Final Prefix Accuracy: 0.989 ± 0.008


During this project, I implemented and evaluated multiple architectures for the infix-to-postfix translation task, including:

**Transformer-based Encoder-Decoder**
* Achieved: Final Prefix Accuracy = 1.000 ± 0.000
* Parameters: >1.5 million
* Pros: High accuracy
* Cons: High computational cost, slower training

**Seq2Seq with Luong-style Attention**
* Achieved: Final Prefix Accuracy = 1.000 ± 0.000
* Parameters: ~800,000
* Pros: High accuracy
* Cons: Moderate complexity, slower than non-attention models

**Final Chosen Model – LSTM-based Seq2Seq without Attention**
* Achieved: Final Prefix Accuracy = 0.989 ± 0.008
* Parameters: ~200,000
* Pros:
     * Lightweight and efficient
     * Fast training and inference
     * Easier to interpret and deploy
* Cons:
     * Slightly lower accuracy

**Conclusion:**

Despite the slightly lower accuracy, I chose the LSTM-based Seq2Seq model without attention due to its significantly reduced complexity, faster performance, and minimal resource requirements, making it ideal for real-world deployment scenarios.

In [20]:
# Generate new unseen test dataset
X_test_new, Y_test_new = generate_dataset(n=1000, max_depth=MAX_DEPTH)

def evaluate_on_dataset(X_data, Y_data, sample_count=20, verbose=False):
    # Score the first sample_count examples using greedy decoding
    scores = []
    for i in range(sample_count):
        x = X_data[i]
        y_true = Y_data[i]
        y_pred = autoregressive_decode(x)
        score = prefix_accuracy_single(y_true, y_pred, id_to_token, verbose=verbose)
        scores.append(score)
    return np.mean(scores), np.std(scores)

mean_new, std_new = evaluate_on_dataset(X_test_new, Y_test_new, sample_count=100, verbose=False)
print(f"New Test Set Prefix Accuracy: {mean_new:.3f} ± {std_new:.3f}")

# Show a few randomly chosen examples
for _ in range(5):
    i = np.random.randint(len(X_test_new))
    print(f"\nExample {i}")
    print("Infix       :", decode_sequence(X_test_new[i], id_to_token))
    print("True Postfix:", decode_sequence(Y_test_new[i], id_to_token))
    print("Predicted   :", decode_sequence(autoregressive_decode(X_test_new[i]), id_to_token))
    print('-' * 60)

New Test Set Prefix Accuracy: 0.994 ± 0.028

Example 800
Infix       : ( c / e )
True Postfix: c e /
Predicted   : c e /
------------------------------------------------------------

Example 982
Infix       : ( e * e )
True Postfix: e e *
Predicted   : e e *
------------------------------------------------------------

Example 672
Infix       : ( a - ( c * a ) )
True Postfix: a c a * -
Predicted   : a c a * -
------------------------------------------------------------

Example 743
Infix       : ( d * e )
True Postfix: d e *
Predicted   : d e *
------------------------------------------------------------

Example 994
Infix       : ( ( d / e ) - a )
True Postfix: d e / a -
Predicted   : d e / a -
------------------------------------------------------------
