# Task 1: Sentence Transformer Implementation  
### Purpose:  
Develop a model that encodes sentences into fixed-length embeddings.  
This serves as the foundational step before expanding into multi-task learning.

---

## Thought Process  

Task 1 seemed quite straightforward, just encoding sentences into embeddings. **But as I started working on it,** I realized that the choice of framework and model depends a lot on the use case and context. Sometimes, a simpler and faster model is better than a highly accurate but heavy one, especially when speed and efficiency matter more.  

**For this assignment, I thought making decisions while keeping Fetch Rewards in mind would be super interesting and relevant.** Since I already understand how it works from a user’s perspective, I wanted something that was **lightweight, fast, scalable, and well-suited for short receipt entries.**  

I went with `sentence-transformers` because it is built exactly for this kind of task. It handles **tokenization, pooling, and encoding** in one go, so I did not have to set all that up manually.  

- I considered using **raw PyTorch**, but that would mean coding the entire pipeline myself, which felt unnecessary for something this simple.  
- **Hugging Face’s transformers library** was another option, but using it directly would require manually defining tokenizers and pooling.  
- Since `sentence-transformers` is **PyTorch-based and leverages Hugging Face models under the hood**, it made the most sense.  

Beyond choosing **MiniLM** as the transformer backbone, I also had to decide **how to process the embeddings efficiently.**  

- I used **MiniLM’s built-in mean pooling** rather than max pooling or CLS token representation, as it provides a **balanced and stable sentence-level representation.**  
- **Since no specific training data was provided,** I opted to use MiniLM’s **pretrained embeddings directly,** prioritizing **speed and generalization** over task-specific adaptation.  
- While some models struggle with **noisy text or typos**, in my experience receipt data is usually well-structured and rarely has such issues. (I am not covering complex cases like promotional discounts in this task.) Given this, **MiniLM’s pretrained embeddings should generalize well without requiring additional fine-tuning.**  
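
To make the pooling choice concrete, here is a minimal sketch of the three options on a toy tensor. The tensor values, the 3-dimensional size, and the helper names (`mean_pool`, `max_pool`, `cls_pool`) are assumptions for illustration only; `sentence-transformers` applies its own pooling layer internally.

```python
import torch

# Toy token embeddings: 2 sentences, 4 token positions, 3 dims
# (MiniLM would give 384 dims; 3 keeps the example readable).
token_embeddings = torch.tensor([
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[2.0, 0.0, 1.0], [4.0, 2.0, 1.0], [6.0, 4.0, 1.0], [0.0, 0.0, 0.0]],
])
# 1 marks a real token, 0 marks padding.
attention_mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 0]])

def mean_pool(tokens, mask):
    """Average over real tokens only, ignoring padding positions."""
    mask = mask.unsqueeze(-1).float()
    return (tokens * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def max_pool(tokens, mask):
    """Element-wise maximum over real tokens only."""
    masked = tokens.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
    return masked.max(dim=1).values

def cls_pool(tokens):
    """Take the first token's vector as the sentence representation."""
    return tokens[:, 0]

mean_vec = mean_pool(token_embeddings, attention_mask)
max_vec = max_pool(token_embeddings, attention_mask)
cls_vec = cls_pool(token_embeddings)
print(mean_vec)
print(max_vec)
print(cls_vec)
```

Mean pooling uses every real token, which is why it tends to be less sensitive to any single token than max or CLS pooling.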

---

## Model Selection  

I picked **`all-MiniLM-L6-v2`** because:  

- It is **small** with **22 million parameters** and **384-dimensional embeddings** but still performs well.  
- I considered **BERT (110M, 768D)** and **DistilBERT (66M, 768D)**, but they seemed **overkill** for what I needed.  
- Fetch receipts typically have **short and structured text** like `"2x Apples $1.99"`, so I did not need a massive model for context-heavy sentences.  
- MiniLM is **optimized for sentence embeddings**, runs **faster**, and already has **mean pooling built in**, so I did not have to add anything extra.  
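
As a rough illustration of what the smaller embedding size means at scale, the sketch below compares storage for 384D and 768D embeddings. The one-million line-item count and the `embedding_storage_gb` helper are hypothetical, and the arithmetic assumes float32 values (4 bytes each).

```python
def embedding_storage_gb(num_items, dim, bytes_per_value=4):
    """Storage in GB (10^9 bytes) for num_items float32 embeddings of size dim."""
    return num_items * dim * bytes_per_value / 1e9

num_receipt_lines = 1_000_000  # hypothetical volume, not a real figure

minilm_gb = embedding_storage_gb(num_receipt_lines, 384)  # MiniLM-sized vectors
bert_gb = embedding_storage_gb(num_receipt_lines, 768)    # BERT/DistilBERT-sized vectors

print(f"384D: {minilm_gb:.3f} GB, 768D: {bert_gb:.3f} GB")
```

Halving the dimension halves the storage and the cost of any downstream dot product, independent of the model's parameter count.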

---

## Final Decision  

In the end, I **prioritized speed and efficiency over unnecessary complexity.**  
MiniLM gave me **solid embeddings without slowing things down**, which made it the best choice for this task.  



In [9]:
# Task 1: Sentence Transformer Implementation
# Purpose: Develop a model that encodes sentences into fixed-length embeddings.
# This serves as the foundational step before expanding into multi-task learning.

# Only SentenceTransformer is needed in this cell
from sentence_transformers import SentenceTransformer

# Collecting receipt entries from the user
sentences_task1 = []
print("Enter sentences to generate embeddings (type 'done' when finished):")

while True:
    sentence = input("> ")
    if sentence.lower() == 'done':
        break
    sentences_task1.append(sentence)

if not sentences_task1:
    print("No sentences provided.")
else:
    # Generating and displaying embeddings
    model_task1 = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model_task1.encode(sentences_task1, batch_size=32, show_progress_bar=False)

    print("\nGenerated Sentence Embeddings:")
    print("-----------------------------")
    for i, (sentence, embedding) in enumerate(zip(sentences_task1, embeddings),1):
        print(f"Sentence {i}: {sentence}")
        print(f"Embedding (first 5 of 384 dimensions): {embedding[:5]}...\n")


Enter sentences to generate embeddings (type 'done' when finished):


>  Tomatoes 2kgs
>  Olive oil 500ml
>  Brown rice 1kg
>  Cheddar cheese 200g
>  done



Generated Sentence Embeddings:
-----------------------------
Sentence 1: Tomatoes 2kgs
Embedding (first 5 of 384 dimensions): [-0.07684907  0.02557426 -0.00254072  0.09589951 -0.04464617]...

Sentence 2: Olive oil 500ml
Embedding (first 5 of 384 dimensions): [-0.04770451  0.00663916  0.01213263  0.00963966  0.08413599]...

Sentence 3: Brown rice 1kg
Embedding (first 5 of 384 dimensions): [-0.04152735  0.02855367  0.01217135  0.07473338 -0.02057862]...

Sentence 4: Cheddar cheese 200g
Embedding (first 5 of 384 dimensions): [-0.06899679  0.01976784 -0.01123568  0.04121638 -0.07709885]...



# Task 2: Multi-Task Learning Expansion

While developing a multi-task learning (MTL) model, I decided to continue **keeping Fetch Rewards in mind**, ensuring that the approach **aligned with real-world receipt processing needs**. The goal was to make a single model capable of both classifying receipt items into categories and extracting numerical quantities, instead of handling these as separate tasks. Since both tasks require understanding the same structured receipt text, I thought sharing MiniLM’s embeddings across them would make the model more efficient and scalable.

---

## Multi-Task Learning Architecture

To achieve this, I kept MiniLM as the shared transformer backbone and added two task-specific heads:

- **Classification Head:** Assigns receipt items to one of five predefined categories (Fruit, Dairy, Bakery, Meat, and Other). This aligns with Fetch’s categorization system and ensures structured analysis of receipts.

- **Quantity Extraction Head:** Identifies numerical values in text, such as recognizing "2" in "2x Apples $1.99." I thought this might be useful for purchase tracking, optimizing spending insights, and enhancing Fetch’s reward calculations.

---

## Implementation Details

- A **fully connected classification layer (nn.Linear(384, 5))** to map MiniLM’s 384-dimensional embeddings to category logits, which are later converted into probabilities using **softmax**.
- A **regression layer (nn.Linear(384, 1))** to predict a single scalar value representing the quantity.
- A **shared MiniLM backbone** so that the model processes each receipt once while generating outputs for both tasks.
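- **Independent task heads** that share no parameters with each other, which **limits task interference**: one task's loss never updates the other task's head. (If the backbone were fine-tuned, both losses would still update the shared MiniLM weights.)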
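
The head layout above can be sketched on stand-in data. The random `embeddings` tensor below is an assumption that replaces real MiniLM output so the snippet runs without downloading a model; it only demonstrates the output shapes and that a classification loss leaves the regression head untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for MiniLM sentence embeddings: 4 receipt lines, 384 dims each.
embeddings = torch.randn(4, 384)

classification_head = nn.Linear(384, 5)  # 5 category logits
regression_head = nn.Linear(384, 1)      # 1 scalar quantity

class_logits = classification_head(embeddings)   # shape (4, 5)
quantity_pred = regression_head(embeddings)      # shape (4, 1)
class_probs = torch.softmax(class_logits, dim=1) # each row sums to 1

# Backpropagate the classification loss only.
labels = torch.tensor([0, 1, 2, 3])
class_loss = nn.functional.cross_entropy(class_logits, labels)
class_loss.backward()

# The classification head received gradients; the regression head did not.
print(classification_head.weight.grad is not None)
print(regression_head.weight.grad is None)
```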

---
  
## Training Considerations

Since no labeled training data was provided, I used MiniLM’s pretrained embeddings directly instead of fine-tuning. Fine-tuning would have been valuable if I had access to a Fetch-specific dataset, but in this case, leveraging pretrained embeddings seemed like the most practical choice. This approach lets the model produce a category prediction and a quantity prediction in a single forward pass, which keeps it efficient and scalable for Fetch’s use case, at least for the time being. Please note that the two heads are randomly initialized here, so their outputs are not meaningful until the heads are trained.

In [11]:
# Task 2: Multi-Task Learning Expansion
# Purpose: Extend MiniLM for multi-task learning with two heads:
# 1. Product category classification (Task A)
# 2. Quantity extraction (Task B)

# Importing required libraries again (to avoid running them out of order)
import torch  # For neural network operations
import torch.nn as nn  # Neural network module
from sentence_transformers import SentenceTransformer  # For MiniLM backbone

# Defining the multi-task model class
class MultiTaskModel(nn.Module):
    def __init__(self, num_categories=5):
        super(MultiTaskModel, self).__init__()
        #Loading the pre-trained MiniLM model
        self.backbone = SentenceTransformer('all-MiniLM-L6-v2')
        #Embedding size from MiniLM (fixed at 384 dimensions by default)
        self.embedding_dim = 384
        #Task A: Classification head (I chose 5 categories)
        self.classification_head = nn.Linear(self.embedding_dim, num_categories)
        #Task B: Regression head (quantity prediction)
        self.regression_head = nn.Linear(self.embedding_dim, 1)
    
    def forward(self, sentences):
        #Encoding sentences into embeddings
        embeddings = self.backbone.encode(sentences, convert_to_tensor=True, 
                                        batch_size=32, show_progress_bar=False)
        # Ensuring embeddings match device (e.g., CPU/GPU)
        embeddings = embeddings.to(self.classification_head.weight.device)
        # Task A: Output classification logits
        class_logits = self.classification_head(embeddings)
        # Task B: Output quantity prediction
        quantity_pred = self.regression_head(embeddings)
        return class_logits, quantity_pred

# Initializing the model with 5 categories
model_task2 = MultiTaskModel(num_categories=5)
category_labels = ["Fruit", "Dairy", "Bakery", "Meat", "Other"]

# Confirming the setup
print("MTL model initialized.")

# Using a hardcoded list of 5 receipt-like sentences for testing
sentences_task2 = [
    "2x Apples $1.99",    
    "Milk 1L $2.50",     
    "3x Bread $3.00",    
    "Chicken Breast 500g $5.99",  
    "4x Bananas $2.40"    
]

# Displaying the sentences to be processed for clarity
print("Processing the following receipt items:")
for i, sentence in enumerate(sentences_task2, 1):
    print(f"{i}. {sentence}")

# Get predictions (since no training has been performed on heads, they are expected to give random results)
model_task2.eval()
with torch.no_grad():
    class_logits, quantity_pred = model_task2(sentences_task2)
    class_probs = torch.softmax(class_logits, dim=1)
    class_indices = torch.argmax(class_probs, dim=1)
    
    # Displaying results
    print("\nMulti-Task Predictions:")
    print("----------------------")
    for i, (sentence, category_idx, qty) in enumerate(zip(sentences_task2, class_indices, quantity_pred)):
        category = category_labels[category_idx.item()]
        quantity = qty.item()
        print(f"Sentence {i+1}: {sentence}")
        print(f"Predicted Category: {category}")
        print(f"Predicted Quantity: {quantity:.2f}\n")

MTL model initialized.
Processing the following receipt items:
1. 2x Apples $1.99
2. Milk 1L $2.50
3. 3x Bread $3.00
4. Chicken Breast 500g $5.99
5. 4x Bananas $2.40

Multi-Task Predictions:
----------------------
Sentence 1: 2x Apples $1.99
Predicted Category: Meat
Predicted Quantity: -0.01

Sentence 2: Milk 1L $2.50
Predicted Category: Fruit
Predicted Quantity: -0.04

Sentence 3: 3x Bread $3.00
Predicted Category: Dairy
Predicted Quantity: -0.02

Sentence 4: Chicken Breast 500g $5.99
Predicted Category: Dairy
Predicted Quantity: -0.00

Sentence 5: 4x Bananas $2.40
Predicted Category: Meat
Predicted Quantity: 0.00



# Task 3: Adapting MiniLM for Multi-Task Learning

*Please refer to **Task_Approach** for detailed explanation*

# Task 4: Training Loop Implementation (Bonus Task)

For this task, I designed a **hypothetical training loop** for my **Task 2 multi-task learning (MTL) model**, which extends **MiniLM** with separate heads for **category classification** and **quantity extraction**. Since this is a conceptual exercise, I focused on **structuring an efficient MTL training process**, ensuring the model effectively handles **data, forward passes, and evaluation metrics**.

---

## Handling Hypothetical Data

Given the absence of a real dataset, I assumed a **small structured dataset** of five **receipt-like sentences**, each labeled with:

- **A category index** (e.g., `"Fruit"` mapped to class `0`).
- **A quantity value** (e.g., `"2x Apples"` → Quantity: `2`).

This setup **mimics real receipt data**, allowing simultaneous training of **classification and regression tasks**. **MiniLM’s tokenizer** processes inputs, and batch handling is omitted for simplicity, though in practice, a DataLoader would be used.
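
For reference, batching with a DataLoader could look roughly like the sketch below. The `ReceiptDataset` class and `collate_receipts` function are hypothetical names introduced here, and the rows are the five hypothetical examples used in this task.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ReceiptDataset(Dataset):
    """Wraps (sentence, category_index, quantity) rows."""
    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        return self.rows[idx]

def collate_receipts(batch):
    """Keep sentences as a list of strings; turn labels into tensors."""
    sentences = [row[0] for row in batch]
    categories = torch.tensor([row[1] for row in batch], dtype=torch.long)
    quantities = torch.tensor([row[2] for row in batch], dtype=torch.float)
    return sentences, categories, quantities

rows = [
    ("2x Apples $1.99", 0, 2),
    ("Milk 1L $2.50", 1, 1),
    ("3x Bread $3.00", 2, 3),
    ("Chicken Breast 500g $5.99", 3, 1),
    ("4x Bananas $2.40", 0, 4),
]

loader = DataLoader(ReceiptDataset(rows), batch_size=2, shuffle=False,
                    collate_fn=collate_receipts)
batches = list(loader)
for sentences, categories, quantities in batches:
    print(sentences, categories.tolist(), quantities.tolist())
```

The custom collate function keeps the raw strings intact so they can be passed straight to the tokenizer inside the model's forward pass.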

---

## Forward Pass & Model Structure

To ensure a **trainable architecture**, I made the following design choices:

- Used **AutoModel and a tokenizer** from **transformers** instead of `.encode()`, which runs inference with gradient tracking disabled and so cannot pass gradients back to the backbone.
- **MiniLM extracts 384D embeddings** from tokenized inputs.
- The embeddings are passed to **task-specific heads**:
  - **Classification Head** → Outputs logits for category prediction.
  - **Regression Head** → Predicts numerical quantities.

This setup ensures **MiniLM acts as a shared encoder**, while **task-specific heads** optimize for distinct learning objectives, improving efficiency.
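
The reason for avoiding a no-gradient encoding step can be shown with a small stand-in. The two `nn.Linear` layers below are assumptions that play the roles of the backbone and a task head; this does not call `sentence-transformers` itself, it only shows what happens when the encoder runs under `torch.no_grad()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(8, 4)  # stand-in for the MiniLM backbone
head = nn.Linear(4, 1)     # stand-in for a task head
x = torch.randn(3, 8)

# Case 1: encoder output computed without gradient tracking.
with torch.no_grad():
    frozen_embeddings = encoder(x)
head(frozen_embeddings).sum().backward()
encoder_grad_after_no_grad = encoder.weight.grad  # stays None

# Case 2: encoder output computed normally.
live_embeddings = encoder(x)
head(live_embeddings).sum().backward()
encoder_grad_after_live = encoder.weight.grad     # now populated

print(frozen_embeddings.requires_grad)
print(encoder_grad_after_no_grad is None)
print(encoder_grad_after_live is not None)
```

In the first case only the head can learn; in the second, the loss reaches the encoder weights as well, which is what fine-tuning the backbone requires.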

---

## Loss Functions & Metrics

Each task requires a different **evaluation metric**:

- **Classification:** **Accuracy** – Measures how often the predicted category matches the actual label.
- **Regression:** **Mean Squared Error (MSE)** – Evaluates how far the predicted quantity deviates from the actual value.

For **loss calculation**:

- **Cross-Entropy Loss** for classification.
- **MSE Loss** for regression.
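- **Total Loss** is computed as the sum of both losses with equal weighting. Equal weights do not guarantee equal influence: the two losses are on different scales, so the larger one (here the MSE term) can dominate the gradients. Per-task weights would be the natural next step if that becomes a problem.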
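
One way to check how much each task contributes to the total is to compare the loss magnitudes directly. The sketch below uses the first-epoch loss values printed in the training output of this task; the `combine_losses` helper and the alternative weight `w_qty=0.25` are hypothetical and not tuned.

```python
def combine_losses(class_loss, qty_loss, w_class=1.0, w_qty=1.0):
    """Weighted sum of the two task losses."""
    return w_class * class_loss + w_qty * qty_loss

# First-epoch values reported in the training output.
class_loss, qty_loss = 1.6624, 7.0470

# Equal weighting, as used in the training loop.
total_equal = combine_losses(class_loss, qty_loss)
qty_share_equal = qty_loss / total_equal

# Hypothetical down-weighting of the regression term.
total_weighted = combine_losses(class_loss, qty_loss, w_qty=0.25)
qty_share_weighted = 0.25 * qty_loss / total_weighted

print(f"Equal weights: total={total_equal:.4f}, quantity share={qty_share_equal:.2f}")
print(f"w_qty=0.25:    total={total_weighted:.4f}, quantity share={qty_share_weighted:.2f}")
```

With equal weights, the quantity loss makes up roughly 81% of the total in that epoch, so the regression task has the larger say in the shared backbone's update.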

### Why Cross-Entropy for Classification?

I chose **Cross-Entropy Loss** because it is the standard loss function for **multi-class classification tasks**. Since the classification head outputs **logits** (raw scores before softmax), Cross-Entropy applies softmax internally and takes the negative log-probability of the correct class. This means **confident wrong predictions are penalized most heavily**, while correct, confident predictions contribute almost nothing to the loss.
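
A small worked example of this behavior, using made-up logits for a single sample whose true class is 0. With five classes and uniform logits, the loss equals ln(5) ≈ 1.609, which is a useful baseline for an untrained five-way classifier.

```python
import math
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
target = torch.tensor([0])  # true class is index 0

uniform_logits = torch.zeros(1, 5)                              # no preference
confident_right = torch.tensor([[5.0, 0.0, 0.0, 0.0, 0.0]])     # favors class 0
confident_wrong = torch.tensor([[0.0, 5.0, 0.0, 0.0, 0.0]])     # favors class 1

loss_uniform = loss_fn(uniform_logits, target).item()
loss_right = loss_fn(confident_right, target).item()
loss_wrong = loss_fn(confident_wrong, target).item()

print(f"uniform: {loss_uniform:.4f} (ln 5 = {math.log(5):.4f})")
print(f"confident and right: {loss_right:.4f}")
print(f"confident and wrong: {loss_wrong:.4f}")
```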

### Why MSE for Regression?

For quantity extraction, I used **Mean Squared Error (MSE)** because it is a widely used loss function for **continuous numerical predictions**. MSE squares each error, so it penalizes large misses much more heavily than small ones, which also makes it sensitive to outliers. Since the quantity values are numeric, MSE directly measures **the difference between predicted and actual quantities**.
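
To give the MSE numbers some scale, the sketch below computes two simple baselines on the five quantities in the hypothetical dataset (2, 1, 3, 1, 4): predicting 0 for every item, and predicting the mean for every item. These baselines are an illustration added here, not part of the training loop.

```python
import torch
import torch.nn as nn

mse_fn = nn.MSELoss()
targets = torch.tensor([2.0, 1.0, 3.0, 1.0, 4.0])

# Baseline 1: always predict 0 (close to what an untrained head outputs).
zero_pred = torch.zeros(5)
mse_zero = mse_fn(zero_pred, targets).item()   # (4 + 1 + 9 + 1 + 16) / 5

# Baseline 2: always predict the mean quantity (2.2).
mean_pred = torch.full((5,), targets.mean().item())
mse_mean = mse_fn(mean_pred, targets).item()   # variance of the targets

print(f"MSE when predicting 0:    {mse_zero:.2f}")
print(f"MSE when predicting mean: {mse_mean:.2f}")
```

A model only shows it has learned something about individual items once its MSE drops below the mean-prediction baseline.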


In [13]:
# Task 4: Training Loop Implementation (Bonus Task)
# Purpose: Hypothetical training loop for Task 2 MTL model
# Extends MiniLM with classification and quantity extraction heads

# Importing libraries again to avoid running them out of order
import torch  # For neural network operations and tensors
import torch.nn as nn  # Neural network module
import torch.optim as optim  # Optimizer for training
from transformers import AutoModel, AutoTokenizer  # For differentiable MiniLM

# Defining the multi-task model's class
class MultiTaskModel(nn.Module):
    def __init__(self, num_categories=5):
        super(MultiTaskModel, self).__init__()
        #Loading pre-trained MiniLM backbone (differentiable version)
        self.backbone = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
        #Loading tokenizer for MiniLM
        self.tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
        #Embedding size (MiniLM’s default hidden size)
        self.embedding_dim = 384
        #Classification head (5 categories)
        self.classification_head = nn.Linear(self.embedding_dim, num_categories)
        #Regression head (quantity prediction)
        self.regression_head = nn.Linear(self.embedding_dim, 1)
    
    def forward(self, sentences):
        # Tokenizing sentences into input tensors (input_ids, attention_mask)
        inputs = self.tokenizer(sentences, return_tensors='pt', padding=True, truncation=True)
        # Forward pass through MiniLM backbone
        outputs = self.backbone(**inputs)
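        # Mean pool over real tokens only (padding masked out) to get 384D embeddings per sentence
        mask = inputs['attention_mask'].unsqueeze(-1).float()
        embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)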
        #Output classification logits
        class_logits = self.classification_head(embeddings)
        #Output quantity prediction
        quantity_pred = self.regression_head(embeddings)
        return class_logits, quantity_pred

# Initialize the model with 5 categories
model = MultiTaskModel(num_categories=5)
# Define category labels for reference (not used in training though, just for context)
category_labels = ["Fruit", "Dairy", "Bakery", "Meat", "Other"]

# Hypothetical dataset: 5 receipt-like sentences with labels
# Format: (sentence, category_index, quantity)
hypothetical_data = [
    ("2x Apples $1.99", 0, 2),      # Fruit (0), qty 2
    ("Milk 1L $2.50", 1, 1),        # Dairy (1), qty 1
    ("3x Bread $3.00", 2, 3),       # Bakery (2), qty 3
    ("Chicken Breast 500g $5.99", 3, 1),  # Meat (3), qty 1
    ("4x Bananas $2.40", 0, 4)      # Fruit (0), qty 4
]

# Extracting sentences and labels for training
sentences = [item[0] for item in hypothetical_data]
category_labels_tensor = torch.tensor([item[1] for item in hypothetical_data], dtype=torch.long)
quantity_labels_tensor = torch.tensor([item[2] for item in hypothetical_data], dtype=torch.float)

# Training setup
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Using Adam optimizer for all trainable parameters
classification_loss_fn = nn.CrossEntropyLoss()        # Cross Entropy loss function for classification task
regression_loss_fn = nn.MSELoss()                     # MSE loss function for regression task
num_epochs = 5                                        # Number of epochs for hypothetical training 

# Training loop (hypothetical run)
for epoch in range(num_epochs):
    model.train()  # Set model to training mode (enables dropout; gradients are tracked either way)
    optimizer.zero_grad()  # Reset gradients to zero

    # Forward pass: Get predictions for all sentences
    class_logits, quantity_pred = model(sentences)
    
    # Computing losses for both tasks
    class_loss = classification_loss_fn(class_logits, category_labels_tensor)
    qty_loss = regression_loss_fn(quantity_pred.squeeze(-1), quantity_labels_tensor)  # squeeze(-1) keeps the batch dim even for a single sample
    total_loss = class_loss + qty_loss  # Combine losses with equal weighting
    
    # Backward pass: Computing gradients
    total_loss.backward()
    
    # Updating model weights
    optimizer.step()
    
    # Calculate metrics (accuracy for classification, MSE for quantity)
    _, predicted_categories = torch.max(class_logits, 1)
    accuracy = (predicted_categories == category_labels_tensor).float().mean().item()
    mse = qty_loss.item()
    
    # Printing epoch progress (simulated output)
    print(f"Epoch {epoch+1}/{num_epochs}:")
    print(f"Classification Loss: {class_loss.item():.4f}, Quantity Loss: {qty_loss.item():.4f}")
    print(f"Accuracy: {accuracy:.4f}, MSE: {mse:.4f}")

# Indication of hypothetical training completion
print("Training complete (hypothetical run).")

Epoch 1/5:
Classification Loss: 1.6624, Quantity Loss: 7.0470
Accuracy: 0.0000, MSE: 7.0470
Epoch 2/5:
Classification Loss: 1.4747, Quantity Loss: 1.6029
Accuracy: 0.6000, MSE: 1.6029
Epoch 3/5:
Classification Loss: 1.3608, Quantity Loss: 6.2821
Accuracy: 0.4000, MSE: 6.2821
Epoch 4/5:
Classification Loss: 1.5832, Quantity Loss: 6.0803
Accuracy: 0.4000, MSE: 6.0803
Epoch 5/5:
Classification Loss: 1.4890, Quantity Loss: 2.2432
Accuracy: 0.4000, MSE: 2.2432
Training complete (hypothetical run).
