<a href="https://colab.research.google.com/github/ppujari/089_dog_breed_classifier/blob/master/coin_question_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Question-5**
**Approach to training a Deep Neural Network for Categorical Data for Classification:**  
For this problem, I am dealing with a dataset that consists entirely of categorical features and a categorical target label. This means I need to:
1. Preprocess categorical data properly using encoding techniques.
2. Train a neural network (MLP and another architecture) to classify the categorical target.
3. Explore constraints where all model parameters (weights and biases) must remain positive.


[IMPLEMENTATION 1]  
Training a Multilayer Perceptron (MLP) using only Pandas and NumPy  
This approach implements a basic MLP classifier from scratch with NumPy and Pandas, demonstrating:

* Data encoding (one-hot encoding for categorical features)  
* Forward pass (activation functions and output computation)  
* Backpropagation (gradient computation and weight updates)  
* Optimization (full-batch gradient descent: every update uses all training rows)  

1️⃣ Import Required Libraries

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import OneHotEncoder

2️⃣ Data Preparation  
We generate a synthetic dataset with categorical features and a categorical target.

In [3]:
# Generate a toy categorical dataset
data = pd.DataFrame({
    "feature1": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "feature2": ["X", "Y", "X", "Y", "X", "X", "Y", "Y"],
    "target": ["Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No"]
})

# One-hot encode features and target
X = pd.get_dummies(data[["feature1", "feature2"]], dtype=int).values
y = pd.get_dummies(data["target"], dtype=int).values  # One-hot encoding for categorical labels

# Set input/output dimensions
input_size = X.shape[1]  # Number of features after one-hot encoding
output_size = y.shape[1]  # Number of categories in target
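
To make the encoding step concrete, the following sketch inspects what `pd.get_dummies` produces for the two feature columns. The variable names `toy`, `encoded`, and `row_sums` are illustrative and not part of the notebook.

```python
import pandas as pd

# Same feature values as the toy dataset above
toy = pd.DataFrame({
    "feature1": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "feature2": ["X", "Y", "X", "Y", "X", "X", "Y", "Y"],
})

encoded = pd.get_dummies(toy, dtype=int)

# One column per (feature, category) pair: 3 + 2 = 5 columns
print(list(encoded.columns))
print(encoded.shape)

# Each row activates exactly one column per original feature
row_sums = encoded.sum(axis=1)
print(row_sums.tolist())
```

With three categories in `feature1` and two in `feature2`, `input_size` is therefore 5 for this dataset.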


3️⃣ Define Activation Functions  
ReLU activation for hidden layers  
Softmax activation for output layer  


In [4]:
def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))  # Prevent overflow
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)
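
The max-subtraction in `softmax` matters once logits grow large. A small check, with made-up logits, compares it against a naive version that exponentiates directly:

```python
import numpy as np

def softmax(z):
    # Same definition as above: shift by the row maximum before exponentiating
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Illustrative logits: the first row would overflow np.exp without the shift
logits = np.array([[1000.0, 1001.0], [0.0, 0.0]])

probs = softmax(logits)

# Naive softmax: exp(1000) overflows to inf, and inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)

print(probs)
print(naive)
```

The shifted version returns valid probabilities for both rows, while the naive version returns `nan` for the first row.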


4️⃣ Implement Forward & Backward Propagation  
Cross-entropy loss function

In [5]:
# Initialize weights & biases
np.random.seed(42)
hidden_size = 5  # Number of neurons in hidden layer
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

# Hyperparameters
learning_rate = 0.1
epochs = 1000

# Training loop
for epoch in range(epochs):
    # Forward Pass
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = softmax(Z2)  # Output layer

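    # Compute Loss (Cross-Entropy)
    # Note: np.mean averages over all N x C entries of the one-hot matrix, so
    # this value is the per-sample cross-entropy divided by the number of classes.
    loss = -np.mean(y * np.log(A2 + 1e-8))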

    # Backpropagation
    dZ2 = A2 - y
    dW2 = np.dot(A1.T, dZ2) / len(X)
    db2 = np.sum(dZ2, axis=0, keepdims=True) / len(X)

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / len(X)
    db1 = np.sum(dZ1, axis=0, keepdims=True) / len(X)

    # Gradient Descent Update
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")


Epoch 0, Loss: 0.3466
Epoch 100, Loss: 0.3463
Epoch 200, Loss: 0.3397
Epoch 300, Loss: 0.2838
Epoch 400, Loss: 0.2478
Epoch 500, Loss: 0.2396
Epoch 600, Loss: 0.2040
Epoch 700, Loss: 0.1415
Epoch 800, Loss: 0.1151
Epoch 900, Loss: 0.1050
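
The first reported value, 0.3466, is ln(2)/2 rather than ln(2). This is because `np.mean` in the loop averages over every entry of the one-hot matrix instead of over samples. The sketch below uses hypothetical labels and uniform predictions to show the relationship between the two conventions:

```python
import numpy as np

# Hypothetical one-hot labels for 4 samples and 2 classes
y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
# Uniform predictions, as produced by a near-zero-initialized network
p = np.full((4, 2), 0.5)

# Convention used in the training loop: mean over all N x C entries
mean_over_entries = -np.mean(y * np.log(p + 1e-8))

# Conventional cross-entropy: sum over classes, then mean over samples
per_sample = -np.mean(np.sum(y * np.log(p + 1e-8), axis=1))

print(f"{mean_over_entries:.4f}")
print(f"{per_sample:.4f}")
```

The gradients in the loop (`dZ2 = A2 - y`, divided by `len(X)`) correspond to the per-sample form, so only the printed loss is scaled. Training itself is unaffected.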


[IMPLEMENTATION 2] Training a Deep Neural Network with PyTorch  
This implementation:  
✅ Uses PyTorch for building and training the model  
✅ Implements a Multi-Layer Perceptron (MLP) for classification  
✅ Includes data preparation, model definition, training, and evaluation  
✅ Constrains weights and biases to be positive

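**Architecture Choice:**  
The model trained below is a plain two-layer MLP. Two extensions are worth considering for deeper variants, although neither is used in the code that follows:  
* Residual connections for better gradient flow  
* Batch normalization for training stability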
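
The MLP defined later in this notebook does not include residual connections or batch normalization. As a rough sketch of how the two ideas could be combined, the classes `ResidualBlock` and `ResidualMLP` below are hypothetical and their layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical block: Linear -> BatchNorm -> ReLU, added back to the input."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)
        self.bn = nn.BatchNorm1d(width)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Skip connection: the block only has to learn a correction to x
        return x + self.relu(self.bn(self.fc(x)))

class ResidualMLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_blocks=2):
        super().__init__()
        self.inp = nn.Linear(input_size, hidden_size)
        self.blocks = nn.Sequential(
            *[ResidualBlock(hidden_size) for _ in range(num_blocks)]
        )
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Returns raw logits, suitable for nn.CrossEntropyLoss
        return self.out(self.blocks(torch.relu(self.inp(x))))

torch.manual_seed(0)
net = ResidualMLP(input_size=5, hidden_size=8, output_size=2)
batch = torch.rand(6, 5)  # BatchNorm1d needs more than one row in training mode
logits = net(batch)
print(logits.shape)
```

On a dataset as small as the toy example, batch statistics are noisy, so this structure is more relevant to larger tabular datasets.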


In [8]:

# Generate synthetic categorical dataset
data = pd.DataFrame({
    "feature1": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "feature2": ["X", "Y", "X", "Y", "X", "X", "Y", "Y"],
    "target": ["Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No"]
})

# One-hot encode categorical features
encoder = OneHotEncoder(sparse_output=False)
X = encoder.fit_transform(data[["feature1", "feature2"]])
y = pd.get_dummies(data["target"], dtype=int).values  # One-hot encoding for target

# Split into train/test sets, then convert to PyTorch tensors
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_test = torch.tensor(X_train, dtype=torch.float32), torch.tensor(X_test, dtype=torch.float32)
y_train, y_test = torch.tensor(y_train, dtype=torch.float32), torch.tensor(y_test, dtype=torch.float32)

# Input/output dimensions
input_size = X_train.shape[1]
output_size = y_train.shape[1]


3️⃣ Define the MLP Model in PyTorch

In [9]:
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
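        self.relu = nn.ReLU()  # Activation function
        self.softmax = nn.Softmax(dim=1)  # Output activation

    def forward(self, x):
        x = self.relu(self.fc1(x))
        # Caution: nn.CrossEntropyLoss (used below) applies log-softmax itself and
        # expects raw logits. Keeping Softmax here normalizes twice, which compresses
        # the loss range and weakens gradients; returning self.fc2(x) directly is
        # the usual pattern.
        x = self.softmax(self.fc2(x))
        return x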

# Initialize model
hidden_size = 5
model = MLP(input_size, hidden_size, output_size)

# Loss function & optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
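
`nn.CrossEntropyLoss` applies log-softmax internally, so it expects raw logits rather than probabilities. The sketch below, using made-up logits for two confidently correct predictions, compares the loss when logits are passed directly against the loss when they are passed through softmax first:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Illustrative logits: both rows strongly favor the correct class
logits = torch.tensor([[4.0, -4.0], [-3.0, 3.0]])
targets = torch.tensor([0, 1])

# Intended usage: raw logits go straight into the loss
loss_from_logits = criterion(logits, targets)

# Softmax applied first: the loss then sees values confined to [0, 1]
loss_from_probs = criterion(torch.softmax(logits, dim=1), targets)

print(f"logits: {loss_from_logits.item():.4f}")
print(f"probs:  {loss_from_probs.item():.4f}")
```

With two classes, values confined to [0, 1] can differ by at most 1, so the second loss cannot drop below log(1 + e^-1), roughly 0.313, however confident the model is. A softmax output layer combined with this criterion will therefore plateau well above zero.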


4️⃣ Training Loop

In [10]:
epochs = 500
for epoch in range(epochs):
    model.train()  # Set model to training mode

    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, torch.argmax(y_train, dim=1))  # Cross-entropy loss

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")


Epoch 0, Loss: 0.6692
Epoch 100, Loss: 0.4869
Epoch 200, Loss: 0.4526
Epoch 300, Loss: 0.4461
Epoch 400, Loss: 0.4437


5️⃣ Evaluate the Model

In [11]:
model.eval()  # Set model to evaluation mode
with torch.no_grad():
    test_outputs = model(X_test)
    predicted = torch.argmax(test_outputs, dim=1)
    actual = torch.argmax(y_test, dim=1)
    accuracy = (predicted == actual).sum().item() / len(y_test)

print(f"Test Accuracy: {accuracy:.2f}")


Test Accuracy: 0.50


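[CONSTRAINING WEIGHTS & BIASES TO BE POSITIVE]  
`nn.Linear` takes no constraint argument, so the constraint has to be imposed manually: either by projecting the parameters after each optimizer step (Option 1) or by reparameterizing the layer (Option 2).

Option 1: Clamp Parameters After Each Update (equivalent to applying ReLU to the weights)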

In [12]:
def enforce_positive_weights():
    with torch.no_grad():
        for param in model.parameters():
            param.clamp_(min=0)  # Projects weights & biases onto [0, inf): non-negative, zeros allowed

for epoch in range(epochs):
    model.train()
    outputs = model(X_train)
    loss = criterion(outputs, torch.argmax(y_train, dim=1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    enforce_positive_weights()  # Apply positivity constraint
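
As a quick sanity check of the clamping approach, the sketch below trains a single standalone `nn.Linear` layer on random data (not the notebook's model or dataset) and confirms that no parameter is negative afterwards:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
layer = nn.Linear(5, 2)            # stand-in for the full model
opt = optim.SGD(layer.parameters(), lr=0.1)
x = torch.rand(4, 5)               # random inputs, for illustration only
target = torch.tensor([0, 1, 1, 0])

for _ in range(20):
    loss = nn.functional.cross_entropy(layer(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Projection step: push any negative parameter back to zero
    with torch.no_grad():
        for p in layer.parameters():
            p.clamp_(min=0)

min_param = min(p.min().item() for p in layer.parameters())
num_zero = sum((p == 0).sum().item() for p in layer.parameters())
print(min_param, num_zero)
```

Clamping guarantees non-negativity, not strict positivity: parameters that the optimizer tries to push below zero end up exactly at zero.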


Option 2: Use Non-Negative Parameterization

In [13]:
class PositiveLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(PositiveLinear, self).__init__()
        self.weight = nn.Parameter(torch.abs(torch.randn(out_features, in_features)))  # Init positive weights
        self.bias = nn.Parameter(torch.abs(torch.randn(out_features)))  # Init positive bias

    def forward(self, x):
        return nn.functional.linear(x, torch.abs(self.weight), torch.abs(self.bias))  # abs() keeps the effective weights/bias non-negative

# Use in model
class ConstrainedMLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(ConstrainedMLP, self).__init__()
        self.fc1 = PositiveLinear(input_size, hidden_size)
        self.fc2 = PositiveLinear(hidden_size, output_size)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.softmax(self.fc2(x))
        return x

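model = ConstrainedMLP(input_size, hidden_size, output_size)
# Re-create the optimizer so that it tracks the new model's parameters
optimizer = optim.Adam(model.parameters(), lr=0.01)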
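
If strictly positive (rather than non-negative) parameters are required, one alternative to `torch.abs` is a softplus reparameterization. This is a different technique from the layer above; `SoftplusLinear` and its attribute names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftplusLinear(nn.Module):
    """Hypothetical layer: effective weights are softplus(raw), hence > 0."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Unconstrained parameters; the optimizer updates these freely
        self.raw_weight = nn.Parameter(torch.randn(out_features, in_features))
        self.raw_bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        # softplus(r) = log(1 + exp(r)) is smooth and positive for all r
        return F.linear(x, F.softplus(self.raw_weight), F.softplus(self.raw_bias))

torch.manual_seed(0)
layer = SoftplusLinear(5, 3)
out = layer(torch.rand(4, 5))
effective_w = F.softplus(layer.raw_weight)
print(out.shape, effective_w.min().item())
```

Unlike `abs`, softplus has no kink at zero, so gradients with respect to the raw parameters are defined everywhere.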


**Final Thoughts**  
Compared with the NumPy implementation, PyTorch computes gradients automatically and makes it easier to swap optimizers and scale to deeper models.  
Weight constraints can be enforced with clamp_() after each update or with custom layers that reparameterize the weights.  
Alternative models such as TabTransformer could improve performance on categorical data.  

This work can also be extended with regularization, dropout, or a different architecture (e.g., Transformer-based models for tabular data).
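
As one possible starting point for the regularization and dropout extensions, the sketch below adds an `nn.Dropout` layer and L2 regularization through Adam's `weight_decay` argument. The names `reg_model` and `reg_optimizer` and the values `p=0.2` and `1e-4` are illustrative, not tuned:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# 5 inputs and 2 outputs match the one-hot encoded toy dataset
reg_model = nn.Sequential(
    nn.Linear(5, 8),
    nn.ReLU(),
    nn.Dropout(p=0.2),   # randomly zeroes hidden units during training
    nn.Linear(8, 2),     # raw logits for nn.CrossEntropyLoss
)

# weight_decay adds an L2 penalty on the parameters
reg_optimizer = optim.Adam(reg_model.parameters(), lr=0.01, weight_decay=1e-4)

x = torch.rand(4, 5)
reg_model.eval()         # dropout is disabled in evaluation mode
with torch.no_grad():
    first = reg_model(x)
    second = reg_model(x)
print(torch.equal(first, second))
```

Calling `reg_model.train()` re-enables dropout, which is why the training and evaluation cells above switch modes explicitly.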
