# Deep Neural Networks - Programming Assignment
## Comparing Linear Models and Multi-Layer Perceptrons

**Student Name:** ___________________  
**Student ID:** ___________________  
**Date:** ___________________

---

## ⚠️ IMPORTANT INSTRUCTIONS

1. **Complete ALL sections** marked with `TODO`
2. **DO NOT modify** the `get_assignment_results()` function structure
3. **Fill in all values accurately** - these will be auto-verified
4. **After submission**, you'll receive a verification quiz based on YOUR results
5. **Run all cells** before submitting (Kernel → Restart & Run All)

---

In [44]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,LabelEncoder
import time
import warnings
warnings.filterwarnings('ignore')
# Set random seed for reproducibility
np.random.seed(42)
print('✓ Libraries imported successfully')

✓ Libraries imported successfully


## Section 1: Dataset Selection and Loading

**Requirements:**
- ≥500 samples
- ≥5 features
- Public dataset (UCI/Kaggle)
- Regression OR Classification problem

In [45]:
# TODO: Load your dataset
# Example: data = pd.read_csv('your_dataset.csv')
data = pd.read_csv('Housing.csv')

# Dataset information (TODO: Fill these)
dataset_name = "House Prices"  # e.g., "House Price prediction"
dataset_source = "Kaggle"  # e.g., "UCI ML Repository"
n_samples = 545     # Total number of rows
n_features = 12     # Number of features (excluding target)
problem_type = "Regression"  # "regression" or "binary_classification" or "multiclass_classification"

# Problem statement (TODO: Write 2-3 sentences)
problem_statement = "Predict the price of a house from features such as area, number of bedrooms and bathrooms, parking, and furnishing status."


# Primary evaluation metric (TODO: Fill this)
primary_metric = "mse,rmse,mae"  # e.g., "recall", "accuracy", "rmse", "r2"

# Metric justification (TODO: Write 2-3 sentences)
metric_justification = "MAE is chosen because it is less sensitive to outliers. MSE and RMSE square the errors, so they penalize large errors more heavily."


print(f"Dataset: {dataset_name}")
print(f"Source: {dataset_source}")
print(f"Samples: {n_samples}, Features: {n_features}")
print(f"Problem Type: {problem_type}")
print(f"Primary Metric: {primary_metric}")
# Display the first 5 rows of the training dataset
data.head(5)

Dataset: House Prices
Source: Kaggle
Samples: 545, Features: 12
Problem Type: Regression
Primary Metric: mse,rmse,mae


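| | price | area | bedrooms | bathrooms | stories | mainroad | guestroom | basement | hotwaterheating | airconditioning | parking | prefarea | furnishingstatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13300000 | 7420 | 4 | 2 | 3 | yes | no | no | no | yes | 2 | yes | furnished |
| 1 | 12250000 | 8960 | 4 | 4 | 4 | yes | no | no | no | yes | 3 | no | furnished |
| 2 | 12250000 | 9960 | 3 | 2 | 2 | yes | no | yes | no | no | 2 | yes | semi-furnished |
| 3 | 12215000 | 7500 | 4 | 2 | 2 | yes | no | yes | no | yes | 3 | yes | furnished |
| 4 | 11410000 | 7420 | 4 | 1 | 2 | yes | yes | yes | no | yes | 2 | no | furnished |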


## Section 2: Data Preprocessing

Preprocess your data:
1. Handle missing values
2. Encode categorical variables
3. Split into train/test sets
4. Scale features
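
Step 4 is normally done by computing the scaling statistics on the training split only and reusing them on the test split, so that no test information leaks into training. A minimal NumPy sketch of that idea follows. The arrays are made-up data, and `standardize_fit` and `standardize_apply` are illustrative helper names, not part of the assignment template.

```python
import numpy as np

def standardize_fit(X):
    """Return per-column mean and std computed from X (the training split)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # Guard against constant columns (avoid divide-by-zero)
    return mean, std

def standardize_apply(X, mean, std):
    """Scale X with statistics that were computed on the training split."""
    return (X - mean) / std

# Made-up data standing in for a train/test split
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(80, 3))
X_test_demo = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

mean, std = standardize_fit(X_train_demo)
X_train_scaled_demo = standardize_apply(X_train_demo, mean, std)
X_test_scaled_demo = standardize_apply(X_test_demo, mean, std)

print("Train column means:", X_train_scaled_demo.mean(axis=0).round(6))
print("Train column stds: ", X_train_scaled_demo.std(axis=0).round(6))
```

The scaled test split will generally not have exactly zero mean or unit variance, which is expected because it is scaled with the training statistics.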

In [46]:
# TODO: Preprocess your data
# 1. Handle missing values by dropping incomplete rows
data = data.dropna()

# 2. Separate features and target
X = data.drop('price', axis=1)
y = data['price'].values

# 3. Encode the binary yes/no columns as 0/1
categorical_cols = ['mainroad', 'guestroom', 'basement', 'hotwaterheating',
                    'airconditioning', 'prefarea']
for col in categorical_cols:
    le = LabelEncoder()
    X[col] = le.fit_transform(X[col])

# 4. One-hot encode furnishingstatus (drop_first avoids a redundant column)
X = pd.get_dummies(X, columns=['furnishingstatus'], drop_first=True)
X = X.values.astype(float)  # Cast to float so boolean dummy columns become 0.0/1.0


# TODO: Train-test split (80-20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TODO: Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fill these after preprocessing
train_samples = X_train.shape[0]      # Number of training samples
test_samples = X_test.shape[0]       # Number of test samples
train_test_ratio = 0.8  # Fraction of samples used for training

print(f"Train samples: {train_samples}")
print(f"Test samples: {test_samples}")
print(f"Split ratio: {train_test_ratio:.1%}")



Train samples: 436
Test samples: 109
Split ratio: 80.0%


## Section 3: Baseline Model Implementation

Implement from scratch (NO sklearn models!):
- Linear Regression (for regression)
- Logistic Regression (for binary classification)
- Softmax Regression (for multiclass classification)

**Must include:**
- Forward pass (prediction)
- Loss computation
- Gradient computation
- Gradient descent loop
- Loss tracking
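
One way to gain confidence in a hand-written gradient computation is to compare it against a finite-difference estimate. The sketch below does this for the MSE loss of a linear model on made-up data. The function names (`mse_loss`, `mse_gradients`, `numerical_gradient`) are illustrative and not required by the assignment.

```python
import numpy as np

def mse_loss(X, y, w, b):
    """Mean squared error of the linear prediction X @ w + b."""
    return np.mean((X @ w + b - y) ** 2)

def mse_gradients(X, y, w, b):
    """Analytic gradients of the MSE loss with respect to w and b."""
    n = X.shape[0]
    err = X @ w + b - y
    dw = (2 / n) * X.T @ err
    db = (2 / n) * np.sum(err)
    return dw, db

def numerical_gradient(X, y, w, b, eps=1e-6):
    """Central finite-difference estimate of the same gradients."""
    dw_num = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        dw_num[j] = (mse_loss(X, y, w + step, b) - mse_loss(X, y, w - step, b)) / (2 * eps)
    db_num = (mse_loss(X, y, w, b + eps) - mse_loss(X, y, w, b - eps)) / (2 * eps)
    return dw_num, db_num

# Made-up data: 50 samples, 4 features
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 4))
y_demo = rng.normal(size=50)
w_demo = rng.normal(size=4)
b_demo = 0.5

dw, db = mse_gradients(X_demo, y_demo, w_demo, b_demo)
dw_num, db_num = numerical_gradient(X_demo, y_demo, w_demo, b_demo)
max_diff = max(np.max(np.abs(dw - dw_num)), abs(db - db_num))
print(f"Largest analytic-vs-numeric difference: {max_diff:.2e}")
```

If the two estimates disagree by more than a small tolerance, the analytic gradient (or the loss it is supposed to match) likely contains a bug.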

In [82]:
class BaselineModel:
    """
    Baseline linear model with gradient descent
    Implement: Linear/Logistic/Softmax Regression
    """
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.lr = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.loss_history = []
    
    def fit(self, X, y, verbose=True):
        """
        TODO: Implement gradient descent training
        
        Steps:
        1. Initialize weights and bias
        2. For each iteration:
           a. Compute predictions (forward pass)
           b. Compute loss
           c. Compute gradients
           d. Update weights and bias
           e. Store loss in self.loss_history
        
        Must populate self.loss_history with loss at each iteration!
        """
        n_samples, n_features = X.shape
        
        # Initialize parameters (zeros are fine for a convex linear model)
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.loss_history = []  # Reset so repeated fit() calls do not accumulate history

        # Gradient descent loop
        for iteration in range(self.n_iterations):
            # Forward pass: predictions
            y_pred = X @ self.weights + self.bias

            # Compute loss (MSE)
            mse_loss = np.mean((y_pred - y) ** 2)
            self.loss_history.append(mse_loss)

            # Compute gradients
            dw = (2 / n_samples) * X.T @ (y_pred - y)
            db = (2 / n_samples) * np.sum(y_pred - y)

            # Update weights: w = w - lr * grad_w
            self.weights = self.weights - self.lr * dw
            self.bias = self.bias - self.lr * db
      
            # Print progress every 100 iterations and on the final iteration
            if verbose and (iteration % 100 == 0 or iteration == self.n_iterations - 1):
                print(f"Iteration {iteration:4d} | Loss: {mse_loss:.4f}")

        if verbose:
            print("\nTraining completed!")
            print(f"Final Loss: {self.loss_history[-1]:.4f}")

    def predict(self, X):
        """
        Make predictions.

        For regression, return the linear output X @ weights + bias.
        """
        return X @ self.weights + self.bias

print("✓ Baseline model class defined")



✓ Baseline model class defined


In [83]:
# Train baseline model
print("Training baseline model...")
baseline_start_time = time.time()

# TODO: Initialize and train your baseline model
# The loss changes very little after roughly 300 iterations; n_iterations is set to 10000 so the run reaches full convergence
baseline_model = BaselineModel(learning_rate=0.01, n_iterations=10000)
baseline_model.fit(X_train, y_train)

# TODO: Make predictions
baseline_predictions = baseline_model.predict(X_test)

baseline_training_time = time.time() - baseline_start_time
print(f"✓ Baseline training completed in {baseline_training_time:.2f}s")
print(f"✓ Loss decreased from {baseline_model.loss_history[0]:.4f} to {baseline_model.loss_history[-1]:.4f}")

Training baseline model...
Iteration    0 | Loss: 25234792406487.6133
Iteration  100 | Loss: 1370436805999.2383
Iteration  200 | Loss: 976712970341.8345
Iteration  300 | Loss: 968699659414.5281
Iteration  400 | Loss: 968397479641.6851
Iteration  500 | Loss: 968365022913.4082
Iteration  600 | Loss: 968359489336.0347
Iteration  700 | Loss: 968358442967.7993
Iteration  800 | Loss: 968358238867.8748
Iteration  900 | Loss: 968358198503.9960
Iteration 1000 | Loss: 968358190459.5011
Iteration 1100 | Loss: 968358188847.5945
Iteration 1200 | Loss: 968358188523.1134
Iteration 1300 | Loss: 968358188457.4926
Iteration 1400 | Loss: 968358188444.1562
Iteration 1500 | Loss: 968358188441.4310
Iteration 1600 | Loss: 968358188440.8707
Iteration 1700 | Loss: 968358188440.7549
Iteration 1800 | Loss: 968358188440.7308
Iteration 1900 | Loss: 968358188440.7256
Iteration 2000 | Loss: 968358188440.7246
Iteration 2100 | Loss: 968358188440.7242
Iteration 2200 | Loss: 968358188440.7242
Iteration 2300 | Loss: 9683

## Section 4: Multi-Layer Perceptron Implementation

Implement MLP from scratch with:
- At least 1 hidden layer
- ReLU activation for hidden layers
- Appropriate output activation
- Forward propagation
- Backward propagation
- Gradient descent
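
The pieces listed above can be sketched end to end for a regression problem. The example below is a minimal illustration on made-up data, not the required solution. It assumes the column-per-sample layout from the class docstring (`W[l]` of shape `(n[l], n[l-1])`, so inputs are passed as `X.T`), He initialization, a linear output layer, and MSE loss. The free functions `init_params`, `forward`, and `backward` are hypothetical names.

```python
import numpy as np

def init_params(architecture, seed=42):
    """He initialization: W[l] has shape (n[l], n[l-1]), b[l] has shape (n[l], 1)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(architecture)):
        n_in, n_out = architecture[l - 1], architecture[l]
        params[f"W{l}"] = rng.normal(size=(n_out, n_in)) * np.sqrt(2.0 / n_in)
        params[f"b{l}"] = np.zeros((n_out, 1))
    return params

def forward(A0, params, n_layers):
    """ReLU on hidden layers, identity on the output layer (regression)."""
    cache = {"A0": A0}
    A = A0
    for l in range(1, n_layers + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        A = Z if l == n_layers else np.maximum(0, Z)
        cache[f"Z{l}"], cache[f"A{l}"] = Z, A
    return A, cache

def backward(y_row, params, cache, n_layers):
    """Gradients of the MSE loss mean((A[L] - y)^2) for every W[l] and b[l]."""
    m = y_row.shape[1]
    grads = {}
    dZ = 2.0 * (cache[f"A{n_layers}"] - y_row)  # Output layer is linear
    for l in range(n_layers, 0, -1):
        grads[f"dW{l}"] = dZ @ cache[f"A{l - 1}"].T / m
        grads[f"db{l}"] = np.sum(dZ, axis=1, keepdims=True) / m
        if l > 1:
            dA_prev = params[f"W{l}"].T @ dZ
            dZ = dA_prev * (cache[f"Z{l - 1}"] > 0)  # ReLU derivative
    return grads

# Made-up regression data: 200 samples, 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2

architecture = [3, 8, 1]
n_layers = len(architecture) - 1
params = init_params(architecture)
A0, y_row = X_demo.T, y_demo.reshape(1, -1)  # Columns are samples

learning_rate = 0.01
loss_history = []
for _ in range(1000):
    y_hat, cache = forward(A0, params, n_layers)
    loss_history.append(float(np.mean((y_hat - y_row) ** 2)))
    grads = backward(y_row, params, cache, n_layers)
    for l in range(1, n_layers + 1):
        params[f"W{l}"] -= learning_rate * grads[f"dW{l}"]
        params[f"b{l}"] -= learning_rate * grads[f"db{l}"]

print(f"Loss: {loss_history[0]:.4f} -> {loss_history[-1]:.4f}")
```

For a classification task the output activation and loss would change (for example sigmoid with binary cross-entropy), but the layer-by-layer structure of the forward and backward loops stays the same.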

In [None]:
class MLP:
    """
    Multi-Layer Perceptron implemented from scratch
    """
    def __init__(self, architecture, learning_rate=0.01, n_iterations=1000):
        """
        architecture: list [input_size, hidden1, hidden2, ..., output_size]
        Example: [30, 16, 8, 1] means:
            - 30 input features
            - Hidden layer 1: 16 neurons
            - Hidden layer 2: 8 neurons
            - Output layer: 1 neuron
        """
        self.architecture = architecture
        self.lr = learning_rate
        self.n_iterations = n_iterations
        self.parameters = {}
        self.loss_history = []
        self.cache = {}
    
    def initialize_parameters(self):
        """
        TODO: Initialize weights and biases for all layers
        
        For each layer l:
        - W[l]: weight matrix of shape (n[l], n[l-1])
        - b[l]: bias vector of shape (n[l], 1)
        
        Store in self.parameters dictionary
        """
        np.random.seed(42)
        
        for l in range(1, len(self.architecture)):
            # TODO: Initialize weights and biases
            # self.parameters[f'W{l}'] = ...
            # self.parameters[f'b{l}'] = ...
            pass
    
    def relu(self, Z):
        """ReLU activation function"""
        return np.maximum(0, Z)
    
    def relu_derivative(self, Z):
        """ReLU derivative"""
        return (Z > 0).astype(float)
    
    def sigmoid(self, Z):
        """Sigmoid activation (for binary classification output)"""
        return 1 / (1 + np.exp(-np.clip(Z, -500, 500)))
    
    def forward_propagation(self, X):
        """
        TODO: Implement forward pass through all layers
        
        For each layer:
        1. Z[l] = W[l] @ A[l-1] + b[l]
        2. A[l] = activation(Z[l])
        
        Store Z and A in self.cache for backpropagation
        Return final activation A[L]
        """
        self.cache['A0'] = X
        
        # TODO: Implement forward pass
        # for l in range(1, len(self.architecture)):
        #     ...
        
        pass  # Replace with your implementation
    
    def backward_propagation(self, X, y):
        """
        TODO: Implement backward pass to compute gradients
        
        Starting from output layer, compute:
        1. dZ[l] for each layer
        2. dW[l] = dZ[l] @ A[l-1].T / m
        3. db[l] = sum(dZ[l]) / m
        
        Return dictionary of gradients
        """
        m = X.shape[0]
        grads = {}
        
        # TODO: Implement backward pass
        # Start with output layer gradient
        # Then propagate backwards through hidden layers
        
        pass  # Replace with your implementation
        
        return grads
    
    def update_parameters(self, grads):
        """
        TODO: Update weights and biases using gradients
        
        For each layer:
        W[l] = W[l] - learning_rate * dW[l]
        b[l] = b[l] - learning_rate * db[l]
        """
        # TODO: Implement parameter updates
        pass
    
    def compute_loss(self, y_pred, y_true):
        """
        TODO: Compute loss
        
        For regression: MSE
        For classification: Cross-entropy
        """
        pass  # Replace with your implementation
    
    def fit(self, X, y):
        """
        TODO: Implement training loop
        
        For each iteration:
        1. Forward propagation
        2. Compute loss
        3. Backward propagation
        4. Update parameters
        5. Store loss
        
        Must populate self.loss_history!
        """
        self.initialize_parameters()
        
        for i in range(self.n_iterations):
            # TODO: Training loop
            pass
        
        return self
    
    def predict(self, X):
        """
        TODO: Implement prediction
        
        Use forward_propagation and apply appropriate thresholding
        """
        pass  # Replace with your implementation

print("✓ MLP class defined")

In [None]:
# Train MLP
print("Training MLP...")
mlp_start_time = time.time()

# TODO: Define your architecture and train MLP
mlp_architecture = []  # Example: [n_features, 16, 8, 1]
mlp_model = MLP(architecture=mlp_architecture, learning_rate=0.01, n_iterations=1000)
# mlp_model.fit(X_train, y_train)  # X_train was already scaled in Section 2

# TODO: Make predictions
# mlp_predictions = mlp_model.predict(X_test)  # X_test was already scaled in Section 2

mlp_training_time = time.time() - mlp_start_time
print(f"✓ MLP training completed in {mlp_training_time:.2f}s")
print(f"✓ Loss decreased from {mlp_model.loss_history[0]:.4f} to {mlp_model.loss_history[-1]:.4f}")

## Section 5: Evaluation and Metrics

Calculate appropriate metrics for your problem type
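
For a regression problem, the four usual metrics can be computed directly with NumPy. The sketch below uses a tiny made-up example so the numbers are easy to check by hand. `regression_metrics` is an illustrative helper name, not part of the template.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = np.mean(errors ** 2)
    ss_res = np.sum(errors ** 2)                     # Residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # Total sum of squares
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(errors))),
        "r2": float(1.0 - ss_res / ss_tot),
    }

# Errors are [-1, 0, 1, 0], so MSE = 0.5, MAE = 0.5 and R^2 = 1 - 2/20 = 0.9
demo = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(demo)
```

Note that R² is undefined when every true value is identical (`ss_tot` is zero); a production version would need to handle that case.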

In [None]:
def calculate_metrics(y_true, y_pred, problem_type):
    """
    TODO: Calculate appropriate metrics based on problem type
    
    For regression: MSE, RMSE, MAE, R²
    For classification: Accuracy, Precision, Recall, F1
    """
    metrics = {}
    
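    # Compare case-insensitively so that "Regression" and "regression" are treated alike
    if problem_type.lower() == "regression":
        # TODO: Calculate regression metrics
        pass
    elif problem_type.lower() in ["binary_classification", "multiclass_classification"]: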
        # TODO: Calculate classification metrics
        pass
    
    return metrics

# Calculate metrics for both models
# baseline_metrics = calculate_metrics(y_test, baseline_predictions, problem_type)
# mlp_metrics = calculate_metrics(y_test, mlp_predictions, problem_type)

print("Baseline Model Performance:")
# print(baseline_metrics)

print("\nMLP Model Performance:")
# print(mlp_metrics)

## Section 6: Visualization

Create visualizations:
1. Training loss curves
2. Performance comparison
3. Additional domain-specific plots

In [None]:
# 1. Training loss curves
plt.figure(figsize=(14, 5))

plt.subplot(1, 2, 1)
# TODO: Plot baseline loss
# plt.plot(baseline_model.loss_history, label='Baseline', color='blue')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('Baseline Model - Training Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
# TODO: Plot MLP loss
# plt.plot(mlp_model.loss_history, label='MLP', color='red')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('MLP Model - Training Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# 2. Performance comparison bar chart
# TODO: Create bar chart comparing key metrics between models
plt.figure(figsize=(10, 6))

# Example:
# metrics = ['Accuracy', 'Precision', 'Recall', 'F1']
# baseline_scores = [baseline_metrics[m] for m in metrics]
# mlp_scores = [mlp_metrics[m] for m in metrics]
# 
# x = np.arange(len(metrics))
# width = 0.35
# 
# plt.bar(x - width/2, baseline_scores, width, label='Baseline')
# plt.bar(x + width/2, mlp_scores, width, label='MLP')
# plt.xlabel('Metrics')
# plt.ylabel('Score')
# plt.title('Model Performance Comparison')
# plt.xticks(x, metrics)
# plt.legend()
# plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Section 7: Analysis and Discussion

Write your analysis (minimum 200 words)

In [None]:
analysis_text = """
TODO: Write your analysis here (minimum 200 words)

Address these questions:
1. Which model performed better and by how much?
2. Why do you think one model outperformed the other?
3. What was the computational cost difference (training time)?
4. Any surprising findings or challenges you faced?
5. What insights did you gain about neural networks vs linear models?

Write your thoughtful analysis here. Be specific and reference your actual results.
Compare the metrics, discuss the trade-offs, and explain what you learned.
"""

print(f"Analysis word count: {len(analysis_text.split())} words")
if len(analysis_text.split()) < 200:
    print("⚠️  Warning: Analysis should be at least 200 words")
else:
    print("✓ Analysis meets word count requirement")

---

## ⭐ REQUIRED: Structured Output Function

### **DO NOT MODIFY THE STRUCTURE BELOW**

This function will be called by the auto-grader. Fill in all values accurately based on your actual results.

In [None]:
def get_assignment_results():
    """
    Return all assignment results in structured format.
    
    CRITICAL: Fill in ALL values based on your actual results!
    This will be automatically extracted and validated.
    """
    
    # Calculate loss convergence flags
    baseline_initial_loss = float(baseline_model.loss_history[0])
    baseline_final_loss = float(baseline_model.loss_history[-1])
    mlp_initial_loss = 0.0       # TODO: mlp_model.loss_history[0]
    mlp_final_loss = 0.0         # TODO: mlp_model.loss_history[-1]
    
    results = {
        # ===== Dataset Information =====
        'dataset_name': dataset_name,
        'dataset_source': dataset_source,
        'n_samples': n_samples,
        'n_features': n_features,
        'problem_type': problem_type,
        'problem_statement': problem_statement,
        
        # ===== Evaluation Setup =====
        'primary_metric': primary_metric,
        'metric_justification': metric_justification,
        'train_samples': train_samples,
        'test_samples': test_samples,
        'train_test_ratio': train_test_ratio,
        
        # ===== Baseline Model Results =====
        'baseline_model': {
            'model_type': 'linear_regression',  # 'linear_regression', 'logistic_regression', or 'softmax_regression'
            'learning_rate': baseline_model.lr,
            'n_iterations': baseline_model.n_iterations,
            'initial_loss': baseline_initial_loss,
            'final_loss': baseline_final_loss,
            'training_time_seconds': baseline_training_time,
            
            # Metrics (fill based on your problem type)
            'test_accuracy': 0.0,      # For classification
            'test_precision': 0.0,     # For classification
            'test_recall': 0.0,        # For classification
            'test_f1': 0.0,            # For classification
            'test_mse': 0.0,           # For regression
            'test_rmse': 0.0,          # For regression
            'test_mae': 0.0,           # For regression
            'test_r2': 0.0,            # For regression
        },
        
        # ===== MLP Model Results =====
        'mlp_model': {
            'architecture': mlp_architecture,
            'n_hidden_layers': len(mlp_architecture) - 2 if len(mlp_architecture) > 0 else 0,
            'total_parameters': 0,     # TODO: Calculate total weights + biases
            'learning_rate': 0.0,
            'n_iterations': 0,
            'initial_loss': mlp_initial_loss,
            'final_loss': mlp_final_loss,
            'training_time_seconds': mlp_training_time,
            
            # Metrics
            'test_accuracy': 0.0,
            'test_precision': 0.0,
            'test_recall': 0.0,
            'test_f1': 0.0,
            'test_mse': 0.0,
            'test_rmse': 0.0,
            'test_mae': 0.0,
            'test_r2': 0.0,
        },
        
        # ===== Comparison =====
        'improvement': 0.0,            # MLP primary_metric - baseline primary_metric
        'improvement_percentage': 0.0,  # (improvement / baseline) * 100
        'baseline_better': False,       # True if baseline outperformed MLP
        
        # ===== Analysis =====
        'analysis': analysis_text,
        'analysis_word_count': len(analysis_text.split()),
        
        # ===== Loss Convergence Flags =====
        'baseline_loss_decreased': baseline_final_loss < baseline_initial_loss,
        'mlp_loss_decreased': mlp_final_loss < mlp_initial_loss,
        'baseline_converged': False,  # Optional: True if converged
        'mlp_converged': False,
    }
    
    return results
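
The `total_parameters` and `improvement_percentage` fields can be derived rather than typed by hand. The sketch below assumes fully connected layers with one bias per neuron and follows the formulas given in the dictionary comments. `count_parameters` and `improvement_summary` are hypothetical helper names, and the numbers passed to them are made up.

```python
def count_parameters(architecture):
    """Total weights plus biases for fully connected layers.

    architecture: list [input_size, hidden1, ..., output_size]
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(architecture[:-1], architecture[1:]))

def improvement_summary(baseline_value, mlp_value):
    """Absolute and percentage difference, as described in the results comments.

    improvement = MLP metric - baseline metric
    improvement_percentage = (improvement / baseline) * 100
    """
    improvement = mlp_value - baseline_value
    return improvement, improvement / baseline_value * 100.0

# Example architecture from the MLP docstring: [30, 16, 8, 1]
# (30*16 + 16) + (16*8 + 8) + (8*1 + 1) = 496 + 136 + 9 = 641
total = count_parameters([30, 16, 8, 1])
print(total)

# Made-up metric values, for illustration only
print(improvement_summary(2.0, 1.5))
```

For error metrics such as MSE, RMSE, or MAE, lower is better, so a negative `improvement` under this formula means the MLP did better than the baseline.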

## Test Your Output

Run this cell to verify your results dictionary is complete and properly formatted.

In [None]:
# Test the output
import json

try:
    results = get_assignment_results()
    
    print("="*70)
    print("ASSIGNMENT RESULTS SUMMARY")
    print("="*70)
    print(json.dumps(results, indent=2, default=str))
    print("\n" + "="*70)
    
    # Check for missing values
    missing = []
    def check_dict(d, prefix=""):
        for k, v in d.items():
            if isinstance(v, dict):
                check_dict(v, f"{prefix}{k}.")
            elif (v == 0 or v == "" or v == 0.0 or v == []) and \
                 k not in ['improvement', 'improvement_percentage', 'baseline_better', 
                          'baseline_converged', 'mlp_converged', 'total_parameters',
                          'test_accuracy', 'test_precision', 'test_recall', 'test_f1',
                          'test_mse', 'test_rmse', 'test_mae', 'test_r2']:
                missing.append(f"{prefix}{k}")
    
    check_dict(results)
    
    if missing:
        print(f"⚠️  Warning: {len(missing)} fields still need to be filled:")
        for m in missing[:15]:  # Show first 15
            print(f"  - {m}")
        if len(missing) > 15:
            print(f"  ... and {len(missing)-15} more")
    else:
        print("✅ All required fields are filled!")
        print("\n🎉 You're ready to submit!")
        print("\nNext steps:")
        print("1. Kernel → Restart & Clear Output")
        print("2. Kernel → Restart & Run All")
        print("3. Verify no errors")
        print("4. Save notebook")
        print("5. Rename as: YourStudentID_assignment.ipynb")
        print("6. Submit to LMS")
        
except Exception as e:
    print(f"❌ Error in get_assignment_results(): {str(e)}")
    print("\nPlease fix the errors above before submitting.")

---

## 📤 Before Submitting - Final Checklist

- [ ] **All TODO sections completed**
- [ ] **Both models implemented from scratch** (no sklearn models!)
- [ ] **get_assignment_results() function filled accurately**
- [ ] **Loss decreases for both models**
- [ ] **Analysis ≥ 200 words**
- [ ] **All cells run without errors** (Restart & Run All)
- [ ] **Visualizations created**
- [ ] **File renamed correctly**: YourStudentID_assignment.ipynb

---

## ⏭️ What Happens Next

After submission:
1. ✅ Your notebook will be **auto-graded** (executes automatically)
2. ✅ You'll receive a **verification quiz** (10 questions, 5 minutes)
3. ✅ Quiz questions based on **YOUR specific results**
4. ✅ Final score released after quiz validation

**The verification quiz ensures you actually ran your code!**

---

**Good luck! 🚀**