
# MNIST Quiz

You are asked to provide a class that implements an ML model which trains on MNIST and predicts the correct digit with high accuracy.

**Rules**

1. **Imports**: Only `numpy` may be imported
2. **No pre-computation**: Training must happen in `fit()` — no hardcoded weights
3. **Time limit**: Budget is 120s, hard stop at 180s
4. **Accuracy**: Minimum 97% required to pass
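
Before submitting, you may want to check rule 1 yourself. The sketch below is one possible way to do that: it parses a submission's source code and lists any imported top-level module other than `numpy`. The helper name `find_disallowed_imports` is hypothetical, and this is not necessarily how the grader checks the rule. It also does not detect dynamic imports such as `__import__` or `importlib`.

```python
import ast


def find_disallowed_imports(source, allowed=("numpy",)):
    """Return the sorted top-level modules imported in `source` that are not in `allowed`."""
    disallowed = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `from . import x` has no module name; report it under a placeholder.
            names = [node.module] if node.module else ["<relative>"]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level not in allowed:
                disallowed.add(top_level)
    return sorted(disallowed)


# Hypothetical submission text that breaks rule 1.
submission = """
import numpy as np
from sklearn.neural_network import MLPClassifier
"""
print(find_disallowed_imports(submission))  # ['sklearn']
```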

---

## Model Requirements

Your model must be a class with two methods:

| Method | Input | Output | Purpose |
|--------|-------|--------|---------|
| `model.fit(X, y)` | `X`: numpy array of shape `(n_samples, 784)` <br> `y`: numpy array of shape `(n_samples,)` | None | Train the internal ML model |
| `model.predict(X)` | `X`: numpy array of shape `(n_samples, 784)` | numpy array of shape `(n_samples,)` | Return the predicted labels, which are used to compute accuracy |

**Data Format**
- `X` contains flattened 28×28 grayscale images, normalized to [0, 1]
- `y` contains integer labels from 0 to 9
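
The interface above can be smoke-tested without loading MNIST. The following is a minimal sketch, assuming random data of the right shape is good enough to catch interface mistakes; `check_model_contract` and `MajorityClass` are hypothetical names used only for illustration. The integer-dtype check is stricter than what the table requires, so treat it as an optional convention.

```python
import numpy as np


def check_model_contract(model, n_train=200, n_test=50, seed=0):
    """Fit `model` on random MNIST-shaped data and validate what predict() returns."""
    rng = np.random.default_rng(seed)
    X_train = rng.random((n_train, 784))         # values in [0, 1), like normalized pixels
    y_train = rng.integers(0, 10, size=n_train)  # integer labels 0-9
    X_test = rng.random((n_test, 784))

    model.fit(X_train, y_train)
    y_pred = np.asarray(model.predict(X_test))

    assert y_pred.shape == (n_test,), f"expected shape ({n_test},), got {y_pred.shape}"
    assert np.issubdtype(y_pred.dtype, np.integer), f"expected integer labels, got {y_pred.dtype}"
    assert y_pred.min() >= 0 and y_pred.max() <= 9, "labels must lie in 0-9"
    return True


class MajorityClass:
    """Hypothetical placeholder: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = int(np.bincount(y, minlength=10).argmax())

    def predict(self, X):
        return np.full(len(X), self.label_, dtype=int)


print(check_model_contract(MajorityClass()))  # True
```

A check like this only confirms the shapes and label range; it says nothing about accuracy or running time.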

---

## Example Model Structure

```python
import numpy as np

class MyModel:
    def __init__(self):
        # Initialize hyperparameters (NOT pre-trained weights)
        pass
    
    def fit(self, X, y):
        # X: (n_samples, 784) - training images
        # y: (n_samples,) - labels 0-9
        # Train your model here
        pass
    
    def predict(self, X):
        # X: (n_samples, 784) - test images
        # Returns: (n_samples,) - predicted labels 0-9
        pass
```

# Tester Class

Below I provide a tester class which serves two purposes: (1) during your development work, it will help you self-evaluate your model and improve it, (2) during assessment, it will be used to evaluate your final submission.

**Note:** The seed determines how the dataset is split into training and test sets, so every seed produces a different split. At assessment time I will use a seed that is unknown to you, so you should aim for a model that generalizes and performs well with any seed.

**Usage of tester class**

```python
tester = MNISTTester()
results = tester.test(your_model, seed=42, min_accuracy=0.85)
```
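
Because the grading seed is unknown, it can help to run the tester with several seeds and look at the worst result rather than a single run. The helper below is a sketch under that assumption; `evaluate_over_seeds` is a hypothetical name, and it relies only on the `test(model, seed=...)` call and the `accuracy` and `passed` keys of the dictionary that the tester returns.

```python
import numpy as np


def evaluate_over_seeds(tester, model_factory, seeds=(0, 1, 2), **test_kwargs):
    """Run tester.test once per seed with a fresh model and summarize the accuracies."""
    results = [tester.test(model_factory(), seed=seed, **test_kwargs) for seed in seeds]
    accuracies = np.array([r["accuracy"] for r in results])
    return {
        "mean_accuracy": float(accuracies.mean()),
        "min_accuracy": float(accuracies.min()),
        "all_passed": all(r["passed"] for r in results),
        "results": results,
    }


# Possible usage once MNISTTester and your model class are defined:
# summary = evaluate_over_seeds(MNISTTester(), MyModel, seeds=(1, 7, 2024),
#                               min_accuracy=0.97)
# print(summary["min_accuracy"], summary["all_passed"])
```

A fresh model is created for every seed so that no state from a previous `fit()` leaks into the next run.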

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import time

class MNISTTester:
    """
    MNIST Model Tester
    - Students develop with seed=42
    - Teacher grades with a different seed
    """

    def __init__(self):
        self._X = None
        self._y = None

    def _load_data(self):
        if self._X is None:
            print("Loading MNIST...")
            mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')
            self._X = mnist.data / 255.0
            self._y = mnist.target.astype(int)
        return self._X, self._y

    def test(self, model, seed=42, min_accuracy=0.80, subset=None, max_time=300):
        """
        Test a model on MNIST.

        Parameters:
        -----------
        model : object with fit(X, y) and predict(X) methods
        seed : int - controls train/test split
        min_accuracy : float - minimum accuracy to pass
        subset : int or None - use smaller dataset for faster testing
        max_time : float - maximum allowed time in seconds (default 300, i.e. 5 min)
        """
        X, y = self._load_data()

        # Optional subset for faster iteration
        if subset:
            np.random.seed(seed)
            idx = np.random.choice(len(X), subset, replace=False)
            X, y = X[idx], y[idx]

        # Split data based on seed
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.15, random_state=seed, stratify=y
        )

        print(f"Seed: {seed} | Train: {len(X_train)} | Test: {len(X_test)} | Time limit: {max_time}s")

        start = time.time()

        # Train
        model.fit(X_train, y_train)
        train_time = time.time() - start

        if train_time > max_time:
            print(f"FAILED: Training took {train_time:.1f}s (limit: {max_time}s)")
            return {'accuracy': 0, 'passed': False, 'time': train_time, 'seed': seed}

        # Predict
        y_pred = np.array(model.predict(X_test)).flatten()
        total_time = time.time() - start

        if total_time > max_time:
            print(f"FAILED: Total time {total_time:.1f}s exceeded limit {max_time}s")
            return {'accuracy': 0, 'passed': False, 'time': total_time, 'seed': seed}

        # Evaluate
        accuracy = np.mean(y_pred == y_test)
        passed = accuracy >= min_accuracy and total_time <= max_time

        print(f"Accuracy: {accuracy*100:.2f}% | Time: {total_time:.1f}s | {'PASSED' if passed else 'FAILED'}")

        return {'accuracy': accuracy, 'passed': passed, 'time': total_time, 'seed': seed}
```

## Example of a simple submission

Here is an example of what you should deliver. **Important:** The submission consists of a single block of code that can be pasted into this Jupyter notebook.

This particular example builds a simple softmax regression with low accuracy; it only illustrates what your code should look like.

Make sure your code runs in no more than 2 minutes and reaches an accuracy of at least 97%. For a top grade, try to beat 98.5%. When you submit your code, I will simply copy-paste it into the cell below and execute it, using a different seed from the one shown here.

```python
import numpy as np

class SoftmaxRegression:
    """
    Multi-class logistic regression using gradient descent.
    Simple linear model: no hidden layers, just input -> output.
    """

    def __init__(self, learning_rate=0.1, n_iterations=100):
        self.lr = learning_rate
        self.n_iterations = n_iterations
        self.W = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        n_classes = 10

        # Initialize weights
        self.W = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)

        # One-hot encode labels
        y_onehot = np.zeros((n_samples, n_classes))
        y_onehot[np.arange(n_samples), y] = 1

        # Gradient descent
        for i in range(self.n_iterations):
            # Forward pass
            scores = X @ self.W + self.b
            probs = self._softmax(scores)

            # Gradients
            error = probs - y_onehot
            grad_W = (X.T @ error) / n_samples
            grad_b = np.mean(error, axis=0)

            # Update weights
            self.W -= self.lr * grad_W
            self.b -= self.lr * grad_b

    def predict(self, X):
        scores = X @ self.W + self.b
        return np.argmax(scores, axis=1)

    def _softmax(self, z):
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)


tester = MNISTTester()
model = SoftmaxRegression(learning_rate=0.1, n_iterations=200)
results = tester.test(model, seed=42, min_accuracy=0.85)
```

```
Loading MNIST...
Seed: 42 | Train: 59500 | Test: 10500 | Time limit: 300s
Accuracy: 87.83% | Time: 75.3s | PASSED
```


It is important to create a model that generalizes, because the evaluation will use a seed that is unknown to you, and the seed determines the train/test split, as shown below.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Create a small example: 20 samples
data_indices = np.arange(20)

print("="*60)
print("How different seeds create different train/test splits")
print("="*60)
print("\nSample indices: 0-19")
print("█ = Train, ░ = Test\n")

for seed in [42, 123, 2024, 7]:
    train_idx, test_idx = train_test_split(
        data_indices, test_size=0.3, random_state=seed
    )

    # Create visual representation
    visual = ""
    for i in range(20):
        if i in train_idx:
            visual += "█"
        else:
            visual += "░"

    # List the held-out (test) indices as plain Python ints
    print(f"Seed {seed:4d}: {visual}  (Test: {sorted(test_idx.tolist())})")

print("\n" + "="*60)
print("Same seed = Same split (reproducible)")
print("="*60 + "\n")

for seed in [42, 42, 42]:
    train_idx, test_idx = train_test_split(
        data_indices, test_size=0.3, random_state=seed
    )
    visual = "".join(["█" if i in train_idx else "░" for i in range(20)])
    print(f"Seed {seed:4d}: {visual}")
```

```
============================================================
How different seeds create different train/test splits
============================================================

Sample indices: 0-19
█ = Train, ░ = Test

Seed   42: ░░███░██░██████░█░██  (Test: [0, 1, 5, 8, 15, 17])
Seed  123: ████░░█░░█████░██░██  (Test: [4, 5, 7, 8, 14, 17])
Seed 2024: ██████░████░░░█░███░  (Test: [6, 11, 12, 13, 15, 19])
Seed    7: ░░░██░█████░█████░██  (Test: [0, 1, 2, 5, 11, 17])

============================================================
Same seed = Same split (reproducible)
============================================================

Seed   42: ░░███░██░██████░█░██
Seed   42: ░░███░██░██████░█░██
Seed   42: ░░███░██░██████░█░██
```
