In [None]:
import random
import numpy as np
import matplotlib.pyplot as plt
!pip install tensorflow

# ```CIFAR-10``` Classification (this notebook is under construction)

```CIFAR-10``` consists of $60,000$ color images of $32 \times 32$ pixels in $10$ different classes, with $6,000$ images per class. The dataset is divided into $50,000$ training images and $10,000$ test images, and the classes don't overlap. We will classify these images using a version of the [micrograd](https://github.com/mattsankner/micrograd) neural network built in the linked repo.

### $10$ Classes in ```CIFAR-10```:
- Airplane
- Automobile
- Bird
- Cat
- Deer
- Dog
- Frog
- Horse
- Ship
- Truck
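
The integer labels in the dataset follow the order of the list above, so label $0$ is airplane and label $9$ is truck. A small lookup table can make predictions easier to read. This is only a sketch: ```CLASS_NAMES``` and ```label_to_name``` are names introduced here for illustration and are not used elsewhere in the notebook.

```python
# Class names in CIFAR-10 label order (list index == integer label).
CLASS_NAMES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_to_name(label):
    """Map an integer label (0-9) to its class name."""
    return CLASS_NAMES[int(label)]

print(label_to_name(3))  # cat
```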

### Loading/Preprocessing CIFAR-10 Dataset:

In [2]:
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical


In [3]:
# Load CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

In [4]:
# Normalize the pixel values
X_train, X_test = X_train / 255.0, X_test / 255.0

In [5]:
# Flatten the input images
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)
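
The effect of the normalize and flatten steps can be checked on a small synthetic array, without downloading the dataset. This is a sketch under the assumption that the raw images are ```uint8``` arrays of shape ```(N, 32, 32, 3)```; ```fake_images``` is a made-up stand-in, not real data.

```python
import numpy as np

# Stand-in for a batch of 4 CIFAR-10 images: uint8 values in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled = fake_images / 255.0                # float values in [0, 1]
flat = scaled.reshape(scaled.shape[0], -1)  # one row of 3072 values per image

print(flat.shape)                           # (4, 3072)
print(flat.min() >= 0.0, flat.max() <= 1.0)

# Flattening loses no information: reshaping a row recovers the image.
restored = flat[0].reshape(32, 32, 3)
print(np.array_equal(restored, scaled[0]))
```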

Below, we use ```one-hot encoding``` to convert the integer class labels into binary vectors. The conversion is done by the ```to_categorical``` function imported above.

Example:
Assume there are only $3$ classes (instead of $10$) and ```y_train``` contains the following class labels for $3$ images:

- ```y_train``` = [0,1,2]

Applying ```to_categorical(y_train,3)``` converts the labels into a one-hot encoded format:
```python
y_train = [
    [1, 0, 0],  # class 0
    [0, 1, 0],  # class 1
    [0, 0, 1],  # class 2
]
```

The only difference for ```CIFAR-10``` is that we call ```to_categorical(y_train, 10)```, since there are $10$ classes.
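
The same encoding can be sketched in plain NumPy, which shows what the conversion does. The helper ```one_hot``` is a hypothetical name used only for this illustration; it is not how ```to_categorical``` is implemented.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels).ravel()  # accepts shape (N,) or (N, 1)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels]

encoded = one_hot([0, 1, 2], 3)
print(encoded)

# Labels stored as a column, shape (N, 1), are handled the same way.
column_encoded = one_hot([[3], [9]], 10)
print(column_encoded.shape)  # (2, 10)
```

Taking ```np.argmax``` along each row recovers the original integer labels, which is how the accuracy is computed later in the notebook.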

In [6]:
# Convert labels to categorical
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

### Defining the Multi-Layer Perceptron Model:

In [7]:
# Define the necessary imports for micrograd
from micrograd.engine import Value
from micrograd.mlp import Neuron, Layer, MLP

In [8]:
# Define a more complex MLP model for CIFAR-10
class MLPComplex(MLP):
    def __init__(self, nin, nouts):
        super().__init__(nin, nouts)
        self.dropout_rate = 0.2  # Adjusted dropout rate
        print(f"Initialized MLPComplex with {nin} inputs and {nouts} neurons in each layer")

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
            if layer is not self.layers[-1]:  # Don't apply ReLU activation and dropout to the output (last) layer
                out = [o.relu() for o in out]
                out = self.apply_dropout(out)
        return out
    
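    def apply_dropout(self, activations):
        # Zero each activation independently with probability dropout_rate.
        # Survivors are not rescaled, and dropout is applied on every forward pass.
        if self.dropout_rate > 0:
            dropped = 0  # count only the neurons zeroed here, not ones ReLU already set to 0
            for i in range(len(activations)):
                if random.random() < self.dropout_rate:
                    activations[i] = Value(0.0)
                    dropped += 1
            print(f"Applied dropout with rate {self.dropout_rate}, {dropped} neurons dropped")
        return activations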
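
The dropout step can be illustrated on plain floats, which makes its statistical effect easy to see. This is a sketch, not the notebook's method: the function ```dropout``` and its ```rescale``` option are hypothetical. With ```rescale=True``` it shows the common "inverted dropout" variant, which ```apply_dropout``` above does not use.

```python
import random

def dropout(values, rate, rng, rescale=False):
    """Zero each value independently with probability `rate`.

    With rescale=True the survivors are divided by (1 - rate), which keeps
    the expected sum of the values unchanged ("inverted dropout").
    """
    keep_scale = 1.0 / (1.0 - rate) if rescale else 1.0
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(0)
activations = [1.0] * 10_000

plain = dropout(activations, rate=0.2, rng=rng)
dropped_fraction = sum(1 for v in plain if v == 0.0) / len(plain)
print(dropped_fraction)  # close to 0.2

rescaled = dropout(activations, rate=0.2, rng=rng, rescale=True)
mean_rescaled = sum(rescaled) / len(rescaled)
print(mean_rescaled)     # close to 1.0, the mean before dropout
```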

## MLP initialization:

```python
model = MLPComplex(3072, [1024, 512, 256, 10])
```

### Input Layer

- $3072$: Each ```CIFAR-10``` image is $32 \times 32$ pixels with $3$ color channels. Flattening this image results in a vector of size $32 \times 32 \times 3 = 3072$. Therefore, the input layer has $3072$ neurons.
  
### Hidden Layers
- $1024, 512, 256$: These are the numbers of neurons in the three hidden layers. The sizes were chosen empirically, following a common pattern for ```DNNs```: wider layers at the beginning can capture more patterns in the raw input, and the shrinking layers that follow compress them into more abstract representations.

### Output Layer
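- $10$: The output layer has $10$ neurons, one for each of the $10$ classes in ```CIFAR-10```. Each neuron produces a score for its class, and the class with the highest score is the prediction. The ```forward``` method defined above applies no softmax of its own, so these scores are not yet normalized probabilities.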

In [9]:
# Initialize the model
model = MLPComplex(3072, [1024, 512, 256, 10])  # CIFAR-10 images are 32x32x3 = 3072 input values
print("Parameter count:", len(model.parameters()))

Initialized MLPComplex with 3072 inputs and [1024, 512, 256, 10] neurons in each layer
Parameter count: 3805450
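
The parameter count printed above can be checked by hand. The sketch below assumes each neuron holds one weight per input plus one bias; ```mlp_parameter_count``` is a helper written for this check and is not part of micrograd.

```python
def mlp_parameter_count(nin, nouts):
    """Count the weights and biases of a fully connected network."""
    sizes = [nin] + list(nouts)
    # Each layer has (inputs * outputs) weights plus one bias per output.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

total = mlp_parameter_count(3072, [1024, 512, 256, 10])
print(total)  # 3805450
```

The first layer alone accounts for $3072 \times 1024 + 1024 = 3,146,752$ of these parameters.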


### Defining the Loss Function:

Here, as opposed to previous classification projects, we calculate the ```data_loss``` with a ```categorical_cross_entropy()``` function instead of a ```ReLU()```-based loss.
- ```target``` is the one-hot encoded vector representing the true class label.
- ```output``` is the vector of predicted probabilities for each class. Every entry must be positive for its logarithm to be defined.
- ```target[i] * output[i].log()```: multiplies the target value (which is 1 for the correct class and 0 for others) by the logarithm of the predicted probability for that class.

Note: we negate the sum to get the ```cross entropy loss```, which we aim to minimize.

Example:

```python
target = [0, 0, 1]
output = [Value(0.2), Value(0.3), Value(0.5)]
loss = -sum([target[i] * output[i].log() for i in range(len(target))])
# loss = -(0*log(0.2) + 0*log(0.3) + 1*log(0.5))
# loss = -log(0.5)
```

In the ```loss``` function, we then compute the cross-entropy loss for each pair of true label and model output in the batch. This results in a list of individual losses, which are averaged into the ```data_loss``` and combined with an L2 regularization term.
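
Because the logarithm needs positive inputs, raw scores are usually passed through a softmax before the cross-entropy is taken. The NumPy sketch below shows both steps on plain numbers. It is an illustration only: ```softmax``` and ```cross_entropy``` are helper names chosen here, and the notebook does not show whether micrograd's ```Value``` supports the operations a softmax would need.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    exps = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exps / exps.sum()

def cross_entropy(target, probs):
    """Cross-entropy between a one-hot target and a probability vector."""
    return -float(np.sum(np.asarray(target) * np.log(probs)))

example_loss = cross_entropy([0, 0, 1], [0.2, 0.3, 0.5])
print(example_loss)  # -log(0.5), about 0.693

probs = softmax([2.0, -1.0, 0.5])  # raw scores may be negative
print(probs.sum(), cross_entropy([1, 0, 0], probs))
```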

In [10]:
# Define the categorical cross-entropy loss function
def categorical_cross_entropy(target, output):
    return -sum([target[i] * output[i].log() for i in range(len(target))])

In [11]:
# Loss function
def loss(batch_size=None):
    if batch_size is None:
        Xb, yb = X_train, y_train
    else:
        ri = np.random.permutation(X_train.shape[0])[:batch_size]
        Xb, yb = X_train[ri], y_train[ri]

    inputs = [list(map(Value, xrow)) for xrow in Xb]
    scores = [model.forward(x) for x in inputs]

    #yb is a batch of true labels (one-hot encoded)
    losses = [categorical_cross_entropy(yb[i], scores[i]) for i in range(len(yb))]
    
    data_loss = sum(losses) * (1.0 / len(losses))

    alpha = 1e-4
    reg_loss = alpha * sum((p * p for p in model.parameters()))
    total_loss = data_loss + reg_loss

    accuracy = [np.argmax(yb[i]) == np.argmax([s.data for s in scores[i]]) for i in range(len(yb))]
    return total_loss, sum(accuracy) / len(accuracy)
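
Two pieces of the ```loss``` function can be looked at in isolation: how a mini-batch is sampled, and how the data loss and the L2 penalty are combined. The sketch below uses plain NumPy and made-up numbers; ```per_sample_losses``` and ```params``` are placeholder values, not results from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, batch_size = 50_000, 32

# Shuffle all indices and keep the first `batch_size` of them,
# which samples the batch without replacement.
batch_idx = rng.permutation(num_samples)[:batch_size]
print(len(set(batch_idx.tolist())))  # 32 distinct indices

# Mean per-sample loss plus an L2 penalty on the parameters.
per_sample_losses = np.array([2.3, 2.1, 2.5])  # made-up values
params = np.array([0.5, -1.0, 2.0])            # made-up values
alpha = 1e-4

data_loss = per_sample_losses.mean()
reg_loss = alpha * np.sum(params ** 2)
total = data_loss + reg_loss
print(data_loss, reg_loss, total)
```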

In [None]:
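# Calculate initial loss and accuracy on a small random batch
# (a full pass over all 50,000 training images is very slow with scalar autograd)
total_loss, acc = loss(batch_size=32)
print("Total loss:", total_loss.data, ", Accuracy:", acc * 100, "%")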

### Training Loop with Backward Pass and Parameter Update

In [None]:
# Optimization (training loop)
for k in range(100):
    total_loss, acc = loss(batch_size=32)
    model.zero_grad()
    total_loss.backward()
    learning_rate = 0.01 - 0.0001 * k
    for p in model.parameters():
        p.data -= learning_rate * p.grad
    
    if k % 10 == 0:
        print(f"Step {k + 1} loss {total_loss.data}, accuracy {acc * 100}%")
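
The update rule and the linearly decaying learning rate can be tried on a toy problem that runs instantly. This is a sketch under simplifying assumptions: it minimizes the one-parameter function $f(w) = (w - 3)^2$ instead of the network's loss, and ```learning_rate``` is a helper name introduced here.

```python
def learning_rate(k, start=0.01, decay=0.0001):
    """Linearly decaying step size, using the same formula as the loop above."""
    return start - decay * k

# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for k in range(100):
    grad = 2.0 * (w - 3.0)
    w -= learning_rate(k) * grad

print(learning_rate(0), learning_rate(99))  # 0.01 and about 0.0001
print(w)  # has moved from 0 toward the minimum at w = 3
```

With these constants the step size reaches zero at $k = 100$, so running the loop for more than $100$ steps would need a different schedule.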

### Evaluation on Test Data:

In [None]:
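# Evaluate on test data
def evaluate():
    # Turn dropout off for evaluation, then restore the training-time rate
    saved_rate, model.dropout_rate = model.dropout_rate, 0.0
    inputs = [list(map(Value, xrow)) for xrow in X_test]
    scores = [model.forward(x) for x in inputs]
    model.dropout_rate = saved_rate
    accuracy = [np.argmax(y_test[i]) == np.argmax([s.data for s in scores[i]]) for i in range(len(y_test))]
    return sum(accuracy) / len(accuracy)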
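
The accuracy calculation compares the index of the largest score with the index of the $1$ in the one-hot label. A vectorized NumPy sketch of the same idea is given here on a tiny made-up example; the function ```accuracy``` and the arrays ```targets``` and ```scores``` are illustrative only.

```python
import numpy as np

def accuracy(one_hot_targets, scores):
    """Fraction of rows whose highest score matches the target class."""
    true_labels = np.argmax(one_hot_targets, axis=1)
    pred_labels = np.argmax(scores, axis=1)
    return float(np.mean(true_labels == pred_labels))

targets = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
scores = np.array([
    [2.0, 0.1, 0.3],  # predicts class 0, correct
    [0.2, 0.1, 0.9],  # predicts class 2, wrong (true class is 1)
    [0.0, 0.5, 1.5],  # predicts class 2, correct
    [0.1, 0.8, 0.3],  # predicts class 1, correct
])
result = accuracy(targets, scores)
print(result)  # 0.75
```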

In [None]:
test_acc = evaluate()
print(f"Test accuracy: {test_acc * 100}%")

### Visualizing Predictions:

In [None]:
# Visualize a few test images along with their predicted and true labels
saved_rate, model.dropout_rate = model.dropout_rate, 0.0  # no dropout while predicting
fig, axes = plt.subplots(3, 3, figsize=(10, 10))
fig.suptitle('CIFAR-10 Predictions')
for i, ax in enumerate(axes.flat):
    img = X_test[i].reshape(32, 32, 3)
    true_label = np.argmax(y_test[i])
    pred_label = np.argmax([s.data for s in model.forward(list(map(Value, X_test[i])))])
    ax.imshow(img)
    ax.set_title(f'True: {true_label}, Pred: {pred_label}')
    ax.axis('off')
plt.show()
model.dropout_rate = saved_rate  # restore the training-time dropout rate

### Conclusion:

This notebook uses the micrograd library to build and train an MLP model for ```CIFAR-10``` image classification, covering data preprocessing, model definition, training, evaluation, and visualization.