# Usage Guide: Automating PyTorch Loss Functions with Rubick on the CIFAR-10 Dataset

### Importing Libraries

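Here we import the `Rubick` class from the `rubick_v6` module, along with the PyTorch and torchvision modules used in the rest of this guide.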

In [6]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import time
from rubick_v6 import Rubick

### Data Preparation

In [2]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=100,
                                         shuffle=False, num_workers=2)
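
To see what the `Normalize` step does numerically, here is a minimal sketch that applies the same per-channel arithmetic, `(x - mean) / std` with a mean and standard deviation of 0.5, to a stand-in tensor. The random tensor `fake_image` is an assumption for illustration and is not a real CIFAR-10 image. `ToTensor` produces values in [0, 1], which this arithmetic maps to [-1, 1].

```python
import torch

# Stand-in for a ToTensor() output: 3 channels, 32x32, values in [0, 1].
# (Hypothetical data, not a real CIFAR-10 image.)
fake_image = torch.rand(3, 32, 32)
fake_image[0, 0, 0] = 0.0  # force both extremes so the output range is visible
fake_image[0, 0, 1] = 1.0

mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# Same arithmetic as transforms.Normalize(mean, std): (x - mean) / std per channel.
normalized = (fake_image - mean) / std

print(normalized.min().item(), normalized.max().item())  # -1.0 1.0
```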



### Defining a simple model architecture

In [3]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.net(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = SimpleCNN().to(device)
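
The `64 * 8 * 8` input size of the first `nn.Linear` layer follows from the two pooling steps, which halve the 32x32 input twice. The sketch below rebuilds only the convolutional part of `SimpleCNN` and pushes a random CIFAR-10-sized tensor through it to confirm the shape. The names `features` and `dummy_batch` are introduced here for illustration only.

```python
import torch
import torch.nn as nn

# Same convolutional stack as SimpleCNN.net, stopped before nn.Flatten().
features = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),   # 3x32x32  -> 32x32x32 (padding keeps H and W)
    nn.ReLU(),
    nn.MaxPool2d(2, 2),               # 32x32x32 -> 32x16x16
    nn.Conv2d(32, 64, 3, padding=1),  # 32x16x16 -> 64x16x16
    nn.ReLU(),
    nn.MaxPool2d(2, 2),               # 64x16x16 -> 64x8x8
)

dummy_batch = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized input with random values
with torch.no_grad():
    feature_map = features(dummy_batch)

print(feature_map.shape)             # torch.Size([1, 64, 8, 8])
print(feature_map.flatten(1).shape)  # torch.Size([1, 4096])
```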

### Generating loss function using Rubick

The code cells above define the neural network architecture and prepare the data. The next step is to define the loss function that the model will minimize during training.

We choose the `CodeLlama-7b-Instruct-hf` model because it performs well at both code generation and instruction following.

As the output below shows, the model fails to generate a valid loss function in the first loop: the generated loss function fails the unit test on all three attempts.

In the second loop, the model generates a valid loss function on the first attempt.

In [7]:
model_id = "codellama/CodeLlama-7b-Instruct-hf"
token = "NONE"
prompt = "The task is to create a loss function for a 10-class image classification task on the CIFAR-10 dataset"

generator = Rubick(model_id, token, prompt)
generator.process_start()


Starting loss function generation process
Here is initial code generated for loop:  0

Loss function code:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoLoss(nn.Module):
    def __init__(self):
        super(AutoLoss, self).__init__()

    def forward(self, x, y):
        return F.cross_entropy(x, y)

Test function code:
from temp_code import AutoLoss

import unittest
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoLossTest(unittest.TestCase):
    def test_loss_function(self):
        loss_fn = AutoLoss()
        x = torch.randn(5, 3)
        y = torch.randint(0, 3, (5,))
        loss = loss_fn(x, y)
        self.assertTrue(loss.requires_grad)

if __name__ == '__main__':
    unittest.main()

[Attempt 1/3] Status: False
Error Output:
test_loss_function (temp_test.AutoLossTest) ... FAIL

FAIL: test_loss_function (temp_test.AutoLossTest)
----------------------------------------------------------------------
Traceback (most r
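
The traceback above is cut off, so the exact failure message is not shown. One plausible cause, visible in the generated test itself, is that `x = torch.randn(5, 3)` does not track gradients, so the loss computed from it has `requires_grad` set to `False` and `assertTrue(loss.requires_grad)` fails. The sketch below reproduces that behaviour with plain `F.cross_entropy`, standing in for the generated `AutoLoss`. Whether this is the failure Rubick actually recorded is an assumption.

```python
import torch
import torch.nn.functional as F

y = torch.randint(0, 3, (5,))

# As in the generated test: a plain tensor does not track gradients.
x_plain = torch.randn(5, 3)
loss_plain = F.cross_entropy(x_plain, y)
print(loss_plain.requires_grad)  # False

# With requires_grad=True (as model outputs have during training),
# the loss becomes part of the autograd graph.
x_tracked = torch.randn(5, 3, requires_grad=True)
loss_tracked = F.cross_entropy(x_tracked, y)
print(loss_tracked.requires_grad)  # True
```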

### Setting AutoLoss as the loss function

Here we instantiate the generated loss function `AutoLoss` and assign it to the variable `criterion`, which is used for the rest of the training phase. We also create an Adam optimizer with a learning rate of 0.001.

In [8]:
criterion = generator.AutoLoss().to(device)

optimizer = optim.Adam(model.parameters(), lr=0.001)
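
Before training with a generated loss, it can be worth checking it against a known reference. The sketch below rests on assumptions: the `AutoLoss` class here is copied from the loop 0 code printed in the log as a stand-in (the loss Rubick finally accepts may differ), and it is compared with `nn.CrossEntropyLoss` on random logits rather than on real model outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoLoss(nn.Module):
    """Stand-in copied from the generated code in the log; not Rubick's final output."""
    def forward(self, x, y):
        return F.cross_entropy(x, y)

torch.manual_seed(0)
logits = torch.randn(8, 10, requires_grad=True)  # batch of 8 samples, 10 classes
targets = torch.randint(0, 10, (8,))

candidate = AutoLoss()(logits, targets)
reference = nn.CrossEntropyLoss()(logits, targets)

# A usable classification loss should be a finite scalar that supports backward().
assert candidate.dim() == 0 and torch.isfinite(candidate)
candidate.backward()
assert logits.grad is not None

print(torch.allclose(candidate, reference))
```

A check like this does not prove a generated loss is suitable, but it catches losses that return the wrong shape or break the autograd graph before any training time is spent.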

### Training loop

In [9]:
for epoch in range(5):  # 5 epochs
    running_loss = 0.0
    model.train()

    for i, (inputs, labels) in enumerate(trainloader):
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 100 == 99:
            print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 100:.3f}")
            running_loss = 0.0

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[1, 100] loss: 1.798
[1, 200] loss: 1.483
[1, 300] loss: 1.329
[1, 400] loss: 1.243
[1, 500] loss: 1.205
[1, 600] loss: 1.116
[1, 700] loss: 1.078


[2, 100] loss: 0.973
[2, 200] loss: 0.935
[2, 300] loss: 0.929
[2, 400] loss: 0.910
[2, 500] loss: 0.912
[2, 600] loss: 0.869
[2, 700] loss: 0.863


[3, 100] loss: 0.759
[3, 200] loss: 0.760
[3, 300] loss: 0.759
[3, 400] loss: 0.737
[3, 500] loss: 0.744
[3, 600] loss: 0.759
[3, 700] loss: 0.738


[4, 100] loss: 0.621
[4, 200] loss: 0.587
[4, 300] loss: 0.630
[4, 400] loss: 0.613
[4, 500] loss: 0.619
[4, 600] loss: 0.634
[4, 700] loss: 0.607


[5, 100] loss: 0.466
[5, 200] loss: 0.477
[5, 300] loss: 0.489
[5, 400] loss: 0.492
[5, 500] loss: 0.494
[5, 600] loss: 0.478
[5, 700] loss: 0.514
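
The `huggingface/tokenizers` warning in the training log most likely appears because the `DataLoader` workers (`num_workers=2`) fork the process after the tokenizer has already been used during loss generation. As the warning itself suggests, setting the `TOKENIZERS_PARALLELISM` environment variable explicitly should silence it. A minimal sketch, assuming it runs before the tokenizer is first used:

```python
import os

# Set this before the tokenizer is first used (for example at the top of the
# notebook) so that forked DataLoader workers do not trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```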


### Model Evaluation

In [10]:
correct = 0
total = 0
model.eval()
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # .data is unnecessary inside torch.no_grad()
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Accuracy on CIFAR-10 test images: {100 * correct / total:.2f}%")

Accuracy on CIFAR-10 test images: 72.22%
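
To make the accuracy computation concrete, here is a small worked example that uses the same `torch.max` and comparison steps as the evaluation cell. The logits and labels are hypothetical values chosen for illustration, not model outputs.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes (illustrative values only).
outputs = torch.tensor([
    [2.0, 0.1, 0.3],   # largest logit at index 0
    [0.2, 1.5, 0.1],   # largest logit at index 1
    [0.9, 0.8, 0.7],   # largest logit at index 0
    [0.1, 0.2, 3.0],   # largest logit at index 2
])
labels = torch.tensor([0, 1, 2, 2])

_, predicted = torch.max(outputs, 1)          # index of the largest logit per row
correct = (predicted == labels).sum().item()  # 3 of the 4 predictions match
total = labels.size(0)

print(f"Accuracy: {100 * correct / total:.2f}%")  # Accuracy: 75.00%
```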
