# HPC as solutions for AI: PyTorch

<p style='text-align: justify;'>
This section shows how to optimize PyTorch models by accelerating training and execution with GPUs.
</p>    

The principal goals are:

* **Gain Proficiency** in understanding the architecture and functionality of deep learning models for image classification, specifically using the CIFAR-10 dataset.
* **Utilize** your GPU and the PyTorch library for the first time to accelerate the training of image classification models.
* **Familiarize** yourself with the CIFAR-10 dataset by classifying its various classes.
* **Evaluate** and **Compare** the performance of your models on GPU and CPU to understand the benefits of GPU acceleration in AI tasks.

## The problem: resource-intensive training and model scalability

<p style='text-align: justify;'>
As AI research progresses, deep neural networks have become a key method for tasks like image generation and language translation. However, the challenge of resource-intensive training arises as networks become more complex and demanding in performance.
</p>

<p style='text-align: justify;'>
Research and development in artificial intelligence have made remarkable strides in recent decades, driven largely by the use of deep neural networks. These networks are computational structures loosely inspired by the functioning of the human brain and are particularly well-suited for tasks that involve large volumes of data, such as pattern recognition in images and natural language processing.
</p>

<p style='text-align: justify;'>
However, as the problems being addressed become more complex and performance demands increase, the need for computational resources grows rapidly. Additionally, the scalability of these models becomes a concern as they grow in size and complexity. Maintaining and optimizing constantly expanding AI models is a challenging task for the research and development community.
</p>

## The solution: GPUs and Intel® PyTorch

<p style='text-align: justify;'>
Graphics Processing Units (GPUs), originally designed for gaming graphics, have emerged as indispensable tools for accelerating complex computations. Their parallel processing capabilities have revolutionized deep learning by reducing training times. Additionally, GPU clusters can be scaled horizontally for large-scale projects, improving performance and cost-efficiency. In this way, GPUs address deep learning's computational bottlenecks, enabling faster training, real-time inference, and scalability, and driving innovation across various fields.
</p>
<p style='text-align: justify;'>
Libraries like Intel® PyTorch, a popular machine learning and AI framework, offer a flexible interface for designing, training, and evaluating neural networks, especially when combined with the computational power of GPUs.
</p>
<p style='text-align: justify;'>
Furthermore, Intel® PyTorch is well-equipped to take full advantage of the optimizations and hardware support provided by Intel® processors and Intel® GPUs. This synergy results in an even more efficient and performance-oriented machine learning experience. It enables practitioners to extract maximum computational throughput from their hardware infrastructure.
</p>

##  ☆ Challenge: Zoo breakout!☆ 

Recently, there was an unexpected incident at the local zoo, **Orange Grove Zoo**: all the animals escaped from their enclosures and are now roaming freely in the zoo. To deal with this situation, we need your help to locate and classify the escaped animals, distinguishing each animal class and identifying possible vehicles that may be in the same environment.

You have been assigned as the person responsible for developing a computer vision system capable of identifying and classifying the escaped animals, as well as identifying the presence of vehicles in the images. For this challenge, we will use the CIFAR-10 dataset and the PyTorch library to train a deep learning model.

CIFAR-10 and CIFAR-100 datasets provide a comprehensive collection of 32x32 pixel images, grouped into 10 and 100 distinct classes, respectively.

- [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html): CIFAR-10 consists of 60,000 images, each belonging to one of the ten classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. This dataset offers a diverse set of images representing everyday objects.

- [CIFAR-100 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html): CIFAR-100 expands upon the CIFAR-10 concept, containing 60,000 images as well. However, it introduces a more challenging task by categorizing images into 100 classes. These classes include various subcategories such as fruits, animals, vehicles, and more.
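
The dataset sizes above translate into a few numbers that are handy when reading training logs. The sketch below assumes perfectly balanced classes, the standard split of 50,000 training images per dataset, and the batch size of 128 used later in this notebook. The helper names are illustrative only.

```python
import math

TOTAL_IMAGES = 60_000   # images in CIFAR-10 and in CIFAR-100
TRAIN_IMAGES = 50_000   # standard training split (assumed here)


def images_per_class(total_images, num_classes):
    """Images per class, assuming the classes are perfectly balanced."""
    return total_images // num_classes


def batches_per_epoch(num_images, batch_size):
    """Number of mini-batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


print(images_per_class(TOTAL_IMAGES, 10))    # CIFAR-10: 6000
print(images_per_class(TOTAL_IMAGES, 100))   # CIFAR-100: 600
print(batches_per_epoch(TRAIN_IMAGES, 128))  # 391
```

With ten times as many classes and the same number of images, CIFAR-100 offers far fewer examples per class, which is one reason it is the harder task.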

a) **Create** a deep neural network model using the PyTorch library to classify animals and vehicles, on a CPU and on a GPU, with the CIFAR-10 dataset.

b) **Conduct** a comparative analysis between models trained on CPU and GPU to highlight disparities in results.

c) Now, use the CIFAR-100 dataset for the classification of animals and vehicles. Would a GPU or a CPU be the better choice for the training process?

### ☆ Solution ☆

a) First, import all libraries:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import time

Now, we have to define the processing device that the neural network will run on:

In [None]:
device = torch.device("cuda:0")

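As part of the data preparation process, we create a ```transforms``` object to apply specific transformations to the data. These transformations are commonly used on training datasets to increase data diversity and prepare images for a deep learning model, such as a Convolutional Neural Network (CNN). ```RandomHorizontalFlip``` mirrors each image horizontally with probability 0.5, ```RandomCrop(32, padding=4)``` pads each side by 4 pixels and then crops a random 32x32 region, ```ToTensor``` converts the image to a tensor with values in [0, 1], and ```Normalize``` subtracts a per-channel mean of 0.5 and divides by a per-channel standard deviation of 0.5: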

In [None]:
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
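
To make the last step concrete: for each channel, `Normalize` computes `(value - mean) / std`. With mean and std both set to 0.5, pixel values in [0, 1] map to [-1, 1]. A plain-Python sketch of that arithmetic follows; the helper name is hypothetical and is not part of torchvision.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel formula used by Normalize: (value - mean) / std."""
    return (value - mean) / std


# Darkest, middle, and brightest pixel values after ToTensor.
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```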

Following that, download the CIFAR10 dataset and load it into the code. Define the neural network as we have done in previous notebooks, and remember to move this network instance to the previously defined device.

In [None]:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=4)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = x.view(-1, 128 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()
net.to(device)
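
The input size of `fc1`, `128 * 8 * 8`, follows from how the spatial dimensions shrink through the network. The sketch below traces a 32x32 input through the two convolutions and two pooling steps using the usual output-size formulas. The helper functions are illustrative and not part of PyTorch.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial size after a convolution: (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_output_size(size, kernel_size):
    """Spatial size after max pooling whose stride equals its kernel size."""
    return size // kernel_size


size = 32                                                # CIFAR images are 32x32
size = conv_output_size(size, kernel_size=3, padding=1)  # conv1 keeps 32
size = pool_output_size(size, 2)                         # first pooling: 16
size = conv_output_size(size, kernel_size=3, padding=1)  # conv2 keeps 16
size = pool_output_size(size, 2)                         # second pooling: 8

flattened = 128 * size * size  # 128 channels come out of conv2
print(size, flattened)         # 8 8192
```

If the kernel size, padding, or number of pooling steps changes, the first argument of `fc1` and the `view` call in `forward` need to change accordingly.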

After that, train your neural network, as we did in previous notebooks:

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

gpu_start_time = time.time()

for epoch in range(10):  
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}')

gpu_end_time = time.time()

print(f'GPU Training time: {gpu_end_time - gpu_start_time}')

torch.save(net.state_dict(), 'cifar10_gpu_model.pth')
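
One caveat when timing GPU code: CUDA kernels are launched asynchronously, so a wall-clock measurement is only reliable if the device has finished its work when the timer stops. In the loop above, `loss.item()` already forces a synchronization on every iteration, so the measurement is close to accurate as written. For code without such a synchronization point, a small helper like the hypothetical `timed` below could be used, passing `torch.cuda.synchronize` as `sync` when measuring GPU work.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    sync is an optional callable invoked before the timer starts and before
    it stops; on a CUDA device one would pass torch.cuda.synchronize.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return result, elapsed


# CPU-only usage example; no synchronization is needed here.
result, seconds = timed(lambda: sum(i * i for i in range(100_000)))
print(f"result={result}, elapsed={seconds:.4f}s")
```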

Repeat the same process, now with the CPU as the device:


In [None]:
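device = torch.device("cpu")

# Start from a fresh model and optimizer: this keeps the CPU run comparable to
# the GPU run and keeps the optimizer state on the same device as the parameters.
net = Net()
net.to(device)
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)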

cpu_start_time = time.time()

for epoch in range(10):  
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}')

cpu_end_time = time.time()

print(f'CPU Training time: {cpu_end_time - cpu_start_time}')

torch.save(net.state_dict(), 'cifar10_cpu_model.pth')
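
For part b), one simple way to compare the two runs is the ratio of their wall-clock times. The sketch below is a minimal example with a hypothetical helper; the numbers in the usage example are placeholders, not measurements, and should be replaced with the `cpu_end_time - cpu_start_time` and `gpu_end_time - gpu_start_time` values printed above.

```python
def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Placeholder timings for illustration only; substitute your own measurements.
example_cpu_seconds = 600.0
example_gpu_seconds = 60.0
print(f"Speedup: {speedup(example_cpu_seconds, example_gpu_seconds):.1f}x")
```

The final loss values printed by the two runs are also worth comparing: with the same architecture and hyperparameters they should be similar, so the main difference is expected to be training time.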

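c) A GPU is the better choice. CIFAR-100 has as many images as CIFAR-10, so each epoch costs about the same, and training on a CPU would again take much longer. The only change to the model is the output layer, which now has 100 units. Here is the training code modified to use CIFAR-100: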

In [None]:
device = torch.device("cuda:0")

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR100(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=4)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 100)  # 100 output classes for CIFAR-100

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = x.view(-1, 128 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

for epoch in range(10):  
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}')

print('Training Finished.')

torch.save(net.state_dict(), 'cifar100_model.pth')
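
When reading the loss values printed for CIFAR-100, it helps to know the baseline: a classifier that assigns equal probability to every class has a cross-entropy of ln(C) for C classes. The sketch below computes that reference value; a loss well below it suggests the network has learned something beyond uniform guessing. The function name is illustrative.

```python
import math


def uniform_guess_loss(num_classes):
    """Cross-entropy (natural log) of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)


print(f"CIFAR-10 baseline:  {uniform_guess_loss(10):.3f}")   # about 2.303
print(f"CIFAR-100 baseline: {uniform_guess_loss(100):.3f}")  # about 4.605
```

Because the baseline is higher for 100 classes, loss values from the CIFAR-10 and CIFAR-100 runs are not directly comparable.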

## Summary

<p style='text-align: justify;'>
We explored training neural networks with PyTorch, comparing CPU and GPU training on CIFAR-10 and then training on CIFAR-100 with the GPU. This gave us insight into how hardware affects training efficiency. Using the GPU significantly reduced training time thanks to its parallel architecture, which is especially beneficial for deep learning with large datasets and complex models.
</p>
<p style='text-align: justify;'>
This practice emphasized the importance of hardware choice in neural network training. Deep learning professionals should make informed decisions to optimize computing resources for efficiency.
</p>
<p style='text-align: justify;'>
In summary, this experiment compared PyTorch neural network training on CPU and GPU with CIFAR-10. Hardware choice significantly impacts training efficiency, highlighting the need for careful consideration in deep learning experiments.
</p>

## Clear the memory

Before moving on, please uncomment and execute the following cell to shut down the kernel and free the memory it holds. This is required to move on to the next notebook.

In [None]:
#import IPython
#app = IPython.Application.instance()
#app.kernel.do_shutdown(True)

## Next

Congratulations, you have completed the second part of the learning objectives of this course! As a final exercise, complete the applied problem in the assessment in [_04-hpc-simulations-assessment.ipynb_](04-hpc-simulations-assessment.ipynb).