# High-performance computing applied to AI applications

<p style='text-align: justify;'>
High-performance computing (HPC) is a field of modern computing that tackles highly complex computational problems and very large volumes of data. It works by dividing a complex problem into smaller parts that several processors work on simultaneously, which shortens the time to a solution. HPC enables scientists, engineers, and researchers to perform highly detailed simulations, massive data analyses, and precise modeling that would be impractical or impossible on conventional systems.
</p>

<p style='text-align: justify;'>
HPC systems are designed to handle large volumes of data and perform intensive calculations in a fraction of the time a conventional computer would need. An HPC system comprises one or more supercomputers built from interconnected high-performance processors, large amounts of memory, and fast storage to handle intensive workloads.
</p>

<div style="text-align:center">
<img src="./images/figure01_ponte_vecchio.jpg" style="width: 500px;">
</div>



## ⊗ **Why use parallel computing?**

<p style='text-align: justify;'>   
Parallel computing is widely used in areas such as scientific simulation, graphics rendering, big data analysis, machine learning, artificial intelligence, and image processing. There are several approaches to implementing it, including data parallelism, task parallelism, instruction-level parallelism, bit-level parallelism, and thread-level parallelism.
</p>
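To make the first two approaches concrete, here is a minimal sketch in plain Python. Data parallelism applies the same operation to different chunks of the data and then combines the partial results; task parallelism runs different, independent operations at the same time. The helper names (`split`, `sum_of_squares`) are hypothetical. The sketch uses the standard library's `ThreadPoolExecutor` only to show the structure: in CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so for a real speedup you would swap in `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics

def sum_of_squares(chunk):
    """The work applied to one slice of the data."""
    return sum(x * x for x in chunk)

def split(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, rest = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rest else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

data = list(range(1, 1001))

# Data parallelism: the same operation runs on different chunks of the data.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(sum_of_squares, split(data, 4)))
total = sum(partial_results)  # combine the partial results

# Task parallelism: different, independent operations run at the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {
        "min": pool.submit(min, data),
        "max": pool.submit(max, data),
        "mean": pool.submit(statistics.mean, data),
    }
    summary = {name: future.result() for name, future in futures.items()}

print(total)
print(summary)
```

The combine step (`sum(partial_results)`) is what makes data parallelism work here: the sum of squares over the whole list equals the sum of the per-chunk results, so the chunks can be processed independently.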

<div style="text-align:center">
<img src="./images/figure02_parallel_computing.png" style="width: 1000px;">
</div>

<p style='text-align: justify;'>   
One of the main benefits of parallel computing is a substantial performance gain: running multiple tasks simultaneously gets more work done in less time. This is particularly advantageous for complex problems that involve intensive calculations or the analysis of vast data sets.
Parallel computing is also essential for handling the ever-growing amount of data generated in our digital world. In machine learning and artificial intelligence, training complex models in parallel is critical for building effective AI systems in areas such as pattern recognition, natural language processing, and computer vision.
</p>
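How much faster can a program actually get? A standard way to reason about this, not covered elsewhere in this notebook, is Amdahl's law: if a fraction `p` of the work can be parallelized across `N` processors, the speedup is at most `S(N) = 1 / ((1 - p) + p / N)`. The short sketch below evaluates it; the function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Upper bound on speedup when only `parallel_fraction` of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# A program whose work is 95% parallelizable, run on more and more processors
for n in (1, 8, 64, 1024):
    print(f"{n:>5} processors: {amdahl_speedup(0.95, n):6.2f}x")
```

Even with 1,024 processors, a program that is 95% parallel cannot run more than 20 times faster (the limit is 1 / 0.05), because the 5% serial part comes to dominate. This is one reason HPC codes work hard to keep their serial sections small.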

<p style='text-align: justify;'> 
HPC and parallel computing can be used in many scenarios. Let's look at some of them below.
</p>

## ⊗ **HPC applied to AI**

<p style='text-align: justify;'> 
HPC plays a key role in applications that use artificial intelligence, since training increasingly complex AI models and analyzing massive data sets require a large amount of computing power.
</p>

<div style="text-align:center">
<img src="./images/figure03_aurora_supercomputing.jpg" style="width: 500px;">
</div>
    
<p style='text-align: justify;'> 
A notable example is Intel's Aurora supercomputer, which plays a key role in research areas as diverse as neuroscience, aerospace simulation, exploration of the universe, and artificial intelligence. Such research requires extremely high processing capacity, which makes HPC infrastructure essential, along with algorithms that can handle large volumes of data and support the implementation of artificial intelligence solutions. In neuroscience research, for example,
Aurora can help simulate complex neural networks and analyze brain data at scale, leading to advances in understanding neurological disorders and developing more effective treatments.
</p>

<p style='text-align: justify;'>
In aerospace simulation and exploration of the universe, Aurora enables the modeling of complex phenomena, such as the behavior of planetary systems and aircraft flight dynamics, contributing to space exploration and the development of more advanced technologies. In summary, the intersection of HPC, AI, and research in the life sciences and space drives scientific discoveries and technological advances that have the potential to transform our lives and our understanding of the world around us.
</p>  

## ⊗ **HPC use cases**

<p style='text-align: justify;'> 
HPC is often used in fields where processing requirements are extraordinarily high and exceed the capabilities of conventional computer systems. Here are some examples of HPC use cases:
</p>
    
<div style="text-align:center">
<img src="./images/figure04_hpc_applications.png" style="width: 500px;">
</div>

* **Machine learning and artificial intelligence:** training complex machine learning models requires a lot of computing power. HPC allows you to train models faster and handle larger datasets, resulting in advances in AI, pattern recognition and data analysis;

* **Biomedical research:** HPC accelerates the virtual screening of molecules, assessing how they interact with target proteins. This streamlines the drug discovery process, saving time and resources;

* **Aerodynamics and flight simulation:** the aerospace industry uses HPC to simulate aircraft behavior, improve wing design, optimize fuel efficiency, and study aerodynamics;

* **Exploration of natural resources and petroleum:** simulating oil and gas reservoirs and exploring mineral resources require complex models and intensive calculations. HPC helps inform decisions about where to locate these resources and how to exploit them;

* **Particle physics:** particle physics research requires HPC to analyze data generated by particle accelerators such as the LHC (Large Hadron Collider);

* **Scientific research and simulations:** HPC allows the modeling of natural phenomena and processes that would be almost impossible to observe experimentally, for example, particle interactions in an accelerator or long-term weather processes.

<p style='text-align: justify;'> 
These are just a few examples of the many use cases for HPC. In general, it plays a crucial role in areas that need advanced computational capacity to solve complex problems, often driving innovation and scientific and technological progress.
</p>

## **HPC solutions for AI**

<p style='text-align: justify;'> 
When an artificial intelligence application has to handle a large volume of information and we want to substantially reduce the time needed to solve the problem, we need software tools built specifically for this purpose. Let's explore two of the most prominent libraries, <b>TensorFlow</b> and <b>PyTorch</b>, which play a central role in creating and training AI models in processing-intensive environments.
</p>

### ⊗ **TensorFlow**

<p style='text-align: justify;'>
<a href='https://www.tensorflow.org/api_docs' target='_blank'><em>TensorFlow</em></a> is an open-source library for high-performance numerical computing that is especially well suited to training and deploying machine learning and deep learning models. It is used in applications ranging from computer vision to natural language processing.
</p>
<p style='text-align: justify;'>
In our examples, we use Keras, TensorFlow's high-level API for building and training neural networks. Its main purpose is to simplify and speed up the development of deep learning models.
</p>  
<p style='text-align: justify;'>
Let's see how to check for a GPU and use it, when available, to train a simple neural network with TensorFlow.
</p>

#### **Checking GPU availability**

In [1]:
import tensorflow as tf

# List the GPUs that TensorFlow can see
gpus = tf.config.list_physical_devices('GPU')

if gpus:
    # Allocate GPU memory on demand instead of reserving it all at startup.
    # This must be set before the GPUs are initialized.
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # Display information about the available GPUs
    for i, gpu in enumerate(gpus):
        print(f"GPU {i}: {gpu.name}")
else:
    print("No GPU available. Using CPU.")


No GPU available. Using CPU.


####  **Creating a neural network with TensorFlow**

In [15]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Training data: the four input pairs of the XOR function and their labels
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_train = np.array([0, 1, 1, 0], dtype=np.float32)

# Create the model: a single sigmoid neuron (no hidden layer)
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(units=1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train on the first GPU if TensorFlow found one, otherwise on the CPU
device = '/GPU:0' if tf.config.list_physical_devices('GPU') else '/CPU:0'
with tf.device(device):
    model.fit(X_train, y_train, epochs=20)

# Evaluate the model on the training data
accuracy = model.evaluate(X_train, y_train)[1]
print(f'Model accuracy: {accuracy}')


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Model accuracy: 0.75


### ⊗ **PyTorch**

<p style='text-align: justify;'> 
<a href='https://pytorch.org/docs/stable/index.html' target='_blank'><em>PyTorch</em></a> is an open-source machine learning library known for its flexibility and ease of use, which makes it a popular choice among deep learning researchers and developers. Let's repeat the TensorFlow example, this time with PyTorch.
</p>

####  **Checking GPU availability**

In [18]:
import torch

# Check if a GPU is available
if torch.cuda.is_available():
    # Get the number of available GPUs
    num_gpus = torch.cuda.device_count()

    # Display information about available GPUs
    for i in range(num_gpus):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No GPU available. Using CPU.")


No GPU available. Using CPU.


####  **Creating a neural network with PyTorch**

In [9]:
import torch
import torch.nn as nn
import torch.optim as optim

# Training data (the XOR function) as float tensors; labels shaped (4, 1) to match the model output
X_train = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y_train = torch.tensor([0, 1, 1, 0], dtype=torch.float32).view(-1, 1)

# Create the model: a single sigmoid neuron (no hidden layer)
model = nn.Sequential(
    nn.Linear(2, 1),
    nn.Sigmoid()
)

# Move the model and data to the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
X_train, y_train = X_train.to(device), y_train.to(device)

# Define the loss function (binary cross-entropy) and the optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Train the model with full-batch updates for 1000 epochs
for epoch in range(1000):
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(X_train)           # forward pass
    loss = criterion(outputs, y_train)
    loss.backward()                    # backpropagation
    optimizer.step()                   # update the weights

# Evaluate the model: threshold the predicted probabilities at 0.5
with torch.no_grad():
    predicted = (model(X_train) > 0.5).float()
    accuracy = (predicted == y_train).sum().item() / len(y_train)
    print(f'Model accuracy: {accuracy}')


Model accuracy: 0.5
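Why do both examples stop at 0.75 and 0.5 accuracy? The training data is the XOR function, and both models are a single sigmoid neuron with no hidden layer, which is a linear classifier: it splits the input plane with one straight line, and no line can put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other. The plain-Python sketch below illustrates this, replacing the sigmoid with the equivalent threshold (sigmoid(z) > 0.5 exactly when z > 0). It brute-forces a grid of weights for one neuron, then shows that hand-picked weights for a two-layer network (an OR unit and an AND unit feeding the output) compute XOR exactly. The grid and the weights are illustrative choices, not values from the models above.

```python
import itertools

# The four XOR training examples: (x1, x2) -> label
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(z):
    """Threshold at 0; equivalent to sigmoid(z) > 0.5."""
    return 1 if z > 0 else 0

def single_neuron_accuracy(w1, w2, b):
    """Accuracy on XOR of one neuron computing step(w1*x1 + w2*x2 + b)."""
    hits = sum(step(w1 * x1 + w2 * x2 + b) == y for (x1, x2), y in XOR.items())
    return hits / len(XOR)

# Brute-force search over a grid of weights and biases (-4.0 to 4.0 in steps of 0.5)
grid = [v / 2 for v in range(-8, 9)]
best = max(
    single_neuron_accuracy(w1, w2, b)
    for w1, w2, b in itertools.product(grid, repeat=3)
)
print(f"Best single-neuron accuracy on XOR: {best}")

def two_layer_xor(x1, x2):
    """Hand-set two-layer network: XOR = OR and not AND."""
    h_or = step(x1 + x2 - 0.5)    # hidden unit 1 fires for (0,1), (1,0), (1,1)
    h_and = step(x1 + x2 - 1.5)   # hidden unit 2 fires only for (1,1)
    return step(h_or - h_and - 0.5)

print("Two-layer network solves XOR:",
      all(two_layer_xor(x1, x2) == y for (x1, x2), y in XOR.items()))
```

The best single neuron gets 3 of the 4 points right, which matches the 0.75 ceiling seen in the TensorFlow run. Adding a hidden layer to the TensorFlow or PyTorch model (for example, a few ReLU units before the output neuron) gives it the capacity to represent XOR, although whether a particular training run reaches 100% accuracy still depends on initialization and the number of epochs.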


##  ☆ Challenge: Zoo breakout!☆ 

<p style='text-align: justify;'> 
    Recently, there was an unexpected incident at the local zoo, <b>Orange Grove Zoo</b>: all the animals escaped from their enclosures and are now roaming freely in the zoo. To deal with this situation, we need your help to locate and classify the escaped animals, distinguishing each animal class and identifying possible vehicles that may be in the same environment.
</p>
<p style='text-align: justify;'> 
You have been put in charge of developing a computer vision system that can identify and classify the escaped animals and detect the presence of vehicles in the images. For this challenge, we will use the CIFAR-10 dataset and the PyTorch library to train a deep learning model.
</p>
The CIFAR-10 dataset provides a comprehensive collection of 32x32-pixel color images, grouped into 10 distinct classes.

- [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html): CIFAR-10 consists of 60,000 images (50,000 for training and 10,000 for testing), each belonging to one of ten classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. This dataset offers a diverse set of images of everyday objects.

a) **Create** a deep neural network model with the PyTorch library to classify animals and vehicles on a CPU using the CIFAR-10 dataset;

b) **Measure** the algorithm's execution time on CIFAR-10 with the CPU;

c) **Justify** why it is advantageous to use tools such as TensorFlow and PyTorch with a GPU instead of a CPU.

### ☆ Solution ☆

#### ⊗ Importing Packages

In [10]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import time

#### ⊗ Verify the devices

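Before running anything on a device, it is important to check that the device is available and that PyTorch can use it. Since part (a) asks for CPU training, here we select the CPU explicitly.

In [3]:
# The challenge asks for CPU training, so we select the CPU explicitly.
# On a GPU machine, torch.cuda.is_available() tells us whether CUDA can be used.
device = torch.device("cpu")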

#### ⊗ Transformations to the data

<p style='text-align: justify;'> 
    As part of data preparation, we create a <b>transforms</b> object that applies specific transformations to the images. Random horizontal flips and random crops are common augmentations for training sets that increase data diversity, while ToTensor and Normalize convert the images into normalized tensors ready for a deep learning model such as a Convolutional Neural Network (CNN).
    </p>

In [4]:
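transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),        # flip half of the images left-right (augmentation)
    transforms.RandomCrop(32, padding=4),     # pad by 4 px, then take a random 32x32 crop (augmentation)
    transforms.ToTensor(),                    # PIL image -> float tensor in [0, 1], shape (3, 32, 32)
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # per channel: (x - 0.5) / 0.5 -> [-1, 1]
])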
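The two tuples passed to `transforms.Normalize` are a per-channel mean and standard deviation, and each channel value is mapped to `(x - mean) / std`. Since `ToTensor()` has already scaled 8-bit pixels from [0, 255] to [0, 1], a mean and std of 0.5 move every value into [-1, 1]. The plain-Python sketch below reproduces that arithmetic for one RGB pixel; the function name `normalize_pixel` is hypothetical and not part of torchvision.

```python
def normalize_pixel(rgb, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Apply (value - mean) / std channel by channel, as transforms.Normalize does."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

# ToTensor() first maps 8-bit values in [0, 255] to floats in [0, 1]
raw_pixel = (0, 128, 255)
scaled = tuple(v / 255 for v in raw_pixel)

normalized = normalize_pixel(scaled)
print([round(v, 4) for v in normalized])  # prints [-1.0, 0.0039, 1.0]
```

Centering inputs around zero in this way tends to make gradient-based training better behaved than feeding raw [0, 255] values.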

#### ⊗ Downloading the Dataset

<p style='text-align: justify;'> 
Next, we download the CIFAR-10 training set and load it with a DataLoader that yields shuffled mini-batches of 128 images, using 4 worker processes to prepare the data in parallel.
</p>

In [None]:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=4)
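Each epoch iterates over the DataLoader once, so `len(trainloader)` is the number of mini-batches per epoch, and it is the value the training loop divides `running_loss` by. A quick calculation, assuming the standard 50,000-image CIFAR-10 training split and the DataLoader's default `drop_last=False` (the constant names are illustrative):

```python
import math

TRAIN_IMAGES = 50_000   # CIFAR-10 training split (train=True)
BATCH_SIZE = 128        # batch_size passed to the DataLoader
EPOCHS = 10             # epochs used in the training cell

batches_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)   # last batch is smaller (drop_last=False)
last_batch_size = TRAIN_IMAGES - (batches_per_epoch - 1) * BATCH_SIZE
total_steps = batches_per_epoch * EPOCHS                   # one optimizer.step() per batch

print(batches_per_epoch, last_batch_size, total_steps)  # prints 391 80 3910
```

Ten epochs therefore amount to 3,910 optimizer steps, so dividing the total CPU training time by this number gives the average cost of a single step.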

#### ⊗ Creating the Model

Next, we define a small convolutional neural network with PyTorch and move it to the selected device.

In [5]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)    # 3x32x32 -> 64x32x32
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)  # 64x16x16 -> 128x16x16
        self.fc1 = nn.Linear(128 * 8 * 8, 512)                     # flattened feature maps -> 512
        self.fc2 = nn.Linear(512, 10)                              # one score (logit) per class

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)   # 32x32 -> 16x16
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)   # 16x16 -> 8x8
        x = torch.flatten(x, 1)      # (batch, 128 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)              # raw logits; CrossEntropyLoss applies softmax internally
        return x

net = Net()
net.to(device)

Files already downloaded and verified


Net(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=8192, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=10, bias=True)
)
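Where does `in_features=8192` for `fc1` come from? A convolution with kernel size k, padding p and stride s maps a side of length n to `floor((n + 2p - k) / s) + 1`, and `max_pool2d(x, 2)` halves it. The sketch below traces a 32x32 CIFAR-10 image through the network; the helper names are hypothetical.

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Output side length of a 2-D convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def max_pool_out(size, kernel=2):
    """Output side length of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1

side = 32                                      # CIFAR-10 images are 32x32
side = conv2d_out(side, kernel=3, padding=1)   # conv1: 32 -> 32
side = max_pool_out(side)                      # pool:  32 -> 16
side = conv2d_out(side, kernel=3, padding=1)   # conv2: 16 -> 16
side = max_pool_out(side)                      # pool:  16 -> 8
flat_features = 128 * side * side              # conv2 has 128 output channels
print(side, flat_features)  # prints 8 8192
```

With `padding=1` and a 3x3 kernel the convolutions preserve the spatial size, so only the two pooling layers shrink the feature maps, which is why the flattened size is `128 * 8 * 8`.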

#### ⊗ Training the network

Next, we train the network for 10 epochs with SGD (learning rate 0.01, momentum 0.9) and measure the total training time on the CPU.

In [11]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Make sure the model is on the CPU before timing the training
device = torch.device("cpu")
net.to(device)

cpu_start_time = time.perf_counter()  # monotonic, high-resolution timer

for epoch in range(10):
    running_loss = 0.0
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()              # reset gradients
        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagation
        optimizer.step()                   # SGD update
        running_loss += loss.item()

    # Average loss over the mini-batches of this epoch
    print(f'Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}')

cpu_end_time = time.perf_counter()

cpu_time = cpu_end_time - cpu_start_time

print(f"\nCPU Training time: {cpu_time:.2f} seconds or ({cpu_time / 60:.2f} minutes)")

# Save the trained weights for later comparison
torch.save(net.state_dict(), 'cifar10_cpu_model.pth')

Epoch 1, Loss: 0.4932993039908007
Epoch 2, Loss: 0.47943620257975195
Epoch 3, Loss: 0.47039098767063503
Epoch 4, Loss: 0.4571740362802735
Epoch 5, Loss: 0.4399987073505626
Epoch 6, Loss: 0.42618661684453335
Epoch 7, Loss: 0.4213465188470338
Epoch 8, Loss: 0.41050467756398196
Epoch 9, Loss: 0.40083302690854766
Epoch 10, Loss: 0.3957774586918409

CPU Training time: 811.17 seconds or (13.52 minutes)
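A useful sanity check when reading a training log: a freshly initialized 10-class classifier outputs roughly uniform probabilities, so its cross-entropy starts near ln(10) ≈ 2.30. The sketch below computes that value in plain Python (the function name is illustrative). The first-epoch average loss of about 0.49 above is far below this, which suggests the training cell was run on a network that had already been trained in an earlier execution (note the cell's execution counter). The measured time for 10 epochs is still meaningful, but the loss values should not be read as training from scratch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target]), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Equal logits for all 10 classes = uniform prediction, as from an untrained model
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(f"{uniform_loss:.4f}")  # prints 2.3026
```
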


#### ⊗ Results

<p style='text-align: justify;'> 
You may have noticed that training for only 10 epochs took around <b>13.52 minutes</b>, which is a long time for so few epochs. Now imagine increasing it to 100 epochs, which at this pace would take more than two hours, or training on a much larger dataset! It would become impractical to run this kind of task on conventional computing resources, such as a laptop CPU. Training a CNN consists mostly of large matrix and convolution operations that can be computed in parallel, and GPUs, with thousands of cores designed for this kind of arithmetic, execute them much faster than a CPU with far fewer cores. This is why tools like TensorFlow and PyTorch are paired with GPUs and supercomputers for this kind of workload.
</p>

## Summary

<p style='text-align: justify;'>
In this notebook we have covered:

- Definitions of HPC applied to AI applications,
- Parallel computing concepts,
- Some HPC use cases, and solutions for AI using TensorFlow and PyTorch,
- An example of training a neural network on the CPU using PyTorch and CIFAR-10.
</p>    

## Clear the memory

Before moving on, please execute the following cell to restart the kernel and free the memory used by this notebook. This is required to move on to the next notebook.

In [11]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)  # shut down and restart the kernel, releasing its memory

## Next

In this section you learned what HPC means in the context of AI applications and why the processing speed of a GPU matters for the performance of artificial intelligence algorithms. In the next notebook, [_02-hpc-simulations-tensorflow.ipynb_](02-hpc-simulations-tensorflow.ipynb), we will continue studying HPC solutions for AI.