## AI Training/playground

In this tutorial, we briefly explain how to create and train a model using PyTorch. For inference, we'll show how to use Intel AI software products to improve performance, apply quantization, or even run on an Intel GPU.

First, we import a few standard-library modules and the PyTorch libraries:


In [None]:
import time, statistics
import torchvision
import torch
from torchvision.transforms import ToTensor
import torch.nn as nn
import torch.nn.functional as F

To reduce output size, we'll disable extra warnings:

In [None]:
import warnings
warnings.simplefilter(action='ignore')

# Training small NN model
We'll start with two simple models: a fully connected neural network and a basic convolutional model (LeNet).\
Both models try to solve an image classification problem: recognizing handwritten digits in images (the MNIST dataset).\
The first model contains three Linear layers with two activations (ReLU in our example):\
 A Linear layer in PyTorch is a fundamental building block of neural networks that applies a linear transformation to the input data; it is also known as a fully connected layer.\
 An activation function, such as ReLU (Rectified Linear Unit), introduces non-linearity into the model, allowing it to learn more complex patterns.\
 ReLU sets all negative values to zero and leaves positive values unchanged, which helps mitigate the vanishing gradient problem and improves training efficiency.

The class below defines the model's layers in `__init__` and its `forward` method, which is used later for both training and inference:

In [None]:
class NeuralNetwork300(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 300),
            nn.ReLU(),
            nn.Linear(300, 300),
            nn.ReLU(),
            nn.Linear(300, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
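
As a quick sanity check of the description above, the following sketch verifies on a small random tensor that `nn.Linear` computes `x @ W.T + b` and that `nn.ReLU` zeroes negative entries. The layer sizes (4 inputs, 3 outputs) and the input values are arbitrary choices made for illustration, not taken from the model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random weights reproducible

# Illustrative sizes only: 4 input features, 3 output features.
linear = nn.Linear(4, 3)
x = torch.randn(2, 4)  # a batch of 2 samples

# nn.Linear stores its weight with shape (out_features, in_features).
manual = x @ linear.weight.T + linear.bias
linear_matches_manual = torch.allclose(linear(x), manual, atol=1e-6)

# ReLU keeps positive values and replaces negative ones with zero.
relu_out = nn.ReLU()(torch.tensor([-2.0, -0.5, 0.0, 1.5]))

print(linear_matches_manual)
print(relu_out)
```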

In addition to Linear layers, the convolutional model uses Convolution layers, which apply convolution operations to the input data. \
They use a set of learnable filters (or kernels) that slide over the input to produce feature maps, capturing local patterns such as edges, textures, and shapes. This makes them particularly effective for image and signal processing tasks.
![SegmentLocal](convolution.gif "segment")
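
To make the sliding-filter idea concrete, here is a minimal pure-Python sketch of a single-channel convolution without padding (strictly speaking a cross-correlation, which is what deep learning frameworks usually implement). The function name `conv2d_valid`, the 4x4 input, and the edge-detecting kernel are illustrative assumptions; `nn.Conv2d` additionally handles batches, channels, bias, stride, and padding.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


# Toy image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Kernel that responds to a dark-to-bright transition from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the column where the kernel straddles the edge, which is the sense in which a filter "detects" a local pattern.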


In [None]:
class LeNet_1(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.AvgPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        
        self.flatten = nn.Flatten()
        
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
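
The `16 * 4 * 4` input size of `fc1` follows from the layer shapes. Here is a small sketch, assuming 28x28 MNIST inputs and the usual output-size formula for convolution and pooling layers (the helper `out_size` is a hypothetical name introduced here):

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                  # MNIST images are 28x28
size = out_size(size, kernel=5)            # conv1: 5x5 kernel -> 24
size = out_size(size, kernel=2, stride=2)  # pool: 2x2, stride 2 -> 12
size = out_size(size, kernel=5)            # conv2: 5x5 kernel -> 8
size = out_size(size, kernel=2, stride=2)  # pool: 2x2, stride 2 -> 4

flat_features = 16 * size * size           # 16 channels after conv2
print(size, flat_features)
```

If the input resolution or kernel sizes change, the first argument of `fc1` has to be recomputed the same way.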

Now we'll define the train and evaluate functions, which take the model, the loss function, and the data loaders as arguments (plus, for training, the optimizer and the number of epochs):


In [None]:
def evaluate(model, data_loader, loss_fn):
    """Return (accuracy, average loss per batch) of `model` on `data_loader`."""
    model.eval()
    total_vloss = 0.0
    correct = 0
    with torch.no_grad():
        for vinputs, vlabels in data_loader:
            voutputs = model(vinputs)
            # .item() converts the 0-dim loss tensor to a plain Python float
            total_vloss += loss_fn(voutputs, vlabels).item()

            # The predicted class is the index of the largest logit
            _, indices = torch.max(voutputs, dim=1)
            correct += torch.sum(indices == vlabels).item()

    avg_vloss = total_vloss / len(data_loader)
    acc = correct / len(data_loader.dataset)
    return acc, avg_vloss


def train(model, optimizer, loss_fn, training_loader, validation_loader, epochs):
    train_time = 0.0

    for epoch in range(epochs):
        model.train(True)
        running_loss = 0.
        last_loss = 0.
        start = time.time()
        for i, data in enumerate(training_loader):
            inputs, labels = data
        
            optimizer.zero_grad()
        
            outputs = model(inputs)
        
            loss = loss_fn(outputs, labels)
            loss.backward()
        
            optimizer.step()
            end = time.time()
            train_time += end - start
            
            running_loss += loss.item()
            if i % 1000 == 999:
                last_loss = running_loss / 1000
                print('  batch {} loss: {}'.format(i + 1, last_loss))
                running_loss = 0.
            start = time.time()
        
        vpass, vloss = evaluate(model, validation_loader, loss_fn)
        print(f'Epoch {epoch} train {last_loss} valid {vloss}')
    print(f"Training time: {train_time}")
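
The accuracy bookkeeping used in `evaluate` can be checked on a toy batch. The logits and labels below are made-up values for illustration only:

```python
import torch

# Made-up logits for 3 samples and 4 classes (illustrative values).
logits = torch.tensor([
    [0.1, 2.0, 0.3, 0.0],   # largest logit at index 1
    [1.5, 0.2, 0.1, 0.4],   # largest logit at index 0
    [0.0, 0.1, 0.2, 3.0],   # largest logit at index 3
])
labels = torch.tensor([1, 2, 3])

# torch.max along dim=1 returns (values, indices); the indices are the predictions.
_, predictions = torch.max(logits, dim=1)
correct = torch.sum(predictions == labels).item()
accuracy = correct / len(labels)
print(predictions.tolist(), accuracy)
```

Two of the three predictions match their labels, so the accuracy is 2/3.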


The next step is data loading. We'll use the MNIST dataset, which is provided by the torchvision package:


In [None]:
training_set = torchvision.datasets.MNIST(root='data', train=True, download=True, transform=ToTensor())
validation_set = torchvision.datasets.MNIST(root='data', train=False, download=True, transform=ToTensor())

training_loader = torch.utils.data.DataLoader(training_set, batch_size=16, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=16, shuffle=False)

loss_fn = torch.nn.CrossEntropyLoss()
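
With `batch_size=16`, the number of batches per epoch determines how often the `train` loop prints its running loss (once every 1000 batches). Here is a quick calculation, assuming the standard MNIST split of 60,000 training and 10,000 test images:

```python
import math

batch_size = 16
train_samples = 60_000   # standard MNIST training split
valid_samples = 10_000   # standard MNIST test split

train_batches = math.ceil(train_samples / batch_size)
valid_batches = math.ceil(valid_samples / batch_size)

# The training loop logs whenever i % 1000 == 999.
logs_per_epoch = train_batches // 1000

print(train_batches, valid_batches, logs_per_epoch)
```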

Now we can train the basic fully connected model:

In [None]:

model = NeuralNetwork300()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

num_epochs = 5
train(model, optimizer, loss_fn, training_loader, validation_loader, num_epochs)

start = time.time()
acc, loss = evaluate(model, validation_loader, loss_fn)
end = time.time()
print(f"Inference time: {end - start}, Validation Passrate: {acc}")

And for the convolutional one:

In [None]:
model = LeNet_1()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
train(model, optimizer, loss_fn, training_loader, validation_loader, num_epochs)

start = time.time()
acc, loss = evaluate(model, validation_loader, loss_fn)
end = time.time()
print(f"Inference time: {end - start}, Validation Passrate: {acc}")

The second (convolutional) model is slower to run, but it typically solves the image classification problem better.

Now let's define a function for inference benchmarking. We'll use it later to compare different software optimizations and solutions.

In [None]:
def inference_benchmark(model, data_loader, num_iter, warm_up, ov=False):
    if not ov:
        model.eval()
        # Use the device of the first parameter so this works for any nn.Module
        current_device = next(model.parameters()).device
    with torch.no_grad():
        av_time = 0.0
        for it in range(num_iter):
            acc = 0.0
            ev_time = 0.0
            t0 = time.time()
            for i, vdata in enumerate(data_loader):
                vinputs, vlabels = vdata
                if not ov:
                    vinputs = vinputs.to(current_device)
                    vlabels = vlabels.to(current_device)
                voutputs = model(vinputs)
                ev_time += time.time() - t0
                if ov:
                    voutputs = torch.Tensor(voutputs[0])
                _, indices = torch.max(voutputs, dim=1)
                acc += torch.sum(indices == vlabels)
                t0 = time.time()
            if it >= warm_up:
                acc = acc / data_loader.dataset.data.shape[0]
                av_time += ev_time
                print(f"{it - warm_up} iteration: {ev_time} pass_rate: {acc}")
        av_time = av_time / (num_iter - warm_up)
        print(f"Average inference time: {av_time}")
        return av_time
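
The warm-up logic can be isolated into a small generic helper. This is a sketch and not part of the tutorial code: `time_with_warmup` and `dummy_workload` are hypothetical names, and `time.perf_counter` is used here because it is a monotonic, high-resolution clock.

```python
import statistics
import time


def time_with_warmup(fn, num_iter, warm_up):
    """Call `fn` num_iter times and return the timings of the non-warm-up runs."""
    if warm_up >= num_iter:
        raise ValueError("warm_up must be smaller than num_iter")
    timings = []
    for it in range(num_iter):
        t0 = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - t0
        # The first `warm_up` runs absorb one-time costs (caches, lazy init).
        if it >= warm_up:
            timings.append(elapsed)
    return timings


def dummy_workload():
    # Stand-in for a full pass over the validation loader.
    sum(i * i for i in range(10_000))


timings = time_with_warmup(dummy_workload, num_iter=9, warm_up=1)
print(f"Average time over {len(timings)} runs: {statistics.fmean(timings):.6f}s")
```

Discarding the first iterations matters most for backends that compile or cache on first use, because those runs would otherwise inflate the average.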


In our benchmarking, we'll compare the PyTorch model with [Intel OpenVINO](https://docs.openvino.ai/2024/home.html) optimizations. To do this, we need to convert the model to the OpenVINO format.\
Additionally, OpenVINO supports [nncf](https://docs.openvino.ai/2022.3/nncf_ptq_introduction.html) weight compression, which optimizes the model by reducing the size of its weights and can take advantage of hardware features such as AVX512 and AMX, depending on device capabilities.

In [None]:
def create_ov_model(model, compressed=False, device="CPU"):
    from nncf import compress_weights
    import openvino as ov
    ov_model = ov.convert_model(model)
    core = ov.Core()
    if compressed:
        compressed_model = compress_weights(ov_model)
    compiled_model = core.compile_model(compressed_model if compressed else ov_model, device_name=device)
    return compiled_model
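
To give an intuition for what weight compression means, here is a simplified sketch of symmetric 8-bit quantization of a list of weights. It only illustrates the general idea of storing weights as small integers plus a scale factor; it is not NNCF's actual algorithm, and the function names and weight values are assumptions made for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumes at least one weight is non-zero.
    """
    scale = max(abs(w) for w in weights) / 127
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.52, -1.27, 0.003, 0.9, -0.34]  # illustrative values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now fits into one byte instead of four, at the cost of a small, bounded rounding error.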


Now we can prepare multiple versions of our trained model. The first one uses the [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) library. The second one is the OpenVINO CPU version.
If you have Intel hardware with AMX/AVX512 or a GPU (Intel or NVIDIA), uncomment the corresponding lines in the code to add those versions to your experiment.

In [None]:
import intel_extension_for_pytorch as ipex
ipex_model = ipex.optimize(model, inplace=False)

ov_model = create_ov_model(model)

models = {"default": model, "IPEX": ipex_model, "OV_CPU": ov_model}

# If you have configured an Intel GPU (as described in https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html),
# you can uncomment these lines:
# gpu_ov_model = create_ov_model(model, device="GPU")
# models["OV_GPU"] = gpu_ov_model

# If your device supports int8 instructions, you can try weight compression:
# compressed_ov_model = create_ov_model(model, compressed=True)
# models["OV_Compressed"] = compressed_ov_model

# If you have configured an NVIDIA GPU,
# you can uncomment these lines to add it to the test:
# import copy
# gpu_cuda_model = copy.deepcopy(model).to("cuda")
# models["CUDA"] = gpu_cuda_model

model_stats = {}
for m in models.keys():
    print(f"Testing model:{m}")
    model_stats[m] = inference_benchmark(models[m], validation_loader, 9, 1, m.startswith("OV"))

print(model_stats)
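
Once `model_stats` is filled, the raw timings are easier to compare as speedups relative to the default model. Here is a minimal sketch; the helper name `speedups` is hypothetical, and the numbers below are placeholders chosen for easy arithmetic, not measured results:

```python
def speedups(stats, baseline="default"):
    """Return baseline_time / time for every entry in `stats`."""
    base = stats[baseline]
    return {name: base / t for name, t in stats.items()}


# Placeholder timings in seconds, NOT real measurements.
example_stats = {"default": 2.0, "IPEX": 1.6, "OV_CPU": 1.0}

relative = speedups(example_stats)
for name, factor in relative.items():
    print(f"{name}: {factor:.2f}x")
```

A factor above 1 means the variant is faster than the baseline. Actual values depend on your hardware and library versions.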


# Language model example

Let's move on to more complex models: language models. In this case, in addition to PyTorch, we'll use the ```transformers``` library.
This library provides an API for loading pretrained language models.
Before applying such a model, we need to tokenize the input text with the model's tokenizer.
In our example, we'll use a BERT model to predict a masked token (```[MASK]```).

In [None]:
from transformers import AutoTokenizer, AutoModelForMaskedLM


tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-uncased")

masked_text = "I should change my [MASK]!"
inputs = tokenizer(masked_text, return_tensors="pt")

mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
logits = model(**inputs).logits
mask_token_logits = logits[0, mask_token_index, :]
top_3_tokens = torch.topk(mask_token_logits, 3, dim=1).indices[0].tolist()
for token in top_3_tokens:
    print(masked_text.replace(tokenizer.mask_token, tokenizer.decode([token])))
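
The selection of the three most likely tokens relies on `torch.topk`. On a toy row of logits (made-up values, with a vocabulary of only five entries) it behaves as follows:

```python
import torch

# One row of made-up logits over a 5-entry toy vocabulary.
toy_logits = torch.tensor([[0.1, 2.0, -1.0, 3.5, 0.7]])

top = torch.topk(toy_logits, 3, dim=1)
top_ids = top.indices[0].tolist()    # vocabulary indices, best first
top_scores = top.values[0].tolist()  # the matching logits

print(top_ids, top_scores)
```

In the BERT example, each such index is a token id that `tokenizer.decode` turns back into text.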


Now we'll move on to the most popular type of language model right now: Large Language Models for text generation. We'll start with the GPT-2 model, which can be downloaded without requesting access.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
input_text = "What should I change?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=20)
print(tokenizer.decode(outputs[0]))
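
Conceptually, text generation extends the prompt one token at a time until a length limit or an end-of-sequence token is reached. The toy sketch below illustrates greedy decoding, one of the strategies `generate` supports. The names `greedy_generate` and `toy_next_token` are hypothetical, and the "model" is a stand-in function rather than a real network. As in `transformers`, `max_length` here counts the prompt tokens as well.

```python
def greedy_generate(next_token, prompt_ids, max_length, eos_id):
    """Append the predicted next token until max_length or EOS is reached."""
    ids = list(prompt_ids)
    while len(ids) < max_length:
        token = next_token(ids)
        ids.append(token)
        if token == eos_id:
            break
    return ids


def toy_next_token(ids):
    # Hypothetical stand-in for a language model: predicts last id + 1.
    return ids[-1] + 1


generated = greedy_generate(toy_next_token, prompt_ids=[1, 2, 3], max_length=6, eos_id=99)
print(generated)
```

With a three-token prompt and `max_length=6`, only three new tokens are produced.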

As with the convolutional model, let's create a small benchmark to compare the performance of generative models:

In [None]:
def inference_generative(model, tokenizer, input_text, output_size, num_iter, warm_up):
    t = list()
    for i in range(num_iter):
        start = time.time()
        inputs = tokenizer(input_text, return_tensors="pt")
        outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=output_size)
        end = time.time()
        if i >= warm_up:
            t.append(end - start)
            print(f"{i - warm_up}: {end - start}s : {tokenizer.decode(outputs[0])}")
    print(f"Average time: {statistics.fmean(t)}")
    return statistics.fmean(t)


We'll compare the default approach with Intel software optimizations on Facebook's small OPT model (```facebook/opt-125m```).
Additionally, we'll use the Hugging Face Optimum Intel API (```optimum.intel```) to load the model in OpenVINO format.

In [None]:
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

import intel_extension_for_pytorch as ipex
ipex_model = ipex.llm.optimize(model.eval(), dtype=torch.float32, inplace=False, deployment_mode=True)

from optimum.intel.openvino import OVModelForCausalLM
ov_model = OVModelForCausalLM.from_pretrained("facebook/opt-125m", export=True)

Now we can compare them using the benchmark. As mentioned previously, if you have an Intel or NVIDIA GPU, check the comments below.

In [None]:
models = {"default": model, "IPEX": ipex_model, "OV_CPU": ov_model}
model_stats = {}
# If you have configured an Intel GPU (as described in https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html),
# you can uncomment these lines:
# gpu_ov_model = OVModelForCausalLM.from_pretrained("facebook/opt-125m", export=True, device="GPU")
# models["OV_GPU"] = gpu_ov_model

# If you have configured an NVIDIA GPU,
# you can uncomment these lines to add it to the test:
# gpu_cuda_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", device_map = 'cuda')
# models["CUDA"] = gpu_cuda_model
for m in models.keys():
    print(f"Testing model:{m}")
    model_stats[m] = inference_generative(models[m], tokenizer, "How are you?", 20, 5, 1)

print(model_stats)

Additional materials: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
