## Getting Started with PyTorch on Intel GPUs

### Introduction

Welcome to this Jupyter Notebook tutorial on getting started with PyTorch on Intel GPUs! This notebook focuses on PyTorch's support for Intel GPUs, which PyTorch exposes as the `xpu` device. The environment is already set up and PyTorch is installed, so you can run the cells as they are and start experimenting.

> **Note**: Ensure that you have selected `pytorch_2.5` as the Jupyter kernel before proceeding.

Throughout this notebook, we'll cover the basics of tensor operations, demonstrate how to check the device being used, and walk through a few example workloads to showcase the power of PyTorch on Intel GPUs. Let's get started!

**Step 1**: Checking PyTorch Version and Device

In [1]:
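import os
# Enable the oneAPI Level Zero tracing layer; set before importing torch so the runtime picks it up
os.environ["ZE_ENABLE_TRACING_LAYER"] = "1"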

import torch

print(f"PyTorch Version: {torch.__version__}")

device = torch.device('xpu' if torch.xpu.is_available() else 'cpu')
print(f"Using device: {device}")

PyTorch Version: 2.5.0.dev20240911+xpu
Using device: xpu


**Step 2**: Basic Tensor Operations

In [2]:
# device selection
device = torch.device('xpu' if torch.xpu.is_available() else 'cpu')
print(f"Using device: {device}")

# Create a tensor on the selected device (XPU if available, otherwise CPU)
tensor = torch.ones(3, 4, device=device)
print(f"Tensor: {tensor}")
print(f"Tensor device: {tensor.device}")

# Matrix multiplication
mat1 = torch.randn(3, 4, device=device)
mat2 = torch.randn(4, 5, device=device)
result = torch.matmul(mat1, mat2)
print(f"Matrix multiplication result shape: {result.shape}")

Using device: xpu
Tensor: tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], device='xpu:0')
Tensor device: xpu:0
Matrix multiplication result shape: torch.Size([3, 5])
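
The result shape follows the usual matrix-product rule: an `(n, k)` matrix times a `(k, m)` matrix gives an `(n, m)` matrix, and the two inner dimensions must match. A minimal sketch of that rule in plain Python; the helper `matmul_shape` is hypothetical and not part of PyTorch:

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of a 2-D matrix product, or raise if the shapes are incompatible."""
    n, k = a_shape
    k2, m = b_shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    return (n, m)


result_shape = matmul_shape((3, 4), (4, 5))
print(result_shape)  # (3, 5), matching torch.Size([3, 5]) above
```

Swapping the operands, `matmul_shape((4, 5), (3, 4))`, raises an error because 5 and 3 do not match, which mirrors the shape mismatch error `torch.matmul` would report.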


**Step 3**: Example Workload - Image Classification with FP32 precision

In [3]:
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import requests
from torchvision.models.resnet import ResNet18_Weights

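import ssl
# Workaround for certificate errors during downloads: this disables HTTPS certificate
# verification, so use it only in a trusted environment.
ssl._create_default_https_context = ssl._create_unverified_context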

# device selection
device = torch.device('xpu' if torch.xpu.is_available() else 'cpu')
print(f"Using device: {device}")

# Get model
weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model = model.to(device)
imagenet_classes = weights.meta["categories"]

# Prepare the input image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Cat_November_2010-1a.jpg/1200px-Cat_November_2010-1a.jpg"
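response = requests.get(image_url, stream=True)
response.raise_for_status()  # fail early if the download did not succeed
input_image = Image.open(response.raw).convert("RGB")  # Normalize below expects 3 channels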
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0).to(device)

# infer
model = model.eval()
with torch.no_grad():
    output = model(input_batch)

_, predicted = torch.max(output, 1)
class_index = predicted.item()
class_label = imagenet_classes[class_index]

print(f"Predicted class: {class_index}")
print(f"Class label: {class_label}")

Using device: xpu
Predicted class: 285
Class label: Egyptian cat
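
The `transforms.Normalize` step in the cell above applies `(x - mean) / std` to each channel independently, after `ToTensor` has scaled the pixel values to the range [0, 1]. A minimal sketch of that arithmetic in plain Python, using the same ImageNet statistics; `normalize_pixel` is a hypothetical helper written only for illustration:

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means used in the cell above
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations used in the cell above


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel whose values lie in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


white = normalize_pixel([1.0, 1.0, 1.0])  # brightest possible pixel
mean_pixel = normalize_pixel(MEAN)        # a pixel equal to the dataset mean

print(white)
print(mean_pixel)  # [0.0, 0.0, 0.0]
```

A pixel equal to the dataset mean maps to zero, so the normalized inputs are centered around the values the pretrained weights were trained on.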


We used an image of a cat. If the `Class label` you get is not a cat, treat it as a reminder that even strong pretrained models misclassify some inputs and can benefit from a little fine-tuning. With a bit of training, we can help ResNet18 regain its cat-detecting superpowers! 😺
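
The classification cell reports only the highest-scoring class. Applying softmax to the logits gives one probability per class, which shows how confident the prediction is. The sketch below does this in plain Python with made-up logits for three hypothetical classes; on the real model output the equivalent call would be `torch.softmax(output, dim=1)`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores, not taken from ResNet18
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # same role as torch.max(output, 1)

print(f"Predicted index: {best}, probability: {probs[best]:.3f}")
```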

**Step 4**: Example Workload - Sentiment Analysis inference

The next example builds a small LSTM model, compiles it with `torch.compile`, and runs inference on the compiled model.

In [4]:
import torch.nn as nn

# device selection
device = torch.device('xpu' if torch.xpu.is_available() else 'cpu')
print(f"Using device: {device}")

# Define a simple sentiment analysis model. A real workflow would load trained weights;
# this untrained model only illustrates the inference path.
class SentimentModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.embedding(x)
        _, (hidden, _) = self.lstm(x)
        out = self.fc(hidden.squeeze(0))
        return out


vocab_size = 10000
embed_dim = 100
hidden_dim = 512
output_dim = 2

model = SentimentModel(vocab_size, embed_dim, hidden_dim, output_dim)
model = model.to(device)
print(f"\nModel before compilation: \n{model}\n")
model = torch.compile(model)  # compile model
print("-"*72)
print(f"\nModel after compilation: \n{model}")

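# Random token ids stand in for a tokenized sentence of length 20
input_text = torch.randint(0, vocab_size, (1, 20), device=device)
with torch.no_grad():  # inference only, so gradients are not needed
    output = model(input_text)
sentiment = torch.argmax(output, dim=1)  # index of the higher-scoring class
print(f"Sentiment score: {sentiment.item()}")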

Using device: xpu

Model before compilation: 
SentimentModel(
  (embedding): Embedding(10000, 100)
  (lstm): LSTM(100, 512, batch_first=True)
  (fc): Linear(in_features=512, out_features=2, bias=True)
)

------------------------------------------------------------------------

Model after compilation: 
OptimizedModule(
  (_orig_mod): SentimentModel(
    (embedding): Embedding(10000, 100)
    (lstm): LSTM(100, 512, batch_first=True)
    (fc): Linear(in_features=512, out_features=2, bias=True)
  )
)
Sentiment score: 1
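
The random `input_text` tensor stands in for a tokenized sentence: the embedding layer expects integer token ids below `vocab_size`, here as a sequence of length 20. A minimal sketch of how text could be mapped to such ids, assuming a whitespace tokenizer and a tiny hypothetical vocabulary (`vocab`, `encode`, `pad_id`, and `unk_id` are illustrative names, not part of the notebook):

```python
# Hypothetical vocabulary; a real one would be built from the training corpus
vocab = {"<pad>": 0, "<unk>": 1, "this": 2, "movie": 3, "was": 4, "great": 5}


def encode(text, vocab, max_len=20, pad_id=0, unk_id=1):
    """Map whitespace-separated words to ids, then truncate or pad to max_len."""
    ids = [vocab.get(token, unk_id) for token in text.lower().split()]
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


token_ids = encode("This movie was really great", vocab)
print(token_ids[:6])  # [2, 3, 4, 1, 5, 0]
```

Wrapping the list as `torch.tensor([token_ids], device=device)` would give the `(1, 20)` shape that the model receives in the cell above.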


**Step 5**: Transfer Learning - Vision (ResNet18) using Automatic Mixed Precision (`torch.float16` or `torch.bfloat16`)

In [5]:
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torchvision.models.resnet import ResNet18_Weights  # needed below for the pretrained weights

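import ssl
# Workaround for certificate errors during downloads: this disables HTTPS certificate
# verification, so use it only in a trusted environment.
ssl._create_default_https_context = ssl._create_unverified_context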

# device selection
device = 'xpu' if torch.xpu.is_available() else 'cpu'
print(f"Using device: {device}")

# Use CIFAR10 dataset 
train_dataset = datasets.CIFAR10(root='~/data', train=True, download=True, transform=transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
]))
test_dataset = datasets.CIFAR10(root='~/data', train=False, download=True, transform=transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
]))

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=128, shuffle=False)
train_len = len(train_loader)

# Load the pre-trained ResNet18 model and replace its final layer for the 10 CIFAR10 classes.
# The model is moved to `device` further below.
weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)

# optimizer and loss
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
num_epochs = 1

# set model to train and move model and criterion to `device`
model = model.train()
model = model.to(device)
criterion = criterion.to(device)

# Training
for epoch in range(num_epochs):
    running_loss = 0.0
    print(f"Initiating training: Epoch {epoch}")
    for i, data in enumerate(train_loader):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        with torch.autocast(device_type=device, dtype=torch.bfloat16, enabled=True):  # using torch.bfloat16
            outputs = model(inputs)
            loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if (i + 1) % 10 == 0:
            iteration_loss = loss.item()
            print(f"Iteration [{i+1}/{train_len}], Loss: {iteration_loss:.4f}")
    epoch_loss = running_loss / (i + 1)
    print(f"Epoch {epoch + 1} completed, Epoch Loss: {epoch_loss:.4f}")

# Evaluate
model = model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # `.data` is unnecessary inside torch.no_grad()
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f"Accuracy on test images: {100 * correct / total:.2f}%")

Using device: xpu
Files already downloaded and verified
Files already downloaded and verified
Initiating training: Epoch 0
Iteration [10/391], Loss: 1.2851
Iteration [20/391], Loss: 0.8821
Iteration [30/391], Loss: 0.6356
Iteration [40/391], Loss: 0.6047
Iteration [50/391], Loss: 0.5314
Iteration [60/391], Loss: 0.4241
Iteration [70/391], Loss: 0.4907
Iteration [80/391], Loss: 0.3260
Iteration [90/391], Loss: 0.2822
Iteration [100/391], Loss: 0.2285
Iteration [110/391], Loss: 0.3030
Iteration [120/391], Loss: 0.3085
Iteration [130/391], Loss: 0.2870
Iteration [140/391], Loss: 0.3984
Iteration [150/391], Loss: 0.2316
Iteration [160/391], Loss: 0.2748
Iteration [170/391], Loss: 0.2755
Iteration [180/391], Loss: 0.2155
Iteration [190/391], Loss: 0.3599
Iteration [200/391], Loss: 0.1832
Iteration [210/391], Loss: 0.2960
Iteration [220/391], Loss: 0.1651
Iteration [230/391], Loss: 0.2840
Iteration [240/391], Loss: 0.1997
Iteration [250/391], Loss: 0.1497
Iteration [260/391], Loss: 0.2326
It
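
The `391` in the log is the number of batches per epoch. The CIFAR10 training split has 50,000 images, and with `batch_size=128` the `DataLoader` yields `ceil(50000 / 128)` batches because `drop_last` defaults to `False`. The arithmetic in plain Python:

```python
import math

num_train_images = 50_000  # size of the CIFAR10 training split
batch_size = 128           # value passed to the DataLoader above

batches_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch, last_batch_size)  # 391 80
```

The final batch therefore holds only 80 images, which is why the per-iteration loss on the last step is averaged over fewer samples than the others.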

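Congratulations! You've now explored PyTorch with Intel GPU support using this Jupyter Notebook. We checked the PyTorch version and device, covered basic tensor operations, and walked through three example workloads: image classification, sentiment analysis inference with `torch.compile`, and transfer learning with automatic mixed precision.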

Feel free to experiment further, modify the code snippets, and explore more advanced topics.

For more information and examples on PyTorch on Intel GPUs, see the [PyTorch getting-started guide for Intel GPUs](https://pytorch.org/docs/main/notes/get_start_xpu.html).

Happy learning and coding with PyTorch on Intel GPUs!