# Profiling PyTorch on CPU and GPU with PAPI

This tutorial shows how to use `CyPAPI` to profile a computation that PyTorch executes on the CPU and on the GPU.

The computation to profile is simple:
- Create two 1000x1000 matrices populated with random numbers.
- Multiply the two matrices and keep the resulting matrix.

In [None]:
from cypapi import *
pyPAPI_library_init()

In [None]:
import torch

## Running on the CPU

In [None]:
eventset = PyPAPI_EventSet()

In [None]:
eventset.cleanup()
eventset.add_named_event('perf::INSTRUCTIONS')
eventset.add_named_event('perf::CPU-CYCLES')

In [None]:
# Start counting events
eventset.start()

# Set random seed for reproducibility
torch.manual_seed(42)

# Generate random matrices
matrix_A = torch.rand(1000, 1000)
matrix_B = torch.rand(1000, 1000)

# Perform matrix multiplication
result = torch.mm(matrix_A, matrix_B)

# Stop counting and print the measured values
values = eventset.stop()
print(values)

# Print the matrices and the result
print("Matrix A:")
print(matrix_A)
print("\nMatrix B:")
print(matrix_B)
print("\nMatrix multiplication result:")
print(result)
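
The two CPU counters are often combined into a single figure, instructions per cycle (IPC). The sketch below is a minimal illustration. It assumes the counter values come back as a two-element sequence in the order the events were added (instructions first, then cycles). The helper name `instructions_per_cycle` and the sample numbers are hypothetical and are not measured output.

```python
def instructions_per_cycle(counts):
    """Return instructions / cycles from a two-element sequence of counter values.

    Assumes counts[0] is the instruction count and counts[1] is the cycle count.
    """
    instructions, cycles = counts
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return instructions / cycles


# Hypothetical counter values, for illustration only (not measured output)
example_counts = [3_000_000_000, 1_500_000_000]
print(f"IPC: {instructions_per_cycle(example_counts):.2f}")
```

Note that the measured region also covers seeding and random number generation, so the ratio describes the whole region rather than the matrix multiplication alone.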


## Running on the GPU

In [None]:
evtsetgpu = PyPAPI_EventSet()

In [None]:
evtsetgpu.cleanup()
evtsetgpu.add_named_event('cuda:::dram__bytes_read.sum:device=0')
evtsetgpu.add_named_event('cuda:::sm__warps_launched.sum:device=0')

In [None]:
# Set random seed for reproducibility
torch.manual_seed(42)

# Use the GPU if one is available; the cuda::: events above only count GPU activity
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Start counting events
evtsetgpu.start()

# Generate random matrices on the selected device
matrix_A = torch.rand(1000, 1000, device=device)
matrix_B = torch.rand(1000, 1000, device=device)

# Perform matrix multiplication
result = torch.mm(matrix_A, matrix_B)

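# Transfer the result back to the CPU; this blocking copy also waits
# for the GPU kernels to finish before the counters are stopped
result_cpu = result.to("cpu")

# Stop counting and print the measured values
valuesgpu = evtsetgpu.stop()
print(valuesgpu)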

# Print the matrices and the result
print("Matrix A:")
print(matrix_A)
print("\nMatrix B:")
print(matrix_B)
print("\nMatrix multiplication result:")
print(result_cpu)
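
To put a `dram__bytes_read` count in context, it can help to compare it with the size of the input data. The sketch below is a rough sanity check, not part of CyPAPI. It assumes the matrices use PyTorch's default `float32` dtype (4 bytes per element). The helper names `tensor_bytes` and `read_ratio` and the sample measurement are hypothetical.

```python
def tensor_bytes(rows, cols, itemsize=4):
    """Size in bytes of a dense rows x cols matrix (itemsize=4 assumes float32)."""
    return rows * cols * itemsize


def read_ratio(measured_bytes_read, expected_bytes):
    """Ratio of measured DRAM bytes read to the expected input size."""
    return measured_bytes_read / expected_bytes


# The multiplication reads two 1000x1000 input matrices
input_bytes = 2 * tensor_bytes(1000, 1000)
print(f"Input data: {input_bytes} bytes")

# Hypothetical measurement, for illustration only (not measured output)
example_bytes_read = 12_000_000
print(f"Read ratio: {read_ratio(example_bytes_read, input_bytes):.2f}")
```

A ratio far from 1 is not necessarily a problem: caching can keep reads below the input size, while tiling and other kernels in the measured region can push them above it.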
