<a href="https://colab.research.google.com/github/samaid/pyhpc-tutorial/blob/main/notebooks/9_1_nvmath-python_interop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 9.1. `nvmath-python`: Interoperability with CPU and GPU tensor libraries
This exercise shows how `nvmath-python` plugs into existing projects that rely on popular CPU or GPU array libraries, such as NumPy, CuPy, and PyTorch, and how it can be used alongside those libraries in a new project.

### Pure CuPy implementation

This example demonstrates basic matrix multiplication of CuPy 2D arrays using `matmul`:

In [1]:
import cupy as cp

# Prepare sample input data for matrix multiplication
n, m, k = 2000, 4000, 5000
a = cp.random.rand(n, k)
b = cp.random.rand(k, m)

# Perform matrix multiplication
result = cp.matmul(a, b)

# Print the result
print(result)

# Print CUDA device for each array
print(a.device)
print(b.device)
print(result.device)


### Using `nvmath-python` alongside CuPy

This is a slight modification of the above example, where the matrix multiplication is done using the corresponding `nvmath-python` implementation.

Note that `nvmath-python` supports multiple frameworks, including CuPy. It uses the framework's memory pool and current stream for seamless integration. The result of each operation is a tensor from the same framework as the inputs, located on the same device as the inputs.
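The framework and device behavior described above can also be checked with libraries other than CuPy. The following is a minimal sketch, assuming NumPy, PyTorch, and `nvmath-python` are installed and a CUDA device is usable; `describe` is a hypothetical helper introduced only for this illustration, and the matrix sizes are arbitrary. If the GPU stack is unavailable, the sketch skips the GPU part instead of failing.

```python
def describe(x):
    """Return (top-level package name, device) for an array-like object.

    Hypothetical helper for illustration; objects without a `device`
    attribute are reported as living on the CPU.
    """
    package = type(x).__module__.split(".")[0]
    device = str(getattr(x, "device", "cpu"))
    return package, device


try:
    import numpy as np
    import torch
    import nvmath

    have_gpu_stack = torch.cuda.is_available()
except Exception:
    # Missing packages or an unusable driver: skip the GPU part below
    have_gpu_stack = False

if have_gpu_stack:
    # NumPy operands live in host memory
    a_np = np.random.rand(200, 500)
    b_np = np.random.rand(500, 400)
    print(describe(nvmath.linalg.advanced.matmul(a_np, b_np)))

    # PyTorch operands allocated on the current CUDA device
    a_t = torch.rand(200, 500, device="cuda")
    b_t = torch.rand(500, 400, device="cuda")
    print(describe(nvmath.linalg.advanced.matmul(a_t, b_t)))
else:
    print("NumPy/PyTorch/nvmath-python with CUDA not available; skipping")
```

In each case the printed package and device should match those of the inputs that were passed in.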

In [None]:
# The same matrix multiplication as in the pure CuPy example, but using nvmath-python
# (reuses the CuPy arrays a and b created above)
import nvmath

# Perform matrix multiplication
result = nvmath.linalg.advanced.matmul(a, b)

# Print the result
print(result)

# Print CUDA device for each array
print(a.device)
print(b.device)
print(result.device)


As we can see, the code looks essentially the same. The measured performance of the two implementations will also be nearly identical.

This is because CuPy and `nvmath-python` (as well as PyTorch) all use CUDA-X Math Libraries as their engine. It is up to the user to choose which library to use for the above matrix multiplication problem.

The next examples demonstrate a few cases where `nvmath-python` may become essential for reaching peak performance.
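One way to check the performance claim is to time both calls directly. The sketch below is an assumption-laden illustration, not a rigorous benchmark: `median_time` is a hypothetical helper, the repeat counts are arbitrary, and the current stream is synchronized before the clock stops because GPU work may be launched asynchronously. The GPU part is skipped if CuPy, `nvmath-python`, or a usable CUDA device is missing.

```python
import time


def median_time(fn, *args, sync=lambda: None, warmup=1, repeats=5):
    """Return the median wall-clock time in seconds of fn(*args).

    `sync` is called before the clock stops so that asynchronous
    GPU work is included in the measurement.
    """
    for _ in range(warmup):
        fn(*args)  # warm-up runs are not timed
    sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        sync()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


try:
    import cupy as cp
    import nvmath

    gpu_ok = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    # Missing packages or an unusable driver: skip the GPU timing
    gpu_ok = False

if gpu_ok:
    n, m, k = 2000, 4000, 5000
    a = cp.random.rand(n, k)
    b = cp.random.rand(k, m)
    sync = cp.cuda.get_current_stream().synchronize
    print("cupy.matmul  :", median_time(cp.matmul, a, b, sync=sync))
    print("nvmath matmul:", median_time(nvmath.linalg.advanced.matmul, a, b, sync=sync))
else:
    print("No usable CUDA device; skipping GPU timing")
```

The median is used rather than the mean so that a single slow run, for example one that includes one-time library initialization, does not dominate the result.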