In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch
torch.__version__

'2.0.1+cu117'

## PyTorch Tensors

PyTorch tensors are multi-dimensional arrays, much like NumPy `ndarray`s, and you can convert back and forth between the two.

In [2]:
A = np.random.rand(2,2)
A

array([[0.16208704, 0.98401256],
       [0.87045688, 0.3129406 ]])

In [3]:
B = torch.Tensor(A)
B

tensor([[0.1621, 0.9840],
        [0.8705, 0.3129]])
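
`torch.Tensor(A)` casts the float64 array to PyTorch's default dtype (float32), which is why the output shows fewer digits. The sketch below contrasts it with `torch.from_numpy`, which shares memory with the array and keeps its dtype, and with `Tensor.numpy()` for going back. The variable names are only for illustration.

```python
import numpy as np
import torch

A = np.random.rand(2, 2)     # float64 NumPy array

B32 = torch.Tensor(A)        # cast to the default dtype (float32); a copy here
B64 = torch.from_numpy(A)    # no copy: shares memory with A and stays float64
print(B32.dtype, B64.dtype)

# Because B64 shares memory with A, in-place changes to A show up in B64 too
A[0, 0] = 42.0
print(B64[0, 0].item())      # 42.0

# Going back to NumPy (the tensor must live on the CPU)
A_back = B32.numpy()
print(type(A_back), A_back.dtype)
```

Use `torch.from_numpy` when you want to avoid a copy, and keep in mind that writes through either object are then visible in both.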

To put a tensor on the GPU, use `.cuda()`. Note that this requires an NVIDIA GPU with CUDA support (and a CUDA-enabled build of PyTorch); `torch.cuda.is_available()` tells you whether one is usable.

In [4]:
torch.cuda.is_available()

True

In [5]:
Bcuda = B.cuda()    # copy to the GPU
print(Bcuda)
Bcpu = Bcuda.cpu()  # copy back to the CPU
print(Bcpu)

tensor([[0.1621, 0.9840],
        [0.8705, 0.3129]], device='cuda:0')
tensor([[0.1621, 0.9840],
        [0.8705, 0.3129]])


In [6]:
B.to(device='cuda')

tensor([[0.1621, 0.9840],
        [0.8705, 0.3129]], device='cuda:0')

To move a tensor back to the CPU, use `.to(device='cpu')` or, equivalently, `.cpu()`.
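
A common pattern is to choose the device once and fall back to the CPU when no GPU is present, so the same code runs on either machine. A minimal sketch (the names `B_dev` and `B_cpu` are illustrative):

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

B = torch.rand(2, 2)
B_dev = B.to(device)     # returns B itself if it is already on `device`
B_cpu = B_dev.to('cpu')  # equivalent to B_dev.cpu()
print(B_dev.device, B_cpu.device)

# Operations generally require both operands to be on the same device
C = B_dev + B_cpu.to(device)
```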

## Linear Algebra Functions

PyTorch provides access to a variety of BLAS- and LAPACK-type routines (see the [documentation here](https://pytorch.org/docs/stable/torch.html#blas-and-lapack-operations)). Note that these functions do not follow the BLAS/LAPACK naming conventions.

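[torch.addmv](https://pytorch.org/docs/stable/generated/torch.addmv.html#torch-addmv) is roughly equivalent to the BLAS routine `gemv` (not `axpy`, which adds scaled vectors): it computes $\beta y + \alpha A x$, which with the defaults $\alpha = \beta = 1$ is $Ax + y$.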

In [7]:
m = 100
n = 100
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # fall back to CPU without a GPU

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)

A = torch.Tensor(Anp).to(device=device)
x = torch.Tensor(xnp).to(device=device)
y = torch.Tensor(ynp).to(device=device)

z = torch.addmv(y, A, x)
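
To check what `torch.addmv` computes, the sketch below compares it against the explicit formula, both with the default scalars and with `beta` and `alpha` set. It uses small float64 CPU tensors so the comparison is not affected by float32 rounding; the sizes are arbitrary.

```python
import torch

torch.manual_seed(0)
m, n = 4, 3
A = torch.randn(m, n, dtype=torch.float64)
x = torch.randn(n, dtype=torch.float64)
y = torch.randn(m, dtype=torch.float64)

# Defaults (beta=1, alpha=1): z = y + A x
z = torch.addmv(y, A, x)

# General form, as in BLAS gemv: z2 = beta * y + alpha * (A x)
z2 = torch.addmv(y, A, x, beta=2.0, alpha=0.5)

print(torch.allclose(z, y + A @ x))                 # True
print(torch.allclose(z2, 2.0 * y + 0.5 * (A @ x)))  # True
```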

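Let's look at the timing difference between NumPy, PyTorch on the CPU, and PyTorch on the GPU. Keep in mind that `torch.Tensor` converts the float64 NumPy arrays to float32, so the NumPy and PyTorch timings are not at matched precision.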

In [8]:
m = 1000
n = 1000

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)
print("numpy")
%time z = ynp + Anp @ xnp


for device in ('cpu', 'cuda'):
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.Tensor(ynp).to(device=device)
    z = torch.addmv(y, A, x) # run once to warm up

    %time z = torch.addmv(y, A, x)
    
    

numpy
CPU times: user 8.23 ms, sys: 9.47 ms, total: 17.7 ms
Wall time: 2.96 ms

device = cpu
CPU times: user 23.3 ms, sys: 19.8 ms, total: 43.2 ms
Wall time: 6.13 ms

device = cuda
CPU times: user 303 µs, sys: 107 µs, total: 410 µs
Wall time: 88.2 µs
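
CUDA kernels are launched asynchronously, so `%time` around a single GPU call may largely measure the launch overhead rather than the computation, and the GPU timings above should be read with that in mind. A minimal sketch of a timing helper that calls `torch.cuda.synchronize()` before reading the clock; `time_op` is a hypothetical name, not a PyTorch function. PyTorch's `torch.utils.benchmark.Timer` is an alternative that handles warm-up and CUDA synchronization for you.

```python
import time
import torch

def time_op(fn, device, repeats=10):
    """Hypothetical helper: average wall time (seconds) of fn() on `device`."""
    fn()  # warm-up run
    if device.type == 'cuda':
        torch.cuda.synchronize()  # make sure the warm-up has finished
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for all queued kernels to finish
    return (time.perf_counter() - start) / repeats

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
m = n = 1000
A = torch.randn(m, n, device=device)
x = torch.randn(n, device=device)
y = torch.randn(m, device=device)

t = time_op(lambda: torch.addmv(y, A, x), device)
print(f"addmv on {device}: {t * 1e6:.1f} µs per call")
```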


[torch.mv](https://pytorch.org/docs/stable/generated/torch.mv.html#torch.mv) computes the matrix-vector product $Ax$.
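
`torch.mv` takes a 2-D matrix and a 1-D vector and returns a 1-D result. The sketch below, using small illustrative sizes, checks that it agrees with the `@` operator and with NumPy; `torch.from_numpy` keeps the data in float64 so the comparison is tight.

```python
import numpy as np
import torch

m, n = 5, 3
Anp = np.random.randn(m, n)
xnp = np.random.randn(n)

A = torch.from_numpy(Anp)  # float64, shares memory with Anp
x = torch.from_numpy(xnp)

y = torch.mv(A, x)         # (5, 3) times (3,) gives shape (5,)
print(y.shape)             # torch.Size([5])

# Same result as the @ operator and as NumPy
print(torch.allclose(y, A @ x), np.allclose(y.numpy(), Anp @ xnp))
```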

In [9]:
Anp = np.random.randn(m,n)
xnp = np.random.randn(n)

print("numpy")
%time z = Anp @ xnp


for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.mv(A, x) # run once to warm up

    %time y = torch.mv(A, x)

numpy
CPU times: user 4.91 ms, sys: 1.77 ms, total: 6.68 ms
Wall time: 1.1 ms

device = cpu
CPU times: user 22.4 ms, sys: 3.39 ms, total: 25.7 ms
Wall time: 4.53 ms

device = cuda
CPU times: user 521 µs, sys: 199 µs, total: 720 µs
Wall time: 110 µs


[torch.mm](https://pytorch.org/docs/stable/generated/torch.mm.html#torch.mm) computes the matrix-matrix product $AB$.
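
`torch.mm` only accepts two 2-D inputs and does not broadcast; `torch.matmul` (the `@` operator) is the more general function. A short sketch of the difference, with arbitrary shapes:

```python
import torch

A = torch.randn(3, 4)
B = torch.randn(4, 2)
C = torch.mm(A, B)          # (3, 4) @ (4, 2) -> (3, 2)
print(C.shape)              # torch.Size([3, 2])

# torch.mm rejects a 3-D (batched) input ...
batch = torch.randn(5, 3, 4)
mm_failed = False
try:
    torch.mm(batch, B)
except RuntimeError:
    mm_failed = True
print("torch.mm on 3-D input failed:", mm_failed)

# ... whereas torch.matmul broadcasts B across the batch dimension
D = torch.matmul(batch, B)  # (5, 3, 4) @ (4, 2) -> (5, 3, 2)
print(D.shape)              # torch.Size([5, 3, 2])
```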

In [10]:
m = 1000
n = 1000

Anp = np.random.randn(m,n)
Bnp = np.random.randn(n, n)

print("numpy")
%time C = Anp @ Bnp

for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    B = torch.Tensor(Bnp).to(device=device)
    C = torch.mm(A, B) # run once to warm up

    %time C = torch.mm(A, B)

numpy
CPU times: user 125 ms, sys: 31.4 ms, total: 156 ms
Wall time: 23.1 ms

device = cpu
CPU times: user 24.2 ms, sys: 0 ns, total: 24.2 ms
Wall time: 6.54 ms

device = cuda
CPU times: user 221 µs, sys: 0 ns, total: 221 µs
Wall time: 224 µs


## Batch operations

PyTorch (and GPUs in general) really shine in batch operations: we get extra efficiency by performing many multiplications of same-sized matrices in a single call rather than one at a time.

For matrix-matrix multiplication, the function is [torch.bmm](https://pytorch.org/docs/stable/generated/torch.bmm.html#torch.bmm).

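Because tensors are row-major by default, we want the batch index to be the first index, so that each matrix in the batch occupies one contiguous block of memory; `torch.bmm` also expects its inputs to have shapes `(k, n, m)` and `(k, m, p)`, with the batch dimension first. In the code below, the batch multiplication is equivalent to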
```
for i in range(k):
    C[i] = A[i] @ B[i]
```    
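
The equivalence with the loop can be checked directly on small inputs. This sketch uses float64 tensors and a small batch (sizes chosen only for a quick check) and also prints the strides, showing that with the batch index first each `A[i]` is a contiguous `n x n` block.

```python
import torch

torch.manual_seed(0)
k, n = 4, 8  # small batch and matrix size for a quick check
A = torch.randn(k, n, n, dtype=torch.float64)
B = torch.randn(k, n, n, dtype=torch.float64)

C_batch = torch.bmm(A, B)  # one call for all k products

# Reference: the explicit loop from the text
C_loop = torch.empty(k, n, n, dtype=torch.float64)
for i in range(k):
    C_loop[i] = A[i] @ B[i]

print(torch.allclose(C_batch, C_loop))  # True

# Row-major layout with the batch index first: strides are (n*n, n, 1)
print(A.stride())            # (64, 8, 1)
print(A[0].is_contiguous())  # True
```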

In [11]:
n = 512 # matrix size
k = 32 # batch size

Anp = np.random.randn(k, n, n)
Bnp = np.random.randn(k, n, n)
# see numpy matmul documentation for how this performs batch multiplication
print("numpy")
%time C = np.matmul(Anp, Bnp)

for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.randn(k, n, n).to(device=device)
    B = torch.randn(k, n, n).to(device=device)
    C = torch.bmm(A, B) # run once to warm up

    %time C = torch.bmm(A, B)

numpy
CPU times: user 897 ms, sys: 791 ms, total: 1.69 s
Wall time: 250 ms

device = cpu
CPU times: user 194 ms, sys: 11.1 ms, total: 205 ms
Wall time: 56.3 ms

device = cuda
CPU times: user 263 µs, sys: 121 µs, total: 384 µs
Wall time: 393 µs


## Sparse Linear Algebra

PyTorch also supports sparse tensors in [torch.sparse](https://pytorch.org/docs/stable/sparse.html). The example below stores the tensor in [COOrdinate format](https://caam37830.github.io/book/02_linear_algebra/sparse.html#coordinate-format); PyTorch also supports other sparse layouts, such as CSR.

In [12]:
i = torch.tensor([[0, 1, 1],   # row indices
                  [2, 0, 2]])  # column indices
v = torch.tensor([3., 4., 5.])
torch.sparse_coo_tensor(i, v, (2, 3)).to_dense()

tensor([[0., 0., 3.],
        [4., 0., 5.]])

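Indices are stored in a `2 x nnz` tensor of type `Long` (64-bit integers), where `nnz` is the number of nonzero entries: the first row holds the row indices and the second row the column indices. The values are stored in a separate length-`nnz` tensor, here of floats.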
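
Once built, a sparse COO tensor can be inspected and multiplied with dense tensors. The sketch below rebuilds the same `2 x 3` matrix with `torch.sparse_coo_tensor`, calls `.coalesce()` (required before `.indices()` and `.values()` can be read; it sorts the entries and merges duplicates), and multiplies it by a dense column vector with `torch.sparse.mm`. The vector `x` is an arbitrary example.

```python
import torch

# Same 2 x 3 matrix as above
i = torch.tensor([[0, 1, 1],   # row indices
                  [2, 0, 2]])  # column indices
v = torch.tensor([3.0, 4.0, 5.0])
S = torch.sparse_coo_tensor(i, v, (2, 3)).coalesce()

print(S.indices().shape, S.indices().dtype)  # torch.Size([2, 3]) torch.int64
print(S.values())                             # tensor([3., 4., 5.])

# Sparse-dense product: (2 x 3 sparse) @ (3 x 1 dense) -> (2 x 1 dense)
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.sparse.mm(S, x)
print(y.squeeze(1))  # expected values: 3*3 = 9 and 4*1 + 5*3 = 19
```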