# PyTorch

[PyTorch](https://pytorch.org/) is one of the most popular modern open-source deep learning libraries, and it is what we will use in this tutorial session. Other popular libraries include [TensorFlow](https://www.tensorflow.org/), [JAX](https://docs.jax.dev/en/latest/quickstart.html), and [MindSpore](https://www.mindspore.cn/en/).

![market-share](https://softwaremill.com/user/pages/blog/229.ml-engineer-comparison-of-pytorch-tensorflow-jax-and-flax/image2.png?g-2f9bce85)

*Trends of paper implementations grouped by framework: comparison of PyTorch vs. TensorFlow ([viso.ai](https://viso.ai/deep-learning/pytorch-vs-tensorflow/))*

All of those libraries work in a similar fashion. Common things you have to learn include:

1. Array data types (_tensor_)
2. Data loading tools (streamline preparing data from input files into the appropriate types)
3. Implementation of a machine learning (ML) model

In this notebook, we cover the basics of the first item.

<a href="datatype"></a>
## 1. Tensor data types in PyTorch
In `PyTorch`, we use `torch.Tensor` objects to represent data arrays. They are a lot like `numpy.ndarray` objects, but not quite the same. `torch` provides an application programming interface (API) to easily convert data between `numpy.ndarray` and `torch.Tensor`. Let's play with it a little bit.

In [1]:
import numpy as np
import torch

# Set the seed to always get the same result from RNG calls in this notebook
SEED = 123
np.random.seed(SEED)    # Setting the seed for reproducibility
torch.manual_seed(SEED) # The torch equivalent: seeds torch's own RNG

<torch._C.Generator at 0x719fd9d712b0>

... yep, that's how we set the PyTorch random number seed!
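
To make the effect of seeding concrete, here is a minimal sketch (not part of the original notebook) showing that re-seeding replays the same random draws. The variable names are illustrative, and the separate `torch.Generator` object is an optional extra for keeping a random stream independent of the global seed.

```python
import torch

# Re-seeding the global generator replays the same random sequence
torch.manual_seed(123)
first = torch.randn(3)

torch.manual_seed(123)
second = torch.randn(3)

print(torch.equal(first, second))  # True: identical draws after re-seeding

# A dedicated generator keeps its own state, separate from the global seed
gen = torch.Generator().manual_seed(123)
third = torch.randn(3, generator=gen)
print(third)
```
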

### Creating a torch.Tensor

`PyTorch` provides constructors similar to those of numpy (named the same way wherever possible, so that users do not have to look up function names). Here are some examples.

In [2]:
# Tensor of 0s = numpy.zeros
t = torch.zeros(2,3)
print('torch.zeros:\n', t)

# Tensor of 1s = numpy.ones
t = torch.ones(2, 3)
print('\ntorch.ones:\n',t)

# Tensor from sequential integers = numpy.arange
t = torch.arange(0, 6, 1).reshape(2,3).float()
print('\ntorch.arange:\n',t)

# Normal distribution centered at 0.0 with sigma=1.0 = numpy.random.randn
t = torch.randn(2, 3)
print('\ntorch.randn:\n',t)

torch.zeros:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

torch.ones:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

torch.arange:
 tensor([[0., 1., 2.],
        [3., 4., 5.]])

torch.randn:
 tensor([[-0.1115,  0.1204, -0.3696],
        [-0.2404, -1.1969,  0.2093]])


... or you can create a tensor from a simple list, a tuple, or a numpy array.

In [3]:
# Create a 10x10 numpy array
data_np = np.zeros([10,10],dtype=np.float32)

# Fill it with something
np.fill_diagonal(data_np, 1.)
print('Numpy data\n',data_np)

# Create torch.Tensor from the numpy array
data_torch = torch.Tensor(data_np)
print('\ntorch.Tensor data\n',data_torch)

# One can also create a tensor from a list
data_list = [1, 2, 3]
data_list_torch = torch.Tensor(data_list)
print('\nPython list :', data_list)
print('torch.Tensor:', data_list_torch)

Numpy data
 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

torch.Tensor data
 tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

Python list : [1, 2, 3]
torch.Tensor: tensor([1., 2., 3.])
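
Note that the integer list above came back as floats. The sketch below (an addition, not from the original notebook) contrasts the `torch.Tensor` constructor with the `torch.tensor` factory function, which infers the dtype from its input. It assumes the default floating-point type has not been changed from `float32`.

```python
import numpy as np
import torch

ints = [1, 2, 3]

legacy = torch.Tensor(ints)    # constructor: uses the default float type
inferred = torch.tensor(ints)  # factory function: dtype inferred from the data
print(legacy.dtype)    # torch.float32
print(inferred.dtype)  # torch.int64

# torch.tensor also preserves the dtype of a numpy array
arr64 = np.zeros(3)               # numpy defaults to float64
print(torch.tensor(arr64).dtype)  # torch.float64

# An explicit dtype removes any ambiguity
explicit = torch.tensor(ints, dtype=torch.float32)
print(explicit)  # tensor([1., 2., 3.])
```
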


Converting back from a `torch.Tensor` to a numpy array is just as easy:

In [4]:
# Bringing back into numpy array
data_np = data_torch.numpy()
print('\nNumpy data (converted back from torch.Tensor)\n',data_np)


Numpy data (converted back from torch.Tensor)
 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
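
One detail worth knowing when converting back and forth is whether the two objects share memory. The following sketch (an addition to the notebook, with illustrative variable names) uses `torch.from_numpy`, which shares the underlying buffer, and `torch.tensor`, which copies it.

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)

shared = torch.from_numpy(arr)  # shares the buffer with `arr`
copied = torch.tensor(arr)      # makes an independent copy

arr[0] = 7.0                    # modify the numpy array in place
print(shared)  # tensor([7., 0., 0.])
print(copied)  # tensor([0., 0., 0.])

# .numpy() on a CPU tensor also returns an array backed by the same memory
back = shared.numpy()
back[1] = 5.0
print(shared)  # tensor([7., 5., 0.])
```

If you need to modify one side without touching the other, make an explicit copy (for example with `.clone()` on the tensor or `.copy()` on the array).
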


Ordinary operations that can be performed on an array in `numpy` also exist in `torch`.

In [5]:
# Mean, standard deviation and sum
print('mean', data_torch.mean(), 'std',data_torch.std(), 'sum', data_torch.sum())

mean tensor(0.1000) std tensor(0.3015) sum tensor(10.)


We see that the return values of those functions (`mean`, `std`, `sum`) are tensor objects. If you would like a plain Python scalar instead, call the `item` method.

In [6]:
# Mean, standard deviation and sum as Python scalars
print('mean', data_torch.mean().item(), 'std',data_torch.std().item(), 'sum', data_torch.sum().item())

mean 0.10000000149011612 std 0.30151134729385376 sum 10.0
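
The standard deviation printed above (about 0.3015) differs slightly from what `numpy.std` returns for the same data (0.3). The reason is that `torch` applies Bessel's correction (dividing by N - 1) by default, while numpy divides by N. The sketch below is an addition to the notebook; it uses the `unbiased=False` argument, and recent PyTorch versions also offer a `correction` keyword for the same purpose.

```python
import numpy as np
import torch

data = np.zeros([10, 10], dtype=np.float32)
np.fill_diagonal(data, 1.0)
t = torch.from_numpy(data)

print(np.std(data))                  # population std (divides by N)
print(t.std().item())                # sample std (divides by N - 1)
print(t.std(unbiased=False).item())  # matches numpy's default

# Reductions can also act along a single dimension
print(t.sum(dim=0))  # column sums: a vector of ten ones
```
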


### Basic linear algebra operations
Common operations include element-wise addition, subtraction, and multiplication, as well as matrix multiplication and reshaping. Read the [documentation](https://pytorch.org/docs/stable/tensors.html) to find the right function for what you want to do!

In [7]:
# Start by initializing two matrices of zeros
data_a = np.zeros([3,3], dtype=np.float32)
data_b = np.zeros([3,3], dtype=np.float32)

# Fill the first one's diagonal with ones
np.fill_diagonal(data_a, 1.)

# Fill the second one's first row with ones
data_b[0] = 1.

# Print the matrices
print('Two numpy matrices')
print(data_a)
print(data_b,'\n')

# Cast them to torch.Tensor objects
torch_a = torch.Tensor(data_a)
torch_b = torch.Tensor(data_b)

print('\nAdding a scalar to all elements:')
print(torch_a + 1)

print('\ntorch.Tensor matrix addition:')
print(torch_a + torch_b)

print('\ntorch.Tensor matrix subtraction:')
print(torch_a - torch_b)

print('\ntorch.Tensor element-wise multiplication:')
print(torch_a*torch_b)

print('\ntorch.Tensor matrix multiplication:')
print(torch_a@torch_b)

Two numpy matrices
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1. 1. 1.]
 [0. 0. 0.]
 [0. 0. 0.]] 


Adding a scalar to all elements:
tensor([[2., 1., 1.],
        [1., 2., 1.],
        [1., 1., 2.]])

torch.Tensor matrix addition:
tensor([[2., 1., 1.],
        [0., 1., 0.],
        [0., 0., 1.]])

torch.Tensor matrix subtraction:
tensor([[ 0., -1., -1.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])

torch.Tensor element-wise multiplication:
tensor([[1., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

torch.Tensor matrix multiplication:
tensor([[1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.]])
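
Because the identity matrix hides the difference between the two products, here is a small additional sketch with non-trivial values. It shows element-wise multiplication (`*`), matrix multiplication (`@`), and broadcasting of a 1-D tensor across the rows of a 2-D tensor. The example values are made up for illustration.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[0., 1.], [1., 0.]])

elementwise = a * b  # same as torch.mul(a, b)
matmul = a @ b       # same as torch.matmul(a, b)

print(elementwise)   # values: [[0, 2], [3, 0]]
print(matmul)        # values: [[2, 1], [4, 3]]

# A 1-D tensor broadcasts across the rows of a 2-D tensor
row = torch.tensor([10., 20.])
broadcast = a + row
print(broadcast)     # values: [[11, 22], [13, 24]]
```
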


### Reshaping

You can access the shape of a tensor via its `.shape` attribute, just like in numpy:

In [8]:
print(torch_a)
print('torch_a shape:', torch_a.shape)
print('The 0th dimension size:', torch_a.shape[0])

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
torch_a shape: torch.Size([3, 3])
The 0th dimension size: 3


Similarly, there is a `reshape` method:

In [9]:
print(torch_a.reshape(1,9))
torch_a.reshape(1,9).shape

tensor([[1., 0., 0., 0., 1., 0., 0., 0., 1.]])


torch.Size([1, 9])

... and you can also use `-1` in the same way as in numpy (one dimension is inferred from the others):

In [10]:
print(torch_a.reshape(-1,3))
torch_a.reshape(-1,3).shape

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])


torch.Size([3, 3])
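
As a further illustration (an addition to the notebook), the sketch below uses a tensor with 12 elements, where the inferred dimension is less obvious than in the 3x3 case, and shows what happens when the requested shape does not match the number of elements.

```python
import torch

t = torch.arange(12).float()

print(t.reshape(3, -1).shape)           # torch.Size([3, 4])
print(t.reshape(-1, 2, 3).shape)        # torch.Size([2, 2, 3])
print(t.reshape(3, 4).flatten().shape)  # torch.Size([12])

# The total number of elements must be preserved
try:
    t.reshape(5, -1)  # 12 is not divisible by 5
except RuntimeError as err:
    print('reshape failed:', err)
```
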

### Indexing (Slicing)

We can use the same indexing tricks that we tried with a numpy array.

In [11]:
torch_a

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

Fetch a specific row...

In [12]:
torch_a[0]

tensor([1., 0., 0.])

...or a specific column

In [13]:
torch_a[:, 1]

tensor([0., 1., 0.])

One can also generate a mask based on a boolean condition...

In [14]:
mask = torch_a == 0.
mask

tensor([[False,  True,  True],
        [ True, False,  True],
        [ True,  True, False]])

...which can then be used to index a tensor (to access or modify specific elements of it):

In [15]:
torch_a[mask]

tensor([0., 0., 0., 0., 0., 0.])
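
The cell above only reads elements through the mask. A minimal sketch of the "modify" case follows (an addition to the notebook, with illustrative replacement values). It works on a clone so the original tensor stays untouched, and also shows `torch.where` as an out-of-place alternative.

```python
import torch

t = torch.eye(3)
mask = t == 0.

modified = t.clone()  # keep the original intact
modified[mask] = -1.  # in-place assignment through the mask
print(modified)

# torch.where builds a new tensor instead of modifying in place
replaced = torch.where(mask, torch.full_like(t, 5.), t)
print(replaced.sum().item())  # 3 ones + 6 fives = 33.0
```
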

### GPU acceleration
Putting a `torch.Tensor` on a GPU is very easy, provided you have access to a CUDA-enabled GPU, i.e. an Nvidia device. This can be checked easily using:

In [16]:
torch.cuda.is_available()

False

Or by checking the number of visible CUDA devices:

In [17]:
torch.cuda.device_count()

0

If the above commands return `True` and an integer strictly greater than `0`, we can move a `torch.Tensor` to the GPU!
- `t.cuda()` places the tensor on the current CUDA device (the first GPU by default)
- `t.to('cuda')` does the same
- `t.to(device=0)` places the tensor on the device with index `0` (typically the first in the `nvidia-smi` list)

The last command lets you place the data on a specific GPU if several of them are available. This is of particular interest when training a model across multiple GPUs to shorten the training time.

If there are no visible GPUs, the GPU cells in the rest of the notebook cannot run. They are commented out below so that the notebook still executes on a CPU-only machine.
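
A common way to write code that runs with or without a GPU is to pick the device once and pass it around. The sketch below is an addition to the notebook and one possible pattern rather than the only one; the variable names are illustrative.

```python
import torch

# Fall back to the CPU when no CUDA device is visible
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.ones(2, 3)
t_dev = t.to(device)  # host-to-device copy on a GPU machine
print(t_dev.device)

# Tensors can also be created directly on the target device
z = torch.zeros(2, 3, device=device)

result = (t_dev + z).sum()
print(result.item())  # 6.0
```

Operands of an operation must live on the same device, which is why both `t_dev` and `z` are placed on `device` before they are added.
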

Create two arrays with an identical data type, shape, and values.

In [18]:
# Create a large 1000x1000 matrix
data_np = np.zeros([1000, 1000],dtype=np.float32)
data_cpu = torch.Tensor(data_np).to('cpu')
# data_gpu = torch.Tensor(data_np).to('cuda')

Use Jupyter's `%%timeit` magic to estimate how long it takes to compute the element-wise fifth power of the matrix (and its mean) on the CPU

In [19]:
%%timeit
mean = (data_cpu ** 5).mean().item()

273 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


... and next on GPU

In [20]:
# %%timeit
# mean = (data_gpu ** 5).mean().item()

... which, when a GPU is available, is more than 10x faster than the CPU counterpart :)

But there's a catch you should be aware of! Preparing data on the GPU takes time, because the data first has to be sent from host memory to the GPU. Let's compare the time it takes to create a tensor on CPU vs. GPU.

In [21]:
%%timeit
data_np = np.zeros([1000, 1000], dtype=np.float32)
data_cpu = torch.Tensor(data_np).cpu()

63.8 μs ± 359 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [22]:
# %%timeit
# data_np = np.zeros([1000, 1000], dtype=np.float32)
# data_gpu = torch.Tensor(data_np).cuda()

With a GPU available, it takes nearly 10 times longer to create this particular data tensor on the GPU than on the CPU. The exact ratio depends on many factors, including your hardware configuration (e.g. CPU-GPU communication via PCIe or NVLink). It therefore makes sense to move a computation to the GPU only when it takes longer than the data transfer itself.
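
Outside of a notebook, the same comparison can be sketched with a small timing helper. This is an illustrative addition: `time_call` is a hypothetical helper, not a PyTorch function. It calls `torch.cuda.synchronize()` because GPU kernels are launched asynchronously, so a timer may otherwise stop before the work has finished. On a CPU-only machine the transfer is a no-op and the numbers only reflect CPU work.

```python
import time
import torch

def time_call(fn, repeats=5):
    """Return the average wall-clock seconds per call of `fn` (hypothetical helper)."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # finish pending GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued GPU kernels to complete
    return (time.perf_counter() - start) / repeats

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data_cpu = torch.zeros(1000, 1000)

# Time the host-to-device transfer, then the computation on the device
transfer_s = time_call(lambda: data_cpu.to(device))
data_dev = data_cpu.to(device)
compute_s = time_call(lambda: (data_dev ** 5).mean())

print(f'device={device.type} transfer={transfer_s:.2e}s compute={compute_s:.2e}s')
```

Comparing `transfer_s` with `compute_s` on your own hardware gives a rough idea of whether a given computation is heavy enough to justify the transfer.
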