# PyTorch Tutorial
(Chapter 2. Preliminaries D2L)

Index

* [PyTorch](#PyTorch)
* [Tensors](#Tensors)
* [Operations on Tensors](#Operations-on-Tensors)
* [Tensors: Scenic views of storage*](#Tensors:-Scenic-views-of-storage-*)
* [CUDA](#CUDA)


\* sections that can be skipped on first reading

# PyTorch

* Integration with the rest of the scientific libraries in Python, such as [SciPy](https://www.scipy.org), [Scikit-learn](https://scikit-learn.org), and [Pandas](https://pandas.pydata.org)
* Compared to NumPy arrays, PyTorch tensors can also
  * perform very fast operations on graphics processing units (GPUs)
  * distribute operations across multiple devices or machines
  * keep track of the graph of computations that created them
* These are important features when implementing a modern deep learning library.

# Tensors 

* Tensors are a specialized __data structure__ that is very similar to __arrays__ and __matrices__
* In __PyTorch__, we use tensors to encode the __inputs__ and __outputs__ of a model, as well as the model’s __parameters__
* Tensors are similar to __NumPy__ ndarrays, except that tensors can run on __GPUs__
* Tensors are also optimized for __automatic differentiation__
* The __dimensionality__ of a tensor coincides with the __number of indexes__ used to refer to scalar values within the tensor

![tensors](https://raw.githubusercontent.com/giulianogrossi/imgs/main/pyTorch_tutorial_imgs/tensors.png)
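As a minimal sketch of the last point, the snippet below builds tensors with 0 to 3 dimensions and prints `ndim`, the number of indexes needed to reach a single scalar value. The variable names are illustrative only.

```python
import torch

scalar = torch.tensor(3.0)               # no index needed
vector = torch.tensor([1.0, 2.0, 3.0])   # one index: vector[i]
matrix = torch.ones(2, 3)                # two indexes: matrix[i, j]
cube = torch.zeros(2, 3, 4)              # three indexes: cube[i, j, k]

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")

# Using as many indexes as there are dimensions yields a 0-dimensional tensor
element = cube[1, 2, 3]
print(element.ndim)
```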

## The essence of tensors

* Python __lists__ or __tuples__ of numbers are collections of Python objects that are __individually allocated__ in memory
* PyTorch __tensors__ or __NumPy arrays__ are views over (typically) contiguous __memory blocks__ containing unboxed C numeric types rather than Python objects
* Each element is a fixed-size numeric value, by default a 32-bit (4-byte) __float__ or a 64-bit (8-byte) __int__

![memory](https://raw.githubusercontent.com/giulianogrossi/imgs/main/pyTorch_tutorial_imgs/memory.png)
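The difference can be inspected directly. This is a small sketch, assuming the default floating-point dtype has not been changed: it compares the per-element size of a tensor with the size of a single boxed Python float, which is implementation dependent.

```python
import sys
import torch

values = [1.0, 2.0, 3.0]
t = torch.tensor(values)  # float32 unless the default dtype was changed

print(t.dtype, t.element_size())      # bytes used by each tensor element
print(t.element_size() * t.numel())   # total bytes of numeric payload
print(sys.getsizeof(values[0]))       # one boxed Python float (varies by interpreter)
print(t.is_contiguous())              # elements sit next to each other in memory
```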

# Tensors: Scenic views of storage 

* Values in tensors are allocated in __contiguous chunks__ of memory managed by `torch.Storage` instances
* A storage is a __one-dimensional array__ of numerical data: a contiguous block of memory containing numbers of a given type, for example
  * `float` (32 bits representing a floating-point number)
  * `int64` (64 bits representing an integer).

![storage](https://raw.githubusercontent.com/giulianogrossi/imgs/main/pyTorch_tutorial_imgs/storage.png)

## Tensor metadata
* In order to index into a storage, tensors rely on a few pieces of information that, together with their storage, unequivocally define them:
 * __size__: a tuple indicating how many elements the tensor has along each dimension
 * __offset__: the index in the storage of the first element of the tensor
 * __stride__: the number of storage elements to skip to reach the next element along each dimension


![memory](https://raw.githubusercontent.com/giulianogrossi/imgs/main/pyTorch_tutorial_imgs/stride.png)
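The three pieces of metadata can be read with `size()`, `storage_offset()`, and `stride()`. The sketch below uses a small example tensor named `points` (an illustrative choice) and shows that selecting a row or transposing changes only the metadata, not the underlying memory.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(points.size(), points.storage_offset(), points.stride())

# Selecting the second row gives a view that starts 2 elements into the storage
second_point = points[1]
print(second_point.size(), second_point.storage_offset(), second_point.stride())

# Transposing swaps sizes and strides without copying any data
points_t = points.t()
print(points_t.size(), points_t.stride(), points_t.is_contiguous())

# Both tensors start at the same memory address
print(points.data_ptr() == points_t.data_ptr())
```

Because `second_point` and `points_t` are views, writing into them also modifies `points`; call `clone()` when an independent copy is needed.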

# Initializing a Tensor



In [None]:
import torch
import numpy as np

1. With **$N$ sequential values** starting from 0 

In [None]:
# Tensor with sequential values 
N = 12
x = torch.arange(N)  # int64 by default; pass dtype=torch.float32 to get floats
x

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

2.  With **Random or Constant** values
    * Tensors are typically initialized with zeros, ones, some other constant, or numbers randomly sampled from a specific distribution.
        * `zeros` initializes every element to 0
        * `ones` initializes every element to 1
        * `rand` samples each element uniformly from the interval $[0, 1)$
        * `randn` samples each element from a standard Gaussian (normal) distribution with a mean of 0 and a standard deviation of 1

      (``shape`` is a tuple of tensor dimensions)


In [None]:
shape = (3,4)
zeros_tensor = torch.zeros(shape)
ones_tensor = torch.ones(shape)
rand_tensor = torch.rand(shape)

print(f"Zeros Tensor: \n {zeros_tensor}")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Random Tensor: \n {rand_tensor} \n")
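Note that the cell above calls `torch.rand`, which samples uniformly from $[0, 1)$, whereas `torch.randn` samples from a standard normal distribution. The following sketch contrasts the two on a larger sample and also shows `torch.full` for an arbitrary constant. The seed and sample size are arbitrary choices.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

uniform = torch.rand(10_000)         # uniform on [0, 1)
normal = torch.randn(10_000)         # standard normal: mean 0, std 1
constant = torch.full((3, 4), 7.0)   # every element set to the same constant

print(uniform.min().item(), uniform.max().item())
print(normal.mean().item(), normal.std().item())
print(constant)
```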

3. Specifying the exact values in a **List** and then converting it TO a Tensor

In [None]:
#List TO tensor
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data) 
x_data

tensor([[1, 2],
        [3, 4]])


4. **From NumPy** arrays TO Tensors


In [None]:
# Numpy TO tensor
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

5. From another **Tensor**
    * The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden



In [None]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.0993, 0.3592],
        [0.5374, 0.9289]]) 



# Attributes of a Tensor


* Tensor attributes describe their __shape__, __datatype__, and the __device__ on which they are stored
* For __device__ specification see [CUDA](#CUDA) section


In [None]:
x = torch.rand(3,4)

print(f"Shape of tensor: {x.shape}")
print(f"Datatype of tensor: {x.dtype}")
print(f"Device tensor is stored on: {x.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


# Operations on Tensors


* Over __100 tensor operations__, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more are comprehensively described [here](https://pytorch.org/docs/stable/torch.html)
* Each of these operations can be __run on the GPU__
* By default, tensors are created on the CPU and we need to __explicitly move tensors to the GPU__ 
* Keep in mind that copying __large tensors__ across devices can be __expensive__ in terms of time and memory!

* **Elementwise operations**

Some of the simplest and most useful operations
are the **elementwise** operations.
These apply a standard scalar operation
to each element of an array.
For functions that take two arrays as inputs,
elementwise operations apply some standard binary operator
on each pair of corresponding elements from the two arrays.
We can create an elementwise function from any function
that maps from a scalar to a scalar.

In mathematical notation, we would denote such
a *unary* scalar operator (taking one input)
by the signature $f: \mathbb{R} \rightarrow \mathbb{R}$.
This just means that the function is mapping
from any real number ($\mathbb{R}$) onto another.
Likewise, we denote a *binary* scalar operator
(taking two real inputs, and yielding one output)
by the signature $f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}$.
Given any two vectors $\mathbf{u}$ and $\mathbf{v}$ *of the same shape*,
and a binary operator $f$, we can produce a vector
$\mathbf{c} = F(\mathbf{u},\mathbf{v})$
by setting $c_i \gets f(u_i, v_i)$ for all $i$,
where $c_i, u_i$, and $v_i$ are the $i^\mathrm{th}$ elements
of vectors $\mathbf{c}, \mathbf{u}$, and $\mathbf{v}$.
Here, we produced the vector-valued
$F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d$
by **lifting** the scalar function to an elementwise vector operation.
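As a rough illustration of lifting, the sketch below applies a scalar function `f` (a hypothetical example, not part of PyTorch) to each pair $(u_i, v_i)$ in an explicit loop and checks that the result matches the vectorized tensor expression.

```python
import torch

def f(a: float, b: float) -> float:
    """A scalar binary operator f: R, R -> R (hypothetical example)."""
    return a * b + 1.0

u = torch.tensor([1.0, 2.0, 4.0, 8.0])
v = torch.tensor([0.0, 3.0, 5.0, 7.0])

# Explicit lifting: c_i <- f(u_i, v_i) for every index i
c_loop = torch.tensor([f(ui.item(), vi.item()) for ui, vi in zip(u, v)])

# The same computation written with lifted (elementwise) tensor operators
c_vec = u * v + 1.0

print(c_loop)
print(torch.equal(c_loop, c_vec))
```

The loop version is only for exposition; the vectorized form runs in optimized compiled code and is the one to use in practice.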

The common standard arithmetic operators
(`+`, `-`, `*`, `/`, and `**`)
have all been **lifted to elementwise operations
for any identically-shaped tensors of arbitrary shape**.
In the following example, we use commas to formulate a 5-element tuple,
where each element is the result of an elementwise operation.


In [None]:
# Standard arithmetic operations (+, -, *, /, **) are automatically *lifted* to elementwise operations
x = torch.tensor([1, 2, 4, 8], dtype=torch.float32)  # TRY dtype=torch.int64 OR dtype=torch.uint8
y = torch.tensor([0, 3, 5, 7], dtype=torch.float32)
x+y, x-y, x*y, x/y, x**y

(tensor([ 1.,  5.,  9., 15.]),
 tensor([ 1., -1., -1.,  1.]),
 tensor([ 0.,  6., 20., 56.]),
 tensor([   inf, 0.6667, 0.8000, 1.1429]),
 tensor([1.0000e+00, 8.0000e+00, 1.0240e+03, 2.0972e+06]))

In [None]:
# Other operation applied elementwise

z = torch.exp(x)
z

tensor([2.7183e+00, 7.3891e+00, 5.4598e+01, 2.9810e+03])

* **Linear algebra operations**



In [None]:
# Four equivalent ways to compute the matrix product of two tensors x and y.
# For two 1-D tensors, as here, the result is their dot product (a 0-dimensional tensor).

x = torch.tensor([1,2,4,8], dtype= torch.float32)
y = torch.tensor([0,3,5,7], dtype= torch.float32)

z1 = x @ y
print(z1)

z2 = x.matmul(y)
print(z2)

z3 = torch.matmul(x, y)
print(z3)

z4 = torch.empty(())  # preallocated 0-dim output: the product of two 1-D tensors is a scalar
torch.matmul(x, y, out=z4)
print(z4)

tensor(82.)
tensor(82.)
tensor(82.)
tensor(82.)
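The cell above multiplies two 1-D tensors, so `@` reduces to a dot product. As a complementary sketch with illustrative shapes, the following shows how `@` behaves for matrices and matrix-vector products, and how it differs from the elementwise `*`.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = torch.ones(3, 4)
v = torch.tensor([1.0, 2.0, 3.0])

print((A @ B).shape)   # matrix @ matrix: (2, 3) @ (3, 4) -> (2, 4)
print((A @ v).shape)   # matrix @ vector: (2, 3) @ (3,)   -> (2,)
print((v @ v).shape)   # vector @ vector: dot product, 0-dimensional result

# For 1-D inputs, @ agrees with torch.dot
print(torch.equal(v @ v, torch.dot(v, v)))

# In contrast, * is elementwise and keeps the shape of its operands
print((A * A).shape)
```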




* **Broadcasting mechanism**

Broadcasting allows elementwise operations on tensors of different shapes:
dimensions of size 1 are (conceptually) replicated along rows or columns until both tensors have the same shape,
and then the elementwise operation is applied

In [None]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
C = a + b

print(f" a: {a}, \n b: {b}, \n => a+b: {C} ")

A = torch.arange(6).reshape((3, 2))
b = torch.arange(2).reshape((1, 2))
D = A + b

print(f" A: {A}, \n b: {b}, \n => A+b: {D}")

 a: tensor([[0],
        [1],
        [2]]), 
 b: tensor([[0, 1]]), 
 => a+b: tensor([[0, 1],
        [1, 2],
        [2, 3]]) 
 A: tensor([[0, 1],
        [2, 3],
        [4, 5]]), 
 b: tensor([[0, 1]]), 
 => A+b: tensor([[0, 2],
        [2, 4],
        [4, 6]])
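To make the replication step visible, the sketch below expands `a` and `b` to the common shape by hand with `expand` and checks that adding the expanded tensors gives the same result as the broadcast sum. `torch.broadcast_shapes` is used only to report the shape that broadcasting would produce.

```python
import torch

a = torch.arange(3).reshape(3, 1)
b = torch.arange(2).reshape(1, 2)

# Replicate the size-1 dimensions explicitly (expand creates views, not copies)
a_expanded = a.expand(3, 2)
b_expanded = b.expand(3, 2)
print(a_expanded)
print(b_expanded)

# Broadcasting performs the same replication implicitly
print(torch.equal(a + b, a_expanded + b_expanded))

# Shape that results from broadcasting (3, 1) against (1, 2)
print(torch.broadcast_shapes(a.shape, b.shape))
```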


# Joining tensors

* You can use ``torch.cat`` to concatenate a sequence of tensors along a given dimension

In [None]:
#concatenate 1D vectors:
t1 = torch.cat([x, y], dim=0)
print(t1)
print(x.shape, y.shape, t1.shape)

#concatenate matrix:
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
T2= torch.cat((X, Y), dim=0)
T3 = torch.cat((X, Y), dim=1)
print(T2)
print(T3)
print(T2.shape, T3.shape)

tensor([1., 2., 4., 8., 0., 3., 5., 7.])
torch.Size([4]) torch.Size([4]) torch.Size([8])
tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]])
tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])
torch.Size([6, 4]) torch.Size([3, 8])


# Standard numpy-like indexing and slicing

* Tensors use the same indexing and slicing notation as standard Python lists

<img src="https://raw.githubusercontent.com/giulianogrossi/imgs/main/pyTorch_tutorial_imgs/indexing.jpg" alt="Drawing" style="width: 70%"/>

In [None]:
some_list = list(range(6)) 
print(some_list[:]) 
print(some_list[1:4])
print(some_list[1:])
print(some_list[:4]) 
print(some_list[:-1])
print(some_list[1:4:2])

[0, 1, 2, 3, 4, 5]
[1, 2, 3]
[1, 2, 3, 4, 5]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[1, 3]


In [None]:
x = torch.ones(4, 4)
print('First row: ',x[0])
print('First column: ', x[:, 0])
print('Last column:', x[..., -1])
x[:,1] = 0
print(x)

First row:  tensor([1., 1., 1., 1.])
First column:  tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


# Single-element tensors 

* If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using ``item()``



In [None]:
agg = x.sum()
agg_item = agg.item()  
print(agg_item, type(agg_item))

12.0 <class 'float'>


# Tensor element types

* Numbers in Python are objects
* Lists in Python are meant for sequential collections of objects
* The Python interpreter is slow compared to optimized, compiled code
* Data science libraries rely on NumPy or introduce __dedicated data structures__ like PyTorch tensors, which provide efficient __low-level implementations__ of numerical data structures and __related operations__ on them, wrapped in a convenient high-level API

## Data type

* The `dtype` argument to tensor constructors specifies the __numerical data type__ that will be contained in the tensor
* Here’s a list of the possible values for the __dtype argument__: 
  * `torch.float32` or `torch.float`: 32-bit floating-point
  * `torch.float64` or `torch.double`: 64-bit, double-precision floating-point 
  * `torch.float16` or `torch.half`: 16-bit, half-precision floating-point 
  * `torch.int8`: signed 8-bit integers 
  * `torch.uint8`: unsigned 8-bit integers  
  * `torch.int16` or `torch.short`: signed 16-bit integers 
  * `torch.int32` or `torch.int`: signed 32-bit integers  
  * `torch.int64` or `torch.long`: signed 64-bit integers 
  * `torch.bool`: Boolean

## Managing a tensor’s dtype attribute

* In order to allocate a tensor of the right numeric type, we can __specify__ the __proper__ `dtype` as an argument to the constructor

In [None]:
double_points = torch.ones(10, 2, dtype=torch.double) 
short_points = torch.tensor([[1, 2], [3, 4]], dtype=torch.short)
print(short_points.dtype)
print(double_points.dtype)

torch.int16
torch.float64


In [None]:
# casting method, such as
double_points = torch.zeros(10, 2).double() 
short_points = torch.ones(10, 2).short()
print(short_points.dtype)
print(double_points.dtype)

torch.int16
torch.float64


In [None]:
# the more convenient "to" method:
double_points = torch.zeros(10, 2).to(torch.double) 
short_points = torch.ones(10, 2).to(dtype=torch.short)
print(short_points.dtype)
print(double_points.dtype)

torch.int16
torch.float64
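Casting can change values, not just the label on the data. The sketch below uses illustrative inputs to show that converting floats to an integer dtype truncates toward zero, and that mixing two tensors of different floating-point dtypes promotes the result to the larger type.

```python
import torch

floats = torch.tensor([1.7, -1.7, 2.5])
as_int = floats.to(torch.int32)   # fractional parts are discarded (truncation toward zero)
print(as_int)

a = torch.ones(3, dtype=torch.float64)
b = torch.ones(3, dtype=torch.float32)
mixed = a + b                     # inputs are promoted to the larger type
print(mixed.dtype)
```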


# In-place operations

Observe: `x = x + 2` allocates new memory for the result and rebinds the name `x` to it

In [None]:
before = id(x) #x address before the operation
x = x+2 
after = id(x) #x address after the operation
print(before == after)

False



* Operations that store the result into the operand are called in-place. They are denoted by a ``_`` suffix
* For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``


In [None]:
# In place using the suffix _
before = id(x)
print(x, "\n")
x.add_(5)
after = id(x)
print(x)
print(before == after) 

tensor([[3., 2., 3., 3.],
        [3., 2., 3., 3.],
        [3., 2., 3., 3.],
        [3., 2., 3., 3.]]) 

tensor([[8., 7., 8., 8.],
        [8., 7., 8., 8.],
        [8., 7., 8., 8.],
        [8., 7., 8., 8.]])
True


In [None]:
# In place using assignment to the preallocated x via x[:]
before = id(x)
x[:]  = x +  5
after = id(x)
print(before == after)

True


In [None]:
# In place using the operator +=
before = id(x)
x  +=  5
after = id(x)
print(before == after)

## Note
<div class="alert alert-info">
<p>In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.</p>
</div>
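The interaction with derivatives can be seen in a small sketch. Here `w` is a hypothetical parameter tensor created with `requires_grad=True`: PyTorch rejects an in-place update on it, while the out-of-place version works and lets gradients flow back.

```python
import torch

w = torch.ones(3, requires_grad=True)  # a leaf tensor tracked by autograd

try:
    w.add_(1.0)       # in-place update on a leaf that requires grad
    failed = False
except RuntimeError as err:
    failed = True
    print("in-place update rejected:", err)

# The out-of-place version creates a new tensor and keeps the history intact
y = (w + 1.0).sum()
y.backward()
print(w.grad)
```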

# Bridge with NumPy

* Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other



## Tensor to NumPy array

In [None]:
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]


NOTE: t and n **share the underlying memory**: 
==> A change in the tensor is reflected in the NumPy array

In [None]:
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]


## NumPy array to Tensor

In [None]:
n = np.ones(5)
t = torch.from_numpy(n)

Again, changes in the NumPy array are reflected in the tensor.



In [None]:
np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
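When sharing is not wanted, the data can be copied instead. The sketch below contrasts `torch.from_numpy`, which shares memory with the array, with `torch.tensor`, which copies the data. The variable names are illustrative.

```python
import numpy as np
import torch

n = np.ones(5)
shared = torch.from_numpy(n)   # shares memory with n
copied = torch.tensor(n)       # copies the data into new memory

n += 1                         # in-place change to the NumPy array

print(shared)   # follows the change made to n
print(copied)   # keeps the original values
```

In the other direction, `t.clone().numpy()` gives a NumPy array that no longer tracks the tensor `t`.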


# CUDA

* PyTorch __tensors__ can also be __stored__ on a graphics processing unit (__GPU__) in order to perform massively parallel, fast computations
* All __operations__ that will be performed on the tensor will be carried out using __GPU-specific routines__ that come with PyTorch
* To call the CUDA API directly from Python, use PyCUDA or Numba

In [None]:
# Check if GPU is available
print(torch.cuda.is_available())

## Managing a tensor’s device attribute

* A PyTorch tensor also has the notion of __device__, which is where on the computer the tensor data is placed

In [None]:
points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')
points_gpu

* We could instead __copy a tensor__ created on the CPU onto the GPU using the `to` method

In [None]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_gpu = points.to(device='cuda')
points_gpu

## Multiple GPUs

* If our machine has __more__ than one __GPU__, we can also decide on which GPU we allocate the tensor by passing a __zero-based integer__ identifying the GPU on the machine

In [None]:
points_gpu = points.to(device='cuda:0')
points_gpu

In [None]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
  points_gpu = points.to('cuda')

* Any operation performed on the tensor, such as multiplying all elements by a constant, is carried out on the GPU

In [None]:
# Multiplication performed on the CPU
points = 2 * points

# Multiplication performed on the GPU
points_gpu = 2* points.to(device='cuda')
points_gpu

* Note that the `points_gpu` tensor is not brought back to the CPU once the result has been computed
* In order to move the __tensor back__ to the CPU, we need to pass `device='cpu'` to the `to` method

In [None]:
points_cpu = points_gpu.to(device='cpu')
points_cpu.device

* We can also use the __shorthand methods__ `cpu` and `cuda` instead of the `to` method to achieve the same goal

In [None]:
points_gpu = points.cuda() # default to GPU index 0
points_gpu = points.cuda(0) 
points_cpu = points_gpu.cpu()
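The cells above that pass `device='cuda'` fail on a machine without a GPU. A common way around this, sketched here under the assumption that the same code should run on both kinds of machine, is to pick the device once and pass it everywhere.

```python
import torch

# Choose the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device=device)
doubled = 2 * points              # runs on whichever device holds `points`
print(doubled.device)

# Converting to NumPy requires the data to be on the CPU first
result = doubled.cpu().numpy()
print(result)
```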
