
# **PyTorch Basics Tutorial**

The following tutorial is a slightly modified version of the [pytorch basics colab notebook](https://colab.research.google.com/drive/1-L2LJmiV_rgCtzsIJOMfF_9fwJuxsj09), from [Diyi Yang's NLP course](https://www.cc.gatech.edu/classes/AY2020/cs7650_spring/).

The notebook is itself adapted from the [DeepLearningForNLPInPytorch tutorial](https://github.com/rguthrie3/DeepLearningForNLPInPytorch/blob/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb) and the [Official PyTorch tutorials](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py).

PyTorch is an open-source Python library for deep learning, mainly developed by Facebook. It is a Python re-implementation of torch, a package initially developed at IDIAP (Switzerland) by Collobert, Bengio and Mariéthoz.
PyTorch relies on underlying C libraries for speed.

A distinctive feature of PyTorch is that the backpropagation algorithm can be applied to a **dynamically defined computation graph**, whereas TensorFlow (Google's library for deep learning) was originally developed using static computation graphs.
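
As a minimal sketch of what a dynamically defined graph means in practice: the graph is built while ordinary Python control flow runs, so the number of recorded operations can depend on the input values. The function name `dynamic_forward` and the threshold `10` are illustrative assumptions, and the example relies on the autograd machinery presented in section 2.

```python
import torch

def dynamic_forward(x):
    # The number of doublings depends on the data itself:
    # the graph is recorded while this Python loop runs.
    y = x
    steps = 0
    while y.norm() < 10:
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
out, steps = dynamic_forward(x)
out.backward()

print(steps)   # number of doublings actually performed for this input
print(x.grad)  # each entry equals 2 ** steps
```

With a different input, the loop may run a different number of times, and backpropagation follows whatever graph was actually recorded.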


## If running on your own computer: torch installation

See the exact installation command at https://pytorch.org/get-started/locally/

The commands below assume that:
 - your computer has a CPU only
 - you need neither torchvision nor torchaudio

**Via Anaconda/Miniconda:**  
`conda install pytorch -c pytorch`

**Via pip:**  
`pip3 install torch`


In [None]:
import torch
print(torch.__version__)

2.1.0+cu118


## 1 The Tensor type in torch
Tensors are similar to NumPy’s ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computation.

### 1.1 Creating Tensors from data
Tensors can be created from Python lists with the `torch.Tensor()` function.

In [None]:
# Example with 1-D data
data = [1.0, 2.0, 3.0]
tensor = torch.Tensor(data)
print("Example with 1-D data")
print(tensor)

# Example with 2-D data
data = [[1., 2., 3.], [4., 5., 6]]
tensor = torch.Tensor(data)
print("\nExample with 2-D data")
print(tensor)

# Example with 3-D data
data = [[[1.,2.], [3.,4.]],
        [[5.,6.], [7.,8.]]]
tensor = torch.Tensor(data)
print("\nExample with 3-D data")
print(tensor)

Example with 1-D data
tensor([1., 2., 3.])

Example with 2-D data
tensor([[1., 2., 3.],
        [4., 5., 6.]])

Example with 3-D data
tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]])
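
A related way to build tensors from data is the lowercase `torch.tensor()` function, which infers the dtype from the values, whereas `torch.Tensor()` produces the default floating-point type. The sketch below assumes PyTorch's default settings (default dtype `float32`); the variable names are illustrative.

```python
import torch

ints = [1, 2, 3]

a = torch.Tensor(ints)                       # default floating-point type
b = torch.tensor(ints)                       # dtype inferred from the data
c = torch.tensor(ints, dtype=torch.float32)  # dtype given explicitly

print(a.dtype)  # torch.float32 with default settings
print(b.dtype)  # torch.int64
print(c.dtype)  # torch.float32
```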


### 1.2 Initializing an empty Tensor
An uninitialized matrix is declared, but does not contain definite known values before it is used. When an uninitialized matrix is created, whatever values were in the allocated memory at the time will appear as the initial values.


In [None]:
# Construct a 2x3 matrix, uninitialized
x = torch.empty(2, 3)
print(x)

tensor([[-1.3921e-05,  3.3344e-41, -1.3926e-05],
        [ 3.3344e-41,  3.5032e-44,  8.3997e-33]])


### 1.3 Randomly initialized Tensor



In [None]:
x = torch.rand(2, 3)
print(x)

tensor([[0.5805, 0.0918, 0.4665],
        [0.9227, 0.2297, 0.6936]])


### 1.4 Tensor with zeros or ones

In [None]:
# Create a matrix of all zeros
x = torch.zeros(2, 3)
print("Matrix of zeros")
print(x)

# Create a matrix of all zeros and explicitly set data type to be long int
x = torch.zeros(2, 3, dtype=torch.long)
print("\nMatrix of zeros typecasted to long")
print(x)

x = torch.ones(2, 3, dtype=torch.long)
print("\nMatrix of ones typecasted to long")
print(x)


Matrix of zeros
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Matrix of zeros typecasted to long
tensor([[0, 0, 0],
        [0, 0, 0]])

Matrix of ones typecasted to long
tensor([[1, 1, 1],
        [1, 1, 1]])


### 1.5 Create Tensor based on existing Tensor

In [None]:
x = x.new_ones(2, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 1.3917, -1.3365, -0.3134],
        [ 1.8803, -0.1613, -0.3868]])


### 1.6 Size of a Tensor
`torch.Size` is in fact a tuple, so it supports all tuple operations.

In [None]:
# Example with 1-D data
data = [1.0, 2.0, 3.0]
tensor = torch.Tensor(data)
print("Example with 1-D data")
print(tensor)
print(tensor.size())

# Example with 2-D data
data = [[1., 2., 3.], [4., 5., 6]]
tensor = torch.Tensor(data)
print("\nExample with 2-D data")
print(tensor)
print(tensor.size())

# Example with 3-D data
data = [[[1.,2.], [3.,4.]],
        [[5.,6.], [7.,8.]]]
tensor = torch.Tensor(data)
print("\nExample with 3-D data")
print(tensor)
print(tensor.size())
print(tensor.size()[0]) # size of the first dimension

Example with 1-D data
tensor([1., 2., 3.])
torch.Size([3])

Example with 2-D data
tensor([[1., 2., 3.],
        [4., 5., 6.]])
torch.Size([2, 3])

Example with 3-D data
tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]])
torch.Size([2, 2, 2])
2
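
Since `torch.Size` behaves like a tuple, the usual tuple operations apply. A short sketch (the variable names are illustrative):

```python
import torch

t = torch.zeros(2, 3, 4)
size = t.size()

print(isinstance(size, tuple))  # True: torch.Size is a tuple subclass
rows, cols, depth = size        # tuple unpacking
print(len(size))                # number of axes: 3
print(size[-1])                 # size of the last axis: 4
print(t.size(0))                # size() also accepts an axis index: 2
```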


### 1.7 Operations with Tensors
Most operations are very similar to NumPy.

In [None]:
# Addition
x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])

# using arithmetic operation
z = x + y
print(z)

# using method
print(torch.add(x, y))

# using method and providing an output tensor as argument
# (the output tensor should have the same shape as the result)
output = torch.empty(3)
torch.add(x, y, out=output)
print(output)

tensor([5., 7., 9.])
tensor([5., 7., 9.])
tensor([5., 7., 9.])


In [None]:
# In-place addition

x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])

y.add_(x)
print(y)

tensor([5., 7., 9.])


Any operation that mutates a tensor in-place is suffixed with an underscore `_`. For example, both `x.copy_(y)` and `x.t_()` change `x`.

See [the PyTorch official documentation](http://pytorch.org/docs/torch.html) for a complete list of the many operations available to you. They extend beyond just mathematical operations.


In [None]:
# Indexing

x = torch.Tensor([[1., 2., 3.], [4., 5., 6]])
print(x)
print(x[:, 1]) # Gets column with index 1

tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([2., 5.])
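
To illustrate how closely tensor operations follow NumPy, here is a small sketch of element-wise arithmetic, broadcasting, matrix product and reduction along an axis. The variable names are illustrative.

```python
import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]])  # shape (2, 3)
v = torch.tensor([10., 20., 30.])               # shape (3,)

elementwise = x * x      # element-wise product, shape (2, 3)
broadcast = x + v        # v is broadcast over the rows of x
product = x @ x.t()      # matrix product, shape (2, 2)
col_sums = x.sum(dim=0)  # like x.sum(axis=0) in NumPy

print(elementwise)
print(broadcast)
print(product)
print(col_sums)
```

The main naming difference is that PyTorch uses `dim` where NumPy uses `axis`.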


### 1.8 Reshaping Tensors

In [None]:
#@@ view in PyTorch is the counterpart of reshape in NumPy (view never copies data)
x = torch.tensor(range(16)).view(4,4)
print(x)
y = x.view(16)
print(y)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(z)
print(x.size(), y.size(), z.size())

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15]])
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])


In [None]:
#@@ underlying data is NOT copied
x[1,2] = 17
print(x)
print(z)

tensor([[ 0,  1,  2,  3],
        [ 4,  5, 17,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])
tensor([[ 0,  1,  2,  3,  4,  5, 17,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15]])
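
Because `view` never copies data, it only works when the requested shape is compatible with the tensor's memory layout. The sketch below shows one case where it fails (a transposed tensor) and two common workarounds, `reshape` and `contiguous().view(...)`. Variable names are illustrative.

```python
import torch

x = torch.arange(6).view(2, 3)
xt = x.t()                   # transpose: same data, different strides
print(xt.is_contiguous())    # False

view_failed = False
try:
    xt.view(6)               # cannot be expressed without copying
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = xt.reshape(6)                # reshape copies the data when it has to
flat_bis = xt.contiguous().view(6)  # explicit copy, then view
print(flat)
```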


In [None]:
#@@ .shape is equivalent to .size()
print(z.shape)
print(x.shape)

torch.Size([2, 8])
torch.Size([4, 4])


### Changing the number of axes of a tensor

In [None]:
#@@ view can hence also be used to add a 1-sized axis
x = torch.randn(3)
print(x)
print(x.shape)

print(x.view(3,1))
print(x.view(3,1).shape)

# -1 can also be used: the size of that axis is then computed automatically
print(x.view(-1,3))
print(x.view(-1,3).shape)

print(x.view(-1,3,1))
print(x.view(-1,3,1).shape)

tensor([-1.5573,  0.7979, -1.9616])
torch.Size([3])
tensor([[-1.5573],
        [ 0.7979],
        [-1.9616]])
torch.Size([3, 1])
tensor([[-1.5573,  0.7979, -1.9616]])
torch.Size([1, 3])
tensor([[[-1.5573],
         [ 0.7979],
         [-1.9616]]])
torch.Size([1, 3, 1])


In [None]:
# unsqueeze adds an axis of size 1
x = torch.randn(3)
print(x)
print(x.shape)

# add an axis as first axis (cf. dim=0)
y= x.unsqueeze(dim=0)
print(y)
print(y.shape)

# add an axis as second axis
z= x.unsqueeze(dim=1)
print(z)
print(z.shape)

tensor([-1.1234,  0.3913, -0.6254])
torch.Size([3])
tensor([[-1.1234,  0.3913, -0.6254]])
torch.Size([1, 3])
tensor([[-1.1234],
        [ 0.3913],
        [-0.6254]])
torch.Size([3, 1])


In [None]:
# squeeze removes all axes of size 1
print(list(range(6)))

z = torch.tensor(range(6))
print("z", z)
y = z.view(1, -1)
print("y", y)
x = torch.tensor(range(6)).view(2,1,3,1)
print("x", x)
# squeeze will remove any axis of size 1
print(x.squeeze())
# you can specify the axis to squeeze
print(x.squeeze(1).shape)

# trying to squeeze an axis of size > 1 will do nothing
print(x.squeeze(0).shape)

[0, 1, 2, 3, 4, 5]
z tensor([0, 1, 2, 3, 4, 5])
y tensor([[0, 1, 2, 3, 4, 5]])
x tensor([[[[0],
          [1],
          [2]]],


        [[[3],
          [4],
          [5]]]])
tensor([[0, 1, 2],
        [3, 4, 5]])
torch.Size([2, 3, 1])
torch.Size([2, 1, 3, 1])
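
One practical consequence, sketched below under the assumption of a hypothetical batch that happens to contain a single example: calling `squeeze()` without an axis also removes the batch axis, so naming the axis is usually safer. The last line shows indexing with `None`, which adds a size-1 axis like `unsqueeze`.

```python
import torch

# Hypothetical batch: 1 example, 5 items, 1 feature per item
batch = torch.zeros(1, 5, 1)

squeezed_all = batch.squeeze()             # removes every size-1 axis, batch axis included
squeezed_last = batch.squeeze(-1)          # removes only the last axis
with_new_axis = squeezed_last[:, None, :]  # same effect as squeezed_last.unsqueeze(1)

print(squeezed_all.shape)    # torch.Size([5])
print(squeezed_last.shape)   # torch.Size([1, 5])
print(with_new_axis.shape)   # torch.Size([1, 1, 5])
```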


### Getting the values of tensor cells

In [None]:
#@@ getting the plain value in a tensor's cell
x = torch.randn(2,3)
print(x)
print(x[1,0])
print(x[1,0].size())
print(x[1,0].item())

tensor([[ 0.9118,  1.9291,  0.6433],
        [-0.8046, -0.2035, -0.7048]])
tensor(-0.8046)
torch.Size([])
-0.8045608401298523
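
Note that `.item()` only applies to tensors holding a single value. For tensors with several elements, `.tolist()` returns nested Python lists instead. A minimal sketch (the exception type raised by `.item()` may differ between PyTorch versions, hence the two types caught):

```python
import torch

x = torch.tensor([[1.5, 2.5], [3.5, 4.5]])

total = x.sum().item()   # single-element tensor -> Python float
as_list = x.tolist()     # whole tensor -> nested Python lists

item_failed = False
try:
    x.item()             # more than one element: not allowed
except (RuntimeError, ValueError) as err:
    item_failed = True
    print("item() failed:", err)

print(total)
print(as_list)
```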


### 1.9 Converting to and from NumPy
Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will **share their underlying memory locations** (if the Torch Tensor is on CPU), and **changing one will change the other**.



In [None]:
a = torch.ones(5)
print("Original a:", a)

b = a.numpy()
print("Original b:", b)

a.add_(1)
print("New a:", a)
print("New b:", b)

Original a: tensor([1., 1., 1., 1., 1.])
Original b: [1. 1. 1. 1. 1.]
New a: tensor([2., 2., 2., 2., 2.])
New b: [2. 2. 2. 2. 2.]


In [None]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
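
When shared memory is not wanted, an explicit copy breaks the link. The sketch below contrasts `torch.from_numpy` (shared) with `torch.tensor` (copy), and uses `.clone()` before `.numpy()` in the other direction. Variable names are illustrative.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data

a[0] = 5.0
print(shared)   # reflects the change made to a
print(copied)   # unchanged

t = torch.ones(3)
independent = t.clone().numpy()   # copy first, then convert
t.add_(1)
print(t)             # modified in place
print(independent)   # unchanged
```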


### 1.10 CUDA Tensors

(cf. the Wikipedia page on **CUDA**, https://en.wikipedia.org/wiki/Compute_Unified_Device_Architecture :
"CUDA (Compute Unified Device Architecture) is a GPGPU technology (General-Purpose Computing on Graphics Processing Units), namely a technology using a graphical processor (GPU) to execute computations, instead of the central processing unit (CPU).")

Tensors can be moved onto any device using the `.to` method.


### Enable GPU on your Colab notebook

Runtime > Change runtime type > Hardware accelerator => select "T4 GPU", which is enough


In [None]:
# Try running this cell both with and without GPU support
import torch
print("CUDA available?", torch.cuda.is_available())

# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.Tensor([1.0, 2.0, 3.0])
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

CUDA available? False
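
A common pattern, sketched here, is to choose the device once and write the rest of the code independently of it, so the same cell runs with or without a GPU. Variable names are illustrative.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1.0, 2.0, 3.0], device=device)
y = torch.ones_like(x)     # created on the same device as x
z = x + y
print(z.device)

# A tensor must be on the CPU before it is converted to NumPy
result = z.cpu().numpy()
print(result)
```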


## 2 OPTIONAL for now: Autograd: Automatic Differentiation

("Differentiation" = computation of partial derivatives)

This part is optional, and is meant to help you understand how PyTorch computes (or rather accumulates) gradients behind the scenes.
See also this blog post: https://community.paperspace.com/t/pytorch-basics-understanding-autograd-and-computation-graphs/741
which gives a good tutorial on automatic differentiation in PyTorch (despite spelling errors and a few mistakes pointed out in the comments).


The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that the backpropagation will depend on how your code is run, and that every single iteration can be different.

``torch.Tensor`` is the central class of the package. If we have a tensor x
and set its attribute ``.requires_grad`` to ``True``, autograd **starts to track all operations made using it**.
If, for instance, a scalar z is computed using x, then calling ``z.backward()`` will **automatically compute the gradient of z with respect to x** and accumulate it into the ``x.grad`` attribute: it adds the partial derivatives of z with respect to each $x_i$ to the ``.grad`` attribute of x.


To **stop a tensor from tracking history**, you can call ``.detach()`` to detach
it from the computation history, and to prevent future computation from being
tracked.

To **prevent tracking history (and using memory)**, you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when **evaluating** a
model because the model may have trainable parameters with `requires_grad=True`,
but for which we don't need the gradients.
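
As a minimal sketch of `.detach()`: it returns a new tensor that holds the same values but carries no history, so autograd treats it as a constant. Variable names are illustrative.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 2
d = y.detach()            # same values as y, but cut off from the graph

print(y.requires_grad)    # True
print(d.requires_grad)    # False
print(d.grad_fn)          # None

# d is treated as a constant: out = sum(2 * x * d), so d(out)/dx = 2 * d
out = (y * d).sum()
out.backward()
print(x.grad)
```

Without the `detach()` call, `out` would be `sum(4 * x**2)` and the gradient would be `8 * x` instead.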

In [None]:
import torch
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [None]:
y = x + 2
print(y)
print(y.requires_grad)

# y was created as a result of an operation, so it has a grad_fn and requires_grad=True

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
True


In [None]:
z = y * y * 3
out = z.mean()

print(z)
print(out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)


``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A tensor's flag is ``False`` by default unless it is set at creation; the method's own argument defaults to ``True``.

Tensors built using at least one tensor with `requires_grad == True` also get `requires_grad == True`.

In [None]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)

b = torch.randn(2,2)
print(b.requires_grad)

c = (a * b).sum()
print(c.grad_fn)
print(c.requires_grad)

False
True
False
<SumBackward0 object at 0x7dd03bcf3a30>
True


Let us consider a complete example now.

Let:  
$z_i = 3(x_i+2)^2$  
$out = \frac{1}{4}\sum_i z_i$

$\frac{\partial out}{\partial x_i} = \frac{1}{4}\frac{\partial z_i}{\partial x_i} = \frac{1}{4}\times 3 \times 2(x_i+2) = \frac{3}{2}(x_i+2)$  


**When $x_i=1$**:

$z_i\bigr\rvert_{x_i=1} = 27$

$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.


**When $x_i=0$**:

$z_i\bigr\rvert_{x_i=0} = 12$

$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=0} = \frac{6}{2} = 3$.


In [None]:

x = torch.tensor( [[0,1],[1,1]], dtype=float, requires_grad=True)
print(x)
y = x + 2
z = y * y * 3
out = z.mean()

out.backward()
print(x)
print(y)
print(z)
print(out)
print("x gradient:")
print(x.grad)


tensor([[0., 1.],
        [1., 1.]], dtype=torch.float64, requires_grad=True)
tensor([[0., 1.],
        [1., 1.]], dtype=torch.float64, requires_grad=True)
tensor([[2., 3.],
        [3., 3.]], dtype=torch.float64, grad_fn=<AddBackward0>)
tensor([[12., 27.],
        [27., 27.]], dtype=torch.float64, grad_fn=<MulBackward0>)
tensor(23.2500, dtype=torch.float64, grad_fn=<MeanBackward0>)
x gradient:
tensor([[3.0000, 4.5000],
        [4.5000, 4.5000]], dtype=torch.float64)


In [None]:
x = torch.tensor( [[0,1],[1,1]], dtype=float, requires_grad=True)
print(x)
y = x + 2
z = y * y * 3
out = z.mean()

out.backward()
print(x.grad)

# again
y = x + 2
z = y * y * 3
out = z.mean()

out.backward()
print(x.grad)


tensor([[0., 1.],
        [1., 1.]], dtype=torch.float64, requires_grad=True)
tensor([[3.0000, 4.5000],
        [4.5000, 4.5000]], dtype=torch.float64)
tensor([[6., 9.],
        [9., 9.]], dtype=torch.float64)


We see that every time we backpropagate through a variable, its gradient is **accumulated instead of being replaced**.

After parameter updates, one needs to reset gradients to zero
(see the `nn.Module.zero_grad()` call in the M1_ML2_TD7.3 notebook)
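
For a bare tensor (without an `nn.Module`), the same reset can be sketched by zeroing the `.grad` attribute in place before each new backward pass. The loop below reuses the function from the example above; the loop itself is an illustrative assumption.

```python
import torch

x = torch.tensor([[0., 1.], [1., 1.]], requires_grad=True)

for step in range(2):
    # Reset the accumulated gradient (it is None before the first backward)
    if x.grad is not None:
        x.grad.zero_()
    out = (3 * (x + 2) ** 2).mean()
    out.backward()
    print(x.grad)   # same values at every step, since nothing accumulates
```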


You can stop autograd from tracking history and accumulating gradients by wrapping code with `torch.no_grad()`:

In [None]:
with torch.no_grad():
    t = (10 * x + 3).mean()
    # although x has requires_grad == True
    print(x.requires_grad)
    # this new tensor has requires_grad == False
    print(t.requires_grad)

# => to be used when evaluating on validation or test data

True
False
