<font color="">**Introduction to PyTorch**</font>  
<font color="">**Tensors in PyTorch**</font>  
<u>**Lecturer**</u>: <font color="DeepSkyBlue">*Khiem Nguyen*</font> -- *James Watt School of Engineering - University of Glasgow*

# PyTorch

**Note** 

This notebook is inspired by the tutorial series [Tutorial on PyTorch](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html). If you read along, however, you will find more detailed explanations and additional useful information here. The online tutorial assumes that you already know NumPy and Python very well and therefore provides little explanation.

## Background

PyTorch is a machine learning library based on the Torch library and used for applications such as computer vision and natural language processing. It was originally developed by Meta AI and is now part of the Linux Foundation umbrella. It is free, open-source software and one of the most popular deep learning frameworks, alongside others such as TensorFlow. As the prefix **Py** suggests, PyTorch has a Python interface, which is the primary focus of development. However, PyTorch also has a C++ interface, and most of its core is written in C++ for performance reasons.

Meta (formerly Facebook) originally operated PyTorch. In September 2022, Meta announced that PyTorch would be governed by the independent PyTorch Foundation, a newly created subsidiary of the Linux Foundation. This is arguably good news, as the Linux Foundation hosts many widely used open-source projects, including the Linux kernel. PyTorch itself is one of the most widely used machine learning frameworks at the time of writing this Jupyter Notebook.

## Tensors

In simplified language, tensors are a specialized data structure that is very similar to arrays and matrices. A tensor is a mathematical tool that is widely used in various research areas of physics and engineering. If you study, for example, general relativity, quantum mechanics, fluid mechanics or solid mechanics, you will definitely learn about tensors. As tensors are used in so many branches of research, it is no surprise to see them in machine learning too. In mathematics, a tensor is an algebraic object that describes a multi-linear relationship between sets of algebraic objects related to a vector space. For example, a matrix is just a representation of a linear mapping from one multi-dimensional space to another multi-dimensional space. In a multi-dimensional space, a tensor can be represented by a numerical array. Of course, one cannot learn a complex mathematical concept by just reading Wikipedia, but here it is for your own interest: [Wikipedia Link](https://en.wikipedia.org/wiki/Tensor)

**Important Remark**

Although it is a complex mathematical concept, it is safe for us in this course and many engineering courses to view a tensor as a multi-dimensional array just like an array in *NumPy*. In this notebook, we assume that you are somewhat familiar with ndarrays defined in NumPy. You will see that tensors and ndarrays are very similar but not identical.

### Tensor initialization

We define a tensor just like we define a NumPy array. We can also create tensors from NumPy arrays by converting an ndarray into a tensor.

In [2]:
import torch
import numpy as np

In [3]:
data = [[1, 2], [3, 4]]         
x_data = torch.tensor(data)     # looks just like x = np.array(data)

In [4]:
np_array = np.array(data)           # create a numpy array
x_np = torch.from_numpy(np_array)   # convert the numpy array to a tensor
print(x_np)

tensor([[1, 2],
        [3, 4]])


Just like in NumPy, we can create tensors with functions that mirror their NumPy counterparts. The following functions should look familiar if you have NumPy experience. It is best to illustrate them with examples.

In [5]:
ones_tensor = torch.ones(size=(2, 3))       # Note the keyword argument "size", its counterpart in NumPy is "shape"
zeros_tensor = torch.zeros((2, 3))          # We don't have to use keyword argument though.
rand_tensor = torch.rand(size=(2, 3))

# You can also define the shape as a tuple with a trailing comma
shape = (2, 3,)
two_tensor = 2 * torch.ones(size=shape)     # the multiplication is applied element-wise

print(f"ones_tensor =\n{ones_tensor}\n")
print(f"zeros_tensor =\n{zeros_tensor}\n")
print(f"rand_tensor =\n{rand_tensor}\n")
print(f"two_tensor =\n{two_tensor}")

ones_tensor =
tensor([[1., 1., 1.],
        [1., 1., 1.]])

zeros_tensor =
tensor([[0., 0., 0.],
        [0., 0., 0.]])

rand_tensor =
tensor([[0.3206, 0.5622, 0.9485],
        [0.8694, 0.5978, 0.2762]])

two_tensor =
tensor([[2., 2., 2.],
        [2., 2., 2.]])


Just like in NumPy, we have `torch.ones_like(data)`, `torch.zeros_like(data)` and `torch.rand_like(data)` in Torch.
- `torch.ones_like()` creates a tensor whose elements all have the value $1$ and whose shape/size is taken from the shape of `data`.

In [6]:
x = torch.tensor([[1, 2, 4], [4.4, 5.6, 6.5]], dtype=torch.int16)     # You can specify the data type too.
x_ones = torch.ones_like(x)

# If you don't specify the dtype here, rand_like raises an error: it infers
# the integer data type of x, but a random number between 0 and 1 cannot be
# represented as an integer.
x_rand = torch.rand_like(x, dtype=torch.float32)    

print(f"x =\n{x}\n")            # the floating-point values are truncated to integers, not rounded up.
print(f"x_rand =\n{x_rand}\n")

x =
tensor([[1, 2, 4],
        [4, 5, 6]], dtype=torch.int16)

x_rand =
tensor([[0.7990, 0.4156, 0.7186],
        [0.8311, 0.4549, 0.4889]])
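
The other `*_like` functions follow the same pattern. As a minimal sketch (the variable names `x_zeros` and `x_float_ones` are illustrative and not part of the original notebook), the result inherits both the shape and the data type of the input unless `dtype` is overridden:

```python
import torch

x = torch.tensor([[1, 2, 4], [4, 5, 6]], dtype=torch.int16)

# Same shape and same dtype (int16) as x.
x_zeros = torch.zeros_like(x)

# Same shape as x, but the dtype is overridden explicitly.
x_float_ones = torch.ones_like(x, dtype=torch.float32)

print(x_zeros.shape, x_zeros.dtype)
print(x_float_ones.shape, x_float_ones.dtype)
```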



### Attributes of tensor

A tensor has various attributes. Three commonly used ones are its **shape**, its **datatype**, and the **device** on which it is stored. We can create a tensor on the CPU or on a GPU. If the tensor lives on the CPU, the device value is `'cpu'`; if it lives on a CUDA GPU, the device value is `'cuda'` (printed together with the device index, for example `cuda:0`).

In [7]:
tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


In [8]:
# My local machine (laptop) has an Nvidia GPU, and I installed PyTorch with CUDA support.
# Therefore, I can define a tensor on CUDA.
tensor = torch.rand(size=(3, 4), device='cuda')         
print(f"Device tensor is stored on: {tensor.device}")

Device tensor is stored on: cuda:0


## Operations on Tensors

More than $100$ tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more, are comprehensively described [here](https://pytorch.org/docs/stable/torch.html). We will go through some of the most basic and commonly used operations. It is best to learn them by example. For more information, it is advisable to search for the function online or to use the built-in help in Jupyter with the question-mark syntax `function_name?`.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the `.to()` method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!
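
A minimal sketch of this pattern is given here. The names `device`, `cpu_tensor` and `moved` are illustrative; the code falls back to the CPU when no CUDA device is visible to PyTorch, so it should also run on machines without a GPU.

```python
import torch

# Pick the GPU when one is available to PyTorch, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

cpu_tensor = torch.rand(3, 4)      # created on the CPU by default
moved = cpu_tensor.to(device)      # tensor on the chosen device

print(f"original device: {cpu_tensor.device}")
print(f"moved device:    {moved.device}")
```

Writing the device choice once and reusing it keeps the rest of the code identical on machines with and without a GPU.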

#### Standard NumPy-like indexing and slicing

In [9]:
tensor = torch.ones((4, 4))
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")

First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])


In [10]:
# Change all the values in the second column to 0
tensor[:,1] = 0
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
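
The same indexing rules carry over to tensors with more than two dimensions. The sketch below uses a hypothetical 3D tensor `t` (not part of the original notebook) to show that `...` stands for "all remaining leading dimensions":

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)

first_block = t[0]          # index along dim 0      -> shape (3, 4)
first_rows = t[:, 0]        # index along dim 1      -> shape (2, 4)
last_entries = t[..., -1]   # index along the last dim -> shape (2, 3)

print(first_block.shape, first_rows.shape, last_entries.shape)
print(last_entries)
```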


**Joining tensors** &nbsp; We can use `torch.cat()` to concatenate a sequence of tensors along a given dimension. Another joining operator that is subtly different from `torch.cat()` is `torch.stack()`. It is always good to have a look at the online documentation at one point in your life 😄. You can easily google them; here are two examples:
- [torch.cat](https://pytorch.org/docs/stable/generated/torch.cat.html)
- [torch.stack](https://pytorch.org/docs/stable/generated/torch.stack.html)

In [11]:
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
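
To make the difference between `torch.cat()` and `torch.stack()` concrete, here is a small sketch with two illustrative tensors `a` and `b`. `torch.cat()` joins along an existing dimension, whereas `torch.stack()` inserts a new one:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

cat_rows = torch.cat([a, b], dim=0)     # existing dim 0 grows: shape (4, 3)
cat_cols = torch.cat([a, b], dim=1)     # existing dim 1 grows: shape (2, 6)
stacked = torch.stack([a, b], dim=0)    # new leading dim:      shape (2, 2, 3)

print(cat_rows.shape, cat_cols.shape, stacked.shape)
```

As a consequence, `torch.stack()` requires all tensors to have exactly the same shape, while `torch.cat()` only requires the shapes to agree in the dimensions that are not concatenated.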


### Arithmetic operations

Just like in NumPy, all of the standard arithmetic operations are executed element-wise. We also have matrix multiplication between two 2D tensors.

In [12]:
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(f"a + b = {a + b}")
print(f"a - b = {a - b}")
print(f"a * b = {a * b}")
print(f"a / b = {a / b}")

a + b = tensor([5, 7, 9])
a - b = tensor([-3, -3, -3])
a * b = tensor([ 4, 10, 18])
a / b = tensor([0.2500, 0.4000, 0.5000])


We can also perform these operations with tensor methods, just like in NumPy. Of course, it is more convenient to use the operators than the object methods.

In [13]:
print(f"a + b = {a.add(b)}")
print(f"a - b = {a.sub(b)}")
print(f"a * b = {a.mul(b)}")
print(f"a / b = {a.div(b)}")

a + b = tensor([5, 7, 9])
a - b = tensor([-3, -3, -3])
a * b = tensor([ 4, 10, 18])
a / b = tensor([0.2500, 0.4000, 0.5000])


In [14]:
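a = torch.tensor([[1, 2, 3], [4, 5, 6]],  dtype=torch.float64)  # if you don't specify the data type, it will be inferred as integer.
b = torch.ones(size=(3, 4))     # float32 by default, so the dtypes of a and b differ and matmul raises the error shown in the output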
print(f"a @ b =\n{a.matmul(b)}")
print(f"a @ b =\n{a @ b}")

RuntimeError: expected m1 and m2 to have the same dtype, but got: double != float
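
The error above arises because `a` is `float64` while `b` has the default data type `float32`, and matrix multiplication does not promote data types automatically. One possible fix, sketched here under the assumption that the default dtype has not been changed, is to make both operands share a dtype, either at creation time or by casting with `.to()`:

```python
import torch

b = torch.ones(size=(3, 4))      # float32 by default

# Option 1: create a with the same dtype as b.
a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)
product = a @ b

# Option 2: cast an existing float64 tensor to the dtype of b.
a64 = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float64)
product_cast = a64.to(b.dtype) @ b

print(f"a @ b =\n{product}")
```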

### Aggregation

I don't think I need to write "just like NumPy" anymore, because so many things work just like in NumPy. Before going to the examples, let me repeat that we can perform aggregation operations either with object methods acting on the tensors themselves or with library functions that take the tensors as input arguments.

In [14]:
x = torch.ones(size=(2, 3, 4))
print(f"sum(x)  = {torch.sum(x)}\n")
print(f"x.sum() = {x.sum()}\n")

# We can perform aggregation along particular dimensions
print(f"x.sum(dim=2) =\n{x.sum(dim=2)}\n")
print(f"torch.sum(x, dim=2) =\n{torch.sum(x, dim=2)}\n")

sum(x)  = 24.0

x.sum() = 24.0

x.sum(dim=2) =
tensor([[4., 4., 4.],
        [4., 4., 4.]])

torch.sum(x, dim=2) =
tensor([[4., 4., 4.],
        [4., 4., 4.]])
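
To see which dimension disappears during an aggregation, it helps to look at the resulting shapes. The following sketch uses an illustrative tensor with distinct values and also shows the `keepdim` argument, which keeps the reduced dimension with size 1:

```python
import torch

x = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

sum_dim0 = x.sum(dim=0)                  # dim 0 is reduced: shape (3, 4)
sum_dim2 = x.sum(dim=2)                  # dim 2 is reduced: shape (2, 3)
sum_keep = x.sum(dim=2, keepdim=True)    # dim 2 is kept with size 1: shape (2, 3, 1)
mean_all = x.mean()                      # aggregation over all elements

print(sum_dim0.shape, sum_dim2.shape, sum_keep.shape)
print(mean_all)
```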



**Single-element tensors** &nbsp; If we have a one-element tensor, for example, by aggregating all values of a tensor into one value (as in the above example), we can convert it to a Python numerical value using `item()`.

In [15]:
x = torch.ones(size=(2, 3))
agg = x.sum()
agg_item = agg.item()
print(agg, "--", type(agg))
print(agg_item, "--", type(agg_item))

tensor(6.) -- <class 'torch.Tensor'>
6.0 -- <class 'float'>
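
Note that `item()` is meant for tensors with exactly one element. For tensors with several elements, `tolist()` converts the values to (nested) Python lists instead. A short sketch with illustrative names:

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

total = x.sum().item()    # one-element tensor -> Python float
as_list = x.tolist()      # multi-element tensor -> nested Python list

print(total, type(total))
print(as_list, type(as_list))
```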


**In-place operations** &nbsp; Operations that store the result into the operand are called in-place. They are denoted by a `_` suffix. For example: `x.copy_(y)`, `x.t_()`, will change `x`.

In [16]:
x = torch.ones(size=(2, 3))
x.add_(5)       # this is equivalent to x += 5
print(x)

x = torch.ones(size=(2, 3))
x += 5
print(x)

tensor([[6., 6., 6.],
        [6., 6., 6.]])
tensor([[6., 6., 6.],
        [6., 6., 6.]])
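
The difference between an in-place operation and an ordinary one becomes visible when a second name refers to the same tensor. In this sketch, `alias` is an illustrative second reference to `x`:

```python
import torch

x = torch.ones(2, 3)
alias = x                       # second name for the same tensor object

x += 5                          # in-place: the existing tensor is modified
seen_after_in_place = alias[0, 0].item()

x = x + 5                       # out-of-place: x now refers to a new tensor
seen_after_out_of_place = alias[0, 0].item()

print(seen_after_in_place, seen_after_out_of_place, x[0, 0].item())
```

The in-place update is visible through `alias`, whereas `x = x + 5` creates a new tensor and leaves the one behind `alias` untouched.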


### Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.

#### Tensor to NumPy array

In [17]:
x_torch = torch.ones(5)     # this is a torch tensor

x_numpy = x_torch.numpy()         # convert a tensor to a numpy array
print(f"x_torch {type(x_torch)} =\n{repr(x_torch)}\n")  # repr is for developers, str is for normal users
print(f"x_numpy {type(x_numpy)} =\n{repr(x_numpy)}\n")

print("Difference between repr vs str\n" + 60*"=")
print(f"repr(x_numpy) = {repr(x_numpy)}")
print(f"str(x_numpy) = {str(x_numpy)} \n")

# You don't see the difference for tensor though
print(f"repr(x_torch) = {repr(x_torch)}")
print(f"str(x_torch) = {str(x_torch)}")

x_torch <class 'torch.Tensor'> =
tensor([1., 1., 1., 1., 1.])

x_numpy <class 'numpy.ndarray'> =
array([1., 1., 1., 1., 1.], dtype=float32)

Difference between repr vs str
repr(x_numpy) = array([1., 1., 1., 1., 1.], dtype=float32)
str(x_numpy) = [1. 1. 1. 1. 1.] 

repr(x_torch) = tensor([1., 1., 1., 1., 1.])
str(x_torch) = tensor([1., 1., 1., 1., 1.])


A change in the tensor is reflected in the NumPy array.

In [18]:
x_torch.add_(1)
print(f"x_torch =\n{repr(x_torch)}")
print(f"x_numpy =\n{repr(x_numpy)}")

x_torch =
tensor([2., 2., 2., 2., 2.])
x_numpy =
array([2., 2., 2., 2., 2.], dtype=float32)


#### NumPy array to Tensor

In [15]:
y_numpy = 2 * np.ones(5)
y_torch = torch.from_numpy(y_numpy)

# Again, a change in the NumPy array is reflected in the tensor.
np.add(y_numpy, 1, out=y_numpy)    # this is just y_numpy += 1

print(f"y_torch = {y_torch}")
print(f"y_numpy = {y_numpy}")

y_torch = tensor([3., 3., 3., 3., 3.], dtype=torch.float64)
y_numpy = [3. 3. 3. 3. 3.]
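
Memory sharing applies to `torch.from_numpy()`, but not to `torch.tensor()`, which copies its input data. The sketch below contrasts the two with illustrative names `source`, `shared` and `copied`:

```python
import numpy as np
import torch

source = np.zeros(3)

shared = torch.from_numpy(source)   # shares memory with the NumPy array
copied = torch.tensor(source)       # makes an independent copy of the data

source[0] = 7.0                     # modify the NumPy array afterwards

print(f"shared = {shared}")
print(f"copied = {copied}")
```

Only `shared` sees the change, so `torch.tensor()` is the safer choice when the original array must not be affected by later tensor operations.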


**Remark on default data type in NumPy and Torch** 


If you pay close attention, you will see that NumPy uses the data type `np.float64` by default, whereas Torch uses `torch.float32`. So, if we create a tensor, we get a **float32** data array, and when we convert that tensor to a NumPy array, the NumPy array also has the data type **float32**. On the other hand, if we create a NumPy array, we get a **float64** data array, and converting it to a tensor yields a tensor of data type **float64**. This mismatch commonly causes trouble when we compute the gradient of a function and have to combine a tensor converted from a NumPy array with other tensors of data type **float32**.

In [16]:
x_numpy = np.random.rand(3, 2)
x_torch = torch.from_numpy(x_numpy)
print(f"x_numpy.dtype = {x_numpy.dtype}")
print(f"x_torch.dtype = {x_torch.dtype}")

x_numpy.dtype = float64
x_torch.dtype = torch.float64


In [17]:
x_torch = torch.rand(size=(3, 2))
x_numpy = x_torch.numpy()
print(f"x_numpy.dtype = {x_numpy.dtype}")
print(f"x_torch.dtype = {x_torch.dtype}")

x_numpy.dtype = float32
x_torch.dtype = torch.float32


In [18]:
x = torch.rand(size=(3, 2))
y = torch.from_numpy(np.ones(shape=(3, 2)))
print(x + y)    # the result is automatically promoted to the "bigger" datatype (float64)

tensor([[1.5262, 1.4955],
        [1.4083, 1.3363],
        [1.2626, 1.0254]], dtype=torch.float64)
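
A common way to avoid mixing **float64** and **float32** tensors is to cast right at the conversion step. The following sketch shows two possible variants; the names `x32` and `x32_alt` are illustrative, and which variant is preferable depends on whether memory sharing with the NumPy array is still wanted:

```python
import numpy as np
import torch

x_numpy = np.random.rand(3, 2)                             # float64 by default

x64 = torch.from_numpy(x_numpy)                            # float64, shares memory
x32 = torch.from_numpy(x_numpy).to(torch.float32)          # cast in Torch (makes a copy)
x32_alt = torch.from_numpy(x_numpy.astype(np.float32))     # cast in NumPy first

print(x64.dtype, x32.dtype, x32_alt.dtype)
```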


**But** things might go wrong sometimes when we try to train a neural network. We will learn more about **automatic differentiation** in the next Jupyter Notebook.