![image.png](attachment:image.png)

## What is Deep Learning?

Deep learning is a subfield of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain.

![](2022-11-27-12-05-03.png)


## What can we do with Deep Learning?

- Image Classification

- Object Detection

- Image Segmentation

- Image Captioning

- Natural Language Processing




> Diffuses example


![](2022-11-27-11-58-29.png)



![](2022-11-27-11-59-34.png)


![](2022-11-27-11-57-52.png)


![](2022-11-27-12-00-13.png)


![](2022-11-27-12-00-33.png)


![](2022-11-27-12-01-11.png)

## PyTorch


> PyTorch is an open-source, Python-based scientific computing package for building neural networks. It is a dynamic graph-based framework that lets you define your neural network in a way that is easy to understand and debug. Today, PyTorch is the [most used deep learning framework](https://paperswithcode.com/trends) and is mostly used by researchers and engineers.



> PyTorch supports GPU acceleration behind the scenes (making your code run faster), which NumPy does not offer. PyTorch also provides Autograd for automatic differentiation: it computes the gradients of your computations for you, which is exactly what backpropagation needs.
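As a minimal sketch of what Autograd does, the snippet below differentiates a small function automatically. The function `y = x**2 + 2*x` and the value `x = 3` are arbitrary choices made for illustration only.

```python
import torch

# Track operations on x so that gradients can be computed later
x = torch.tensor(3.0, requires_grad=True)

# An arbitrary example function: y = x^2 + 2x
y = x**2 + 2 * x

# Backpropagate: computes dy/dx and stores it in x.grad
y.backward()

print(x.grad)  # dy/dx = 2x + 2, which is 8 at x = 3
```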

![](2022-11-27-12-07-54.png)

![](2022-11-27-12-08-19.png)


![](./pytorch_most_used.png)



## Who uses PyTorch?

- Many of the world's largest technology companies, such as [Meta (Facebook)](https://ai.facebook.com/blog/pytorch-builds-the-future-of-ai-and-machine-learning-at-facebook/), Tesla and Microsoft, as well as artificial intelligence research companies such as [OpenAI](https://openai.com/blog/openai-pytorch/), use PyTorch to power research and bring machine learning to their products.

![pytorch being used across industry and research](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-being-used-across-research-and-industry.png)

> Originally from Meta


## What is the difference between a GPU and a CPU?

- A GPU (graphics processing unit) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. Its many parallel cores also make it well suited to the tensor arithmetic used in deep learning.

- A CPU (central processing unit) is the electronic circuitry within a computer that executes the instructions that make up a computer program.

![](2022-11-27-12-13-52.png)
![](2022-11-27-12-14-26.png)

![](2022-11-27-12-16-28.png)



## I don't have a GPU, what can I do? Use a free GPU

- Google Colab is a free cloud service that works much like a Jupyter Notebook.

- You can write and execute code, save and share your analyses.

- Other free GPU services include [Kaggle Kernels](https://www.kaggle.com/kernels) and [Paperspace Gradient](https://www.paperspace.com/gradient).



## PyTorch Installation



If you're using Google Colab, you're all set.

If you're using your own computer, you'll need to install PyTorch.

To do this, you can follow the instructions on the [PyTorch website](https://pytorch.org/get-started/locally/).

I am using a Mac with conda as the package manager, so I run the conda install command that the PyTorch website generates for that configuration.


## Verification

To verify that your installation works, import `torch` and print its version.



In [1]:
import torch
torch.manual_seed(1234)
torch.__version__

'1.12.1'
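Beyond the version check, it can help to see which device PyTorch will be able to use. The following is a small sketch; the variable name `device` is just an illustrative choice, and the `getattr` guard is there because older PyTorch builds may not have the MPS backend.

```python
import torch

# Pick the best available device: an NVIDIA GPU (CUDA),
# an Apple-silicon GPU (MPS), or the CPU as a fallback
if torch.cuda.is_available():
    device = "cuda"
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"PyTorch {torch.__version__} will use device: {device}")
```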

## What we're going to cover in this module

- This course is broken down into different sections (notebooks).  Each notebook covers important ideas and concepts within PyTorch.

- Subsequent notebooks build upon knowledge from the previous one (numbering starts at 00, 01, 02 and goes to whatever it ends up going to).

- This notebook deals with the basic building block of machine learning and deep learning, the tensor.

Specifically, we're going to cover:

| **Topic** | **Contents** |
| ----- | ----- |
| **Introduction to tensors** | Tensors are the basic building block of all of machine learning and deep learning. |
| **Creating tensors** | Tensors can represent almost any kind of data (images, words, tables of numbers). |
| **Getting information from tensors** | If you can put information into a tensor, you'll want to get it out too. |
| **Manipulating tensors** | Machine learning algorithms (like neural networks) involve manipulating tensors in many different ways such as adding, multiplying, combining. | 
| **Dealing with tensor shapes** | One of the most common issues in machine learning is dealing with shape mismatches (trying to mix wrongly shaped tensors with other tensors). |
| **Indexing on tensors** | If you've indexed on a Python list or NumPy array, it's very similar with tensors, except they can have far more dimensions. |
| **Mixing PyTorch tensors and NumPy** | PyTorch plays with tensors ([`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html)), NumPy likes arrays ([`np.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)); sometimes you'll want to mix and match these. | 
| **Reproducibility** | Machine learning is very experimental and since it uses a lot of *randomness* to work, sometimes you'll want that *randomness* to not be so random. |
| **Running tensors on GPU** | GPUs (Graphics Processing Units) make your code faster, PyTorch makes it easy to run your code on GPUs. |


## What is a Tensor?

> A tensor is the primary data structure used by neural networks.

The fundamental building block of deep learning is the tensor. A tensor is a number, vector, matrix or any n-dimensional array.

![](2022-11-27-11-56-29.png)
![](2022-11-27-12-17-20.png)


## What to learn

![](2022-11-27-12-18-42.png)


![](2022-11-27-12-19-03.png)


![](2022-11-27-12-19-35.png)

![](2022-11-27-12-19-51.png)


- Tensors are the standard way of representing data in PyTorch, such as **text, images, and audio**. 

- Their job is to represent data in a numerical way. 


- For example, you could represent an image as a tensor with shape `[3, 224, 224]` which would mean `[colour_channels, height, width]`, as in the image has `3` colour channels (red, green, blue), a height of `224` pixels and a width of `224` pixels.
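The image example above can be sketched in code. Random values stand in for real pixel data here, and the variable name `image` is only illustrative.

```python
import torch

# A random stand-in for an RGB image: [colour_channels, height, width]
image = torch.rand(3, 224, 224)

print(image.shape)    # torch.Size([3, 224, 224])
print(image.ndim)     # 3
print(image.numel())  # 150528 values in total (3 * 224 * 224)
```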


In [2]:
a = [1, 2, 3]  # a Python list

import numpy as np
b = np.array([1, 2, 3])  # a NumPy array


print(a)
print(b)

[1, 2, 3]
[1 2 3]


## Road to Tensor

- Python offers several data structures for holding data, including the Python list and the NumPy array. 

- List and NumPy array operations are similar to PyTorch tensor operations.
  

### From Python lists to NumPy arrays

 - Python has no built-in multi-dimensional array type, but Python lists can be used instead.
 
 - However, Python lists have two limitations: they use a lot of memory and they are slow for numerical work.
 
 - NumPy addresses these problems: 

    - Size - NumPy data structures take up less space.

    - Performance - NumPy operations run in optimized compiled code and are faster than list operations on large arrays.

    - Functionality - SciPy and NumPy have optimized functions such as linear algebra operations built in.
 




### Performance comparison between Python lists and Numpy Arrays


In [3]:
import numpy as np
import time


size_of_vec = 1000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X)) ]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1


t1 = pure_python_version()
t2 = numpy_version()
print(t1, t2)
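print("Ratio of pure Python time to NumPy time: " + str(t1/t2))

0.00016808509826660156 0.0003018379211425781
Ratio of pure Python time to NumPy time: 0.556872037914692

> **Note:** A ratio below 1 means NumPy was actually *slower* than pure Python in this run. With only 1,000 elements, the fixed cost of creating the arrays dominates, and a single `time.time()` measurement is too coarse to be reliable. NumPy's advantage typically shows up on much larger vectors.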
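A more robust comparison is sketched below. It assumes a larger vector (one million elements, an arbitrary choice), builds the inputs before timing, and uses `timeit` with several repeats. The helper names `add_lists` and `add_arrays` are illustrative, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

size_of_vec = 1_000_000  # assumed size, large enough for the difference to show

# Build the inputs once so that only the addition itself is timed
x_list = list(range(size_of_vec))
y_list = list(range(size_of_vec))
x_arr = np.arange(size_of_vec)
y_arr = np.arange(size_of_vec)

def add_lists():
    # Element-by-element addition in pure Python
    return [a + b for a, b in zip(x_list, y_list)]

def add_arrays():
    # Vectorized addition in NumPy
    return x_arr + y_arr

# Take the best of several repeats to reduce measurement noise
t_list = min(timeit.repeat(add_lists, number=1, repeat=5))
t_numpy = min(timeit.repeat(add_arrays, number=1, repeat=5))

print(f"pure Python: {t_list:.5f} s, NumPy: {t_numpy:.5f} s")
print(f"NumPy speed-up: {t_list / t_numpy:.1f}x")
```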


### Less memory usage


In [4]:
import sys

list_array = [i for i in range(100)]

print("Length of List array: ",len(list_array))

print("Type of array: ",type(list_array))

print("Memory consumption by an element in the array: ",
        sys.getsizeof(list_array[0]))

print("Memory usage by list array: ",
        (sys.getsizeof(list_array[0]))*len(list_array))

Length of List array:  100
Type of array:  <class 'list'>
Memory consumption by an element in the array:  24
Memory usage by list array:  2400


In [5]:
import numpy as np

numpy_array = np.arange(100)

print("Length of NumPy array: ",len(numpy_array))

print("Type of array: ",type(numpy_array))

print("Memory consumption by an element in the array: ",
        numpy_array.itemsize)

print("Memory usage by NumPy array: ",
        numpy_array.itemsize*numpy_array.size)

Length of NumPy array:  100
Type of array:  <class 'numpy.ndarray'>
Memory consumption by an element in the array:  8
Memory usage by NumPy array:  800
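The list estimate above multiplies the size of one element by the length, which leaves out the list object itself. A slightly fuller sketch is shown below, assuming CPython. Note that CPython shares small integer objects between containers, so the per-element sum is an approximation rather than an exact accounting.

```python
import sys
import numpy as np

list_array = list(range(100))
numpy_array = np.arange(100)

# A list stores pointers to separate int objects, so count both parts
list_total = sys.getsizeof(list_array) + sum(sys.getsizeof(x) for x in list_array)

# A NumPy array stores the raw values in one contiguous buffer
numpy_data = numpy_array.nbytes

print(f"list (container + elements): {list_total} bytes")
print(f"NumPy array data buffer:     {numpy_data} bytes")
```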


In [6]:
a_numpy = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(f"shape of the nd array: {a_numpy.shape}")  # length of each dimension
print(f"size of the nd array: {a_numpy.size}") # total number of elements
print(f"dimension of the nd array: {a_numpy.ndim}") # number of dimensions
print(f"datatype of the nd array: {a_numpy.dtype}") # datatype of the elements


shape of the nd array: (10,)
size of the nd array: 10
dimension of the nd array: 1
datatype of the nd array: int64


## From Numpy Arrays to Torch Tensor

> Tensors are like arrays: both are data structures used to store data. Tensors and NumPy arrays share common attributes such as shape and size.

> Tensors are a generalization of vectors and matrices to an arbitrary number of dimensions. 

![](./tensor_generalization.png)


- Just as NumPy provides features that Python lists lack, tensors provide features that NumPy arrays lack, such as:

  - GPU acceleration, which is a great advantage for deep learning,
  
  - distributing operations across multiple devices or machines, and

  - keeping track of the graph of computations that created them (useful for backpropagation).
 

## Let us Learn Tensor

Various operations are available on tensors. In the next sections, we will discuss the following operations:

- Creating tensors.
  
- Operations with tensors.
  
- Indexing, slicing, and joining with tensors.
  
- Computing gradients with tensors.
  
- Using CUDA/MPS tensors with GPUs.


## Creating tensors

- PyTorch allows us to create tensors in many different ways using the torch package. We will discuss some of these ways.

### Creating a tensor from data
 
> `torch.tensor` is a general tensor constructor that infers the data type automatically.


> Tensors can represent almost anything. The 3-dimensional `TENSOR` created below could be the sales numbers for a steak and almond butter store (two of my favourite foods).


![](2022-11-23-16-36-54.png)

How many dimensions do you think it has? (hint: use the square bracket counting trick)



In [7]:
import torch

a_torch = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

a_torch


tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [8]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

### Scalar

In [9]:
# Scalar
scalar = torch.tensor(7)
scalar # `scalar` is a single number, it's of type `torch.Tensor`.


tensor(7)

In [10]:
scalar.ndim

0

In [11]:
# Get the Python number within a tensor (only works with one-element tensors)
scalar.item()

7

### Vector

- A vector is a one-dimensional tensor that can contain many numbers.

-  For example, you could have a vector `[3, 2]` to describe `[bedrooms, bathrooms]` in your house. 

-  Or you could have `[3, 2, 2]` to describe `[bedrooms, bathrooms, car_parks]` in your house.

- The important trend here is that a vector is flexible in what it can represent (the same with tensors).

In [12]:
# Vector
vector = torch.tensor([7, 7, 7, 7]) 
vector

tensor([7, 7, 7, 7])

> You can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside (`[`) and you only need to count one side.


- How many square brackets does `vector` have?

- Another important concept for tensors is their `shape` attribute. The shape tells you how the elements inside them are arranged.

Let's check out the shape of `vector`.



In [13]:
vector.shape

torch.Size([4])

> The above returns `torch.Size([4])`, which means our vector has a shape of `[4]`. This is because of the four elements we placed inside the square brackets (`[7, 7, 7, 7]`).

In [14]:
vector2 = torch.tensor([7, 7, 7, 7])
print(vector2.shape)  # shape: one dimension holding four elements
print(vector2.ndim)   # number of dimensions
print(vector2.dtype)  # datatype of the elements

torch.Size([4])
1
torch.int64


### Matrix


A matrix is a 2-dimensional tensor. We can check the number of dimensions of a tensor using the `ndim` attribute.

In [15]:
# Matrix
MATRIX = torch.tensor([[7, 8], 
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [16]:
# Check number of dimensions for MATRIX
MATRIX.ndim

2

What about shape?

In [17]:
# Check shape of MATRIX
MATRIX.shape

torch.Size([2, 2])

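Alright, it outputs `torch.Size([2, 2])` because `MATRIX` has two rows and two columns.

For the 3-dimensional `TENSOR` we created earlier, `TENSOR.shape` gives `torch.Size([1, 3, 3])`.

The dimensions go outer to inner, so that means there's 1 dimension of 3 by 3.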

![example of different tensor dimensions](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-different-tensor-dimensions.png)

> **Note:** You might've noticed me using lowercase letters for `scalar` and `vector` and uppercase letters for `MATRIX` and `TENSOR`. This was on purpose. In practice, you'll often see scalars and vectors denoted as lowercase letters such as `y` or `a`. And matrices and tensors denoted as uppercase letters such as `X` or `W`.
>
> You also might notice the names matrix and tensor used interchangeably. This is common: in PyTorch you're usually dealing with a `torch.Tensor` (hence the tensor name), but the shape and dimensions of what's inside dictate what it actually is.

Let's summarise.

| Name | What is it? | Number of dimensions | Lower or upper (usually/example) |
| ----- | ----- | ----- | ----- |
| **scalar** | a single number | 0 | Lower (`a`) | 
| **vector** | a number with direction (e.g. wind speed with direction) but can also have many other numbers | 1 | Lower (`y`) |
| **matrix** | a 2-dimensional array of numbers | 2 | Upper (`Q`) |
| **tensor** | an n-dimensional array of numbers | can be any number, a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector | Upper (`X`) | 

![scalar vector matrix tensor and what they look like](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-scalar-vector-matrix-tensor.png)
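The summary table can be checked directly in code. The sketch below builds one example of each kind (the dictionary name `examples` is illustrative) and prints its number of dimensions and shape.

```python
import torch

examples = {
    "scalar": torch.tensor(7),
    "vector": torch.tensor([7, 7]),
    "matrix": torch.tensor([[7, 8], [9, 10]]),
    "tensor": torch.tensor([[[1, 2, 3], [3, 6, 9], [2, 4, 5]]]),
}

# ndim counts the nesting depth of the square brackets
for name, t in examples.items():
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```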

## Random Tensor

> A machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.


- We've established tensors represent some form of data.

- And machine learning models such as neural networks manipulate and seek patterns within tensors.

- But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've been doing).

- Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

In essence:

`Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...`

- As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.

- We can create random tensors using [`torch.rand()`](https://pytorch.org/docs/stable/generated/torch.rand.html) and passing in the `size` parameter.
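A short sketch of `torch.rand()` with the `size` parameter follows. The shapes are arbitrary examples, and the values differ on every run unless a seed is set.

```python
import torch

# A random tensor of shape (3, 4); values are drawn uniformly from [0, 1)
random_tensor = torch.rand(size=(3, 4))

# A random tensor shaped like an image: [height, width, colour_channels]
random_image = torch.rand(size=(224, 224, 3))

print(random_tensor.shape, random_tensor.dtype)
print(random_image.shape, random_image.ndim)
```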

### 1: Creating a tensor with `torch.tensor()`

In [18]:
import torch # similar to import numpy 

a_random = torch.tensor((3,4)) # Create a tensor from a tuple holding the values 3 and 4 (the values are not random)
print(a_random)

tensor([3, 4])


In [19]:
a_random = torch.tensor([3,4]) # Create the same tensor from a list
print(a_random)

tensor([3, 4])


In [20]:
print(a_random.shape) # print the shape of the tensor
print(a_random.size()) # print the size of the tensor (same as .shape)
print(type(a_random)) # print the Python type of the tensor
print(a_random.type()) # print the tensor type, which reflects its dtype

torch.Size([2])
torch.Size([2])
<class 'torch.Tensor'>
torch.LongTensor


> Note: `.shape` is an alias for `.size()`; it was added to closely match NumPy.
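One naming difference is worth a quick sketch: in PyTorch, `.size()` returns the shape, while in NumPy `.size` is the total number of elements. The variable names `t` and `n` are illustrative.

```python
import numpy as np
import torch

t = torch.zeros(2, 3)
n = np.zeros((2, 3))

print(t.shape == t.size())  # True: both are torch.Size([2, 3])
print(t.size(0))            # 2, the length of the first dimension
print(t.numel())            # 6, the total number of elements
print(n.shape, n.size)      # (2, 3) 6: in NumPy, .size is the element count
```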



In [21]:
print(f"shape of the nd array: {a_numpy.shape}")  # length of each dimension
print(f"size of the nd array: {a_numpy.size}") # total number of elements
print(f"dimension of the nd array: {a_numpy.ndim}") # number of dimensions
print(f"datatype of the nd array: {a_numpy.dtype}") # datatype of the elements

shape of the nd array: (10,)
size of the nd array: 10
dimension of the nd array: 1
datatype of the nd array: int64


- Instead of letting `torch.tensor` determine the data type automatically, you can specify it explicitly with the `dtype` parameter.


In [22]:
import torch

a_random = torch.tensor((3,4), dtype=torch.float32) # Create a float32 tensor from integer data
print(a_random)

tensor([3., 4.])


In [23]:
print(a_random.shape) # print the shape of the tensor
print(a_random.size()) # print the size of the tensor
print(type(a_random)) # print the Python type of the tensor
print(a_random.type()) # print the tensor type, which reflects the dtype

torch.Size([2])
torch.Size([2])
<class 'torch.Tensor'>
torch.FloatTensor


- You can also change an existing tensor type. 


In [24]:
a_torch = torch.tensor([1, 2, 3]) 

print(a_torch.type()) # Tensor type

torch.LongTensor


We can convert from `LongTensor` to other types:


In [25]:
a_short =  a_torch.short() # Convert to short,  
a_float =  a_torch.float() # Convert to float()

print(a_short.type()) # Tensor type
print(a_float.type()) # Tensor type

torch.ShortTensor
torch.FloatTensor


> Note: A variant of the `torch.tensor` constructor is the `torch.FloatTensor` constructor. 

> When it is used, the result is always a `FloatTensor`, whatever the type of the input data. In fact, `torch.Tensor` is an alias for the default tensor type, which is `torch.FloatTensor`.

- The following two examples are equivalent:


In [26]:
a_random = torch.Tensor((3,4)) # Create a float tensor holding the values 3 and 4
b_random = torch.FloatTensor((3,4)) # Same result using the explicit FloatTensor constructor

print(a_random.type())
print(b_random.type())

torch.FloatTensor
torch.FloatTensor
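The practical difference between lowercase `torch.tensor` and uppercase `torch.Tensor` can be sketched as follows, assuming the default floating-point type has not been changed from `float32`. The variable names are illustrative.

```python
import torch

data = [3, 4]

inferred = torch.tensor(data)      # dtype is inferred from the integers
always_float = torch.Tensor(data)  # always the default float type

print(inferred.dtype)      # torch.int64
print(always_float.dtype)  # torch.float32

# Pitfall: passing separate integers to torch.Tensor is read as a shape,
# which allocates an uninitialized 3x4 tensor rather than storing 3 and 4
print(torch.Tensor(3, 4).shape)  # torch.Size([3, 4])
```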




### 2: Creating Tensors from Random Numbers

Similar to NumPy, we can create a tensor filled with random numbers.


In [27]:
a_random_torch = torch.randn(2, 3) # random numbers from the standard normal distribution (mean 0, variance 1)
# a_numpy_rand = np.random.randn(2,3) #numpy random normal distribution

print(a_random_torch)
# print(a_numpy_rand)

tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])


In [28]:
a_random_torch = torch.rand(2, 3) # random numbers from the uniform distribution on [0, 1)
# a_numpy_rand = np.random.rand(2,3) 

print(a_random_torch)
# print(a_numpy_rand)

tensor([[0.7749, 0.8208, 0.2793],
        [0.6817, 0.2837, 0.6567]])
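The difference between `torch.rand` and `torch.randn` is easier to see on a large sample. This is a rough sketch: the seed and the sample size are arbitrary choices, and the printed statistics are only approximately 0.5, 0 and 1.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable

uniform = torch.rand(100_000)  # uniform on [0, 1)
normal = torch.randn(100_000)  # standard normal: mean 0, standard deviation 1

print(f"rand : min={uniform.min().item():.3f}, max={uniform.max().item():.3f}, "
      f"mean={uniform.mean().item():.3f}")
print(f"randn: min={normal.min().item():.3f}, max={normal.max().item():.3f}, "
      f"mean={normal.mean().item():.3f}, std={normal.std().item():.3f}")
```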


### 3: Creating a filled tensor


- We can create a tensor with a specific value filled in all the elements.

   

In [29]:
a_same_scalar = torch.zeros(3,3)
print(a_same_scalar)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


In [30]:
torch.ones(3, 3) # torch.ones(size=(3, 3)) 

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

> Any PyTorch method with a trailing underscore (`_`), such as `fill_()`, is an in-place operation: it modifies the tensor it is called on.


In [31]:
a_zero = torch.zeros(2, 3)
print(a_zero)

tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [32]:
print(a_zero.fill_(5)) # inplace operation
# a_zero is now filled with 5

tensor([[5., 5., 5.],
        [5., 5., 5.]])


In [33]:
print(a_zero)  

tensor([[5., 5., 5.],
        [5., 5., 5.]])
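The same convention applies to other methods. As a small sketch, compare `add()` with its in-place counterpart `add_()` (the variable names are illustrative):

```python
import torch

x = torch.zeros(2, 3)

y = x.add(1)  # out-of-place: returns a new tensor, x is unchanged
print(x.sum().item(), y.sum().item())  # 0.0 6.0

x.add_(1)     # in-place: modifies x itself
print(x.sum().item())                  # 6.0
```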


### 4: Creating and initializing a tensor from lists


In [34]:
a_list = torch.tensor([1, 2, 3]) # create a tensor from a Python list
a_list

tensor([1, 2, 3])

### 5: Creating and initializing a tensor from numpy arrays


- We use `torch.from_numpy` to create a tensor from a numpy array.


In [35]:
import numpy as np
numpy_array = np.random.rand(2, 3) 
numpy_array

torch_tensor = torch.from_numpy(numpy_array) # tensor from numpy array
torch_tensor

tensor([[0.8795, 0.8838, 0.8693],
        [0.0713, 0.2840, 0.0836]], dtype=torch.float64)

In [36]:
torch_tensor.type()

'torch.DoubleTensor'

- The datatype of the tensor created from the NumPy array is `DoubleTensor` instead of the default `FloatTensor`. 

- This corresponds to the data type of the NumPy random matrix, `float64`.


> You can always convert a PyTorch tensor to a NumPy array by calling the tensor's `numpy()` method (`Tensor.numpy()`).


In [37]:
torch_tensor = torch_tensor.numpy() # numpy array from tensor 
type(torch_tensor)  

numpy.ndarray
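One detail to keep in mind is that `torch.from_numpy()` shares memory with the source array, whereas `torch.tensor()` copies the data. A minimal sketch with illustrative variable names:

```python
import numpy as np
import torch

array = np.ones(3)
shared = torch.from_numpy(array)  # shares memory with `array`
copied = torch.tensor(array)      # makes an independent copy

array[0] = 100.0  # modify the NumPy array in place

print(shared)  # the first element is now 100.0
print(copied)  # still all ones
```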

In [38]:
list(range(1,10,2)) # range(start, stop, step)

[1, 3, 5, 7, 9]

### 6: Creating a range


In [39]:
# Use torch.arange(), torch.range() is deprecated 
zero_to_ten = torch.arange(0, 10) # similar to NumPy's arange() and Python's range(); the end value is excluded
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
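Like Python's `range(start, stop, step)`, `torch.arange()` accepts a step. A brief sketch with arbitrary example values:

```python
import torch

odds = torch.arange(1, 10, 2)        # same values as list(range(1, 10, 2))
quarters = torch.arange(0, 1, 0.25)  # unlike range(), float steps are allowed

print(odds)      # tensor([1, 3, 5, 7, 9])
print(quarters)  # tensor([0.0000, 0.2500, 0.5000, 0.7500])
```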

### 7: Creating a tensor with the same shape as another tensor


In [40]:
zero_to_ten = torch.arange(0, 10) # similar to NumPy's arange() and Python's range()
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [41]:
zero_to_ten.shape

torch.Size([10])

In [42]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
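`torch.zeros_like()` has several siblings that follow the same pattern. The sketch below uses a float tensor as the template, since `torch.rand_like()` needs a floating-point dtype. The variable names are illustrative.

```python
import torch

reference = torch.rand(2, 3)  # a float32 tensor used as the template

ones = torch.ones_like(reference)         # same shape and dtype, filled with 1
sevens = torch.full_like(reference, 7.0)  # same shape and dtype, filled with 7
noise = torch.rand_like(reference)        # same shape, new uniform random values

print(ones.shape, sevens.shape, noise.shape)
```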

##  Tensor datatypes

In [43]:
a = 2

b = 3.0

# type of b
print(type(b))

<class 'float'>


In [44]:
a = torch.tensor([2.0,3.0])

In [45]:
a.dtype

torch.float32

### Tensor datatypes

- There are many different [tensor datatypes available in PyTorch](https://pytorch.org/docs/stable/tensors.html#data-types).

- Some are specific for CPU and some are better for GPU.

- Getting to know which is which can take some time.

- Generally if you see `torch.cuda` anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing toolkit called CUDA).

- The most common type (and generally the default) is `torch.float32` or `torch.float`.

- This is referred to as "32-bit floating point".

- But there's also 16-bit floating point (`torch.float16` or `torch.half`) and 64-bit floating point (`torch.float64` or `torch.double`).

- And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.

Plus more!



> **Note:** An integer is a flat round number like `7` whereas a float has a decimal `7.0`.

- The reason for all of these is to do with **precision in computing**.

- Precision is the amount of detail used to describe a number.

- The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.

- This matters in deep learning and numerical computing because you're making so many operations, the more detail you have to calculate on, the more compute you have to use.

- So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).

> **Resources:** 
  * See the [PyTorch documentation for a list of all available tensor datatypes](https://pytorch.org/docs/stable/tensors.html#data-types).
  * Read the [Wikipedia page for an overview of what precision in computing](https://en.wikipedia.org/wiki/Precision_(computer_science)) is.
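To make the precision trade-off concrete, the sketch below uses `torch.finfo` to compare the floating-point types, and then shows a value that `float16` cannot distinguish from `1.0`. The value `1.0001` is an arbitrary example.

```python
import torch

for dtype in (torch.float16, torch.float32, torch.float64):
    info = torch.finfo(dtype)
    # bits: storage per number; eps: gap between 1.0 and the next representable value
    print(f"{dtype}: {info.bits} bits, eps={info.eps}")

# 1.0001 cannot be told apart from 1.0 in float16, but it can in float32
print(torch.tensor(1.0001, dtype=torch.float16).item() == 1.0)  # True
print(torch.tensor(1.0001, dtype=torch.float32).item() == 1.0)  # False
```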

Let's see how to create some tensors with specific datatypes. We can do so using the `dtype` parameter.

In [46]:
# Default datatype for floating-point tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the datatype from the data (float32 for Python floats)
                               device=None, # defaults to None, which uses the default device (the CPU unless changed)
                               requires_grad=False) # if True, operations performed on the tensor are recorded for autograd

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

In [47]:
a * b

tensor([6., 9.])

- Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and device issues.

- For example, one of your tensors is `torch.float32` and the other is `torch.float16` (PyTorch often likes tensors to be the same format).

- Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).

- We'll see more of this device talk later on.

- For now let's create a tensor with `dtype=torch.float16`.

In [48]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

torch.float16

> By default, a floating-point tensor is of type `torch.float32`, which is equivalent to `torch.float`; likewise, `torch.double` is equivalent to `torch.float64`. 
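When datatypes differ, element-wise operations usually promote the result to the wider type, and you can convert explicitly with `.to()`. This is a small sketch with illustrative variable names; other operations, such as matrix multiplication, can be stricter about matching dtypes.

```python
import torch

a16 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
a32 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float32)

# Element-wise operations promote mixed inputs to the wider type
print((a16 * a32).dtype)  # torch.float32

# .to() returns a converted tensor; the original keeps its dtype
b32 = a16.to(torch.float32)
print(a16.dtype, b32.dtype)  # torch.float16 torch.float32
```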

## Getting information from tensors

- Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before but three of the most common attributes you'll want to find out about tensors are:

* `shape` - what shape is the tensor? (some operations require specific shape rules)

* `dtype` - what datatype are the elements within the tensor stored in?

* `device` - what device is the tensor stored on? (usually GPU or CPU)

Let's create a random tensor and find out details about it.

In [49]:
# Create a tensor
some_tensor = torch.rand(3, 4) # create a random tensor (on the CPU by default)

# Find out details about it
print(some_tensor)


tensor([[0.2388, 0.7313, 0.6012, 0.3043],
        [0.2548, 0.6294, 0.9665, 0.7399],
        [0.4517, 0.4757, 0.7842, 0.1525]])


In [50]:
print(f"Shape of tensor: {some_tensor.shape}")

Shape of tensor: torch.Size([3, 4])


In [51]:
print(f"Datatype of tensor: {some_tensor.dtype}")


Datatype of tensor: torch.float32


In [52]:
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

Device tensor is stored on: cpu


> **Note:** When you run into issues in PyTorch, it's very often to do with one of the three attributes above. So when the error messages show up, sing yourself a little song called "what, what, where": 
  * "*what shape are my tensors? what datatype are they and where are they stored? what shape, what datatype, where where where*"
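The "what, what, where" check can be wrapped in a tiny helper. The function `what_what_where` below is a hypothetical convenience written for this sketch, not part of PyTorch.

```python
import torch

def what_what_where(t: torch.Tensor) -> str:
    """Hypothetical helper: summarise shape, datatype and device in one line."""
    return f"shape={tuple(t.shape)}, dtype={t.dtype}, device={t.device}"

some_tensor = torch.rand(3, 4)
summary = what_what_where(some_tensor)
print(summary)  # e.g. shape=(3, 4), dtype=torch.float32, device=cpu on a default CPU setup
```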

![](2022-12-02-18-28-56.png)
![](2022-12-02-18-29-41.png)

Note: `torch.tensor()` always copies data. If you have a tensor and just want to change its `requires_grad` flag, use `requires_grad_()` or `detach()` to avoid a copy. If you have a NumPy array and want to avoid a copy, use `torch.as_tensor()`.




## Manipulating tensors (tensor operations)

- In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

- A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

Tensor operations include:
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

- And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.

- Stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like lego!).


### Basic operations

Let's start with a few of the fundamental operations, addition (`+`), subtraction (`-`), multiplication (`*`).

They work just as you think they would.

In [53]:
# Create a tensor and add 10 to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [54]:
# Multiply tensor by 10
tensor * 10

tensor([10, 20, 30])

In [55]:
# Tensors don't change unless reassigned
tensor

tensor([1, 2, 3])

Let's subtract a number. As before, the original `tensor` stays unchanged because the result is not reassigned. 

In [56]:
# Subtract 10
tensor - 10

tensor([-9, -8, -7])

PyTorch also has a bunch of built-in functions like [`torch.mul()`](https://pytorch.org/docs/stable/generated/torch.mul.html#torch.mul) (short for multiplication) and [`torch.add()`](https://pytorch.org/docs/stable/generated/torch.add.html) to perform basic operations. 

In [57]:
# Try out PyTorch in-built functions
torch.mul(tensor, 10)

tensor([10, 20, 30])

In [58]:
torch.add(tensor, 10)

tensor([11, 12, 13])

However, it's more common to use the operator symbols like `*` instead of `torch.mul()`

In [59]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])
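Division, which is listed among the basic operations, follows the same pattern. A short sketch, reusing the values `[1, 2, 3]`:

```python
import torch

tensor = torch.tensor([1, 2, 3])

halves = tensor / 2                                    # true division returns floats
same = torch.div(tensor, 2)                            # function form of the same operation
floored = torch.div(tensor, 2, rounding_mode="floor")  # floor division keeps integers

print(halves)   # tensor([0.5000, 1.0000, 1.5000])
print(same)     # tensor([0.5000, 1.0000, 1.5000])
print(floored)  # tensor([0, 1, 1])
```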


### Matrix multiplication (is all you need)

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is [matrix multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html).

PyTorch implements matrix multiplication functionality in the [`torch.matmul()`](https://pytorch.org/docs/stable/generated/torch.matmul.html) method.

The main two rules for matrix multiplication to remember are:
1. The **inner dimensions** must match:
  * `(3, 2) @ (3, 2)` won't work
  * `(2, 3) @ (3, 2)` will work
  * `(3, 2) @ (2, 3)` will work
2. The resulting matrix has the shape of the **outer dimensions**:
 * `(2, 3) @ (3, 2)` -> `(2, 2)`
 * `(3, 2) @ (2, 3)` -> `(3, 3)`

> **Note:** "`@`" in Python is the symbol for matrix multiplication.

> **Resource:** You can see all of the rules for matrix multiplication using `torch.matmul()` [in the PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html).
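The two rules can be checked quickly with random matrices. This is a sketch: the names `a`, `b` and `mismatch_failed` are illustrative.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

print((a @ b).shape)  # torch.Size([2, 2])
print((b @ a).shape)  # torch.Size([3, 3])

mismatch_failed = False
try:
    b @ b  # (3, 2) @ (3, 2): inner dimensions 2 and 3 do not match
except RuntimeError as err:
    mismatch_failed = True
    print(f"RuntimeError: {err}")
```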

Let's create a tensor and perform element-wise multiplication and matrix multiplication on it.



In [60]:
import torch
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

The difference between element-wise multiplication and matrix multiplication is the addition of values.

For our `tensor` variable with values `[1, 2, 3]`:

| Operation | Calculation | Code |
| ----- | ----- | ----- |
| **Element-wise multiplication** | `[1*1, 2*2, 3*3]` = `[1, 4, 9]` | `tensor * tensor` |
| **Matrix multiplication** | `[1*1 + 2*2 + 3*3]` = `[14]` | `tensor.matmul(tensor)` |


In [61]:
tensor

tensor([1, 2, 3])

In [62]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [63]:
# Matrix multiplication
torch.matmul(tensor, tensor)

tensor(14)

In [64]:
# Can also use the "@" symbol for matrix multiplication, though not recommended
tensor @ tensor

tensor(14)

#### Performance

You can do matrix multiplication by hand but it's not recommended.

The in-built `torch.matmul()` method is faster.

In [67]:
%%time
# Matrix multiplication by hand 
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

CPU times: user 791 µs, sys: 1.6 ms, total: 2.39 ms
Wall time: 1.83 ms


tensor(14)

In [68]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 101 µs, sys: 47 µs, total: 148 µs
Wall time: 125 µs


tensor(14)

## One of the most common errors in deep learning (shape errors)

- Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is shape mismatches.

In [69]:
# Shapes need to be compatible for matrix multiplication
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)                

In [70]:
# Check the shapes of tensor_B and tensor_A
print(tensor_B.shape)
print(tensor_A.shape)

torch.Size([3, 2])
torch.Size([3, 2])


In [71]:
torch.matmul(tensor_A, tensor_B) # this will raise an error

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

We can make matrix multiplication work between `tensor_A` and `tensor_B` by making their inner dimensions match.

One of the ways to do this is with a **transpose** (switch the dimensions of a given tensor).

You can perform transposes in PyTorch using either:
* `torch.transpose(input, dim0, dim1)` - where `input` is the desired tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
* `tensor.T` - where `tensor` is the desired tensor to transpose.

Let's try the latter.

In [72]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [73]:
# Check the shapes of tensor_B and tensor_A
print(tensor_B.shape)
print(tensor_A.shape)

torch.Size([3, 2])
torch.Size([3, 2])


In [74]:
tensor_B

tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])

In [75]:
tensor_B.T

tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])

In [76]:
tensor_A

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

In [77]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [78]:
tensor_BB = tensor_B.T

In [79]:
tensor_BB

tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])

In [80]:
print(tensor_B.T)
print(tensor_B.shape)

tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])
torch.Size([3, 2])
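
For comparison, here is a small sketch of the other option, `torch.transpose()`. It rebuilds `tensor_B` so it runs on its own, and shows that the result matches `tensor_B.T` while the original tensor keeps its shape.

```python
import torch

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# Swap dimension 0 and dimension 1 explicitly.
transposed = torch.transpose(tensor_B, 0, 1)

print(transposed.shape)                     # torch.Size([2, 3])
print(torch.equal(transposed, tensor_B.T))  # True
print(tensor_B.shape)                       # original is unchanged: torch.Size([3, 2])
```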


> New shapes

In [81]:
# The operation works when tensor_B is transposed

print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])



In [82]:
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


You can also use [`torch.mm()`](https://pytorch.org/docs/stable/generated/torch.mm.html), which performs the same matrix multiplication for 2-D tensors (unlike `torch.matmul()`, it does not broadcast or handle batches).

In [83]:
# torch.mm gives the same result as matmul for 2-D tensors
torch.mm(tensor_A, tensor_B.T)

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])
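
One place where `torch.mm()` and `torch.matmul()` differ is batched input. The sketch below uses made-up shapes (a batch of ten 3x2 matrices) to illustrate this; the names `batch` and `weights` are only for this example.

```python
import torch

batch = torch.rand(10, 3, 2)   # a batch of ten 3x2 matrices (made-up sizes)
weights = torch.rand(2, 4)

# torch.matmul broadcasts over the leading batch dimension.
print(torch.matmul(batch, weights).shape)  # torch.Size([10, 3, 4])

# torch.mm only accepts 2-D inputs, so the same call fails.
try:
    torch.mm(batch, weights)
    mm_failed = False
except RuntimeError as err:
    mm_failed = True
    print(f"torch.mm error: {err}")
```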

Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like the one above.

How about a visual? 

![visual demo of matrix multiplication](https://github.com/mrdbourke/pytorch-deep-learning/raw/main/images/00-matrix-multiply-crop.gif)

You can create your own matrix multiplication visuals like this at http://matrixmultiplication.xyz/.

> **Note:** A matrix multiplication like this is built from [**dot products**](https://www.mathsisfun.com/algebra/vectors-dot-product.html): each element of the output is the dot product of a row of the first matrix with a column of the second.



Neural networks are full of matrix multiplications and dot products.

The [`torch.nn.Linear()`](https://pytorch.org/docs/1.9.1/generated/torch.nn.Linear.html) module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input `x` and a weights matrix `A`.

$$
y = x\cdot{A^T} + b
$$

Where:
* `x` is the input to the layer (deep learning is a stack of layers like `torch.nn.Linear()` and others on top of each other).
* `A` is the weights matrix created by the layer, this starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the "`T`", that's because the weights matrix gets transposed).
  * **Note:** You might also often see `W` or another letter like `X` used to showcase the weights matrix.
* `b` is the bias term used to slightly offset the weights and inputs.
* `y` is the output (a manipulation of the input in the hopes to discover patterns in it).

This is a linear function (you may have seen something like $y = mx+b$ in high school or elsewhere), and can be used to draw a straight line!

Let's play around with a linear layer.

Try changing the values of `in_features` and `out_features` below and see what happens.

Do you notice anything to do with the shapes?

In [84]:
tensor_A

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

In [85]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input 
                         out_features=6) # out_features = describes outer value 


In [86]:
tensor_A

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

In [87]:
tensor_A.dim()

2

In [88]:
x = tensor_A
output = linear(x)

print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
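
To connect the layer back to the formula $y = x\cdot{A^T} + b$, the sketch below recomputes the output by hand from the layer's own `weight` and `bias` attributes. It recreates the layer and input so it can run on its own.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

# linear.weight has shape (out_features, in_features) = (6, 2),
# so it is transposed before the multiplication.
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape)                # torch.Size([6, 2])
print(torch.allclose(linear(x), manual))  # True
```

The weight shape `(6, 2)` is why `in_features` has to match the last dimension of the input.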


> **Question:** What happens if you change `in_features` from 2 to 3 above? Does it error? How could you change the shape of the input (`x`) to fix the error? Hint: what did we have to do to `tensor_B` above?

If you've never done it before, matrix multiplication can be a confusing topic at first.

But after you've played around with it a few times and even cracked open a few neural networks, you'll notice it's everywhere.

Remember, matrix multiplication is all you need.

![matrix multiplication is all you need](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00_matrix_multiplication_is_all_you_need.jpeg)

*When you start digging into neural network layers and building your own, you'll find matrix multiplications everywhere. **Source:** https://marksaroufim.substack.com/p/working-class-deep-learner*

### Finding the min, max, mean, sum, etc (aggregation)

- Now that we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them (go from more values to fewer values).

- First we'll create a tensor and then find the max, min, mean and sum of it.



In [130]:
# Create a tensor
import torch
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Now let's perform some aggregation.

In [131]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")


Minimum: 0
Maximum: 90


In [132]:
# we can also use torch.min and torch.max
torch.min(x)

tensor(0)

> The difference between torch.min(x) and x.min() is that torch.min(x) is a standalone function that takes a tensor as input and returns its minimum value, while x.min() is a method that is called on a tensor and returns its minimum value.

> In other words, torch.min(x) and x.min() are equivalent in terms of their functionality, but x.min() is a more concise and convenient way to compute the minimum value of a tensor.

> One of the most common issues in deep learning is data types.

- You may find some methods such as `torch.mean()` require tensors to be in `torch.float32` (the most common) or another specific datatype, otherwise the operation will fail. 



In [137]:
x = torch.tensor(3)

In [141]:
x.type(torch.float32)

tensor(3.)

In [138]:
print(f"Mean: {x.mean()}") # this gives an error because the dtype is int64 (Long)

RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long

In [139]:
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype


Mean: 3.0
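
The same idea applied to the `torch.arange(0, 100, 10)` tensor from the start of this section, as a small sketch: `sum()` works on integer tensors directly, while `mean()` needs a float tensor first.

```python
import torch

x = torch.arange(0, 100, 10)   # dtype is int64 (Long)

print(x.dtype)                 # torch.int64
print(x.sum())                 # tensor(450)

# Cast to float32 before taking the mean.
print(x.float().mean())               # tensor(45.)
print(x.to(torch.float32).mean())     # same result, different spelling
```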


### Finding Positional min/max with argmin/argmax

- You can also find the index of a tensor where the max or minimum occurs with [`torch.argmax()`](https://pytorch.org/docs/stable/generated/torch.argmax.html) and [`torch.argmin()`](https://pytorch.org/docs/stable/generated/torch.argmin.html) respectively.


- This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself.

In [143]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")


Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])


In [144]:

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Index where max value occurs: 8
Index where min value occurs: 0


In [97]:
tensor = torch.arange(10, 100, 10)
tensor.argmax()

tensor(8)

In [145]:
help(torch.argmax)

Help on built-in function argmax in module torch:

argmax(...)
    argmax(input) -> LongTensor
    
    Returns the indices of the maximum value of all elements in the :attr:`input` tensor.
    
    This is the second value returned by :meth:`torch.max`. See its
    documentation for the exact semantics of this method.
    
    .. note:: If there are multiple maximal values then the indices of the first maximal value are returned.
    
    Args:
        input (Tensor): the input tensor.
    
    Example::
    
        >>> a = torch.randn(4, 4)
        >>> a
        tensor([[ 1.3398,  0.2663, -0.2686,  0.2450],
                [-0.7401, -0.8805, -0.3402, -1.1936],
                [ 0.4907, -1.3948, -1.0691, -0.3132],
                [-1.6092,  0.5419, -0.2993,  0.3195]])
        >>> torch.argmax(a)
        tensor(0)
    
    .. function:: argmax(input, dim, keepdim=False) -> LongTensor
       :noindex:
    
    Returns the indices of the maximum values of a tensor across a dimension.
  

> Question: in `argmax(x, dim=1)`, does `dim` refer to rows or columns?


- In `argmax()`, the `dim` argument names the dimension that gets reduced (searched along). For a 2D tensor (a matrix), `dim=0` searches down the rows and returns one index per column, while `dim=1` searches across the columns and returns one index per row.



In [146]:
tensor = torch.tensor([[1, 2, 3], 
                     [4, 5, 6]])

In [147]:
tensor.argmax(dim=1) #  dim=1 means we're looking for the max value in each row

tensor([2, 2])

In [149]:
tensor.argmin(dim=0) # dim=0 means we're looking for the min value in each column

tensor([0, 0, 0])
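
As the `help()` text above mentions, `argmax` is the second value returned by `torch.max` when a `dim` is given. A short sketch on the same 2x3 tensor:

```python
import torch

tensor = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])

# Reducing over dim=0 collapses the rows: one result per column.
print(tensor.argmax(dim=0))  # tensor([1, 1, 1])

# Reducing over dim=1 collapses the columns: one result per row.
print(tensor.argmax(dim=1))  # tensor([2, 2])

# torch.max with a dim returns both the values and their indices.
values, indices = torch.max(tensor, dim=1)
print(values)   # tensor([3, 6])
print(indices)  # tensor([2, 2])
```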

> We'll see this in a later section when using the [softmax activation function](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html), where `argmax()` is used to find the position of the highest value in a tensor.

### Change tensor datatype

- As mentioned, a common issue with deep learning operations is having your tensors in different datatypes.

- If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.

- But there's a fix.

- You can change the datatypes of tensors using [`torch.Tensor.type(dtype=None)`](https://pytorch.org/docs/stable/generated/torch.Tensor.type.html) where the `dtype` parameter is the datatype you'd like to use.

- First we'll create a tensor and check its datatype (the default is `torch.float32`).

In [150]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

Now we'll create another tensor the same as before but change its datatype to `torch.float16`.


In [151]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

> In modern PyTorch, you just say `float_tensor.double()` to cast a float tensor to double tensor. There are methods for each type you want to cast to. If, instead, you have a dtype and want to cast to that, say float_tensor.to(dtype=your_dtype) (e.g., your_dtype = torch.float64)

In [152]:
double_tensor = tensor_float16.double()
double_tensor

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float64)

In [153]:
double_tensor = tensor_float16.to(torch.double)
double_tensor


tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float64)

> Note that changing the data type of a tensor can potentially cause precision loss or overflow if the new data type is not large enough to accommodate the values in the tensor. Therefore, it is important to choose the appropriate data type for your tensor based on the values it contains.
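
A small sketch of what that precision loss and overflow can look like. The two values are chosen for illustration only: one sits just above what `float16` can represent exactly, the other above its maximum.

```python
import torch

values = torch.tensor([2049.0, 70000.0], dtype=torch.float32)
as_half = values.to(torch.float16)

# float16 cannot represent 2049 exactly, so it is rounded to 2048.
# 70000 exceeds the float16 maximum (65504), so it becomes inf.
print(as_half)
print(torch.finfo(torch.float16).max)  # 65504.0
```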

> Note: if your tensor is a `torch.cuda.FloatTensor`, converting it with `tensor.type(torch.DoubleTensor)` moves it off the GPU. It is better to use `tensor.double()`, which works for both CPU and GPU tensors.



> **Note:** Different datatypes can be confusing to begin with. But think of it like this: the lower the number (e.g. 32, 16, 8), the less precisely a computer stores the value. And with a lower amount of storage, this generally results in faster computation and a smaller overall model. Mobile-based neural networks often operate with 8-bit integers, which are smaller and faster to run but less accurate than their float32 counterparts. For more on this, I'd read up about [precision in computing](https://en.wikipedia.org/wiki/Precision_(computer_science)).

> **Exercise:** So far we've covered a fair few tensor methods but there's a bunch more in the [`torch.Tensor` documentation](https://pytorch.org/docs/stable/tensors.html), I'd recommend spending 10-minutes scrolling through and looking into any that catch your eye. Click on them and then write them out in code yourself to see what happens.

## Reshaping, stacking, squeezing and unsqueezing

- One of the main errors in deep learning is having tensors in the wrong shape.

- For example, if you're trying to pass a tensor into a neural network layer and it's not in the right shape, you'll get an error.

- So, if you want to reshape or change the dimensions of your tensors without actually changing the values inside them, some popular methods are:

| Method | One-line description |
| ----- | ----- |
| [`torch.reshape(input, shape)`](https://pytorch.org/docs/stable/generated/torch.reshape.html#torch.reshape) | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| [`torch.Tensor.view(shape)`](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html) | Returns a view of the original tensor in a different `shape` but shares the same data as the original tensor. |
| [`torch.stack(tensors, dim=0)`](https://pytorch.org/docs/1.9.1/generated/torch.stack.html) | Concatenates a sequence of `tensors` along a new dimension (`dim`), all `tensors` must be same size. |
| [`torch.squeeze(input)`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) | Squeezes `input` to remove all the dimensions of size `1`. |
| [`torch.unsqueeze(input, dim)`](https://pytorch.org/docs/1.9.1/generated/torch.unsqueeze.html) | Returns `input` with a dimension value of `1` added at `dim`. | 
| [`torch.permute(input, dims)`](https://pytorch.org/docs/stable/generated/torch.permute.html) | Returns a *view* of the original `input` with its dimensions permuted (rearranged) to `dims`. | 

Why do any of these?

> Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.
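
`torch.permute()` is listed in the table but is easiest to understand with shapes. Here is a minimal sketch, assuming a made-up image tensor stored as (height, width, colour channels) that needs to become (colour channels, height, width).

```python
import torch

# A made-up "image" tensor in (height, width, colour_channels) order.
image_hwc = torch.rand(224, 224, 3)

# Rearrange to (colour_channels, height, width).
image_chw = image_hwc.permute(2, 0, 1)

print(image_hwc.shape)  # torch.Size([224, 224, 3])
print(image_chw.shape)  # torch.Size([3, 224, 224])

# permute returns a view, so both tensors share the same underlying data.
image_hwc[0, 0, 0] = 0.5
print(image_chw[0, 0, 0])  # 0.5, because the data is shared
```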




### Reshaping

> By reshaping, we mean rearranging the rows and columns.


- There are two ways of doing it: (1) reshaping without changing the rank of the tensor and (2) reshaping while changing the rank of the tensor.

- The "rank" of a tensor is its number of dimensions (axes).

Let's try them out.

First, we'll create a tensor.

In [162]:
# to check the shape or size of the tensor

t = torch.tensor([
                  [1, 1, 1, 1],
                  [2, 2, 2, 2],
                  [3, 3, 3, 3]] , dtype = torch.float32)

# using .size
t.size()


torch.Size([3, 4])

In [163]:
# using .shape , writing .shape() will give an error
t.shape


torch.Size([3, 4])

In [164]:
# calculate the dimension or the rank

len(t.shape)


2

In [156]:
a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
len(a.shape)

1

In [165]:
t

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [167]:
# to find the number of elements

# option 1: take the product of the shape
torch.tensor(t.shape).prod()

tensor(12)

In [168]:
t.numel()

12

### (a) Reshaping without changing the rank of the tensor


> On reshaping, the number of elements always remains the SAME! Only the shape changes, i.e., the number of rows and columns might change.

In [169]:
t

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [170]:
t.size()

torch.Size([3, 4])

In [175]:
tt = t.reshape(1, 12)
tt

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])

In [173]:
tt.size()

torch.Size([1, 12])

In [176]:
t.reshape(3, 4)

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [179]:
t.reshape(6, 2)

tensor([[1., 1.],
        [1., 1.],
        [2., 2.],
        [2., 2.],
        [3., 3.],
        [3., 3.]])

In [180]:
t.reshape(4, -1)

tensor([[1., 1., 1.],
        [1., 2., 2.],
        [2., 2., 3.],
        [3., 3., 3.]])

In [48]:
t.reshape(4, 4) # this will give an error

RuntimeError: shape '[4, 4]' is invalid for input of size 12
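
The `-1` used in `t.reshape(4, -1)` above tells PyTorch to infer that dimension from the total number of elements. A short sketch of how that works, recreating `t` so it runs on its own:

```python
import torch

t = torch.tensor([[1, 1, 1, 1],
                  [2, 2, 2, 2],
                  [3, 3, 3, 3]], dtype=torch.float32)

# 12 elements / 4 rows -> PyTorch infers 3 columns.
print(t.reshape(4, -1).shape)     # torch.Size([4, 3])

# 12 elements / (2 * 3) -> the first dimension is inferred as 2.
print(t.reshape(-1, 2, 3).shape)  # torch.Size([2, 2, 3])

# Only one dimension can be inferred at a time.
try:
    t.reshape(-1, -1)
    two_inferred_ok = True
except RuntimeError as err:
    two_inferred_ok = False
    print(f"Error: {err}")
```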

### (b) Reshaping with the change of tensor rank


In [182]:
t.reshape(2,2,3)


tensor([[[1., 1., 1.],
         [1., 2., 2.]],

        [[2., 2., 3.],
         [3., 3., 3.]]])

In [183]:
a = t.reshape(2,2,3) # splits the 12 elements into two matrices of size 2 x 3 each


In [184]:
len(a.shape)


3

In [186]:
t

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [190]:
t.shape

torch.Size([3, 4])

In [188]:
t.size()

torch.Size([3, 4])

In [193]:
a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

In [204]:
c = t.reshape(2,2,3)


> A tensor is an n-dimensional array. The rank of a tensor is its number of dimensions. For example, a 2-dimensional array has rank 2 and a 3-dimensional array has rank 3.

In [205]:
c

tensor([[[1., 1., 1.],
         [1., 2., 2.]],

        [[2., 2., 3.],
         [3., 3., 3.]]])

In [206]:
c.dim()

3

## Squeezing and Un-squeezing of tensors

- When we say squeezing a tensor, we mean removing all the dimensions of size 1 from its shape.

- For example, if the shape of the tensor is `[1, 12]`, then after squeezing the shape changes to `[12]`.


> In many cases, tensors carry size-1 dimensions that hold no extra data but still get in the way of operations that expect a particular shape. Squeezing removes those dimensions without changing the values.



> Why squeeze and unsqueeze? Because they are common operations in deep learning. For example, if you have a single image (a 3D tensor), you can use unsqueeze to add a batch dimension of 1 and make it 4D. This is useful for layers such as convolutions, which expect batched input.


In [211]:
t

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [207]:
# remove the dimensions (axes) that have a length of 1
# taking the above tensor "t"

print(" Tensor before squeezing = " , t.reshape(1, 12))
t.reshape(1, 12).size()


 Tensor before squeezing =  tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])


torch.Size([1, 12])

In [209]:
print("t after squeezing", t.reshape(1, 12).squeeze() )


t after squeezing tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])


In [215]:
t.reshape(1, 12).squeeze().size()

torch.Size([12])

## Unsqueezing of tensors:

- When we say unsqueezing a tensor, we mean adding a dimension of size 1 to it.

- Unsqueezing a squeezed tensor at the same position restores its original shape.



In [217]:
p = t.reshape(1, 12)
p

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])

In [219]:
q = p.squeeze()

In [221]:
q.size()

torch.Size([12])

In [225]:
q2= q.unsqueeze(0)

In [227]:
q2.size()

torch.Size([1, 12])

In [212]:
print("t before unsqueezing", t.reshape(1, 12).squeeze() )

t.reshape(1, 12).squeeze().size()


t before unsqueezing tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])


torch.Size([12])

In [213]:
print("t after unsqueezing", t.reshape(1, 12).squeeze().unsqueeze(dim=0) )

t.reshape(1, 12).squeeze().unsqueeze(dim=0).size()

t after unsqueezing tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])


torch.Size([1, 12])
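
Both operations also accept a `dim` argument that controls which dimension is added or removed. A minimal sketch with made-up tensors `q` and `r`:

```python
import torch

q = torch.arange(12.)            # shape: [12]

# unsqueeze inserts a size-1 dimension at the position you name.
print(q.unsqueeze(dim=0).shape)  # torch.Size([1, 12])
print(q.unsqueeze(dim=1).shape)  # torch.Size([12, 1])

# squeeze() with no argument removes every size-1 dimension...
r = torch.zeros(1, 12, 1)
print(r.squeeze().shape)         # torch.Size([12])

# ...while squeeze(dim=...) removes only the one you name.
print(r.squeeze(dim=0).shape)    # torch.Size([12, 1])
```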

## Flattening of tensors

> Flattening a tensor is like pouring the contents of, say, 3 jars into 1 big jar (or into 2 jars, if you only flatten some of the dimensions).

### (a) Using flatten()

In [214]:
c = torch.rand([3,3,4])
c

tensor([[[0.8694, 0.5677, 0.7411, 0.4294],
         [0.8854, 0.5739, 0.2666, 0.6274],
         [0.2696, 0.4414, 0.2969, 0.8317]],

        [[0.1053, 0.2695, 0.3588, 0.1994],
         [0.5472, 0.0062, 0.9516, 0.0753],
         [0.8860, 0.5832, 0.3376, 0.8090]],

        [[0.5779, 0.9040, 0.5547, 0.3423],
         [0.6343, 0.3644, 0.7104, 0.9464],
         [0.7890, 0.2814, 0.7886, 0.5895]]])

In [57]:
torch.flatten(c)


tensor([0.3535, 0.7828, 0.6393, 0.6011, 0.7849, 0.6351, 0.4737, 0.0247, 0.3160,
        0.6294, 0.6427, 0.8290, 0.6497, 0.8818, 0.9678, 0.7471, 0.0151, 0.6040,
        0.7622, 0.1245, 0.7094, 0.5041, 0.7225, 0.2515, 0.7550, 0.6349, 0.8386,
        0.5620, 0.5299, 0.6289, 0.8269, 0.4040, 0.0599, 0.3848, 0.7934, 0.4705])

In [58]:
torch.flatten(c).size()


torch.Size([36])

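With no extra arguments, `flatten()` collapses every dimension into one. With `start_dim`, only the dimensions from `start_dim` onwards are flattened: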


In [59]:
torch.flatten(c , start_dim = 1)


tensor([[0.3535, 0.7828, 0.6393, 0.6011, 0.7849, 0.6351, 0.4737, 0.0247, 0.3160,
         0.6294, 0.6427, 0.8290],
        [0.6497, 0.8818, 0.9678, 0.7471, 0.0151, 0.6040, 0.7622, 0.1245, 0.7094,
         0.5041, 0.7225, 0.2515],
        [0.7550, 0.6349, 0.8386, 0.5620, 0.5299, 0.6289, 0.8269, 0.4040, 0.0599,
         0.3848, 0.7934, 0.4705]])

In [60]:
torch.flatten(c , start_dim = 2)


tensor([[[0.3535, 0.7828, 0.6393, 0.6011],
         [0.7849, 0.6351, 0.4737, 0.0247],
         [0.3160, 0.6294, 0.6427, 0.8290]],

        [[0.6497, 0.8818, 0.9678, 0.7471],
         [0.0151, 0.6040, 0.7622, 0.1245],
         [0.7094, 0.5041, 0.7225, 0.2515]],

        [[0.7550, 0.6349, 0.8386, 0.5620],
         [0.5299, 0.6289, 0.8269, 0.4040],
         [0.0599, 0.3848, 0.7934, 0.4705]]])

### (b) Using reshape()

In [62]:
c.reshape(-1) # means we are reshaping the tensor to a 1D tensor

tensor([0.3535, 0.7828, 0.6393, 0.6011, 0.7849, 0.6351, 0.4737, 0.0247, 0.3160,
        0.6294, 0.6427, 0.8290, 0.6497, 0.8818, 0.9678, 0.7471, 0.0151, 0.6040,
        0.7622, 0.1245, 0.7094, 0.5041, 0.7225, 0.2515, 0.7550, 0.6349, 0.8386,
        0.5620, 0.5299, 0.6289, 0.8269, 0.4040, 0.0599, 0.3848, 0.7934, 0.4705])

> Q: So, what's the use of flattening? It's commonly used on images. Why would you want to flatten an image?


> A: Because we're going to feed this (flattened) data into an artificial neural network later on. This looks something like this:


![](2022-12-10-12-30-04.png)
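
In code, flattening images for a network usually means keeping the batch dimension and collapsing the rest. A minimal sketch, assuming a made-up batch of 8 greyscale 28 x 28 images:

```python
import torch

# A made-up batch of 8 greyscale images, 28 x 28 pixels each.
images = torch.rand(8, 1, 28, 28)

# Keep the batch dimension (dim 0) and flatten everything after it.
flat = torch.flatten(images, start_dim=1)
print(flat.shape)  # torch.Size([8, 784])

# torch.nn.Flatten does the same thing as a layer (start_dim=1 by default).
layer_out = torch.nn.Flatten()(images)
print(layer_out.shape)  # torch.Size([8, 784])
```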

### Stacking Tensors



> Concatenates a sequence of tensors along a new dimension. All tensors need to be of the same size. 

In [228]:
t

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [229]:
t.size()

torch.Size([3, 4])

In [230]:
# Stack tensors on top of each other
x_stacked = torch.stack([t, t], dim=0) # try changing dim to dim=1 and see what happens
x_stacked

tensor([[[1., 1., 1., 1.],
         [2., 2., 2., 2.],
         [3., 3., 3., 3.]],

        [[1., 1., 1., 1.],
         [2., 2., 2., 2.],
         [3., 3., 3., 3.]]])

In [73]:
x_stacked.size()

torch.Size([2, 3, 4])

> Stack : Concatenates sequence of tensors along a new dimension.


##### Cat

> cat : 'Concatenates' a sequence of tensors along an existing dimension:

In [231]:
t

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [232]:
t.size()

torch.Size([3, 4])

In [233]:
# Concatenate tensors along an existing dimension
x_stacked = torch.cat([t, t], dim=0) # try changing dim to dim=1 and see what happens
x_stacked

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.],
        [1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [234]:
x_stacked.size()

torch.Size([6, 4])

> Concatenates the given sequence of tensors in the given dimension.
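
The difference between `stack` and `cat` is easiest to see by comparing output shapes. A short sketch using a stand-in 3 x 4 tensor of ones:

```python
import torch

t = torch.ones(3, 4)

# stack adds a new dimension, so the rank goes from 2 to 3.
print(torch.stack([t, t], dim=0).shape)  # torch.Size([2, 3, 4])
print(torch.stack([t, t], dim=1).shape)  # torch.Size([3, 2, 4])

# cat joins along an existing dimension, so the rank stays at 2.
print(torch.cat([t, t], dim=0).shape)    # torch.Size([6, 4])
print(torch.cat([t, t], dim=1).shape)    # torch.Size([3, 8])
```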


### Tensor View

- View is a method that returns a new tensor with the same data as the original tensor but of a different shape/size.

- The returned tensor shares the same data and must have the same number of elements, but may have a different size.

- In contrast, `reshape` returns a view when the memory layout allows it and falls back to copying the data when it doesn't, so its result may or may not share memory with the original tensor.

View and reshape are similar, but reshape is more flexible.

- `torch.Tensor.view()` never copies: it only works when the requested shape is compatible with the tensor's existing memory layout (for example, a contiguous tensor), and raises an error otherwise.

- `torch.reshape()` works in both cases. Either way, the new shape must have the same number of elements as the original tensor.

In [76]:
print(x)

x.size()

tensor([1., 2., 3., 4., 5., 6., 7.])


torch.Size([7])

In [74]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

Remember that `z` is only a view of `x`: the two share the same memory, so changing `z` changes `x` too.
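
A small sketch of what sharing memory means in practice, and of one case where `view()` fails but `reshape()` still works. The tensor `m` is a made-up transposed (non-contiguous) example.

```python
import torch

x = torch.arange(1., 8.)   # tensor([1., 2., 3., 4., 5., 6., 7.])
z = x.view(1, 7)

# z is a view of x, so writing through z also changes x.
z[:, 0] = 5
print(x)  # tensor([5., 2., 3., 4., 5., 6., 7.])

# A transposed tensor is not contiguous in memory, so view() fails on it...
m = torch.arange(6).reshape(2, 3).T
try:
    m.view(6)
    view_ok = True
except RuntimeError as err:
    view_ok = False
    print(f"view error: {err}")

# ...while reshape() falls back to copying the data.
print(m.reshape(6))  # tensor([0, 3, 1, 4, 2, 5])
```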

## Indexing and Slicing (selecting data from tensors)


- In PyTorch, indexing and slicing refer to ways of accessing elements of a tensor. 

- Indexing is used to access a single element of a tensor, while slicing is used to access a sub-tensor or a group of elements in a tensor.

- Sometimes you'll want to select specific data from tensors (for example, only the first column or second row). To do so, you can use indexing.

- If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.

![](2022-12-07-15-27-59.png)

In [236]:
# Create a tensor 
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

> Indexing values goes outer dimension -> inner dimension (check out the square brackets).

In [237]:
x[0] # first matrix

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [79]:
# Index into the first matrix, then into its first row
x[0][0] # second square bracket

tensor([1, 2, 3])

In [84]:
x[0, 0] # same as x[0][0]

tensor([1, 2, 3])

In [85]:
# Index all the way down to a single element
x[0][0][0] # third square bracket

tensor(1)

### Slicing 

![](2022-12-07-15-27-29.png)

You can also use `:` to specify "all values in this dimension" and then use a comma (`,`) to add another dimension.

In [86]:
x

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [87]:
# Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]

tensor([[1, 2, 3]])

In [88]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[2, 5, 8]])

In [89]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]

tensor([5])

In [90]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension 
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])

Indexing can be quite confusing to begin with, especially with larger tensors (I still have to try indexing multiple times to get it right). But with a bit of practice and following the data explorer's motto (***visualize, visualize, visualize***), you'll start to get the hang of it.
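
Slices can also take ranges (`start:stop`) and negative indices, just like Python lists. A short sketch on the same `x` tensor, recreated here so it runs on its own:

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

# Rows 1 onwards, first two columns of the only matrix.
sub = x[0, 1:, :2]
print(sub)            # rows [4, 5] and [7, 8]

# Last column of every row, using a negative index.
last_col = x[0, :, -1]
print(last_col)       # tensor([3, 6, 9])

# Slicing with a range keeps the dimension; indexing with an integer drops it.
print(x[:, 0:1].shape)  # torch.Size([1, 1, 3])
print(x[:, 0].shape)    # torch.Size([1, 3])
```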

## PyTorch tensors & NumPy

- Since NumPy is a popular Python numerical computing library, PyTorch has functionality to interact with it nicely.  

The two main methods you'll want to use for NumPy to PyTorch (and back again) are: 

* Creates a Tensor from a numpy.ndarray using: [`torch.from_numpy(ndarray)`](https://pytorch.org/docs/stable/generated/torch.from_numpy.html) 

  - NumPy array -> PyTorch tensor. 
  
  - The returned tensor and ndarray share the same memory. Modifications to the tensor will be reflected in the ndarray and vice versa. The returned tensor is not resizable.


* Returns the tensor as a NumPy ndarray using [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html) - PyTorch tensor -> NumPy array.
  
  - If force is False (the default), the conversion is performed only if the tensor is on the CPU, does not require grad, does not have its conjugate bit set, and is a dtype and layout that NumPy supports. The returned ndarray and the tensor will share their storage, so changes to the tensor will be reflected in the ndarray and vice versa.

  - If force is True this is equivalent to calling t.detach().cpu().resolve_conj().resolve_neg().numpy(). If the tensor isn’t on the CPU or the conjugate or negative bit is set, the tensor won’t share its storage with the returned ndarray. Setting force to True can be a useful shorthand.



Let's try them out.

In [238]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)

In [239]:
array

array([1., 2., 3., 4., 5., 6., 7.])

In [240]:
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

> **Note:** By default, NumPy arrays are created with the datatype `float64` and if you convert it to a PyTorch tensor, it'll keep the same datatype (as above). 
>
> However, many PyTorch calculations default to using `float32`. 
> 
> So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use `tensor = torch.from_numpy(array).type(torch.float32)`.

Note that `.type(torch.float32)` returns a copy, so the converted tensor no longer shares memory with `array`: if you change the tensor, the array stays the same.

In [241]:
tensor = torch.from_numpy(array).type(torch.float32)
tensor.dtype

torch.float32

And if you want to go from PyTorch tensor to NumPy array, you can call `tensor.numpy()`.

In [243]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
tensor

tensor([1., 1., 1., 1., 1., 1., 1.])

In [244]:
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

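Keep in mind what the documentation above says: `numpy_tensor` shares memory with `tensor`, so an in-place change to the tensor (e.g. `tensor.add_(1)`) also changes `numpy_tensor`. Only when you reassign (e.g. `tensor = tensor + 1`) do you get a new tensor, which leaves `numpy_tensor` as it was.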
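
Whether the NumPy array changes depends on how the tensor is modified. The sketch below contrasts an in-place update with a reassignment, and then shows the `force=True` shorthand on a tensor that requires grad. It assumes a PyTorch version that supports the `force` argument (1.13 or newer, as far as I know); `grad_tensor` and `forced` are illustrative names.

```python
import torch

tensor = torch.ones(7)
numpy_tensor = tensor.numpy()  # shares memory with `tensor`

tensor.add_(1)                 # in-place: the array sees the change
print(numpy_tensor)            # [2. 2. 2. 2. 2. 2. 2.]

tensor = tensor + 1            # reassignment: `tensor` now names a new tensor
print(tensor)                  # tensor([3., 3., 3., 3., 3., 3., 3.])
print(numpy_tensor)            # still [2. 2. 2. 2. 2. 2. 2.]

# A tensor that requires grad cannot be converted directly;
# force=True detaches it first.
grad_tensor = torch.ones(3, requires_grad=True)
forced = grad_tensor.numpy(force=True)
print(forced)                  # [1. 1. 1.]
```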

## Reproducibility (trying to take the random out of random)

- As you learn more about neural networks and machine learning, you'll start to discover how much randomness plays a part.

- How does this relate to neural networks and deep learning then? We've discussed neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short:

`start with random numbers -> tensor operations -> try to make better (again and again and again)`

- Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness. Why? So you can repeat an experiment and get the same results.

- Let's see a brief example of reproducibility in PyTorch. We'll start by creating two random tensors. Since they're random, you'd expect them to be different, right?

In [245]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.7539, 0.1952, 0.0050, 0.3068],
        [0.1165, 0.9103, 0.6440, 0.7071],
        [0.6581, 0.4913, 0.8913, 0.1447]])

Tensor B:
tensor([[0.5315, 0.1587, 0.6542, 0.3278],
        [0.6532, 0.3958, 0.9147, 0.2036],
        [0.2018, 0.2018, 0.9497, 0.6666]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

### Controlling sources of randomness



- You can use `torch.manual_seed()` to seed the RNG for all devices (both CPU and CUDA):



In [246]:
import torch

# Set the random seed
RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Reset the seed before the next rand() call so the generator restarts from the same state
# Without this, tensor_D would be different to tensor_C
torch.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])

Nice!

- It looks like setting the seed worked. 
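
Resetting the global seed before every call is one option. Another, not covered in the section above, is to give each random call its own `torch.Generator`. The sketch below assumes CPU tensors; `gen_a` and `gen_b` are illustrative names.

```python
import torch

# Two generators seeded identically produce identical streams
gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)

tensor_a = torch.rand(3, 4, generator=gen_a)
tensor_b = torch.rand(3, 4, generator=gen_b)
print(torch.equal(tensor_a, tensor_b))   # True

# The same generator advances with each draw, so a second draw differs
tensor_a2 = torch.rand(3, 4, generator=gen_a)
print(torch.equal(tensor_a, tensor_a2))  # False
```

This keeps the reproducible part local to the calls that need it, without touching the global RNG state.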

> **Resource:** What we've just covered only scratches the surface of reproducibility in PyTorch. For more on reproducibility in general and random seeds, I'd check out:
> * [The PyTorch reproducibility documentation](https://pytorch.org/docs/stable/notes/randomness.html) (a good exercise would be to read through this for 10 minutes and even if you don't understand it now, being aware of it is important).
> * [The Wikipedia random seed page](https://en.wikipedia.org/wiki/Random_seed) (this'll give a good overview of random seeds and pseudorandomness in general).

In [103]:
# Python
import random
random.seed(0)


# Numpy

import numpy as np
np.random.seed(0)
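
The three seeds are often set together. Below is a minimal sketch of a helper that does so; the name `seed_everything` is hypothetical (it is not a PyTorch function), and the helper only covers the Python, NumPy and PyTorch generators shown in this section.

```python
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

seed_everything(0)
first = (random.random(), np.random.rand(), torch.rand(1).item())

seed_everything(0)
second = (random.random(), np.random.rand(), torch.rand(1).item())

print(first == second)  # True
```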


---

A note from the PyTorch reproducibility documentation:

- Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

---

## Running tensors on GPUs (and making faster computations)


- Running your deep learning computations on a GPU can speed up your experiments considerably, and PyTorch makes it easy to do.

- Deep learning algorithms require a lot of numerical operations. By default, these operations are often done on a CPU (central processing unit).

- However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster at performing the specific types of operations neural networks need (matrix multiplications) than CPUs.

- There are a few ways to first get access to a GPU and secondly get PyTorch to use the GPU.

> **Note:** When I reference "GPU" throughout this course, I'm referencing an [NVIDIA GPU with CUDA](https://developer.nvidia.com/cuda-gpus) enabled unless otherwise specified (CUDA is a computing platform and API that allows GPUs to be used for general-purpose computing and not just graphics).



> PyTorch provides a set of tools for building neural networks, including additional packages for working [with text](https://github.com/pytorch/text) or [images](https://github.com/pytorch/vision).


### 1. Getting a GPU

You may already know what's going on when I say GPU. But if not, there are a few ways to get access to one.

| **Method** | **Difficulty to setup** | **Pros** | **Cons** | **How to setup** |
| ----- | ----- | ----- | ----- | ----- |
| Google Colab | Easy | Free to use, almost zero setup required, can share work with others as easy as a link | Doesn't save your data outputs, limited compute, subject to timeouts | [Follow the Google Colab Guide](https://colab.research.google.com/notebooks/gpu.ipynb) |
| Use your own | Medium | Run everything locally on your own machine | GPUs aren't free, require upfront cost | Follow the [PyTorch installation guidelines](https://pytorch.org/get-started/locally/) |
| Cloud computing (AWS, GCP, Azure) | Medium-Hard | Small upfront cost, access to almost infinite compute | Can get expensive if running continually, takes some time to set up right | Follow the [PyTorch installation guidelines](https://pytorch.org/get-started/cloud-partners/) |

- There are more options for using GPUs but the above three will suffice for now.

- Personally, I use a combination of Google Colab and my own personal computer for small scale experiments (and creating this course) and go to cloud resources when I need more compute power.

> **Resource:** If you're looking to purchase a GPU of your own but not sure what to get, [Tim Dettmers has an excellent guide](https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/).

> To check if you've got access to an NVIDIA GPU, you can run `!nvidia-smi` in a notebook cell, where the `!` (also called bang) means "run this on the command line".



In [None]:
#!nvidia-smi

![](2022-12-07-16-35-59.png)

If you don't have an NVIDIA GPU accessible, the above will output something like:

```
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```

In that case, go back up and follow the install steps.

If you do have a GPU, the line above will output something like:

```
Wed Jan 19 22:09:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

### 2. Getting PyTorch to run on the GPU

- Once you've got a GPU ready to access, the next step is getting PyTorch to use it for storing data (tensors) and computing on data (performing operations on tensors).

- To do so, you can use the [`torch.cuda`](https://pytorch.org/docs/stable/cuda.html) package.


- You can test if PyTorch has access to a GPU using [`torch.cuda.is_available()`](https://pytorch.org/docs/stable/generated/torch.cuda.is_available.html#torch.cuda.is_available).


In [247]:
# Check for GPU
import torch
torch.cuda.is_available()  # returns True if GPU is available

False

In [251]:
torch.backends.mps.is_available()  # check if the Apple Metal (MPS) backend is available

True

### Device Agnostic 

- If `torch.cuda.is_available()` outputs `True`, PyTorch can see and use the CUDA GPU. If it outputs `False`, it can't see one, and in that case you'll have to go back through the installation steps. On Apple silicon, `torch.backends.mps.is_available()` plays the same role for the Metal backend.

- Now, let's say you wanted to set up your code so it ran on the GPU if one was available and on the CPU otherwise.

- That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using. 

-  Let's create a `device` variable to store what kind of device is available.

In [254]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

In [253]:
x = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x

device(type='cpu')

- If the above output `"cuda"` it means we can set all of our PyTorch code to use the available CUDA device (a GPU) and if it output `"cpu"`, our PyTorch code will stick with the CPU.

> **Note:** In PyTorch, it's best practice to write [**device agnostic code**](https://pytorch.org/docs/master/notes/cuda.html#device-agnostic-code). This means code that'll run on CPU (always available) or GPU (if available).

- If you want to do faster computing you can use a GPU but if you want to do *much* faster computing, you can use multiple GPUs.

- You can count the number of GPUs PyTorch has access to using [`torch.cuda.device_count()`](https://pytorch.org/docs/stable/generated/torch.cuda.device_count.html#torch.cuda.device_count).

In [None]:
# Count number of devices
# torch.cuda.device_count()

- Knowing the number of GPUs PyTorch has access to is helpful in case you wanted to run a specific process on one GPU and another process on another (PyTorch also has features to let you run a process across *all* GPUs).
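
As a small sketch of how the device count can be used, the loop below lists every CUDA device PyTorch can see. On a machine without a CUDA GPU the count is `0` and the loop body never runs; `num_gpus` is an illustrative name.

```python
import torch

num_gpus = torch.cuda.device_count()  # 0 when no CUDA device is visible
print(f"CUDA devices visible: {num_gpus}")

for index in range(num_gpus):
    # get_device_name() returns the marketing name of the GPU at this index
    name = torch.cuda.get_device_name(index)
    print(f"cuda:{index} -> {name}")
```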

In [255]:
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

In [256]:
device

'mps'
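
The one-line conditional above can also be written as a small helper, which some readers find easier to follow. This is only a sketch: `pick_device` is a hypothetical name, and the `getattr` guard is an assumption to keep the code working on older PyTorch builds that lack `torch.backends.mps`.

```python
import torch

def pick_device() -> str:
    """Return 'cuda', 'mps' or 'cpu', in that order of preference."""
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so look it up defensively
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"

device = pick_device()
tensor = torch.tensor([1, 2, 3], device=device)  # create directly on the chosen device
print(device, tensor.device)
```

Passing `device=` at creation time avoids building the tensor on the CPU first and then copying it.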

### 3. Putting tensors (and models) on the GPU

- You can put tensors (and models, we'll see this later) on a specific device by calling [`to(device)`](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html) on them. Where `device` is the target device you'd like the tensor (or model) to go to.

- Why do this?

- GPUs offer far faster numerical computing than CPUs do and if a GPU isn't available, because of our **device agnostic code** (see above), it'll run on the CPU.

> **Note:** Putting a tensor on GPU using `to(device)` (e.g. `some_tensor.to(device)`) returns a copy of that tensor, i.e. the same values will then exist on both CPU and GPU. To overwrite tensors, reassign them:
>
> `some_tensor = some_tensor.to(device)`
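
The sketch below contrasts tensors with models. It assumes a simple `nn.Linear` layer as a stand-in for "a model" (models are covered later), and it only distinguishes CUDA from CPU to keep things short. Unlike a tensor, an `nn.Module` is moved in place and `to()` returns the same module object.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

some_tensor = torch.tensor([1.0, 2.0, 3.0])
moved_tensor = some_tensor.to(device)  # returns a tensor on `device`
print(some_tensor.device)              # the original stays where it was: cpu

model = nn.Linear(in_features=3, out_features=1)
returned = model.to(device)            # moves the parameters in place
print(returned is model)               # True: the same module object comes back
print(next(model.parameters()).device.type)  # matches `device`
```

So reassignment matters for tensors, while for models `model.to(device)` on its own is enough.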

Let's try creating a tensor and putting it on the GPU (if it's available).

In [259]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])
tensor

tensor([1, 2, 3])

In [262]:
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
device

'mps'

In [260]:
# Tensor not on GPU/mps
print(tensor, tensor.device)

tensor([1, 2, 3]) cpu


In [263]:
# Move tensor to GPU/mps (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3], device='mps:0')

If you have a GPU available, the above code will output something like:

```
tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0')
```


---

> Notice the second tensor has `device='cuda:0'`, this means it's stored on the 0th GPU available (GPUs are 0 indexed, if two GPUs were available, they'd be `'cuda:0'` and `'cuda:1'` respectively, up to `'cuda:n'`).
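
To get a rough feel for the speed difference, you can time a matrix multiplication on each device. This is a sketch rather than a rigorous benchmark: `time_matmul` is a hypothetical helper, the matrix size and repeat count are arbitrary, and the numbers depend entirely on your hardware.

```python
import time

import torch

def time_matmul(device: str, size: int = 1024, repeats: int = 5) -> float:
    """Average seconds per (size x size) matrix multiplication on `device`."""
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    torch.matmul(a, b)  # warm-up run, not timed
    if device == "cuda":
        torch.cuda.synchronize()  # CUDA kernels run asynchronously, so wait for them
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

print(f"cpu:  {time_matmul('cpu'):.6f} s per matmul")
if torch.cuda.is_available():
    print(f"cuda: {time_matmul('cuda'):.6f} s per matmul")
```

The `synchronize()` calls matter: without them the timer would stop before the GPU has finished its work.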



### 4. Moving tensors back to the CPU

- What if we wanted to move the tensor back to CPU?

- For example, you'll want to do this if you want to interact with your tensors with NumPy (NumPy does not leverage the GPU).

- Let's try using the [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html) method on our `tensor_on_gpu`.

In [265]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
#tensor_on_gpu.numpy() # if using GPU

In [266]:
tensor_on_gpu

tensor([1, 2, 3], device='mps:0')

> Instead, to get a tensor back to CPU and usable with NumPy we can use [`Tensor.cpu()`](https://pytorch.org/docs/stable/generated/torch.Tensor.cpu.html).

> This copies the tensor to CPU memory so it's usable with CPUs.

In [268]:
tensor_on_gpu.cpu() # copy to CPU (call .numpy() on the result to get a NumPy array)

tensor([1, 2, 3])

The above returns a copy of the GPU tensor in CPU memory so the original tensor is still on GPU.

In [127]:
tensor_on_gpu

tensor([1, 2, 3], device='mps:0')
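
Putting the two steps together, a device-agnostic round trip to NumPy might look like the sketch below. The name `tensor_on_device` is illustrative, and only CUDA and CPU are distinguished here for brevity.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

tensor_on_device = torch.tensor([1, 2, 3]).to(device)

# .cpu() is a no-op for tensors already on the CPU, so this chain works on any device
array_on_cpu = tensor_on_device.cpu().numpy()
print(array_on_cpu)        # [1 2 3]
print(type(array_on_cpu))  # <class 'numpy.ndarray'>
```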

Thank you for reading!

More PyTorch tutorials:

- https://github.com/yunjey/pytorch-tutorial
- https://github.com/MorvanZhou/PyTorch-Tutorial
- https://github.com/L1aoXingyu/pytorch-beginner
- https://github.com/bharathgs/Awesome-pytorch-list


## Exercises

All of the exercises are focused on practicing the code above.

You should be able to complete them by referencing each section or by following the resource(s) linked.

**Resources:**

* [Exercise template notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/00_pytorch_fundamentals_exercises.ipynb).
* [Example solutions notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/solutions/00_pytorch_fundamentals_exercise_solutions.ipynb) (try the exercises *before* looking at this).

1. Documentation reading - A big part of deep learning (and learning to code in general) is getting familiar with the documentation of the framework you're using. We'll be using the PyTorch documentation a lot throughout the rest of this course, so I'd recommend spending 10 minutes reading the following (it's okay if you don't get some things for now; the focus is not yet full understanding, it's awareness). See the documentation on [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch-tensor) and for [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics).
2. Create a random tensor with shape `(7, 7)`.
3. Perform a matrix multiplication on the tensor from 2 with another random tensor with shape `(1, 7)` (hint: you may have to transpose the second tensor).
4. Set the random seed to `0` and do exercises 2 & 3 over again.
5. Speaking of random seeds, we saw how to set it with `torch.manual_seed()` but is there a GPU equivalent? (hint: you'll need to look into the documentation for `torch.cuda` for this one). If there is, set the GPU random seed to `1234`.
6. Create two random tensors of shape `(2, 3)` and send them both to the GPU (you'll need access to a GPU for this). Set `torch.manual_seed(1234)` when creating the tensors (this doesn't have to be the GPU random seed).
7. Perform a matrix multiplication on the tensors you created in 6 (again, you may have to adjust the shapes of one of the tensors).
8. Find the maximum and minimum values of the output of 7.
9. Find the maximum and minimum index values of the output of 7.
10. Make a random tensor with shape `(1, 1, 1, 10)` and then create a new tensor with all the `1` dimensions removed to be left with a tensor of shape `(10)`. Set the seed to `7` when you create it and print out the first tensor and its shape as well as the second tensor and its shape.