<a href="https://colab.research.google.com/github/sabaripkumar/digipen/blob/main/CET3052_Colab_PyTorch_Basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introducing PyTorch for Neural Networks

Learning objectives

* Computational Graphs
* Tensors
* Operations with tensors
* Indexing, slicing, and joining
* Computing gradients
* Use CUDA tensors with GPUs (need GPU)


## Prerequisite Terms


1.   features
2.   targets
3.   models
4.   parameters
5.   hyperparameters
6.   predictions
7.   loss functions
8.   learning/training
9.   testing

## Computational Graph

* A computational graph provides a visual representation of the operations and data flow in a neural network. It enables efficient computation by breaking a deep neural network down into smaller, manageable parts, which a framework such as PyTorch can then reorder or run in parallel.

* It promotes modularity by allowing individual parts of the model to be defined separately and then connected. Here, parts are the linear computation functions, activation functions, and loss functions.

* It is essentially a graph where nodes represent operations or calculations, and edges represent the tensors that flow between these parts.

The computational graph below represents a basic linear function in a neural network layer: $f(A,B,C) =(A*B)+C$.

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*HK6gaBlCJLQOTldCURi7qQ.gif">
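To make the nodes and edges concrete, here is a minimal sketch that builds the graph for $f(A,B,C) = (A*B)+C$ in PyTorch. The input values 2, 3 and 4 are arbitrary choices for illustration and do not come from the figure.

```python
import torch

# Leaf tensors: the inputs of the graph f(A, B, C) = (A * B) + C.
A = torch.tensor(2.0, requires_grad=True)
B = torch.tensor(3.0, requires_grad=True)
C = torch.tensor(4.0, requires_grad=True)

prod = A * B   # multiplication node
f = prod + C   # addition node

# Each result remembers the operation that produced it.
print(prod.grad_fn)
print(f.grad_fn)

# Walking the graph backwards gives the partial derivatives:
# df/dA = B, df/dB = A, df/dC = 1.
f.backward()
print(A.grad, B.grad, C.grad)
```

The `grad_fn` attribute is how PyTorch records the operation nodes; leaf tensors such as `A` have no `grad_fn` because no operation created them.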

The computational graph below represents a neural network layer consisting of a linear function, an activation function, and a loss function.

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*d3uM1IwDZWqvEU2G0p0gxA.gif">

In [None]:
from IPython  import display
from IPython.display import HTML

iframe = '<iframe width="800" height="450" src="https://www.youtube.com/embed/hCP1vGoCdYU?si=O63OoSfNlDHjZENc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>'

display.display(HTML(iframe))




## Static vs. Dynamic Computational Graph
TensorFlow and PyTorch are libraries that build and evaluate graphs like the ones above. Both use multithreading and graph-level optimizations to reduce computation time. The main difference is that TensorFlow (before v2.0) implements a static computational graph, while PyTorch implements a dynamic computational graph.

### Static Computational Graph

A static computational graph is one whose structure is defined and compiled before it is run. Once compiled, it cannot be changed. This was TensorFlow's approach before v2.0; from v2.0 onward it moved toward dynamic graphs through Eager Execution. The figure below shows an example.

<img src="https://miro.medium.com/v2/resize:fit:504/format:webp/0*4UHwQnsmUjyD7VtW.gif">

#### Advantages

* Efficiency

The graph is optimized at compilation for faster execution and lower resource consumption.

* Portability

The graph can be saved, deployed, and run without the code that generated it.

* Visualization

Easy to visualize and debug using tools like TensorBoard.

#### Disadvantages

* Less Intuitive

Harder for Python programmers to debug and understand, because the code doesn't execute line by line like a regular Python program.

* Flexibility

Less flexible in changing the graph during runtime, making it difficult for dynamic models.

### Dynamic Computational Graph

A dynamic computational graph, also known as an "imperative" or "define-by-run" graph, is constructed on the fly during execution. There is no separate compilation step for the graph. This approach is used by PyTorch.

#### Advantages

* Intuitiveness

More intuitive and Pythonic. The graph is built as the code is run, making it easier to understand and debug.

* Flexibility

Easy to change and adapt the graph dynamically, which is particularly useful for models where the structure changes every iteration (e.g., with variable input lengths or recursive neural networks).

#### Disadvantages

* Overhead

The flexibility can come with a runtime overhead, as the graph needs to be built from scratch at each iteration.

* Optimization

Less opportunity for upfront optimization compared to static graphs.
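The define-by-run behaviour can be seen in a small sketch. The helper `repeated_product` is a hypothetical function written for this illustration: an ordinary Python loop decides how many multiplication nodes the graph contains, so the graph differs from one call to the next.

```python
import torch

def repeated_product(x, n_steps):
    """Multiply by x n_steps times, so the result equals x ** (n_steps + 1)."""
    y = x
    for _ in range(n_steps):
        y = y * x   # each pass adds one multiplication node to the graph
    return y

x = torch.tensor(2.0, requires_grad=True)

# First call: the graph holds one multiplication, y = x**2, dy/dx = 2x.
repeated_product(x, n_steps=1).backward()
grad_a = x.grad.clone()

x.grad.zero_()   # clear the stored gradient before the next run

# Second call: the graph holds two multiplications, y = x**3, dy/dx = 3x**2.
repeated_product(x, n_steps=2).backward()
grad_b = x.grad.clone()

print(grad_a, grad_b)
```

No graph is declared ahead of time here; each call records only the operations that actually ran.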

## PyTorch Libraries

In [None]:
import torch
import numpy as np

torch.manual_seed(1234)

<torch._C.Generator at 0x7e7e3236ca50>

## Tensors

Tensors are N-dimensional arrays of numbers. N is called the `rank` and can be 0, 1, or any larger whole number.

That is,

* Rank 0 tensor is also called `Scalar Tensor`. It is a single number.
* Rank 1 tensor is also called `Vector Tensor`. It is an array of numbers.
* Rank 2 tensor is also called `Matrix Tensor`. It is a 2-D array of numbers.
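The three ranks listed above can be checked directly with `dim()`. This is a small sketch with arbitrary example values.

```python
import torch

scalar = torch.tensor(3.14)                        # rank 0: a single number
vector = torch.tensor([1.0, 2.0, 3.0])             # rank 1: an array of numbers
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])    # rank 2: a 2-D array

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "| rank:", t.dim(), "| shape:", tuple(t.shape))
```

Note that the rank is the number of entries in the shape, not the number of elements in the tensor.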

For example, a vector of dimension 64, a matrix of dimension (8,8), and a 3D tensor of dimension (4,4,4) look like the figure below:

<img src="https://media.licdn.com/dms/image/D5612AQEBqlkZkHGO9g/article-cover_image-shrink_600_2000/0/1707758899801?e=2147483647&v=beta&t=DoTQT8iLr0ePf-efbAMn5rd5PmiXZ3Vj_Yaewnr16j0" width=300 height=250>

**Reference**

Dabhade, P. (2024). "*Exploring Tensors in PyTorch: A Beginner's Guide*". [Link](https://www.linkedin.com/pulse/exploring-tensors-pytorch-beginners-guide-pratik-dabhade-jpjjc/).

#### Creating Tensors

In [None]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

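You can create tensors by specifying the shape as arguments. For example, `torch.Tensor(2, 3)` creates a tensor with 2 rows and 3 columns. Its memory is allocated but not initialized, so the values are arbitrary, as the output below shows. To draw values from a uniform distribution on the interval $[0,1)$, use `torch.rand(2, 3)` instead.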

In [None]:
describe(torch.Tensor(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1.1050e-05, 8.3391e-10, 1.7282e-04],
        [1.4580e-19, 7.1429e+31, 1.5766e-19]])
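The values printed above are arbitrary because `torch.Tensor(2, 3)` only allocates memory. As a sketch of the alternative, `torch.rand` samples from the uniform distribution on $[0,1)$, while `torch.empty` makes the uninitialized allocation explicit. The seed value 0 is an arbitrary choice for repeatability.

```python
import torch

torch.manual_seed(0)   # arbitrary seed, used only so the run is repeatable

uninit = torch.empty(2, 3)    # allocated but not initialized: contents are arbitrary
uniform = torch.rand(2, 3)    # samples from the uniform distribution on [0, 1)

print(uniform)
print("all values in [0, 1):", bool(((uniform >= 0) & (uniform < 1)).all()))
```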


**Ex.** Can you create a tensor with a dimension of (3, 4, 5)?

Create a tensor from the standard normal distribution.

In [None]:
describe(torch.randn(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])


It's common in prototyping to create a tensor with some values and a specific shape. For example, initialize a tensor with dimension of (2,3) and values of **ones** or **zeros**.

In [None]:
# Tensor with zeros
describe(torch.zeros(2, 3))

# Tensor with ones
x = torch.ones(2, 3)
describe(x)

# Any method whose name ends with an underscore is an in-place operation.
x.fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 0., 0.],
        [0., 0., 0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 1., 1.],
        [1., 1., 1.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[5., 5., 5.],
        [5., 5., 5.]])


Note:

* Tensors can be initialized and then filled in place.

* Operations that end in an underscore (`_`) are in-place operations.

In [None]:
# Create a tensor, then chain a method call on it
x = torch.Tensor(3,4).fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[5., 5., 5., 5.],
        [5., 5., 5., 5.],
        [5., 5., 5., 5.]])


Tensors can be initialized from a list of lists.

In [None]:
x = torch.Tensor([[1, 2,],
                  [2, 4,]])
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 2.],
        [2., 4.]])


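Tensors can be initialized from NumPy arrays. Converting between NumPy arrays and PyTorch tensors is a common step, so note that `torch.from_numpy` keeps the array's dtype (`float64` here) instead of PyTorch's default `float32`.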

In [None]:
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
print(npy.dtype)

Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.8716, 0.4140, 0.1549],
        [0.7337, 0.1130, 0.8740]], dtype=torch.float64)
float64
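One detail worth knowing when converting: `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` makes a copy. The sketch below uses a small zero-filled array chosen for illustration.

```python
import numpy as np
import torch

arr = np.zeros((2, 3))                            # float64 by default
shared = torch.from_numpy(arr)                    # shares memory with arr, keeps float64
copied = torch.tensor(arr, dtype=torch.float32)   # independent float32 copy

arr[0, 0] = 100.0                                 # modify the NumPy array in place

print(shared[0, 0])   # reflects the change made through arr
print(copied[0, 0])   # unaffected, because it is a copy
print(shared.dtype, copied.dtype)
```

If the shared memory is not wanted, copying with `torch.tensor(arr)` or `torch.from_numpy(arr).clone()` avoids surprises.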


**Ex.** What is the difference between a NumPy array and a tensor?

**NumPy Arrays** are designed for CPU use.

**Tensors** are designed to run on both CPUs and GPUs, facilitating massive parallel computing.

There are other differences. For example, tensors can track gradients for automatic differentiation, as shown later in this notebook.


**Ex.** Can a pandas DataFrame be converted to a tensor? If yes, under what conditions?

In [None]:
import pandas as pd

# Example DataFrame; every column is numeric, which the conversion requires
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [4, 5, 6],
    'col3': [7, 8, 9]
})

# Convert DataFrame to NumPy array
numpy_array = df.values

# Convert NumPy array to PyTorch tensor
tensor = torch.tensor(numpy_array)

# Print the tensor
print(tensor)

#### Tensor Types

`FloatTensor` has been the default type of the tensors created so far. There are other base types, for example integer types such as `LongTensor`.

In [None]:
x = torch.arange(6).view(2, 3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


##### Constructors

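Use the constructors `FloatTensor` and `LongTensor` to create a tensor of a given type directly, or use `torch.tensor` with the `dtype` argument. An existing tensor can be **typecast** with methods such as `.long()` and `.float()`.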

In [None]:
x = torch.FloatTensor([[1, 2, 3],
                       [4, 5, 6]])
describe(x)

x = x.long()
describe(x)

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.int64)
describe(x)

x = x.float()
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])
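The casts above use whole numbers, so nothing is lost. With fractional values, `.long()` drops the fractional part rather than rounding. A short sketch with arbitrary example values:

```python
import torch

x = torch.tensor([1.7, -1.7, 2.0])

as_long = x.long()                 # truncates toward zero
rounded = torch.round(x).long()    # round first if nearest-integer behaviour is wanted

print(as_long)
print(rounded)
print(as_long.float().dtype)       # casting back gives float32, but 1.7 is not recovered
```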


### Tensor Operations

#### Math Operations

In [None]:
x = torch.randn(2, 3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.2310,  0.6931, -0.2669],
        [ 2.1785,  0.1021, -0.2590]])


In [None]:
# plus
describe(x + x)
# add func
describe(torch.add(x, x))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.4619,  1.3862, -0.5337],
        [ 4.3569,  0.2043, -0.5180]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.4619,  1.3862, -0.5337],
        [ 4.3569,  0.2043, -0.5180]])


#### Concatenation and Joining

In [None]:
x = torch.arange(6)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([6])
Values: 
tensor([0, 1, 2, 3, 4, 5])


In [None]:
# Reshape
x = x.view(2, 3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


In [None]:
# Concatenation
describe(torch.cat([x, x], dim=0)) # add as new rows
describe(torch.cat([x, x], dim=1)) # add as new columns

Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])


In [None]:
describe(x)
describe(torch.sum(x, dim=0)) # collapse dim 0 (rows): one sum per column
describe(torch.sum(x, dim=1)) # collapse dim 1 (columns): one sum per row

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([3])
Values: 
tensor([3, 5, 7])
Type: torch.LongTensor
Shape/size: torch.Size([2])
Values: 
tensor([ 3, 12])


In [None]:
describe(x)
describe(torch.transpose(x, 0, 1))

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[0, 3],
        [1, 4],
        [2, 5]])


In [None]:
describe(x)

# Slice
describe(x[:1, :2])
# Access one cell
describe(x[0, 1])

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([1, 2])
Values: 
tensor([[0, 1]])
Type: torch.LongTensor
Shape/size: torch.Size([])
Values: 
1


`torch.index_select`:

This function selects elements from a tensor along a specified dimension using an index tensor.

Usage: `torch.index_select(input, dim, index)`

In [None]:
tensor = torch.tensor([[1, 2], [3, 4], [5, 6]])
describe(tensor)
print("\n")

indices = torch.tensor([0, 2])
describe(indices)
print("\n")

selected = torch.index_select(tensor, 0, indices)
describe(selected)

Type: torch.LongTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[1, 2],
        [3, 4],
        [5, 6]])


Type: torch.LongTensor
Shape/size: torch.Size([2])
Values: 
tensor([0, 2])


Type: torch.LongTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1, 2],
        [5, 6]])


In [None]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])

print(row_indices)
print(col_indices)
print("\n")

x = torch.randn([2, 3])
describe(x)
print("\n")

describe(x[row_indices, col_indices])

tensor([0, 1])
tensor([0, 1])


Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[-0.1549, -1.3706, -0.1319],
        [ 0.8848, -0.2611,  0.6104]])


Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values: 
tensor([-0.1549, -0.2611])
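Indexing with two index tensors pairs them up element by element, which is why the result above has only two values. The sketch below contrasts that with selecting a rectangular block, using a small integer tensor so the picked elements are easy to follow.

```python
import torch

x = torch.arange(6).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
rows = torch.tensor([0, 1])
cols = torch.tensor([0, 1])

pairwise = x[rows, cols]      # picks elements (0, 0) and (1, 1)
block = x[rows][:, cols]      # picks rows 0-1, then columns 0-1 of those rows

print(pairwise)
print(block)
```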


Long Tensors are used for indexing operations and mirror the `int64` NumPy type.

In [None]:
x = torch.LongTensor([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])
describe(x)
print(x.dtype)
print(x.numpy().dtype)

Type: torch.LongTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
torch.int64
int64


You can convert a FloatTensor to a LongTensor.

In [None]:
x = torch.FloatTensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])
x = x.long()
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])


#### Special Tensor initializations

We can create a vector of incremental numbers

In [None]:
x = torch.arange(0, 10)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([10])
Values: 
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


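Sometimes it's useful to have an integer-based arange for indexing. With integer arguments, `torch.arange` already returns a `LongTensor`, so `.long()` below only makes the type explicit.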

In [None]:
x = torch.arange(0, 10).long()
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([10])
Values: 
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


#### Matrix Addition

* Note: Reshaping changes a tensor's shape without changing its elements or their order. In PyTorch, reshaping is done with `view`.

In [None]:
x = torch.arange(0, 20)
describe(x)
print("\n")
print(x.view(1, 20))
print(x.view(2, 10))
print(x.view(4, 5))
print(x.view(5, 4))
print(x.view(10, 2))
print(x.view(20, 1))

Type: torch.LongTensor
Shape/size: torch.Size([20])
Values: 
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19])


tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19]])
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]])
tensor([[ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12],
        [13],
        [14],
        [15],
        [16],
        [17],
        [18],
        [19]])
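Two properties of `view` are easy to miss: it returns a tensor that shares storage with the original, and it needs a compatible memory layout. The sketch below illustrates both and uses `reshape` as the fallback, which copies when a view is not possible.

```python
import torch

x = torch.arange(6)
v = x.view(2, 3)      # no copy: v and x share the same storage
v[0, 0] = 100
print(x)              # the change is visible through x as well

t = v.t()             # transpose is also a view, with swapped strides
print(t.is_contiguous())

view_failed = False
try:
    t.view(6)         # the transposed layout cannot be flattened without a copy
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

flat = t.reshape(6)   # reshape copies the data when a view is impossible
print(flat)
```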

Computation between tensors of different shapes (broadcasting).

$X_{3 \times 4} + Y_{1 \times 4}$ and $X_{3 \times 4} + Z_{3 \times 1}$ are both legitimate operations: the dimension of size 1 is stretched to match the other operand.

In [None]:
x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)
z = torch.arange(3).view(3, 1)

print(f"x=\n{x}\n")
print(f"y=\n{y}\n")
print(f"z=\n{z}\n")

print("\n")

print(f"x + y = \n{x+y}\n")
print(f"x + z = \n{x+z}\n")

x=
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

y=
tensor([[0, 1, 2, 3]])

z=
tensor([[0],
        [1],
        [2]])



x + y = 
tensor([[ 0,  2,  4,  6],
        [ 4,  6,  8, 10],
        [ 8, 10, 12, 14]])

x + z = 
tensor([[ 0,  1,  2,  3],
        [ 5,  6,  7,  8],
        [10, 11, 12, 13]])
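The additions above behave as if the size-1 dimension had been repeated to match `x`. As a sketch, `expand` makes that repetition explicit, and a shape that cannot be aligned raises an error.

```python
import torch

x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)
z = torch.arange(3).view(3, 1)

y_full = y.expand(3, 4)   # every row repeats y's single row
z_full = z.expand(3, 4)   # every column repeats z's single column

print(torch.equal(x + y, x + y_full))
print(torch.equal(x + z, x + z_full))

# Shapes that differ in a dimension where neither size is 1 cannot be combined.
broadcast_failed = False
try:
    x + torch.arange(3)   # shape (3,) is matched against the last dimension of size 4
except RuntimeError:
    broadcast_failed = True
print("incompatible shapes rejected:", broadcast_failed)
```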



`unsqueeze` adds a dimension of size 1, and `squeeze` removes dimensions of size 1.

In [None]:
x = torch.arange(12).view(3, 4)
describe(x)
print("--")
x = x.unsqueeze(dim=1)
describe(x)
print("--")
x = x.squeeze()
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
--
Type: torch.LongTensor
Shape/size: torch.Size([3, 1, 4])
Values: 
tensor([[[ 0,  1,  2,  3]],

        [[ 4,  5,  6,  7]],

        [[ 8,  9, 10, 11]]])
--
Type: torch.LongTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
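Called without arguments, `squeeze` removes every dimension of size 1, which may be more than intended. A brief sketch of the variants, using a zero tensor of shape (1, 3, 1) chosen for illustration:

```python
import torch

x = torch.zeros(1, 3, 1)

print(x.squeeze().shape)           # all size-1 dimensions removed
print(x.squeeze(dim=0).shape)      # only dimension 0 removed
print(x.squeeze(dim=1).shape)      # dimension 1 has size 3, so nothing changes
print(x.unsqueeze(dim=-1).shape)   # new trailing dimension of size 1
```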


The convention of `_` indicating in-place operations continues:

In [None]:
x = torch.arange(12).reshape(3, 4)
print(x)
print("--")
print(x.add_(x))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
--
tensor([[ 0,  2,  4,  6],
        [ 8, 10, 12, 14],
        [16, 18, 20, 22]])


There are many operations that reduce a dimension, such as `sum`:

In [None]:
x = torch.arange(12).reshape(3, 4)
print("x: \n", x)
print("---")
print("Summing over rows (dim=0), one total per column: \n", x.sum(dim=0))
print("---")
print("Summing over columns (dim=1), one total per row: \n", x.sum(dim=1))

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
Summing over rows (dim=0), one total per column: 
 tensor([12, 15, 18, 21])
---
Summing over columns (dim=1), one total per row: 
 tensor([ 6, 22, 38])


#### Matrix Multiplication

A three dimensional tensor would represent a batch of sequences, where each sequence item has a feature vector.  It is common to switch the batch and sequence dimensions so that we can more easily index the sequence in a sequential model.

For example, a training batch consists of 3 sentences, with each sentence having 2 words and each word being represented by 5 features. That is,

`batch_size = 3,
seq_size = 2,
feature_size = 5`

<img src="https://www.tensorflow.org/static/guide/images/tensor/index1.png">

In [None]:
batch_size = 3
seq_size = 2
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.transpose(1, 0).shape: \n", x.transpose(1, 0).shape)
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x.shape: 
 torch.Size([3, 2, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9]],

        [[10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29]]])
-----
x.transpose(1, 0).shape: 
 torch.Size([2, 3, 5])
x.transpose(1, 0): 
 tensor([[[ 0,  1,  2,  3,  4],
         [10, 11, 12, 13, 14],
         [20, 21, 22, 23, 24]],

        [[ 5,  6,  7,  8,  9],
         [15, 16, 17, 18, 19],
         [25, 26, 27, 28, 29]]])


`permute` generalizes `transpose`: it can reorder any number of dimensions at once.

In [None]:
batch_size = 3
seq_size = 2
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.permute(1, 0, 2).shape: \n", x.permute(1, 0, 2).shape)
print("x.permute(1, 0, 2): \n", x.permute(1, 0, 2))

x.shape: 
 torch.Size([3, 2, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9]],

        [[10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29]]])
-----
x.permute(1, 0, 2).shape: 
 torch.Size([2, 3, 5])
x.permute(1, 0, 2): 
 tensor([[[ 0,  1,  2,  3,  4],
         [10, 11, 12, 13, 14],
         [20, 21, 22, 23, 24]],

        [[ 5,  6,  7,  8,  9],
         [15, 16, 17, 18, 19],
         [25, 26, 27, 28, 29]]])


Matrix multiplication uses the function `mm`. The number of columns in the first matrix must equal the number of rows in the second matrix. The resulting matrix has the number of rows of the first matrix and the number of columns of the second. For example,

$\mathbf{X1}_{2 \times 3} \times \mathbf{X2}_{3 \times 5} \rightarrow \mathbf{X}_{2 \times 5}$

If you need a refresher on matrix dimensions, refer to [Dimension of a Matrix – Explanation & Examples](https://www.storyofmathematics.com/dimension-of-a-matrix/). For matrix multiplication, refer to [Wikipedia - Matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication).

In [None]:
x1 = torch.arange(6).view(2, 3).float()
describe(x1)
print("---")

x2 = torch.ones(3, 5).float()
describe(x2)
print("---")

x2[:, 1] += 1
describe(x2)
print("---")

describe(torch.mm(x1, x2))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 1., 2.],
        [3., 4., 5.]])
---
Type: torch.FloatTensor
Shape/size: torch.Size([3, 5])
Values: 
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
---
Type: torch.FloatTensor
Shape/size: torch.Size([3, 5])
Values: 
tensor([[1., 2., 1., 1., 1.],
        [1., 2., 1., 1., 1.],
        [1., 2., 1., 1., 1.]])
---
Type: torch.FloatTensor
Shape/size: torch.Size([2, 5])
Values: 
tensor([[ 3.,  6.,  3.,  3.,  3.],
        [12., 24., 12., 12., 12.]])
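As a sketch of the shape rule, the `@` operator gives the same product as `torch.mm` for 2-D tensors, and a mismatch in the inner dimensions raises an error.

```python
import torch

x1 = torch.arange(6).view(2, 3).float()
x2 = torch.ones(3, 5)

via_mm = torch.mm(x1, x2)
via_operator = x1 @ x2             # the @ operator calls torch.matmul
print(via_operator.shape)
print(torch.allclose(via_mm, via_operator))

mismatch_rejected = False
try:
    torch.mm(x1, x1)               # (2, 3) x (2, 3): inner sizes 3 and 2 differ
except RuntimeError:
    mismatch_rejected = True
print("mismatch rejected:", mismatch_rejected)

print(torch.mm(x1, x1.t()).shape)  # (2, 3) x (3, 2) gives (2, 2)
```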


See the [PyTorch Math Operations Documentation](https://pytorch.org/docs/stable/torch.html#math-operations) for more!

## Computing Gradients

During training, the neural network repeatedly updates the parameter values used by each neuron's computation in the computational graph. For example,

$\hat{y} = w \cdot x + b$,

where $w$ is the weight (a parameter) of the linear function that predicts the true $y$, and $b$ is a scalar bias term that shifts the prediction. Our goal is to decrease $Loss = ||\hat{y} - y||^{2}$ by updating $w$ in the direction opposite to the gradient:
$$ \frac{\partial Loss}{\partial w}$$

<img src="https://i.ytimg.com/vi/b4Vyma9wPHo/maxresdefault.jpg"></img>

Reference:
https://youtu.be/b4Vyma9wPHo?si=uoyccnCmIvjzE1Ed
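The update described above can be sketched as a single gradient-descent step. The data point, the starting parameters and the learning rate below are hypothetical values picked so the arithmetic is easy to follow; a real training loop repeats this step over many examples.

```python
import torch

# Hypothetical single training example and starting parameters.
x = torch.tensor(2.0)
y_true = torch.tensor(7.0)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1   # hyperparameter chosen for illustration

y_hat = w * x + b
loss = (y_hat - y_true) ** 2
loss.backward()

# Analytic check: dLoss/dw = 2 * (y_hat - y) * x and dLoss/db = 2 * (y_hat - y).
print(w.grad, b.grad)

# One update step; no_grad keeps the update itself out of the graph.
with torch.no_grad():
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad

new_loss = (w * x + b - y_true) ** 2
print(loss.item(), "->", new_loss.item())
```

Moving against the gradient lowers the loss for this example; with a learning rate that is too large, the same step can overshoot instead.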

In [None]:
iframe = '<iframe width="800" height="500" src="https://www.youtube.com/embed/nJyUyKN-XBQ?si=ZSuVgL4AkZ6y27ko" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>'
display.display(HTML(iframe))



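**Ex.** Suppose $y = wx + b$ with $w=2$, $x=1$, and $b=3$. The partial derivatives are $\frac{\partial y}{\partial x} = w = 2$, $\frac{\partial y}{\partial w} = x = 1$, and $\frac{\partial y}{\partial b} = 1$. Use `backward` to compute them as below,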

In [None]:
# Set up a tensor by turning on gradient calculation on it.
x = torch.tensor(1.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# Compute the predicted y = 2x+3
y = w * x + b

# Compute the gradients of y with respect to x, w and b (dy/dx = w)
y.backward()

print("dy/dx = ", x.grad)
print("dy/dw = ", w.grad)
print("dy/db = ", b.grad)

dy/dx =  tensor(2.)
dy/dw =  tensor(1.)
dy/db =  tensor(1.)


**Ex.** Find the gradient of $f(x)$ at $x=1$,

$$f(x)=\left\{
\begin{array}{ll}
    \sin(x) & \text{if } x>0 \\
    \cos(x) & \text{otherwise} \\
\end{array}
\right.$$

In [None]:
def f(x):
    if (x.data > 0).all():
        return torch.sin(x)
    else:
        return torch.cos(x)

In [None]:
x = torch.tensor([1.0], requires_grad=True)
y = f(x)
print('x: ', x)
# y = sin(1)
print('y: ', y)
print('---')

y.backward()
# dy/dx = cos(x) = cos(1) = 0.54030230586
print(x.grad)

x:  tensor([1.], requires_grad=True)
y:  tensor([0.8415], grad_fn=<SinBackward0>)
---
tensor([0.5403])


We could apply this to a larger vector too, but we need to make sure the output is a scalar. The example below has an error that you need to fix before it runs.

In [None]:
x = torch.tensor([1.0, 0.5], requires_grad=True)
y = f(x)
print('x: ', x)
print('y: ', y)
print('---')

# this is meant to break! can you fix it??
y.backward()

print(x.grad)

x:  tensor([1.0000, 0.5000], requires_grad=True)
y:  tensor([0.8415, 0.4794], grad_fn=<SinBackward0>)
---


RuntimeError: grad can be implicitly created only for scalar outputs

Solution: reduce the output `y` to a scalar, for example with `y.sum()`, before calling `backward()`.
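One way to apply this fix is sketched below. Summing is an assumption about which scalar is wanted; `mean()` or another reduction would work as well and would scale the gradients accordingly. The call to `torch.sin` mirrors what `f` returns for these inputs, since both entries are positive.

```python
import torch

x = torch.tensor([1.0, 0.5], requires_grad=True)
y = torch.sin(x)       # same branch that f(x) takes when all entries are positive

total = y.sum()        # reduce the vector output to a scalar
total.backward()

print(x.grad)                    # gradient of the sum with respect to each x_i
print(torch.cos(x.detach()))     # analytic derivative cos(x_i) for comparison
```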

### Gradient function for matrix
For a $2 \times 2$ input, the matrix operation `y = (x + 2).mean()` is equal to

- $y = \frac{1}{4} \sum_{i=1}^{4} (x_i + 2)$.

- The gradient $\frac{\partial y}{\partial x_i}$ for each element $x_i$ can be calculated as:
  $$
  \frac{\partial y}{\partial x_i} = \frac{1}{4} \times \frac{\partial (x_i + 2)}{\partial x_i} = \frac{1}{4} \times 1 = 0.25
  $$

In [None]:
# x = [[1, 1],
#      [1, 1]]
x = torch.ones(2, 2, requires_grad=True)
print(f"x = {x}")
print("---")

y = (x+2).mean()

y.backward()

print(f"x.grad={x.grad}")
print("---")

print(f"y.grad_fn={y.grad_fn}")

x = tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
---
x.grad=tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])
---
y.grad_fn=<MeanBackward0 object at 0x7e7d6b2877c0>


## CUDA Tensors

In [None]:
iframe = '<iframe width="800" height="500" src="https://www.youtube.com/embed/pPStdjuYzSI?si=amQKbW4j0eiPBv6H" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>'
display.display(HTML(iframe))



PyTorch's operations can be used seamlessly on the GPU or on the CPU. There are a couple of basic operations for moving tensors between them. From here on, you will need CUDA enabled, either in a Jupyter Notebook on your own machine or in Colab.

In [None]:
print(torch.cuda.is_available())

In [None]:
x = torch.rand(3,3)
describe(x)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

In [None]:
x = torch.rand(3, 3).to(device)
describe(x)
print(x.device)

In [None]:
cpu_device = torch.device("cpu")

In [None]:
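# x lives on `device` (the GPU when one is available); y is created on the CPU.
y = torch.rand(3, 3)

print(x.device)
print(y.device)

# If x is on the GPU, this raises a RuntimeError because the operands are on different devices. Can you fix it?
x + y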

In [None]:
if torch.cuda.is_available(): # only if a GPU is available
    a = torch.rand(3,3).to(device='cuda:0') #  CUDA Tensor
    print(a)

    b = torch.rand(3,3).cuda()
    print(b)

    print(a + b)

    # After a.cpu(), a is on the CPU while b is still on the GPU, so the final print raises an error. Can you fix it?
    a = a.cpu()

    print(a + b)
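A common way to avoid device mismatches is to pick one device up front and place every tensor on it. The following is a minimal sketch of that pattern; it falls back to the CPU when no GPU is present, so it should run in either environment.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 3, device=device)   # created directly on the chosen device
y = torch.rand(3, 3).to(device)       # created on the CPU, then moved

result = x + y                        # both operands share a device, so this works
print(result.device)

as_numpy = result.cpu().numpy()       # NumPy needs a CPU tensor
print(as_numpy.shape)
```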

### Exercises

Some of these exercises might require additional operations described in the [PyTorch documentation](https://pytorch.org/docs/).

#### Exercise 1

Create a 2D tensor `t` (3x3), then insert a dimension of size 1 at axis 0 so that its shape becomes (1x3x3).

#### Exercise 2

Remove the extra dimension you just added so that `t` has shape (3x3) again.

#### Exercise 4

Create a tensor t with dimension (3x3), and values from a normal distribution (mean=0, std=1).

#### Exercise 5

Retrieve the indices of all the nonzero elements in `torch.tensor([1, 1, 1, 0, 1])`.

#### Exercise 6

Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

#### Exercise 7

Return the batch matrix-matrix product of two 3-dimensional tensors (`a=torch.rand(3,4,5)`, `b=torch.rand(3,5,4)`). You can think of the first dimension of `a` and `b` as `batch_size`.

#### Exercise 8

Return the batch matrix-matrix product of a 3D tensor and a 2D matrix (`a=torch.rand(3,4,5)`, `b=torch.rand(5,4)`). `b` can be treated as a single matrix of dimension (5x4) that is applied to every item in the batch.