<a href="https://colab.research.google.com/github/ss1705/pytorch-exp/blob/main/PyTorch_0_Fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

2.9.0+cpu


In [None]:
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found


# Introduction to Tensors
tensors are multi-dimensional arrays used to represent numeric data

## Creating tensors
tensors are created using ``torch.tensor()``

In [None]:
#scalar
scalar = torch.tensor(7)
scalar

tensor(7)

In [None]:
#no. of dimensions
scalar.ndim

0

In [None]:
#return the value as an integer
scalar.item()

7

In [None]:
#vector
vector = torch.tensor([7,7])
vector

tensor([7, 7])

vector has magnitude and direction

In [None]:
vector.ndim

1

In [None]:
vector.shape

torch.Size([2])

convention: lowercase letters to represent scalars and vectors, uppercase letters to represent matrices and tensors

### Matrix

In [None]:
#matrix
MATRIX = torch.tensor([[7,8],
                      [9,10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [None]:
MATRIX.ndim

2

In [None]:
MATRIX[0]

tensor([7, 8])

In [None]:
MATRIX.shape

torch.Size([2, 2])

### Tensor

In [None]:
#tensor
TENSOR = torch.tensor([[[1,2,3],
                        [3,6,9],
                        [2,4,5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [None]:
TENSOR.ndim

3

In [None]:
TENSOR.shape

torch.Size([1, 3, 3])

In [None]:
TENSOR[0]

tensor([[1, 2, 3],
        [3, 6, 9],
        [2, 4, 5]])

In [None]:
T = torch.tensor([[[1,2,3,4],
                   [5,6,7,8],
                   [9,10,11,12],
                   [13,14,15,16]]])
#guessing ndim and shape
# ndim = 3
# shape = 1,4,4

In [None]:
T.ndim

3

In [None]:
T.shape

torch.Size([1, 4, 4])

In [None]:
T = torch.tensor([[[[1,2],
                    [3,4],
                    [5,6]]]])
# shape = 1,1,3,2
# dim = 4

In [None]:
T.shape

torch.Size([1, 1, 3, 2])

In [None]:
T.ndim

4

## Random tensors
- important because neural nets learn by starting with tensors full of random numbers and then adjusting those numbers to better represent the data

Procedure:
Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers
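
a minimal sketch of that procedure, not how real models are trained: the toy data (`y = 2 * x`), the learning rate of 0.01 and the hand-written gradient are all assumptions made for illustration.

```python
import torch

torch.manual_seed(0)  # reproducible random start

# toy data (assumed for this sketch): the pattern to learn is y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2 * x

# start with a random number
weight = torch.rand(1)

losses = []
for step in range(50):
    # look at data: mean squared error of the current guess
    error = weight * x - y
    losses.append((error ** 2).mean().item())

    # update the random number: step against the gradient of the error
    gradient = (2 * error * x).mean()
    weight = weight - 0.01 * gradient

print(f"first loss: {losses[0]:.4f} | last loss: {losses[-1]:.6f} | weight: {weight.item():.4f}")
```

the loss shrinks on every pass and the weight drifts from its random start towards 2, which is the "adjust the random numbers" step in miniature.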

In [None]:
random_tensor = torch.rand(size=(3,4))
random_tensor, random_tensor.dtype

(tensor([[0.3397, 0.4684, 0.7643, 0.2856],
         [0.6090, 0.1071, 0.9829, 0.1856],
         [0.9981, 0.0513, 0.6295, 0.4463]]),
 torch.float32)

we can adjust the size to be whatever we want
eg: we want to represent height, width, color_channels like [244,244,3]

In [None]:
random_image_size_tensor = torch.rand(size=(244,244,3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([244, 244, 3]), 3)

## Zeros and ones

In [None]:
zeros = torch.zeros(size=(3,4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [None]:
ones = torch.ones(size=(3,4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)

## Creating a range

In [None]:
zero_to_ten = torch.arange(start=0,end=10,step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
ten_zeros = torch.zeros_like(input=zero_to_ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

## Tensor datatypes
- some specific for CPU, some better for GPU
- seeing `torch.cuda` anywhere -> the tensor is being used on a GPU [CUDA - computing toolkit used by NVIDIA GPUs]
- most common: torch.float32 / torch.float [32-bit floating point]
- also available: torch.float16/torch.half and torch.float64/torch.double
- 8-bit, 16-bit, 32-bit, 64-bit integers
- main idea: precision in computing
- precision - amount of detail used to describe a number
- higher precision value -> more detail and data used to express a number
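
a small sketch of what precision means in practice, assuming the value 1/3 as the test number (any value without an exact binary representation would do). `torch.finfo()` reports the numeric limits of a floating point dtype.

```python
import torch

value = 1 / 3  # cannot be stored exactly in binary floating point

errors = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(value, dtype=dtype)
    info = torch.finfo(dtype)  # numeric limits of a floating point dtype
    errors[dtype] = abs(t.item() - value)  # distance from the Python float
    print(f"{dtype}: {info.bits} bits, {t.element_size()} bytes per element, "
          f"stored as {t.item():.17f}")
```

more bits -> more memory per element, but the stored value sits closer to the number we asked for.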

In [None]:
float_32_tensor = torch.tensor([3.0,6.0,9.0],
                               dtype=None, #default - inferred from the data (torch.float32 for Python floats)
                               device=None, #default device (CPU here)
                               requires_grad=False) #if True - operations performed on the tensor are recorded

In [None]:
float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

"operations are recorded"
When ``requires_grad=True``, PyTorch keeps track of every operation you perform on that tensor so it can later compute gradients automatically using autograd (automatic differentiation).

PyTorch builds a computation graph behind the scenes: each operation becomes a node in that graph, and the graph records how the output depends on the input.

That record is then used during ``.backward()`` to compute gradients via the chain rule.

In [None]:
x = torch.tensor(2.0, requires_grad=True)
y = x*3
z = y**2

In [None]:
z.backward()
print(x.grad)

tensor(36.)
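
the 36 can be checked by hand with the chain rule (a worked derivation added for reference, using the same x, y, z as the cells above):

$$
y = 3x, \qquad z = y^2
$$

$$
\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = 2y \cdot 3 = 6\,(3x) = 18x
$$

$$
x = 2 \;\Rightarrow\; \frac{dz}{dx} = 18 \cdot 2 = 36
$$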


In practice:
- Model parameters -> requires_grad=True
- Inputs, labels, constants -> requires_grad=False
- Saves memory and computation

Common issues: shape, datatype, device
- operations expect tensors with compatible shapes and matching datatypes
- calculations between tensors are also expected to happen on the same device
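
a hedged sketch of the datatype and device issues. the tensors `a` and `b` are made up for illustration, and the device check falls back to the CPU when no GPU is present.

```python
import torch

a = torch.rand(3, 2)                     # float32 by default
b = torch.ones(2, 3, dtype=torch.int64)  # shapes line up, datatypes do not

# datatype issue: matrix multiplication does not mix float32 and int64
try:
    a @ b
    dtype_error = None
except RuntimeError as error:
    dtype_error = str(error)
print(f"dtype error: {dtype_error}")

# fix: convert one tensor so both share a datatype
result = a @ b.type(torch.float32)

# device issue: use the GPU only if one is available, then move tensors explicitly
device = "cuda" if torch.cuda.is_available() else "cpu"
a_dev, b_dev = a.to(device), b.type(torch.float32).to(device)
print(result.shape, (a_dev @ b_dev).device)
```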

In [None]:
float_16_tensor= torch.rand(size=(3,4),dtype=torch.half)
float_16_tensor

tensor([[0.7646, 0.9702, 0.5420, 0.2446],
        [0.8237, 0.7251, 0.3291, 0.3643],
        [0.0762, 0.4272, 0.7178, 0.0107]], dtype=torch.float16)

## Information from tensors

three of the most common attributes - shape, dtype, device

whenever there's an issue, think: "what shape are my tensors? what datatype are they and where are they stored? what shape, what datatype, where where where"

## Manipulating tensors - tensor operations
- in DL, data represented as tensors
- model learns by investigating tensors and performing a series of operations on tensors to create a representation of the patterns in the input data

#### Operations:
- addition
- subtraction
- multiplication (element-wise)
- division
- matrix multiplication

In [None]:
tensor = torch.tensor([1,2,3])
tensor + 10

tensor([11, 12, 13])

In [None]:
tensor * 10

tensor([10, 20, 30])

values inside the tensor don't change until they're reassigned

In [None]:
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [None]:
tensor = tensor + 10
tensor

tensor([1, 2, 3])

Built-in functions: ``torch.mul()`` and ``torch.add()``

In [None]:
torch.multiply(tensor,10)

tensor([10, 20, 30])

In [None]:
tensor

tensor([1, 2, 3])

original tensor remains unchanged

In [None]:
tensor * tensor

tensor([1, 4, 9])

## Matrix Multiplication
Two rules for matrix multiplication:
1. inner dimensions must match
2. resulting matrix has the shape of outer dimensions

symbol for matrix multiplication: @

In [None]:
tensor = torch.tensor([1,2,3])

In [None]:
tensor * tensor

tensor([1, 4, 9])

In [None]:
torch.matmul(tensor,tensor)

tensor(14)

In [None]:
tensor.matmul(tensor)

tensor(14)

In [None]:
tensor @ tensor

tensor(14)

### Shape errors
A -> 3,2

B -> 3,2

won't work

to make inner dimensions match -> transpose
- ``torch.transpose(input,dim0,dim1)``
- ``tensor.T``
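
a sketch of the shape error and the transpose fix. the values in `tensor_A` and `tensor_B` are assumptions picked for illustration; only the shapes (3,2) and (3,2) come from the note above.

```python
import torch

tensor_A = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])     # shape (3, 2)
tensor_B = torch.tensor([[7., 10.], [8., 11.], [9., 12.]])  # shape (3, 2)

# (3, 2) @ (3, 2): inner dimensions 2 and 3 don't match
try:
    torch.matmul(tensor_A, tensor_B)
except RuntimeError as error:
    print(f"shape error: {error}")

# transpose B: (3, 2) @ (2, 3) -> inner dimensions match, result is (3, 3)
result = torch.matmul(tensor_A, tensor_B.T)
same = torch.matmul(tensor_A, torch.transpose(tensor_B, 0, 1))

print(result.shape)
print(torch.equal(result, same))  # both ways of transposing give the same result
```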

note:

`torch.mul()` short for `torch.multiply()`

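`torch.mm()` is not an alias for `torch.matmul()`: `torch.mm()` only multiplies two 2-D matrices (no broadcasting), while `torch.matmul()` also handles 1-D and batched inputs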

neural nets are full of matrix multiplication + dot products

=> ``torch.nn.Linear()`` - feed-forward/fully connected layer - implements matrix multiplication between input x and weights matrix A
- also adds a bias b to give the output y: `y = x @ A.T + b`

In [None]:
torch.manual_seed(42)

<torch._C.Generator at 0x78934e1e7bf0>

- sets the random number generator seed for PyTorch
- every time you run the code with the same seed, PyTorch will generate the same sequence of random numbers

PRNG: pseudo-random number generator
- mathematical algorithm
- initial state (SEED!)
- produces long sequence of numbers that appear random
- same seed -> same sequence, different seed -> different sequence

so `torch.manual_seed()` sets the internal state of PyTorch's PRNG
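
one consequence worth sketching (variable names here are just for illustration): the seed sets the starting state once, and every call to `torch.rand()` then advances that state, so the seed has to be set again to replay the same numbers.

```python
import torch

torch.manual_seed(42)
first = torch.rand(3)
second = torch.rand(3)   # continues the sequence, so it differs from `first`

torch.manual_seed(42)    # reset the internal state
replay = torch.rand(3)   # same numbers as `first`

print(torch.equal(first, second))
print(torch.equal(first, replay))
```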

In [None]:
torch.manual_seed(42)
print(torch.rand(5))

tensor([0.8823, 0.9150, 0.3829, 0.9593, 0.3904])


In [None]:
torch.manual_seed(42)
print(torch.rand(5))

tensor([0.8823, 0.9150, 0.3829, 0.9593, 0.3904])


In [None]:
torch.manual_seed(7)
print(torch.rand(5))

tensor([0.5349, 0.1988, 0.6592, 0.6569, 0.2328])


### Linear Layer Example

In [3]:
tensor_A = torch.tensor([[1,2],
                         [3,4],
                         [5,6]], dtype=torch.float32)

In [6]:
linear = torch.nn.Linear(in_features=2, #matches inner dimension of input
                         out_features=6) #outer dimension of the output

3 x 2,
2 x 6
--> 3 x 6

In [7]:
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape:{output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[ 0.5430, -1.2292,  0.1520, -0.3943,  1.4224,  0.3815],
        [ 1.6011, -2.3247,  1.2338, -1.2677,  3.7767, -0.1548],
        [ 2.6591, -3.4202,  2.3157, -2.1410,  6.1311, -0.6911]],
       grad_fn=<AddmmBackward0>)

Output shape:torch.Size([3, 6])


- in_features -> number of features per input sample
- out_features -> number of output features produced per input sample
- input: (2,) - 3 of these
- output: (6,) - 3 of these
- each input vector of size 2 is transformed into a vector of size 6
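
a sketch checking that the layer is a matrix multiplication plus a bias. the seed value is arbitrary; `linear.weight` and `linear.bias` are the layer's learnable parameters.

```python
import torch

torch.manual_seed(42)  # arbitrary seed so the random weights are repeatable

tensor_A = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])  # 3 samples, 2 features each
linear = torch.nn.Linear(in_features=2, out_features=6)

# weight has shape (out_features, in_features), bias has shape (out_features,)
print(linear.weight.shape, linear.bias.shape)

output = linear(tensor_A)
manual = tensor_A @ linear.weight.T + linear.bias  # (3, 2) @ (2, 6) + (6,) -> (3, 6)

print(torch.allclose(output, manual, atol=1e-6))
```

the weight is stored as (6, 2), which is why it is transposed before the multiplication.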

## Tensor Aggregation - min, max, mean, sum

In [13]:
x = torch.arange(0,100,10)
x, x.dtype

(tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90]), torch.int64)

In [11]:
torch.min(x), x.min()

(tensor(0), tensor(0))

In [12]:
torch.max(x), x.max()

(tensor(90), tensor(90))

In [15]:
torch.mean(x.type(torch.float64)), x.type(torch.float64).mean()

(tensor(45., dtype=torch.float64), tensor(45., dtype=torch.float64))

note: `torch.mean()` only works on floating point (or complex) tensors; `x` is `torch.int64`, so it has to be converted first
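
a short sketch of what happens without the conversion (the variable names `mean_error` and `mean_value` are only for illustration):

```python
import torch

x = torch.arange(0, 100, 10)  # int64

# mean on an integer tensor raises an error
try:
    torch.mean(x)
    mean_error = None
except RuntimeError as error:
    mean_error = str(error)
print(f"error: {mean_error}")

# convert to a float datatype first
mean_value = x.to(torch.float32).mean()
print(mean_value)
```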

In [16]:
torch.sum(x), x.sum()

(tensor(450), tensor(450))

### Positional min, max - argmin, argmax

In [17]:
tensor = torch.arange(10,100,10)
tensor

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])

In [18]:
tensor.argmax(), tensor.argmin()

(tensor(8), tensor(0))

## Change tensor datatype

`torch.Tensor.type(dtype=None)`

In [19]:
tensor = torch.arange(10.,100.,10.)
tensor.dtype

torch.float32

In [20]:
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [21]:
tensor_int64 = tensor.type(torch.int64)
tensor_int64

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])

## Reshape, stack, squeeze, unsqueeze

`torch.reshape(input,shape)` - reshapes input to shape

`Tensor.view(shape)` - returns a view of the original tensor in a different shape; the view shares the same data as the original

`torch.stack(tensors,dim=0)` - concatenates sequence of tensors along a new dimension (dim), all tensors must be same size

`torch.squeeze(input)` - squeezes input to remove all dimensions of size 1

`torch.unsqueeze(input,dim)` - returns input with a dimension of size 1 added at `dim`

`torch.permute(input,dims)` - returns view of original input with dimensions permuted to `dims`

In [22]:
x = torch.arange(1.,8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [23]:
x_reshaped = x.reshape(1,7) #extra dimension added
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [25]:
z = x.view(1,7) #data remains the same but changes view
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

`view` only creates a new view of the same data, so changing the view changes the original tensor too.

In [26]:
z[:,0] = 5
z,x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))

changing z changes x
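
if an independent copy is wanted instead, one option is `clone()`. a minimal sketch (the name `independent` is just for illustration):

```python
import torch

x = torch.arange(1., 8.)

z = x.view(1, 7)
print(z.data_ptr() == x.data_ptr())  # view and original point at the same memory

independent = x.clone().view(1, 7)   # copy the data first, then reshape the copy
independent[:, 0] = 5

print(independent)
print(x)  # the original keeps its first value
```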

In [27]:
x_stacked = torch.stack([x,x,x,x],dim=0)
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.]])

In [28]:
x_squeezed = x_reshaped.squeeze()
x_squeezed, x_squeezed.shape

(tensor([5., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [29]:
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
x_unsqueezed, x_unsqueezed.shape

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [30]:
x_original = torch.rand(size=(224,224,3))
x_permuted = x_original.permute(2,0,1)
x_original.shape, x_permuted.shape

(torch.Size([224, 224, 3]), torch.Size([3, 224, 224]))

the numbers tell PyTorch which old axis goes in each new position

so:
- old axes: 224 [0], 224 [1], 3 [2]
- new axes: 3, 224, 224

note: this is the CHW (color_channels, height, width) format, which PyTorch image models typically expect
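
a small sketch of that axis mapping. the indices `h`, `w`, `c` are arbitrary values chosen for illustration.

```python
import torch

torch.manual_seed(0)
x_original = torch.rand(size=(224, 224, 3))  # height, width, color_channels
x_permuted = x_original.permute(2, 0, 1)     # color_channels, height, width

# the same element, addressed in both layouts
h, w, c = 10, 20, 1
print(x_original[h, w, c] == x_permuted[c, h, w])

# permute returns a view: writing through it changes the original too
x_permuted[c, h, w] = 123.0
print(x_original[h, w, c])
```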

## Indexing

In [31]:
x = torch.arange(1,10).reshape(1,3,3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [38]:
x.ndim

3

In [34]:
print(f"First sq. bracket:\n{x[0]}")

First sq. bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])


In [35]:
print(f"Second sq. bracket:\n{x[0][0]}")

Second sq. bracket:
tensor([1, 2, 3])


In [36]:
print(f"Third sq. bracket:\n{x[0][0][0]}")

Third sq. bracket:
1


In [37]:
#all values in dim 0, 0th index of dim 1
x[:,0]

tensor([[1, 2, 3]])

In [39]:
x[:,0,0]

tensor([1])

In [40]:
x[:,:,1]

tensor([[2, 5, 8]])

In [41]:
x[:,1,1]

tensor([5])

In [42]:
x[0,0,:]

tensor([1, 2, 3])