In [1]:
import torch 
import torchvision 
import torch.nn as nn 
import torchvision.transforms as transforms
import numpy as np 

### Tensors 
Tensors are a specialized data structure very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computation.

In [2]:
# Tensor Initialization 

In [3]:
# Directly from data
x = [[1,2],[3,4]]
x_tensor = torch.tensor(x)
print(x_tensor)

tensor([[1, 2],
        [3, 4]])


In [4]:
# From np array 
x_array = np.array(x) 
x_array_tensor = torch.tensor(x_array)
print(x_array_tensor)

tensor([[1, 2],
        [3, 4]])
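`torch.tensor()` always copies its input. If you want a tensor that shares memory with a NumPy array, `torch.from_numpy()` can be used instead. The sketch below uses a fresh array named `arr` (so it does not disturb `x_array` above) to show the difference.

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])

copied = torch.tensor(arr)      # makes an independent copy of the data
shared = torch.from_numpy(arr)  # shares the same memory as arr

arr[0, 0] = 100  # modify the NumPy array in place

print(copied[0, 0].item())  # 1   -> the copy is unaffected
print(shared[0, 0].item())  # 100 -> the shared tensor sees the change

# Going the other way, .numpy() on a CPU tensor also shares memory.
back = shared.numpy()
print(back[0, 0])  # 100
```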


In [5]:
# from another tensor 
x_ones = torch.ones_like(x_tensor) 
print(x_ones)

x_rand = torch.rand_like(x_tensor, dtype=torch.float) # overrides the datatype of x_tensor; rand_like needs a floating-point dtype
print(x_rand)

x_zeroes = torch.zeros_like(x_tensor)
print(x_zeroes)

tensor([[1, 1],
        [1, 1]])
tensor([[0.7202, 0.9920],
        [0.0935, 0.5687]])
tensor([[0, 0],
        [0, 0]])


In [6]:
# with random or constant value 
shape = (2, 3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.3847, 0.3175, 0.9163],
        [0.8538, 0.9047, 0.5371]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])


In Python, a trailing comma after the last item of a multi-element tuple is purely a matter of style: `(2, 3,)` and `(2, 3)` are the same tuple. Some style guides prefer it, especially for tuples split across several lines, because adding or removing an element later changes only one line and you do not need to remember to add a comma after the previous last element.

The one place where the comma matters is a single-element tuple: `(2,)` is a tuple, while `(2)` is just the integer 2 in parentheses. It is the comma, not the parentheses, that creates a tuple.

In summary, the trailing comma in `(2, 3,)` is optional; whether to use it depends on the style guide you or your team follows.
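A quick check in plain Python of how the comma behaves, including how the resulting tuples work as tensor shapes:

```python
import torch

not_a_tuple = (2)        # just the integer 2 in parentheses
one_element = (2,)       # the comma is what makes this a tuple
with_trailing = (2, 3,)  # same tuple as (2, 3)

print(type(not_a_tuple).__name__)  # int
print(type(one_element).__name__)  # tuple
print(with_trailing == (2, 3))     # True: the trailing comma changes nothing

# Both tuple forms work as tensor shapes.
print(torch.zeros(one_element).shape)    # torch.Size([2])
print(torch.zeros(with_trailing).shape)  # torch.Size([2, 3])
```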

## Tensor Attributes 

In [7]:
tensor = torch.rand(3, 4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
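Tensors are created on the CPU by default. A common pattern, sketched below, picks a GPU when one is available and moves the tensor there with `.to()`; on a machine without CUDA the tensor simply stays on the CPU.

```python
import torch

tensor = torch.rand(3, 4)

# Choose a device: a CUDA GPU if one is available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# .to() returns a tensor on the target device (a copy if the device changes)
tensor = tensor.to(device)
print(f"Device tensor is stored on: {tensor.device}")
```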


### Basic Autograd

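Gradients are essential for optimization: training a model means adjusting its parameters in the direction that reduces the loss. PyTorch's `torch.autograd` package computes these gradients automatically by recording the operations performed on tensors and applying the chain rule during the backward pass.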

In [8]:
x = torch.randn(3)
print(x)

tensor([ 1.5194, -1.1760,  0.1726])


In [9]:
x = torch.randn(3, requires_grad=True) # requires_grad defaults to False
print(x)

tensor([ 0.8898, -0.9223, -2.1185], requires_grad=True)


In [10]:
y = x + 2 # will create the computational graph for us 
print(y)

tensor([ 2.8898,  1.0777, -0.1185], grad_fn=<AddBackward0>)


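Here, `grad_fn=<AddBackward0>` shows that `y` was created by an addition. No backpropagation has happened yet; autograd has only recorded the operation so that it knows how to compute the gradient of the addition when `backward()` is called later.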

In [11]:
z = y*y*2 
print(z)

tensor([16.7020,  2.3231,  0.0281], grad_fn=<MulBackward0>)


Similarly, `grad_fn=<MulBackward0>` shows that `z` was created by a multiplication.
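These `grad_fn` objects can be inspected directly. Tensors created by the user (leaf tensors) have no `grad_fn`, while results of operations do. A small sketch with fresh variables `a`, `b`, `c`:

```python
import torch

a = torch.randn(3, requires_grad=True)  # leaf tensor created by the user
b = a + 2                               # result of an addition
c = b * b * 2                           # result of multiplications

print(a.is_leaf, a.grad_fn)                 # True None
print(b.is_leaf, type(b.grad_fn).__name__)  # False AddBackward0
print(c.is_leaf, type(c.grad_fn).__name__)  # False MulBackward0
```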

In [12]:
z = z.mean()
print(z) 

tensor(6.3510, grad_fn=<MeanBackward0>)


In [13]:
# to calculate the gradient 
z.backward() # dz/dx
print(x.grad) # x will store the gradient value 

tensor([ 3.8531,  1.4370, -0.1579])
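The printed values can be checked by hand. With y = x + 2 and z = (1/3) * sum(2 * y_i^2), the chain rule gives dz/dx_i = 4 * (x_i + 2) / 3; for example, 4 * 2.8898 / 3 ≈ 3.8531 for the first element above. The sketch below compares autograd with this formula on a fresh random input.

```python
import torch

x = torch.randn(3, requires_grad=True)
z = ((x + 2) * (x + 2) * 2).mean()
z.backward()

# Analytic gradient: z = (1/3) * sum(2 * (x_i + 2)^2)  =>  dz/dx_i = 4 * (x_i + 2) / 3
expected = 4 * (x.detach() + 2) / 3
print(torch.allclose(x.grad, expected))  # True
```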


In [14]:
# grad can be implicitly created only for scalar outputs
x = torch.randn(3, requires_grad=True)
y = x + 2

z = y*y*2
# z = z.mean() is deliberately skipped this time
print(z) # z is not a scalar value

tensor([ 4.8862, 23.5957, 10.5685], grad_fn=<MulBackward0>)


In [15]:
# to calculate gradient 
z.backward()
print(x.grad)

RuntimeError: grad can be implicitly created only for scalar outputs

In [16]:
# solution: pass a vector v to backward(), which then computes the vector-Jacobian product v^T J
v = torch.tensor([0.1, 1.0, 0.001], dtype=torch.float32)
z.backward(v)
print(x.grad)

tensor([6.2522e-01, 1.3739e+01, 9.1950e-03])
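Passing `v` makes `backward()` compute the vector-Jacobian product v^T J, which is the same as back-propagating from the scalar `(z * v).sum()`. Here z_i = 2 * (x_i + 2)^2, so the Jacobian is diagonal and the result is simply v_i * 4 * (x_i + 2). A sketch checking both views on fresh tensors `x1` and `x2` that hold the same values:

```python
import torch

v = torch.tensor([0.1, 1.0, 0.001])

x1 = torch.randn(3, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)  # same values, separate graph

# Option 1: pass v to backward() on the non-scalar output
z1 = (x1 + 2) * (x1 + 2) * 2
z1.backward(v)

# Option 2: reduce to a scalar first, weighting each element by v
z2 = (x2 + 2) * (x2 + 2) * 2
(z2 * v).sum().backward()

expected = v * 4 * (x1.detach() + 2)  # diagonal Jacobian times v
print(torch.allclose(x1.grad, x2.grad), torch.allclose(x1.grad, expected))  # True True
```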


##### Preventing Gradient History
There are three ways to stop PyTorch from tracking operations and recording gradient history:
1. `x.requires_grad_(False)`
2. `x.detach()`
3. `with torch.no_grad():`

##### 1. x.requires_grad_(False)

In [17]:
x = torch.randn(3, requires_grad=True)
print(x)

tensor([-0.0869, -1.6616, -0.6585], requires_grad=True)


In [18]:
x.requires_grad_(False) # a trailing _ means the method modifies x in place

In [19]:
print(x)

tensor([-0.0869, -1.6616, -0.6585])


##### 2. x.detach()

In [20]:
y = x.detach()
print(y) 

tensor([-0.0869, -1.6616, -0.6585])


`detach()` returns a new tensor that shares the same data as `x` but is cut off from the computational graph, so it has `requires_grad=False`.
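Because `x` already had `requires_grad=False` at this point, the cell above cannot show the difference. The sketch below starts from a fresh tensor `a` that does require gradients; it also shows that `detach()` does not copy data, so in-place edits to the detached tensor are visible in the original.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()  # same storage, but cut off from the graph

print(a.requires_grad, b.requires_grad)  # True False

b[0] = 5.0  # in-place write through the detached tensor
print(a)    # tensor([5., 1., 1.], requires_grad=True)

# Use .detach().clone() when an independent copy is needed
c = a.detach().clone()
c[1] = 7.0
print(a[1].item())  # 1.0 -> a is unchanged
```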

##### 3. with torch.no_grad()

In [21]:
with torch.no_grad():
    y = x + 2
    print(y)

tensor([1.9131, 0.3384, 1.3415])


Operations inside the `torch.no_grad()` block are not recorded, so `y` has no `grad_fn` and `requires_grad=False`.
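As in the `detach()` case, `x` no longer required gradients here, so the contrast is easier to see with a fresh tensor that does:

```python
import torch

x = torch.randn(3, requires_grad=True)

y_tracked = x + 2        # recorded in the graph
with torch.no_grad():
    y_untracked = x + 2  # not recorded

print(y_tracked.requires_grad, y_tracked.grad_fn is not None)  # True True
print(y_untracked.requires_grad, y_untracked.grad_fn)          # False None
```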


When you call `backward()` on a tensor that is part of a computation graph, PyTorch computes the gradients of that tensor (normally a scalar such as the loss) with respect to every leaf tensor in the graph that has `requires_grad=True`, using automatic differentiation (backpropagation).

When you perform backpropagation, PyTorch accumulates the gradients of the parameters (usually weights) of your model in their respective .grad attributes. These gradients are not overwritten on subsequent calls to backward(), but rather accumulated. This behavior is useful, for example, when you have multiple losses in your model and you want to accumulate gradients from each loss before updating the model parameters.
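A small sketch of the multiple-loss case: calling `backward()` on two losses one after the other leaves their summed gradient in `.grad`, the same as back-propagating their sum once. The two losses here are made up for illustration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# Two illustrative losses that both depend on w
loss_a = 3 * w   # d/dw = 3
loss_b = w ** 2  # d/dw = 2w = 2 at w = 1

loss_a.backward()
loss_b.backward()  # adds to w.grad instead of overwriting it
print(w.grad)      # tensor(5.)

# Equivalent: back-propagate the summed loss once
w2 = torch.tensor(1.0, requires_grad=True)
(3 * w2 + w2 ** 2).backward()
print(w2.grad)     # tensor(5.)
```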

In [22]:
# example 
weight = torch.ones(4, requires_grad=True)

for epoch in range(1):
    model_output = (weight*3).sum()

    model_output.backward()
    print(weight.grad)

tensor([3., 3., 3., 3.])


In [23]:
weight = torch.ones(4, requires_grad=True)

for epoch in range(2):
    model_output = (weight*3).sum()

    model_output.backward()
    print(weight.grad)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])


In [24]:
weight = torch.ones(4, requires_grad=True)

for epoch in range(3):
    model_output = (weight*3).sum()

    model_output.backward()
    print(weight.grad)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])
tensor([9., 9., 9., 9.])


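The gradients from each iteration are added together, so from the second iteration onward `weight.grad` is wrong: the gradient of `(weight*3).sum()` with respect to `weight` is always 3. An optimization step using these accumulated values would update the weights incorrectly.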

In [25]:
# Before the next iteration and optimization step, we must reset the gradient to zero.
weight = torch.ones(4, requires_grad=True)

for epoch in range(3):
    model_output = (weight*3).sum()

    model_output.backward()
    print(weight.grad)
    weight.grad.zero_()

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
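In practice, the parameter update and the reset are usually handled by an optimizer from `torch.optim`. A sketch assuming plain SGD with an illustrative learning rate of 0.01: `optimizer.step()` updates `weight`, and `optimizer.zero_grad()` clears the gradient before the next iteration.

```python
import torch

weight = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weight], lr=0.01)  # lr chosen only for illustration

grads = []
for epoch in range(3):
    model_output = (weight * 3).sum()
    model_output.backward()
    grads.append(weight.grad.clone())  # record this iteration's gradient

    optimizer.step()       # weight <- weight - lr * weight.grad
    optimizer.zero_grad()  # clear gradients so they do not accumulate

print(grads)            # three gradients, each equal to [3., 3., 3., 3.]
print(weight.detach())  # each element is about 1 - 3 * 0.03 = 0.91
```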


### Backpropagation

In [33]:
x = torch.tensor(1.0)
y = torch.tensor(2.0)

w = torch.tensor(1.0, requires_grad=True)

In [34]:
# forward pass and compute the loss
y_hat = w*x 
loss = (y_hat - y)**2

print(loss)

tensor(1., grad_fn=<PowBackward0>)


In [35]:
# backward pass
loss.backward()
print(w.grad)

tensor(-2.)
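The value -2 can be reproduced with the chain rule. With y_hat = w * x and s = y_hat - y, the loss is s^2, so dloss/dw = (dloss/ds) * (ds/dw) = 2s * x = 2 * (1 * 1 - 2) * 1 = -2. The same computation in plain Python:

```python
x, y, w = 1.0, 2.0, 1.0

# forward pass
y_hat = w * x
s = y_hat - y
loss = s ** 2

# backward pass by hand (chain rule)
dloss_ds = 2 * s  # derivative of s**2 with respect to s
ds_dw = x         # derivative of w*x - y with respect to w
dloss_dw = dloss_ds * ds_dw

print(loss, dloss_dw)  # 1.0 -2.0
```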


In [36]:
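# update weights (plain gradient descent; the learning rate 0.01 is illustrative)
with torch.no_grad():
    w -= 0.01 * w.grad
w.grad.zero_() # reset before the next forward and backward pass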
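Putting the steps together gives a minimal gradient-descent loop for this one-parameter problem. The learning rate (0.1) and the number of iterations (100) are illustrative assumptions; since the target is y = 2 with x = 1, `w` should approach 2.

```python
import torch

x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

learning_rate = 0.1  # illustrative choice
for step in range(100):
    # forward pass and loss
    y_hat = w * x
    loss = (y_hat - y) ** 2

    # backward pass: stores dloss/dw in w.grad
    loss.backward()

    # update the weight without recording the update in the graph
    with torch.no_grad():
        w -= learning_rate * w.grad

    # reset the gradient so the next backward() does not accumulate
    w.grad.zero_()

print(w.item())  # close to 2.0
```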