## What is PyTorch?

1. PyTorch is a Python-based scientific computing package.
2. It utilizes GPU acceleration for high-performance computing.

#### Why is PyTorch Popular?

1. It is a preferred platform for deep learning research.
2. Offers maximum flexibility and speed for model development.

#### Key Features of PyTorch:

1. Tensor computations with strong GPU acceleration.
2. Deep neural network building using a tape-based autograd system.

__A tape-based autograd system__ in PyTorch refers to its dynamic computation graph, which records operations as they happen and allows efficient backpropagation for computing gradients.

1. __What is Autograd?__
- Autograd (short for automatic differentiation) is PyTorch’s automatic differentiation engine.
- It keeps track of operations on tensors and allows automatic gradient computation.
2. __What is a Tape-Based Autograd System?__
- PyTorch records (or "writes") operations on a computational tape while executing them.
- When you call .backward(), PyTorch reads the tape in reverse to compute gradients using the chain rule.
- This is also called define-by-run, meaning the graph is dynamically built during execution.
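
The define-by-run idea can be illustrated with a minimal sketch. The function `f` and the input values below are hypothetical, chosen only for illustration: ordinary Python control flow decides which operations are recorded, so each call may produce a different graph.

```python
import torch

def f(x):
    # Ordinary Python control flow decides which operations get recorded.
    if x.item() > 0:
        return x ** 2      # recorded when x is positive
    return -3 * x          # recorded otherwise

x_pos = torch.tensor(2.0, requires_grad=True)
f(x_pos).backward()        # walks the recorded operations in reverse
print(x_pos.grad)          # derivative of x^2 at x = 2

x_neg = torch.tensor(-1.0, requires_grad=True)
f(x_neg).backward()
print(x_neg.grad)          # derivative of -3x
```

The two calls record different operations, which is why the two gradients come from different formulas.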


## A Brief History of PyTorch

1. __Adoption and Popularity of PyTorch__
-> First released in 2016, PyTorch has been increasingly adopted by researchers. </br>
-> It has become a go-to library for building complex neural networks. </br>
-> It competes strongly with TensorFlow, especially in research. </br>
-> It is still evolving, so adoption continues to grow.
2. __Design Philosophy__
-> PyTorch was designed to be imperative, meaning it runs computations dynamically. </br>
-> This approach fits well with Python, making it intuitive and easy to use. </br>
-> Scientists, ML developers, and debuggers can test parts of their code in real-time. </br>
Unlike static computation graphs, PyTorch does not require re-executing the entire code for debugging.
3. __Extensibility with Python Libraries__
-> PyTorch seamlessly integrates with Python libraries such as: </br>
    NumPy (numerical computing) </br>
    SciPy (scientific computing) </br>
    Cython (performance optimization) </br>
4. __Why PyTorch for Deep Learning?__
-> Highly dynamic – Models can be modified on the go. </br>
-> Flexible – Adapts to different research needs and experiments. </br>
-> Widely used in the AI community – Adopted by researchers, students, and developers. </br>
5. __Real-World Competitions & Adoption__
-> In a recent Kaggle competition, nearly all of the top 10 finishers used PyTorch. </br>
-> This shows strong real-world effectiveness and industry adoption.

Some key highlights of PyTorch include:

__Simple Interface:__ It offers an easy-to-use API, so it is simple to operate and runs like ordinary Python.

__Pythonic in nature:__ This library, being Pythonic, integrates smoothly with the Python data science stack. It can therefore leverage the services and functionality offered by the Python environment.

__Computational graphs:__ PyTorch offers dynamic computational graphs, so you can change them at runtime. This is highly useful when you do not know in advance how much memory a neural network model will require.

#### Practice Questions
Q1) Write a PyTorch code snippet to create a tensor with requires_grad=True, perform a simple operation, and compute its gradient dynamically.

Q2) Show how PyTorch integrates with NumPy by converting a NumPy array into a PyTorch tensor and back.

## Why Do We Use PyTorch in Research?

Anyone working in deep learning and artificial intelligence has likely worked with TensorFlow, Google’s most popular open-source library. PyTorch, a more recent deep learning framework, addresses major pain points in research work. It is arguably TensorFlow’s biggest competitor to date, and it is currently a much-favored deep learning and artificial intelligence library in the research community.


You might be wondering why we use PyTorch. Three factors stand out:

- Easy-to-use API – PyTorch’s syntax is intuitive and Pythonic.
- It is highly favored by the research community due to its flexibility.
- Dynamic computation graph – Allows on-the-fly model changes without redefining the entire graph.

## CPU vs GPU: Key Differences and Use Cases
1. Core Architecture

- CPU (Central Processing Unit):
  - Has fewer but powerful compute cores.
  - Optimized for handling single-threaded or lightly parallel tasks.

- GPU (Graphics Processing Unit):
  - Contains thousands of smaller, less powerful cores.
  - Designed for massive parallelization, handling many tasks simultaneously.

2. Performance Differences

- CPU excels at:
  - General-purpose computing (OS operations, application execution).
  - Tasks requiring low latency and sequential processing.
  - Complex decision-making, branching, and control-heavy tasks.
- GPU excels at:
  - Deep learning, AI, and scientific computing (e.g., matrix multiplications).
  - High-performance tasks that involve parallel computation (e.g., image processing, video rendering).
  - Training deep neural networks efficiently.

3. Use Cases in Deep Learning

- CPU is better for:
  - Model inference (especially on small-scale tasks).
  - Data preprocessing before feeding data into the model.
- GPU is essential for:
  - Training deep learning models (significantly speeds up computation).
  - Running large-scale AI applications (like image classification and NLP models).
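
In code, the CPU/GPU split usually shows up as a device choice made once and reused. The following is a minimal sketch under the assumption that a single GPU (if any) is visible to PyTorch; the matrix size is arbitrary.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A matrix multiplication is a typical highly parallel workload.
a = torch.rand(256, 256, device=device)
b = torch.rand(256, 256, device=device)
c = a @ b

print(c.shape, c.device)
```

Writing code against a `device` variable like this lets the same script run on machines with or without a GPU.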

#### Practice Questions
Write a PyTorch code snippet to check if GPU is available and print the device name.

### Install

<mark style="background-color: Blue">__In CPU__</mark>

__For Windows__

* Install PyTorch using conda
      conda install pytorch torchvision cpuonly -c pytorch

* Using pip
      pip3 install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
        
__For Mac__

* Using conda
      conda install pytorch torchvision -c pytorch
    
* Using pip
      pip3 install torch torchvision
    
__For Linux__

* Using conda
      conda install pytorch torchvision cpuonly -c pytorch
    
* Using pip
      pip3 install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
      
      
      
<mark style="background-color: Green">__In GPU__</mark>

__For Windows__

* Install PyTorch using conda cuda=9.2 and Python=3.6
      conda install pytorch torchvision cudatoolkit=9.2 -c pytorch
     
* Using conda cuda=10.1 and Python=3.6
      conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
     
* Install PyTorch using pip cuda=9.2 and Python=3.6
      pip3 install torch==1.4.0+cu92 torchvision==0.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
     
* Using pip cuda=10.1 and Python=3.6
      pip3 install torch torchvision
     
__For Linux__

* Install PyTorch using conda cuda=9.2 and Python=3.6
      conda install pytorch torchvision cudatoolkit=9.2 -c pytorch
     
* Using conda cuda=10.1 and Python=3.6
      conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
     
* Install PyTorch using pip cuda=9.2 and Python=3.6
      pip3 install torch==1.4.0+cu92 torchvision==0.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
     
* Using pip cuda=10.1 and Python=3.6
      pip3 install torch torchvision
     
__For Mac__

* Install PyTorch using conda and Python=3.6 (the same command applies for cuda=9.2 and 10.1)

      conda install pytorch torchvision -c pytorch
         # macOS binaries don't support CUDA; install from source if CUDA is needed
       
* Install PyTorch using pip and Python=3.6 (the same command applies for cuda=9.2 and 10.1)

      pip3 install torch torchvision
         # macOS binaries don't support CUDA; install from source if CUDA is needed
      
Run all of these commands in the __Anaconda Prompt__. To install from inside a notebook, put a " ! " before the command, for example: !pip3 install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

Note that the pinned commands above install PyTorch 1.4.0. For the command matching the current release, and for more information about installation, see "https://pytorch.org/".
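
After installing, a quick check like the sketch below can confirm that the package imports and show whether the build can see a GPU. The printed values depend entirely on your machine and on the version you installed.

```python
import torch

print(torch.__version__)          # installed PyTorch version string
print(torch.cuda.is_available())  # True only if a usable CUDA GPU is detected
print(torch.version.cuda)         # CUDA version of the build, or None for CPU-only builds
```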

# Tensors

A tensor is similar to NumPy's ndarray, with the addition that tensors can also be used on a GPU to accelerate computing.

In [None]:
from __future__ import print_function
import torch

## Initializing a Tensor

Tensors can be initialized in various ways. Take a look at the following examples:



## Directly from data

Tensors can be created directly from data. The data type is automatically inferred.

In [None]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

In [None]:
x_data

tensor([[1, 2],
        [3, 4]])

In [None]:
type(data)

list
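
The dtype inference mentioned above can be made concrete with a short sketch. The variable names are illustrative only, and the float result assumes PyTorch's default float dtype has not been changed.

```python
import torch

ints = torch.tensor([[1, 2], [3, 4]])             # Python ints -> torch.int64
floats = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # Python floats -> default float dtype
mixed = torch.tensor([1, 2.5])                    # mixed ints and floats -> float
forced = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)  # explicit override

print(ints.dtype, floats.dtype, mixed.dtype, forced.dtype)
```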

#### Practice Questions

1. Create a 3x3 tensor with all elements equal to 5.

2. Convert a Python list [[10, 20], [30, 40]] into a PyTorch tensor.

## From a NumPy array

Tensors can be created from NumPy arrays (and vice versa; see the NumPy Bridge section).



In [None]:
import numpy as np
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

In [None]:
x_np

tensor([[1, 2],
        [3, 4]])

#### Practice Questions
1.  Convert a NumPy array of shape (2,3) filled with random numbers into a PyTorch tensor.

2. Modify the NumPy array and observe whether the PyTorch tensor also changes.

3. Convert a PyTorch tensor to a NumPy array and change a value. Observe the behavior.

## From another tensor:

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.



In [None]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.9169, 0.5625],
        [0.0861, 0.5702]]) 



## With random or constant values:

``shape`` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.



In [None]:
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.3702, 0.5540, 0.4668],
        [0.9475, 0.5353, 0.7006]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
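
Random tensors differ on every run unless the generator is seeded. The sketch below (seed value and fill value chosen arbitrarily) shows how `torch.manual_seed` makes `torch.rand` reproducible, and how `torch.full` builds a constant tensor with any fill value.

```python
import torch

shape = (2, 3)

torch.manual_seed(0)          # fix the random number generator state
first = torch.rand(shape)

torch.manual_seed(0)          # same seed again -> same sequence of values
second = torch.rand(shape)

print(torch.equal(first, second))

# torch.full builds a constant tensor with an arbitrary fill value.
nines = torch.full(shape, 9.0)
print(nines)
```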


#### Practice Questions
 1. Create a tensor of shape (4,4) filled with random numbers.

2. Create a tensor of shape (3,3) with all values set to 7.

## Different Types of Tensors

In [None]:
# --- 1D Tensor ---
# A 1D tensor is like a vector: a simple list of numbers.
tensor_1d = torch.tensor([1.0, 2.0, 3.0])
print("1D Tensor:", tensor_1d)

1D Tensor: tensor([1., 2., 3.])


In [None]:
# --- 2D Tensor ---
# A 2D tensor is like a matrix with rows and columns.
tensor_2d = torch.tensor([[1.0, 2.0, 3.0],
                            [4.0, 5.0, 6.0]])
print("\n2D Tensor:\n", tensor_2d)


2D Tensor:
 tensor([[1., 2., 3.],
        [4., 5., 6.]])


In [None]:
# --- 3D Tensor ---
# A 3D tensor can be seen as a stack of matrices.
# Here, we create a 3D tensor with shape (2, 3, 4): 2 blocks, each of size 3x4.
tensor_3d = torch.randn(2, 3, 4)  # random numbers from a normal distribution
print("\n3D Tensor:\n", tensor_3d)


3D Tensor:
 tensor([[[ 0.5809,  0.7947,  2.2037, -1.7549],
         [-1.2282, -0.2222,  0.3969, -0.0517],
         [ 0.0241,  0.3391, -1.4028, -1.4311]],

        [[ 1.6425,  1.1303,  1.0618,  0.2898],
         [ 0.2379, -0.4366, -0.3007, -1.3163],
         [-0.1011, -0.1400, -0.6425, -1.8458]]])
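
To tell these cases apart programmatically, one option is to inspect `ndim` and `shape`. This is a small sketch with illustrative variable names; it also includes a 0D (scalar) tensor, which the examples above do not cover.

```python
import torch

scalar = torch.tensor(3.14)          # 0D: a single number
vector = torch.tensor([1.0, 2.0])    # 1D
matrix = torch.zeros(2, 3)           # 2D
cube = torch.zeros(2, 3, 4)          # 3D

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions; shape lists the size of each one
    print(name, t.ndim, tuple(t.shape))
```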


__Note:__ An uninitialized matrix is declared but does not contain definite known values before it is used. When an uninitialized matrix is created, whatever values were in the allocated memory at the time will appear as the initial values.

Construct a 6x3 matrix, uninitialized:

In [None]:
# Create an uninitialized tensor (its values are whatever happens to be in memory)
a = torch.empty(6,3)
print(a)

tensor([[-2.3269e-02,  4.3098e-41, -2.3269e-02],
        [ 4.3098e-41,  4.4842e-44,  0.0000e+00],
        [ 8.9683e-44,  0.0000e+00,  5.2973e-34],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 1.5695e-43,  0.0000e+00,  8.9683e-44],
        [ 0.0000e+00,  5.2976e-34,  0.0000e+00]])


Construct a randomly initialized matrix:

In [None]:
# Randomly initialized tensor
a = torch.rand(4,3)
print(a)

tensor([[0.7956, 0.4431, 0.9687],
        [0.4870, 0.0318, 0.1460],
        [0.4865, 0.5054, 0.7244],
        [0.2810, 0.0887, 0.3486]])


Construct a matrix filled with zeros and of dtype long:

In [None]:
# Tensor of zeros with a specific type (long integers)
a = torch.zeros(4,3, dtype=torch.long)
print(a)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


Construct a tensor with data:

In [None]:
# Create a tensor directly from data
a = torch.tensor([7.8, 5])
type(a)

torch.Tensor

Or we can create a new tensor based on an existing tensor. These methods reuse the properties of the input tensor, e.g. dtype, unless new values are provided.

In [None]:
# Create a new tensor filled with ones with the same device and (optionally) similar dtype.
a = a.new_ones(6,5, dtype=torch.double)    # new methods take in sizes
print(a)
# Overriding dtype when creating a similar-sized tensor with random numbers
a = torch.randn_like(a, dtype=torch.float)  # override dtype
print(a)                                    # result will be the same size

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64)
tensor([[-1.1253, -1.6310, -1.5303, -2.2885,  0.3037],
        [ 1.1000,  0.8689, -0.3259, -0.0997,  0.6859],
        [ 0.4407,  1.3825, -0.0378, -0.4161, -0.3931],
        [-0.2030, -0.8292,  0.6071, -2.3429,  1.5022],
        [-0.0529,  0.3151,  1.0868, -0.5790,  0.0233],
        [-0.8545,  0.3893, -0.2799,  0.8890,  0.4490]])


#### Practice Questions
1. Given a tensor x = torch.tensor([[3, 4], [5, 6]]), create a new tensor filled with zeros that has the same shape and dtype as x.

2. Write a PyTorch code snippet to create a random tensor of the same size as x but with dtype torch.float64.

3. Given a = torch.ones(5, 5), create a new tensor filled with twos that has the same dtype and device as a.

## Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.



Let's get the size:

In [None]:
print(a.size())

torch.Size([6, 5])


Note: <mark style="background-color: Yellow">torch.Size</mark> is actually a subclass of tuple, so it supports all tuple operations.
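
A brief sketch of what "supports all tuple operations" means in practice (the tensor shape is chosen to match the 6x5 example; the variable names are illustrative):

```python
import torch

t = torch.rand(6, 5)
size = t.size()

print(isinstance(size, tuple))   # the size object is a tuple subclass
rows, cols = size                # tuple unpacking
print(rows, cols, len(size), size[0] * size[1])
```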

In [None]:
tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


#### Practice Questions
1. Create a 5x5 tensor of random values and check its data type.

2. Move a tensor to GPU if available.

3. Create a tensor on the GPU directly and move it back to CPU.

# Operations

There are multiple syntaxes for operations. The following examples use the addition operation.

Addition: syntax 1

In [None]:
b = torch.rand(6,5)
print(a + b)

tensor([[-0.9014, -1.1094, -1.4968, -1.7054,  0.7191],
        [ 1.7385,  1.6336, -0.3041, -0.0910,  0.9552],
        [ 0.4912,  1.5289,  0.5098, -0.1031,  0.5646],
        [ 0.2225, -0.1237,  1.4326, -1.6106,  2.1531],
        [ 0.4764,  0.7488,  1.7340, -0.4951,  0.8451],
        [-0.3344,  0.6405,  0.7127,  1.6160,  1.1687]])


Addition: syntax 2

In [None]:
print(torch.add(a,b))

tensor([[-0.9014, -1.1094, -1.4968, -1.7054,  0.7191],
        [ 1.7385,  1.6336, -0.3041, -0.0910,  0.9552],
        [ 0.4912,  1.5289,  0.5098, -0.1031,  0.5646],
        [ 0.2225, -0.1237,  1.4326, -1.6106,  2.1531],
        [ 0.4764,  0.7488,  1.7340, -0.4951,  0.8451],
        [-0.3344,  0.6405,  0.7127,  1.6160,  1.1687]])


Addition: providing an output as an argument

In [None]:
result = torch.empty(6, 5)
torch.add(a, b, out=result)
print(result)

tensor([[-0.9014, -1.1094, -1.4968, -1.7054,  0.7191],
        [ 1.7385,  1.6336, -0.3041, -0.0910,  0.9552],
        [ 0.4912,  1.5289,  0.5098, -0.1031,  0.5646],
        [ 0.2225, -0.1237,  1.4326, -1.6106,  2.1531],
        [ 0.4764,  0.7488,  1.7340, -0.4951,  0.8451],
        [-0.3344,  0.6405,  0.7127,  1.6160,  1.1687]])


Addition: in place

In [None]:
# adds a to b
b.add_(a)
print(b)

tensor([[-0.9014, -1.1094, -1.4968, -1.7054,  0.7191],
        [ 1.7385,  1.6336, -0.3041, -0.0910,  0.9552],
        [ 0.4912,  1.5289,  0.5098, -0.1031,  0.5646],
        [ 0.2225, -0.1237,  1.4326, -1.6106,  2.1531],
        [ 0.4764,  0.7488,  1.7340, -0.4951,  0.8451],
        [-0.3344,  0.6405,  0.7127,  1.6160,  1.1687]])


__Note:__ Any operation that mutates a tensor in-place is post-fixed with an <mark style="background-color: red">_.</mark> For example: a.copy_(b), a.t_(), will change a.
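
The difference between the two naming styles can be seen side by side in this small sketch (fresh tensors with illustrative values, independent of the cells above):

```python
import torch

a = torch.ones(2, 3)
b = torch.full((2, 3), 2.0)

c = a.add(b)      # out-of-place: returns a new tensor, a is unchanged
print(a)

a.add_(b)         # in-place: a itself now holds a + b
print(a)

a.t_()            # in-place transpose: a's shape becomes (3, 2)
print(a.shape)
```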

#### Practice Questions
1. Perform element-wise multiplication of two tensors of shape (3,3).

2. Subtract one tensor from another using different PyTorch syntax options.

We can use standard NumPy-like indexing with all bells and whistles!

In [None]:
print(a[:,2])

tensor([-1.5303, -0.3259, -0.0378,  0.6071,  1.0868, -0.2799])
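
A few more indexing patterns, sketched on a small tensor with known values so the results are easy to follow (the tensor `t` is made up for this illustration):

```python
import torch

t = torch.arange(12).reshape(3, 4)   # values 0..11 laid out in 3 rows, 4 columns

print(t[0])          # first row
print(t[:, 2])       # third column
print(t[1:, :2])     # rows 1-2, columns 0-1
print(t[t > 8])      # boolean mask: elements greater than 8
print(t[-1, -1])     # last element
```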


Resizing: To resize or reshape a tensor, use <mark style="background-color: Yellow">tensor.view</mark>:

In [None]:
a = torch.randn(2, 3)
b = a.view(6)
c = a.view(-1, 6)  # the size -1 is inferred from other dimensions
print(a)
print(b)
print(c)
print(a.size(), b.size(), c.size())

tensor([[0.6051, 0.2225, 0.3018],
        [0.6804, 0.2246, 0.3197]])
tensor([0.6051, 0.2225, 0.3018, 0.6804, 0.2246, 0.3197])
tensor([[0.6051, 0.2225, 0.3018, 0.6804, 0.2246, 0.3197]])
torch.Size([2, 3]) torch.Size([6]) torch.Size([1, 6])
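
One detail worth knowing is that a view shares memory with the tensor it came from. The sketch below (illustrative values) shows this, along with `reshape`, which also works on tensors that are not contiguous in memory.

```python
import torch

a = torch.zeros(2, 3)
b = a.view(3, 2)        # same six elements, new shape
b[0, 0] = 42.0          # writing through the view...
print(a[0, 0])          # ...is visible in the original tensor

# A transposed tensor is not contiguous, so view() cannot flatten it;
# reshape() handles this case by copying when necessary.
t = a.t()
print(t.is_contiguous())
r = t.reshape(6)
print(r.shape)
```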


If you have a one-element tensor, use <mark style="background-color: Yellow">.item()</mark> to get its value as a Python number:

In [None]:
a = torch.randn(1)
print(a)
print(a.item())

tensor([0.3004])
0.3004308044910431


#### Practice Questions:
 1. Reshape a tensor of shape (2,6) into shape (3,4)


2. Flatten a 3D tensor into a 1D tensor.

# NumPy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will share their underlying memory locations (if the Torch Tensor is on CPU), and changing one will change the other.

## Converting a Torch tensor to NumPy array

In [None]:
x = torch.ones(4)
print(x)

tensor([1., 1., 1., 1.])


In [None]:
y = x.numpy()
print(y)

[1. 1. 1. 1.]


See how the NumPy array changed in value:

In [None]:
x.add_(1)
print(x)
print(y)

tensor([2., 2., 2., 2.])
[2. 2. 2. 2.]


#### Practice Questions
1. Convert a NumPy array of shape (3,3) to a PyTorch tensor and back to a NumPy array.

2. Change a PyTorch tensor and check if its NumPy counterpart also changes.

## Converting NumPy array to Torch tensor

Let's see how changing the NumPy array changes the Torch Tensor automatically:

In [None]:
import numpy as np
f = np.ones(4)
g = torch.from_numpy(f)
np.add(f, 1, out=f)
print(f)
print(g)

[2. 2. 2. 2.]
tensor([2., 2., 2., 2.], dtype=torch.float64)
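
When the shared memory is not wanted, one option is to copy instead. This sketch contrasts `torch.from_numpy`, which shares memory with the array, with `torch.tensor`, which makes an independent copy (variable names are illustrative):

```python
import numpy as np
import torch

arr = np.zeros(3)
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 5.0
print(shared)   # reflects the change
print(copied)   # unchanged
```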


All the Tensors on the CPU except a CharTensor support converting to NumPy and back.

__CUDA Tensors__

Tensors can be moved onto any device using the <mark style="background-color: Yellow">.to</mark> method.

In [None]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

#### Practice Questions
1. Check if CUDA is available and move a tensor to the GPU.


2. Perform a matrix multiplication on GPU and move the result back to CPU.

3. Create a large tensor (1000x1000) on GPU, compute its mean, and move the result to CPU.

# AUTOGRAD: Automatic Differentiation

Definition -

  - Autograd is an engine for calculating derivatives. It records all the operations performed on a gradient-enabled tensor and creates an acyclic graph called the dynamic computational graph (DCG). The leaves of this graph are input tensors and the roots are output tensors. Gradients are calculated by tracing the graph from the roots to the leaves and multiplying the gradients along the way using the chain rule.

Let us look at this in simpler terms with some examples.

## Tensor

<mark style="background-color: Yellow">torch.Tensor</mark> is the central class of the package. If you set its attribute <mark style="background-color: Yellow">.requires_grad</mark> to <mark style="background-color: dark grey">True</mark>, it starts to track all operations on it. When you finish your computation, you can call <mark style="background-color: Yellow">.backward()</mark> and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its <mark style="background-color: Yellow">.grad</mark> attribute.

To stop a tensor from tracking history, you can call <mark style="background-color: Yellow">.detach()</mark> to detach it from the computation history, and to prevent future computation from being tracked.


To prevent tracking history (and using memory), you can also wrap the code block in <mark style="background-color: Yellow">with torch.no_grad():</mark>. This can be particularly helpful when evaluating a model because the model may have trainable parameters with <mark style="background-color: Yellow">requires_grad=True</mark>, but for which we don’t need the gradients.

There's one more class which is very important in autograd implementation - a <mark style="background-color: Yellow">Function</mark>

<mark style="background-color: Yellow">Tensor</mark> and <mark style="background-color: Yellow">Function</mark> are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a <mark style="background-color: Yellow">.grad_fn</mark> attribute that references a Function that has created the Tensor (except for Tensors created by the user - their <mark style="background-color: Yellow">grad_fn is None</mark>).

If you want to compute the derivatives, you can call <mark style="background-color: Yellow">.backward()</mark> on a <mark style="background-color: Yellow">Tensor</mark>. If the <mark style="background-color: Yellow">Tensor</mark> is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to <mark style="background-color: Yellow">backward()</mark>; however, if it has more elements, you need to specify a <mark style="background-color: Yellow">gradient</mark> argument that is a tensor of matching shape.
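
Because gradients are accumulated into `.grad` rather than overwritten, calling `.backward()` twice adds the results together. The following minimal sketch (the scalar value 2.0 is arbitrary) shows the accumulation and one way to reset it:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

(x * x).backward()
first = x.grad.item()
print(first)         # 2 * x = 4

(x * x).backward()   # a second backward pass adds to the stored gradient
second = x.grad.item()
print(second)        # 4 + 4 = 8

x.grad.zero_()       # reset before the next, unrelated computation
print(x.grad)
```

This is the reason training loops typically clear gradients before each backward pass.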

In [None]:
import torch

Create a tensor and set requires_grad=True to track computation with it.

In [None]:
a = torch.ones(2, 2, requires_grad=True)
print(a)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


Do a tensor operation:

In [None]:
b = a + 2
print(b)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


b was created as a result of an operation, so it has a <mark style="background-color: Yellow">grad_fn</mark>

In [None]:
print(b.grad_fn)

<AddBackward0 object at 0x7823bb260370>
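
The distinction between user-created tensors and tensors produced by operations can be checked directly. This sketch recreates the same two tensors and inspects `is_leaf`, `grad_fn`, and the `next_functions` links that connect one graph node to the nodes feeding into it:

```python
import torch

a = torch.ones(2, 2, requires_grad=True)   # created by the user: a leaf
b = a + 2                                  # created by an operation

print(a.is_leaf, a.grad_fn)       # leaf tensors have no grad_fn
print(b.is_leaf, b.grad_fn)       # b points back to the addition that produced it
print(b.grad_fn.next_functions)   # links to the nodes that feed into the addition
```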


Do more operations on b:

In [None]:
c = b * b * 3
out = c.mean()

print(c, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


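<mark style="background-color: Yellow">.requires_grad_( ... )</mark>  changes an existing Tensor’s <mark style="background-color: Yellow">requires_grad</mark> flag in-place. A tensor’s flag defaults to False if not given at creation, while the argument of <mark style="background-color: Yellow">.requires_grad_()</mark> itself defaults to True.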

In [None]:
p = torch.randn(3, 3)
p = ((p * 3) / (p - 1))
print(p.requires_grad)
p.requires_grad_(True)
print(p.requires_grad)
q = (p * p).sum()
print(q.grad_fn)

False
True
<SumBackward0 object at 0x7823bb2619c0>


#### Practice Questions
1. Create a tensor with requires_grad=True and perform some operations on it.

2. Perform backpropagation on a scalar output and print the gradients.

3. Disable gradient tracking temporarily and show how it affects computation.

## Gradients

Let's backprop now. Because <mark style="background-color: magenta">out</mark> contains a single scalar, <mark style="background-color: Yellow">out.backward()</mark> is equivalent to  <mark style="background-color: Yellow">out.backward(torch.tensor(1.))</mark>.

In [None]:
out.backward()

In [None]:
out

tensor(27., grad_fn=<MeanBackward0>)

Print gradients d(out)/da

In [None]:
print(a.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should have got a matrix of 4.5. Let’s call the out Tensor “o”. We have that

$o = \frac{1}{4} \sum_i c_i$, $c_i = 3(a_i+2)^2$ and $c_i|_{a_i=1} = 27$.

Therefore, $\frac{\partial o}{\partial a_i} = \frac{3}{2}(a_i+2)$,

hence
$\frac{\partial o}{\partial a_i}|_{a_i=1} = \frac{9}{2} = 4.5$
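
The hand derivation can be cross-checked numerically. This sketch rebuilds the same computation in one expression and compares autograd's result with the closed form $\frac{3}{2}(a_i + 2)$; the name `analytic` is introduced here only for the comparison.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
out = (3 * (a + 2) ** 2).mean()     # same as b = a + 2, c = b * b * 3, out = c.mean()
out.backward()

analytic = 1.5 * (a.detach() + 2)   # closed form 3/2 * (a_i + 2)
print(a.grad)
print(torch.allclose(a.grad, analytic))
```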




#### Practice Questions
 1. Create a tensor with requires_grad=True, perform operations, and compute gradients.

2. Create a function f(x) = 3x^3 + 2x^2 - 4x + 5 and compute its gradient at x=2.

3. Compute gradients for a 2D tensor and verify results.

# Mathematically - Jacobians and Vectors in PyTorch Autograd

## 1. What is a Jacobian Matrix?
- A **Jacobian matrix (J)** represents all **possible partial derivatives** of one vector with respect to another.
- It defines the **gradient of a vector-valued function** with respect to its inputs.

## 2. Understanding the Jacobian Matrix
- Suppose we have an **input vector**:

  $$
  X = [x_1, x_2, ..., x_n]
  $$

- We pass it through a function $f(X)$ that transforms it into another vector:

  $$
  Y = f(X) = [y_1, y_2, ..., y_m]
  $$

- The **Jacobian matrix (J)** contains all the **partial derivatives** of each $y_i$ with respect to $x_j$:

  $$
  J_{ij} = \frac{\partial y_i}{\partial x_j}
  $$

- This matrix **describes how each element in $Y$ changes with respect to $X$**.
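
As a small worked illustration (the function below is chosen only for this example and does not come from the text above), take $n = m = 2$:

$$
X = [x_1, x_2], \qquad Y = f(X) = [x_1 x_2,\; x_1 + x_2]
$$

Differentiating each output with respect to each input gives

$$
J = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2}
\end{bmatrix}
= \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix}
$$

so at $X = [2, 3]$ the Jacobian is $\begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}$.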

## 3. Application in PyTorch Autograd
- PyTorch’s **autograd engine** computes gradients **efficiently using the vector-Jacobian product**.
- Instead of computing the full Jacobian (which can be large), PyTorch efficiently computes:

  $$
  v^T J
  $$

  where **$v$** is an external vector, passed as the `gradient` argument of `backward()`.

## 4. Vector-Jacobian Product in PyTorch
- Assume that a PyTorch tensor $X$ requires gradients (`requires_grad=True`).
- After some operations, it results in a new vector $Y = f(X)$.
- To compute the **gradient of a scalar loss $l$ with respect to $X$**, PyTorch performs:

  $$
  \frac{dl}{dX} = v^T J, \quad v = \frac{\partial l}{\partial Y}
  $$

- The vector $v$ (the external gradient, i.e. the gradient of $l$ with respect to $Y$) is passed to `backward()`.

## 5. Why Use Vector-Jacobian Products?
- It enables PyTorch to **handle non-scalar outputs** efficiently.
- Instead of explicitly forming the **full Jacobian matrix**, PyTorch **computes gradients efficiently by multiplying with an external vector**.
- This approach allows **faster computation in deep learning models**.


Now let's have a look at an example of vector-Jacobian product:

In [None]:
x = torch.ones(3, requires_grad=True)

y = x * 2
# while y.detach().norm() < 1000:
#     y = y * 2
print(x)
print(y)

tensor([1., 1., 1.], requires_grad=True)
tensor([2., 2., 2.], grad_fn=<MulBackward0>)


Now in this case y is no longer a scalar. <mark style="background-color: Yellow">torch.autograd</mark> could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to backward as argument:

In [None]:
v = torch.tensor([1.0, 1.0, 1.0], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([2., 2., 2.])
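
To see that `backward(v)` really computes $v^T J$, the result can be compared against an explicitly built Jacobian. This is a sketch for checking purposes only: it uses `torch.autograd.functional.jacobian` to form the full matrix (which autograd normally avoids), and the weights in `v` are arbitrary values chosen for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x * 2                      # same function as y = x * 2

x = torch.ones(3, requires_grad=True)
v = torch.tensor([1.0, 0.1, 0.01])    # arbitrary weights chosen for illustration

J = jacobian(f, x)                    # full 3x3 Jacobian, here 2 * identity
y = f(x)
y.backward(v)                         # vector-Jacobian product v^T J

print(J)
print(x.grad)
print(torch.allclose(x.grad, v @ J))
```

With a non-uniform `v`, each entry of `x.grad` is scaled by the matching weight, which makes the role of the external vector visible.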


You can also stop autograd from tracking history on Tensors with <mark style="background-color: Yellow">.requires_grad=True</mark> either by wrapping the code block in <mark style="background-color: Yellow">with torch.no_grad()</mark>:

In [None]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


Or by using .detach() to get a new Tensor with the same content but that does not require gradients:

In [None]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

True
False
tensor(True)
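
A typical place where this matters is model evaluation. The sketch below uses `torch.nn.Linear` as a hypothetical stand-in for a trained model (the layer sizes and inputs are arbitrary): the forward pass gives the same values with or without tracking, but inside `torch.no_grad()` no graph is recorded.

```python
import torch

model = torch.nn.Linear(3, 1)        # hypothetical stand-in for a trained model
inputs = torch.rand(4, 3)

tracked = model(inputs)              # graph is recorded: the weights require grad
with torch.no_grad():
    untracked = model(inputs)        # same values, no graph recorded

print(tracked.requires_grad, untracked.requires_grad)
print(torch.allclose(tracked, untracked))
```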


#### Practice Questions
1. Compute the Jacobian matrix for a given function using PyTorch.

2. Compute the Hessian matrix for a function using PyTorch.

3. Use vector-Jacobian product (VJP) to compute gradients efficiently