# 1. What is PyTorch?

PyTorch is an open-source machine learning library originally developed by Facebook's AI Research lab (FAIR), now part of Meta AI.

It provides a flexible and dynamic computational graph, which makes it particularly suited for research and prototyping.

PyTorch is known for its simplicity and ease of use, making it a popular choice for both beginners and experienced researchers in the field of deep learning.

**Key Features**:

  * **Dynamic Computational Graph**: A **dynamic computation graph is built on the fly as the code runs**, whereas a static graph is defined in full before execution. Dynamic graphs make it easy to express models whose structure depends on the input (loops, branches, variable-length sequences) and to debug them with ordinary Python tools. Static graphs leave more room for ahead-of-time optimization but are less flexible for such model structures.
  * **Tensors**: PyTorch operates on multi-dimensional arrays called tensors, which are similar to NumPy's ndarrays but **with the ability to run on GPUs**.
  * **Neural Network Module**: PyTorch **provides the torch.nn module** for defining neural networks out of reusable layers, losses, and containers.
  * **GPU Acceleration**: PyTorch **supports GPU acceleration**, which can make large tensor computations much faster than on a CPU.
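
To make the "dynamic graph" point above concrete, here is a minimal sketch in which the number of operations depends on the data itself. The helper name `repeated_square` and the threshold of 100 are made up for illustration; the point is only that an ordinary Python `while` loop decides what ends up in the graph, and autograd still differentiates through whatever actually ran.

```python
import torch

def repeated_square(x: torch.Tensor, max_steps: int = 10) -> torch.Tensor:
    """Hypothetical helper: keep squaring until the value reaches 100."""
    y = x
    steps = 0
    # Plain Python control flow: only the operations that actually
    # execute are recorded in the computation graph.
    while y.item() < 100 and steps < max_steps:
        y = y * y
        steps += 1
    return y

x = torch.tensor(3.0, requires_grad=True)
y = repeated_square(x)  # 3 -> 9 -> 81 -> 6561, so here y = x ** 8
y.backward()

print(y.item())       # 6561.0
print(x.grad.item())  # 8 * 3 ** 7 = 17496.0
```

With a different starting value the loop would run a different number of times, and the graph (and therefore the gradient) would change accordingly.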

# 2. Why use PyTorch?

While NumPy is an essential tool for numerical operations in Python, PyTorch offers additional features tailored for deep learning and neural network research:

**Autograd: Automatic Differentiation**

* The most significant advantage of PyTorch over NumPy is its autograd system. **Autograd automatically computes the gradients or derivatives of operations**, which is essential for training neural networks using gradient-based optimization algorithms.

* In deep learning, we often need to calculate the gradient of a loss function with respect to model parameters. **Manual computation can be error-prone and tedious**. Autograd simplifies this by automatically calculating gradients.

## Example:

Consider a simple operation, $y = x^2$, whose derivative is $dy/dx = 2x$.

NumPy (without Autograd)

In [None]:
import numpy as np

# Define a value
x_np = np.array([2.0])

# Manually compute the gradient
dy_dx = 2 * x_np

print(dy_dx)

[4.]


PyTorch (with Autograd)

In [None]:
import torch

# Define a tensor and set requires_grad=True to track computation with it
x_pt = torch.tensor([2.0], requires_grad=True)

# Define an operation
y = x_pt ** 2

# Compute gradients
y.backward()

# Display gradient
print(x_pt.grad)

tensor([4.])


While this is a simple example, imagine having to compute gradients for complex operations and architectures manually.
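
As a slightly larger illustration, the sketch below differentiates $f(x) = \sum_i \sin(x_i)\, x_i^2$ with autograd and compares the result against the derivative worked out by hand, $\cos(x)\,x^2 + 2x\sin(x)$. The function is an arbitrary choice for demonstration, not something from a real model.

```python
import torch

x = torch.tensor([0.5, 1.0, 2.0], requires_grad=True)

# backward() needs a scalar, so sum the per-element values.
loss = (torch.sin(x) * x ** 2).sum()
loss.backward()

# Hand-derived gradient, computed on plain values (no gradient tracking).
x_values = x.detach()
analytic = torch.cos(x_values) * x_values ** 2 + 2 * x_values * torch.sin(x_values)

matches = torch.allclose(x.grad, analytic)
print(x.grad)
print(matches)  # True
```

Even for this small function the hand derivation needs the product rule; autograd applies the same rules mechanically for every operation in the graph.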

**Disabling Gradient Tracking**

By default, every tensor with **requires_grad=True** tracks its computational history and supports gradient computation. In some cases this is unnecessary, for example when the model is already trained and we only want to apply it to input data, i.e. run forward computations through the network. We can stop tracking by wrapping the computation in a **torch.no_grad()** block:

In [None]:
x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

print(x)
print(y)
print(w)
print(b)

tensor([1., 1., 1., 1., 1.])
tensor([0., 0., 0.])
tensor([[-0.0740,  0.9387, -0.2975],
        [-0.8013,  1.5158, -0.3892],
        [-0.7580,  0.6995, -0.5530],
        [ 0.5651, -0.5556, -0.4149],
        [ 0.8197,  1.5269,  0.7209]], requires_grad=True)
tensor([ 0.4277, -0.8232, -0.2301], requires_grad=True)


In [None]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False


Another way to achieve the same result is to use the **detach()** method on the tensor:

In [None]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False


There are two common reasons to disable gradient tracking:
- To mark some parameters in your neural network as frozen, so that they are not updated during training.
- To speed up computations when you are only doing a forward pass, because operations on tensors that do not track gradients skip the bookkeeping needed for backpropagation.
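
The first reason, freezing parameters, can be sketched as follows. The two-layer model and its sizes are arbitrary assumptions chosen for illustration; the relevant call is `requires_grad_(False)` on the parameters that should stay fixed.

```python
import torch
from torch import nn

# A small illustrative model; the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first linear layer: its weights will receive no gradients.
for param in model[0].parameters():
    param.requires_grad_(False)

out = model(torch.randn(3, 4)).sum()
out.backward()

frozen_grads = [p.grad for p in model[0].parameters()]
trainable_grads = [p.grad for p in model[2].parameters()]

print(frozen_grads)                                  # [None, None]
print(all(g is not None for g in trainable_grads))   # True
```

An optimizer built over `model.parameters()` would then leave the frozen layer untouched, since its parameters never get a gradient.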

# 3. Handling **torch.Tensor**

In PyTorch, the fundamental object used for almost all computational tasks is the Tensor. A **Tensor** is similar to NumPy's **ndarray** but with the **added capability to be used on a GPU for faster computations**. Tensors are multi-dimensional arrays and are at the core of PyTorch's design.

## a. Creating Tensors

From Lists

In [None]:
data = [[1, 2], [3, 4]]
tensor_from_data = torch.tensor(data)

print(tensor_from_data)

tensor([[1, 2],
        [3, 4]])


From NumPy Arrays

In [None]:
np_array = np.array(data)
print(np_array)

tensor_from_numpy = torch.from_numpy(np_array)
print(tensor_from_numpy)

[[1 2]
 [3 4]]
tensor([[1, 2],
        [3, 4]])
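
One detail worth knowing about the conversion above: `torch.from_numpy` shares memory with the source array, while `torch.tensor` makes a copy. The short sketch below shows the difference; the variable names `shared` and `copied` are just for illustration.

```python
import numpy as np
import torch

np_array = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(np_array)  # shares memory with np_array
copied = torch.tensor(np_array)      # independent copy of the data

# Modify the NumPy array in place.
np_array[0, 0] = 100

print(shared[0, 0].item())  # 100, the tensor sees the change
print(copied[0, 0].item())  # 1, the copy is unaffected
```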


Using Built-in Functions

In [None]:
tensor_zeros = torch.zeros(3, 3)
tensor_ones = torch.ones(3, 3)
tensor_eye = torch.eye(3)  # Identity matrix
tensor_rand = torch.rand(3, 3)  # Uniform random numbers between 0 and 1

print(tensor_zeros)
print(tensor_ones)
print(tensor_eye)
print(tensor_rand)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
tensor([[0.2675, 0.4659, 0.0622],
        [0.1434, 0.9738, 0.0942],
        [0.3946, 0.4999, 0.8501]])


## b. Tensor Operations

**Arithmetic Operations**

Arithmetic operations in PyTorch are element-wise operations similar to those in NumPy.

In [None]:
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

# Element-wise addition
z1 = x + y  # or torch.add(x, y)
print(z1)

# Element-wise subtraction
z2 = x - y  # or torch.sub(x, y)
print(z2)

# Element-wise multiplication
z3 = x * y  # or torch.mul(x, y)
print(z3)

# Element-wise division
z4 = x / y  # or torch.div(x, y)
print(z4)

tensor([5, 7, 9])
tensor([-3, -3, -3])
tensor([ 4, 10, 18])
tensor([0.2500, 0.4000, 0.5000])


**Reduction Operations**

Reduction operations collapse a tensor into fewer elements, for example by summing or averaging over all elements or along one dimension.

In [None]:
x = torch.tensor([1, 2, 3, 4])

# Sum of all elements
sum_all = torch.sum(x)
print(sum_all)

# Mean of all elements (torch.mean needs a floating-point tensor, so convert first)
mean_all = torch.mean(x.float())
print(mean_all)

# Max and Min values
max_val = torch.max(x).item()
min_val = torch.min(x).item()
print(max_val, min_val)

tensor(10)
tensor(2.5000)
4 1
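
Reductions can also run along a single dimension instead of over the whole tensor. The sketch below uses the `dim` and `keepdim` arguments on a small 2x3 tensor chosen for illustration.

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])  # shape (2, 3)

# Reduce over dimension 0 (rows): one value per column.
col_sums = m.sum(dim=0)
print(col_sums)   # tensor([5., 7., 9.])

# Reduce over dimension 1 (columns): one value per row.
row_means = m.mean(dim=1)
print(row_means)  # tensor([2., 5.])

# keepdim=True keeps the reduced dimension with size 1.
row_means_kept = m.mean(dim=1, keepdim=True)
print(row_means_kept.shape)  # torch.Size([2, 1])

# With dim given, max returns both the values and their indices.
max_vals, max_idx = m.max(dim=1)
print(max_vals, max_idx)  # tensor([3., 6.]) tensor([2, 2])
```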


**Matrix Operations**

Matrix operations are fundamental in deep learning, especially in neural network layers.



In [None]:
mat1 = torch.tensor([[1, 2], [3, 4]])
mat2 = torch.tensor([[2, 1], [1, 2]])

# Matrix multiplication
matmul_result = torch.mm(mat1, mat2)
print(matmul_result)

# Element-wise matrix multiplication
elementwise_mul = mat1 * mat2
print(elementwise_mul)

# Matrix transpose
transpose_mat = torch.t(mat1)
print(transpose_mat)

tensor([[ 4,  5],
        [10, 11]])
tensor([[2, 2],
        [3, 8]])
tensor([[1, 3],
        [2, 4]])
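
`torch.mm` only accepts 2D matrices. As a complementary sketch, `torch.matmul` (also available as the `@` operator) handles matrix-vector products and batched inputs as well; the shapes below are arbitrary examples.

```python
import torch

mat1 = torch.tensor([[1, 2], [3, 4]])
mat2 = torch.tensor([[2, 1], [1, 2]])

# For 2D inputs, @ and torch.mm give the same result.
same = torch.equal(mat1 @ mat2, torch.mm(mat1, mat2))
print(same)  # True

# Matrix-vector product: (2, 2) @ (2,) -> (2,)
vec = torch.tensor([1, 1])
mat_vec = torch.matmul(mat1, vec)
print(mat_vec)  # tensor([3, 7])

# Batched product: (5, 2, 3) @ (3, 4) -> (5, 2, 4)
batch = torch.ones(5, 2, 3)
batch_out = torch.matmul(batch, torch.ones(3, 4))
print(batch_out.shape)  # torch.Size([5, 2, 4])
```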


**Reshaping**

Reshaping allows you to change the shape (number of dimensions and size along each dimension) of a tensor.



In [None]:
x = torch.tensor([[1, 2], [3, 4]])

# Reshape to 4x1 tensor
# This stacks the rows of 'x' into a single column.
reshaped = x.view(4, 1)
print(reshaped)
print(reshaped.shape)

# Flatten the tensor
# This transforms 'x' into a 1D tensor by unrolling its values row by row.
# The '-1' tells view to infer that dimension's size from the total number of elements.
flattened = x.view(-1)
print(flattened)
print(flattened.shape)

tensor([[1],
        [2],
        [3],
        [4]])
torch.Size([4, 1])
tensor([1, 2, 3, 4])
torch.Size([4])


The reshape function is similar to **view** but more flexible. **view** never copies data and fails if the requested shape is incompatible with the tensor's memory layout. **reshape** returns a view that shares the same data when that is possible and otherwise makes a copy.

In [None]:
x_reshape = torch.tensor([1, 2, 3, 4, 5, 6])
reshaped_tensor = x_reshape.reshape(2, 3)
print(reshaped_tensor)
print(reshaped_tensor.shape)

tensor([[1, 2, 3],
        [4, 5, 6]])
torch.Size([2, 3])
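
The difference between **view** and **reshape** shows up on non-contiguous tensors, such as the result of a transpose. The following sketch (with illustrative variable names) shows `view` failing where `reshape` succeeds, and also that `reshape` on a contiguous tensor shares memory with the original.

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
xt = x.t()  # transposed: same data, non-contiguous layout
print(xt.is_contiguous())  # False

# view cannot flatten a non-contiguous tensor without copying.
try:
    xt.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

# reshape falls back to a copy when a view is not possible.
flat = xt.reshape(-1)
print(flat)  # tensor([1, 4, 2, 5, 3, 6])

# On a contiguous tensor, reshape returns a view that shares memory.
alias = x.reshape(-1)
alias[0] = 100
print(x[0, 0].item())  # 100
```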


**Squeeze** and **Unsqueeze**

The **squeeze** function removes all dimensions of size 1 from a tensor's shape, or only the given dimension if one is specified. It is useful for dropping unnecessary dimensions.



In [None]:
x = torch.tensor([[1], [2], [3]])
print(x.shape)

squeezed_tensor = x.squeeze()
print(squeezed_tensor.shape)
print(x)
print(squeezed_tensor)

torch.Size([3, 1])
torch.Size([3])
tensor([[1],
        [2],
        [3]])
tensor([1, 2, 3])
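
Calling `squeeze()` without arguments removes every size-1 dimension, which can be surprising when, say, a batch happens to contain a single item. Passing a dimension limits the effect, as this small sketch with an arbitrary (1, 3, 1) tensor shows.

```python
import torch

x = torch.zeros(1, 3, 1)

all_squeezed = x.squeeze()     # removes both size-1 dimensions
first_squeezed = x.squeeze(0)  # removes only dimension 0
unchanged = x.squeeze(1)       # dimension 1 has size 3, so nothing happens

print(all_squeezed.shape)    # torch.Size([3])
print(first_squeezed.shape)  # torch.Size([3, 1])
print(unchanged.shape)       # torch.Size([1, 3, 1])
```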



As the opposite of squeeze, the **unsqueeze** function adds a dimension of size 1 to a tensor's shape at a specified position.

In [None]:
x = torch.tensor([1, 2, 3])
print(x.shape)

unsqueezed_tensor1 = x.unsqueeze(0)
print(unsqueezed_tensor1)
print(unsqueezed_tensor1.shape)

unsqueezed_tensor2 = x.unsqueeze(1)
print(unsqueezed_tensor2)
print(unsqueezed_tensor2.shape)

torch.Size([3])
tensor([[1, 2, 3]])
torch.Size([1, 3])
tensor([[1],
        [2],
        [3]])
torch.Size([3, 1])


**Concatenation**

You can concatenate multiple tensors along a specific dimension.

In [None]:
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

# Concatenate along dimension 0
concatenated_dim0 = torch.cat([x, y], dim=0)
print(concatenated_dim0)
print(concatenated_dim0.shape)

# If you have 2D tensors, you can concatenate along dimension 1 as well
x2 = x.unsqueeze(1)
y2 = y.unsqueeze(1)
print(x2, y2, sep='\n')
print(x2.shape, y2.shape, sep='\n')

concatenated_dim1 = torch.cat([x2, y2], dim=1)
print(concatenated_dim1)
print(concatenated_dim1.shape)

tensor([1, 2, 3, 4, 5, 6])
torch.Size([6])
tensor([[1],
        [2],
        [3]])
tensor([[4],
        [5],
        [6]])
torch.Size([3, 1])
torch.Size([3, 1])
tensor([[1, 4],
        [2, 5],
        [3, 6]])
torch.Size([3, 2])
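
A related function is `torch.stack`, which joins tensors along a new dimension rather than an existing one. The sketch below contrasts the two on the same inputs and checks that stacking along dimension 1 matches the unsqueeze-then-concatenate approach.

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

cat0 = torch.cat([x, y], dim=0)      # joins along the existing dimension
stack0 = torch.stack([x, y], dim=0)  # creates a new leading dimension
stack1 = torch.stack([x, y], dim=1)  # creates a new trailing dimension

print(cat0.shape)    # torch.Size([6])
print(stack0.shape)  # torch.Size([2, 3])
print(stack1.shape)  # torch.Size([3, 2])

# stack along dim=1 is equivalent to unsqueeze(1) followed by cat along dim=1.
equivalent = torch.equal(
    stack1, torch.cat([x.unsqueeze(1), y.unsqueeze(1)], dim=1)
)
print(equivalent)  # True
```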


**Indexing, Slicing, Joining, and Mutating**

These operations allow you to access and modify specific parts of a tensor.



In [None]:
x = torch.tensor([10, 20, 30, 40, 50])

# Indexing: Get the second element
second_element = x[1]
print(second_element)

# Slicing: Get the second and third elements
slice_2_3 = x[1:3]
print(slice_2_3)

# Joining: Stack tensors together
stacked = torch.stack([x, x])
print(stacked)

# Mutating: Change the first element of x to 100
x[0] = 100
print(x)

tensor(20)
tensor([20, 30])
tensor([[10, 20, 30, 40, 50],
        [10, 20, 30, 40, 50]])
tensor([100,  20,  30,  40,  50])
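
Two further points are worth sketching here: a slice is a view into the original tensor, so writing to it changes the original, and a boolean mask can be used both to select and to assign elements. The variable names below are illustrative.

```python
import torch

x = torch.tensor([10, 20, 30, 40, 50])

# A slice is a view: writing through it modifies x as well.
view = x[1:3]
view[0] = -1
print(x)  # tensor([10, -1, 30, 40, 50])

# clone() gives an independent copy.
independent = x[1:3].clone()
independent[1] = 0
print(x)  # still tensor([10, -1, 30, 40, 50])

# Boolean masks select elements that satisfy a condition.
mask = x > 25
selected = x[mask]
print(selected)  # tensor([30, 40, 50])

# The same mask can be used for in-place assignment.
x[mask] = 0
print(x)  # tensor([10, -1,  0,  0,  0])
```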


## c. Broadcasting

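PyTorch supports broadcasting, a feature borrowed from NumPy. It lets arithmetic operations work on tensors of different shapes: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1 (or missing).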

### Examples

**Scalar and Tensor**

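Adding a scalar to a tensor broadcasts the scalar across all elements of the tensor. For comparison, the first cell below shows NumPy applying the same broadcasting rules to arrays of shape (3, 1) and (4,); the second cell shows the scalar case in PyTorch.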

In [None]:
import numpy as np

a = np.random.randint(0, 3, (3, 1))  # (3,1) -> (3,4)
b = np.random.randint(0, 3, (4,))  # (4,) -> (1,4) -> (3,4)
print(a)
print(b)

print(a+b)

[[1]
 [0]
 [1]]
[1 1 1 2]
[[2 2 2 3]
 [1 1 1 2]
 [2 2 2 3]]


In [None]:
tensor = torch.tensor([1, 2, 3])
result = tensor + 2
print(result)

tensor([3, 4, 5])


**Tensors with Different Dimensions**

Adding a tensor of shape [3, 1] to a tensor of shape [3] broadcasts both to shape [3, 3]: the first is repeated across columns and the second across rows.

In [None]:
tensor_a = torch.tensor([[1], [2], [3]]) # (3,1) -> (3, 3)
tensor_b = torch.tensor([4, 5, 6]) # (3,) -> (1,3) -> (3, 3)
print(tensor_a.shape, tensor_b.shape)

result = tensor_a + tensor_b
print(result)
print(result.shape)


torch.Size([3, 1]) torch.Size([3])
tensor([[5, 6, 7],
        [6, 7, 8],
        [7, 8, 9]])
torch.Size([3, 3])


**Tensors with Mismatched Dimensions**

If two corresponding dimensions differ and neither of them is 1, the shapes cannot be broadcast and the operation raises an error.

In [None]:
tensor_c = torch.tensor([[1, 2], [3, 4]]) #(2, 2)
tensor_d = torch.tensor([1, 2, 3]) #(3,)
print(tensor_c.shape, tensor_d.shape)

# This will raise an error: result = tensor_c + tensor_d
try:
    result = tensor_c + tensor_d
except RuntimeError as e:
    error_message = str(e)
    print(error_message)

torch.Size([2, 2]) torch.Size([3])
The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
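
To check whether shapes are compatible without building any tensors, one option is `torch.broadcast_shapes` (available in reasonably recent PyTorch releases). This is a minimal sketch using the shapes from the examples above plus a (5, 3) and (3,) pair.

```python
import torch

# Compatible: (3, 1) and (3,) broadcast to (3, 3).
shape_a = torch.broadcast_shapes((3, 1), (3,))
print(shape_a)  # torch.Size([3, 3])

# Compatible: (5, 3) and (3,) broadcast to (5, 3).
shape_b = torch.broadcast_shapes((5, 3), (3,))
print(shape_b)  # torch.Size([5, 3])

# Incompatible: the last dimensions are 2 and 3, and neither is 1.
try:
    torch.broadcast_shapes((2, 2), (3,))
    compatible = True
except RuntimeError:
    compatible = False
print(compatible)  # False
```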


## d. Moving Tensors between CPU and GPU

In [None]:
import torch
print(torch.cuda.is_available())

True


**Moving a Tensor to GPU**

PyTorch uses CUDA (a parallel computing platform and programming interface created by NVIDIA) to run computations on the GPU. To move a tensor to the GPU, use the cuda() method.

In [None]:
# Create a tensor on CPU
tensor_cpu = torch.tensor([1, 2, 3])
print(tensor_cpu.device)

# Move the tensor to GPU (if CUDA is available)
if torch.cuda.is_available():
    tensor_gpu = tensor_cpu.cuda()
    print(tensor_gpu.device)

cpu
cuda:0


**Moving a Tensor to a Specific GPU**

In systems with multiple GPUs, they are numbered as 0, 1, 2, etc. You can select a specific GPU by setting its ID. Once the device is set to a specific GPU ID, you can move the tensor to that GPU using the to() method.

In [None]:
# Set the GPU ID (for this example, we'll set it to 0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

# Move the tensor to the specified GPU
tensor_gpu_specific = tensor_cpu.to(device)
print(tensor_gpu_specific.device)

cuda:0
cuda:0


**Moving a Tensor back to CPU**

Once the computations on the GPU are completed, you might want to move the results back to the CPU. This can be achieved using the cpu() method.

In [None]:
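# Move the tensor back to CPU (a no-op if it is already there).
# tensor_gpu_specific is used because it is defined even without CUDA.
tensor_cpu_again = tensor_gpu_specific.cpu()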
print(tensor_cpu_again.device)

cpu


Tensors must be on the same device to be combined in an operation. Mixing a CPU tensor with a GPU tensor raises an error.

In [None]:
# Assuming you have a CUDA-enabled GPU
if torch.cuda.is_available():
    # Create a tensor on the CPU
    tensor_cpu = torch.tensor([1, 2, 3])

    # Move the tensor to GPU
    tensor_gpu = tensor_cpu.cuda()

    # Attempt to add the CPU tensor and GPU tensor
    # This will raise an error because the tensors are on different devices
    try:
        result = tensor_cpu + tensor_gpu
    except RuntimeError as e:
        error_message = str(e)
        print(error_message)

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
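
A common way to avoid this error is to pick one device up front and place every tensor on it. The following is a minimal device-agnostic sketch that runs with or without a GPU; the variable names are illustrative.

```python
import torch

# Choose the device once; fall back to CPU when CUDA is unavailable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device...
a = torch.tensor([1, 2, 3], device=device)
# ...or move an existing tensor there.
b = torch.tensor([4, 5, 6]).to(device)

# Both operands live on the same device, so the addition works.
result = a + b

# Bring the result back to the CPU before converting to Python objects.
result_list = result.cpu().tolist()
print(result_list)  # [5, 7, 9]
```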


# 4. Exercise

1. Create two tensors of size (3, 3) with random values. Multiply them element-wise.

2. Create a tensor **A** of size (5, 3) and another tensor **B** of size (3,). Add them.

3. Create two tensors of size (4, 3) and (3, 2). Perform matrix multiplication.

4. Create a tensor of size (2, 6). Reshape it to (3, 4).

5. Create a tensor **A** of size (5, 1) and another tensor **B** of size (5,). Perform element-wise multiplication.

6. Create two tensors of size (3, 4). Transpose the second tensor and perform matrix multiplication.

7. Create a tensor of size (3, 1) and expand its size to (3, 4). Sum the elements along the second dimension.

8. Create a tensor of size (6, 5). Calculate the mean along the second dimension.

9. Create a tensor of size (4, 4). Extract the second and fourth row.

10. Calculate the outer product of two tensors of size (5,) and (3,).

- *Search for the outer product function on Google!*