In [1]:
# 1. How does unsqueeze help us to solve certain broadcasting problems?

# Ans:
# The unsqueeze function in PyTorch helps solve certain broadcasting problems by adding a new dimension to a tensor.

# By using unsqueeze, you can increase the number of dimensions of a tensor at a specified position. This can be useful when aligning
# tensors for broadcasting, especially when dealing with tensors of different shapes.

# For example, if you have a 1-dimensional tensor a of shape (3,) and you want to perform elementwise operations with a 2-dimensional
# tensor b of shape (3, 4), a + b fails because broadcasting aligns trailing dimensions (3 against 4). Calling a.unsqueeze(1) gives a
# the shape (3, 1), which broadcasts against b's shape (3, 4).
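
# A minimal sketch of the example above. The tensor values are made up for illustration; only the shapes matter.

```python
import torch

a = torch.tensor([1, 2, 3])                # shape (3,)
b = torch.ones(3, 4, dtype=torch.int64)    # shape (3, 4)

# a + b would raise an error here: the trailing dimensions (3 and 4) do not match.
a_col = a.unsqueeze(1)                     # shape (3, 1)
out = a_col + b                            # broadcasts to shape (3, 4)

print(a_col.shape)   # torch.Size([3, 1])
print(out)           # row i of b has a[i] added to every element
```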

In [2]:
# 2. How can we use indexing to do the same operation as unsqueeze?

# Ans:
# We can use indexing with None or np.newaxis to achieve the same operation as unsqueeze in PyTorch or NumPy, respectively.

# By inserting None or np.newaxis at a specific index while indexing a tensor, we can effectively add a new dimension to the tensor 
# at that position. This helps in aligning tensors for broadcasting.
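
# A short check of this equivalence, as a sketch with made-up values. In NumPy, np.newaxis is just an alias for None.

```python
import numpy as np
import torch

t = torch.tensor([1, 2, 3])
col_t = t[:, None]            # same result as t.unsqueeze(1), shape (3, 1)
row_t = t[None, :]            # same result as t.unsqueeze(0), shape (1, 3)

n = np.array([1, 2, 3])
col_n = n[:, np.newaxis]      # shape (3, 1)

print(torch.equal(col_t, t.unsqueeze(1)))   # True
print(torch.equal(row_t, t.unsqueeze(0)))   # True
print(col_n.shape)                          # (3, 1)
```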

In [4]:
# 3. How do we show the actual contents of the memory used for a tensor?

# Sol:
!pip install torch
import torch

tensor = torch.tensor([1, 2, 3, 4, 5])

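# Note: tolist() and numpy() show the tensor's logical values; tensor.storage() exposes the underlying 1-D buffer.

# Print the tensor's values as a (copied) Python list
print(tensor.tolist())

# Print the tensor's values as a NumPy array (for CPU tensors this shares memory with the tensor)
print(tensor.numpy())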


[1, 2, 3, 4, 5]
[1 2 3 4 5]
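
# For a contiguous 1-D tensor the logical values and the memory contents look the same. The difference shows up with views. The sketch
# below assumes Tensor.storage() is available; newer PyTorch versions emit a deprecation warning for it, which is silenced here.

```python
import warnings
import torch

m = torch.tensor([[1, 2, 3], [4, 5, 6]])
mt = m.t()   # transposed view; no data is copied

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # assumption: only the storage deprecation warning is expected here
    raw = mt.storage().tolist()       # the flat buffer that backs both m and mt

print(mt.tolist())   # logical values: [[1, 4], [2, 5], [3, 6]]
print(raw)           # memory contents: [1, 2, 3, 4, 5, 6]
print(mt.stride())   # (1, 3): how the view walks the buffer
```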


In [5]:
# 4. When adding a vector of size 3 to a matrix of size 3×3, are the elements of the vector added
# to each row or each column of the matrix? (Be sure to check your answer by running this code in a notebook.)

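# Sol:
# The vector is added to each row of the matrix. Broadcasting treats the shape (3,) as (1, 3) and repeats it down the rows, as the
# output below confirms.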

vector = torch.tensor([1, 2, 3])
matrix = torch.tensor([[4, 5, 6],
                       [7, 8, 9],
                       [10, 11, 12]])

result = vector + matrix
print(result)


tensor([[ 5,  7,  9],
        [ 8, 10, 12],
        [11, 13, 15]])
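
# For comparison, a sketch that adds the same vector to each column instead, by giving it the shape (3, 1) before the addition.

```python
import torch

vector = torch.tensor([1, 2, 3])
matrix = torch.tensor([[4, 5, 6],
                       [7, 8, 9],
                       [10, 11, 12]])

row_wise = vector + matrix            # (3,) acts as (1, 3): added to every row
col_wise = vector[:, None] + matrix   # (3, 1): added to every column

print(row_wise)
print(col_wise)   # tensor([[ 5,  6,  7], [ 9, 10, 11], [13, 14, 15]])
```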


In [6]:
# 5. Do broadcasting and expand_as result in increased memory use? Why or why not?

# Ans:
# No, broadcasting and expand_as operations do not result in increased memory use.

# Both avoid materializing the expanded tensor. expand_as returns a view of the original data in which the expanded dimension has a
# stride of 0, so every position along that dimension reads the same memory location. Broadcasting inside an elementwise operation
# works the same way: the smaller operand is never copied out to the larger shape.

# The only new memory is the result of the operation itself, which has the full broadcast shape.
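
# One way to check this is to look at the strides and the data pointer of an expanded tensor. A sketch with made-up values:

```python
import torch

c = torch.tensor([10., 20., 30.])
m = torch.ones(3, 3)

e = c.expand_as(m)        # shape (3, 3), but still a view of c's three floats
copied = e.contiguous()   # forces a real (3, 3) copy, for contrast

print(e.stride())                      # (0, 1): stepping down a row moves 0 elements in memory
print(e.data_ptr() == c.data_ptr())    # True: same underlying memory
print(copied.stride())                 # (3, 1): an ordinary dense layout
```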

In [8]:
# 6. Implement matmul using Einstein summation.

# Sol:

import numpy as np

def matmul_einsum(a, b):
    return np.einsum('ij, jk -> ik', a, b)

# Example usage
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

result = matmul_einsum(a, b)
print(result)

[[19 22]
 [43 50]]


In [9]:
# 7. What does a repeated index letter represent on the lefthand side of einsum?

# Ans:
# In einsum, an index letter that is repeated on the left-hand side and omitted from the right-hand side represents a summation
# (contraction) over that index.

# The elements of the input arrays are multiplied together wherever the repeated index takes the same value, and the products are
# then summed over that index.

# For example, in 'ij, jk -> ik', the index j appears in both inputs but not in the output. The elements along the shared dimension j
# are multiplied and summed, giving an array whose dimensions are the remaining indices, i and k.
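
# A sketch of the role of the repeated index j: keeping j in the output shows the individual products, and omitting it sums them.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

summed = np.einsum('ij,jk->ik', a, b)    # j omitted from the output: summed over
kept = np.einsum('ij,jk->ijk', a, b)     # j kept in the output: products only, no sum

print(np.array_equal(summed, a @ b))             # True
print(np.array_equal(kept.sum(axis=1), summed))  # True: summing over j recovers the matmul
```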

In [10]:
# 8. What are the three rules of Einstein summation notation? Why?

# Ans:
# The three rules of Einstein summation notation are:

# Repeated Index: When an index appears more than once on the left-hand side and is omitted from the right-hand side (e.g., j in
# 'ij, jk -> ik'), it implies a summation or contraction over that index.

# Index Ordering: The order of indices matters. The order of the indices on the right-hand side of the '->' arrow determines the order
# of the axes of the output; for example, 'ij -> ji' transposes a matrix.

# Index Range: Each index must have a defined range or size corresponding to the dimensions of the input arrays. The range of an index
# is determined by the size of the corresponding dimension of the input arrays involved in the operation.
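
# A sketch of the ordering and range rules with np.einsum. The array m is made up for illustration.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

# Index ordering: the right-hand side fixes the axis order of the output.
transposed = np.einsum('ij->ji', m)

# Index range: a shared index must have the same size in every operand.
mismatch_raised = False
try:
    np.einsum('ij,jk->ik', m, m)   # j would have to be 3 (first operand) and 2 (second)
except ValueError:
    mismatch_raised = True

print(transposed.shape)    # (3, 2)
print(mismatch_raised)     # True
```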

In [12]:
# 9. What are the forward pass and backward pass of a neural network?

# Ans:
# The forward pass of a neural network refers to the process of propagating input data through the network's layers, from the input 
# layer to the output layer. During the forward pass, each layer performs its computations, applying activation functions and weight 
# operations, to generate an output prediction or representation.

# The backward pass, also known as backpropagation, is the process of computing the gradients of a loss function with respect to the
# network's parameters. It propagates the error backward through the network, from the output layer to the input layer, using the
# chain rule. An optimizer such as gradient descent then uses these gradients to update the weights and biases of each layer, which is
# how the network learns to minimize the loss and improve its predictions over time.
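
# A minimal sketch of one forward pass, one backward pass and one gradient-descent update for a single linear layer. The data, the
# layer size and the learning rate lr are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 3)                        # inputs
y = torch.randn(8, 1)                        # targets
w = torch.randn(3, 1, requires_grad=True)    # weights
b = torch.zeros(1, requires_grad=True)       # bias

# Forward pass: inputs -> prediction -> loss
pred = x @ w + b
loss = ((pred - y) ** 2).mean()

# Backward pass: fills w.grad and b.grad with d(loss)/d(parameter)
loss.backward()

# The update itself is a separate step that consumes the gradients
lr = 0.05
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad
    new_loss = ((x @ w + b - y) ** 2).mean()

print(loss.item(), new_loss.item())
```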

In [13]:
# 10. Why do we need to store some of the activations calculated for intermediate layers in the forward pass?

# Ans:
# Storing activations calculated for intermediate layers in the forward pass is necessary because they are needed for the backward pass
# during backpropagation.

# During backpropagation, the chain rule is applied layer by layer, and the local gradient of each layer depends on values computed in
# the forward pass. For a linear layer, for example, the gradient of the loss with respect to the weights is the product of the layer's
# input and the gradient arriving from the layer above; for ReLU, the gradient depends on which pre-activations were positive.

# Storing these intermediate activations avoids recomputing them during the backward pass.
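
# A sketch of a hand-written backward pass for a linear layer followed by ReLU, checked against autograd. It shows which forward-pass
# values (x and z) the gradient computation has to reuse. The shapes and the sum loss are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3)                        # layer input (stored)
w = torch.randn(3, 2, requires_grad=True)

# Forward pass
z = x @ w                                    # pre-activation (stored)
out = torch.relu(z)
loss = out.sum()
loss.backward()                              # autograd result, for reference

# Manual backward pass, reusing the stored values
grad_out = torch.ones_like(out)                   # d(loss)/d(out) for a sum
grad_z = grad_out * (z.detach() > 0).float()      # ReLU gradient needs z
grad_w = x.t() @ grad_z                           # weight gradient needs x

print(torch.allclose(grad_w, w.grad))   # True
```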

In [14]:
# 11. What is the downside of having activations with a standard deviation too far away from 1?

# Ans:
# The downside of having activations with a standard deviation too far away from 1 is that it can lead to issues with the optimization 
# process during training.

# If the standard deviation of the activations is too high, it can cause exploding gradients, where the gradients become extremely large.
# This can result in unstable training, making it difficult for the network to converge to an optimal solution. It can lead to erratic 
# updates of the network's parameters and slower convergence or even divergence of the training process.

# On the other hand, if the standard deviation of the activations is too low, it can cause vanishing gradients, where the gradients become
# extremely small. This can hinder the flow of gradient information backward through the network during backpropagation. The network may
# struggle to learn effectively, especially in deeper architectures, as the gradients diminish rapidly with each layer, leading to slower
# or no learning.

# Therefore, it is desirable to have activations with a standard deviation close to 1, as it helps maintain a reasonable magnitude of 
# gradients during training, leading to stable and efficient optimization.
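
# A sketch of the effect using a stack of 50 purely linear layers. The layer count, the sizes and the two weight scales are
# arbitrary choices for illustration. In float32 the activations overflow with the large scale and underflow to zero with the small one.

```python
import torch

torch.manual_seed(0)

def run_layers(scale, n_layers=50):
    """Pass random data through n_layers random linear layers with the given weight scale."""
    x = torch.randn(200, 100)
    for _ in range(n_layers):
        x = x @ (torch.randn(100, 100) * scale)
    return x

too_big = run_layers(1.0)      # std grows by roughly 10x per layer
too_small = run_layers(0.01)   # std shrinks by roughly 10x per layer

print(torch.isfinite(too_big).all().item())   # False: values overflowed to inf/nan
print((too_small == 0).all().item())          # True: values underflowed to zero
```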

In [15]:
# 12. How can weight initialization help avoid this problem?

# Ans:
# Weight initialization can help avoid the problem of activations with a standard deviation too far from 1 by setting an appropriate 
# initial range for the weights.

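# By scaling the initial weights so that each layer roughly preserves the variance of its input (a common choice is a standard
# deviation proportional to 1/sqrt(n_in), where n_in is the number of inputs to the layer), the activations stay neither too large nor
# too small in magnitude. This helps prevent exploding or vanishing activations and gradients from the first training step.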
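
# A sketch of one such scheme, assuming purely linear layers and a weight standard deviation of 1/sqrt(n_in). This is one common
# choice, not the only one; layers followed by ReLU are usually scaled by sqrt(2/n_in) instead (Kaiming initialization).

```python
import math
import torch

torch.manual_seed(0)
n_in = 100
x = torch.randn(200, n_in)

for _ in range(50):
    w = torch.randn(n_in, n_in) / math.sqrt(n_in)   # std of 1/sqrt(n_in)
    x = x @ w

final_std = x.std().item()
print(final_std)   # stays finite and within a reasonable range of 1 after 50 layers
```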