1. Write the Python code to implement a single neuron.


In [1]:
import numpy as np

class Neuron:
    def __init__(self, num_inputs):
        self.weights = np.random.randn(num_inputs)
        self.bias = np.random.randn()

    def forward(self, inputs):
        return np.dot(inputs, self.weights) + self.bias


2. Write the Python code to implement ReLU.


In [2]:
import numpy as np

def relu(x):
    return np.maximum(0, x)


3. Write the Python code for a dense layer in terms of matrix multiplication.


In [3]:
import numpy as np

class DenseLayer:
    def __init__(self, input_size, output_size):
        self.weights = np.random.randn(input_size, output_size)
        self.biases = np.zeros(output_size)
    
    def forward(self, X):
        return np.dot(X, self.weights) + self.biases


4. Write the Python code for a dense layer in plain Python (that is, with list comprehensions
and functionality built into Python).


In [4]:
def dense_layer(inputs, weights, biases):
    """Compute ReLU(inputs @ weights + biases) using only built-in Python."""
    # Matrix product: dot each input row with each column of the weight matrix
    outputs = [[sum(x * y for x, y in zip(row, col)) for col in zip(*weights)] for row in inputs]
    # Add one bias per output unit
    outputs = [[x + b for x, b in zip(row, biases)] for row in outputs]
    # Apply the ReLU activation; remove this line for a purely linear layer
    outputs = [[max(0, x) for x in row] for row in outputs]
    return outputs


5. What is the “hidden size” of a layer?


The "hidden size" of a layer is the number of neurons (units) in a hidden layer, which equals the number of activations that layer outputs for each input. The layer is called "hidden" because it sits between the input and output layers, so its activations are intermediate values rather than part of the data or the final prediction. The hidden size is a hyperparameter: it is chosen when the model is designed and can be tuned across experiments to improve performance.
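As a rough illustration, the sketch below builds a two-layer network in NumPy and shows where the hidden size appears in the weight shapes. The names `n_in`, `hidden_size`, `n_out` and the specific sizes are assumptions chosen for the example.

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch)
n_in, hidden_size, n_out = 4, 8, 2

rng = np.random.default_rng(0)
w1 = rng.standard_normal((n_in, hidden_size))   # input -> hidden
b1 = np.zeros(hidden_size)
w2 = rng.standard_normal((hidden_size, n_out))  # hidden -> output
b2 = np.zeros(n_out)

x = rng.standard_normal((5, n_in))              # batch of 5 examples

hidden = np.maximum(0, x @ w1 + b1)             # one row of hidden_size activations per example
out = hidden @ w2 + b2

print(hidden.shape, out.shape)
```

The hidden size sets the second dimension of `w1` and the first dimension of `w2`, so changing it changes the number of parameters in both layers.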

6. What does the t method do in PyTorch?


In [6]:
pip install torch



In [7]:
import torch

# t() returns the transpose of a 2-D tensor: rows become columns
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x.t()) # tensor([[1, 4], [2, 5], [3, 6]])


tensor([[1, 4],
        [2, 5],
        [3, 6]])


7. Why is matrix multiplication written in plain Python very slow?


Matrix multiplication written in plain Python is slow mainly because of interpreter overhead, not because of the algorithm itself. The three nested loops perform O(n^3) multiply-adds for n-by-n matrices, and every one of them is executed by the Python interpreter, with dynamic type checks and object handling for each number. Optimized libraries like BLAS (Basic Linear Algebra Subprograms) and cuBLAS (CUDA Basic Linear Algebra Subprograms) typically do the same order of arithmetic, but in compiled code that uses vectorized instructions, cache-aware blocking, and parallelism across CPU cores or GPU threads, which makes them far faster than plain Python.
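To make the comparison concrete, here is a minimal sketch of the triple-loop version next to NumPy's `@` operator. The helper name `matmul_plain` and the matrix sizes are assumptions for illustration; the example only checks that both approaches agree, and the two calls can be timed separately to see the gap.

```python
import numpy as np

def matmul_plain(a, b):
    """Triple-loop matrix product on nested lists; every multiply-add runs in the interpreter."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                out[i][j] += a[i][k] * b[k][j]
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30))
B = rng.standard_normal((30, 10))

slow = matmul_plain(A.tolist(), B.tolist())  # 20 * 10 * 30 interpreted multiply-adds
fast = A @ B                                 # the same arithmetic in compiled code

same = np.allclose(slow, fast)
print(same)
```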

8. In matmul, why is ac==br?


In matmul, the condition `ac == br` is required for matrix multiplication to be defined mathematically. Here `ac` is the number of columns of the first matrix `a`, and `br` is the number of rows of the second matrix `b`.

If we multiply a matrix `a` of size `(ar, ac)` with a matrix `b` of size `(br, bc)`, then we can only perform the multiplication if the inner dimensions of the matrices match, that is, if `ac == br`. In this case, the resulting matrix will have size `(ar, bc)`.

The reason is that each output element is the dot product of a row of `a` (which has `ac` entries) with a column of `b` (which has `br` entries), and a dot product needs two vectors of the same length.
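The shape check can be sketched on its own, without doing any arithmetic. The function name `matmul_shape` is hypothetical; it only mirrors the way the shapes are unpacked into rows and columns.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of a @ b, or fail if the inner dimensions differ."""
    ar, ac = a_shape   # rows and columns of the first matrix
    br, bc = b_shape   # rows and columns of the second matrix
    assert ac == br, f"inner dimensions differ: {ac} columns vs {br} rows"
    return (ar, bc)

print(matmul_shape((2, 3), (3, 4)))

try:
    matmul_shape((2, 3), (4, 5))
except AssertionError as err:
    print("cannot multiply:", err)
```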

9. In Jupyter Notebook, how do you measure the time taken for a single cell to execute?


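Put the cell magic `%%time` on the first line of the cell to time a single execution of the whole cell. To get a more stable estimate, use `%%timeit` instead, which runs the cell repeatedly and reports the mean and standard deviation of the execution time. The single-percent forms `%time` and `%timeit` are line magics: they time only the statement written on the same line.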
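Outside a notebook, the standard-library `timeit` module gives a comparable measurement. The sketch below is an assumption-laden stand-in for the cell magics, not what Jupyter runs internally; the function `work` is a hypothetical workload.

```python
import timeit

def work():
    """Hypothetical workload standing in for the body of a notebook cell."""
    return sum(i * i for i in range(10_000))

n_runs = 100
elapsed = timeit.timeit(work, number=n_runs)  # total seconds for n_runs calls
per_call = elapsed / n_runs

print(f"{per_call * 1e6:.1f} microseconds per call")
```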

10. What is elementwise arithmetic?


Elementwise arithmetic refers to performing arithmetic operations between two matrices or tensors of the same (or broadcastable) shape by applying the operation to each pair of corresponding elements. For example, given two matrices A and B, elementwise addition produces a new matrix C, where each element of C is the sum of the corresponding elements of A and B. Similarly, elementwise multiplication produces a new matrix D, where each element of D is the product of the corresponding elements of A and B. Elementwise multiplication specifically is also known as the Hadamard product or Schur product, and it is not the same as matrix multiplication.
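A small NumPy sketch shows the difference between elementwise operations and the matrix product; the values are arbitrary and chosen for easy checking.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

total = A + B           # elementwise sum:      [[11, 22], [33, 44]]
hadamard = A * B        # elementwise product:  [[10, 40], [90, 160]]
matrix_product = A @ B  # matrix product:       [[70, 100], [150, 220]]

print(total)
print(hadamard)
print(matrix_product)
```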

11. Write the PyTorch code to test whether every element of a is greater than the
corresponding element of b.


In [9]:
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([0, 2, 2])

# Compare elementwise, then reduce with all() to a single boolean tensor
all_greater = (a > b).all()

print(all_greater)


tensor(False)


12. What is a rank-0 tensor? How do you convert it to a plain Python data type?


A rank-0 tensor is a tensor with no dimensions, also called a scalar. In PyTorch, it is represented as a tensor with an empty shape, i.e., `torch.tensor(42)` would be a rank-0 tensor representing the scalar value 42.

To convert a rank-0 tensor to a plain Python data type, you can use the `item()` method, like this:



In [11]:
x = torch.tensor(42)
y = x.item()  # y is now the Python integer 42


13. How does elementwise arithmetic help us speed up matmul?


Elementwise arithmetic helps us speed up matmul by removing the innermost Python loop. Each output element is the dot product of a row of the first matrix and a column of the second, which can be computed as an elementwise multiplication of the two vectors followed by a sum. Both operations run in optimized compiled code, vectorized on the CPU or parallelized on a GPU, so the per-element work no longer passes through the Python interpreter. The result is still the full matrix multiplication, but with two Python loops instead of three, which is much faster, especially for large matrices.
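A minimal sketch of this idea in NumPy follows. The function name `matmul_elementwise` is an assumption; the same pattern works with PyTorch tensors.

```python
import numpy as np

def matmul_elementwise(a, b):
    """Matrix product with two Python loops; the inner loop is an elementwise multiply plus sum."""
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "inner dimensions must match"
    c = np.zeros((ar, bc))
    for i in range(ar):
        for j in range(bc):
            # Row i of a times column j of b, elementwise, then summed
            c[i, j] = (a[i, :] * b[:, j]).sum()
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))
b = rng.standard_normal((3, 4))

matches = np.allclose(matmul_elementwise(a, b), a @ b)
print(matches)
```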

14. What are the broadcasting rules?


Broadcasting is a feature in NumPy and PyTorch that allows arrays of different shapes to be used in arithmetic operations. The broadcasting rules are as follows:

1. If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
2. If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

For example, if we have two arrays `a` and `b`, where `a` is of shape `(3, 1)` and `b` is of shape `(1, 4)`, we can add them together as follows:

```
import numpy as np

a = np.array([[1], [2], [3]])
b = np.array([[4, 5, 6, 7]])

c = a + b

# c is now:
# array([[5, 6, 7, 8],
#        [6, 7, 8, 9],
#        [7, 8, 9, 10]])
```

Here, both arrays already have two dimensions, so no padding is needed. In the first dimension, `b` has size 1 and is stretched to 3; in the second dimension, `a` has size 1 and is stretched to 4. Both arrays are therefore broadcast to shape `(3, 4)`, and the addition is performed elementwise.
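The three rules above can be sketched as a small function that computes the broadcast shape of two arrays. The name `broadcast_shape` is hypothetical; NumPy and PyTorch apply these rules internally, and this is only an illustration of the logic.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the shape two arrays broadcast to, or raise ValueError if they are incompatible."""
    result = []
    # Rule 1: compare dimensions from the right; a missing leading dimension counts as 1
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for da, db in pairs:
        if da == db or db == 1:
            result.append(da)   # Rule 2: the size-1 side is stretched
        elif da == 1:
            result.append(db)   # Rule 2, other direction
        else:
            # Rule 3: sizes differ and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((3, 1), (1, 4)))
print(broadcast_shape((4,), (3, 1)))
```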

15. What is expand_as? Show an example of how it can be used to match the results of
broadcasting.



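In PyTorch, `expand_as` is a tensor method that returns a view of the tensor expanded to the shape of another tensor, without copying the underlying data. Size-1 and missing leading dimensions are repeated to match the target shape, following the same rules as broadcasting, so applying `expand_as` explicitly gives the same result that broadcasting would produce implicitly.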
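A short example, with arbitrary values for the tensors `c` and `m`, compares the explicit expansion with implicit broadcasting:

```python
import torch

c = torch.tensor([10., 20., 30.])
m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

c_expanded = c.expand_as(m)   # shape (2, 3); a view, no data is copied

explicit = m + c_expanded     # add after expanding by hand
implicit = m + c              # let broadcasting do the expansion

print(c_expanded)
print(torch.equal(explicit, implicit))
```

The expanded tensor reuses the three stored values of `c` for every row, which shows up as a stride of 0 along the new dimension (`c_expanded.stride()`).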