# Questions (01-04 notebooks)

## 1. Write the Python code to implement a single neuron.

In [1]:
import torch

x = torch.randn(1, 10)
w = torch.randn(1, 10)
b = torch.randn(1)

neuron = x @ w.t() + b
neuron

tensor([[-1.5776]])

## 2. Write the Python code to implement ReLU.

In [2]:
def ReLU(x):
    return x.clamp_min(0.)

ReLU(x), x

(tensor([[0.0000, 0.0000, 0.9725, 0.3088, 0.0000, 1.4561, 0.0000, 0.0000, 0.0000,
          0.0000]]),
 tensor([[-0.1179, -0.4141,  0.9725,  0.3088, -0.0642,  1.4561, -0.8060, -0.9297,
          -0.3240, -1.4377]]))

## 3. Write the Python code for a dense layer in terms of matrix multiplication.

In [43]:
x = torch.randn(1, 10)
w1 = torch.randn(5, 10)
b1 = torch.randn(5)

def linear(x, w, b):
    return x @ w.t() + b

layer1 = ReLU(linear(x, w1, b1))
layer1

tensor([0.0000, 1.6669, 0.0000, 1.2285, 0.0000])

## 4. Write the Python code for a dense layer in plain Python.

**Matrix multiplication:**

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} +\cdots + a_{in}b_{nj}= \sum_{k=1}^n a_{ik}b_{kj}$$

**ReLU**

$$f(x) = x^+ = \max(0, x)$$

In [44]:
def ReLU(x):
    r = torch.zeros(x.shape[1])
    for i, e in enumerate(x.squeeze()):
        r[i] = e.item() if e.item() >= 0 else 0.
    return r

def linear(x, w, b):
    return matmul(x, w.t()) + b

def matmul(a, b):
    ar, ac = a.shape[0], a.shape[1]
    br, bc = b.shape[0], b.shape[1]
    assert ac==br  # columns of a must match rows of b
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):  # dot product of row i of a with column j of b
                c[i, j] += a[i, k] * b[k, j]
    return c

layer1 = ReLU(linear(x, w1, b1))
layer1

tensor([0.0000, 1.6669, 0.0000, 1.2285, 0.0000])

## 5. What is the hidden size of a layer?
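
A short sketch, assuming the usual definition: the hidden size is the number of neurons (output activations) in a hidden layer, which in the weight layout used above is the number of rows of that layer's weight matrix. The names `n_in`, `n_hidden` and `n_out` and their values are illustrative, not taken from the notebooks.

```python
import torch

n_in, n_hidden, n_out = 10, 5, 1   # illustrative sizes; n_hidden is the hidden size

x = torch.randn(1, n_in)
w1, b1 = torch.randn(n_hidden, n_in), torch.randn(n_hidden)   # hidden layer
w2, b2 = torch.randn(n_out, n_hidden), torch.randn(n_out)     # output layer

hidden = (x @ w1.t() + b1).clamp_min(0.)   # shape (1, n_hidden)
out = hidden @ w2.t() + b2                 # shape (1, n_out)
print(hidden.shape, out.shape)
```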

## 6. What does the t method do in PyTorch?

t transposes a 2-D tensor, swapping its rows and columns:

In [5]:
x = torch.zeros(2, 5)
x, x.t()

(tensor([[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]),
 tensor([[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]]))
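
With all-zero values only the shape change is visible. A small sketch with distinct values (chosen here for illustration) makes it easier to see that element (i, j) moves to position (j, i):

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
xt = x.t()                          # [[0, 3], [1, 4], [2, 5]]

print(xt.shape)                     # torch.Size([3, 2])
print(xt[2, 0] == x[0, 2])          # the element at (0, 2) is now at (2, 0)
```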

## 7. Why is matrix multiplication written in plain Python very slow?

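Pure Python loops are slow: every iteration is interpreted, and each element access and multiply-add carries interpreter overhead. A plain-Python matmul runs three *nested loops*, so that overhead is paid once per scalar operation, whereas PyTorch performs the same work in optimized compiled code.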
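
A rough way to see the gap is to time both versions on the same inputs. This is only a sketch: the matrix sizes are arbitrary, `matmul_loops` is an illustrative name for the triple-loop version, and the absolute timings depend on the machine.

```python
import time
import torch

def matmul_loops(a, b):
    # Plain-Python triple loop over rows, columns and the shared dimension
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

a, b = torch.randn(20, 30), torch.randn(30, 10)

start = time.perf_counter()
slow = matmul_loops(a, b)
t_loops = time.perf_counter() - start

start = time.perf_counter()
fast = a @ b
t_builtin = time.perf_counter() - start

print(f"loops: {t_loops:.4f}s, built-in: {t_builtin:.6f}s")
```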

## 8. In matmul, why is ac==br?

The number of *columns* of matrix A must equal the number of *rows* of matrix B.

Each entry of the result is the dot product of an entire **row of A** with an entire **column of B**. A row of A has ac elements and a column of B has br elements, and two vectors can only be multiplied element by element if they have the same size.
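
The shape requirement can be checked directly with PyTorch's built-in matrix product. The shapes below are arbitrary examples:

```python
import torch

a = torch.randn(2, 3)   # 2 rows, 3 columns
b = torch.randn(3, 4)   # 3 rows, 4 columns

c = a @ b               # ac == br (3 == 3), so this works
print(c.shape)          # torch.Size([2, 4])

try:
    a @ torch.randn(2, 4)   # ac == 3 but br == 2
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```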

## 9. In Jupyter Notebook, how do you measure the time taken for a single cell to execute?

Put the 'magic' command **%%time** on the first line of the cell. To time a single statement instead, prefix it with **%time**.

## 10. What is elementwise arithmetic?

It is the application of an arithmetic operation between each element of one array and the corresponding element of another.

It works on tensors of any rank, as long as they have the same shape.
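
A minimal illustration with two rank-2 tensors of the same shape (values chosen by hand so the results are easy to check):

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[10., 20.], [30., 40.]])

total = a + b      # [[11, 22], [33, 44]]
product = a * b    # [[10, 40], [90, 160]]  (elementwise, not a matrix product)
print(total, product)
```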

## 11. Write the PyTorch code to test whether every element of *a* is greater than the corresponding element of *b*.

In [8]:
a = torch.randn(5)
b = torch.randn(5)
a, b, a > b

(tensor([ 0.8069,  0.4581,  0.6376, -0.2375,  0.3406]),
 tensor([-1.1001, -1.2028,  1.8080, -0.3195, -0.3702]),
 tensor([ True,  True, False,  True,  True]))
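
The comparison above returns one boolean per element. To reduce it to a single answer for "every element", one option is to chain `.all()` onto the comparison. The values below are made up for illustration:

```python
import torch

a = torch.tensor([3., 5., 7.])
b = torch.tensor([1., 2., 9.])

elementwise = a > b        # tensor([ True,  True, False])
every = (a > b).all()      # rank-0 boolean tensor
print(elementwise, every.item())
```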

## 12. What is a rank-0 tensor? How do you convert it to plain Python data type?

A rank-0 tensor is a scalar: a tensor with an empty shape and zero dimensions.

To convert it to a plain Python data type, use the **.item()** method.

In [23]:
a = torch.randn(())
a, a.ndim, a.shape, a.item()

(tensor(-0.1598), 0, torch.Size([]), -0.15981844067573547)

## 13. What does this return, and why?

tensor([1, 2]) + tensor([1])

It adds 1 to each element of [1, 2], returning tensor([2, 3]).

Through broadcasting, tensor([1]) is expanded to tensor([1, 1]) and the addition is then applied elementwise (equivalent to tensor([1, 2]) + tensor([1, 1])).

PyTorch doesn't actually create the extra copies of [1] in memory.

In [26]:
torch.Tensor([1, 2]) + torch.Tensor([1])

tensor([2., 3.])
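
One way to observe that no copy is made is to expand the tensor explicitly and inspect its stride. This is a sketch using `expand`, which produces the same kind of view that broadcasting uses implicitly:

```python
import torch

b = torch.tensor([1])
expanded = b.expand(2)          # behaves like tensor([1, 1])

print(expanded)                 # tensor([1, 1])
print(expanded.stride())        # (0,): both positions read the same element
print(expanded.data_ptr() == b.data_ptr())   # same underlying memory
```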

## 14. What does this return, and why?

tensor([1, 2]) + tensor([1, 2, 3])

This doesn't work: PyTorch raises a RuntimeError. To broadcast, each pair of dimensions must either have the same size or one of them must have size 1, and here the sizes are 2 and 3.
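
The failure can be reproduced and caught like this (a minimal sketch; the `failed` flag is only there to make the outcome explicit):

```python
import torch

a = torch.tensor([1, 2])
b = torch.tensor([1, 2, 3])

try:
    a + b
    failed = False
except RuntimeError as err:
    failed = True
    print(err)
```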

## 15. How does elementwise arithmetic help us speed up matmul?

With elementwise arithmetic we can remove the innermost of the three nested loops: we multiply the whole i-th row of a by the whole j-th column of b elementwise and sum the products.

In [45]:
def matmul(a, b):
    ar, ac = a.shape[0], a.shape[1]
    br, bc = b.shape[0], b.shape[1]
    assert ac==br  # columns of a must match rows of b
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # elementwise product of row i and column j, then sum
            c[i, j] = torch.sum(a[i, :] * b[:, j])
    return c

def linear(x, w, b):
    return matmul(x, w.t()) + b

layer1 = ReLU(linear(x, w1, b1))
layer1

tensor([0.0000, 1.6669, 0.0000, 1.2285, 0.0000])

## 16. What are the broadcasting rules?

Shapes are compared dimension by dimension, starting from the trailing (last) one. Two dimensions are compatible when:
1. they have the same size, or
2. one of them is 1

A (3d array): 28 x 28 x 3
B (3d array): 28 x 1  x 3
Result:       28 x 28 x 3 (works)

A (3d array): 28 x 28 x 3
B (3d array): 28 x 3  x 3
Result:       error (28 and 3 don't match in the second dimension)
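
The two shape pairs above can be tried directly. This sketch uses random tensors, since only the shapes matter here:

```python
import torch

a = torch.randn(28, 28, 3)

ok = a + torch.randn(28, 1, 3)     # the size-1 dimension is stretched to 28
print(ok.shape)                    # torch.Size([28, 28, 3])

try:
    a + torch.randn(28, 3, 3)      # 28 vs 3 in the middle dimension
    failed = False
except RuntimeError as err:
    failed = True
    print(err)
```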

## 17. What is expand_as? Show an example of how it can be used to match the results of broadcasting.
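
A possible answer, sketched under the assumption that the question refers to `Tensor.expand_as`: it returns a view of a tensor expanded to the shape of another tensor, without copying data. The tensors `c` and `m` below are made-up examples.

```python
import torch

c = torch.tensor([10., 20., 30.])          # shape (3,)
m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])           # shape (2, 3)

c_big = c.expand_as(m)     # shape (2, 3); every row is a view of c
print(c_big)
print(c_big.stride())      # (0, 1): stride 0 along rows means no data was copied
print(torch.equal(m + c_big, m + c))   # matches implicit broadcasting
```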

## 18. How does unsqueeze help us to solve certain broadcasting problems?
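
One illustrative case, assuming we want to add one value per *row* of a matrix: a vector of shape (2,) cannot be broadcast against a (2, 3) matrix directly, because the trailing dimensions are 2 and 3. Inserting a size-1 dimension with `unsqueeze` makes the shapes compatible.

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])       # shape (2, 3)
c = torch.tensor([10., 20.])           # shape (2,), one value per row of m

# m + c would fail here: the trailing dimensions are 3 and 2
col = c.unsqueeze(1)                   # shape (2, 1)
result = m + col                       # broadcast across the columns
print(result)                          # [[11, 12, 13], [24, 25, 26]]
```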

## 19. How can we use indexing to do the same operation as unsqueeze?
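
A sketch of one answer: indexing with `None` inserts a size-1 dimension at that position, which gives the same result as `unsqueeze`. The tensor `c` is a made-up example.

```python
import torch

c = torch.tensor([10., 20., 30.])

print(c[None, :].shape)   # torch.Size([1, 3]), same as c.unsqueeze(0)
print(c[:, None].shape)   # torch.Size([3, 1]), same as c.unsqueeze(1)

same_row = torch.equal(c[None, :], c.unsqueeze(0))
same_col = torch.equal(c[:, None], c.unsqueeze(1))
```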