### 1. How does `unsqueeze` help us to solve certain broadcasting problems?

The `unsqueeze` operation adds a singleton dimension to a tensor, which can help in aligning the dimensions of tensors for broadcasting. By introducing an extra axis with size 1, you make it possible for PyTorch or NumPy to successfully broadcast to the desired shape.
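
As a minimal sketch of the idea, the example below uses NumPy, where `np.expand_dims` plays the same role as `unsqueeze` in PyTorch (there the equivalent call would be `row_scale.unsqueeze(1)`). The array names and shapes are made up for illustration.

```python
import numpy as np

matrix = np.ones((3, 4))               # shape (3, 4)
row_scale = np.array([1.0, 2.0, 3.0])  # shape (3,): one factor per row

# Shapes (3, 4) and (3,) do not broadcast: the trailing sizes 4 and 3 differ.
try:
    matrix * row_scale
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# A trailing singleton axis gives shape (3, 1), which broadcasts against (3, 4).
scaled = matrix * np.expand_dims(row_scale, axis=1)
print(broadcast_failed, scaled.shape)
```

Each row of `scaled` is the matching row of `matrix` multiplied by one entry of `row_scale`.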

---

### 2. How can we use indexing to do the same operation as `unsqueeze`?

You can add a `None` index to introduce a new axis into the tensor. For example, if `a` is a 1D tensor of shape `(n,)`, then `a[:, None]` would introduce a new axis, making the shape `(n, 1)`.

```python
import numpy as np

a = np.array([1, 2, 3])  # shape (3,)
b = a[:, None]           # shape (3, 1); a[None, :] would give (1, 3)
print(a.shape, b.shape)
```

---

### 3. How do we show the actual contents of the memory used for a tensor?

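Call `.storage()` on a PyTorch tensor to see the flat, one-dimensional block of memory that backs it. Printing the tensor, or converting it with `.numpy()`, shows the values arranged by shape and stride rather than the raw buffer, so an expanded view can look larger than the memory it actually uses. Recent PyTorch releases may emit a deprecation warning for `.storage()` and point to `.untyped_storage()` instead.

```python
import torch

a = torch.tensor([1, 2, 3])
b = a.expand(2, 3)   # a view: two rows that share one buffer
print(b)             # displays two rows of [1, 2, 3]
print(b.storage())   # the underlying memory still holds only 1, 2, 3
```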

---

### 4. When adding a vector of size 3 to a matrix of size 3×3, are the elements of the vector added to each row or each column of the matrix?

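The vector is added to each row of the matrix. Broadcasting aligns trailing dimensions, so the vector of shape `(3,)` is treated as shape `(1, 3)` and repeated down the rows; element `j` of the vector therefore ends up added to every entry in column `j`.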

```python
import numpy as np

vector = np.array([1, 2, 3])
matrix = np.array([[4, 5, 6],
                   [7, 8, 9],
                   [10, 11, 12]])

result = matrix + vector  # the whole vector is added to every row
print(result)
```
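
To add the vector to each column instead, one option is to give it a trailing singleton axis so that it has shape `(3, 1)`. The sketch below reuses the same illustrative values:

```python
import numpy as np

vector = np.array([1, 2, 3])
matrix = np.array([[4, 5, 6],
                   [7, 8, 9],
                   [10, 11, 12]])

# Shape (3, 1): element i of the vector is added across row i,
# which means the vector as a whole is added to each column.
by_column = matrix + vector[:, None]
print(by_column)
```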

---

### 5. Do broadcasting and `expand_as` result in increased memory use? Why or why not?

No. Broadcasting does not copy the smaller tensor; the operation reads the same elements repeatedly, as if the tensor had been expanded. `expand_as` in PyTorch likewise returns a view of the original data with a stride of 0 along each expanded dimension, so no new memory is allocated for the repeated values. Only the result of the arithmetic operation needs its own full-size storage.
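
One way to check this is to look at strides. The sketch below uses NumPy's `np.broadcast_to`, which behaves much like PyTorch's `expand_as` in that it returns a view rather than a copy; the shapes are chosen only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
expanded = np.broadcast_to(a, (4, 3))  # read-only view with four "rows"

print(expanded.shape)                  # the view reports the larger shape
print(expanded.strides)                # stride 0 along the first axis
print(np.shares_memory(a, expanded))   # both names refer to the same buffer
```

A stride of 0 means that moving to the next row does not move through memory at all, so every row reads the same three values.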

---

### 6. Implement `matmul` using Einstein summation.

```python
import numpy as np

def matmul_einsum(A, B):
    return np.einsum('ij,jk->ik', A, B)
```

---

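### 7. What does a repeated index letter represent on the left-hand side of `einsum`?

A repeated index letter on the left-hand side marks dimensions that are matched up and multiplied element by element. If that letter is left out of the output on the right-hand side, the products are also summed over it, which is a contraction. In `'ij,jk->ik'`, for example, `j` is repeated and omitted from the output, so it is summed over.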
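
The following sketch, with arbitrary small arrays, contrasts a repeated index that is dropped from the output with one that is kept:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# j is repeated and absent from the output: multiply, then sum over j.
contracted = np.einsum('ij,jk->ik', A, B)

# j is repeated but kept in the output: multiply only, no summation.
products = np.einsum('ij,jk->ijk', A, B)

print(contracted.shape, products.shape)
print(np.array_equal(products.sum(axis=1), contracted))
print(np.array_equal(contracted, A @ B))
```

Summing `products` over the `j` axis recovers the contracted result, which is the ordinary matrix product.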

---

### 8. What are the three rules of Einstein summation notation? Why?

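1. Each index can appear at most twice in any term.
2. Each term must have the same free indices.
3. An index that appears exactly once in a term is a "free index" and labels a dimension of the output. An index that appears twice is a "dummy index" and is summed over.

These rules make the notation unambiguous: which indices are summed and which survive into the result can be read directly from the expression, so the summation signs can be dropped. Note that `np.einsum` is more permissive than the classical notation; with an explicit `->` output it accepts an index that appears more than twice, for instance.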
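
A few small `np.einsum` calls illustrate free and dummy indices. The arrays here are arbitrary examples:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
M = np.arange(9.0).reshape(3, 3)

dot = np.einsum('i,i->', u, v)        # i appears twice: dummy index, summed
outer = np.einsum('i,j->ij', u, v)    # i and j each appear once: free indices
trace = np.einsum('ii->', M)          # i repeated within one operand: sum of the diagonal
matvec = np.einsum('ij,j->i', M, u)   # j is summed over, i stays free

print(dot, trace, outer.shape, matvec)
```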

---

### 9. What are the forward pass and backward pass of a neural network?

- Forward Pass: Computation flows from input to output, calculating the prediction of the neural network for a given input.
  
- Backward Pass: The process of backpropagation computes the gradient of the loss function with respect to each parameter by applying the chain rule of calculus.
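
As a rough sketch of both passes, the example below runs a single linear layer with a mean squared error loss in NumPy and derives the gradients by hand. The data, shapes, and the `forward` helper are all made up for illustration; in practice a framework's autograd would handle the backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # batch of 8 inputs with 3 features
y = rng.normal(size=(8, 1))   # targets
W = rng.normal(size=(3, 1))   # weights
b = np.zeros(1)               # bias

def forward(W, b):
    """Forward pass: inputs -> predictions -> scalar loss."""
    pred = x @ W + b
    loss = np.mean((pred - y) ** 2)
    return pred, loss

pred, loss = forward(W, b)

# Backward pass: apply the chain rule from the loss back to each parameter.
grad_pred = 2 * (pred - y) / pred.size   # dL/dpred
grad_W = x.T @ grad_pred                 # dL/dW
grad_b = grad_pred.sum(axis=0)           # dL/db

# Sanity check: compare one analytic gradient with a finite difference.
eps = 1e-6
W_shift = W.copy()
W_shift[0, 0] += eps
numeric = (forward(W_shift, b)[1] - loss) / eps
print(grad_W[0, 0], numeric)
```

The two printed values should agree closely, which suggests the hand-derived gradient matches the slope of the loss.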

---

### 10. Why do we need to store some of the activations calculated for intermediate layers in the forward pass?

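The backward pass applies the chain rule layer by layer, and the local gradient of most layers depends on values computed in the forward pass. The gradient with respect to a linear layer's weights needs that layer's input, and the gradient of ReLU needs to know which of its inputs were positive. Storing these activations avoids recomputing them, at the cost of extra memory.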
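
The sketch below shows where stored values are reused in a hand-written backward pass. It assumes a toy two-layer network with a ReLU and a loss that is just the sum of the outputs; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))
W1 = rng.normal(size=(5, 6))
W2 = rng.normal(size=(6, 1))

# Forward pass: keep the intermediates that the backward pass will need.
z1 = x @ W1              # pre-activation, needed for ReLU's gradient
a1 = np.maximum(z1, 0)   # hidden activation, needed for W2's gradient
out = a1 @ W2
loss = out.sum()         # toy scalar loss

# Backward pass: each step reuses a value saved above.
grad_out = np.ones_like(out)   # dL/dout for a sum
grad_W2 = a1.T @ grad_out      # uses stored a1
grad_a1 = grad_out @ W2.T
grad_z1 = grad_a1 * (z1 > 0)   # uses stored z1
grad_W1 = x.T @ grad_z1        # uses the layer input x
print(grad_W1.shape, grad_W2.shape)
```

Without `z1` and `a1`, the backward pass would have to rerun the first layer before it could compute either weight gradient.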

---

### 11. What is the downside of having activations with a standard deviation too far away from 1?

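Each layer multiplies its input by a weight matrix, so any change of scale compounds with depth. If the standard deviation of the activations is well above 1, they grow from layer to layer until they overflow to infinity or `nan`; if it is well below 1, they shrink toward zero. The gradients explode or vanish in the same way, and the network becomes difficult or impossible to train.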
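
A small experiment makes the compounding visible. The sketch below pushes random inputs through 50 purely linear layers in NumPy; the helper `std_after_layers`, the width of 100, and the two weight scales are assumptions chosen for illustration.

```python
import numpy as np

def std_after_layers(weight_scale, n_layers=50, width=100, seed=0):
    """Return the std of the activations after n_layers random linear layers."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(200, width))
    for _ in range(n_layers):
        x = x @ (rng.normal(size=(width, width)) * weight_scale)
    return x.std()

too_large = std_after_layers(1.0)    # std grows by roughly 10x per layer
too_small = std_after_layers(0.01)   # std shrinks by roughly 10x per layer
print(too_large, too_small)
```

With 32-bit floats, which most deep learning code uses, the first case would overflow well before the last layer.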

---

### 12. How can weight initialization help avoid this problem?

Proper weight initialization helps in maintaining the variance of activations and gradients as they pass through layers, thereby mitigating issues like vanishing or exploding gradients. Methods like Xavier (Glorot) initialization or He initialization are commonly used for this purpose.
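
The effect can be sketched with the same kind of experiment. Xavier initialization draws weights with variance `2 / (fan_in + fan_out)`, which is `1 / width` for square layers, and He initialization uses `2 / fan_in` to compensate for ReLU zeroing about half of the activations. The helper `final_rms`, the width, and the depth below are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def final_rms(weight_std, use_relu, n_layers=50, width=256, seed=0):
    """Root mean square of the activations after n_layers random layers."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(128, width))
    for _ in range(n_layers):
        x = x @ (rng.normal(size=(width, width)) * weight_std)
        if use_relu:
            x = np.maximum(x, 0)
    return np.sqrt(np.mean(x ** 2))

width = 256
xavier_rms = final_rms(np.sqrt(1.0 / width), use_relu=False, width=width)
he_rms = final_rms(np.sqrt(2.0 / width), use_relu=True, width=width)
print(xavier_rms, he_rms)
```

In both cases the scale of the activations should stay within a modest factor of where it started, even after 50 layers.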