# Lab 1

Inspired by Berkeley STAT 157.

To be submitted before 11 February 2025 on Canvas.

1. Write all code in the notebook.
1. Write all text in the notebook. 
1. **Execute** the notebook and **save** the results.
1. To be safe, print the notebook as a PDF and add it to the repository, too. Your repository should contain two files: ``your_name.ipynb`` and ``your_name.pdf``.

## 0. Installing new packages

To install a new package, simply type:

`!pip install package_1_name package_2_name`

Example: `!pip install numpy scipy`

## 1. Speedtest for vectorization

Your goal is to measure the speed of linear algebra operations for different levels of vectorization. Use NumPy to carry out each of the following steps and record its compute time.

1. Construct two matrices $A$ and $B$ with Gaussian random entries of size $4096 \times 4096$. 
1. Compute $C = A B$ using matrix-matrix operations and report the time. 
1. Compute $C = A B$, treating $A$ as a matrix but computing the result for each column of $B$ one at a time. Report the time.
1. Compute $C = A B$, treating $A$ and $B$ as collections of vectors. Report the time.


In [7]:
# Task 1: construct two 4096 x 4096 matrices A and B with Gaussian random entries.

import time

import numpy as np

mu, sigma = 0, 0.1  # mean and standard deviation of the Gaussian entries
start_gen = time.time()
A = np.random.normal(mu, sigma, size=(4096, 4096))
B = np.random.normal(mu, sigma, size=(4096, 4096))
end_gen = time.time()
print("generation time: ", end_gen - start_gen)

generation time:  0.7756555080413818


In [8]:
# Task 2: compute C = A B with a single matrix-matrix product and report the time.
start = time.time()
C = np.dot(A, B)
end = time.time()
print('multiplication time:', end - start)

multiplication time: 0.8772711753845215


In [10]:
# Task 3: compute C = A B one column of B at a time (matrix-vector products) and report the time.
C = np.empty((4096, 4096))
start_3 = time.time()
for i in range(4096):
    C[:, i] = np.dot(A, B[:, i])  # column i of C is A times column i of B
end_3 = time.time()

print("Operation3 took: ", end_3 - start_3)

Operation3 took:  26.428006649017334


In [ ]:
# Task 4: compute C = A B one entry at a time (vector-vector dot products) and report the time.
C = np.empty((4096, 4096))
start_4 = time.time()
for i in range(4096):
    for j in range(4096):
        C[i, j] = np.dot(A[i, :], B[:, j])  # entry (i, j) is row i of A dotted with column j of B
end_4 = time.time()
print("Operation4 took: ", end_4 - start_4, " seconds")
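
Before running the slowest variant at full size (it issues 4096 × 4096 = 16,777,216 separate dot products), it may help to confirm on small matrices that all three strategies produce the same result. The following is a minimal sketch; the size `n = 64`, the seed, and the variable names are illustrative assumptions, not part of the assignment.

```python
import numpy as np

# Small illustrative size (an assumption); the lab itself uses 4096.
n = 64
rng = np.random.default_rng(0)
A = rng.normal(0, 0.1, size=(n, n))
B = rng.normal(0, 0.1, size=(n, n))

# Strategy 1: one matrix-matrix product.
C_full = A @ B

# Strategy 2: one matrix-vector product per column of B.
C_cols = np.empty((n, n))
for j in range(n):
    C_cols[:, j] = A @ B[:, j]

# Strategy 3: one vector-vector dot product per entry of C.
C_dots = np.empty((n, n))
for i in range(n):
    for j in range(n):
        C_dots[i, j] = A[i, :] @ B[:, j]

print("columns match:", np.allclose(C_full, C_cols))
print("entries match:", np.allclose(C_full, C_dots))
```

If both checks pass, any timing difference between the three cells comes from how the work is dispatched, not from computing different results.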

## 2. PyTorch on CPUs/GPUs

1. Install GPU drivers (if needed)
1. Install PyTorch on a GPU/CPU instance (https://pytorch.org/)
1. Run `!nvidia-smi` and display its output (if running on a GPU instance)
1. Create a $2 \times 2$ matrix on the GPU (if applicable) and print it. See https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html for details.
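
The last step might look like the sketch below. It assumes PyTorch is already installed, and it falls back to the CPU when no CUDA device is visible; the matrix values and the names `device` and `X` are arbitrary choices for illustration.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a 2 x 2 matrix directly on the chosen device and print it.
X = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device)
print(X)
print("stored on:", X.device)
```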

## 3. Speedtest for vectorization

1. Repeat Section 1 using PyTorch.
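
One point to watch when timing PyTorch: CUDA kernels are launched asynchronously, so a timer may stop before the GPU has finished unless the device is synchronized first. The sketch below shows one possible timing helper for the first two strategies. The helper name `timed`, the reduced size `n = 256`, and the function names are assumptions made for illustration; the lab asks for 4096.

```python
import time

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def timed(fn):
    """Run fn() and return (result, elapsed seconds), syncing the GPU if one is used."""
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for earlier GPU work to finish
    start = time.perf_counter()
    result = fn()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the work launched by fn()
    return result, time.perf_counter() - start

n = 256  # small illustrative size (an assumption)
A = torch.randn(n, n, device=device)
B = torch.randn(n, n, device=device)

def matrix_matrix():
    return A @ B

def column_by_column():
    C = torch.empty(n, n, device=device)
    for j in range(n):
        C[:, j] = A @ B[:, j]  # one matrix-vector product per column
    return C

C_full, t_full = timed(matrix_matrix)
C_cols, t_cols = timed(column_by_column)
print("matrix-matrix:    ", t_full, "seconds")
print("column by column: ", t_cols, "seconds")
```

The entry-by-entry variant can be timed the same way by wrapping the double loop in a function and passing it to `timed`.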

## 4. PyTorch and NumPy (for Challengers)

Your goal is to measure the speed penalty incurred when converting data between PyTorch and NumPy. We are going to do this as follows:

1. Create two Gaussian random matrices $A, B$ of size $4096 \times 4096$ as PyTorch tensors.
1. Compute a **NumPy** vector $\mathbf{c} \in \mathbb{R}^{4096}$ with entries $c_i = \|A B_{i\cdot}\|^2$, where $B_{i\cdot}$ denotes row $i$ of $B$.

To see the difference in speed due to Python, perform the following two experiments and measure the time:

1. Compute $\|A B_{i\cdot}\|^2$ one at a time and assign its outcome to $c_i$ directly.
1. Use an intermediate storage vector $\mathbf{d}$, held as a PyTorch tensor, for the assignments and copy it to NumPy at the end.
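
The two experiments could be set up roughly as follows. This is a sketch under stated assumptions: the function names `squared_norms_direct` and `squared_norms_buffered` are hypothetical, the size is reduced to `n = 256` so the example runs quickly, and `.item()` is used here as the per-element conversion from a tensor to a Python number.

```python
import time

import numpy as np
import torch

def squared_norms_direct(A, B):
    """Experiment 1: write each c_i straight into a NumPy vector."""
    n = B.shape[0]
    c = np.empty(n, dtype=np.float32)
    for i in range(n):
        v = A @ B[i]                   # A times row i of B
        c[i] = torch.dot(v, v).item()  # one tensor-to-Python conversion per step
    return c

def squared_norms_buffered(A, B):
    """Experiment 2: collect results in a tensor d, convert to NumPy once."""
    n = B.shape[0]
    d = torch.empty(n, dtype=A.dtype, device=A.device)
    for i in range(n):
        v = A @ B[i]
        d[i] = torch.dot(v, v)         # stays inside PyTorch
    return d.cpu().numpy()             # single copy at the end

n = 256  # small illustrative size (an assumption); the lab asks for 4096
torch.manual_seed(0)
A = torch.randn(n, n)
B = torch.randn(n, n)

for fn in (squared_norms_direct, squared_norms_buffered):
    start = time.perf_counter()
    c = fn(A, B)
    print(fn.__name__, "took", time.perf_counter() - start, "seconds")
```

On a GPU the gap between the two is likely to be larger, since every `.item()` call forces the host to wait for the device; when timing there, synchronize before reading the clock.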

## 5. Memory-efficient computation (for Terminators)

We want to compute $C \leftarrow A \cdot B + C$, where $A$, $B$, and $C$ are all matrices. Implement this in the most memory-efficient manner. Pay attention to the following two things:

1. Do not allocate new memory for the new value of $C$.
1. Do not allocate new memory for intermediate results if possible.
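
One possible approach in NumPy is sketched below; it is not the only answer and not necessarily the most efficient one. It updates $C$ in place and limits intermediate storage to a single reusable buffer of a few rows, rather than a full temporary for $A \cdot B$. The function name `addmm_inplace` and the parameter `block_rows` are hypothetical names chosen for this example.

```python
import numpy as np

def addmm_inplace(A, B, C, block_rows=256):
    """Update C <- A @ B + C in place, reusing one buffer of at most block_rows rows."""
    n_rows, n_cols = C.shape
    # The only extra allocation: a small scratch buffer, reused for every block.
    buf = np.empty((min(block_rows, n_rows), n_cols), dtype=C.dtype)
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        tmp = buf[: stop - start]                      # view into the buffer, no copy
        np.matmul(A[start:stop], B, out=tmp)           # product written into the buffer
        np.add(C[start:stop], tmp, out=C[start:stop])  # accumulate into C's own memory
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(512, 512))
B = rng.normal(size=(512, 512))
C = rng.normal(size=(512, 512))

expected = A @ B + C  # reference result; this line allocates, for checking only
result = addmm_inplace(A, B, C)
print("same array object:", result is C)
print("matches reference:", np.allclose(C, expected))
```

A smaller `block_rows` lowers peak memory at the cost of more, smaller matrix products. In PyTorch, the in-place method `Tensor.addmm_` expresses the same update as `C.addmm_(A, B)`.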