# **Assignment 1**

---
## 📚 **Introduction**
This template notebook is designed to guide you through Assignment 1 of the LLM System course. Follow the steps to set up your environment, implement the required functions, and test your code.

### 🚀 **Goal of the Assignment**
You will implement miniTorch, a basic deep learning framework that supports tensor operations, automatic differentiation, and the necessary operators. You will then use this framework to build a simple feedforward neural network for sentiment classification.

---

## ⚙️ **Environment Setup**
First, ensure that you have changed the runtime to **T4 GPU**. Run the following commands to set up your environment.

In [27]:
# Clone the starter code repository
# !git clone https://github.com/llmsystem/llmsys_s25_hw1.git
!git clone https://github.com/lindafei01/llmsys_s25_hw1.git
%cd llmsys_s25_hw1

Cloning into 'llmsys_s25_hw1'...
remote: Enumerating objects: 218, done.
remote: Counting objects: 100% (218/218), done.
remote: Compressing objects: 100% (102/102), done.
remote: Total 218 (delta 100), reused 211 (delta 93), pack-reused 0 (from 0)
Receiving objects: 100% (218/218), 131.33 KiB | 734.00 KiB/s, done.
Resolving deltas: 100% (100/100), done.
/content/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1


In [2]:
# Install dependencies
!python -m pip install -r requirements.txt
!python -m pip install -r requirements.extra.txt
!python -m pip install -Ue .

Collecting colorama==0.4.3 (from -r requirements.txt (line 1))
  Downloading colorama-0.4.3-py2.py3-none-any.whl.metadata (14 kB)
Collecting hypothesis==6.54 (from -r requirements.txt (line 2))
  Downloading hypothesis-6.54.0-py3-none-any.whl.metadata (6.1 kB)
Collecting mypy==0.971 (from -r requirements.txt (line 3))
  Downloading mypy-0.971-py3-none-any.whl.metadata (1.8 kB)
Collecting numba==0.58.1 (from -r requirements.txt (line 4))
  Downloading numba-0.58.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.7 kB)
Collecting numpy==1.23.5 (from -r requirements.txt (line 5))
  Downloading numpy-1.23.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.3 kB)
Collecting pre-commit==2.20.0 (from -r requirements.txt (line 6))
  Downloading pre_commit-2.20.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting pytest==7.1.2 (from -r requirements.txt (line 7))
  Downloading pytest-7.1.2-py3-none-any.whl.metadata (7.8 kB)
Collecting pytest-env (from -r

---

## 🔧 **CUDA Kernel Compilation**
You will need to compile the CUDA kernels for this assignment. Run the following commands to create the output directory and compile the CUDA source into a shared library.

---

In [28]:
# Compile CUDA kernels
!mkdir -p minitorch/cuda_kernels
!nvcc -o minitorch/cuda_kernels/combine.so --shared src/combine.cu -Xcompiler -fPIC
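# nvcc: the NVIDIA CUDA compiler; it compiles CUDA C/C++ source into GPU-executable code.
# --shared: build a shared library (.so) instead of an executable.
# -Xcompiler -fPIC: pass -fPIC to the host C++ compiler so it generates position-independent code (PIC).
#   PIC runs correctly wherever it is loaded in memory. This matters because several programs may use
#   the same shared library at once, and each program may load it at a different address.
# How -fPIC works: it uses relative rather than absolute addresses, reaches global variables through the
#   Global Offset Table (GOT), and routes function calls through the Procedure Linkage Table (PLT).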
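`--shared` together with `-fPIC` produces a library that a Python process can load at runtime through the dynamic loader. The sketch below uses the standard `ctypes` module; the helper name `load_kernel_library` is hypothetical, and the assumption that the starter code loads `combine.so` via `ctypes` should be checked against `minitorch/cuda_kernel_ops.py`.

```python
import ctypes
import os


def load_kernel_library(path: str = "minitorch/cuda_kernels/combine.so") -> ctypes.CDLL:
    """Load the compiled kernels (hypothetical helper, not part of the starter code).

    ctypes.CDLL hands the file to the platform's dynamic loader (dlopen on Linux),
    which picks the load address at runtime; that is why the library must be
    compiled as position-independent code.
    """
    if not os.path.exists(path):
        # Fail with a clear message instead of a cryptic OSError from the loader
        raise FileNotFoundError(f"{path} not found; run the nvcc cell first")
    return ctypes.CDLL(os.path.abspath(path))
```

After the nvcc cell succeeds, `lib = load_kernel_library()` should return a handle whose exported C functions are reachable as attributes of `lib`.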

## 📋 **Assignment Sections**

### 🧮 **Problem 1: Automatic Differentiation**
**Goal:** Implement the functions `topological_sort` and `backpropagate` in `minitorch/autodiff.py`.

🔧 **Instructions:**
1. Navigate to `minitorch/autodiff.py`.
2. Locate the placeholders marked with `BEGIN ASSIGN1_1` and `END ASSIGN1_1`.
3. Implement the functions based on the assignment description.

**Testing:**
Run the following command to test your implementation.

```python
!python -m pytest -l -v -k "autodiff"
```
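For intuition about what `topological_sort` and `backpropagate` have to do, here is a sketch on a hypothetical scalar `Node` class (not minitorch's variable interface). A depth-first post-order traversal, reversed, places every node before the inputs it was computed from, so by the time a node is processed, the derivative contributions from all of its consumers have already been summed. Recursion is fine for toy graphs; a very deep graph would need an explicit stack.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical scalar graph node (not minitorch's Variable)."""
    value: float
    parents: Tuple["Node", ...] = ()
    # Maps d(output)/d(this node) to one derivative per parent (chain rule)
    backward_fn: Callable[[float], Tuple[float, ...]] = lambda d: ()
    grad: float = 0.0


def add(a: Node, b: Node) -> Node:
    return Node(a.value + b.value, (a, b), lambda d: (d, d))


def mul(a: Node, b: Node) -> Node:
    return Node(a.value * b.value, (a, b), lambda d: (d * b.value, d * a.value))


def topological_sort(root: Node) -> List[Node]:
    """Return nodes with `root` first and every node before its parents."""
    visited: set = set()
    post_order: List[Node] = []

    def visit(node: Node) -> None:
        if id(node) in visited:
            return
        visited.add(id(node))
        for parent in node.parents:
            visit(parent)
        post_order.append(node)  # appended only after all of its parents

    visit(root)
    return post_order[::-1]


def backpropagate(root: Node, d_output: float = 1.0) -> None:
    """Push derivatives from `root` to the leaves, accumulating into `.grad`."""
    pending: Dict[int, float] = {id(root): d_output}
    for node in topological_sort(root):
        d = pending.pop(id(node), 0.0)
        if not node.parents:  # leaf: store the fully accumulated derivative
            node.grad += d
            continue
        for parent, local in zip(node.parents, node.backward_fn(d)):
            pending[id(parent)] = pending.get(id(parent), 0.0) + local
```

For example, with `x, y = Node(2.0), Node(3.0)` and `z = add(mul(x, y), x)`, calling `backpropagate(z)` leaves `x.grad == 4.0` (that is, y + 1) and `y.grad == 2.0` (that is, x). Note how `x` receives contributions along two paths, which is why derivatives are summed rather than overwritten.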

---

In [4]:
!pip uninstall -y datasets
!pip install "datasets<2.0.0"

Found existing installation: datasets 2.4.0
Uninstalling datasets-2.4.0:
  Successfully uninstalled datasets-2.4.0
Collecting datasets<2.0.0
  Downloading datasets-1.18.4-py3-none-any.whl.metadata (22 kB)
Downloading datasets-1.18.4-py3-none-any.whl (312 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 312.1/312.1 kB 8.7 MB/s eta 0:00:00
Installing collected packages: datasets
Successfully installed datasets-1.18.4


In [5]:
!pip install hypothesis



In [6]:
!pip install embeddings



In [7]:
!pip install pycuda



In [11]:
# Problem 1: Autodiff Tests

# TODO: Implement the functions in minitorch/autodiff.py
# topological_sort and backpropagate functions

!python -m pytest -l -v -k "autodiff"


platform linux -- Python 3.11.11, pytest-7.1.2, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/llmsys_s25_hw1/llmsys_s25_hw1/.hypothesis/examples')
rootdir: /content/llmsys_s25_hw1/llmsys_s25_hw1, configfile: setup.cfg
plugins: hypothesis-6.54.0, env-0.6.2, langsmith-0.3.6, anyio-3.7.1, typeguard-4.4.1
collected 110 items / 72 deselected / 38 selected

tests/test_tensor_autodiff.py::test_create PASSED                                            [  2%]
tests/test_tensor_autodiff.py::test_topo_case1 PASSED                                        [  5%]
tests/test_tensor_autodiff.py::test_topo_case2 PASSED                                        [  7%]
tests/test_tensor_autodiff.py::test_one_derivative[fn0] PASSED                               [ 10%]
tests/test_tensor_autodiff.p

---

### 🚀 **Problem 2: CUDA Backend Implementation**

In this task, you will implement operators for the CUDA backend by creating efficient GPU-based functions in `src/combine.cu` and connecting them to the Python code in `minitorch/cuda_kernel_ops.py`.

#### ✅ **Step 1: Compile the CUDA Kernels**
Each time you make changes to the CUDA code, recompile it using the following command:
```bash
!nvcc -o minitorch/cuda_kernels/combine.so --shared src/combine.cu -Xcompiler -fPIC
```
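If you rebuild often, a small Python helper can skip `nvcc` when `src/combine.cu` has not changed since the last build. This is a sketch: `needs_rebuild` and `build_kernels` are hypothetical names, the check only looks at the modification time of `combine.cu` itself (not any headers it might include), and it runs the same nvcc command as above through `subprocess`.

```python
import os
import subprocess

SRC = "src/combine.cu"
OUT = "minitorch/cuda_kernels/combine.so"


def needs_rebuild(src: str = SRC, out: str = OUT) -> bool:
    """True if the shared library is missing or older than its CUDA source."""
    return not os.path.exists(out) or os.path.getmtime(src) > os.path.getmtime(out)


def build_kernels(src: str = SRC, out: str = OUT) -> bool:
    """Compile `src` into `out` only when needed; return True if nvcc ran."""
    if not needs_rebuild(src, out):
        return False
    os.makedirs(os.path.dirname(out), exist_ok=True)
    # check=True raises CalledProcessError if nvcc reports a compile error
    subprocess.run(
        ["nvcc", "-o", out, "--shared", src, "-Xcompiler", "-fPIC"],
        check=True,
    )
    return True
```

Call `build_kernels()` before running the tests. Because each `!python -m pytest` call starts a fresh Python process, the tests load the rebuilt library rather than a stale copy.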

#### 🧑‍💻 **Step 2: Implement CUDA Functions in Python**
Update the following functions in `minitorch/cuda_kernel_ops.py` to load the compiled CUDA functions:

```python
class CudaKernelOps(TensorOps):
    @staticmethod
    def zip(fn: Callable[[float, float], float]):
        # Implement zip function using CUDA kernel
        ...

    @staticmethod
    def reduce(fn: Callable[[float, float], float], start: float = 0.0):
        # Implement reduce function using CUDA kernel
        ...

    @staticmethod
    def matrix_multiply(a: Tensor, b: Tensor) -> Tensor:
        # Implement matrix multiplication using CUDA kernel
        ...
```
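The `reduce` stub above takes a binary `fn` and a `start` value. To pin down what the CUDA kernel must compute, here is a plain-Python reference for reducing one dimension of a contiguous row-major tensor. It assumes minitorch-style semantics (the reduced dimension is kept with size 1, and accumulation begins at `start`); the function names are hypothetical. A GPU kernel would typically assign each output position to its own thread or block instead of running the outer loop.

```python
from math import prod
from typing import Callable, List, Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Strides (in elements) of a contiguous row-major tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def reduce_reference(
    fn: Callable[[float, float], float],
    start: float,
    storage: Sequence[float],
    shape: Sequence[int],
    dim: int,
) -> Tuple[List[float], List[int]]:
    """Reduce `dim` of a contiguous tensor; the reduced dim is kept with size 1."""
    out_shape = list(shape)
    out_shape[dim] = 1
    in_strides = row_major_strides(shape)
    out_strides = row_major_strides(out_shape)
    out = [start] * prod(out_shape)
    for out_pos in range(len(out)):  # on the GPU: one thread/block per out_pos
        # Flat output position -> multi-index -> offset of the first input element
        rem, base = out_pos, 0
        for d, s in enumerate(out_strides):
            idx, rem = divmod(rem, s)
            base += idx * in_strides[d]
        acc = start
        for k in range(shape[dim]):  # walk along the reduced dimension
            acc = fn(acc, storage[base + k * in_strides[dim]])
        out[out_pos] = acc
    return out, out_shape
```

For a 2×3 tensor stored as `[1, 2, 3, 4, 5, 6]`, summing over `dim=1` gives `[6.0, 15.0]` with shape `[2, 1]`, and summing over `dim=0` gives `[5.0, 7.0, 9.0]` with shape `[1, 3]`. Comparing kernel outputs against a reference like this on small inputs makes indexing bugs much easier to spot.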

#### 🧪 **Step 3: Run Tests to Verify Your Implementation**
Use the following commands to verify your functions:

- **Run all CUDA tests:**
  ```bash
  !python -m pytest -l -v -k "cuda"
  ```

- **Run specific tests for each CUDA function:**
  - Map function:  
    ```bash
    !python -m pytest -l -v -k "cuda_one"
    ```
  - Zip function:  
    ```bash
    !python -m pytest -l -v -k "cuda_two"
    ```
  - Reduce function:  
    ```bash
    !python -m pytest -l -v -k "cuda_reduce"
    ```
  - Matrix multiplication:  
    ```bash
    !python -m pytest -l -v -k "cuda_matmul"
    ```


In [14]:
# Check CUDA version match
!nvcc --version
!nvidia-smi

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
Mon Feb 10 22:14:41 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   43C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                       
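The two outputs above disagree: `nvcc` is release 12.5, while `nvidia-smi` reports `CUDA Version: 12.4`, which is the newest CUDA version the installed driver supports. Code built with a newer toolkit than that may fail to load (for example, PTX produced by a newer toolchain cannot be JIT-compiled by an older driver), and this appears to be why the next cells replace the toolkit with 12.4. The hypothetical helpers below make that check explicit by parsing the two outputs.

```python
import re
from typing import Tuple


def toolkit_version(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in the nvcc output")
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(smi_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of `nvidia-smi`."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("could not find 'CUDA Version' in the nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_newer_than_driver(nvcc_output: str, smi_output: str) -> bool:
    """True when nvcc targets a newer CUDA version than the driver reports."""
    return toolkit_version(nvcc_output) > driver_cuda_version(smi_output)
```

In a cell, the inputs can be captured with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and the same call for `nvidia-smi`. With the outputs shown above, `toolkit_newer_than_driver` returns `True` because (12, 5) > (12, 4).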

In [15]:
# Change the CUDA toolkit version
# Remove the current CUDA installation (quoting the patterns keeps the shell from expanding them)
!apt-get remove --purge 'cuda-*' 'nvidia-*' -y
!apt-get autoremove -y
!apt-get clean
!rm -rf /usr/local/cuda*

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'cuda-toolkit-12-4-config-common' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-11-0' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-11-1' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-11-7' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-11-8' for glob 'cuda-*'
Note, selecting 'cuda-toolkit-12-5-config-common' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-0' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-1' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-2' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-3' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-4' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-5' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-6' for glob 'cuda-*'
Note, selecting 'cuda-cudart-dev-12-8' for glob 'cuda-*'
Note, selecting 'cuda-toolkit-12-6-config-common' for glob 'cuda-*'
Note, selecting

In [16]:
# Install 12.4
!apt-get install -y cuda-toolkit-12-4

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  cuda-cccl-12-4 cuda-command-line-tools-12-4 cuda-compiler-12-4 cuda-crt-12-4 cuda-cudart-12-4
  cuda-cudart-dev-12-4 cuda-cuobjdump-12-4 cuda-cupti-12-4 cuda-cupti-dev-12-4 cuda-cuxxfilt-12-4
  cuda-documentation-12-4 cuda-driver-dev-12-4 cuda-gdb-12-4 cuda-libraries-12-4
  cuda-libraries-dev-12-4 cuda-nsight-12-4 cuda-nsight-compute-12-4 cuda-nsight-systems-12-4
  cuda-nvcc-12-4 cuda-nvdisasm-12-4 cuda-nvml-dev-12-4 cuda-nvprof-12-4 cuda-nvprune-12-4
  cuda-nvrtc-12-4 cuda-nvrtc-dev-12-4 cuda-nvtx-12-4 cuda-nvvm-12-4 cuda-nvvp-12-4 cuda-opencl-12-4
  cuda-opencl-dev-12-4 cuda-profiler-api-12-4 cuda-sanitizer-12-4 cuda-toolkit-12-4-config-common
  cuda-tools-12-4 cuda-visual-tools-12-4 default-jre default-jre-headless fonts-dejavu-core
  fonts-dejavu-extra gds-tools-12-4 libatk-wrapper-java libatk-wrapper-java-jni libcublas-12-4
  libcu

In [17]:
# Check versions
!ls /usr/local | grep cuda

cuda
cuda-12
cuda-12.4


In [24]:
!nvcc -o minitorch/cuda_kernels/combine.so --shared src/combine.cu -Xcompiler -fPIC

In [25]:
# Problem 2: Cuda Kernel Tests
!python -m pytest -l -v -k "cuda"

# Uncomment the following commands to test the four operators (map, zip, reduce, matrix multiplication) separately
# !python -m pytest -l -v -k "cuda_one"    # map
# !python -m pytest -l -v -k "cuda_two"    # zip
# !python -m pytest -l -v -k "cuda_reduce" # reduce
# !python -m pytest -l -v -k "cuda_matmul" # matrix multiplication


platform linux -- Python 3.11.11, pytest-7.1.2, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1/.hypothesis/examples')
rootdir: /content/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1, configfile: setup.cfg
plugins: hypothesis-6.54.0, env-0.6.2, langsmith-0.3.6, anyio-3.7.1, typeguard-4.4.1
collected 110 items / 43 deselected / 67 selected

tests/test_tensor_general.py::test_create[cuda] PASSED                                       [  1%]
tests/test_tensor_general.py::test_cuda_one_args[cuda-fn0] PASSED                            [  2%]
tests/test_tensor_general.py::test_cuda_one_args[cuda-fn1] PASSED                            [  4%]
tests/test_tensor_general.py::test_cuda_one_args[cuda-fn2] PASSED                            [  5%]

### 🧠 **Problem 3: Neural Network Architecture**
**Goal:** Implement the `Linear` and `Network` classes in `project/run_sentiment.py`.

🔧 **Instructions:**
1. Navigate to `project/run_sentiment.py`.
2. Locate the placeholders marked with `BEGIN ASSIGN1_3` and `END ASSIGN1_3`.
3. Implement both classes as described in the assignment.

**Testing:**
Run the following command to test your neural network implementation.

```python
!python -m pytest -l -v -k "network"
```
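A linear layer computes `x @ W + b`. The NumPy sketch below shows the shapes involved in a small sentiment network; it is not minitorch code, and the architecture (average the word embeddings over the sentence, then Linear → ReLU → Linear → sigmoid), the uniform initialization range, and the default sizes are assumptions for illustration, so follow the assignment description for the real ones.

```python
import numpy as np


class Linear:
    """Hypothetical NumPy linear layer: (batch, in_size) -> (batch, out_size)."""

    def __init__(self, in_size: int, out_size: int, rng: np.random.Generator):
        # Uniform init in [-1/sqrt(in_size), 1/sqrt(in_size)]: one common choice, assumed here
        bound = 1.0 / np.sqrt(in_size)
        self.weights = rng.uniform(-bound, bound, size=(in_size, out_size))
        self.bias = rng.uniform(-bound, bound, size=(out_size,))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weights + self.bias  # bias broadcasts over the batch


class Network:
    """Hypothetical bag-of-embeddings classifier returning P(positive) per sentence."""

    def __init__(self, embedding_dim: int = 50, hidden_dim: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.lin1 = Linear(embedding_dim, hidden_dim, rng)
        self.lin2 = Linear(hidden_dim, 1, rng)

    def __call__(self, embeddings: np.ndarray) -> np.ndarray:
        # embeddings: (batch, seq_len, embedding_dim)
        x = embeddings.mean(axis=1)             # average word vectors -> (batch, embedding_dim)
        x = np.maximum(self.lin1(x), 0.0)       # ReLU -> (batch, hidden_dim)
        logits = self.lin2(x)[:, 0]             # (batch, 1) -> (batch,)
        return 1.0 / (1.0 + np.exp(-logits))    # sigmoid
```

Feeding a `(4, 7, 50)` batch of embeddings returns four probabilities in (0, 1). In minitorch the same structure would be built from its own tensor operations and parameters instead of NumPy arrays.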

---

In [29]:
# TODO: Implement the Linear and Network classes in project/run_sentiment.py

!python -m pytest -l -v -k "network"

platform linux -- Python 3.11.11, pytest-7.1.2, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1/.hypothesis/examples')
rootdir: /content/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1/llmsys_s25_hw1, configfile: setup.cfg
plugins: hypothesis-6.54.0, env-0.6.2, langsmith-0.3.6, anyio-3.7.1, typeguard-4.4.1
collected 110 items / 105 deselected / 5 selected

tests/test_neural_network.py::test_Linear_1 PASSED                                           [ 20%]
tests/test_neural_network.py::test_Linear_2 PASSED                                           [ 40%]
tests/test_neural_network.py::test_Network_1 PASSED                                          [ 60%]
tests/test_neural_network.py::test_Network_2 PASSED

### 📈 **Problem 4: Training and Evaluation**
**Goal:** Implement the training and validation loop in the `SentenceSentimentTrain` class in `project/run_sentiment.py`.

🔧 **Instructions:**
1. Navigate to `project/run_sentiment.py`.
2. Locate the placeholders marked with `BEGIN ASSIGN1_4` and `END ASSIGN1_4`.
3. Complete the code for training and validation.

**Testing:**
Run the following command to start training and see the validation results.

```python
!python project/run_sentiment.py
```
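The structure of the training and validation loop (shuffle, batch, forward, loss, backward, update, then a validation pass with no updates) can be illustrated without minitorch. The sketch below uses NumPy and a logistic-regression stand-in with a hand-derived gradient; in the minitorch version, the two gradient-update lines would presumably become a backward call on the loss followed by an optimizer step. All names and hyperparameters here are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; clipping keeps log() away from 0."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))


def accuracy(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded prediction matches the 0/1 label."""
    return float(np.mean((probs > threshold) == (labels > 0.5)))


def train(X, y, X_val, y_val, epochs=30, lr=0.5, batch_size=16, seed=0):
    """Mini-batch SGD on a logistic-regression stand-in for the sentiment model.

    Returns the weights, the bias, and a per-epoch list of (train_loss, val_accuracy).
    """
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    history = []
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle the training set every epoch
        total_loss = 0.0
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            probs = sigmoid(xb @ w + b)              # forward pass
            total_loss += binary_cross_entropy(probs, yb) * len(idx)
            grad_logits = (probs - yb) / len(idx)    # d(mean BCE)/d(logits) for a sigmoid output
            w -= lr * (xb.T @ grad_logits)           # backward + SGD step, done by hand
            b -= lr * float(grad_logits.sum())
        val_acc = accuracy(sigmoid(X_val @ w + b), y_val)  # validation: no parameter updates
        history.append((total_loss / len(X), val_acc))
    return w, b, history
```

Logging the per-epoch training loss next to validation accuracy, as `history` does, is the quickest way to tell an optimization bug (loss not falling) from overfitting (loss falling while validation accuracy stalls).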

---

In [30]:
# TODO: Implement the training loop in SentenceSentimentTrain class in project/run_sentiment.py
!python project/run_sentiment.py

Downloading:   0% 0.00/7.78k [00:00<?, ?B/s]Downloading: 28.8kB [00:00, 39.6MB/s]       
Downloading:   0% 0.00/4.47k [00:00<?, ?B/s]Downloading: 28.7kB [00:00, 47.4MB/s]       
Downloading and preparing dataset glue/sst2 (download: 7.09 MiB, generated: 4.81 MiB, post-processed: Unknown size, total: 11.90 MiB) to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...
Downloading: 100% 7.44M/7.44M [00:00<00:00, 32.9MB/s]
Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 261.60it/s]
Downloading from http://nlp.stanford.edu/data/glove.6B.zip to /root/.embeddings/glove/wikipedia_gigaword.zip
100% 400000/400000 [00:24<00:00, 16318.89it/s]
missing pre-trained embedding for 55 unknown words
Traceback (most recent call last):
  File "/content/llmsys_s25_hw1/llmsy

---

### 💾 **Submit Your Assignment: Create a ZIP File for Submission**

Run the following code to create a `llmsys_s25_hw1.zip` file, which you can download and upload to Canvas:


---

### 📋 **Instructions for Submission:**
1. **Run the cell below.**  
   - This will generate a `llmsys_s25_hw1.zip` file containing your entire project.
2. **Click the download link** that appears after the cell finishes running.
3. **Upload the downloaded ZIP file to Canvas.**



In [None]:
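import os
import shutil
from google.colab import files

# The setup cell ran `%cd llmsys_s25_hw1`, so the working directory is the repository itself.
# Zip that directory explicitly instead of looking for an "llmsys_s25_hw1" subfolder inside it.
project_dir = os.getcwd()
assert os.path.isdir(os.path.join(project_dir, "minitorch")), "Run this cell from the repository root"
dir_to_zip = os.path.basename(project_dir)  # "llmsys_s25_hw1"
parent_dir = os.path.dirname(project_dir)

# Create <parent>/llmsys_s25_hw1.zip whose top-level folder is llmsys_s25_hw1/
output_filename = shutil.make_archive(
    base_name=os.path.join(parent_dir, dir_to_zip),
    format="zip",
    root_dir=parent_dir,
    base_dir=dir_to_zip,
)

# Provide a download link (Colab only)
files.download(output_filename)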


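Before uploading, it can be worth confirming that the archive actually contains the files you edited. This hypothetical helper uses the standard `zipfile` module; the required-file list is taken from the paths named in this notebook.

```python
import zipfile
from typing import List, Sequence, Tuple

# Files this notebook asks you to edit; their presence is a cheap sanity check
REQUIRED_FILES = (
    "minitorch/autodiff.py",
    "minitorch/cuda_kernel_ops.py",
    "src/combine.cu",
    "project/run_sentiment.py",
)


def check_submission_zip(path: str, required: Sequence[str] = REQUIRED_FILES) -> Tuple[int, List[str]]:
    """Return (number of entries, required files missing from the archive)."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    # Entries are usually prefixed by the top-level folder, e.g. "llmsys_s25_hw1/src/combine.cu"
    missing = [r for r in required if not any(n == r or n.endswith("/" + r) for n in names)]
    return len(names), missing
```

Pass the path of the archive the cell above produced (for example `check_submission_zip("llmsys_s25_hw1.zip")`, adjusted to where the file was written); an empty `missing` list means all four edited files made it into the archive.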