Compares a sequential C version (GCC) with a parallel CUDA version of vector addition on 100M-element float arrays. Targets WSL2 with an NVIDIA RTX 500 Ada GPU.
Prepared with the help of Claude Code.
| Version | Time (s) | Throughput (M elem/s) | Speedup |
|---|---|---|---|
| CPU | 0.7301 | 137 | 1× |
| GPU | 0.0176 | 5691 | ~41× |
| File | Description |
|---|---|
| `vector_add_cpu.c` | Sequential `for` loop, timed with `clock()` |
| `vector_add_gpu.cu` | CUDA kernel, 256 threads/block, timed with the cudaEvent API |
| `Makefile` | Build script for both versions |
- CPU: GCC
- GPU: NVIDIA GPU + CUDA Toolkit (`nvcc` in `PATH`)
- Compile and run from WSL2, not directly from Windows
- Install the NVIDIA drivers on Windows (the regular GeForce/Studio drivers).
- Install the CUDA Toolkit inside WSL2:

  ```bash
  wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
  sudo dpkg -i cuda-keyring_1.1-1_all.deb
  sudo apt-get update
  sudo apt-get install -y cuda-toolkit-12-8
  ```

- Add the following to `~/.bashrc`, then run `source ~/.bashrc` or open a new shell:

  ```bash
  export PATH=/usr/local/cuda/bin:$PATH
  export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
  ```
```bash
make all    # Build both versions
make cpu    # Build CPU version only
make gpu    # Build GPU version only
make run    # Build and run both
make clean  # Remove compiled binaries
```

From Windows: `wsl make run`
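| Parameter | Location | Default | Notes |
|---|---|---|---|
| `N` | Both files | 100 000 000 | Each device array takes 4 × N bytes (400 MB at the default); all device arrays must fit in the GPU's 4 GB of VRAM |
| `THREADS_PER_BLOCK` | GPU only | 256 | Block size used for the kernel launch |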