occisn/c-gpu-example

c-gpu-example: CPU vs GPU Vector Addition Benchmark

Compares sequential C (GCC) with parallel CUDA on 100M-element float arrays. Targets WSL2 with an NVIDIA RTX 500 Ada GPU.

Prepared with the help of Claude Code.

Results

| Version | Time (s) | Throughput (M elem/s) | Speedup |
|---------|----------|-----------------------|---------|
| CPU     | 0.7301   | 137                   | 1×      |
| GPU     | 0.0176   | 5691                  | ~41×    |

Files

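| File              | Description                                               |
|-------------------|-----------------------------------------------------------|
| vector_add_cpu.c  | Sequential for loop, timed with clock()                   |
| vector_add_gpu.cu | CUDA kernel, 256 threads/block, timed with cudaEvent API  |
| Makefile          | Build script for both versions                            |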

Requirements

  • CPU: GCC
  • GPU: NVIDIA GPU + CUDA Toolkit (nvcc in PATH)
  • Compile and run from WSL2, not Windows directly

WSL2 CUDA Setup

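  1. Install the NVIDIA driver on Windows (the regular GeForce/Studio driver is sufficient). WSL2 uses the Windows driver, so do not install a Linux GPU driver inside WSL2.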
  2. Install CUDA Toolkit in WSL2:
    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get install -y cuda-toolkit-12-8
  3. Add to ~/.bashrc:
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Build & Run

make all          # Build both versions
make cpu          # Build CPU version only
make gpu          # Build GPU version only
make run          # Build and run both
make clean        # Remove compiled binaries

From Windows (PowerShell or cmd, in the repository directory): wsl make run

Key Parameters

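| Parameter         | Location   | Default     | Notes |
|-------------------|------------|-------------|-------|
| N                 | Both files | 100 000 000 | Keep well below 1B for 4 GB VRAM (three float arrays, if that is what the kernel uses, need 12 bytes per element) |
| THREADS_PER_BLOCK | GPU only   | 256         |       |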

