A hands-on, experiment-driven introduction to GPU kernels, memory hierarchy, and performance optimization — using HIP.
This repository is an interactive learning environment for anyone who wants to understand how GPU compute programming really works.
Instead of reading theory, you will run kernels, measure performance, visualize results, and discover GPU architecture concepts by experimentation.
You do not need an AMD GPU. These tutorials use AMD's HIP, but they also run on any other supported GPU or on a CPU backend.
By completing the 12 interactive lessons in this repo, you will understand:
- Threads, blocks, grids
- Wavefronts/warps and occupancy
- How GPUs hide latency with massive parallelism
- Global memory, caches, shared memory
- Memory coalescing and strided accesses
- Latency vs bandwidth bottlenecks
- Shared memory bank conflicts
- Launch parameters (block size, tile size)
- Divergence and branch coherence
- Tiling and data reuse
- Compute-bound vs memory-bound workloads
- Overlapping compute and memory transfers with streams
- Accurate timing with HIP events
- Running parameter sweeps
- Visualizing bandwidth, GFLOP/s, and latency
- Interpreting performance curves
These topics are fundamentals that apply across AMD, NVIDIA, and Intel GPUs.
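Several of the topics above (bandwidth, GFLOP/s, compute-bound vs memory-bound) come down to simple arithmetic on a kernel's timing. The sketch below shows one common way to derive those metrics for a float32 vector add. The function names and the `elapsed_ms` value are illustrative assumptions, not part of this repo or measured on any device.

```python
def effective_bandwidth_gbs(bytes_moved: int, elapsed_ms: float) -> float:
    """Bytes read plus bytes written, divided by elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9


def gflops(flop_count: int, elapsed_ms: float) -> float:
    """Floating-point operations per second, in units of 1e9 FLOP/s."""
    return flop_count / (elapsed_ms * 1e-3) / 1e9


def arithmetic_intensity(flop_count: int, bytes_moved: int) -> float:
    """FLOPs performed per byte of memory traffic; low values suggest a memory-bound kernel."""
    return flop_count / bytes_moved


# Vector add c[i] = a[i] + b[i] on float32 data.
n = 1 << 24
bytes_moved = 3 * n * 4  # two 4-byte loads and one 4-byte store per element
flop_count = n           # one addition per element
elapsed_ms = 2.0         # hypothetical timing chosen for illustration, not a measurement

bandwidth = effective_bandwidth_gbs(bytes_moved, elapsed_ms)
throughput = gflops(flop_count, elapsed_ms)
intensity = arithmetic_intensity(flop_count, bytes_moved)

print(f"bandwidth: {bandwidth:.2f} GB/s")
print(f"throughput: {throughput:.2f} GFLOP/s")
print(f"arithmetic intensity: {intensity:.4f} FLOP/byte")
```

Vector add performs one FLOP for every 12 bytes moved, so its run time is typically dominated by memory traffic rather than arithmetic.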
- Install the required Python packages by running this command from the root of the repo: `pip install -r requirements.txt`
- No GPU is necessary to run the experiments
Each lesson includes:
- A concept explanation
- A runnable HIP kernel
- An experiment (parameter sweep)
- A plot or visualization
- A discussion of results
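The experiment step can be pictured as a small loop over launch parameters that records one timing per configuration. The following is a minimal sketch of that idea, not the repo's actual harness: `run_sweep`, `to_csv`, the CSV column names, and the CPU stand-in `fake_kernel` are all hypothetical. A real lesson would presumably launch a HIP kernel and time it with HIP events rather than a host-side timer.

```python
import csv
import io
import time


def run_sweep(run_once, block_sizes, repeats=5):
    """Time run_once(block_size) for each block size, keeping the best of `repeats` runs."""
    rows = []
    for block_size in block_sizes:
        best_ms = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_once(block_size)
            elapsed_ms = (time.perf_counter() - start) * 1e3
            best_ms = min(best_ms, elapsed_ms)
        rows.append({"block_size": block_size, "time_ms": best_ms})
    return rows


def to_csv(rows):
    """Serialize sweep rows to CSV text (column names are an assumption)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["block_size", "time_ms"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def fake_kernel(block_size):
    """CPU stand-in for a kernel launch, so the sketch runs without a GPU."""
    total = 0
    for i in range(0, 10_000, block_size):
        total += i
    return total


rows = run_sweep(fake_kernel, [64, 128, 256, 512])
csv_text = to_csv(rows)
print(csv_text)
```

Keeping the best of several repeats is one simple way to reduce timing noise; reporting the median is an equally reasonable choice.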
Threads, blocks, grids, indexing (vector add).
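To make the indexing concrete, here is a plain-Python emulation of the arithmetic a vector-add kernel performs. It is only a CPU sketch with hypothetical function names, not the lesson's kernel. In HIP, each thread computes `blockIdx.x * blockDim.x + threadIdx.x` and the hardware runs those threads in parallel; the nested loops below visit the same indices sequentially.

```python
def grid_size_for(n, block_dim):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block_dim - 1) // block_dim


def vector_add_emulated(a, b, block_dim=256):
    """Sequentially emulate a 1D grid of blocks computing c[i] = a[i] + b[i]."""
    n = len(a)
    grid_dim = grid_size_for(n, block_dim)
    c = [0.0] * n
    for block_idx in range(grid_dim):          # plays the role of blockIdx.x
        for thread_idx in range(block_dim):    # plays the role of threadIdx.x
            i = block_idx * block_dim + thread_idx  # global element index
            if i < n:  # bounds check: the last block may have idle threads
                c[i] = a[i] + b[i]
    return c


a = [float(i) for i in range(1000)]
b = [2.0 * i for i in range(1000)]
c = vector_add_emulated(a, b, block_dim=256)

blocks = grid_size_for(len(a), 256)
idle_threads = blocks * 256 - len(a)
print(blocks, idle_threads, c[999])
```

The bounds check matters whenever the element count is not a multiple of the block size: the grid is rounded up, so some threads in the last block have no element to process.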
gpu-programming-interactive-tutorials/
│
├── lessons/
│   └── lesson01_execution_model/
│       ├── kernel.cu
│       ├── experiment.cpp
│       └── README.md
│
├── common/
│   ├── utils.h
│   ├── timing.h
│   ├── plotting/
│   │   └── (Python or JS plotting helpers)
│   └── data/
│
├── python/
│   ├── analyze_results.ipynb
│   └── plot_results.py
│
├── results/
│   ├── example_plots/
│   └── csv/
│
└── README.md
- HIP / ROCm toolchain (or HIP CPU backend)
- CMake or hipcc
- Python 3.10+
- Pandas
- Matplotlib or Plotly for visualizations