<img src="images/cuda-python.jpg" style="float: right;" />

# GPU Development in Python 101

_Written by [Jacob Tomlinson](https://jacobtomlinson.dev)_

**Welcome to the GPU Development in Python 101 tutorial.**

I joined NVIDIA in 2019 and since then I’ve gotten to grips with the fundamentals of writing accelerated code in Python. I was amazed to discover that I didn’t need to learn C++ and I didn’t need new development tools. Writing GPU code in Python is easier today than ever, and in this tutorial, I will share what I’ve learned and how you can get started with accelerating your code.

In this tutorial we will cover:
- What is a GPU and why is it different to a CPU?
- An overview of the CUDA development model.
- Numba: A high performance compiler for Python.
- Writing your first GPU code in Python.
- Managing memory.
- Understanding what your GPU is doing with pyNVML (memory usage, utilization, etc.).
- RAPIDS: A suite of GPU accelerated data science libraries.
- Working with NumPy style arrays on the GPU.
- Working with pandas style dataframes on the GPU.
- Performing some scikit-learn style machine learning on the GPU.

Attendees will be expected to have a general knowledge of Python and programming concepts, but no GPU experience will be necessary. The key takeaway for attendees will be the knowledge that they don’t have to do much differently to get their code running on a GPU.

### Outline

- **Chapter 1** - Intro to GPUs (30 mins)
- **Chapter 2** - Writing low level GPU code in Python with Numba (30 mins)
- **Chapter 3** - More Numba (30 mins)
- **Chapter 4** - Observability and interoperability (30 mins)
- **Chapter 5** - Working with NumPy style arrays in CuPy (30 mins)
- **Chapter 6** - Accelerating DataFrames with cuDF (30 mins)
- **Chapter 7** - High performance machine learning with cuML (30 mins)
- **Chapter 8** - Distributing GPU Python code with Dask (30 mins)

### Requirements

#### At a conference

When this tutorial is run as part of a conference, a compute platform will generally be provided for attendees. Usually this is a custom [Binder](https://mybinder.org/) platform with GPUs available and dependencies such as drivers preinstalled. If you are following along with this tutorial at a conference right now, refer to your instructor for platform details.

#### Running it locally

Alternatively, you can use your own environment if you have [an NVIDIA Pascal™ or better GPU](https://medium.com/dropout-analytics/which-gpus-work-with-rapids-ai-f562ef29c75f). All workshop material has been tested in the [RAPIDS software environment](https://rapids.ai/start.html#get-rapids), which can be used on Linux or on Windows with WSL.

##### Operating system

- Ubuntu 18.04/20.04 or CentOS 7/8 with gcc/g++ 9.0+
  - [Installation instructions](https://rapids.ai/start.html#get-rapids)
    - See RDN 8 for recent changes to gcc/g++ 9.0 requirements
    - RHEL 7/8 support is provided through CentOS 7/8 builds/installs
- Experimental WSL2 on Windows
  - [Alternative Windows installation instructions](https://developer.nvidia.com/blog/run-rapids-on-microsoft-windows-10-using-wsl-2-the-windows-subsystem-for-linux/)

##### Drivers

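One of the following supported combinations of CUDA version and minimum NVIDIA driver version:

- CUDA 11.0 with driver 450.80.02 or newer
- CUDA 11.2 with driver 460.27.03 or newer
- CUDA 11.4 with driver 470.42.01 or newer
- CUDA 11.5 with driver 495.29.05 or newer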
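
If you want to check a machine against the list above, the comparison can be sketched in a few lines of plain Python. This is only an illustration, not part of the workshop material: the names `MIN_DRIVER`, `parse_version` and `driver_is_supported` are made up for this example, and the table is copied by hand from the list above. You can read your installed driver version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Minimum driver version for each supported CUDA version,
# copied by hand from the list above (hypothetical helper table).
MIN_DRIVER = {
    "11.0": "450.80.02",
    "11.2": "460.27.03",
    "11.4": "470.42.01",
    "11.5": "495.29.05",
}


def parse_version(version):
    """Turn '450.80.02' (or 'v450.80.02+') into (450, 80, 2) so versions compare numerically."""
    cleaned = version.strip().lstrip("v").rstrip("+")
    return tuple(int(part) for part in cleaned.split("."))


def driver_is_supported(cuda_version, driver_version):
    """Return True if driver_version meets the minimum listed for cuda_version."""
    if cuda_version not in MIN_DRIVER:
        # CUDA version is not in the list above, so we make no claim about it.
        return False
    return parse_version(driver_version) >= parse_version(MIN_DRIVER[cuda_version])


if __name__ == "__main__":
    # Example values only; substitute the versions reported on your own system.
    print(driver_is_supported("11.2", "460.32.03"))
```

Comparing tuples of integers rather than raw strings avoids surprises such as `"460.9"` sorting after `"460.27"`.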

If you have a system that meets these requirements, head to the [RAPIDS getting started docs](https://rapids.ai/start.html#get-rapids) to install RAPIDS with either conda or Docker.
