## Getting Started with RAPIDS: GPU-Accelerated Data Science for PyData Users

### What is RAPIDS?

The [RAPIDS](https://rapids.ai/) data science framework is a collection of libraries for running end-to-end data science
pipelines completely on the GPU. Its interface is designed to look and feel familiar to Python users, while
using optimized NVIDIA® CUDA® primitives and high-bandwidth GPU memory under the hood.

### Outline
- GPU basics
    - What are the differences between a CPU and a GPU?
    - How can I use a GPU for Data Science?
- RAPIDS
    - Intro to cuDF
        - When to use cuDF and when not to
    - cuDF pandas accelerator
        - Speed up your pandas workflow with zero code changes
    - Intro to cuML 
        - GPU-Accelerated Machine Learning 
    - cuml.accel
        - Accelerate scikit-learn workflows with zero code changes 
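
As a rough illustration of what "zero code changes" means for the two accelerators in the outline: they are switched on before pandas or scikit-learn is imported, and the rest of the script stays as it is. The sketch below is a minimal example, not the tutorial's own code. The helper name `enable_accelerators` is hypothetical, and the fallback to plain CPU execution when the libraries are missing is an assumption made so the snippet runs anywhere.

```python
def enable_accelerators():
    """Try to switch on cudf.pandas and cuml.accel; return the names that loaded.

    Hypothetical helper for illustration only.
    """
    enabled = []
    try:
        import cudf.pandas  # GPU-backed drop-in layer for pandas
        cudf.pandas.install()
        enabled.append("cudf.pandas")
    except ImportError:
        pass  # cuDF is not installed, so pandas keeps running on the CPU
    try:
        import cuml.accel  # GPU-backed drop-in layer for scikit-learn
        cuml.accel.install()
        enabled.append("cuml.accel")
    except ImportError:
        pass  # cuML is not installed, so scikit-learn keeps running on the CPU
    return enabled


enabled = enable_accelerators()
print("Accelerators enabled:", enabled or "none (CPU only)")

# Import pandas and scikit-learn only after this point, then use them as usual.
```

In a notebook, the same effect is usually achieved with the `%load_ext cudf.pandas` and `%load_ext cuml.accel` magics at the top of the notebook.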

## Setup

### Google Colab

You can run this tutorial on Google Colab. The basic free account gives you an
interactive Python environment with a GPU. cuDF, cuML, and all the other
libraries needed for this tutorial are already installed in the basic environment.

To run each notebook, click on the corresponding link below to open it in 
Google Colab, change the runtime type to `T4 GPU` and save.
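
After changing the runtime type, it can be worth confirming that a GPU is actually attached before running a notebook. The snippet below is one possible check, assuming the `nvidia-smi` command-line tool is on the path (it ships with the NVIDIA driver). The helper name `describe_gpus` is hypothetical and is not part of the tutorial notebooks.

```python
import shutil
import subprocess


def describe_gpus():
    """Return one line per visible NVIDIA GPU, or an empty list if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # the tool exists but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = describe_gpus()
print(gpus if gpus else "No GPU detected: check the runtime type")
```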

You will download the required data for each notebook in its respective session.

| Notebook    | Link |
| ----------- | ----------- |
| 1 Intro to cuDF      | [![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rapidsai-community/tutorial/blob/main/1.Intro_to_cuDF.ipynb) |
| 2 cudf.pandas        | [![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rapidsai-community/tutorial/blob/main/2.cudf_pandas.ipynb) |
| 3 Intro to cuML      | [![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rapidsai-community/tutorial/blob/main/4.Intro_to_cuML.ipynb) |
| 4 cuml.accel         | [![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rapidsai-community/tutorial/blob/main/5.cuml_accel.ipynb) |


#### Extra content

| Notebook    | Link |
| ----------- | ----------- |
| cudf polars engine    | [![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rapidsai-community/tutorial/blob/main/extras/cudf_polars_engine.ipynb) |


### Local 

If you have access to a GPU, you can run this tutorial locally.

### Get Notebooks and Set Up Environment

In a terminal:

```bash
# Clone the tutorial repository and move into it
git clone https://github.com/rapidsai-community/tutorial
cd tutorial
```

Once inside the repository: 

```bash
conda env create -f local-env.yaml
conda activate rapids-tutorial
```
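
Once the environment is active, a quick sanity check is to confirm that the RAPIDS packages can be found by Python. The sketch below only looks the packages up without importing them, so it also runs on a machine without a GPU. The helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names=("cudf", "cuml")):
    """Return the subset of `names` that Python cannot locate in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not found:", ", ".join(missing), "(is the conda environment active?)")
else:
    print("cuDF and cuML are available")
```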

### Get the data
This tutorial uses several datasets. You can download them all by
running the command below.

```bash
python data_setup.py --pydata-vt
```

## Appendix

If you are interested in learning more about GPU Python, you can follow
[The Accelerated Computing Hub - GPU Python Tutorial](https://github.com/NVIDIA/accelerated-computing-hub/tree/main/gpu-python-tutorial), which covers topics
like:

- Kernel development from Python with Numba
- Memory allocation between Python libraries with the CUDA Array Interface
- Understanding what your GPU is doing with pyNVML (memory usage, utilization, etc.)
- CuPy: A NumPy/SciPy-like library that runs on the GPU
- Multi-GPU with Dask

Colab links to the individual notebooks are listed in the [notebooks section](https://github.com/NVIDIA/accelerated-computing-hub/tree/main/gpu-python-tutorial#notebooks) of that tutorial.