<img src="../images/rapids_logo.png" width="500">

# Accelerating Data Science Workflows with RAPIDS

In this workshop you will use RAPIDS, a platform enabling end-to-end GPU-accelerated data science, to perform accelerated data manipulation and accelerated machine learning in the service of solving a real-world problem.

## [GPU-Accelerated Data Manipulation](https://github.com/rapidsai/cudf)

In these notebooks you will interact with several data sets to transform UK population data in preparation for a variety of machine learning algorithms.

## [GPU-Accelerated Machine Learning](https://github.com/rapidsai/cuml)

In these notebooks you will use a variety of machine learning algorithms (K-means, DBSCAN, and XGBoost) to analyze ideal supply locations, clusters of infected people, and probabilities of infection.

## Project: Save the UK

In the final notebook you will utilize many of the techniques you have learned to identify clusters of infected people spread throughout the UK.

## Available GPU Accelerators

We can use these interactive cells to run shell commands by prefixing them with `!`. For example, execute the following cell to run the shell command `nvidia-smi`, which will print information about your environment's available GPU, its current memory usage, and any processes currently utilizing it:

In [None]:
!nvidia-smi

As you can see, almost no GPU memory is being used right now, and there are no active processes utilizing the GPU. Throughout the lab you can use this command to keep an eye on memory usage. As a general rule of thumb when doing data analysis on the GPU, we try to keep about half the GPU memory free for operations that expand the data stored on the GPU device.
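The rule of thumb above can also be checked programmatically. The following is a minimal sketch, not part of the workshop material: it assumes `nvidia-smi` is on the `PATH` and uses its CSV query mode to read used and total memory in MiB. The helper names `parse_memory_csv` and `gpu_memory_report` are hypothetical, chosen only for this illustration.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (MiB) into (used, total, free_fraction) tuples.

    One tuple is returned per GPU line in the input.
    """
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(value) for value in line.split(","))
        rows.append((used, total, 1 - used / total))
    return rows


def gpu_memory_report():
    """Query nvidia-smi for per-GPU memory usage (assumes nvidia-smi is installed)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

In a notebook cell, calling `gpu_memory_report()` returns one tuple per GPU; a free fraction below roughly 0.5 suggests it may be time to release some data from the device.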

## Download Dataset

In [None]:
!wget "https://github.com/jupytercon/2020-zronaghi/blob/master/data/data_pop.csv?raw=true" -O ../data/data_pop.csv
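After the `wget` cell above finishes, it can be worth confirming that the file is a CSV rather than an HTML page, which is a common failure mode when downloading from a GitHub `blob` URL. The sketch below is an optional sanity check under stated assumptions: `peek_csv` is a hypothetical helper, and it makes no assumption about the column names in `data_pop.csv`.

```python
import csv
from pathlib import Path


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows rows) of a CSV file after basic sanity checks."""
    path = Path(path)
    if not path.exists() or path.stat().st_size == 0:
        raise FileNotFoundError(f"{path} is missing or empty")

    with path.open(newline="") as f:
        # An HTML page here usually means the download fetched a web page, not the data.
        start = f.read(256).lstrip().lower()
        if start.startswith("<!doctype html") or start.startswith("<html"):
            raise ValueError(f"{path} looks like HTML, not CSV")

        f.seek(0)
        reader = csv.reader(f)
        header = next(reader)
        # zip stops after n_rows, so the rest of the file is never read.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `peek_csv("../data/data_pop.csv")` would return the header and the first three rows if the download succeeded.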

## Table of Contents

**01_intro.ipynb**: This notebook.

**02_cudf_basics.ipynb**: Begin learning GPU-accelerated dataframe manipulation with cuDF.

**03_cudf_group_sort.ipynb**: Learn more advanced cuDF operations and visualize the population data.

**04_cuml_k-means.ipynb**: Optimize supply depot locations.

**05_xgboost.ipynb**: Estimate probability of infection for population members.

**06_final_exercise.ipynb**: Find dense clusters of infected individuals spread throughout the UK.