![NCAR UCAR Logo](img/NCAR_CISL_NSF_banner.jpeg)
# CuPy and Legate: Moving NumPy, SciPy, and Pandas Code to the GPU

By: Brett Neuman [bneuman@ucar.edu](mailto:bneuman@ucar.edu), Consulting Services Group, CISL & NCAR

Date: July 28th 2022

Head to the [NCAR JupyterHub portal](https://jupyterhub.hpc.ucar.edu/stable), __start a JupyterHub session on Casper login__ (or batch nodes using 1 CPU, no GPUs), and open the `12_PythonGPU.ipynb` notebook. Be sure to clone (if needed) and update/pull the NCAR `GPU_workshop` repository.

```shell
# Use the JupyterHub GitHub GUI on the left panel or the shell commands below
git clone git@github.com:NCAR/GPU_workshop.git   # first time only
cd GPU_workshop
git pull                                         # fetch the latest updates
```

# Workshop Etiquette
* Please mute yourself and turn off video during the session.
* Questions may be submitted in the chat and will be answered when appropriate. You may also raise your hand, unmute, and ask questions during Q&A at the end of the presentation.
* By participating, you are agreeing to [UCAR’s Code of Conduct](https://www.ucar.edu/who-we-are/ethics-integrity/codes-conduct/participants)
* Recordings & other material will be archived & shared publicly.
* Feel free to follow up with the GPU workshop team via Slack or submit support requests to [support.ucar.edu](https://support.ucar.edu)
    * Office Hours: Asynchronous support via [Slack](https://ncargpuusers.slack.com) or schedule a time with an organizer

## Notebook Setup
Set `PROJECT` to a currently active project code, e.g., `UCIS0004` for the GPU workshop, and set `QUEUE` to the appropriate routing queue: `gpuworkshop` during a live workshop session, `gpudev` on weekdays from 8am to 5:30pm MT, or `casper` at all other times. Due to limited shared GPU resources, please use `GPU_TYPE=gp100` during the workshop; otherwise, set `GPU_TYPE=v100` (required for `gpudev`) for independent work. See the [Casper queue documentation](https://arc.ucar.edu/knowledge_base/72581396#StartingCasperjobswithPBS-Concurrentresourcelimits) for more info.

In [None]:
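%%bash
# Exported variables only exist inside this %%bash cell's subprocess
export PROJECT=UCIS0004
export QUEUE=gpuworkshop   # gpudev (weekdays 8am-5:30pm MT) or casper otherwise
export GPU_TYPE=gp100      # v100 for independent work (required for gpudev)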

module load nvhpc/22.2 &> /dev/null
export PNETCDF_INC=/glade/u/apps/dav/opt/pnetcdf/1.12.2/openmpi/4.1.1/nvhpc/22.2/include
export PNETCDF_LIB=/glade/u/apps/dav/opt/pnetcdf/1.12.2/openmpi/4.1.1/nvhpc/22.2/lib
echo $GPU_TYPE

## Python Virtual Environment Setup

See the [Python virtual environment documentation](https://kb.ucar.edu/display/RC/Using+conda+environments+for+Python+access) for using the NCAR Python Library (NPL) or for setting up your own virtual environment.

In [4]:
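%%bash
module load conda/latest

# Required on first run: create the pgpu environment (ipykernel is needed for the kernel install below)
#mamba create -n pgpu python=3.7 numpy scipy cupy matplotlib pandas xarray ipykernel

# Alternative: create the environment from a YAML file instead
#conda env create --file envs/environment.yml

# Activation only applies inside this %%bash cell; the notebook runs in the selected kernel
conda activate pgpu

# Required once so the environment can be selected as a JupyterHub kernel
# python -m ipykernel install --user --name=pgpu

# Export a new virtual environment based on NPL
# conda env export [--from-history] -n npl > npl-environment.yml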
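
Because `conda activate` inside a `%%bash` cell only changes that cell's subprocess, the notebook kernel itself has to be switched to `pgpu` with the kernel selector after the `ipykernel install` step. A small standard-library sketch for checking, from Python, which environment the current kernel runs in and whether CuPy is importable there:

```python
import importlib.util
import sys

# sys.prefix is the root of the environment backing this kernel; with the
# pgpu kernel selected it typically ends in ".../envs/pgpu".
print("Kernel environment:", sys.prefix)

# find_spec returns None when a package cannot be imported from this kernel.
has_cupy = importlib.util.find_spec("cupy") is not None
print("CuPy importable:", has_cupy)
```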

In [5]:
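%%bash
# Optional checks once the kernel is installed:
# jupyter kernelspec list        # confirm "pgpu" appears among the installed kernels
# jupyter kernel -h              # show options for launching a kernel
# jupyter kernel --kernel=pgpu   # start a standalone pgpu kernel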

In [2]:
import cupy as cp

In [3]:
# Stable implementation of log(1 + exp(x))
def softplus(x):
    xp = cp.get_array_module(x)  # 'xp' is a standard usage in the community
    print("Using:", xp.__name__)
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x))) 

print(softplus(10))

Using: numpy
10.000045398899218


# Python GPU Packages and Tools

Tools that let performant Python packages run on the GPU:

* CuPy
* Legate

## From CPU to GPU

How to get your code to run on the GPU

## GPU Principles applied in Python

Memory location, memory management, synchronization, and porting choices are still the user's responsibility.
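
A minimal sketch of those responsibilities in CuPy, assuming a CUDA-capable GPU: data are copied explicitly between host and device with `cp.asarray` and `cp.asnumpy`, and `cp.cuda.Stream.null.synchronize()` waits for asynchronously launched kernels. The NumPy fallback and the function name `gpu_sum_of_squares` are illustrative additions so the snippet also runs on a CPU-only node.

```python
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
except Exception:  # CuPy missing or no usable GPU
    cp = None


def gpu_sum_of_squares(host_array):
    """Copy to the device, compute, synchronize, and copy the result back."""
    if cp is None:
        # CPU-only fallback so the example still runs without a GPU
        return float(np.sum(host_array ** 2))
    d_array = cp.asarray(host_array)      # explicit host -> device copy
    d_result = cp.sum(d_array ** 2)       # kernel launches return immediately
    cp.cuda.Stream.null.synchronize()     # block until queued GPU work finishes
    return float(cp.asnumpy(d_result))    # device -> host copy (also blocking)


x = np.arange(4, dtype=np.float64)
print(gpu_sum_of_squares(x))  # 0 + 1 + 4 + 9 = 14.0
```

Here the explicit `synchronize()` is redundant because `cp.asnumpy` already waits for the copy to finish, but it matters when timing GPU code with host-side clocks.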

# CuPy

## Overview

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. CuPy acts as a drop-in replacement to run existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms.
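
To illustrate the drop-in claim, the sketch below runs the same NumPy calls twice, once with `np` and once with `cp`, and checks that the results agree after copying the GPU result back to the host. It assumes CuPy and a GPU are available and otherwise runs only the NumPy half.

```python
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
except Exception:  # CuPy missing or no usable GPU: run the NumPy half only
    cp = None

# NumPy version, executed on the CPU
a = np.arange(6, dtype=np.float32).reshape(2, 3)
col_means = a.mean(axis=0)

# CuPy version: the same calls with `cp` in place of `np`, executed on the GPU
if cp is not None:
    a_gpu = cp.arange(6, dtype=cp.float32).reshape(2, 3)
    col_means_gpu = a_gpu.mean(axis=0)
    # Results live in device memory; copy back to the host before comparing
    assert np.allclose(cp.asnumpy(col_means_gpu), col_means)

print(col_means)  # [1.5 2.5 3.5]
```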

CuPy provides an `ndarray`, sparse matrices, and the associated routines for GPU devices, all with the same API as NumPy and SciPy.
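
CuPy's SciPy-compatible sparse module is `cupyx.scipy.sparse`. A minimal sketch, assuming CuPy and SciPy are installed, that builds a CSR matrix on the host with SciPy, copies it to the GPU, and performs a sparse matrix-vector product; without a GPU it falls back to `scipy.sparse` so the same lines still run.

```python
import numpy as np
import scipy.sparse

try:
    import cupy as cp
    import cupyx.scipy.sparse as cpx_sparse
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
    xp, sparse = cp, cpx_sparse
except Exception:  # CuPy missing or no usable GPU
    cp = None
    xp, sparse = np, scipy.sparse

# A small diagonal matrix assembled on the host with SciPy
host_matrix = scipy.sparse.csr_matrix(np.array([[2.0, 0.0, 0.0],
                                                [0.0, 3.0, 0.0],
                                                [0.0, 0.0, 4.0]]))

# cupyx.scipy.sparse.csr_matrix accepts a SciPy sparse matrix and copies it
# to the GPU; scipy.sparse.csr_matrix simply makes a host-side copy.
A = sparse.csr_matrix(host_matrix)
v = xp.ones(3)

y = A @ v  # sparse matrix-vector product on whichever device holds A
y_host = cp.asnumpy(y) if cp is not None else y
print(y_host)  # [2. 3. 4.]
```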

## Setup

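There are two porting options:

* Drop-in replacement: swap NumPy calls for their CuPy equivalents (`np.` becomes `cp.`).
* CPU/GPU-agnostic code: write functions against a generic array module `xp`, either by binding it at import time with `import cupy as xp` or by selecting it at run time with `cp.get_array_module`, as in the `softplus` example below.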
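
A sketch of the `import cupy as xp` style: the module is chosen once at import time, with a fallback to NumPy (an assumption, for machines without CuPy or a GPU), and the rest of the code only refers to `xp`. The `normalize` function is a hypothetical example.

```python
try:
    import cupy as xp  # GPU path
    if xp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
except Exception:  # CuPy missing or no usable GPU
    import numpy as xp  # CPU fallback with the same API


def normalize(v):
    """Scale a vector to unit length with whichever module `xp` is bound to."""
    return v / xp.linalg.norm(v)


v = xp.asarray([3.0, 4.0])
unit = normalize(v)

# numpy.asarray() refuses CuPy arrays, so copy back explicitly on the GPU path
unit_host = xp.asnumpy(unit) if xp.__name__ == "cupy" else unit
print(unit_host)  # [0.6 0.8]
```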

In [None]:
import cupy as cp

In [None]:
# Stable implementation of log(1 + exp(x))
def softplus(x):
    xp = cp.get_array_module(x)  # 'xp' is the community convention for "NumPy or CuPy"
    print("Using:", xp.__name__)
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x))) 


print(softplus(10))
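
Passing the Python scalar `10` makes `get_array_module` return NumPy, as the printed module name shows. A sketch of the GPU path, passing a CuPy array to the same softplus; the definition is repeated (without the print) so the snippet is self-contained, and the NumPy-only fallback is an added assumption for CPU-only machines.

```python
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
    get_array_module = cp.get_array_module
except Exception:  # CuPy missing or no usable GPU
    cp = None

    def get_array_module(*args):
        """CPU-only fallback: every input is treated as a NumPy array."""
        return np


def softplus(x):
    """Stable log(1 + exp(x)) on NumPy or CuPy input."""
    xp = get_array_module(x)
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))


x_host = np.array([-10.0, 0.0, 10.0])
y_host = softplus(x_host)              # NumPy path, runs on the CPU

if cp is not None:
    x_dev = cp.asarray(x_host)         # copy the input to GPU memory
    y_dev = softplus(x_dev)            # same function, now dispatched to CuPy
    assert np.allclose(cp.asnumpy(y_dev), y_host)

print(y_host)
```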