# Notebook 2: High-level Xarray Functions: CuPy vs. NumPy

Negin Sobhani, Deepak Cherian, and Max Jones  
negins@ucar.edu, dcherian@ucar.edu, and max@carbonplan.org

------------

## Introduction 

In the previous tutorial, we introduced the powerful combination of Xarray and CuPy for handling multi-dimensional datasets and leveraging GPU acceleration to significantly improve performance. 

In this tutorial, we explore high-level Xarray functions such as groupby, rolling mean, and weighted mean, and compare their execution times on CuPy-backed arrays with those of traditional NumPy-based implementations.

### High-level Xarray Functions: CuPy vs. NumPy

We'll compare how high-level Xarray functions perform on CuPy-backed and NumPy-backed arrays. CuPy is a NumPy-compatible array library that runs on the GPU, while NumPy is the well-known CPU-based library for numerical computations. We'll focus on three high-level functions (groupby, rolling mean, and weighted mean) and time each one with both backends.

Let's create some sample data to work with.

We'll use a 3-dimensional dataset (time, latitude, longitude) with random values:

In [1]:
import time

In [2]:
import numpy as np 
import xarray as xr

In [3]:
import cupy as cp
import cupy_xarray  # Adds .cupy to Xarray objects

In [4]:
np.random.seed(0)
data_np = np.random.rand(1000, 180, 360)
data_cp = cp.array(data_np)
data_xr_np = xr.DataArray(data_np, dims=['time', 'lat', 'lon'])
data_xr_cp = xr.DataArray(data_cp, dims=['time', 'lat', 'lon'])

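### Groupby

The groupby function splits data into groups based on the values of a coordinate or dimension. Here, we'll group our data along the 'time' dimension using both CuPy and NumPy. Note that these cells time only the construction of the GroupBy object; no reduction is applied to the groups: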

In [5]:
start_time_np = time.time()

grouped_data_np = data_xr_np.groupby('time')

end_time_np = time.time()
time_np = end_time_np - start_time_np

In [6]:
start_time_cp = time.time()

grouped_data_cp = data_xr_cp.groupby('time')

end_time_cp = time.time()
time_cp = end_time_cp - start_time_cp

print("GroupBy with Xarray DataArrays using CuPy provides a", round(time_np / time_cp,2), "x speedup over NumPy.\n")

GroupBy with Xarray DataArrays using CuPy provides a 7.96 x speedup over NumPy.
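
One caveat about the timing pattern used in these cells: CuPy launches GPU kernels asynchronously, so a wall-clock timer can stop before the device has actually finished its work. The sketch below shows one possible way to guard against that. The helper name `time_call` and its `sync` and `repeats` parameters are hypothetical (they are not part of Xarray or CuPy); on a GPU one could pass `cp.cuda.Stream.null.synchronize` as `sync`, which is an assumption about how you would wire it up rather than what the cells above do.

```python
import time


def time_call(func, sync=None, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` calls of func().

    `sync` is an optional zero-argument callable that blocks until the
    device is idle (hypothetical hook; e.g. cp.cuda.Stream.null.synchronize).
    """
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain any work queued before the timer starts
        start = time.perf_counter()
        func()
        if sync is not None:
            sync()  # wait for queued work to finish before stopping the timer
        best = min(best, time.perf_counter() - start)
    return best


# CPU-only usage: no synchronization is needed.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.6f} s")
```

Taking the best of several repeats also reduces the influence of one-off costs such as first-call kernel compilation, which can otherwise dominate a single measurement.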



### Rolling Mean

The rolling mean is a widely used technique for smoothing data over a specified window. We'll calculate the rolling mean along the 'time' dimension with a window size of 10:

In [7]:
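# Disable bottleneck (which operates on NumPy arrays only) so that both
# backends go through the same Xarray rolling implementation
xr.set_options(use_bottleneck=False)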

<xarray.core.options.set_options at 0x2b72e8d90fa0>

In [8]:
start_time_np = time.time()

rolling_mean_np = data_xr_np.rolling(time=10).mean()

end_time_np = time.time()
time_np = end_time_np - start_time_np


In [9]:
start_time_cp = time.time()

rolling_mean_cp = data_xr_cp.rolling(time=10).mean()

end_time_cp = time.time()
time_cp = end_time_cp - start_time_cp

In [10]:
print("Rolling mean with Xarray DataArrays using CuPy provides a", round(time_np / time_cp,2), "x speedup over NumPy.\n")

Rolling mean with Xarray DataArrays using CuPy provides a 17.9 x speedup over NumPy.
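
To make explicit what `rolling(time=10).mean()` computes, here is a minimal NumPy-only sketch of the same operation. It assumes Xarray's default rolling behaviour (a trailing window, with NaN wherever the window is not yet full). The names `rolling_mean_time` and `small` are illustrative only, and this is not how Xarray implements rolling internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_mean_time(arr, window):
    """Trailing rolling mean along axis 0, NaN-padded for incomplete windows."""
    # windows has shape (time - window + 1, lat, lon, window)
    windows = sliding_window_view(arr, window_shape=window, axis=0)
    valid = windows.mean(axis=-1)
    # The first (window - 1) time steps have no complete window
    pad = np.full((window - 1,) + arr.shape[1:], np.nan)
    return np.concatenate([pad, valid], axis=0)


rng = np.random.default_rng(0)
small = rng.random((20, 3, 4))  # small (time, lat, lon) array for illustration
result = rolling_mean_time(small, 10)
print(result.shape)
```

The output keeps the input's shape: entry `i` along the first axis is the mean of time steps `i - 9` through `i`, and the first nine entries are NaN.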



### Weighted Mean

The weighted mean averages data while taking into account the varying importance of each data point. Here, we'll use uniform weights along the 'time' dimension:



In [None]:
start_time_np = time.time()

# One weight per time step: a 1-D array matching the single 'time' dimension
weights_np = xr.DataArray(np.ones(data_np.shape[0]), dims="time")
weighted_mean_np = data_xr_np.weighted(weights_np).mean(dim='time')

end_time_np = time.time()
time_np = end_time_np - start_time_np

In [None]:
start_time_cp = time.time()

# One weight per time step: a 1-D array matching the single 'time' dimension
weights_cp = xr.DataArray(cp.ones(data_cp.shape[0]), dims="time")
weighted_mean_cp = data_xr_cp.weighted(weights_cp).mean(dim='time')

end_time_cp = time.time()
time_cp = end_time_cp - start_time_cp
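
For reference, a weighted mean along time is $\sum_t w_t x_t / \sum_t w_t$, so with uniform weights it reduces to the ordinary mean. The NumPy-only sketch below illustrates that formula on a small array. The function name `weighted_mean_time` and the sample arrays are hypothetical, and the sketch ignores missing values, which Xarray's `weighted` handles separately.

```python
import numpy as np


def weighted_mean_time(arr, weights):
    """Weighted mean over axis 0 of a (time, lat, lon) array.

    `weights` is 1-D with one entry per time step.
    """
    w = weights.reshape(-1, 1, 1)  # broadcast weights over lat and lon
    return (arr * w).sum(axis=0) / w.sum()


rng = np.random.default_rng(0)
small = rng.random((5, 2, 3))

# Uniform weights: identical to the plain mean over time
uniform = weighted_mean_time(small, np.ones(5))

# Non-uniform weights: only the last time step contributes
last_only = weighted_mean_time(small, np.array([0.0, 0.0, 0.0, 0.0, 2.0]))
```

With non-uniform weights, time steps with larger weights pull the result toward their values; a weight of zero removes a time step from the average entirely.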

## Apply custom kernels with apply_ufunc

## Plotting (possibly in the other notebook). 