# Fundamental Xarray + CuPy

Negin Sobhani, Deepak Cherian, and Max Jones  
negins@ucar.edu, dcherian@ucar.edu, and max@carbonplan.org

------------

## Introduction 

Xarray is a powerful library for working with labeled multi-dimensional arrays in Python. It provides a convenient and intuitive way to manipulate large and complex datasets, and is built on top of NumPy. CuPy, on the other hand, is a library that allows for efficient numerical computations on NVIDIA GPUs. 


When used together, Xarray and CuPy make it easy to take advantage of GPU acceleration for scientific computing tasks. The `cupy-xarray` package connects the two: importing it adds a `.cupy` accessor with convenience methods to Xarray objects.

In this tutorial, we'll explore how to use these two libraries together to perform high-performance computations on large datasets.

### Using CuPy with Xarray



In [5]:
# Import NumPy, CuPy, and Xarray
import cupy as cp
import numpy as np
import xarray as xr
import cupy_xarray # Adds .cupy to Xarray objects

#### Creating Xarray DataArray with CuPy

In [6]:
# create a NumPy-backed (CPU) DataArray with three dimensions and 100 elements along each dimension
da_np = xr.DataArray(np.random.rand(100, 100, 100), dims=['x', 'y', 'z'])

# create a CuPy-backed (GPU) DataArray with the same shape and dimension names
da_cp = xr.DataArray(cp.random.rand(100, 100, 100), dims=['x', 'y', 'z'])

Check whether each array is backed by CuPy, that is, stored on the GPU:

In [7]:
da_np.cupy.is_cupy

False

In [8]:
da_cp.cupy.is_cupy

True

In [9]:
da_cp.data.device

<CUDA Device 0>
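
Moving data between host and device is a common next step. The sketch below is an illustration under stated assumptions, not the cupy-xarray API: `to_device` and `to_host` are hypothetical helpers built only on `cupy.asarray`, `cupy.asnumpy`, and `DataArray.copy(data=...)`. The block falls back to NumPy when no usable GPU is found, so it should still run on a CPU-only machine.

```python
import numpy as np
import xarray as xr

try:
    import cupy as cp
    cp.zeros(1)  # touching the device fails early if no usable GPU is present
    HAS_GPU = True
except Exception:
    cp = None
    HAS_GPU = False


def to_device(da):
    """Hypothetical helper: return a CuPy-backed copy of `da` (no-op without a GPU)."""
    if not HAS_GPU:
        return da
    return da.copy(data=cp.asarray(da.data))


def to_host(da):
    """Hypothetical helper: return a NumPy-backed copy of `da`."""
    if HAS_GPU and isinstance(da.data, cp.ndarray):
        return da.copy(data=cp.asnumpy(da.data))
    return da


da_host = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=["x", "y"])
da_dev = to_device(da_host)   # data lives on the GPU when one is available
da_back = to_host(da_dev)     # data is always a NumPy array again

print(type(da_dev.data).__module__, type(da_back.data).__module__)
```

Dimension names and coordinates survive the round trip because only the underlying data buffer is swapped. The `.cupy` accessor is intended to offer similar conversions; its documentation is the place to check for the exact method names.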

### Basic Operations with Xarray and CuPy

Once we have created a DataArray using CuPy, we can perform various operations on it using the familiar Xarray syntax. For example:

In [10]:
%%time
# calculate the mean along the x dimension
mean_da = da_cp.mean(dim='x')

CPU times: user 196 ms, sys: 7.2 ms, total: 203 ms
Wall time: 276 ms


In [14]:
print(type(mean_da))

<class 'xarray.core.dataarray.DataArray'>


In [11]:
%%time 
# calculate the standard deviation along the y and z dimensions
std_da = da_cp.std(dim=['y', 'z'])

CPU times: user 750 ms, sys: 13.6 ms, total: 763 ms
Wall time: 812 ms


In [13]:
print(type(std_da))

<class 'xarray.core.dataarray.DataArray'>


<div class="alert alert-block alert-success">
<b> Most Xarray operations preserve the underlying array type, so results computed from CuPy-backed data remain CuPy-backed. </b>
</div>
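
Since `type(mean_da)` is `DataArray` regardless of where the data lives, one way to check this claim is to inspect the wrapped array through `.data`. The following is a minimal sketch; the `xp` alias is an assumption introduced here so that the block falls back to NumPy when CuPy or a GPU is unavailable.

```python
import numpy as np
import xarray as xr

try:
    import cupy as cp
    cp.zeros(1)  # fails early if no usable GPU is present
    xp = cp
except Exception:
    xp = np  # CPU fallback so the sketch still runs

da = xr.DataArray(xp.random.rand(4, 5, 6), dims=["x", "y", "z"])
mean_da = da.mean(dim="x")
std_da = da.std(dim=["y", "z"])

# .data exposes the wrapped array without converting it
print(type(da.data))
print(type(mean_da.data))
print(type(std_da.data))
```

On a GPU machine all three lines are expected to report a CuPy array type; with the fallback they report `numpy.ndarray`.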


## Comparing Performance: CuPy with Xarray vs NumPy with Xarray
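To compare the performance of CuPy-backed and NumPy-backed Xarray objects, let's time `DataArray.dot` on square arrays with both libraries. Note that because both operands share the dimensions `x` and `y`, `dot` contracts over both of them and returns a scalar (the sum of the squared elements) rather than a matrix product.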

In [15]:
import time
# create two 1000x1000 DataArrays
n = 1000
da_np = xr.DataArray(np.random.rand(n, n), dims=['x', 'y'])
da_cp = xr.DataArray(cp.random.rand(n, n), dims=['x', 'y'])

# perform the dot product (contracting over x and y) with Xarray and NumPy
start_time = time.time()
result_np = da_np.dot(da_np)
end_time = time.time()
numpy_time = end_time - start_time

# perform the dot product (contracting over x and y) with Xarray and CuPy
start_time = time.time()
result_cp = da_cp.dot(da_cp)
cp.cuda.Stream.null.synchronize()  # wait for GPU computation to finish
end_time = time.time()
cupy_time = end_time - start_time

# calculate the speedup value with two decimal places
speedup = round(numpy_time / cupy_time, 2)

# print the speedup
print(f"CuPy with Xarray provides a {speedup:.2f}x speedup over NumPy with Xarray.")


CuPy with Xarray provides a 0.04x speedup over NumPy with Xarray.
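
What `dot` computes depends on dimension names, not array shapes. The sketch below uses NumPy only and hypothetical variable names (`left`, `right`) to contrast the scalar contraction obtained when both operands have dimensions `x` and `y` with a true matrix product, which needs exactly one shared dimension.

```python
import numpy as np
import xarray as xr

m = np.arange(4.0).reshape(2, 2)  # [[0, 1], [2, 3]]

# Same dimension names on both operands: dot sums over x AND y
left = xr.DataArray(m, dims=["x", "y"])
contracted = left.dot(left)
print(contracted.ndim, float(contracted))  # 0-d result, sum of squares = 14.0

# Only "y" is shared: dot sums over y, which is a matrix product
right = xr.DataArray(m, dims=["y", "z"])
matmul = left.dot(right)
print(matmul.dims)    # ('x', 'z')
print(matmul.values)  # same as m @ m
```

Benchmarking a true matrix product would therefore mean giving the second operand dimensions such as `['y', 'z']`; the timings in this notebook were recorded for the scalar contraction.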


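A speedup below 1 means the GPU run was slower in this first test. A likely cause is that the first CuPy call includes one-time costs such as CUDA context creation and kernel compilation. Now, let's repeat the comparison with other array sizes: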

In [16]:
for n in [10, 100, 1000, 5000, 10000, 25000]:
    print("n =", n)

    da_np = xr.DataArray(np.random.rand(n, n), dims=['x', 'y'])
    da_cp = xr.DataArray(cp.random.rand(n, n), dims=['x', 'y'])

    # perform the dot product (contracting over x and y) with Xarray and NumPy
    start_time = time.time()
    result_np = da_np.dot(da_np)
    end_time = time.time()
    numpy_time = end_time - start_time

    # perform the dot product (contracting over x and y) with Xarray and CuPy
    start_time = time.time()
    result_cp = da_cp.dot(da_cp)
    cp.cuda.Stream.null.synchronize()  # wait for GPU computation to finish
    end_time = time.time()
    cupy_time = end_time - start_time

    print("Xarray DataArrays using CuPy provides a", round(numpy_time / cupy_time,2), "x speedup over NumPy.\n")

n = 10
Xarray DataArrays using CuPy provides a 0.7 x speedup over NumPy.

n = 100
Xarray DataArrays using CuPy provides a 0.55 x speedup over NumPy.

n = 1000
Xarray DataArrays using CuPy provides a 2.04 x speedup over NumPy.

n = 5000
Xarray DataArrays using CuPy provides a 18.64 x speedup over NumPy.

n = 10000
Xarray DataArrays using CuPy provides a 33.34 x speedup over NumPy.

n = 25000
Xarray DataArrays using CuPy provides a 53.72 x speedup over NumPy.
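
Single-shot timings like those above can be noisy, and the first GPU call may include one-time setup costs. A more robust pattern, sketched here under stated assumptions, is to run one untimed warm-up call and then keep the best of several timed repeats. `best_time` and `sync` are hypothetical helpers, not part of CuPy or Xarray, and the GPU branch is skipped when CuPy or a GPU is unavailable.

```python
import time

import numpy as np
import xarray as xr

try:
    import cupy as cp
    cp.zeros(1)  # fails early if no usable GPU is present
except Exception:
    cp = None


def sync():
    """Wait for queued GPU work to finish (no-op on CPU-only machines)."""
    if cp is not None:
        cp.cuda.Stream.null.synchronize()


def best_time(func, repeats=3):
    """Hypothetical helper: warm up once, then return the fastest of `repeats` runs."""
    func()
    sync()  # the warm-up absorbs one-time costs such as kernel compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        sync()
        timings.append(time.perf_counter() - start)
    return min(timings)


n = 200
da_np = xr.DataArray(np.random.rand(n, n), dims=["x", "y"])
numpy_time = best_time(lambda: da_np.dot(da_np))
print(f"NumPy: {numpy_time:.6f} s")

if cp is not None:
    da_cp = xr.DataArray(cp.random.rand(n, n), dims=["x", "y"])
    cupy_time = best_time(lambda: da_cp.dot(da_cp))
    print(f"CuPy:  {cupy_time:.6f} s, speedup {numpy_time / cupy_time:.2f}x")
```

`time.perf_counter` is used instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring short intervals.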



A natural next step is to plot the measured speedup against the array size `n`.
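
The speedups printed by the benchmark loop can be visualized with a short Matplotlib sketch. The numbers below are copied by hand from the output above rather than re-measured, Matplotlib is an assumed extra dependency, and the log-log axes are a presentation choice made here.

```python
import matplotlib.pyplot as plt

# Values copied from the benchmark output above
sizes = [10, 100, 1000, 5000, 10000, 25000]
speedups = [0.7, 0.55, 2.04, 18.64, 33.34, 53.72]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(sizes, speedups, marker="o", label="CuPy vs NumPy")
ax.axhline(1.0, color="gray", linestyle="--", label="break-even")
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("n (arrays are n x n)")
ax.set_ylabel("speedup (NumPy time / CuPy time)")
ax.set_title("Xarray dot: CuPy speedup over NumPy")
ax.legend()
fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes. Points below the dashed line are sizes where the GPU was slower than the CPU, which in these measurements happens for `n` of 100 and smaller.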
