# Parallelization Testing

In this notebook, I will learn how to use dask within xarray to parallelize code and speed up parts of the Argo analysis. I'll start with a simple test case (ideally one from xarray's documentation). If that works, I will move on to the depth-to-density interpolation function to see whether it speeds up too.

In [1]:
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.path import Path
import seaborn as sns
import seaborn
import pandas as pd
import numpy as np
from importlib import reload
import cartopy.crs as ccrs
import cmocean.cm as cmo
import gsw
import dask.array as da

In [2]:
import os
os.chdir('/home/jovyan/argo-intern/funcs')
import density_funcs as df
import EV_funcs as ef
import filt_funcs as ff
import plot_funcs as pf
import processing_funcs as prf

In [3]:
reload(df)
reload(ef)
reload(ff)
reload(prf)

<module 'processing_funcs' from '/home/jovyan/argo-intern/funcs/processing_funcs.py'>

# Reproducible Test

The goal here is to make a really big array and then compare loading it with dask against loading it without dask. I'm following the rough steps Stephan Hoyer outlines in this blog post (https://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/), including creating a dataset with the same dimensions as the one used there:

Dimensions: (latitude: 256, longitude: 512, time: 52596)

In [4]:
factor = 1

lat, lon, time = 256*factor, 512*factor, 52596*factor
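With `factor = 1`, these dimensions describe a float64 array far larger than typical RAM. The sizes dask reports for it can be checked by hand; a minimal sketch in plain Python, assuming 8-byte float64 values and 100 time steps per chunk as in the `da.random.random` call (the variable names here are illustrative, chosen to avoid shadowing `time`):

```python
import math

# Dimensions from the cell above with factor = 1
time_len, n_lat, n_lon = 52596, 256, 512
chunk_time = 100   # time steps per chunk, as in the da.random.random call
itemsize = 8       # bytes per float64 value

total_gib = time_len * n_lat * n_lon * itemsize / 2**30
chunk_mib = chunk_time * n_lat * n_lon * itemsize / 2**20
n_chunks = math.ceil(time_len / chunk_time)

print(f"{total_gib:.2f} GiB total, {chunk_mib:.2f} MiB per chunk, {n_chunks} chunks")
# -> 51.36 GiB total, 100.00 MiB per chunk, 526 chunks
```

These numbers match the dask summary printed for `ds` below.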

In [8]:
# Lazy random array: 100 time steps x full lat/lon grid per chunk (~100 MiB of float64)
data = da.random.random((time, lat, lon), chunks=(100, 256, 512))

In [9]:
ds = xr.Dataset(
    {
        "data": (["time", "latitude", "longitude"], data)
    },
    coords={
        "time": np.arange(time),
        "latitude": np.linspace(-90, 90, lat),
        "longitude": np.linspace(-180, 180, lon)
    }
)

In [10]:
ds

| | Array | Chunk |
|---|---|---|
| Bytes | 51.36 GiB | 100.00 MiB |
| Shape | (52596, 256, 512) | (100, 256, 512) |
| Dask graph | 526 chunks in 1 graph layer | |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |

In [None]:
%time result = ds.mean('time').compute()

This isn't promising: the CPU time is very close to the wall time. Maybe a much bigger array will make a difference, so I'll try again with a larger array. If that doesn't work either, I will need a different way to create the array, one that doesn't involve dask, since building it with dask muddies the comparison with not using dask.

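Okay, this definitely isn't working. When work is spread across several cores, the total CPU time that `%time` reports (summed over all threads of the process) should exceed the wall time; wall time greater than CPU time means this is not running in parallel. What else can we try?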
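One way to test this CPU-versus-wall-time reasoning directly is to run the same dask computation under dask's single-threaded `"synchronous"` scheduler and under its `"threads"` scheduler, then compare the ratio of process CPU time to wall time. A minimal sketch, assuming a small synthetic array instead of the 51 GiB one; `cpu_wall_ratio` and `job` are hypothetical helpers, not part of dask or xarray. A ratio near 1 under both schedulers would suggest only one core is doing the work.

```python
import time

import dask
import dask.array as da


def cpu_wall_ratio(func):
    """Hypothetical helper: run func once and return (wall s, CPU s, CPU/wall)."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start  # CPU time of all threads in this process
    return wall, cpu, cpu / wall


# Small synthetic array: 16 chunks of 1000 x 1000 float64 values (~8 MB each)
x = da.random.random((4000, 4000), chunks=(1000, 1000))


def job():
    # Element-wise NumPy work; ufuncs such as sin generally release the GIL,
    # so the threaded scheduler can overlap chunks on several cores
    return da.sin(x).sum().compute()


results = {}
for scheduler in ["synchronous", "threads"]:
    with dask.config.set(scheduler=scheduler):
        results[scheduler] = cpu_wall_ratio(job)
    wall, cpu, ratio = results[scheduler]
    print(f"{scheduler:>11}: wall {wall:.2f} s, CPU {cpu:.2f} s, CPU/wall {ratio:.2f}")
```

If the `"threads"` run shows a CPU/wall ratio well above 1 while the `"synchronous"` run stays near 1, dask is able to use several cores for this kind of work.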

In [68]:
natl = xr.open_dataset('/swot/SUM05/amf2288/sync-boxes/lon:(-25,-20)_lat:(-70,70)_ds_z.nc')
datl = xr.open_dataset('/swot/SUM05/amf2288/sync-boxes/lon:(-25,-20)_lat:(-70,70)_ds_z.nc').chunk({'N_PROF':1000})

In [73]:
%time float(natl.CT.mean())

CPU times: user 43.3 ms, sys: 18.1 ms, total: 61.4 ms
Wall time: 57.9 ms


6.800860854562184

In [74]:
%time float(datl.CT.mean())

CPU times: user 85.5 ms, sys: 66.1 ms, total: 152 ms
Wall time: 63.5 ms


6.800860854562181

In [69]:
%time natl.CT.groupby('LATITUDE').mean();

CPU times: user 6.36 s, sys: 228 ms, total: 6.59 s
Wall time: 6.59 s


In [76]:
%time datl.CT.groupby('LATITUDE').mean();

CPU times: user 17.1 s, sys: 96.9 ms, total: 17.2 s
Wall time: 17.2 s
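One caveat when reading these timings: xarray operations on dask-backed data are lazy, so `datl.CT.groupby('LATITUDE').mean()` only builds a task graph unless `.compute()` (or `.load()`) is called. The 17.2 s above therefore likely measures setting up the computation rather than the mean itself. A minimal sketch that times the two steps separately, assuming a small synthetic stand-in for the Argo box (random `CT` values and whole-degree latitudes, not real data):

```python
import time

import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for the Argo box (made-up values, not real data):
# N_PROF profiles x N_LEVELS depth levels, latitudes rounded to whole degrees
rng = np.random.default_rng(0)
n_prof, n_levels = 5000, 50
ct = rng.normal(7.0, 2.0, (n_prof, n_levels))
lat = rng.integers(-70, 71, n_prof).astype(float)

ds_np = xr.Dataset(
    {"CT": (("N_PROF", "N_LEVELS"), ct)},
    coords={"LATITUDE": ("N_PROF", lat)},
)
# Same data, but CT is a dask array split into 1000-profile chunks;
# LATITUDE stays an in-memory NumPy coordinate
ds_dask = xr.Dataset(
    {"CT": (("N_PROF", "N_LEVELS"), da.from_array(ct, chunks=(1000, n_levels)))},
    coords={"LATITUDE": ("N_PROF", lat)},
)

t0 = time.perf_counter()
lazy = ds_dask.CT.groupby("LATITUDE").mean(dim="N_PROF")  # builds a task graph only
t1 = time.perf_counter()
result = lazy.compute()  # executes the graph and returns NumPy-backed data
t2 = time.perf_counter()

print(f"build graph: {t1 - t0:.3f} s, compute: {t2 - t1:.3f} s")
print(type(lazy.data).__name__, "->", type(result.data).__name__)
```

In the notebook, the fairer comparison with the NumPy-backed `natl` run would be something like `%time datl.CT.groupby('LATITUDE').mean().compute()`.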


Okay, something is not working as expected: the dataset loaded with dask takes longer than the one loaded without it. A few thoughts:
- The chunks may be too small, so the overhead added for each chunk overwhelms any advantage of running in parallel.
- Maybe it's not using multiple cores at all: the CPU time is about the same as the wall time, which isn't a good sign.
- Maybe this calculation just isn't expensive enough for dask to make a difference.
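For the first bullet, it helps to look at how many chunks a variable is split into and how large each one is. A minimal sketch using only dask array attributes (`npartitions`, `chunksize`, `dtype`, `nbytes`); `describe_chunks` is a hypothetical helper, and the array shape is made up rather than the real size of the Argo box. For comparison, `chunks="auto"` lets dask choose chunk sizes from its `array.chunk-size` setting (128 MiB by default).

```python
import numpy as np
import dask.array as da


def describe_chunks(arr: da.Array) -> dict:
    """Hypothetical helper: summarize how a dask array is split into chunks."""
    largest_chunk_bytes = int(np.prod(arr.chunksize)) * arr.dtype.itemsize
    return {
        "n_chunks": arr.npartitions,
        "largest_chunk_MiB": round(largest_chunk_bytes / 2**20, 2),
        "total_MiB": round(arr.nbytes / 2**20, 2),
    }


# Made-up Argo-like shape: N_PROF profiles x N_LEVELS depth levels (float64)
n_prof, n_levels = 100_000, 400
fixed = da.zeros((n_prof, n_levels), chunks=(1000, n_levels))  # like .chunk({'N_PROF': 1000})
auto = da.zeros((n_prof, n_levels), chunks="auto")              # dask picks the chunk size

print("N_PROF=1000:", describe_chunks(fixed))
print("auto       :", describe_chunks(auto))
```

On the notebook's data the equivalent call would be `describe_chunks(datl.CT.data)`, since `.data` on a chunked xarray variable is the underlying dask array.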

The first thing to look into is definitely the second bullet point: if the processes aren't running on multiple cores, nothing else will help either.

Okay, I checked the CPU monitor at http://gyre.ldeo.columbia.edu:19999/#menu_users_submenu_cpu;theme=slate;help=true, and the natl and datl runs both sat right at (or slightly over) 100%, i.e. about one core's worth. So I don't think anything is being parallelized. What should I try next?
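Before trying anything else, it may be worth checking how many cores this process is actually allowed to use, since on shared JupyterHub-style servers the allocation can be smaller than the machine's total core count. A minimal sketch using standard-library and dask calls; `core_report` is a hypothetical helper. `os.sched_getaffinity` is not available on every platform and does not reflect cgroup CPU quotas, whereas `dask.system.CPU_COUNT` is the count dask uses to size its default thread pool and tries to account for affinity and cgroup limits.

```python
import os

import dask
import dask.system


def core_report() -> dict:
    """Hypothetical helper: summarize how many CPU cores this process can use."""
    report = {
        "machine cores (os.cpu_count)": os.cpu_count(),
        # dask sizes its default thread pool from this value; it takes
        # CPU affinity and cgroup limits into account where it can detect them
        "cores dask will use (dask.system.CPU_COUNT)": dask.system.CPU_COUNT,
        # None means no override, i.e. dask's default (threads for dask arrays)
        "scheduler override": dask.config.get("scheduler", None),
    }
    if hasattr(os, "sched_getaffinity"):  # not available on every platform
        report["cores allowed by affinity"] = len(os.sched_getaffinity(0))
    return report


for name, value in core_report().items():
    print(f"{name}: {value}")
```

If the dask count comes back as 1, the threaded scheduler has only one worker thread, which would be consistent with the monitor showing about 100%.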