This notebook contains code blocks for timing data access to the geopolar dataset. The code has been run from multiple compute locations. The spreadsheet of timing results is in Google Drive.

In [1]:
import xarray as xr
import dask.array as da
import zarr

# Summary Information

## AOI for Timing

AOI_A) 1 degree x 1 degree (`geopolar.sel(lat=slice(40, 41), lon=slice(-70, -69))`)
* 2853600 (2.85 million) `.size`, 11.4MB `.bytes`

AOI_B) 2 degree x 2 degree (`geopolar.sel(lat=slice(40, 42), lon=slice(-70, -68))`)
* `.size`, `.bytes`
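The `.size` and `.bytes` figures follow directly from the array shape. Here is a minimal sketch of that arithmetic, assuming 7134 time steps, a 0.05 degree grid (20 cells per degree, as implied by a 3600 x 7200 global grid), and 4-byte float32 values. The helper names are hypothetical, and the AOI_B numbers it produces are estimates under those assumptions, not recorded values.

```python
# Assumptions (not measured here): 7134 time steps, 20 grid cells per degree,
# float32 values (4 bytes each).
TIME_STEPS = 7134
CELLS_PER_DEGREE = 20
BYTES_PER_VALUE = 4


def aoi_size(lat_degrees, lon_degrees):
    """Number of array elements in an AOI spanning the given extent."""
    n_lat = lat_degrees * CELLS_PER_DEGREE
    n_lon = lon_degrees * CELLS_PER_DEGREE
    return TIME_STEPS * n_lat * n_lon


def aoi_bytes(lat_degrees, lon_degrees):
    """Uncompressed in-memory size of the AOI in bytes."""
    return aoi_size(lat_degrees, lon_degrees) * BYTES_PER_VALUE


print(aoi_size(1, 1), aoi_bytes(1, 1) / 1e6)  # AOI_A: 2853600 elements, ~11.4 MB
print(aoi_size(2, 2), aoi_bytes(2, 2) / 1e6)  # AOI_B estimate: 4x the AOI_A values
```

Because `.sel` with a slice includes both endpoints when they match a coordinate label, the real `.size` may differ slightly; checking `geopolar.size` and `geopolar.nbytes` on the subset is the authoritative answer.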


## Results
AOI_A)
* loading data (`.values`): 12 min 50s (:gasp:)
* calculation (loop): 44s
* total: 13 min 34s

AOI_B)
* loading data (`.values`): 16 min 15s (2 min 30 sec CPU time)
* calculation (loop): not timed


## Projection
Linear estimate for the full area (~500 square degrees), scaled from AOI_A:
* loading data (`.values`): 6417 minutes = 4.5 DAYS. DAYS.
* calculation (loop): 6 hours 7 minutes (44 s x 500)

Goodness.
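The linear projection can be reproduced with a few lines of arithmetic. This is a sketch that assumes time scales in proportion to area, starting from the AOI_A timings (1 square degree) and the ~500 square degree full area; the function name is hypothetical.

```python
def project_seconds(measured_seconds, measured_area_deg2, target_area_deg2):
    """Scale a measured duration linearly with area (an assumption, not a measurement)."""
    return measured_seconds * target_area_deg2 / measured_area_deg2


LOAD_SECONDS_AOI_A = 12 * 60 + 50  # 12 min 50 s
CALC_SECONDS_AOI_A = 44
FULL_AREA_DEG2 = 500

load_full = project_seconds(LOAD_SECONDS_AOI_A, 1, FULL_AREA_DEG2)
calc_full = project_seconds(CALC_SECONDS_AOI_A, 1, FULL_AREA_DEG2)

print(f"load: {load_full / 60:.0f} min = {load_full / 86400:.1f} days")
print(f"calc: {calc_full / 3600:.2f} hours")
```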

### Writes - no dask
A) 0.1 degree x 0.2 degree (`geopolar.sel(lat=slice(40, 40.1), lon=slice(-70, -69.8))`)
* `.to_zarr()`: 13 minutes. For 0.02 square degrees :double-gasp:

... both runs took about 13 minutes ... is it possible that this is simply how long it takes to access every chunk? In other words, if we scaled up to the full area, it might not take much longer than 13 minutes?
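One way to reason about the "every chunk" hypothesis is to count how many chunks a subset overlaps. The sketch below is pure arithmetic and does not read the store. The chunk shape `(1, 3600, 7200)` is a hypothetical stand-in (one full global grid per time step); the Zarr store's real chunking is not recorded in these notes and should be checked, for example with `.chunks` on the opened array.

```python
from math import prod


def chunks_touched(index_ranges, chunk_shape):
    """Count chunks overlapped by half-open (start, stop) index ranges, one per dimension."""
    counts = []
    for (start, stop), chunk in zip(index_ranges, chunk_shape):
        first = start // chunk        # index of the first chunk overlapped
        last = (stop - 1) // chunk    # index of the last chunk overlapped
        counts.append(last - first + 1)
    return prod(counts)


# HYPOTHETICAL chunking: one time step per chunk, whole grid per chunk.
HYPOTHETICAL_CHUNKS = (1, 3600, 7200)

tiny = chunks_touched([(0, 7134), (2600, 2602), (2200, 2204)], HYPOTHETICAL_CHUNKS)
aoi_a = chunks_touched([(0, 7134), (2600, 2620), (2200, 2220)], HYPOTHETICAL_CHUNKS)
full = chunks_touched([(0, 7134), (0, 3600), (0, 7200)], HYPOTHETICAL_CHUNKS)

print(tiny, aoi_a, full)
```

Under that assumed layout, a tiny subset, AOI_A, and the full grid all touch the same number of chunks, which would be consistent with run time depending on chunk count rather than area. With a different chunk shape the counts would diverge, so this only illustrates the idea.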

# Code Snippets for Timed Runs

## Accessing data with `xarray`

In [2]:
from dask.distributed import Client

Client()

Client: distributed.LocalCluster (connection method: Cluster object)
Dashboard: http://127.0.0.1:8787/status
Workers: 16 (each with 8 threads and 31.46 GiB memory)
Total threads: 128
Total memory: 503.40 GiB
Status: running, using processes: True


In [3]:
%%time

# open dataset
filepath = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr'
geopolar = xr.open_zarr(filepath)
# subset
geopolar = geopolar.analysed_sst.sel(lat=slice(40, 42), lon=slice(-70, -68))
# access data
sst = geopolar.values

CPU times: user 2min 16s, sys: 14.9 s, total: 2min 30s
Wall time: 16min 15s
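`%%time` only works inside IPython or Jupyter. For runs from a plain Python script on other compute locations, a small context manager can record comparable wall and CPU times. This is a sketch using only the standard library; the `timed` helper and the `results` dictionary are hypothetical names, and the summed loop is a stand-in for the real data access.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record wall-clock and CPU seconds for the enclosed block under `label`."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        results[label] = {
            "wall_s": time.perf_counter() - wall_start,
            "cpu_s": time.process_time() - cpu_start,
        }


results = {}
with timed("demo", results):
    # Stand-in workload; replace with the open/subset/load steps being timed.
    total = sum(i * i for i in range(100_000))

print(results)
```

Note that `time.process_time()` covers only the current process, so CPU time spent in separate dask worker processes would not be included.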


## Timing `zarr` access

In [5]:
from dask.distributed import Client

Client()

Client: distributed.LocalCluster (connection method: Cluster object)
Dashboard: http://127.0.0.1:8787/status
Workers: 16 (each with 8 threads and 31.46 GiB memory)
Total threads: 128
Total memory: 503.40 GiB
Status: running, using processes: True


### 1 - using the `zarr` library

In [6]:
%%time

# open dataset
filepath = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr'
geopolar = zarr.open(filepath, mode='r')
sst = geopolar.analysed_sst

# subset and access data
data_subset = sst[:, 2600:2620, 2200:2220]

CPU times: user 11min 20s, sys: 5min 33s, total: 16min 53s
Wall time: 13min 48s
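The integer slices `2600:2620` and `2200:2220` appear to be the index equivalent of AOI_A (`lat=slice(40, 41)`, `lon=slice(-70, -69)`). Here is a minimal sketch of that conversion, assuming the grid starts at latitude -90 and longitude -180, is stored in ascending order, and has 20 cells per degree (3600 x 7200 cells). The function names are hypothetical, and the result should be checked against the store's actual `lat` and `lon` coordinate arrays.

```python
# Assumptions (not confirmed against the store): ascending coordinates,
# grid origin at (-90, -180), 20 cells per degree.
LAT_ORIGIN = -90
LON_ORIGIN = -180
CELLS_PER_DEGREE = 20


def lat_index(lat):
    """Row index for a latitude in degrees north."""
    return int(round((lat - LAT_ORIGIN) * CELLS_PER_DEGREE))


def lon_index(lon):
    """Column index for a longitude in degrees east."""
    return int(round((lon - LON_ORIGIN) * CELLS_PER_DEGREE))


def aoi_slices(lat_min, lat_max, lon_min, lon_max):
    """Half-open index slices covering the requested bounding box."""
    return (
        slice(lat_index(lat_min), lat_index(lat_max)),
        slice(lon_index(lon_min), lon_index(lon_max)),
    )


lat_slice, lon_slice = aoi_slices(40, 41, -70, -69)
print(lat_slice, lon_slice)
```

Computing the slices this way keeps the `zarr` and `dask` timings on the same area as the `xarray` runs when the AOI changes.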


### 2 - using `da.from_zarr()`

In [10]:
%%time

# open dataset
filepath = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr'
geopolar = da.from_zarr(filepath, component='/analysed_sst')

# compute
sst_subset = geopolar[:, 2600:2620, 2200:2220]
sst_subset = sst_subset.compute()

CPU times: user 2min 53s, sys: 12.4 s, total: 3min 6s
Wall time: 12min 53s


### NetCDF with `xr.open_mfdataset()`

In [4]:
xr.open_mfdataset('/data/pacific/rwegener/noaa-geopolar-nc/*.nc')

Output (the same dask array summary is repeated for each data variable):

| | Array | Chunk |
|---|---|---|
| Bytes | 688.86 GiB | 98.88 MiB |
| Shape | (7134, 3600, 7200) | (1, 3600, 7200) |
| Count | 21402 Tasks | 7134 Chunks |
| Type | float32 | numpy.ndarray |
