# Reading and writing xarray data on Raven using netCDF

## Performance data


### 10 GB of random data: Writing

- 10 GB of random data, uncompressed -> 3 seconds -> 10.3 GB (about 3 GB / sec, possibly a cache effect?)
- 10 GB of random data, zlib level 1 -> 6.2 minutes -> 8.76 GB (1.61 GB / min)
- 10 GB of random data, zlib level 9 -> 8.3 minutes -> 8.71 GB (1.2 GB / min)
- 10 GB of random data, gzip level 1 -> 7.8 minutes -> 9.79 GB (1.28 GB / min)
- 10 GB of random data, gzip level 9 -> 9.15 minutes -> 9.72 GB (1.09 GB / min)
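
The rates and size reductions in this list follow from simple arithmetic. The sketch below reproduces it for the zlib level 1 entry; the helper name `write_stats` is hypothetical, and the nominal 10 GB (rather than the 10.3 GB actually written uncompressed) is assumed as the input size, since that is what the quoted rates appear to use.

```python
def write_stats(input_gb, output_gb, minutes):
    """Return (compressed size as a fraction of the input, write rate in GB/min)."""
    ratio = output_gb / input_gb  # fraction of the original size left on disk
    rate = input_gb / minutes     # input data processed per minute
    return ratio, rate


# zlib level 1 figures from the list above (nominal 10 GB input is an assumption)
ratio, rate = write_stats(input_gb=10, output_gb=8.76, minutes=6.2)
print(f"compressed fraction {ratio:.2f}, write rate {rate:.2f} GB / min")
```

Swapping in the other rows gives the remaining rates in the list.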

### 10 GB of random data: Reading
Cache effects were suspected here at first, but they probably play no role.

- 10 GB of random data, uncompressed -> 3 seconds -> about 3 GB / sec (26 Gbit / sec)
- 10 GB of random data, zlib level 1 -> 43.5 seconds -> 13.7 GB / min
- 10 GB of random data, zlib level 9 -> 56.7 seconds -> 10.6 GB / min

### A 25 GB data set (random numbers)
- Write 25 GB of random data using zlib level 1 -> 15.5 minutes -> 22 GB (1.4 GB / min; most of the time is spent on compression)

### 1 GB of zeros
- Write 1 GB of zeros using zlib level 1 -> 4 seconds -> 0.005 GB (5 MB)
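
The contrast between zeros and random numbers can be reproduced at small scale with Python's standard `zlib` module. This is only an indicative sketch: the netCDF zlib filter uses the same DEFLATE algorithm, but chunking and filter settings in the files above differ, and the sample size `n_values` is a hypothetical value chosen for speed.

```python
import random
import struct
import zlib

n_values = 100_000  # small sample; hypothetical size chosen only for speed

# float64 zeros and float64 uniform random numbers, packed as raw bytes
zeros = struct.pack(f"{n_values}d", *([0.0] * n_values))
rng = random.Random(0)
randoms = struct.pack(f"{n_values}d", *(rng.random() for _ in range(n_values)))


def compressed_fraction(raw, level=1):
    """Size after DEFLATE compression, as a fraction of the original size."""
    return len(zlib.compress(raw, level)) / len(raw)


zeros_fraction = compressed_fraction(zeros)
random_fraction = compressed_fraction(randoms)
print(f"zeros: {zeros_fraction:.4f}, random: {random_fraction:.4f}")
```

Zeros shrink to a tiny fraction of their size, while random floating-point numbers leave the compressor very little redundancy to exploit.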

### 1 GB of random data
- Write 1 GB of random data using gzip level 1 -> 44.0 seconds -> 0.936 GB
- Write 1 GB of random data using gzip level 9 -> 52.8 seconds -> 0.930 GB
- Write 1 GB of random data using zlib level 1 -> 38.8 seconds -> 0.839 GB

### 75 GB of random data
On Raven `/home`

- Write to disk uncompressed: 21 seconds -> 28.6 Gbit/sec
- Read from disk uncompressed: 21.5 seconds -> 28 Gbit/sec

On Raven `/ptmp`
- Write to disk uncompressed: 14.5 seconds -> 41 Gbit/sec
- Read from disk uncompressed: 21.5 seconds -> 28 Gbit/sec
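
The Gbit/sec figures for the 75 GB data set follow from the timings by multiplying the size in gigabytes by 8 and dividing by the elapsed seconds. A minimal sketch of that conversion (the function name `gbit_per_second` is hypothetical, and the nominal 75 GB is assumed as the transfer size):

```python
def gbit_per_second(size_gb, seconds):
    """Convert a transfer of size_gb gigabytes in `seconds` to Gbit/sec (1 byte = 8 bit)."""
    return size_gb * 8 / seconds


timings = [
    ("/home write", 21),
    ("/home read", 21.5),
    ("/ptmp write", 14.5),
    ("/ptmp read", 21.5),
]
for label, seconds in timings:
    print(f"{label}: {gbit_per_second(75, seconds):.1f} Gbit/sec")
```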





In [None]:
import sys

In [None]:
pwd

In [None]:
cd /ptmp/hafan/ilke

In [None]:
pwd

In [None]:
sys.version

In [None]:
# !pip list

In [None]:
!pip install h5netcdf xarray

In [None]:
import xarray

In [None]:
import numpy as np

In [None]:
%%time
n = 75  # number of gigabytes in test data set
r = np.random.uniform(size=(n, 401, 401, 801))
r.nbytes / 1e9
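
The comment "number of gigabytes" in the cell above relies on each `401 x 401 x 801` block of float64 values occupying roughly 1 GB. The sketch below checks that arithmetic without allocating the array; the variable names are illustrative only.

```python
import math

shape_per_unit = (401, 401, 801)  # trailing dimensions used in the cell above
bytes_per_float64 = 8             # np.random.uniform returns float64 values

# Bytes contributed by each increment of n
bytes_per_unit = math.prod(shape_per_unit) * bytes_per_float64
print(bytes_per_unit / 1e9)

n = 75
total_gb = n * bytes_per_unit / 1e9
print(total_gb)
```

Each unit of `n` is therefore about 3% larger than 1 GB, which is consistent with the 10.3 GB uncompressed file reported for the 10 GB test.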

In [None]:
%%time
zeros = np.zeros(shape=(n, 401, 401, 801))
zeros.nbytes / 1e9

Convert the numpy arrays to named xarray `DataArray` objects

In [None]:
%%time
xr = xarray.DataArray(r, name="field")

In [None]:
%%time
xr_zeros = xarray.DataArray(zeros, name="field")

In [None]:
xr_zeros.shape

In [None]:
!rm *.nc

In [None]:
!ls -l

In [None]:
xr.nbytes / 1e9

Write data uncompressed

In [None]:
%%time
xr.to_netcdf("test-default.nc")

In [None]:
!ls -lh test-default.nc
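
The `%%time` magic only works inside IPython or Jupyter. For a plain Python script, a wall-clock measurement along the following lines could stand in for it. This is a sketch under assumptions: `timed_call` is a hypothetical helper, and the summation is a stand-in workload rather than a netCDF write.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in this notebook the call of interest would be xr.to_netcdf
result, elapsed = timed_call(sum, range(1_000_000))
print(f"{elapsed:.3f} s")
```

In the notebook's setting, `timed_call(xr.to_netcdf, "test-default.nc")` would time the uncompressed write. Unlike `%%time`, this reports wall-clock time only, not CPU time.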

Write compressed data (zlib level 1)

In [None]:
%%time
encode_field = {"zlib": True, "complevel": 1}
xr.to_netcdf("test-zlib-level1.nc", engine="netcdf4", encoding={"field": encode_field})

Write compressed data (zlib level 9)

In [None]:
%%time
encode_field = {"zlib": True, "complevel": 9}
xr.to_netcdf("test-zlib-level9.nc", engine="netcdf4", encoding={"field": encode_field})

In [None]:
!ls -l *.nc
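
To collect the file sizes in GB, as quoted in the performance data, without reading them off the `ls` output, something like the following could be used. It is a sketch using only the standard library; `netcdf_sizes_gb` is a hypothetical helper, and GB is taken as 1e9 bytes to match `nbytes / 1e9` elsewhere in this notebook.

```python
from pathlib import Path


def netcdf_sizes_gb(directory="."):
    """Map each *.nc file name in `directory` to its size in GB (1e9 bytes)."""
    return {
        path.name: path.stat().st_size / 1e9
        for path in sorted(Path(directory).glob("*.nc"))
    }


for name, size_gb in netcdf_sizes_gb().items():
    print(f"{name}: {size_gb:.3f} GB")
```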

# Writing compressed constant data (zeros)


In [None]:
%%time
encode_field = {"zlib": True, "complevel": 1}
xr_zeros.to_netcdf(
    "zeros-zlib-level1.nc", engine="netcdf4", encoding={"field": encode_field}
)

# Reading

In [None]:
pwd

In [None]:
%%time
tmp = xarray.open_dataarray("test-default.nc")
tmp.data.shape

In [None]:
%%time
tmp = xarray.open_dataarray("test-default.nc")
tmp.data.shape

In [None]:
%%time
tmp = xarray.open_dataarray("test-zlib-level1.nc")
tmp.data.shape

In [None]:
np.abs((tmp - xr).data).max()  # should be zero

In [None]:
%%time
tmp = xarray.open_dataarray("test-zlib-level9.nc")
tmp.data.shape

# Compression with gzip 

This requires the `h5netcdf` package, which can be installed via pip.

In [None]:
%%time
encode_field = {"compression": "gzip", "compression_opts": 9}
xr.to_netcdf("test-gzip-level9.nc", engine="h5netcdf", encoding={"field": encode_field})

In [None]:
%%time
encode_field = {"compression": "gzip", "compression_opts": 1}
xr.to_netcdf("test-gzip-level1.nc", engine="h5netcdf", encoding={"field": encode_field})

In [None]:
!ls -l

In [None]:
!ls -lh

In [None]:
!rm *.nc