# Impact of lock parameter on loading time with xarray

### Author 
 - Julien Le Sommer, CNRS

### Context
 - follows the analysis published [in this first notebook](https://github.com/lesommer/notebooks/blob/master/Profiling_simple_derivative_computation_with_numpy_vs_dask_vs_xarray.ipynb) and [this second notebook](https://github.com/lesommer/notebooks/blob/master/More_profiling_of_dask_vs_xarray.ipynb), resolved in [this third notebook](https://github.com/lesommer/notebooks/blob/master/Profiling_derivatives_with_dask_and_xarray_end_of_the_story.ipynb), as discussed in this [thread](https://groups.google.com/forum/#!topic/xarray/TOX5BIc08WA)
 
### Purpose
 - illustrate the impact of the lock parameter on the time needed to load a value with xarray


## Imported modules 

In [1]:
#- import
import numpy as np
from netCDF4 import Dataset
import dask.array as da
import xarray as xr

In [2]:
#- print versions
import dask 
print('dask version : ' + dask.__version__)
print('xarray version : ' + xr.__version__)

dask version : 0.8.1
xarray version : 0.7.2


## Files, chunks and array definition methods

In [3]:
#- Medium size 2D dataset from NATL60 
#  array shape is (1,3454,5422)
file_2d_gridt = "/Users/lesommer/data/NATL60/NATL60-MJM155-S/1d/2008/NATL60-MJM155_y2008m01.1d_BUOYANCYFLX.nc"
varname_2d = "vosigma0"

# chunks
chunks2d = (1727,2711)
xr_chunks2d = {'x': chunks2d[-1], 'y': chunks2d[-2]}

In [4]:
#- dask from netcdf :  dachk
def load_dachk(filename,varname,chunks,it=0):
    d = Dataset(filename).variables[varname]
    array = da.from_array(d, chunks=(1,)+ chunks)[it]
    return array 

#- xarray with chunks :  xachk
def load_xachk(filename,varname,xr_chunks,it=None,decode_cf=True,engine='netcdf4',lock=False):
    ds = xr.open_dataset(filename,chunks=xr_chunks,decode_cf=decode_cf,engine=engine,lock=lock)
    if it is None:
        array = ds.variables[varname]
    else:
        array = ds.variables[varname][it]
    return array.chunk()

## Native dask array

In [5]:
dda = load_dachk(file_2d_gridt,varname_2d,chunks2d,it=0)
dda

dask.array<getitem..., shape=(3454, 5422), dtype=float32, chunksize=(1727, 2711)>

In [6]:
%timeit va = np.asarray(dda[1000,1000])
np.asarray(dda[1000,1000])

1000 loops, best of 3: 1.38 ms per loop


array(26.201061248779297, dtype=float32)

## xarray dask array (lock=True)

In [7]:
ddx = load_xachk(file_2d_gridt,varname_2d,xr_chunks2d,it=0,lock=True).data
ddx

dask.array<getitem..., shape=(3454, 5422), dtype=float64, chunksize=(1727, 2711)>

In [8]:
%timeit vx = np.asarray(ddx[1000,1000])
np.asarray(ddx[1000,1000])

1 loop, best of 3: 1.87 s per loop


array(26.201061248779297)

## xarray dask array (lock=False)

In [9]:
ddx = load_xachk(file_2d_gridt,varname_2d,xr_chunks2d,it=0,lock=False).data
ddx

dask.array<getitem..., shape=(3454, 5422), dtype=float64, chunksize=(1727, 2711)>

In [10]:
%timeit vx = np.asarray(ddx[1000,1000])
np.asarray(ddx[1000,1000])

1000 loops, best of 3: 1.46 ms per loop


array(26.201061248779297)

**conclusion** : in this particular example, lock=True slows down loading a single value by a factor of roughly 1000 (1.87 s versus 1.46 ms per loop).
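
The order of magnitude quoted above can be checked directly from the `%timeit` results reported in cells 6, 8 and 10. The snippet below is only a small arithmetic sketch: the variable names are illustrative and not part of the original notebook, and the timings are copied by hand from the outputs above rather than re-measured.

```python
# Best-of-3 timings copied from the %timeit outputs above, in seconds.
# Variable names are illustrative; the values are not re-measured here.
t_dask_native = 1.38e-3        # native dask array (cell 6)
t_xarray_lock_true = 1.87      # xarray dask array, lock=True (cell 8)
t_xarray_lock_false = 1.46e-3  # xarray dask array, lock=False (cell 10)

# Slowdown of lock=True relative to lock=False and to native dask
slowdown_vs_unlocked = t_xarray_lock_true / t_xarray_lock_false
slowdown_vs_dask = t_xarray_lock_true / t_dask_native

# Remaining overhead of xarray with lock=False relative to native dask
overhead_unlocked = t_xarray_lock_false / t_dask_native

print('lock=True vs lock=False   : x%.0f' % slowdown_vs_unlocked)
print('lock=True vs native dask  : x%.0f' % slowdown_vs_dask)
print('lock=False vs native dask : x%.2f' % overhead_unlocked)
```

With these numbers, lock=True is about three orders of magnitude slower than either alternative, while xarray with lock=False stays within a few percent of the native dask timing.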