## Lazy evaluation  

Under the hood, nchack relies mostly on CDO to manipulate NetCDF files. Each time CDO is called, a new temporary file is generated. This can make processing chains slower than necessary, because much of the run time is spent on IO.

I will demonstrate this using a NetCDF file of sea surface temperature. To download the file we can just use wget:

In [1]:
import nchack as nc
import warnings
warnings.filterwarnings('ignore')
from IPython.display import clear_output
! wget ftp://ftp.cdc.noaa.gov/Datasets/COBE2/sst.mon.ltm.1981-2010.nc
clear_output()
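
If wget is not available, the same download can be done from Python. The following is a minimal sketch, not part of nchack: the helper name `download_if_missing` is hypothetical, and it assumes the FTP server is reachable. It uses only the standard library and skips the download when the file is already present.

```python
import os
import urllib.request

def download_if_missing(url, dest=None):
    """Download url to dest unless the file already exists; return the local path."""
    if dest is None:
        # Hypothetical convention: use the last part of the URL as the file name.
        dest = os.path.basename(url)
    if not os.path.exists(dest):
        # urlretrieve handles ftp:// URLs as well as http(s)://.
        urllib.request.urlretrieve(url, dest)
    return dest

# Usage (commented out so the sketch does not hit the network when pasted):
# ff = download_if_missing("ftp://ftp.cdc.noaa.gov/Datasets/COBE2/sst.mon.ltm.1981-2010.nc")
```

Skipping the download when the file exists keeps repeated notebook runs from fetching the climatology again.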

We can then set up the tracker, which we will use to manipulate the SST climatology.

In [2]:
ff =  "sst.mon.ltm.1981-2010.nc"
tracker = nc.NCTracker(ff)

Now, let's select the variable sst, clip the file to the northern hemisphere, calculate the mean value in each grid cell for the first half of the year, and then calculate the spatial mean.

In [3]:
tracker.select_variables("sst")
tracker.clip(lat = [0,90])
tracker.select_months(list(range(1,7)))
tracker.mean()
tracker.spatial_mean()

The tracker's history is as follows:

In [4]:
tracker.history

['cdo -s -selname,sst sst.mon.ltm.1981-2010.nc /tmp/nchackknphvysanchacknchackknphvysanchacktmpsc5w3q07.nc',
 'cdo -s -sellonlatbox,-180,180,0,90 /tmp/nchackknphvysanchacknchackknphvysanchacktmpsc5w3q07.nc /tmp/nchackknphvysanchacknchackknphvysanchacktmp9oawgs3c.nc',
 'cdo -s -selmonth,1,2,3,4,5,6 /tmp/nchackknphvysanchacknchackknphvysanchacktmp9oawgs3c.nc /tmp/nchackknphvysanchacknchackknphvysanchacktmp623fx8jm.nc',
 'cdo -s -timmean /tmp/nchackknphvysanchacknchackknphvysanchacktmp623fx8jm.nc /tmp/nchackknphvysanchacknchackknphvysanchacktmp66dscls5.nc',
 'cdo -s -fldmean /tmp/nchackknphvysanchacknchackknphvysanchacktmp66dscls5.nc /tmp/nchackknphvysanchacknchackknphvysanchacktmpebhtjmsp.nc']
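
To make the IO cost concrete, here is a small sketch that lists the file each command in a history writes. It assumes only that every entry is a CDO command string whose last argument is the output file, as in the history above. The helper `output_files` and the shortened `example_history` are illustrative and not part of nchack.

```python
import shlex

def output_files(history):
    """Return the output file (the last argument) of each CDO command string."""
    return [shlex.split(command)[-1] for command in history]

# Hypothetical, shortened history in the same shape as the one printed above.
example_history = [
    "cdo -s -selname,sst sst.nc /tmp/step1.nc",
    "cdo -s -sellonlatbox,-180,180,0,90 /tmp/step1.nc /tmp/step2.nc",
    "cdo -s -selmonth,1,2,3,4,5,6 /tmp/step2.nc /tmp/step3.nc",
]

outputs = output_files(example_history)
print(len(outputs), "files written:", outputs)
```

With a real tracker, passing `tracker.history` instead of `example_history` should give one output file per operation.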

In total, there are 5 operations, each of which creates a temporary file. However, we only want to generate one temporary file. Can we do that? Yes, thanks to CDO's ability to chain operators.
To use this, we need to set the tracker's evaluation to lazy. Once this is done, nchack will only evaluate operations when it has to, e.g. when you call a method that cannot be chained, or when you release the tracker.
This works as follows:

In [5]:
ff =  "sst.mon.ltm.1981-2010.nc"
tracker = nc.NCTracker(ff)
tracker.lazy()
tracker.select_variables("sst")
tracker.clip(lat = [0,90])
tracker.select_months(list(range(1,7)))
tracker.mean()
tracker.spatial_mean()
tracker.release()

We can now see that the history is much cleaner, with only one command.

In [6]:
tracker.history

['cdo -s -L -fldmean -timmean -selmonth,1,2,3,4,5,6 -sellonlatbox,-180,180,0,90 -selname,sst sst.mon.ltm.1981-2010.nc /tmp/nchackknphvysanchacknchackknphvysanchacktmpxgk0eazy.nc']
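
The idea behind the single command can be sketched with a toy class. This is not nchack's implementation: `LazyChain` and its methods are hypothetical, and the `-s -L` flags are simply copied from the history shown above. The sketch records operators without running anything, and only builds one chained command on release. Because CDO applies chained operators from right to left, the first requested step ends up next to the input file.

```python
class LazyChain:
    """Toy model of lazy evaluation: record operators, build nothing until release."""

    def __init__(self, infile):
        self.infile = infile
        self.operators = []  # CDO operators in the order the user requested them
        self.history = []    # commands that would actually be executed

    def add(self, operator):
        # Only record the step; no command is built and no file is written yet.
        self.operators.append(operator)
        return self

    def release(self, outfile):
        # CDO applies chained operators right to left, so the first requested
        # step must sit closest to the input file.
        chained = " ".join(f"-{op}" for op in reversed(self.operators))
        command = f"cdo -s -L {chained} {self.infile} {outfile}"
        self.history.append(command)
        self.operators = []
        return command

chain = LazyChain("sst.mon.ltm.1981-2010.nc")
for op in ["selname,sst", "sellonlatbox,-180,180,0,90",
           "selmonth,1,2,3,4,5,6", "timmean", "fldmean"]:
    chain.add(op)

# "out.nc" is a placeholder for the temporary file a real tracker would create.
print(chain.release("out.nc"))
```

The sketch only builds the command string; actually running it would require CDO to be installed.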

How does this impact run time? Let's time the original, unchained method.

In [7]:
%%time
ff =  "sst.mon.ltm.1981-2010.nc"
tracker = nc.NCTracker(ff)
tracker.select_variables("sst")
tracker.clip(lat = [0,90])
tracker.select_months(list(range(1,7)))
tracker.mean()
tracker.spatial_mean()

CPU times: user 11.6 ms, sys: 22.6 ms, total: 34.2 ms
Wall time: 538 ms

Now let's time the chained, lazy version.


In [8]:
%%time
ff =  "sst.mon.ltm.1981-2010.nc"
tracker = nc.NCTracker(ff)
tracker.lazy()
tracker.select_variables("sst")
tracker.clip(lat = [0,90])
tracker.select_months(list(range(1,7)))
tracker.mean()
tracker.spatial_mean()
tracker.release()

CPU times: user 4.21 ms, sys: 5.16 ms, total: 9.36 ms
Wall time: 113 ms


This was almost 5 times faster (538 ms versus 113 ms wall time). Exact speed improvements will of course depend on specific IO requirements. Sometimes lazy evaluation will have a negligible impact, but in other cases it can make code over 10 times faster.
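
The speed-up can be computed directly from the two wall times reported above. Outside a notebook, where `%%time` is not available, a similar comparison could be made with a small timing helper. The sketch below uses only the standard library; the names `wall_time` and `speedup` are hypothetical.

```python
import time

def wall_time(func, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best

def speedup(eager_seconds, lazy_seconds):
    """How many times faster the lazy run is than the eager one."""
    return eager_seconds / lazy_seconds

# Wall times reported by %%time above, converted to seconds.
print(round(speedup(0.538, 0.113), 2))
```

Taking the best of several repeats reduces the influence of disk caching and other background noise, which matters when the runs are as short as the ones here.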
