# 004. Using "cdo" to manipulate the data

The **Climate Data Operators** (CDO) are a popular collection of command-line tools for processing climate data. Python bindings have recently become available as well (https://pypi.org/project/cdo/).

Setup and documentation for CDO: https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo#Documentation

As we are dealing with monthly files for many variables, we have to:
1. Concatenate the monthly files containing hourly data
2. Aggregate the hourly data into daily data (optional)
3. Merge the variables into one file
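The three steps can be sketched as a sequence of CDO command lines. This is a minimal sketch, not code from this project: `build_pipeline`, the output names (`<var>_cat.nc`, `merged.nc`) and the variable names are hypothetical, and it assumes the `cat`, `dayavg`/`daysum` and `merge` operators for steps 1 to 3. It only builds the commands; each list could be passed to `subprocess.run(cmd, check=True)` to execute it.

```python
from typing import Dict, List


def build_pipeline(monthly_files: Dict[str, List[str]],
                   daily_ops: Dict[str, str],
                   out_dir: str) -> List[List[str]]:
    """Return CDO argument lists for: concatenate -> daily aggregation -> merge.

    monthly_files maps a variable name to its monthly files; daily_ops maps a
    variable name to "dayavg" or "daysum". Variables missing from daily_ops
    skip the optional aggregation step.
    """
    commands: List[List[str]] = []
    merge_inputs: List[str] = []
    for var, files in sorted(monthly_files.items()):
        # Step 1: put the monthly files together along the time axis
        # (lexicographic sorting assumes zero-padded YYYY_MM in the names)
        current = f"{out_dir}/{var}_cat.nc"
        commands.append(["cdo", "cat", *sorted(files), current])
        # Step 2 (optional): aggregate hourly values to daily values
        op = daily_ops.get(var)
        if op:
            daily = f"{out_dir}/{var}_{op}.nc"
            commands.append(["cdo", op, current, daily])
            current = daily
        merge_inputs.append(current)
    # Step 3: merge all variables into one file
    commands.append(["cdo", "merge", *merge_inputs, f"{out_dir}/merged.nc"])
    return commands


cmds = build_pipeline(
    {"t2m": ["t2m_2000_02.nc", "t2m_2000_01.nc"], "tp": ["tp_2000_01.nc"]},
    {"t2m": "dayavg", "tp": "daysum"},
    "out",
)
for cmd in cmds:
    print(" ".join(cmd))
```

Mapping each variable to its own daily operator reflects that precipitation is usually summed per day while state variables such as temperature are averaged.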

CDO provides the following methods:
- `cdo.cat()` concatenates files
- `cdo.dayavg()` averages hourly data into daily data
- `cdo.daysum()` sums hourly values to daily values

You could use these like so:

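```python
import glob
import os

import xarray as xr
from cdo import Cdo

cdo = Cdo()
path_to_data = 'volume/project/data/'  # adjust to your data directory
tmp_file = './tmp.nc'

# Concatenate the monthly precipitation files along time with xarray ...
xar = xr.open_mfdataset(glob.glob(path_to_data + 'era5_precipitation*.nc'),
                        combine='by_coords')
xar.to_netcdf(tmp_file)
# ... then let CDO sum the hourly values to daily values
cdo.daysum(input=tmp_file,
           output=path_to_data + 'era5_precip_daysum.nc')
os.remove(tmp_file)
```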

__Note__ that CDO's daily aggregations (e.g. daily max or mean) set the timestamp of each result to 23 UTC of the day being aggregated.
We needed to shift those times back to the start of the day (subtract 23 hours from the time coordinate) before working with the data.
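The shift can be done with xarray after reading the CDO output. This is a minimal sketch on synthetic data: `shift_to_day_start` is a hypothetical helper, and it assumes the time coordinate is named `time`, as in the ERA5 files used here. CDO's own `shifttime` operator should be able to do the same on the command line; see the CDO operator reference for its exact syntax.

```python
import numpy as np
import pandas as pd
import xarray as xr


def shift_to_day_start(ds: xr.Dataset, hours: int = 23) -> xr.Dataset:
    """Move CDO's end-of-day timestamps (23 UTC) back to 00 UTC of the same day."""
    shifted_times = ds["time"].values - np.timedelta64(hours, "h")
    return ds.assign_coords(time=shifted_times)


# Synthetic daily values stamped at 23 UTC, as described above for CDO output
times = pd.date_range("2000-01-01T23:00", periods=3, freq="D")
daily = xr.Dataset({"tp": ("time", np.array([1.0, 2.0, 3.0]))},
                   coords={"time": times})

shifted = shift_to_day_start(daily)
print(shifted["time"].values)
```

Only the coordinate changes. The data values stay attached to the same days.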


Within this project, the CDO command-line tools were called from Python via `os.system()`. This is not the most elegant solution, but it works, and it was the approach we were most familiar with. You are free to use the Python bindings instead; the results should be the same.
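The following is a minimal sketch of calling the command-line tool from Python in the `os.system()` style, alongside `subprocess.run` as an alternative that is not used in this project. The helpers `cdo_cmd`, `as_shell_string`, `run_os_system` and `run_subprocess` are hypothetical, and `shlex.join` requires Python 3.8 or newer.

```python
import os
import shlex
import subprocess
from typing import List


def cdo_cmd(operator: str, infile: str, outfile: str) -> List[str]:
    """Argument list for a single-input CDO call such as `cdo dayavg in.nc out.nc`."""
    return ["cdo", operator, infile, outfile]


def as_shell_string(cmd: List[str]) -> str:
    # shlex.join quotes each part, so file names with spaces survive the shell
    return shlex.join(cmd)


def run_os_system(cmd: List[str]) -> int:
    # The style used in this project: one shell string; returns the exit
    # status, which has to be checked by hand
    return os.system(as_shell_string(cmd))


def run_subprocess(cmd: List[str]) -> None:
    # No shell involved; check=True raises CalledProcessError if cdo fails
    subprocess.run(cmd, check=True)


print(as_shell_string(cdo_cmd("dayavg", "my data.nc", "my data_dayavg.nc")))
```

With `os.system()`, a failing CDO call only shows up as a non-zero return value. `subprocess.run(..., check=True)` turns that into an exception, so a batch loop stops at the first broken file.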

The methods reside in the **utils.py** file inside the **/python/aux/** directory and simplify preprocessing the data.

The following methods are currently implemented:

1. `cdo_daily_means()`: generates daily averages from the input data
2. `cdo_precip_sums()`: generates daily precipitation sums from input data
3. `cdo_clean_precip()`: extracts the precipitation variables from the input data into a new file and removes them from the input data
4. `cdo_spatial_cut()`: extracts all of the input data within a specified bounding box to a new file
5. `cdo_merge_time()`: merges all of the input data into a new file on the time dimension

Example calls are listed below. 

### Python-Path
To import Python functions from the **./python/aux/** directory, we have to add the main path of the repository to Python's module search path (`sys.path`). This is done with the following two lines:

```python
import sys
sys.path.append('../../')
```
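If the working directory may change during the session, a slightly more robust variant resolves the path to an absolute one and avoids duplicate entries when the cell is re-run. This is a sketch that assumes, as `'../../'` implies, that the notebook runs two directories below the repository root.

```python
import os
import sys

# Resolve the repository root (two levels above the current working directory)
# to an absolute path, so a later os.chdir() does not break the imports
repo_root = os.path.abspath(os.path.join(os.getcwd(), "..", ".."))
if repo_root not in sys.path:  # avoid duplicate entries when the cell is re-run
    sys.path.append(repo_root)
```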

### Define the needed variables
All files in `path_to_data` whose names contain the string passed as `file_includes` are processed.

```python
path_to_data = 'volume/project/data/'
```
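The selection rule can be sketched as follows. `matching_files` is a hypothetical helper, not the function from **utils.py**, and restricting the match to `.nc` files is an assumption.

```python
import os
import tempfile
from typing import List


def matching_files(path: str, file_includes: str, suffix: str = ".nc") -> List[str]:
    """Sorted paths of files in `path` whose name contains `file_includes`."""
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if file_includes in name and name.endswith(suffix)
    )


# Demo on a throw-away directory with made-up file names
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["era5_temperature_2000_01.nc",
                 "era5_precipitation_2000_01.nc",
                 "temperature_notes.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    found = [os.path.basename(p) for p in matching_files(demo_dir, "temperature")]

print(found)
```

The match is a plain substring test, so a broad string such as `'temperature'` may also pick up output files written by earlier steps (for example names ending in `dayavg.nc`). A more specific `file_includes` or a separate output directory avoids reprocessing them.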

### Execute the methods

Each method is executed for every matching file in the directory. For details, check the **utils.py** file.

### 1) cdo_daily_means
Loops through the given directory and, for every file matching `*file_includes*`, executes `cdo dayavg <file> <file_out>`. "dayavg" is appended to the end of the output filename.

```python
from python.aux.utils import cdo_daily_means

incl = 'temperature'

cdo_daily_means(path=path_to_data, file_includes=incl)
```
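The kind of loop `cdo_daily_means` runs can be sketched from the description above. The function name `daily_means_commands`, the `<name>_dayavg.nc` naming and the skipping of files that already contain `dayavg` are assumptions, not taken from **utils.py**. The function returns the commands instead of running them, so they can be inspected first.

```python
import os
import tempfile
from typing import List


def daily_means_commands(path: str, file_includes: str) -> List[List[str]]:
    """One `cdo dayavg <in> <out>` per matching .nc file; <out> = <name>_dayavg.nc."""
    commands = []
    for name in sorted(os.listdir(path)):
        # Skip non-matching files and outputs of an earlier run
        if file_includes not in name or not name.endswith(".nc") or "dayavg" in name:
            continue
        stem, ext = os.path.splitext(name)
        commands.append(["cdo", "dayavg", os.path.join(path, name),
                         os.path.join(path, f"{stem}_dayavg{ext}")])
    return commands


with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["era5_temperature_2000_01.nc", "era5_temperature_2000_01_dayavg.nc"]:
        open(os.path.join(demo_dir, name), "w").close()
    cmds = daily_means_commands(demo_dir, "temperature")
    file_pairs = [[os.path.basename(p) for p in cmd[2:]] for cmd in cmds]

print(file_pairs)
```

Printing the lists acts as a dry run. Passing each one to `subprocess.run(cmd, check=True)` would execute it.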

### 2) cdo_precip_sums
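Loops through the given directory and, for every file matching `*file_includes*`, executes `cdo -b 32 daysum filein.nc fileout.nc` (`-b 32` makes CDO write 32-bit floating-point output). "daysum" is appended to the end of the output filename.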

```python
from python.aux.utils import cdo_precip_sums

incl = 'large_scale_precipitation'

cdo_precip_sums(path=path_to_data, file_includes=incl)
```
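Precipitation gets `daysum` rather than `dayavg`. This assumes, as for ERA5 hourly data, that each hourly value is the amount accumulated during that hour, so the daily total is the sum of the 24 values. The small NumPy sketch below uses synthetic values and assumes complete days starting at 00 UTC. It shows that the daily mean is 24 times smaller than the daily total.

```python
import numpy as np

# Two complete days of synthetic hourly precipitation (hours 00..23 per day)
hourly = np.arange(48, dtype=float) * 0.001

per_day = hourly.reshape(2, 24)   # one row per day, one column per hour
daysum = per_day.sum(axis=1)      # daily totals, what `cdo daysum` reports
dayavg = per_day.mean(axis=1)     # daily means, what `cdo dayavg` reports

print(daysum, dayavg)
```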

### 3) cdo_clean_precip
Loops through the given directory and, for all files whose names contain `precip_type`, executes `ncks -v cp,tp filein.nc fileout.nc` (keep only `cp` and `tp`) and `ncks -x -v cp,tp filein.nc fileout.nc` (keep everything except `cp` and `tp`), creating new files with the corresponding variables. Note that `ncks` is part of NCO (netCDF Operators), not CDO.

```python
from python.aux.utils import cdo_clean_precip

cdo_clean_precip(path=path_to_data, precip_type='precipitation')
```
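The split into two files can be sketched as a pair of `ncks` calls. `split_precip_commands` and the `_precip` / `_noprecip` output names are hypothetical. CDO's `selname,cp,tp` and `delname,cp,tp` operators could do the same selection and removal. If an output file already exists, `ncks` may ask before overwriting it unless `-O` is given, which matters when it runs unattended through `os.system()`.

```python
import os
from typing import List, Tuple

PRECIP_VARS = "cp,tp"  # convective and total precipitation, as in the calls above


def split_precip_commands(infile: str) -> Tuple[List[str], List[str]]:
    """Two ncks calls: one keeps only cp/tp, the other keeps all other variables."""
    stem, ext = os.path.splitext(infile)
    # -v selects the listed variables
    precip_only = ["ncks", "-v", PRECIP_VARS, infile, f"{stem}_precip{ext}"]
    # -x inverts the selection, i.e. drops the listed variables
    without_precip = ["ncks", "-x", "-v", PRECIP_VARS, infile, f"{stem}_noprecip{ext}"]
    return precip_only, without_precip


for cmd in split_precip_commands("data/era5_2000_01.nc"):
    print(" ".join(cmd))
```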

### 4) cdo_spatial_cut
Loops through the given directory and, for every file matching `*file_includes*`, executes `cdo -sellonlatbox,lonmin,lonmax,latmin,latmax <file> fileout.nc`. `spatial_cut_<new_file_includes>` is appended to the end of the output filename.

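```python
from python.aux.utils import cdo_spatial_cut

# Bounding box in degrees
lonmin = 10
lonmax = 20
latmin = 40
latmax = 50
incl = 'temperature'
incl_new = 'temperature_spatial_cut'

# Keyword arguments throughout: Python does not allow positional arguments
# after keyword arguments (keyword names lonmin..latmax assumed here)
cdo_spatial_cut(path=path_to_data, file_includes=incl, new_file_includes=incl_new,
                lonmin=lonmin, lonmax=lonmax, latmin=latmin, latmax=latmax)
```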
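The following sketch shows how the bounding box ends up in the command string. The order is lonmin, lonmax, latmin, latmax, as in the command above. `sellonlatbox_command` is hypothetical, and the latitude check is a sanity check added here, not something taken from **utils.py**.

```python
from typing import List


def sellonlatbox_command(infile: str, outfile: str, lonmin: float, lonmax: float,
                         latmin: float, latmax: float) -> List[str]:
    """Build `cdo -sellonlatbox,lonmin,lonmax,latmin,latmax infile outfile`."""
    # Sanity check added for this sketch: catch swapped latitude arguments early
    if not -90 <= latmin < latmax <= 90:
        raise ValueError(f"expected -90 <= latmin < latmax <= 90, got {latmin}, {latmax}")
    box = ",".join(str(v) for v in (lonmin, lonmax, latmin, latmax))
    return ["cdo", f"-sellonlatbox,{box}", infile, outfile]


print(" ".join(sellonlatbox_command("temperature.nc", "temperature_spatial_cut.nc",
                                    10, 20, 40, 50)))
```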

### 5) cdo_merge_time
Merges all files in the given directory whose names contain the specified string into the specified new file, using `cdo mergetime *file_includes* fileout.nc`.

```python
from python.aux.utils import cdo_merge_time

incl = 'temperature'
new_filename = 'temperature_YYYYinit-YYYYend.nc'  # replace with the actual year range

cdo_merge_time(path=path_to_data, file_includes=incl, new_file=new_filename)
```
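The following is a minimal sketch of building the `mergetime` call. `mergetime_command` is hypothetical and matches only `.nc` files. It also excludes the output file itself. A name like `temperature_YYYYinit-YYYYend.nc` contains the search string, so a second run could otherwise merge the old output back in. Sorting the inputs only makes the command reproducible, since `mergetime` orders the time steps by date itself.

```python
import glob
import os
import tempfile
from typing import List


def mergetime_command(path: str, file_includes: str, new_file: str) -> List[str]:
    """`cdo mergetime <inputs> <new_file>` over all .nc files containing file_includes."""
    out = os.path.join(path, new_file)
    inputs = sorted(glob.glob(os.path.join(path, f"*{file_includes}*.nc")))
    # Never feed the output file back in as an input
    inputs = [f for f in inputs if os.path.abspath(f) != os.path.abspath(out)]
    if not inputs:
        raise FileNotFoundError(f"no files matching *{file_includes}*.nc in {path}")
    return ["cdo", "mergetime", *inputs, out]


with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["era5_temperature_2000_02.nc", "era5_temperature_2000_01.nc",
                 "temperature_2000-2000.nc"]:
        open(os.path.join(demo_dir, name), "w").close()
    cmd = mergetime_command(demo_dir, "temperature", "temperature_2000-2000.nc")
    merged_inputs = [os.path.basename(p) for p in cmd[2:-1]]

print(merged_inputs)
```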