## Getting started
First, import all the required packages. Make sure you have installed the ewatercycle conda environment (https://ewatercycle.readthedocs.io/en/latest/system_setup.html#conda-environment) and activated it by typing *'conda activate ewatercycle'* in the conda shell.

In [1]:
# import packages
import glob
from pathlib import Path
import os
import geopandas as gpd
import iris
import iris.pandas
import numpy as np
from esmvalcore import preprocessor
from iris.coords import DimCoord
from iris.cube import Cube
from pathos.threading import ThreadPool as Pool
from datetime import datetime
from datetime import timedelta
import pandas as pd

Here we import all the Python functions defined in the scripts *f_grid_to_catchments.py*, *f_postprocess_timeseries.py*, *f_catch_characteristics.py* and *f_preprocess_discharge.py*.

In [3]:
# import python functions
from f_grid_to_catchments import *
from f_postprocess_timeseries import *
from f_catch_characteristics import *
from f_preprocess_discharge import *

## Define working directory
Here we define the working directory, where all the scripts and data are saved. Make sure that this working directory contains the following subdirectories with the data:\
/work_dir/data/forcing/*netcdf forcing files*\
/work_dir/data/shapes/*catchment shapefiles*\
/work_dir/data/gsim_discharge/*gsim discharge timeseries*
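
Before running the module, it can help to verify that this layout is in place. The following is a minimal sketch; the helper name `missing_subdirs` and the constant `REQUIRED_SUBDIRS` are hypothetical and not part of the module's scripts.

```python
from pathlib import Path

# Subdirectories the module expects inside the working directory
# (taken from the list above)
REQUIRED_SUBDIRS = ("data/forcing", "data/shapes", "data/gsim_discharge")


def missing_subdirs(work_dir):
    """Return the required subdirectories that do not exist under work_dir."""
    root = Path(work_dir)
    return [sub for sub in REQUIRED_SUBDIRS if not (root / sub).is_dir()]
```

Calling `missing_subdirs` on your working directory returns an empty list when all three data folders are present.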


In [6]:
# define your working directory
work_dir=Path("/work/users/vanoorschot/fransje/scripts/GLOBAL_SR/global_sr_module")

Here we create the output directory inside your working directory. In the remainder of this module, the same command will be used regularly to create directories.

In [7]:
# make output directory
if not os.path.exists(f'{work_dir}/output'):
    os.makedirs(f'{work_dir}/output')
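
Because this create-if-missing pattern recurs throughout the module, it could also be wrapped in a small helper. This is only a sketch: the name `ensure_dir` is hypothetical and the module's scripts do not define it.

```python
from pathlib import Path


def ensure_dir(path):
    """Create path (and any missing parent folders) if needed; return it as a Path."""
    path = Path(path)
    # parents=True creates intermediate folders,
    # exist_ok=True makes the call safe to repeat
    path.mkdir(parents=True, exist_ok=True)
    return path
```

A call such as `ensure_dir(work_dir / "output")` would then have the same effect as the `os.path.exists` check followed by `os.makedirs`.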

## Make lists of catchment IDs
The module calculates catchment root zone storage capacities for a large sample of catchments. Here we save the catchment names in a .txt file, for later use in the scripts.

In [8]:
# list the filenames of catchment shapefiles
shape_dir = Path(f'{work_dir}/data/shapes/')
shapefiles = glob.glob(f"{shape_dir}/*.shp")

# make an empty list
catch_id_list = []

# loop over the catchment shapefiles, extract the catchment id and store in the empty list
for i in shapefiles:
    catch_id_list.append(Path(i).name.split('.')[0])
    
# save the catchment id list in your output directory
np.savetxt(f'{work_dir}/output/catch_id_list.txt',catch_id_list,fmt='%s')
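
In later scripts the saved list has to be read back in. A minimal sketch of one way to do this is given below; the function names `catch_ids_from_shapefiles` and `load_catch_ids` are hypothetical. Two details differ from the cell above and are assumptions about what is desirable: the IDs are sorted, because `glob` does not guarantee a fixed order, and `Path.stem` is used, which gives the same result as `split('.')[0]` only when the file name contains a single dot.

```python
from pathlib import Path

import numpy as np


def catch_ids_from_shapefiles(paths):
    """Extract catchment IDs from shapefile paths, sorted for a reproducible order."""
    return sorted(Path(p).stem for p in paths)


def load_catch_ids(txt_file):
    """Read a catchment ID list written with np.savetxt(..., fmt='%s')."""
    # ndmin=1 keeps the result a list even if the file holds a single ID
    return np.loadtxt(txt_file, dtype=str, ndmin=1).tolist()
```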

## Preprocess GSIM discharge data

The GSIM yearly discharge timeseries are stored in *.year* files. A detailed explanation of the column names is provided in Tables 3 and 4 in https://essd.copernicus.org/articles/10/787/2018/. Here we preprocess these data into readable *.csv* files for each catchment. The preprocessing function *preprocess_gsim_discharge* is defined in the file *f_preprocess_discharge.py*. With this function we generate for each catchment a file with the yearly discharge timeseries and a file with the specifications of the catchment.

In [9]:
# make output directories
if not os.path.exists(f'{work_dir}/output/discharge/timeseries'):
    os.makedirs(f'{work_dir}/output/discharge/timeseries')
    
if not os.path.exists(f'{work_dir}/output/discharge/characteristics'):
    os.makedirs(f'{work_dir}/output/discharge/characteristics')

In [10]:
# define folder with discharge timeseries data
fol_in = f'{work_dir}/data/gsim_discharge/'

# define output folder
fol_out = f'{work_dir}/output/discharge/'

# run preprocess_gsim_discharge function (defined in f_preprocess_discharge.py) for all catchments in catch_id_list
for catch_id in catch_id_list:
    preprocess_gsim_discharge(catch_id, fol_in, fol_out)
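
To give an idea of what such a preprocessing step involves, the sketch below parses a small text table into a dataframe. It is not the code of *preprocess_gsim_discharge*. The file layout (metadata lines starting with `#`, followed by a comma-separated table), the column names and all values are assumptions made for illustration; check the GSIM documentation for the actual format.

```python
import io

import pandas as pd

# Synthetic stand-in for a .year file: NOT real GSIM data, layout is assumed
EXAMPLE_YEAR_FILE = """\
# hypothetical metadata line: station id
# hypothetical metadata line: river name
date,\tMEAN,\tn.available
1990-12-31,\t12.5,\t365
1991-12-31,\t10.0,\t360
"""


def read_year_table(text):
    """Parse the table part of the text, skipping the '#' metadata lines."""
    return pd.read_csv(
        io.StringIO(text),
        comment="#",          # drop metadata lines
        sep=r",\s*",          # comma followed by optional whitespace
        engine="python",      # needed for a regular-expression separator
        parse_dates=["date"],
        index_col="date",
    )


df = read_year_table(EXAMPLE_YEAR_FILE)
```

The resulting dataframe could then be written to disk with `df.to_csv(...)`.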

## From gridded data to catchment timeseries
We do not have catchment-scale data on precipitation, potential evaporation and temperature. Therefore, we use global gridded products of these variables (several data products are available to choose from). To do analyses at the catchment scale, we need to convert these gridded products into catchment timeseries.
To do this, we calculate the mean values of the gridcells that fall within the catchment shapes. The procedure is defined by the functions in the *f_grid_to_catchments.py* file.
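
The averaging idea can be illustrated with plain NumPy. This is a simplified sketch and not the implementation in *f_grid_to_catchments.py*: it assumes that a boolean mask marking the gridcells inside the catchment is already available, and the function name `catchment_mean` is hypothetical.

```python
import numpy as np


def catchment_mean(grid, mask):
    """Average a (time, lat, lon) grid over the cells where mask is True.

    grid: array of shape (time, lat, lon)
    mask: boolean array of shape (lat, lon), True inside the catchment
    Returns one mean value per time step.
    """
    grid = np.asarray(grid, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("no gridcells fall within the catchment")
    # Boolean indexing keeps only the selected cells: shape (time, n_cells)
    return np.nanmean(grid[:, mask], axis=1)


# Synthetic example: 2 time steps on a 2 x 3 grid, 2 cells in the catchment
grid = np.arange(12).reshape(2, 2, 3)
mask = np.array([[True, False, False],
                 [False, True, False]])
series = catchment_mean(grid, mask)
```

Note that this sketch takes an unweighted mean. On a regular latitude-longitude grid the cell area changes with latitude, so an area-weighted mean may be preferable for large catchments.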

In [17]:
# make output directories
if not os.path.exists(f'{work_dir}/output/forcing_timeseries/raw'):
    os.makedirs(f'{work_dir}/output/forcing_timeseries/raw')
if not os.path.exists(f'{work_dir}/output/forcing_timeseries/processed'):
    os.makedirs(f'{work_dir}/output/forcing_timeseries/processed')

In [18]:
# define directories 
SHAPE_DIR = Path(f'{work_dir}/data/shapes/') # dir of shapefiles
NC4_DIR = Path(f'{work_dir}/data/forcing/') # dir of netcdf forcing files
OUT_DIR = Path(f'{work_dir}/output/forcing_timeseries/raw') # output dir

The conversion from grid to catchment is computationally expensive. Therefore, we run this conversion for all catchments in parallel using the *ThreadPool* class from the *pathos* package (https://pathos.readthedocs.io/en/latest/pathos.html#module-pathos.threading).
With the *construct_lists_for_parallel_function* function from *f_grid_to_catchments.py* we create lists that contain all combinations of shapefile, netcdf-file and output-directory, together with an operator list. These lists are the input for the *run_function_parallel* function from *f_grid_to_catchments.py*, which returns timeseries of the catchment mean values of precipitation (P), potential evaporation (Ep) and temperature (T).

In [15]:
# Construct lists for parallel run
(shapefile_list, netcdf_list, operator_list, output_dir_list) = construct_lists_for_parallel_function(NC4_DIR, SHAPE_DIR, OUT_DIR)

In [16]:
# run function parallel
run_function_parallel(shapefile_list, netcdf_list, operator_list, output_dir_list)

[None, None, None, None, None, None, None, None, None]
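
The pattern behind these two cells can be sketched as follows. To keep the example free of extra dependencies it uses `ThreadPoolExecutor` from the standard library instead of the pathos `ThreadPool`, and all function names (`build_task_lists`, `dummy_worker`, `run_parallel`) and file names are hypothetical stand-ins, not the module's own code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def build_task_lists(shapefiles, netcdf_files, out_dir):
    """Build parallel lists covering every shapefile / netcdf-file combination."""
    combos = list(product(shapefiles, netcdf_files))
    shape_list = [shape for shape, _ in combos]
    nc_list = [nc for _, nc in combos]
    out_list = [out_dir] * len(combos)
    return shape_list, nc_list, out_list


def dummy_worker(shapefile, netcdf_file, out_dir):
    """Stand-in for the real conversion: a real worker would write a file to out_dir."""
    return None


def run_parallel(worker, *arg_lists, n_threads=4):
    """Call worker once per task, taking one argument from each list."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(worker, *arg_lists))


# Hypothetical file names: 3 catchments x 3 forcing variables = 9 tasks
lists = build_task_lists(["a.shp", "b.shp", "c.shp"],
                         ["P.nc", "Ep.nc", "T.nc"], "out")
results = run_parallel(dummy_worker, *lists)
```

The result list holds one entry per task. The nine `None` values in the output above would be consistent with three catchments times three forcing variables, where each task writes its result to disk and returns nothing; this is an inference from the output, not something confirmed by the scripts.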

The output of the *run_function_parallel* function contains daily timeseries of P, Ep and T for all catchments. Here we postprocess these data to get dataframes that combine Ep, P and T as daily, monthly and yearly timeseries, climatologies and mean values (stored as *csv* files). This postprocessing is done in the *process_forcing_timeseries* function defined in *f_postprocess_timeseries.py*.

In [20]:
# define input directory
fol_in=f'{work_dir}/output/forcing_timeseries/raw'
# define output directory
fol_out=f'{work_dir}/output/forcing_timeseries/processed'

# define variables
var = ['Ep','P','T']

# run process_forcing_timeseries (defined in f_postprocess_timeseries.py) for all catchments in catch_id_list
for catch_id in catch_id_list:
    process_forcing_timeseries(catch_id,fol_in,fol_out,var)
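
The kind of aggregation performed in this step can be sketched with pandas. This is an illustration on synthetic data, not the code of *process_forcing_timeseries*; the function name `aggregate_sums` is hypothetical, and summing is assumed to be the appropriate aggregation for fluxes such as P and Ep (for T a mean would be used instead).

```python
import pandas as pd

# Synthetic daily precipitation: 1 mm/day for 2001 and 2002 (no leap years)
days = pd.date_range("2001-01-01", "2002-12-31", freq="D")
daily_p = pd.Series(1.0, index=days, name="P")


def aggregate_sums(daily):
    """Return monthly sums, yearly sums, monthly climatology and the daily mean."""
    idx = daily.index
    # Monthly totals, indexed by (year, month)
    monthly = daily.groupby([idx.year, idx.month]).sum()
    monthly.index.names = ["year", "month"]
    # Yearly totals, indexed by year
    yearly = daily.groupby(idx.year).sum()
    # Climatology: mean monthly total for each calendar month
    climatology = monthly.groupby(level="month").mean()
    return monthly, yearly, climatology, daily.mean()


monthly, yearly, climatology, mean_value = aggregate_sums(daily_p)
```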

## Catchment descriptor variables
For the global root zone storage capacity estimation, we need to calculate catchment descriptor variables. These descriptors can be climatological variables (e.g. mean precipitation (p_mean); seasonality of precipitation (si_p); time lag between maximum P and Ep (phi)) or landscape variables (e.g. mean tree cover (tc); mean elevation (h_mean)). A detailed list of all the descriptors considered is provided here xxxxx.\
To calculate the catchment descriptor variables we use the *catch_characteristics* function from the *f_catch_characteristics.py* file. In this function you specify the variables of interest, the catchment IDs and your input and output folders. Then, based on all the timeseries generated in the preceding steps, it returns a table with the catchment descriptor variables for all your catchments (saved as csv in your *work_dir/catchment_characteristics.csv*).

In [25]:
# define in and output folder
fol_in=f'{work_dir}/output/'
fol_out=f'{work_dir}/output/'

# define variables of interest
var=['p_mean','ep_mean','q_mean','t_mean','ai','rc','ea_wb','si_p','si_ep','phi','st','tc','ntc','nonveg','start_year','end_year']

# run catch_characteristics (defined in f_catch_characteristics.py) for the catchments in your catch_id_list
catch_characteristics(var, catch_id_list, fol_in, fol_out)

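| Unnamed: 0 | p_mean | ep_mean | q_mean | t_mean | ai | rc | ea_wb | si_p | si_ep | phi | st | tc | ntc | nonveg | start_year | end_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| br_0000495 | 5.985058 | 3.635537 | 3.21034 | 27.17695 | 1.646265 | | | 0.610597 | 0.051158 | 6 | | | | | | |
| fr_0000326 | 3.540341 | 2.105863 | 1.701488 | 10.755472 | 1.681183 | | | 0.36535 | 0.303236 | 6 | | | | | | |
| us_0002247 | 4.276623 | 2.636243 | 3.9137 | 16.772313 | 1.622242 | | | 0.179611 | 0.364593 | 6 | | | | | | |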
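
Some of these descriptors are simple ratios of the long-term means. The sketch below is not taken from *f_catch_characteristics.py*. The aridity index definition (p_mean divided by ep_mean) is consistent with the ai values in the table above; the definitions of the runoff coefficient (rc) and the water-balance evaporation (ea_wb) are assumptions based on common usage, since both columns are empty in the table. All function names are hypothetical.

```python
def aridity_index(p_mean, ep_mean):
    """ai = P / Ep (matches the ai column of the table above)."""
    return p_mean / ep_mean


def runoff_coefficient(q_mean, p_mean):
    """rc = Q / P (assumed definition, not confirmed by the module)."""
    return q_mean / p_mean


def evaporation_water_balance(p_mean, q_mean):
    """ea_wb = P - Q (assumed definition, not confirmed by the module)."""
    return p_mean - q_mean


# Values of catchment br_0000495 from the table above
ai = aridity_index(5.985058, 3.635537)
rc = runoff_coefficient(3.21034, 5.985058)
ea_wb = evaporation_water_balance(5.985058, 3.21034)
```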
