# Main run script

This notebook contains the main procedure to calculate global root zone storage capacities.

### 1. Getting started
First, import all the required packages.

```python
# import packages
import glob
import os
import calendar
from datetime import datetime, timedelta
from pathlib import Path

import numpy as np
import pandas as pd
```

Here we import all the Python functions defined in the scripts *f_catch_characteristics.py*, *f_preprocess_discharge.py* and *f_sr_calculation.py*.

```python
# import python functions
from f_catch_characteristics import *
from f_preprocess_discharge import *
from f_sr_calculation import *
```

### 2. Define working directory
Here we define the working directory, where all the scripts and data are saved. Make sure that this working directory contains the following subdirectories with the input data:\
/work_dir/data/forcing/*netcdf forcing files*\
/work_dir/data/shapes/*catchment shapefiles*\
/work_dir/data/gsim_discharge/*gsim discharge timeseries*


```python
# define your working directory (replace this path with your own)
work_dir = Path("/work/users/vanoorschot/fransje/scripts/GLOBAL_SR/global_sr_module")
```

Here we create the output directory inside your working directory. The same pattern is used throughout the remainder of this module to create directories.

```python
# make output directory
if not os.path.exists(f'{work_dir}/output'):
    os.makedirs(f'{work_dir}/output')
```
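The check-then-create pattern above can also be written with `pathlib`, which creates missing parent directories and silently accepts an existing directory. The helper name `make_dir` below is hypothetical and not part of the module; it is a minimal sketch of an equivalent alternative.

```python
import tempfile
from pathlib import Path

def make_dir(path):
    """Create `path` and any missing parents; do nothing if it already exists."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    return path

# usage sketch on a temporary directory standing in for work_dir
with tempfile.TemporaryDirectory() as tmp:
    out = make_dir(Path(tmp) / "output" / "discharge" / "timeseries")
    print(out.is_dir())  # True
    make_dir(out)  # calling it a second time is harmless
```

Because `exist_ok=True` replaces the explicit `os.path.exists` check, nested output folders such as *output/discharge/timeseries* need only one call.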

### 3. Make lists of catchment IDs
The module calculates catchment root zone storage capacities for a large sample of catchments. Here we extract the catchment IDs from the shapefile names and save them in a .txt file for later use in the scripts.

```python
# list the filenames of catchment shapefiles (sorted for a reproducible order)
shape_dir = Path(f'{work_dir}/data/shapes/')
shapefiles = sorted(glob.glob(f"{shape_dir}/*.shp"))

# loop over the catchment shapefiles and extract the catchment ID from each filename
catch_id_list = []
for i in shapefiles:
    catch_id_list.append(Path(i).name.split('.')[0])

# save the catchment ID list in your output directory
np.savetxt(f'{work_dir}/output/catch_id_list.txt', catch_id_list, fmt='%s')
```
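The saved *catch_id_list.txt* holds one ID per line, so later scripts can reload it with NumPy. The sketch below uses a hypothetical helper `load_catch_ids`; `ndmin=1` keeps the result a list even when the file contains only a single catchment.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_catch_ids(txt_file):
    """Read catchment IDs written by np.savetxt(..., fmt='%s'), one per line."""
    # ndmin=1 returns a 1-D array even when the file holds a single ID
    ids = np.loadtxt(txt_file, dtype=str, ndmin=1)
    return ids.tolist()

# round trip through a temporary file
with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "catch_id_list.txt"
    np.savetxt(f, ["br_0000495", "fr_0000326"], fmt="%s")
    print(load_catch_ids(f))  # ['br_0000495', 'fr_0000326']
```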

### 4. Preprocess GSIM discharge data

The GSIM yearly discharge timeseries are stored in *.year* files. Tables 3 and 4 in https://essd.copernicus.org/articles/10/787/2018/ explain the column names in detail. Here we preprocess these data into readable *.csv* files for each catchment with the function *preprocess_gsim_discharge*, defined in *f_preprocess_discharge.py*. For each catchment, this function generates one file with the yearly discharge timeseries and one file with the catchment characteristics.
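To illustrate what reading such a file involves, here is a minimal sketch with pandas. It assumes a layout of `#`-prefixed metadata lines followed by comma-separated columns that include `date` and `MEAN`; check this against Tables 3 and 4 of the GSIM paper, and note that `read_gsim_year` is a hypothetical helper, not the module's *preprocess_gsim_discharge*. The toy input uses made-up values.

```python
import io

import pandas as pd

def read_gsim_year(file_or_buffer):
    """Read a GSIM-style yearly index file into a DataFrame indexed by date.

    Assumption: metadata lines start with '#', followed by comma-separated
    columns that include a 'date' column.
    """
    # comment='#' makes pandas skip the metadata lines entirely
    df = pd.read_csv(file_or_buffer, comment="#", skipinitialspace=True)
    df.columns = [c.strip() for c in df.columns]
    df["date"] = pd.to_datetime(df["date"])
    return df.set_index("date")

# toy input with made-up values, only to show the assumed layout
toy = io.StringIO(
    "# metadata line (illustrative)\n"
    "# another metadata line\n"
    '"date","MEAN","n.missing","n.available"\n'
    "2000-12-31,3.2,0,366\n"
    "2001-12-31,2.9,5,360\n"
)
q = read_gsim_year(toy)
print(q["MEAN"])
```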

```python
# make output directories
if not os.path.exists(f'{work_dir}/output/discharge/timeseries'):
    os.makedirs(f'{work_dir}/output/discharge/timeseries')

if not os.path.exists(f'{work_dir}/output/discharge/characteristics'):
    os.makedirs(f'{work_dir}/output/discharge/characteristics')
```

```python
# define folder with discharge timeseries data
fol_in = f'{work_dir}/data/gsim_discharge/'

# define output folder
fol_out = f'{work_dir}/output/discharge/'

# run preprocess_gsim_discharge function (defined in f_preprocess_discharge.py) for all catchments in catch_id_list
for catch_id in catch_id_list:
    preprocess_gsim_discharge(catch_id, fol_in, fol_out)
```

### 5. From gridded data to catchment timeseries
For this step, go to the notebook *run_script_grid_to_catchments*. This part requires a different Python environment and is therefore run in a separate notebook. Its output is saved in *work_dir/output/forcing_timeseries*.
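The core idea of turning gridded forcing into a catchment timeseries is a weighted average over the grid cells that overlap the catchment. The sketch below is an assumption for illustration, not the code of *run_script_grid_to_catchments*: it weights each cell by its relative area on a regular latitude-longitude grid (proportional to cos(latitude)) times the fraction of the cell inside the catchment. `catchment_mean` is a hypothetical name.

```python
import numpy as np

def catchment_mean(field, lats, cell_fraction):
    """Area-weighted catchment average of one time step of a lat-lon grid.

    field         : 2-D array (lat, lon) of a forcing variable
    lats          : 1-D array of cell-centre latitudes in degrees
    cell_fraction : 2-D array (lat, lon), fraction of each cell inside the catchment (0-1)
    """
    # on a regular lat-lon grid, cell area scales with cos(latitude)
    w = np.cos(np.deg2rad(lats))[:, None] * cell_fraction
    valid = ~np.isnan(field)  # ignore missing cells
    return float(np.sum(field[valid] * w[valid]) / np.sum(w[valid]))

# two latitude rows: equator (weight 1) and 60 degrees N (weight 0.5)
lats = np.array([0.0, 60.0])
field = np.array([[1.0, 1.0], [4.0, 4.0]])
frac = np.ones((2, 2))
print(catchment_mean(field, lats, frac))  # approximately 2.0
```

Applying this function to every time step of the forcing data yields one value per day for the catchment.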

### 6. Google Earth Engine for catchment characteristics
For this step, go to the notebook *run_script_earthengine*. This part requires a different Python environment and is therefore run in a separate notebook. Its output is saved in *work_dir/output/earth_engine_timeseries*.

### 7. Calculate root zone storage capacity
Here we calculate the catchment root zone storage capacity (Sr) from catchment water balances. First, the catchment root zone storage deficits are computed as the cumulative difference between precipitation (P) and transpiration (Et). Second, Sr is calculated from an extreme value analysis of these storage deficits. A detailed description of this method can be found here xxxxxx.

Here we use the *run_sd_calculation* and *run_sr_calculation* functions from the *f_sr_calculation* file. The outputs of both the storage deficit and the Sr calculations are saved in *work_dir/output/sr_calculation*.
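The cumulative-difference idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation in *run_sd_calculation*: the deficit is counted as a positive number that grows with Et minus P and is floored at zero (a wet spell refills the root zone, but storage cannot exceed full). The annual maxima of this series are the input for the extreme value analysis; here they are grouped by calendar year, although a hydrological year may be more appropriate. `storage_deficit` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def storage_deficit(p, et):
    """Running root zone storage deficit (positive = drier).

    p, et : pandas Series of daily precipitation and transpiration (same units).
    Assumption: the deficit cannot drop below zero.
    """
    sd = np.zeros(len(p))
    current = 0.0
    for t, (p_t, et_t) in enumerate(zip(p.to_numpy(), et.to_numpy())):
        current = max(0.0, current + et_t - p_t)  # accumulate Et - P
        sd[t] = current
    return pd.Series(sd, index=p.index)

# usage on a toy five-day period
idx = pd.date_range("2001-01-01", periods=5, freq="D")
p = pd.Series([0.0, 0.0, 5.0, 0.0, 0.0], index=idx)
et = pd.Series([2.0, 2.0, 2.0, 2.0, 2.0], index=idx)
sd = storage_deficit(p, et)
print(sd.tolist())  # [2.0, 4.0, 1.0, 3.0, 5.0]

# annual maximum deficits, the input for the extreme value analysis
annual_max = sd.groupby(sd.index.year).max()
```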


```python
# make output directories
if not os.path.exists(f'{work_dir}/output/sr_calculation/sd_catchments'):
    os.makedirs(f'{work_dir}/output/sr_calculation/sd_catchments')

if not os.path.exists(f'{work_dir}/output/sr_calculation/sr_catchments'):
    os.makedirs(f'{work_dir}/output/sr_calculation/sr_catchments')
```

Calculate the storage deficits using the *run_sd_calculation* function from *f_sr_calculation*.

```python
# define directories
pep_dir = f'{work_dir}/output/forcing_timeseries/processed/daily'
q_dir = f'{work_dir}/output/discharge/timeseries'
out_dir = f'{work_dir}/output/sr_calculation/sd_catchments'

# maximum interception capacity
Si_max = 2.5

# run sd calculation for all catchments in catch_id_list
for catch_id in catch_id_list:
    run_sd_calculation(catch_id, pep_dir, q_dir, out_dir, Si_max)
```
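*Si_max* is the maximum interception capacity. A common way to use such a parameter, assumed here and not confirmed to match *run_sd_calculation*, is a simple interception bucket: rain fills the store, anything above *Si_max* reaches the soil as effective precipitation, and the store then evaporates at up to the potential evaporation Ep. `interception` is a hypothetical name.

```python
import numpy as np

def interception(p, ep, si_max=2.5):
    """Split daily P into effective precipitation and interception evaporation.

    Simple bucket with capacity si_max (same units as P), an assumed scheme.
    Returns (p_eff, e_i) as numpy arrays.
    """
    p = np.asarray(p, dtype=float)
    ep = np.asarray(ep, dtype=float)
    si = 0.0                                # current interception storage
    p_eff = np.zeros(len(p))
    e_i = np.zeros(len(p))
    for t in range(len(p)):
        si += p[t]                          # rain fills the store
        p_eff[t] = max(0.0, si - si_max)    # overflow reaches the soil
        si -= p_eff[t]
        e_i[t] = min(ep[t], si)             # evaporate from the store
        si -= e_i[t]
    return p_eff, e_i

p_eff, e_i = interception([10.0, 0.0, 1.0], [1.0, 1.0, 1.0], si_max=2.5)
print(p_eff.tolist(), e_i.tolist())  # [7.5, 0.0, 0.0] [1.0, 1.0, 1.0]
```

In this toy run, 0.5 of the 11 units of rain is still held in the store at the end, which closes the water balance.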

Calculate Sr using the *run_sr_calculation* function from *f_sr_calculation*.

```python
# define directories
sd_dir = f'{work_dir}/output/sr_calculation/sd_catchments'
out_dir = f'{work_dir}/output/sr_calculation/sr_catchments'

# define return periods (years)
RP = [2, 3, 5, 10, 20, 30, 40, 50, 60]

# run sr calculation for all catchments in catch_id_list
for catch_id in catch_id_list:
    run_sr_calculation(catch_id, RP, sd_dir, out_dir)
```
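One widely used extreme value approach for turning annual maximum storage deficits into Sr for each return period in *RP* is to fit a Gumbel distribution. The sketch below assumes a Gumbel fit by the method of moments; it is not necessarily what *run_sr_calculation* does. `gumbel_return_levels` is a hypothetical name and the annual maxima are made-up values.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def gumbel_return_levels(annual_max, return_periods):
    """Return levels of a Gumbel distribution fitted with the method of moments.

    annual_max     : 1-D array of annual maximum storage deficits
    return_periods : iterable of return periods T in years
    """
    x = np.asarray(annual_max, dtype=float)
    beta = np.std(x, ddof=1) * np.sqrt(6) / np.pi   # scale parameter
    mu = np.mean(x) - EULER_GAMMA * beta            # location parameter
    T = np.asarray(return_periods, dtype=float)
    # quantile with non-exceedance probability 1 - 1/T
    return mu - beta * np.log(-np.log(1.0 - 1.0 / T))

RP = [2, 3, 5, 10, 20, 30, 40, 50, 60]
toy_max = [80.0, 95.0, 110.0, 70.0, 130.0, 100.0]  # made-up annual maxima
sr = gumbel_return_levels(toy_max, RP)
print(dict(zip(RP, sr.round(1))))
```

Longer return periods give larger Sr, reflecting that vegetation sized for rarer droughts needs a larger root zone buffer.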


### 8. Catchment descriptor variables
For the global root zone storage capacity estimation, we need catchment descriptor variables. These can be climatological variables (e.g. mean precipitation (p_mean), seasonality of precipitation (si_p), time lag between maximum P and Ep (phi)) or landscape variables (e.g. mean tree cover (tc), mean elevation (h_mean)). A detailed list of all the descriptors considered is provided here xxxxx.\
To calculate the catchment descriptor variables we use the *catch_characteristics* function from *f_catch_characteristics.py*. In this function you specify the variables of interest, the catchment IDs and your input and output folders. Based on all the timeseries generated in the preceding steps, it returns a table with the descriptor variables for all your catchments, which is also saved as a csv file in *work_dir/catchment_characteristics.csv*.
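A few of the climatological descriptors can be sketched from daily P and Ep series. In the example output of *catch_characteristics*, `ai` equals `p_mean / ep_mean`, and the sketch uses that ratio. The definitions of `si_p` (seasonality index after Walsh and Lawler, 1981) and `phi` (circular lag in months between the wettest month and the month of maximum Ep) are assumptions for illustration and may differ from *f_catch_characteristics.py*. `climate_descriptors` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def climate_descriptors(p, ep):
    """Sketch of climate descriptors from daily P and Ep (pandas Series with a
    DatetimeIndex). The si_p and phi definitions are assumptions."""
    p_mean = p.mean()
    ep_mean = ep.mean()

    # mean total per calendar month (monthly climatology)
    p_mon = p.resample("MS").sum()
    clim_p = p_mon.groupby(p_mon.index.month).mean()
    ep_mon = ep.resample("MS").sum()
    clim_ep = ep_mon.groupby(ep_mon.index.month).mean()

    # seasonality index (Walsh and Lawler): 0 = uniform, larger = more seasonal
    r = clim_p.sum()
    si_p = np.abs(clim_p - r / 12).sum() / r

    # circular lag in months between the wettest month and the month of max Ep
    lag = abs(int(clim_p.idxmax()) - int(clim_ep.idxmax()))
    phi = min(lag, 12 - lag)

    return {"p_mean": p_mean, "ep_mean": ep_mean,
            "ai": p_mean / ep_mean, "si_p": si_p, "phi": phi}

# synthetic data: P peaks in July, Ep peaks in January
idx = pd.date_range("2001-01-01", "2002-12-31", freq="D")
p = pd.Series(np.where(idx.month == 7, 11.0, 1.0), index=idx)
ep = pd.Series(np.where(idx.month == 1, 11.0, 1.0), index=idx)
print(climate_descriptors(p, ep)["phi"])  # 6
```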

```python
# define in and output folder
fol_in = f'{work_dir}/output/'
fol_out = f'{work_dir}/output/'

# define variables of interest
var = ['p_mean', 'ep_mean', 'q_mean', 't_mean', 'ai', 'si_p', 'si_ep', 'phi', 'tc', 'ntc', 'nonveg']

# run catch_characteristics (defined in f_catch_characteristics.py) for the catchments in your catch_id_list
catch_characteristics(var, catch_id_list, fol_in, fol_out)
```

| catch_id | p_mean | ep_mean | q_mean | t_mean | ai | si_p | si_ep | phi | tc | ntc | nonveg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| br_0000495 | 5.985058 | 3.635537 | 3.21034 | 27.17695 | 1.646265 | 0.610597 | 0.051158 | 6 | 40.389229 | 47.804183 | 11.806588 |
| fr_0000326 | 3.540341 | 2.105863 | 1.701488 | 10.755472 | 1.681183 | 0.36535 | 0.303236 | 6 | 28.631765 | 56.481887 | 14.886349 |
| us_0002247 | 4.276623 | 2.636243 | 3.9137 | 16.772313 | 1.622242 | 0.179611 | 0.364593 | 6 | 39.580416 | 49.29533 | 11.124253 |
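If the csv file stores the catchment IDs as an unnamed first column (an assumption about the file layout), reading it with plain `pd.read_csv` yields a column called `Unnamed: 0`. Passing `index_col=0` restores the IDs as the index. The sketch below reads the three rows shown above from a string and also checks that `ai` matches `p_mean / ep_mean` in these rows.

```python
import io

import numpy as np
import pandas as pd

# the three rows shown above, laid out as an assumed csv with an unnamed index column
csv_text = """,p_mean,ep_mean,q_mean,t_mean,ai,si_p,si_ep,phi,tc,ntc,nonveg
br_0000495,5.985058,3.635537,3.21034,27.17695,1.646265,0.610597,0.051158,6,40.389229,47.804183,11.806588
fr_0000326,3.540341,2.105863,1.701488,10.755472,1.681183,0.36535,0.303236,6,28.631765,56.481887,14.886349
us_0002247,4.276623,2.636243,3.9137,16.772313,1.622242,0.179611,0.364593,6,39.580416,49.29533,11.124253
"""

# index_col=0 turns the first column into the index instead of 'Unnamed: 0'
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
df.index.name = "catch_id"

# consistency check: ai equals p_mean / ep_mean up to rounding of the stored values
print(np.allclose(df["p_mean"] / df["ep_mean"], df["ai"], atol=1e-5))  # True
```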
