# Interactive Moving Window Kriging Pipeline
---
The preprocessing pipeline is executed in the following sequence. It assumes that masks have been generated. If a custom mask is required, work through the `mask.ipynb` file to see how the default masks were generated.

1. Place netCDF models into `climpyrical/data/model_inputs`. Ensemble models must have:
    * lat, lon, rlat, rlon and a 2D data field variable
2. Place station files into `climpyrical/data/station_inputs`. Input stations must have:
    * A data column with the design value of interest, in the same units as the ensemble model. The units must appear in parentheses after the data variable name, e.g. "RL50 (kPa)" or "HDD (degC-day)".
    * latitude and longitude columns
    * Additional columns, like province name, elevation, and station name are optional
3. The data produced in the pipeline will go in various subdirectories of `climpyrical/data/results/` using the PCIC design value naming standards (outlined below)
    * figures will be in `climpyrical/data/results/figures/`
    * tables will be in `climpyrical/data/results/TableC2/`
    * netCDF files in `climpyrical/data/results/netcdf/`
    * intermediate notebooks for troubleshooting will be in `climpyrical/data/results/intermediate/`
    * preprocessed stations and models are in `climpyrical/data/results/intermediate/` subdirectories
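
The station file requirements in step 2 can be checked before the pipeline is run. The following is a minimal sketch using only the Python standard library. `check_station_header` is a hypothetical helper, not part of climpyrical, and the coordinate column names `lat` and `lon` are assumptions; adjust them to match the actual station files.

```python
import csv
import io
import re

# Matches names such as "RL50 (kPa)" or "HDD (degC-day)":
# a variable name followed by non-empty units in parentheses.
DV_COLUMN = re.compile(r"^\S.*\([^()]+\)$")


def check_station_header(header, station_dv, coords=("lat", "lon")):
    """Return a list of problems found in a station file's column names.

    `coords` holds the assumed names of the coordinate columns.
    """
    problems = []
    if DV_COLUMN.match(station_dv) is None:
        problems.append(f"{station_dv!r} has no units in parentheses")
    if station_dv not in header:
        problems.append(f"data column {station_dv!r} is missing")
    for coord in coords:
        if coord not in header:
            problems.append(f"coordinate column {coord!r} is missing")
    return problems


# Usage on an in-memory CSV; a real file would be opened with open(path, newline="")
sample = io.StringIO("station_name,lat,lon,RL50 (kPa)\nA,49.2,-123.1,1.8\n")
header = next(csv.reader(sample))
print(check_station_header(header, "RL50 (kPa)"))
```

An empty list means the header satisfies the checks; optional columns such as station name or elevation are ignored.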

```
climpyrical/data/results
├── netcdf
│   └── 
├── figures
│   ├── 
├── intermediate
│   ├── notebooks
│   │   ├── model_log_{design value}.ipynb
│   │   ├── plotting_log_{design value}.ipynb
│   │   ├── MWOrK_log_{design value}.ipynb
│   │   ├── station_log_{design value}.ipynb
│   ├── preprocessed_netcdf
│   │   ├── {design value}.nc
│   └── preprocessed_stations
│       └── {design value}.csv
└── TableC2
     └── {design value}_TableC2.csv
```
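
If these output directories are not already present in a checkout, they can be created up front. This is a small sketch that assumes the layout shown in the tree above; `make_results_tree` is a hypothetical helper, and the pipeline notebooks may already create some or all of these directories themselves.

```python
import tempfile
from pathlib import Path

# Subdirectories taken from the tree above, relative to climpyrical/data/results
RESULT_SUBDIRS = [
    "netcdf",
    "figures",
    "intermediate/notebooks",
    "intermediate/preprocessed_netcdf",
    "intermediate/preprocessed_stations",
    "TableC2",
]


def make_results_tree(results_root):
    """Create the results subdirectories under results_root and return their paths."""
    root = Path(results_root)
    created = []
    for sub in RESULT_SUBDIRS:
        path = root / sub
        # exist_ok=True leaves existing directories (and their contents) untouched
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstrated in a temporary directory; in a checkout the root would be
# climpyrical/data/results
with tempfile.TemporaryDirectory() as tmp:
    paths = make_results_tree(Path(tmp) / "results")
    all_created = all(p.is_dir() for p in paths)
    print(all_created)
```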

# Executing the pipeline
---

`pipeline.ipynb` is the central processing code that calls the other notebooks to perform the kriging. The best way to run the pipeline is to open `pipeline.ipynb` in Jupyter and edit the input config file. `pipeline_parallel.ipynb` does the same thing as `pipeline.ipynb` but processes the design values in the config file in parallel, so it is faster when the config contains more than one design value.

## With Jupyter
Run each cell of the pipeline notebook in Jupyter with `Shift+Enter`.

## With the CLI (Command Line Interface)
`pipeline.ipynb` and `pipeline_parallel.ipynb` take `.yml` files as configurations. If the configuration is in `config.yml`, run the pipeline from the command line with:
```bash
papermill pipeline.ipynb pipeline_log.ipynb -p config_yml config.yml
```
This produces an executed copy of `pipeline.ipynb`, saved as `pipeline_log.ipynb`.
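
The same command can be assembled from Python, which is convenient when launching runs from a script. The sketch below only builds the argument list by default and calls papermill through `subprocess` when `dry_run=False`. `run_pipeline` is a hypothetical wrapper, and the `_log.ipynb` naming for `pipeline_parallel.ipynb` as well as its use of the same `config_yml` parameter are assumptions modelled on the command above.

```python
import shlex
import subprocess
from pathlib import Path


def run_pipeline(notebook="pipeline.ipynb", config_yml="config.yml", dry_run=True):
    """Build (and optionally run) the papermill command for a pipeline notebook."""
    # Assumed naming: pipeline.ipynb -> pipeline_log.ipynb
    log_notebook = f"{Path(notebook).stem}_log.ipynb"
    cmd = ["papermill", notebook, log_notebook, "-p", "config_yml", config_yml]
    if not dry_run:
        # check=True raises CalledProcessError if papermill exits with an error
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: print the commands without executing anything
print(shlex.join(run_pipeline()))
print(shlex.join(run_pipeline("pipeline_parallel.ipynb")))
```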

## Configuration Example
---
Here is an example configuration for design values RL50 and mean RH (%).

Place it into a `.yml` file, and point the `config_yml` variable in `pipeline.ipynb` at that file.
```yaml
# Parameterize the pipeline
# The pipeline will iterate through each parent tree
# in dvs and provide the associated parameters

# Which notebooks to use in the pipeline
steps: [
"preprocess_model.ipynb", 
"stations.ipynb", 
"MWOrK.ipynb", 
"plots.ipynb", 
"nbcc_stations.ipynb", 
"combine_tables.ipynb"
]

n_jobs: 2

# To be placed in climpyrical/
paths:
    output_notebook_path: /data/results/intermediate/notebooks/
    preprocessed_model_path: /data/results/intermediate/preprocessed_netcdf/
    preprocessed_stations_path: /data/results/intermediate/preprocessed_stations/
    output_reconstruction_path: /data/results/netcdf/
    output_tables_path: /data/results/TableC2/
    output_figure_path: /data/results/figures/
    mask_path: /data/masks/canada_mask_rp.nc
    north_mask_path: /data/masks/canada_mask_north_rp.nc
    nbcc_loc_path: /data/station_inputs/NBCC_2020_new_coords.xlsm

nbcc_correction: True
dvs:
    RL50:
        station_dv: "RL50 (kPa)"
        station_path: /data/station_inputs/sl50_rl50_for_maps.csv
        input_model_path: /data/model_inputs/snw_rain_CanRCM4-LE_ens35_1951-2016_max_rl50_load_ensmean.nc
        medians: 
            value: 0.3
            action: multiply
        fill_glaciers: True
        
    RHann:
        station_dv: "mean RH (%)"
        station_path: /data/station_inputs/rh_annual_mean_10yr_for_maps.csv
        input_model_path: /data/model_inputs/hurs_CanRCM4-LE_ens15_1951-2016_ensmean.nc
        medians:
            value: None
            action: None
        fill_glaciers: True
```

A full example can be found in `config_example.yml` or `config_example_means.yml`.
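
A configuration can be sanity-checked before a long run. The sketch below works on the config after it has been loaded into a Python dictionary (for example with PyYAML's `yaml.safe_load`). `check_config` and `REQUIRED_DV_KEYS` are hypothetical; the key set is simply the set of keys that appear under each design value in the example above, and the pipeline itself may require more or fewer.

```python
# Keys that appear under each design value in the example configuration (assumed required)
REQUIRED_DV_KEYS = {
    "station_dv",
    "station_path",
    "input_model_path",
    "medians",
    "fill_glaciers",
}


def check_config(config):
    """Return a list of problems found in a loaded pipeline configuration."""
    problems = []
    for key in ("steps", "paths", "dvs"):
        if key not in config:
            problems.append(f"missing top-level key {key!r}")
    for step in config.get("steps", []):
        if not str(step).endswith(".ipynb"):
            problems.append(f"step {step!r} is not a notebook")
    for name, params in config.get("dvs", {}).items():
        missing = REQUIRED_DV_KEYS - set(params)
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
    return problems


# A reduced version of the example above, written as the equivalent dictionary
example = {
    "steps": ["preprocess_model.ipynb", "stations.ipynb"],
    "paths": {"mask_path": "/data/masks/canada_mask_rp.nc"},
    "dvs": {
        "RL50": {
            "station_dv": "RL50 (kPa)",
            "station_path": "/data/station_inputs/sl50_rl50_for_maps.csv",
            "input_model_path": "/data/model_inputs/snw_rain_CanRCM4-LE_ens35_1951-2016_max_rl50_load_ensmean.nc",
            "medians": {"value": 0.3, "action": "multiply"},
            "fill_glaciers": True,
        }
    },
}
print(check_config(example))
```

Note that a YAML loader reads the bare `None` in the `RHann` entry as the string `"None"`, not as a null value; YAML's own null is written `null` or `~`.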

If a user wishes to run only the CanRCM4 model preprocessing and the station processing steps for RL50 and mean RH, remove the other steps from `steps` while keeping the rest of the configuration the same as before.

```yaml
# Which notebooks to use in the pipeline
steps: [
"preprocess_model.ipynb", 
"stations.ipynb", 
]
```

Note, however, that the notebooks are meant to be run in sequence, since each one produces outputs in `climpyrical/data/results/` that are used by a subsequent notebook. When running only a subset of steps, users must make sure that the outputs of the earlier steps already exist; this will not be the case the first time the pipeline is run. Likewise, when running a notebook that relies on existing data, be sure that the most up-to-date data is in `climpyrical/data/results/`.
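
One way to guard against missing prerequisites is to check for the intermediate files before starting a later step. The sketch below assumes that the later notebooks (for example `MWOrK.ipynb`) read the preprocessed model and station files at the locations given in the results tree shown earlier; which notebook depends on which file is an assumption, and `missing_inputs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def missing_inputs(results_root, design_values):
    """Return the expected intermediate files that do not exist yet."""
    root = Path(results_root) / "intermediate"
    expected = []
    for dv in design_values:
        # File names follow the {design value}.nc / {design value}.csv pattern
        expected.append(root / "preprocessed_netcdf" / f"{dv}.nc")
        expected.append(root / "preprocessed_stations" / f"{dv}.csv")
    return [path for path in expected if not path.exists()]


# Demonstrated in a temporary directory where only the RL50 model file exists
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "intermediate" / "preprocessed_netcdf"
    model_dir.mkdir(parents=True)
    (model_dir / "RL50.nc").touch()
    missing_names = [p.name for p in missing_inputs(tmp, ["RL50", "RHann"])]
    print(missing_names)
```

If the returned list is not empty, run the earlier steps for those design values first.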