# Study Configuration

range_driver lets users configure settings to match their own study conditions. For example, users can configure range_driver to fetch specific environmental data, aggregate detection events by a specific time scale, set appropriate geospatial boundaries, etc. These settings are stored in configuration files, which can be saved and shared independently of any analysis code.

## Intro to Configuration Files
In range_driver, configuration options are organized in YAML files &mdash; text files that contain key/value pairs of strings, numbers, and lists. Keys can be nested via indentation, to organize information hierarchically. 

The example YAML file below tells range_driver [1] detections should be aggregated using a 60 minute bin size, [2] when (Jan 2021) and where (off the west coast of Vancouver Island) the study was conducted, and [3] we will fetch environmental data for three variables from [ERA5](https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5).

```yaml
settings:
    time_bin_length: 60Min
bounds:
    start: 2021-01-01
    end: 2021-02-01
    north: 49
    south: 48
    east: -126
    west: -127
data:
    sources: 
        'load_waveheight': 'era5'
        'load_wind_u': 'era5'
        'load_wind_v': 'era5'
        
```
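
Once parsed, a YAML file like this becomes nested Python dictionaries. The sketch below writes out by hand what a typical YAML parser (such as PyYAML's `safe_load`) would produce for the example above. That range_driver's `yload` returns exactly this structure is an assumption. Note that unquoted dates such as `2021-01-01` are typically parsed as date objects, while `60Min` stays a string.

```python
import datetime

# Hand-written equivalent of the parsed example file (illustrative only).
example_config = {
    "settings": {"time_bin_length": "60Min"},
    "bounds": {
        "start": datetime.date(2021, 1, 1),
        "end": datetime.date(2021, 2, 1),
        "north": 49,
        "south": 48,
        "east": -126,
        "west": -127,
    },
    "data": {
        "sources": {
            "load_waveheight": "era5",
            "load_wind_u": "era5",
            "load_wind_v": "era5",
        }
    },
}

# Nested keys are reached by indexing one level at a time.
bin_length = example_config["settings"]["time_bin_length"]
bounds = example_config["bounds"]
study_days = (bounds["end"] - bounds["start"]).days
variables = sorted(example_config["data"]["sources"])
print(bin_length, study_days, variables)
```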

To have these options take effect in range_driver, you'll need to (1) load the configuration into Python, (2) prepare it for use with range_driver, and (3) use the configuration object during the instantiation of the Detections object (this last step is covered in-depth in the [load detections](LoadDetections.ipynb) tutorial).  

```python
import range_driver as rd

# Load raw configuration
config = rd.yload(rd.load_file("path/to/example_config.yaml"))

# Prepare (and update) config for range_driver
rd.prepare_config(config)

# Pass config into the Detections object instantiation
dets = rd.Detections(config, do_processing=True)
```

## What can be Configured?
There are six top-level keys that can be configured.

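| Key                     | Description             |
|-------------------------|-------------------------|
| [`reader`](#Reader)     | Reader options to use for loading detection data & metadata |
| [`data`](#Data)         | Environmental data sources (e.g. ERA5), tidal data files, and calculated columns |
| [`bounds`](#Bounds)     | The geospatial & temporal boundaries of the study |
| [`file_map`](#File-Map) | Locations of any custom environmental data |
| [`view`](#View)         | Reporting and plotting options |
| [`settings`](#Settings) | Other settings, such as the time bin length used to aggregate detections |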

### Reader

range_driver currently supports two readers, `otn` and `nsog`. Both require the locations of key data files to work properly, but they take slightly different files. Each is described below.

#### `otn`
The `otn` reader is range_driver's primary reader. To configure the `otn` reader, you need to provide the names of three files (`detections_csv`, `otn_metadata`, and `vendor_tag_specs`) and the path to the directory containing them (`data_dir`). To learn more about the data and format of these three files, check out the [Data Setup]() tutorial.

In the code snippet below you can see an example of the otn reader configuration. This is the configuration used for the [Mahone Bay]() case study. 

```yaml
reader:
  otn:
    data_dir: '{repo_path}/data/MahoneBay2016/'
    detections_csv: range_test_raw.csv
    otn_metadata: metadata-from-initial-range-test.xls
    vendor_tag_specs: tag-summary-mahone-bay-range-test.xls
```

#### `nsog`
The `nsog` reader is a custom reader created for the [Northern Strait of Georgia]() case study. It is a slight variation on the `otn` reader in which only the data directory and two file names, `detections_csv` and `vendor_tag_specs`, are provided. Read more about the requirements for data files provided to the `nsog` reader in the [Data Setup]() tutorial.

The code snippet below shows how the `nsog` reader is configured for the [Northern Strait of Georgia]() case study.

```yaml
reader:
  nsog:
    data_dir: '{repo_path}/data/NSOG_Jan2018'
    detections_csv: nsog_detections_jan2018.csv
    vendor_tag_specs: nsog_tags.xlsx
```

#### Adding your own reader
If none of the current range_driver readers work for your study, you can create your own custom reader. To do this, you will need to add code in two places in [`data_prep/__init__.py`]().

First, you will need to add a custom function to [`data_prep/__init__.py`](). This function needs to return a dataframe containing detections data and a dictionary containing metadata. Any required arguments to this function should be provided in the configuration file (see example below). 
```yaml
reader:
  MY_CUSTOM_READER:
    custom_input1: VALUE1
    custom_input2: VALUE2
```

Second, you will need to add an additional `elif` clause to the `read_via_config()` function. This clause should check if your custom reader has been specified in the configuration file. If so, it should make a call to the custom function you defined earlier. 

```python
elif 'MY_CUSTOM_READER' in rdconf.keys():
    return read_custom_data(**rdconf.MY_CUSTOM_READER)
```
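
Putting the two steps together, the sketch below shows how the keys under `MY_CUSTOM_READER` end up as keyword arguments of the custom function. Everything here is illustrative: `read_custom_data` and `read_via_config_sketch` are hypothetical stand-ins, the configuration is a plain dictionary, and a list of rows takes the place of the dataframe so the example runs with the standard library alone. A real reader would return a dataframe and a metadata dictionary as described above.

```python
def read_custom_data(custom_input1, custom_input2):
    """Hypothetical custom reader returning (detections, metadata)."""
    # Stand-in for a dataframe of detections; a real reader would load files here.
    detections = [{"source": custom_input1}]
    metadata = {"note": custom_input2}
    return detections, metadata


def read_via_config_sketch(rdconf):
    """Simplified dispatch: call the reader whose name appears in the config."""
    if "MY_CUSTOM_READER" in rdconf:
        # The keys under MY_CUSTOM_READER must match the function's parameter names.
        return read_custom_data(**rdconf["MY_CUSTOM_READER"])
    raise ValueError("No supported reader found in configuration")


# Equivalent of the `reader` section of the YAML example above.
reader_config = {
    "MY_CUSTOM_READER": {"custom_input1": "VALUE1", "custom_input2": "VALUE2"}
}
detections, metadata = read_via_config_sketch(reader_config)
print(detections, metadata)
```

Because the arguments are unpacked by name, a misspelled key in the configuration file would surface as a `TypeError` when the reader is called.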

### Data
**TO BE WRITTEN**

```yaml
data:
  sources:
    load_wavedir: 'era5'
    load_waveheight: 'era5'
    load_waveperiod: 'era5'
    load_wind_uv: 'era5'
    load_wind_u: 'era5'
    load_wind_v: 'era5'
  tidal:
    tidal_times_ods: '{repo_path}/data/MahoneBay2016/TidalTimes/Extracted_tidal_times_for_Halifax_2016_2sheets.ods'
    tidal_times_output_csv: "{repo_path}/data/MahoneBay2016/TidalTimes/output_tidal_times_for_Halifax_2016_flat.csv"
    tidal_interpolation_output_csv: "{repo_path}/data/MahoneBay2016/TidalTimes/output_tidal_times_for_Halifax_2016_5min.csv"
    year: 2016
  calculated_columns:
  - water_vel
  - water_vel_bottom
```

### Bounds
**TO BE WRITTEN**

```yaml
bounds:
  description: Mahone Bay, NS
  start: 2016-03-09
  end: 2016-03-11
  lat_center: 44.5541333
  lon_center: -64.17682
  n_offset: 0.2
  s_offset: 1
  e_offset: 1
  w_offset: 1
```


### File Map
**TO BE WRITTEN**
```yaml
file_map:
  data_dir: '{repo_path}/data/MahoneBay2016/HYCOM'
  salinity_bottom: bottom_sal_20160309_20160404_expt_56.3.nc
  salinity: column_sal_20160309_20160404_expt_56.3.nc
  water_v: column_v_vel_20160309_20160404_expt_56.3.nc
  water_v_bottom: bottom_v_vel_20160309_20160404_expt_56.3.nc
  surf_el: surf_el_20160309_20160404_expt_56.3.nc
  water_temp_bottom: bottom_temp_20160309_20160404_expt_56.3.nc
  water_temp: column_temp_20160309_20160404_expt_56.3.nc
  water_u_bottom: bottom_u_vel_20160309_20160404_expt_56.3.nc
  water_u: column_u_vel_20160309_20160404_expt_56.3.nc
```

### View

The `view` key configures reporting and plotting options. The example below shows how the `view` options can be configured; each of the available keys within `view` is explained in the subsections that follow.


```yaml
view:
  show_dr_plots: true
  tidal: false
  colnames:
    water_vel: 'water velocity'
    water_vel_bottom: 'water velocity bottom'
  column: 'water_vel'
  params:
    t2bin_stepsize: 0.05
    t2bin_max: 2
    min_detections: 100
    scatter_alpha: .5
  rcParams:
    figure.figsize: [16, 5]
    font.size: 11
    figure.max_open_warning: 50
```

#### `show_dr_plots`
This is a boolean (true/false) that specifies whether detection rate plots should be shown. These plots show how the detection rate fluctuates over the study period for each receiver/transmitter group, and they are useful for data screening.
```yaml
view:
  show_dr_plots: true
```

#### `tidal`
This is a boolean (true/false) that specifies whether a tidal report should be created when `report_tidal()` is used. If there is no tidal data provided, then this should be false.

```yaml
view:
  tidal: false
```

#### `columns`
**Is this option used in the range_driver code??**

#### `colnames` 
This allows you to specify alternate column names to be used in report text. For example, the YAML configuration below would use "water velocity" in any reports that reference the `water_vel` column. 
```yaml
view:
  colnames:
    water_vel: 'water velocity'
    water_vel_bottom: 'water velocity bottom'
```

#### `column`
Specify the column to be further analyzed in the report of all group plots. For example, the configuration below indicates that the water velocity column will be singled out and compared against detection performance in the reports.

```yaml
view:
  column: 'water_vel'
```

#### `params`
Specify other parameters specific to plotting and reporting. 
**Review these parameters.** There are a few (like t2bin_stepsize, and t2bin_max) that don't seem to change and perhaps aren't very useful. Also if we do keep them, they need to be much better explained in this documentation. 

* `t2bin_stepsize`: ... 
* `t2bin_max`: ...
* `min_detections`: The minimum number of detections a receiver/transmitter pair must have to be included in reports. 
* `scatter_alpha`: The alpha value (opacity) to use for points in scatter plots (0 = fully transparent, 1 = fully opaque).

```yaml
view:
  params:
    t2bin_stepsize: 0.05
    t2bin_max: 2
    min_detections: 100 
    scatter_alpha: .5
```
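
To make `min_detections` concrete, here is a minimal sketch of the kind of filtering it describes, assuming the threshold is inclusive and that detections are counted per receiver/transmitter pair. The data, the `pairs_to_report` name, and the pure-Python approach are all illustrative assumptions, not range_driver's implementation.

```python
from collections import Counter


def pairs_to_report(detections, min_detections):
    """Return the (receiver, transmitter) pairs with at least `min_detections` detections."""
    counts = Counter(detections)
    return sorted(pair for pair, n in counts.items() if n >= min_detections)


# Each tuple is one detection event: (receiver, transmitter). Made-up data.
detections = [("R1", "T1")] * 120 + [("R1", "T2")] * 40 + [("R2", "T1")] * 100
kept = pairs_to_report(detections, min_detections=100)
print(kept)
```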
    
#### `rcParams`
Allows customizations to matplotlib rcParams. In matplotlib, rcParams are used to customize the look and feel of plots. You can specify the same parameters in range_driver using the configuration options. For more information on what customizations are available, see the [matplotlib documentation](https://matplotlib.org/stable/tutorials/introductory/customizing.html). 

```yaml
view:
  rcParams:
    figure.figsize: [16, 5]
    font.size: 11
    figure.max_open_warning: 50
```


### Settings
There are a couple of other settings that can be specified using the `settings` key. ... 

```yaml
settings:
  otn_transmitter_patch: false
  auto_dr: true
  time_bin_length: 60Min
  show_details: false
  
```
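
`time_bin_length` controls how detections are aggregated in time. As a rough illustration of what a 60 minute bin means, the sketch below floors made-up detection timestamps to the start of their bin and counts them. It uses only the standard library; how range_driver interprets `60Min` internally (the string resembles a pandas frequency alias) is an assumption, and `floor_to_bin` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta


def floor_to_bin(timestamp, bin_length):
    """Round a timestamp down to the start of the bin that contains it."""
    epoch = datetime(1970, 1, 1)
    whole_bins = (timestamp - epoch) // bin_length  # complete bins since the epoch
    return epoch + whole_bins * bin_length


bin_length = timedelta(minutes=60)  # stands in for `time_bin_length: 60Min`

# Made-up detection timestamps.
detection_times = [
    datetime(2021, 1, 1, 0, 5),
    datetime(2021, 1, 1, 0, 40),
    datetime(2021, 1, 1, 1, 10),
]
counts_per_bin = Counter(floor_to_bin(t, bin_length) for t in detection_times)
for bin_start, count in sorted(counts_per_bin.items()):
    print(bin_start, count)
```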

## Using Multiple Configuration Files
Often, it will be helpful to re-use configurations between multiple studies. This can be done in two ways. 

The first option is to duplicate & update the configuration file from a previous study. If you want to update an option across multiple studies, you will need to update each study's configuration file accordingly. 

The second option (which we recommend for all but the simplest setups) is to use multiple, layered configuration files. One configuration file (e.g. `base_config.yaml`) holds configuration options shared across studies. Then, a configuration file is created to hold the study-specific configuration options for each individual study. Using layered configuration files means that you can make changes across multiple studies by updating a single file, while still allowing study-specific option specification. 

For example, suppose you were studying the differences in detection performance between two different regions, using the same time period & environmental data sources. You'd want to standardize some of the configuration options across studies (time, data sources, etc.), but you'd also need to specify different geospatial boundaries. In this case, you could use the following three configuration files.

1. You'd have a `base_config.yaml` to specify the shared configuration options. 

    ```yaml
    settings:
      time_bin_length: 60Min
    bounds:
      top: 0
      bottom: 0
      start: 2021-01-01
      end: 2021-04-30
    data:
      sources:
        load_wavedir: 'era5'
        load_waveheight: 'era5'
        load_waveperiod: 'era5'
        load_wind_uv: 'era5'
        load_wind_u: 'era5'
        load_wind_v: 'era5'
    ```

2. You'd have a `comox_config.yaml` file to specify the bounds for the first study location (in this case near Comox, BC).  
    ```yaml
    bounds:
      north: 49.9
      south: 49.4
      east: -124.5
      west: -125
    ```

3. And finally, you'd have a `port_renfrew_config.yaml` file to specify the bounds for the second study location (in this case near Port Renfrew, BC).  

    ```yaml
    bounds:
      north: 48.7
      south: 48.2
      east: -124.25
      west: -124.75
    ```

Then, you can load & prepare the `config` dictionary using nearly the same steps as when using a single configuration file. The only difference is that you need to load both configurations and then merge them together before you can prepare them using `prepare_config()`.

```python
import pprint as pp

import range_driver as rd

# Load each configuration file individually
base_config = rd.yload(rd.load_file("configs/base_config.yaml"))
study_config = rd.yload(rd.load_file("configs/study_config.yaml"))

# Combine configurations (study_config will overwrite any options that are also set in base_config)
config = rd.merge_dicts(base_config, study_config)

# Use the merged configuration to prepare config
rd.prepare_config(config)

# See the prepared configuration
pp.pprint(config)
```
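
The layered example relies on nested keys being combined rather than replaced wholesale: the base file sets `bounds.start` and `bounds.end`, while each study file adds `bounds.north`, `bounds.south`, and so on. The sketch below illustrates that kind of recursive merge with a hypothetical `deep_merge` helper and hand-written dictionaries. It is not range_driver's `merge_dicts`, whose exact behavior should be checked against the library.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which `override` is layered on top of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge their contents key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the study-specific value wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Shared options (a subset of the base configuration above).
base_config = {
    "settings": {"time_bin_length": "60Min"},
    "bounds": {"start": "2021-01-01", "end": "2021-04-30"},
}
# Study-specific options, keyed by study name.
study_configs = {
    "comox": {"bounds": {"north": 49.9, "south": 49.4, "east": -124.5, "west": -125}},
    "port_renfrew": {"bounds": {"north": 48.7, "south": 48.2, "east": -124.25, "west": -124.75}},
}

configs = {name: deep_merge(base_config, study) for name, study in study_configs.items()}
print(configs["comox"]["bounds"])
```

Each merged configuration keeps the shared time period while carrying its own geospatial bounds, and the base dictionary itself is left unchanged.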

## Choosing Study Boundaries
**TO BE WRITTEN**

Introduce the use of mapping to see the bounds you've chosen, and verifying the boundaries once you've retrieved data (to ensure the bounds don't include too many or too few data node locations).