# Helpers :: NWM 'Standard Suite (v1)' Benchmark

These are custom-defined Python objects which bundle common functions to be run against time-series data. 

These statistics are adapted from the originals in <https://github.com/USGS-python/hytest-evaluation-workflows/blob/main/gallery/streamflow/02_nwm_benchmark_analysis.ipynb> 

<details>
  <summary>Guide to pre-requisites and learning outcomes...&lt;click to expand&gt;</summary>
  
  <table>
    <tr>
      <td>Pre-Requisites
      <td>To get the most out of this notebook, you should already have an understanding of these topics: 
        <ul>
        <li>pre-req one
        <li>pre-req two
        </ul>
    <tr>
      <td>Expected Results
      <td>At the end of this notebook, you should be able to: 
        <ul>
        <li>outcome one
        <li>outcome two
        </ul>
  </table>
</details>

## NOTE:  
This notebook describes how to use the 'helper' library that we've put together to bundle
the following statistics.  You are not obligated to use it for outside work; it is a convenience interface to these
standard metrics, meant to make these tutorials easier.  You can compute each metric with your own code, 
with the custom code in the above-mentioned reference notebook, or with an external 
library ([`hydroeval`](https://github.com/thibhlln/hydroeval) is one such).


## The Metrics:
This suite of metrics describes the NWM benchmark:
| Metric                              | Reference                                                           |
| ----- | ----- |
| Nash-Sutcliffe efficiency (NSE)     | Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I—A discussion of principles. Journal of hydrology, 10(3), 282-290. https://www.sciencedirect.com/science/article/pii/0022169470902556?via%3Dihub
| Kling-Gupta efficiency (KGE)        | Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009).  Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of hydrology, 377(1-2), 80-91. https://www.sciencedirect.com/science/article/pii/S0022169409004843 |
| logNSE                              | Oudin, L., Andréassian, V., Mathevet, T., Perrin, C., & Michel, C. (2006). Dynamic averaging of rainfall‐runoff model simulations from complementary model parameterizations. Water Resources  Research, 42(7).|
| percent bias                        | A measure of the mean tendency of simulated values to be greater or less than associated observed values, units of percent |
| ratio of standard deviation         | standard deviation of simulated values divided by the standard deviation of observed values |
| Pearson Correlation                 | K. Pearson (1896, 1900, 1920)                                       |
| Spearman Correlation                | Charles Spearman (1904, 1910)                                       |
| percent bias in midsegment slope of the flow-duration curve (FDC) between Q20-Q70 | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9).      |
| percent bias in FDC low-segment volume (Q0-Q30) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9).      |
| percent bias in FDC high-segment volume (Q98-Q100) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9).      |


This notebook will briefly describe each of the above metrics, and show some results using sample data. The specific code to implement each metric is available in the helper library; some of the important details of coding these metrics are included as notes in this notebook.  These notes are intended for those who would like to adapt the code or create similar metrics using their own code.

In [None]:
# Get access to helper library
%run ../setup.ipynb
import matplotlib.pyplot as plt


# Loading Data
First thing we need is some sample data... 


In [None]:
import pandas as pd
df = pd.read_csv(r"TestData.csv", index_col='date').dropna()
df

# Import Benchmark Code
Import the specific benchmark you want to use.  We have a few helper benchmarks available (see [this notebook](./xx_Benchmarks_General.ipynb) for more information on how benchmark helpers can be used). 


In [None]:
from HyTEST.benchmarks.NWMStandardSuite import NWMStandardSuite as Benchmark
nwm = Benchmark.from_df(df, 'obs', 'nwm')


These statements imported a symbol (`Benchmark`) from the `HyTEST.benchmarks.NWMStandardSuite` helper library. 

That name/symbol is a Python class. Objects created from a class can hold both data and the methods which operate on that data. Because those things all go together logically, we've bundled them in this way. 

Note that the new object is called `nwm`.  We can now use that to access the data, as well as common stats computed over that specific data. 

The `NWMStandardSuite` benchmark is an extension of the interface described in the 'General Benchmark' notebook.  Consult that document for some details on why/how objects are used here instead of static procedures.   This notebook will focus on the ten specific statistics specified in the above table. 


## Notes on each benchmark component:

### NSE
| Metric | Reference |
| ----- | ----- |
| Nash-Sutcliffe efficiency (NSE)     | Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I—A discussion of principles. Journal of hydrology, 10(3), 282-290. https://www.sciencedirect.com/science/article/pii/0022169470902556?via%3Dihub


In [None]:
# Example: 
nwm.NSE()

**Special note on NSE** &mdash; A component within the calculation of NSE is _variance_ computed over the 
observed values. Different Python libraries calculate this in different ways, so some of the details matter
when calculating.  
In particular, `numpy` assumes that `ddof` (Delta Degrees of Freedom) 
is [zero](https://numpy.org/doc/stable/reference/generated/numpy.var.html), while others (notably `pandas`) 
assume a `ddof` of [one](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.var.html).  

Without explicit instructions, these two common libraries will return different results for the '_same_' calculation. 
If you should decide to build your own functions involving variance, it will matter how you calculate that value: 
```python
df['obs'].var()  # using pandas
```
will yield a **different** result than
```python
np.var(df['obs']) # using numpy
```
The key (in either case) is to **explicitly** define the `ddof`: 
```python
df['obs'].var(ddof=0)
# or
np.var(df['obs'], ddof=0)
```
The **NSE** benchmark helper we provide explicitly uses `ddof=0` to compute variance.
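
To make the effect of `ddof` concrete, here is a minimal sketch of an NSE calculation written directly in `numpy`. The function name `nse` and the sample values are made up for illustration; this is not the helper's actual source, only one way to write the same formula with `ddof=0` stated explicitly.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - MSE / variance of observations (ddof=0)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mse = np.mean((sim - obs) ** 2)
    # ddof=0 is stated explicitly so the result does not depend on library defaults
    return 1.0 - mse / np.var(obs, ddof=0)

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up observations
sim = np.array([1.1, 1.9, 3.2, 3.8, 5.3])  # made-up simulated values

print(np.var(obs, ddof=0))  # population variance (the numpy default)
print(np.var(obs, ddof=1))  # sample variance (the pandas default)
print(nse(obs, sim))
```

With `ddof=0`, dividing the mean squared error by the variance is the same as dividing the sum of squared errors by the sum of squared deviations from the observed mean. Switching to `ddof=1` would scale the ratio by `(n - 1) / n` and shift the score.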

### KGE
| Metric                              | Reference                                                           |
| ----- | ----- |
| Kling-Gupta efficiency (KGE)        | Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009).  Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of hydrology, 377(1-2), 80-91. https://www.sciencedirect.com/science/article/pii/S0022169409004843 |

In [None]:
# Example:
nwm.KGE()
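
For readers who want to build their own version, the sketch below follows the decomposition in Gupta et al. (2009): a correlation term `r`, a variability ratio `alpha`, and a bias ratio `beta`. The function name `kge` is hypothetical, and whether the helper uses exactly this form is an assumption.

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta efficiency in the form given by Gupta et al. (2009)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]     # linear correlation
    alpha = np.std(sim) / np.std(obs)   # ratio of standard deviations
    beta = np.mean(sim) / np.mean(obs)  # ratio of means
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up observations
print(kge(obs, obs))      # a perfect match scores 1
print(kge(obs, 2 * obs))  # perfectly correlated, but variability and mean are doubled
```

Because `alpha` is a ratio of two standard deviations, the choice of `ddof` cancels out here as long as the same value is used for both series.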

### logNSE
| Metric                              | Reference                                                           |
| ----- | ----- |
| logNSE                              | Oudin, L., Andréassian, V., Mathevet, T., Perrin, C., & Michel, C. (2006). Dynamic averaging of rainfall‐runoff model simulations from complementary model parameterizations. Water Resources  Research, 42(7).|

In [None]:
# Example 
nwm.logNSE()

**Special Note for logNSE** &mdash; This metric takes the natural log of data values before computing NSE. The data is '_sanitized_' before the log is computed so that log does not attempt to take the logarithm of zero or a negative number. A few strategies are available concerning how to treat these observations (we could drop observations with zero or negative values, for example). The helper we use sanitizes this by way of a `clip()` function with a threshold of **0.01**. Any data value below this threshold is temporarily adjusted for the purposes of the logarithm; `log()` never operates on a value below the clip value. 

 This threshold is adjustable if you would prefer to clip at a different value:


In [None]:

nwm.logNSE(threshold=1.0) 

In [None]:
# Note that you can't clip to zero or a negative number
nwm.logNSE(threshold=-0.45)
# The benchmark will assume that's a typo and fall back to the default threshold of 0.01

Lastly &mdash; this data sanitization is handled differently within other libraries, notably `hydroeval`.  That package uses a slightly more complex strategy to ensure that `log()` gets clean data to work on. The `hydroeval` developer 
references [Pushpalatha et al. (2012)](https://doi.org/10.1016/j.jhydrol.2011.11.055) regarding their strategy.  The details of that method are beyond the scope of this notebook; just know that if you compare results with `hydroeval`, this metric may yield slightly different results. 
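
The clipping strategy described in this section can be sketched as follows. The function name `log_nse` and the sample values are illustrative, and the fallback for a non-positive threshold only mirrors the behavior described above; it is not a copy of the helper's code.

```python
import numpy as np

def log_nse(obs, sim, threshold=0.01):
    """NSE computed on log-transformed flows, clipping values below `threshold`."""
    if threshold <= 0:
        threshold = 0.01  # assumed fallback, mirroring the behavior described above
    # Clip first, so log() never sees zero or a negative number
    log_obs = np.log(np.clip(np.asarray(obs, dtype=float), threshold, None))
    log_sim = np.log(np.clip(np.asarray(sim, dtype=float), threshold, None))
    mse = np.mean((log_sim - log_obs) ** 2)
    return 1.0 - mse / np.var(log_obs, ddof=0)

obs = np.array([0.0, 0.5, 2.0, 10.0])    # made-up flows, including a zero
sim = np.array([0.005, 0.4, 2.5, 8.0])   # made-up simulated flows

print(log_nse(obs, sim))                 # default threshold of 0.01
print(log_nse(obs, sim, threshold=1.0))  # a coarser clip changes the score
```

Note that clipping alters the low end of both series, so the chosen threshold influences the score whenever flows fall below it.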

### Percent Bias
| Metric                              | Reference                                                           |
| ----- | ----- |
| percent bias                        | A measure of the mean tendency of simulated values to be greater or less than associated observed values, units of percent |

In [None]:
nwm.pbias()

**Special Note on pbias** -- as relates to `hydroeval` and other libraries.
* The result we compute here mimics the behavior of the `hydroGOF` R package, and is the result of the code provided in 
the [model notebook](https://github.com/USGS-python/hytest-evaluation-workflows/blob/main/gallery/streamflow/02_nwm_benchmark_analysis.ipynb) 
mentioned above. 
* This differs from the `hydroeval` Python package in an important way.  
* `hydroGOF` (and this benchmark) returns:  <br> $100 \times \frac{\sum_{i=1}^{n}(\hat{x}_{i} - x_{i})}{\sum_{i=1}^{n}x_{i}}$ <br>where $x$ is 'observed' and $\hat{x}$ is 'modeled'

* `hydroeval`, on the other hand, returns:  <br> $100 \times \frac{\sum_{i=1}^{n}(x_{i} - \hat{x}_{i})}{\sum_{i=1}^{n}x_{i}}$<br>Note
  that the numerator has switched the ordering of $x$ and $\hat{x}$. 

The end result is that these two libraries return values of opposite sign. `hydroGOF` returns a positive value if the 'modeled' tends to be higher than 'observed', while `hydroeval` will return a negative number in this case. The absolute value of the two calculations is the same. 

The developer for `hydroeval` points to [this document](https://elibrary.asabe.org/abstract.asp?aid=23153) as the source of the math used in that package. 

This code library uses the same ordering as `hydroGOF`, which is described in Eq. A1 of Yilmaz et al. (2008).
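
The sign difference is easy to see with a few numbers. The two functions below are hypothetical stand-ins that simply implement the two formulas given above; they do not call `hydroGOF` or `hydroeval`.

```python
import numpy as np

def pbias_sim_minus_obs(obs, sim):
    """Percent bias with (modeled - observed) in the numerator."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)

def pbias_obs_minus_sim(obs, sim):
    """Percent bias with (observed - modeled) in the numerator."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

obs = np.array([10.0, 20.0, 30.0])  # made-up observations
sim = np.array([12.0, 22.0, 32.0])  # the model runs high by 2 at every step

print(pbias_sim_minus_obs(obs, sim))  # positive: the model over-predicts
print(pbias_obs_minus_sim(obs, sim))  # same magnitude, opposite sign
```

When comparing results across packages, check which ordering is in use before interpreting the sign.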


### FDC - Flow Duration Curves
| Metric | Reference |
| ----- | ----- |
| percent bias in midsegment slope of the flow-duration curve (FDC) between Q20-Q70 | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9).      |
| percent bias in FDC low-segment volume (Q0-Q30) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9).      |
| percent bias in FDC high-segment volume (Q98-Q100) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9).      |



**pBiasFMS** - %bias over mid-segment slope. This is the percent bias of the **slope** of the FDC in the mid-segment part of the curve. See equation A2 of Yilmaz et al. (2008).

$\%BiasFMS = 100 \times \cfrac{ [\log(QS_{m1}) - \log(QS_{m2})] - [\log(QO_{m1}) - \log(QO_{m2})] }
                         { [\log(QO_{m1}) - \log(QO_{m2})] }$
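
As a rough sketch of the equation above, the code below treats `m1 = 0.2` and `m2 = 0.7` as exceedance probabilities (taken from the Q20-Q70 range named in the table) and reads the corresponding flows with `np.quantile`. Both the function name `pbias_fms` and the use of quantile interpolation are assumptions; the helper may select points on the FDC differently.

```python
import numpy as np

def pbias_fms(obs, sim, m1=0.2, m2=0.7):
    """Percent bias in the slope of the FDC mid-segment (sketch)."""
    # The flow exceeded with probability p is the (1 - p) quantile of the series
    qo_m1, qo_m2 = np.quantile(obs, [1 - m1, 1 - m2])
    qs_m1, qs_m2 = np.quantile(sim, [1 - m1, 1 - m2])
    obs_slope = np.log(qo_m1) - np.log(qo_m2)
    sim_slope = np.log(qs_m1) - np.log(qs_m2)
    return 100.0 * (sim_slope - obs_slope) / obs_slope

obs = np.arange(1.0, 12.0)      # made-up flows: 1, 2, ..., 11
print(pbias_fms(obs, 3 * obs))  # a constant multiplier leaves the log-slope unchanged
print(pbias_fms(obs, obs ** 2)) # squaring doubles the log-slope
```

Because the slope is computed in log space, this metric responds to changes in the shape of the curve rather than to a uniform scaling of the simulated flows.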

In [None]:
# Mid-Segment slope %bias
nwm.pBiasFMS()

In [None]:
# Plot FDC
fig, ax = plt.subplots(1, 1, figsize=(3, 2), dpi=300)
nwm.FDCplot(ax, segment='mid')
ax.set_title("Mid-Segment")
plt.show()


**pBiasFLV** - %bias in low-flow segment **volume**.  Note that in the low-flow segment, a log transform is used to increase sensitivity to very low flows.

$\%BiasFLV = -100 \times \cfrac{
    \displaystyle\sum_{l=1}^L[\log(QS_l) - \log(QS_L)] - 
    \displaystyle\sum_{l=1}^L[\log(QO_l) - \log(QO_L)]
    }{
        \displaystyle\sum_{l=1}^L[\log(QO_l) - \log(QO_L)]
    }$

Here $l = 1, \dots, L$ indexes the flows within the low-flow segment of the FDC, and $L$ is the index of the minimum flow.
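
A minimal sketch of the low-flow volume bias follows. It assumes the low-flow segment is the lowest 30% of each series (the Q0-Q30 range in the table), that all flows are strictly positive, and that each series is ranked independently to form its own FDC. The function name `pbias_flv` and the `lower` parameter are hypothetical.

```python
import numpy as np

def pbias_flv(obs, sim, lower=0.7):
    """Percent bias in FDC low-segment volume (sketch)."""
    def low_segment_volume(q):
        fdc = np.sort(np.asarray(q, dtype=float))[::-1]  # flows in descending order
        start = int(round(lower * len(fdc)))             # first index of the low segment
        seg = np.log(fdc[start:])
        return np.sum(seg - seg[-1])                     # seg[-1] is the log of the minimum flow
    vol_obs = low_segment_volume(obs)
    vol_sim = low_segment_volume(sim)
    return -100.0 * (vol_sim - vol_obs) / vol_obs

obs = np.arange(1.0, 11.0)       # made-up flows: 1, 2, ..., 10
print(pbias_flv(obs, obs))       # identical series: no bias
print(pbias_flv(obs, obs ** 2))  # squaring doubles the log-volume
```

A real series with zero flows would need the same kind of sanitizing discussed for logNSE before taking logarithms.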


In [None]:
# Low-Volume Segment %bias
nwm.pBiasFLV()

In [None]:
# Plot
fig, ax = plt.subplots(1, 1, figsize=(3, 2), dpi=300)
nwm.FDCplot(ax, segment='lo')
ax.set_title("Low-Flow Segment")
plt.show()



**pBiasFHV** - %bias in high-flow segment **volume**.  See equation A3 of Yilmaz et al. (2008).


$\%BiasFHV = 100 \times \cfrac{
    \displaystyle\sum_{h=1}^H(QS_h - QO_h)
    }{
    \displaystyle\sum_{h=1}^H QO_h
    }$
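
The high-flow volume bias can be sketched the same way. This version assumes the high-flow segment is the top 2% of each series (the Q98-Q100 range in the table) and that each series is ranked independently to form its own FDC. The function name `pbias_fhv` and the `upper` parameter are hypothetical.

```python
import numpy as np

def pbias_fhv(obs, sim, upper=0.02):
    """Percent bias in FDC high-segment volume (sketch)."""
    def high_segment(q):
        fdc = np.sort(np.asarray(q, dtype=float))[::-1]  # flows in descending order
        n_high = max(1, int(round(upper * len(fdc))))    # keep at least one value
        return fdc[:n_high]
    qo = high_segment(obs)
    qs = high_segment(sim)
    return 100.0 * np.sum(qs - qo) / np.sum(qo)

obs = np.arange(1.0, 101.0)       # made-up flows: 1, 2, ..., 100
print(pbias_fhv(obs, 1.1 * obs))  # peaks simulated 10% too high
```

No log transform is applied here, so the result is dominated by the largest flows in the record.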

In [None]:
# High-Volume Segment %bias
nwm.pBiasFHV()

In [None]:
# plot
fig, ax = plt.subplots(1, 1, figsize=(3, 2), dpi=300)
nwm.FDCplot(ax, segment='hi')
ax.set_title("High-Flow Segment")
plt.show()

With a little manipulation in `matplotlib`, all three FDC plots can be rendered in a single figure:

In [None]:
fig, ax = plt.subplots(1, 3, figsize=(9, 3), dpi=600)

nwm.FDCplot(ax[1], segment='mid')
ax[1].set_title("Mid-Segment")

nwm.FDCplot(ax[0], segment='lo')
ax[0].set_title("Low-Flow")

nwm.FDCplot(ax[2], segment='hi')
ax[2].set_title("High-Flow")


plt.show()

## Full NWM Standard Suite
A single call to the `Benchmark`'s `suite()` method can report the entire suite as one series:

In [None]:
nwm.suite()
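
Three metrics in the suite have no dedicated notes above: the ratio of standard deviations, Pearson correlation, and Spearman correlation. The sketch below shows one way to compute them with `numpy` alone. All function names are hypothetical, and the tie handling (average ranks) is an assumption about how Spearman correlation is computed in the helper.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # highest rank within each tie group
    mean_rank = last_rank - (counts - 1) / 2.0  # average rank of each tie group
    return mean_rank[inverse]

def r_sd(obs, sim):
    """Standard deviation of simulated values divided by that of observed values."""
    return np.std(sim) / np.std(obs)

def pearson(obs, sim):
    """Linear correlation between observed and simulated values."""
    return np.corrcoef(obs, sim)[0, 1]

def spearman(obs, sim):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return np.corrcoef(average_ranks(obs), average_ranks(sim))[0, 1]

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up observations
sim = obs ** 3                             # monotonic but non-linear response

print(r_sd(obs, 2 * obs))  # doubling the series doubles its spread
print(pearson(obs, sim))   # below 1: the relationship is not linear
print(spearman(obs, sim))  # 1: the ordering is fully preserved
```

The gap between the two correlations in this example is the reason both are reported: Pearson measures linear agreement, while Spearman only measures agreement in rank order.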