# D-Score Benchmark Usage
**NOTE**:
_This notebook is adapted from originals by Timothy Hodson and Rich Signell. See the upstream work at:_
* https://github.com/thodson-usgs/dscore
* https://github.com/USGS-python/hytest-evaluation-workflows/

This notebook demonstrates how to call the functions defined in the [D-Score Suite v1](./D-Score_Suite_v1.ipynb)
notebook, using a small sample dataset.

In [1]:
import pandas as pd
import numpy as np

## Sample Data

In [7]:
ds = pd.read_csv(r"./NWM_Benchmark_SampleData.csv", index_col='date', parse_dates=True).dropna()
print(len(ds.index), " Records")

12145  Records


In [8]:
ds.head()

| date | site_no | obs | nwm | nhm |
|---|---|---|---|---|
| 1983-10-01 | 1104200 | 1.121347 | 6.175417 | 1.469472 |
| 1983-10-02 | 1104200 | 1.214793 | 6.250417 | 1.848861 |
| 1983-10-03 | 1104200 | 0.872159 | 6.215833 | 2.169456 |
| 1983-10-04 | 1104200 | 0.419089 | 6.105 | 2.200083 |
| 1983-10-05 | 1104200 | 0.849505 | 5.9525 | 1.931588 |


## Import Benchmark Functions
The benchmark functions are defined in an [adjacent notebook](./D-Score_Suite_v1.ipynb). They are loaded here by
running that notebook with the `%run` magic in the following cell:

In [9]:
%run ./D-Score_Suite_v1.ipynb
# This defines the same functions in this notebook's namespace.

The functions are now available here, to run against our sample data:

In [16]:
# Mean Square Error
mse(ds['obs'], ds['nwm'])

55.73589185136414
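For reference, mean square error is the average of the squared differences between simulated and observed values. The sketch below is a stand-in written for illustration only: `mse_sketch` is a hypothetical name, the input values are made up, and the actual `mse` defined in the D-Score Suite notebook may differ in detail (for example, in how it handles missing values).

```python
import pandas as pd

def mse_sketch(obs, sim):
    """Mean of squared errors between two aligned Series (illustrative only)."""
    err = sim - obs
    return float((err ** 2).mean())

# Tiny made-up series, not drawn from the sample dataset
obs = pd.Series([1.0, 2.0, 3.0])
sim = pd.Series([1.0, 4.0, 1.0])

result = mse_sketch(obs, sim)  # squared errors are 0, 4, 4, so the mean is 8/3
print(result)
```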

In [17]:
seasonal_mse(ds['obs'], ds['nwm'])

winter    13.205368
spring    11.135375
summer    14.120221
fall      17.274927
dtype: float64
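Note that the four seasonal values sum to the total MSE reported above (13.205368 + 11.135375 + 14.120221 + 17.274927 ≈ 55.735891). This is consistent with each component being that season's sum of squared errors divided by the total number of records, rather than a per-season mean. The sketch below illustrates that idea under stated assumptions: `seasonal_mse_sketch` is a hypothetical name, the month-to-season mapping is assumed, and the data are synthetic. It is not the suite's implementation.

```python
import numpy as np
import pandas as pd

# Assumed month-to-season mapping (meteorological seasons); the suite may differ.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}
SEASON_ORDER = ["winter", "spring", "summer", "fall"]

def seasonal_mse_sketch(obs, sim):
    """Split total MSE into per-season contributions that sum to the total."""
    sq_err = (sim - obs) ** 2
    seasons = np.array([SEASON_OF_MONTH[m] for m in sq_err.index.month])
    # Dividing each seasonal sum by the overall record count keeps the parts additive
    parts = sq_err.groupby(seasons).sum() / len(sq_err)
    return parts.reindex(SEASON_ORDER, fill_value=0.0)

# Synthetic daily series for one year (not the sample dataset)
dates = pd.date_range("2000-01-01", "2000-12-31", freq="D")
rng = np.random.default_rng(0)
obs = pd.Series(rng.gamma(2.0, 1.0, len(dates)), index=dates)
sim = obs + rng.normal(0.0, 0.5, len(dates))

parts = seasonal_mse_sketch(obs, sim)
total = float(((sim - obs) ** 2).mean())
print(parts)
print("sum of parts:", parts.sum(), "total MSE:", total)
```

Because the components are additive, each one can be read as that season's share of the overall error.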

## Create Composite Benchmark
It is useful to combine several of these metrics into a single benchmark routine, which returns a pandas Series of the assembled metrics.

This 'wrapper' composite benchmark also applies any transforms to the data before calling the metric functions. In this case, the data are log-transformed after clipping values below 0.01, since the logarithm is undefined for zero and negative values.
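To see what the transform does before it is wrapped into a function, the sketch below applies a floor and then a natural log to a few made-up values. The 0.01 floor matches the value used in `compute_benchmark`; the `flow` values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values, including a zero and a value below the floor
flow = pd.Series([0.0, 0.005, 0.5, 12.0])

# clip() raises everything below 0.01 to 0.01; it does not drop any records
clipped = flow.clip(lower=0.01)
log_flow = np.log(clipped)

print(pd.DataFrame({"flow": flow, "clipped": clipped, "log_flow": log_flow}))
```

Every value at or below the floor maps to the same log value, so differences among very low values are discarded by this transform.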

In [18]:
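def compute_benchmark(df):
    """Return a Series of metrics for the log-transformed 'obs' and 'nwm' columns."""
    # Values below 0.01 (including zeros and negatives) are raised to 0.01
    # so that the log is defined; no records are dropped.
    obs = np.log(df['obs'].clip(lower=0.01))
    sim = np.log(df['nwm'].clip(lower=0.01))

    # Wrap the overall MSE in a one-element Series so it can be concatenated
    mse_ = pd.Series([mse(obs, sim)], index=["mse"], dtype='float64')
    return pd.concat([
        mse_,
        bias_distribution_sequence(obs, sim),
        seasonal_mse(obs, sim),
        quantile_mse(obs, sim),
    ])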

In [19]:
compute_benchmark(ds)

mse          0.874842
e_bias       0.409683
e_dist       0.224187
e_seq        0.241010
winter       0.057879
spring       0.033822
summer       0.396487
fall         0.386654
low          0.653889
below_avg    0.127766
above_avg    0.052214
high         0.040973
dtype: float64
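Each group of components in this output can be checked against the overall `mse`. The sketch below re-enters the printed values by hand (at their printed precision) and sums each group; the group labels in `groups` are informal names chosen here, not identifiers from the suite.

```python
import pandas as pd

# Values copied from the output above, at printed precision
result = pd.Series({
    "mse": 0.874842,
    "e_bias": 0.409683, "e_dist": 0.224187, "e_seq": 0.241010,
    "winter": 0.057879, "spring": 0.033822, "summer": 0.396487, "fall": 0.386654,
    "low": 0.653889, "below_avg": 0.127766, "above_avg": 0.052214, "high": 0.040973,
})

# Informal group labels (assumed for this check only)
groups = {
    "bias/distribution/sequence": ["e_bias", "e_dist", "e_seq"],
    "seasonal": ["winter", "spring", "summer", "fall"],
    "quantile": ["low", "below_avg", "above_avg", "high"],
}

sums = {name: float(result[keys].sum()) for name, keys in groups.items()}
for name, total in sums.items():
    print(f"{name}: sum={total:.6f}  difference from mse={total - result['mse']:+.6f}")
```

At the printed precision, the seasonal and quantile groups each sum to 0.874842, matching `mse`. The bias, distribution, and sequence components sum to 0.874880, which agrees with `mse` only approximately (a difference of about 4e-5).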