# d-score

## Overview
This lab 

Steps
1. Decompose your error into orthogonal components
2. Pass each component through a scoring function

In fact, the order doesn't matter, which is one of the keys to `d-score`'s flexibility.


## Select your objective
The first step in the `d-score` approach is to select an appropriate distribution to represent the error (or uncertainty) in your model predictions.
If this all sounds unfamiliar, relax;
we're really just asking "what is the best objective function for your model?"

For example, if the errors are normally distributed and *iid* (independent and identically distributed),
then the optimal objective function (or benchmark) is mean squared error. 
We'll review why shortly. 
However, for many models, the error distribution is neither normal nor iid.
Runoff models are a pertinent example.
Their errors tend to scale with the magnitude of flow (violating *iid*).
Logging the data can help remove that scaling effect,
which is what most hydrologists want,
even if their choice of objective says otherwise.
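
To see the scaling effect, here is a minimal sketch on synthetic data. The "flows" and the 20 percent multiplicative error are assumptions chosen for illustration, not real runoff observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "flows" spanning three orders of magnitude (hypothetical data).
flow = np.logspace(0, 3, 1000)
# Multiplicative error: each prediction is off by a random percentage.
pred = flow * np.exp(rng.normal(0.0, 0.2, size=flow.size))

low, high = flow < 10, flow > 100

# In the original units, the error spread grows with flow.
raw_error = pred - flow
raw_ratio = raw_error[high].std() / raw_error[low].std()

# After logging, the spread is roughly constant across the range.
log_error = np.log(pred) - np.log(flow)
log_ratio = log_error[high].std() / log_error[low].std()

print(f"spread ratio (high/low flow), raw: {raw_ratio:.1f}")
print(f"spread ratio (high/low flow), log: {log_ratio:.2f}")
```

The raw ratio should come out far above one, while the logged ratio should sit near one.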

The next sections will demonstrate a basic procedure for selecting a reasonable objective function for your model.

## Benchmarking objectives
A model has an MSE of 1 and an MSLE of 1.
Which is a better score?
The short answer is we don't know.
These scores have different scales, so we can't compare them without more information.

Fortunately, we already have all the information we need.
The trick is to transform the objectives into likelihoods,
thereby putting them all on a common scale.
We'll demonstrate the simplest approach, which is to use maximum likelihood estimators.

### Log likelihoods
The first, and arguably de facto, objective is MSE, 
which corresponds to the log likelihood of the normal distribution,
\begin{equation}
\ell_2 = -n \ln \sigma  - \frac{n}{2} \ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \hat y_i)^2 \text{,}
\end{equation}
where $y_i$ are the observations, $\hat y_i$ are the model predictions, and $\sigma$ is the standard deviation of the error.
The final term is the most important.
Stare at it for a moment and you'll recognize the sum as the squared L2 norm of the errors,
which is just $n$ times the MSE.
The remaining terms normalize the result.
If we calibrate a model using MSE alone, we could drop these terms.
However, to compare different objectives, we need to keep them.
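
To make the link to MSE explicit, we can substitute the maximum likelihood estimate of $\sigma$ into $\ell_2$. This sketch assumes the errors have zero mean, so that the estimate of $\sigma^2$ equals the MSE:
\begin{equation}
\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2 = \text{MSE}
\quad \Rightarrow \quad
\ell_2 = -\frac{n}{2} \left[ \ln(2\pi \, \text{MSE}) + 1 \right] \text{.}
\end{equation}
Under that assumption, maximizing $\ell_2$ and minimizing MSE are the same thing.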

Another common objective is mean absolute error (MAE),
which corresponds to the log likelihood of the Laplace distribution (Figure \ref{figure1}),
\begin{equation}
\ell_1 = -n \ln(2b)  - \frac{1}{b} \sum_{i=1}^n | y_i - \hat y_i| \text{,}
\end{equation}
where $b$ is the mean absolute error
(the L1 norm of the errors divided by $n$).
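
As a minimal sketch, $\ell_1$ can be computed directly from the equation above. The name `laplace_ll` and the toy arrays are hypothetical, chosen only for illustration.

```python
import numpy as np


def laplace_ll(y, y_hat):
    """Laplace log likelihood, with b set to the mean absolute error (a sketch)."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    n = e.size
    b = np.abs(e).mean()  # maximum likelihood estimate of the scale
    return -n * np.log(2 * b) - np.abs(e).sum() / b


# Toy data: the absolute errors are 0.5, 0.5, 1, and 2, so b = 1.
y = np.array([1.0, 2.0, 4.0, 8.0])
y_hat = np.array([1.5, 1.5, 5.0, 6.0])
print(laplace_ll(y, y_hat))
```

With $b = 1$ and $n = 4$, the result reduces to $-4 \ln 2 - 4$.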

### Changing variables
Fortunately, most of the hard work
Now, there are other "classic" probability distributions 
Likelihoods for a variety of other objective functions are obtained by changing variables.
For example, the mean squared log error (MSLE), which corresponds to the lognormal log likelihood $\ell_3$,
is obtained from $\ell_2$ by changing variables,
\begin{equation}
\ell_3 = \ell_2(v(y)) + \sum_{i=1}^n \ln|v'(y_i)| \text{,}
\end{equation}
where $v$ is the transformation applied to both the observations and the predictions (the natural log in this case), and $v'$ is its derivative.
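
Here is a rough sketch of the change of variable for the log transform, compared against the plain normal log likelihood on synthetic data. The function names, the data, and the choice to estimate $\sigma$ as the root mean squared error are all assumptions made for illustration.

```python
import numpy as np


def normal_ll_sketch(y, y_hat):
    """Normal log likelihood with sigma estimated as the root mean squared error."""
    e = y - y_hat
    n = e.size
    sigma = np.sqrt(np.mean(e**2))
    return -n * np.log(sigma) - n / 2 * np.log(2 * np.pi) - (e**2).sum() / (2 * sigma**2)


def lognormal_ll_sketch(y, y_hat):
    """Lognormal log likelihood: the normal one on logged data plus the Jacobian term."""
    # v(y) = ln(y), so v'(y) = 1/y and sum(ln|v'(y_i)|) = -sum(ln(y_i))
    return normal_ll_sketch(np.log(y), np.log(y_hat)) - np.log(y).sum()


rng = np.random.default_rng(1)
y = np.logspace(0, 3, 500)                      # hypothetical observations
y_hat = y * np.exp(rng.normal(0.0, 0.3, 500))   # multiplicative errors

ll_2 = normal_ll_sketch(y, y_hat)
ll_3 = lognormal_ll_sketch(y, y_hat)
print(f"normal: {ll_2:.0f}, lognormal: {ll_3:.0f}")
```

Because the synthetic errors are multiplicative, the lognormal log likelihood should come out higher (less negative) than the normal one. Without the Jacobian term, the two values would not be on a common scale.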

Can you define more?

### Additional reading
```
@Article{Hodson_2022,
  doi = {10.48550/ARXIV.2212.06566},
  author = {Hodson, Timothy O. and Over, Thomas M. and Smith, Tyler J. and Marshall, Lucy M.},
  title = {How to select an objective function using information theory},
  publisher = {arXiv},
  year = {2022}
}
```

##

In [None]:
def normal_ll(y, y_hat):
    """Compute log likelihood for the normal distribution
    
    Parameters
    ----------
    y : array_like
        Observations.
    y_hat : array_like
        Predictions.
        
    Returns
    -------
        Log likelihood
        
    Proof
    -----
    https://www.statlect.com/probability-distributions/normal-distribution
    """
    ll = ...  # exercise: compute the log likelihood from y and y_hat
    return ll


For extra credit, define `normal_ll` to accept a change of variable. Solution below.

In [None]:
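import numpy as np


def normal_ll(y, y_hat, transform=None, gradient=1):
    '''Log likelihood for the normal distribution with change of variable
    
    The normal distribution is the formal likelihood for the mean squared error (MSE).

    Parameters
    ----------
    y : array_like
        Observations.
    y_hat : array_like
        Predictions.
    transform : function, optional
        Change of variable transformation.
    gradient : function or array_like, optional
        Gradient of the transform function, or its values at `y`.
        The default of 1 adds nothing to the log likelihood.

    Returns
    -------
        Log likelihood
        
    Proof
    -----
    https://www.statlect.com/probability-distributions/normal-distribution
    '''
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)

    # evaluate the gradient at the original observations, before transforming them
    if callable(gradient):
        gradient = gradient(y)
    # broadcast so a scalar gradient contributes one term per observation
    log_gradient = np.sum(np.log(np.abs(np.broadcast_to(gradient, y.shape))))

    if transform is not None:
        y = transform(y)
        y_hat = transform(y_hat)
        
    e = y - y_hat
    n = len(e)
    sigma = e.std()
    ll = -n * np.log(sigma) - n/2*np.log(2*np.pi) - 1/(2*sigma**2) * (e**2).sum() + log_gradient
    return ll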

In [None]:
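import numpy as np


def compute_weights(series, base=np.e):
    '''Compute posterior weights
    
    Parameters
    ----------
    series : array_like
        Log likelihoods
    base : float
        Base of the logarithm used to compute log likelihood

    Returns
    -------
        Weights, normalized to sum to one
    '''
    # shift by the maximum so the exponentiation cannot underflow to all zeros;
    # the shift cancels in the normalization
    s = base**(series - np.max(series))
    return s/s.sum()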
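
As a usage sketch, the weights can be computed for a handful of candidate objectives. The log likelihood values below are made up for illustration, and `compute_weights_sketch` is a hypothetical standalone version that subtracts the maximum before exponentiating, since raw log likelihoods this negative underflow to zero.

```python
import numpy as np


def compute_weights_sketch(log_likelihoods, base=np.e):
    """Normalize log likelihoods into weights that sum to one."""
    ll = np.asarray(log_likelihoods, dtype=float)
    # Subtracting the maximum leaves the weights unchanged but avoids underflow.
    s = base ** (ll - ll.max())
    return s / s.sum()


# Hypothetical log likelihoods for three candidate objectives.
log_likelihoods = np.array([-2944.0, -1834.0, -1840.0])
weights = compute_weights_sketch(log_likelihoods)
print(weights.round(3))
```

The second objective takes nearly all of the weight here, because a gap of even a few units in log likelihood is a large ratio in likelihood.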
