# Treating the range estimates #

In [one of the previous notebooks](agile_estimation_2.ipynb) we established a statistical model for predicting the actual project time and cost from the estimates. We discussed that we can fit the estimates (for both Agile and Waterfall projects) to a log-normal distribution, which guarantees positive support. A statistical approach to estimation lets us give predictions with a required confidence level and also project monetary benefits, costs, and risk, as we discussed in [another notebook](agile_estimation_3.ipynb).

One thing I was asked is how the model generalizes to the case where an estimate is given as a range. Indeed, this is what everybody has taught us: do not give a single number, give a range. One approach is to keep using our statistical model and feed it the number in the middle, the mean of the two values.

$$x = \frac{high+low}{2}$$

That way the model can be used without modifications.

There are two problems with this approach:
1. Taking the mean of high and low is arbitrary. It halves the information given. It would be better to have an algorithm learn where to set the variable x within the interval between the low and high boundaries.
2. By giving us a range, a developer is trying to convey a very important piece of information: the degree of uncertainty in the estimate. A correct model should use that information.

To simplify the process, we will take the natural logarithm of all the estimates and the actuals. Since we model the estimates using a log-normal distribution, our new variables `y`, `l`, `h` will be the logarithms of the actual number of days, the low estimate, and the high estimate, respectively. In log space we can use the Normal distribution!
We will model `y` using linear regression:
$$ y = \theta_h h + \theta_l l $$

In the case where $\theta_h$ and $\theta_l$ are equal, we get exactly the same problem as we discussed [earlier](agile_estimation_2.ipynb).
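As a minimal sketch of the log transformation and the linear model, the following plain-Python snippet shows that with equal coefficients the prediction depends only on the sum of the log estimates, that is, on the geometric mean of the original range. The numbers, the coefficient value, and the helper name `predict_log_actual` are made up for illustration.

```python
import math

def predict_log_actual(low_days, high_days, theta_l, theta_h):
    """Linear model in log space: y = theta_h * h + theta_l * l."""
    l = math.log(low_days)   # log of the low estimate
    h = math.log(high_days)  # log of the high estimate
    return theta_h * h + theta_l * l

# Hypothetical estimate of 5 to 10 days and a made-up coefficient.
low_days, high_days = 5.0, 10.0
theta = 0.55

y_equal = predict_log_actual(low_days, high_days, theta, theta)

# With equal coefficients the same prediction follows from a single number:
# the geometric mean of the range, with coefficient 2 * theta.
geometric_mean = math.sqrt(low_days * high_days)
y_single = 2 * theta * math.log(geometric_mean)

# Predicted actual duration in days, computed both ways.
print(math.exp(y_equal), math.exp(y_single))
```

When the two coefficients differ, the model is free to place the effective estimate anywhere relative to the low and high boundaries, which is what the first problem above asks for.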

The likelihood function for a single data point can then be written as follows (following [the Wikipedia article on Bayesian linear regression](https://en.wikipedia.org/wiki/Bayesian_linear_regression)).

$$ \rho(y|h,l,\theta_h, \theta_l, \sigma) \propto \frac{1}{\sigma} \exp(-\frac{1}{2\sigma^2}(y - \theta_h h - \theta_l l)^2 )$$

As mentioned earlier, by giving a range the developer wanted to communicate the uncertainty of the estimate. We should include this uncertainty in our estimate of $\sigma$. Intuitively, the standard deviation grows with the width of the range, and we can learn the coefficient by modeling $\sigma$ as:
$$\sigma = \sigma_0 (1 + \zeta^2 (h-l))$$
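To get a feel for this parameterization, here is a small sketch that evaluates $\sigma$ for a narrow and a wide range. The values of $\sigma_0$ and $\zeta$ and the helper name `sigma_for_range` are assumptions chosen only for illustration.

```python
import math

def sigma_for_range(low_days, high_days, sigma_0, zeta):
    """sigma = sigma_0 * (1 + zeta^2 * (h - l)), with h and l in log space."""
    h = math.log(high_days)
    l = math.log(low_days)
    return sigma_0 * (1.0 + zeta ** 2 * (h - l))

sigma_0, zeta = 0.4, 1.5  # made-up values for illustration

narrow = sigma_for_range(9.0, 10.0, sigma_0, zeta)   # tight range
wide = sigma_for_range(5.0, 20.0, sigma_0, zeta)     # wide range
ignored = sigma_for_range(5.0, 20.0, sigma_0, 0.0)   # zeta = 0 ignores the range

print(narrow, wide, ignored)
```

Because `h - l` is the logarithm of the ratio high/low, the width that matters here is relative: a range of 5 to 10 days is as wide as a range of 50 to 100 days. Using $\zeta^2$ keeps the factor from dropping below one, so $\sigma$ never falls below $\sigma_0$ as long as the high estimate is not below the low one.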

If we also use the precision parameter $\tau$ in place of $\sigma_0$:
$$\tau = \frac{1}{\sigma_0^2}$$

Then our likelihood function will be:
$$ \rho(y|h,l,\theta_h, \theta_l, \tau, \zeta) \propto \frac{\sqrt{\tau}}{1 + \zeta^2 (h-l)} \exp(-\frac{\tau}{2(1 + \zeta^2 (h-l))^2}(y - \theta_h h - \theta_l l)^2 )$$
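The negative logarithm of this likelihood for one data point can be sketched in plain Python as follows. The data point and parameter values are made up, and `neg_log_likelihood` is a hypothetical helper name. The snippet also checks that with $\zeta = 0$ the expression reduces to the usual Gaussian term with precision $\tau$.

```python
import math

def neg_log_likelihood(y, h, l, theta_h, theta_l, tau, zeta):
    """Negative log of the likelihood above, dropping the constant term."""
    scale = 1.0 + zeta ** 2 * (h - l)          # equals sigma / sigma_0
    residual = y - theta_h * h - theta_l * l
    return (math.log(scale) - 0.5 * math.log(tau)
            + tau * residual ** 2 / (2.0 * scale ** 2))

# Made-up data point: estimate of 5 to 10 days, actual 12 days.
l, h, y = math.log(5.0), math.log(10.0), math.log(12.0)
theta_h, theta_l, tau = 0.6, 0.5, 4.0

with_range = neg_log_likelihood(y, h, l, theta_h, theta_l, tau, zeta=1.0)
no_range = neg_log_likelihood(y, h, l, theta_h, theta_l, tau, zeta=0.0)

# With zeta = 0 this is the ordinary Gaussian term with precision tau.
residual = y - theta_h * h - theta_l * l
plain_gaussian = -0.5 * math.log(tau) + 0.5 * tau * residual ** 2

print(with_range, no_range, plain_gaussian)
```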
    

The priors for $\tau$ and $\theta$ are traditionally Gamma and Normal distributions, respectively:

$$\rho(\tau) \propto \tau^{\alpha-1}e^{-\beta \tau}$$

$$\rho(\theta|\tau) \propto \tau \exp(-\frac{\tau \lambda}{2}(\theta_h^2+\theta_l^2))$$

Here $\alpha$, $\beta$, and $\lambda$ are hyperparameters.

The choice of prior for $\zeta$ is more difficult. No conjugate prior exists for the kind of likelihood function we have chosen. For now we can select the normal distribution. A zero mean for this distribution means that a priori we don't trust the ranges (we know that for many consultants the range is always 20% and does not convey any information). A high mean of the prior distribution means that we pay more attention to the estimated degree of uncertainty.

For simplicity we set the mean to zero.

$$\rho(\zeta) \propto \sqrt{\tau} \exp(-\frac{\tau \lambda_\zeta}{2}\zeta^2)$$

The negative log-posterior function is:
$$ \mathscr{L}(\theta_h, \theta_l,\zeta, \tau) = \sum_{i=0}^{N-1}\left[\log(1 + \zeta^2 (h^{(i)}-l^{(i)})) +
\frac{\tau}{2(1 + \zeta^2 (h^{(i)}-l^{(i)}))^2}(y^{(i)} - \theta_h h^{(i)} - \theta_l l^{(i)})^2 \right]
- \frac{N+1+2\alpha}{2}\log{\tau} + \beta \tau + \frac{\tau \lambda}{2}(\theta_h^2+\theta_l^2)
+ \frac{\tau \lambda_\zeta}{2}\zeta^2
$$
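Before handing this function to an optimizer, it can help to have a plain-Python reference to compare against. The sketch below is one such reference under stated assumptions: each residual is computed as $y^{(i)} - \theta_h h^{(i)} - \theta_l l^{(i)}$, consistent with the likelihood, and the data and hyperparameter values are made up. The name `neg_log_posterior` is hypothetical.

```python
import math

def neg_log_posterior(data, theta_h, theta_l, zeta, tau,
                      alpha, beta, lam, lam_zeta):
    """Negative log-posterior; data is a list of (y, h, l) in log space."""
    n = len(data)
    total = 0.0
    for y, h, l in data:
        scale = 1.0 + zeta ** 2 * (h - l)
        residual = y - theta_h * h - theta_l * l
        total += math.log(scale) + tau * residual ** 2 / (2.0 * scale ** 2)
    # log(tau) terms collected from the likelihood and the three priors.
    total -= (n + 1 + 2 * alpha) / 2.0 * math.log(tau)
    total += beta * tau                                        # Gamma prior
    total += tau * lam / 2.0 * (theta_h ** 2 + theta_l ** 2)   # prior on theta
    total += tau * lam_zeta / 2.0 * zeta ** 2                  # prior on zeta
    return total

# Made-up (actual, high, low) durations in days, converted to log space.
raw = [(12.0, 10.0, 5.0), (30.0, 20.0, 15.0), (8.0, 9.0, 3.0)]
data = [(math.log(a), math.log(hi), math.log(lo)) for a, hi, lo in raw]

value = neg_log_posterior(data, theta_h=0.6, theta_l=0.5, zeta=1.0, tau=4.0,
                          alpha=1.0, beta=1.0, lam=1.0, lam_zeta=1.0)
print(value)
```

The coefficient of $\log\tau$ collects $N/2$ from the likelihood, $\alpha - 1$ from the Gamma prior, $1$ from the prior on $\theta$, and $1/2$ from the prior on $\zeta$, which adds up to $(N+1+2\alpha)/2$.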

In this notebook I will find the parameters corresponding to the maximum of the posterior. To avoid making errors in differentiation, we will use TensorFlow. We will follow [this example](https://github.com/tensorflow/tensorflow/blob/r1.11/tensorflow/examples/get_started/regression/custom_regression.py) to build our code.

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [None]:
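# Fix all random seeds so that repeated runs give the same result.
# reset_default_graph and set_random_seed are TensorFlow 1.x APIs.
seed = 1389
tf.reset_default_graph()
tf.set_random_seed(seed)
np.random.seed(seed)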