# Backlog Forecasting

Fitting data to known distributions is done using [Chi2Fit](https://hex.pm/packages/chi2fit).

## Table of contents

* [Set-up](#Set-up)
* [Data](#Data)
* [Simulation set-up](#Simulation-set-up)
* [Preparation](#Preparation)
* [Simple forecast using the empirical data](#Simple-forecast-using-the-empirical-data)
* [Forecasting using a Poisson distribution](#Forecasting-using-a-Poisson-distribution)
* [References](#References)
* [Tear-down](#Tear-down)

## Set-up

In [1]:
Boyle.mk("chi2fit")

All dependencies up to date


{:ok, ["chi2fit"]}

In [2]:
Boyle.list()

{:ok, ["chi2fit"]}

In [3]:
Boyle.activate("chi2fit")

All dependencies up to date


:ok

In [4]:
Boyle.install({:chi2fit, path: "/app/chi2fit"})

Resolving Hex dependencies...
Dependency resolution completed:
Unchanged:
  exalgebra 0.0.5
  exboost 0.2.3
==> exboost
make: 'priv/libboostnif.so' is up to date.



:ok

In [5]:
require Chi2fit.Distribution
alias Chi2fit.Distribution, as: D
alias Chi2fit.Fit, as: F
alias Chi2fit.Matrix, as: M
alias Chi2fit.Utilities, as: U

Chi2fit.Utilities

In [84]:
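defmodule Forecast do
    # Counts the number of iterations needed to work off a backlog of `size` items.
    # `fun` is a zero-arity function returning the number of items completed in one iteration.
    def forecast(fun, size, tries \\ 0)
    def forecast(fun, size, tries) when size > 0 do
        forecast(fun, size - fun.(), tries + 1)
    end
    # Backlog exhausted: return the number of iterations used
    def forecast(_fun, _size, tries), do: tries
end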

{:module, Forecast, <<70, 79, 82, 49, 0, 0, 5, 60, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 135, 0, 0, 0, 15, 15, 69, 108, 105, 120, 105, 114, 46, 70, 111, 114, 101, 99, 97, 115, 116, 8, 95, 95, 105, 110, 102, 111, ...>>, {:forecast, 3}}
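
To make the recursion in `Forecast.forecast/3` easier to follow, here is a minimal sketch that records the remaining backlog after every iteration instead of only counting iterations. The module `ForecastTrace` and its function `trace/2` are hypothetical helpers introduced for illustration; they are not part of Chi2Fit or of this notebook.

```elixir
defmodule ForecastTrace do
  # Hypothetical helper: returns the backlog remaining after each iteration.
  # `fun` is a zero-arity function returning the items completed in one iteration.
  def trace(fun, size) do
    size
    |> Stream.unfold(fn
      # Stop as soon as the backlog is exhausted
      remaining when remaining <= 0 -> nil
      # Otherwise complete one more iteration and emit what is left
      remaining ->
        next = remaining - fun.()
        {next, next}
    end)
    |> Enum.to_list()
  end
end

# A constant throughput of 5 items per iteration on a backlog of 12 items
trace = ForecastTrace.trace(fn -> 5 end, 12)
IO.inspect trace           # [7, 2, -3]
IO.inspect length(trace)   # 3
```

The length of the list is the number of iterations, which is the value `Forecast.forecast/3` returns for the same inputs. As with `Forecast.forecast/3`, the loop only terminates if `fun` eventually returns positive values.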

## Data

In [81]:
# Raw throughput data: the number of backlog items completed every 2 weeks
data = [3,3,4,4,7,5,1,11,5,6,3,6,6,5,4,10,4,5,8,2,4,12,5]

[3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]

## Simulation set-up

In [78]:
# The size of the backlog, e.g. 100 backlog items
size = 100

# Number of iterations to use in the Monte Carlo
iterations = 1000

1000

## Preparation

In [82]:
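# Convert the raw data into a cumulative histogram; each tuple holds
# {upper bin edge, cumulative fraction, lower error bound, upper error bound}
hdata = U.to_bins data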

[{1.5, 0.043478260869565216, 0.0058447102657877, 0.13689038224309594}, {2.5, 0.08695652173913043, 0.02977628442357071, 0.1905937209791003}, {3.5, 0.21739130434782608, 0.1263699563343216, 0.33774551477037923}, {4.5, 0.43478260869565216, 0.3160946312914347, 0.5600249434832333}, {5.5, 0.6521739130434783, 0.5263221461493021, 0.7626298741894857}, {6.5, 0.782608695652174, 0.6622544852296207, 0.8736300436656784}, {7.5, 0.8260869565217391, 0.709703289667655, 0.9080712005244068}, {8.5, 0.8695652173913043, 0.7585954661422599, 0.9405937209791002}, {10.5, 0.9130434782608695, 0.8094062790208998, 0.9702237155764294}, {11.5, 0.9565217391304348, 0.8631096177569041, 0.9941552897342123}, {12.5, 1.0, 0.9225113769324543, 1.0}]
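
The first two elements of each tuple can be reproduced without Chi2Fit, which may help when reading the output above. The sketch below assumes that bins are placed half a unit above each distinct data value and that the second element is the fraction of data points at or below that value. It does not reproduce the error bounds in the third and fourth elements.

```elixir
# Same throughput data as in the Data section
data = [3,3,4,4,7,5,1,11,5,6,3,6,6,5,4,10,4,5,8,2,4,12,5]
n = length(data)

# For every distinct value x: {x + 0.5, fraction of data points <= x}
cdf =
  data
  |> Enum.uniq()
  |> Enum.sort()
  |> Enum.map(fn x -> {x + 0.5, Enum.count(data, &(&1 <= x)) / n} end)

IO.inspect hd(cdf)          # {1.5, 0.043478260869565216}
IO.inspect List.last(cdf)   # {12.5, 1.0}
```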

## Simple forecast using the empirical data

In [64]:
# Run the Monte Carlo: each try draws throughput values at random from the raw data
tries = 1..iterations |> Enum.map(fn _ -> Forecast.forecast(fn -> Enum.random(data) end, size) end)
# Mean and standard deviation of the number of iterations needed
avg = U.moment tries, 1
sd = :math.sqrt U.momentc tries,2

IO.puts "50% within    #{:math.ceil(avg)} iterations"
IO.puts "84% within    #{:math.ceil(avg+sd)} iterations"
IO.puts "97.5% within  #{:math.ceil(avg+2*sd)} iterations"
IO.puts "99.85% within #{:math.ceil(avg+3*sd)} iterations"

50% within    20.0 iterations
84% within    22.0 iterations
97.5% within  24.0 iterations
99.85% within 26.0 iterations


:ok
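
The 84%, 97.5% and 99.85% levels above come from adding one, two and three standard deviations to the mean, which presumes that the simulated number of iterations is roughly normally distributed. A possible alternative is to read the levels directly from the simulated tries with a nearest-rank percentile. The sketch below illustrates this; the `Percentile` module and the sample list are hypothetical and are not part of Chi2Fit.

```elixir
defmodule Percentile do
  # Hypothetical helper: nearest-rank percentile of a non-empty list of numbers.
  # `p` is a percentage in the range (0, 100].
  def nearest_rank(values, p) when p > 0 and p <= 100 do
    sorted = Enum.sort(values)
    # Rank of the smallest value that covers at least p percent of the list
    rank = trunc(:math.ceil(p / 100 * length(sorted)))
    Enum.at(sorted, rank - 1)
  end
end

# Made-up sample standing in for the list `tries`
sample = [18, 19, 19, 20, 20, 20, 21, 21, 22, 24]

IO.inspect Percentile.nearest_rank(sample, 50)     # 20
IO.inspect Percentile.nearest_rank(sample, 84)     # 22
IO.inspect Percentile.nearest_rank(sample, 97.5)   # 24
```

In the notebook one would pass `tries` instead of `sample`. This avoids the normality assumption at the cost of needing enough iterations to resolve the upper tail.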

## Forecasting using a Poisson distribution

Instead of sampling directly from the raw data, one can also use a known probability distribution. The parameter of the distribution is first fitted to the data, and the fitted distribution is then used to forecast.

Here, we will use the __Poisson distribution__ [1]. This assumes that the data points are independent of each other and that items are completed at a constant average rate.
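
For reference, a Poisson distribution with rate $\lambda$ (here interpreted as the average number of backlog items completed per iteration) assigns the following probability to completing exactly $k$ items in one iteration. Both its mean and its variance equal $\lambda$, so only a single parameter has to be fitted:

$$
P(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots
$$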

The code below uses basic settings of the functions provided by `Chi2Fit`; more advanced options are described in [2]. First, a fixed number of random parameter values is tried to get a rough estimate; the option `probes` sets the number of tries. Furthermore, since we are fitting a probability distribution, which takes values in the interval `[0,1]`, the errors are asymmetric. This is specified by the option `model: :linear`.

In [65]:
model = D.model "poisson"
options = [probes: 10_000, smoothing: false, model: :linear]
{chi2, parameters,errors} = F.chi2probe hdata, [{1,10}], {model[:fun], &F.nopenalties/2}, options

IO.puts "Initial guess:"
IO.puts "    chi2:\t\t#{chi2}"
IO.puts "    pars:\t\t#{inspect parameters}"
IO.puts "    errors:\t\t#{inspect errors}\n"

Initial guess:
    chi2:		3.7680928408143695
    pars:		[5.468463586964908]
    errors:		{[5.262886308299022, 5.68704613959866]}



:ok

The reported errors are the range of parameter values for which the corresponding `chi2` values are within 1 of the minimum value found.
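
Written as a formula, and assuming the common convention that an increase of 1 in $\chi^2$ corresponds to one standard deviation for a single fitted parameter, the reported range is the set

$$
\left\{ \lambda : \chi^2(\lambda) \le \chi^2_{\min} + 1 \right\}
$$

where $\chi^2_{\min}$ is the smallest value found among the probes.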

After roughly locating the minimum we do a more precise (and computationally more expensive) search for the minimum.

In [83]:
{chi2, cov, parameters, errors} = F.chi2fit hdata, {parameters, model[:fun], &F.nopenalties/2}, 10, options

param_errors = cov |> M.diagonal |> Enum.map(fn x->x|>abs|>:math.sqrt end)

IO.puts "Final:"
IO.puts "    chi2:\t\t#{chi2}"
IO.puts "    Degrees of freedom:\t#{length(hdata)-model[:df]}"
IO.puts "    covariance:\t\t["
cov |> Enum.each(fn row -> IO.puts "    \t\t\t  #{inspect row}" end)
IO.puts "    \t\t\t]"
IO.puts "    gradient:\t\t#{inspect U.jacobian(parameters,&F.chi2(hdata,fn x->model[:fun].(x,&1) end,fn _->0.0 end,options),options)}"
IO.puts "    parameters:\t\t#{inspect parameters}"
IO.puts "    errors:\t\t#{inspect param_errors}"
IO.puts "    ranges:"
U.puts_errors errors

Final:
    chi2:		3.768088744230755
    Degrees of freedom:	10
    covariance:		[
    			  [0.04665516250250693]
    			]
    gradient:		[-1.3372622014991359e-9]
    parameters:		[5.468026377336737]
    errors:		[0.21599806133969568]
    ranges:
			chi2:		3.768088744230755	-	3.768088744230786
			parameter:	5.468026377306263	-	5.468026377398795


:ok
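
The `errors` line in the output follows from the covariance matrix $C$: the code takes the square root of the absolute value of each diagonal element. For the single parameter fitted here this gives

$$
\sigma_{\lambda} = \sqrt{\lvert C_{11} \rvert} = \sqrt{0.04666} \approx 0.216
$$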

In [76]:
[rate] = parameters
[sd_rate] = param_errors

tries_min = 1..iterations |> Enum.map(fn _ -> Forecast.forecast(D.poisson(rate-sd_rate), size) end)
tries_avg = 1..iterations |> Enum.map(fn _ -> Forecast.forecast(D.poisson(rate        ), size) end)
tries_max = 1..iterations |> Enum.map(fn _ -> Forecast.forecast(D.poisson(rate+sd_rate), size) end)

avg = U.moment tries_avg, 1
sd = :math.sqrt U.momentc tries_avg,2

sd_min = abs(avg-U.moment(tries_min, 1))
sd_plus = abs(U.moment(tries_max, 1)-avg)

IO.puts "Number of iterations to complete the backlog:"
IO.puts "#{Float.round(avg,1)} (+/- #{Float.round(sd,1)}) (-#{Float.round(sd_plus,1)} +#{Float.round(sd_min,1)})"
IO.puts ""
IO.puts "50% within    #{:math.ceil(avg)} iterations"
IO.puts "84% within    #{:math.ceil(avg+sd)} iterations"
IO.puts "97.5% within  #{:math.ceil(avg+2*sd)} iterations"
IO.puts "99.85% within #{:math.ceil(avg+3*sd)} iterations"

Number of iterations to complete the backlog:
18.8 (+/- 1.9) (-0.7 +0.8)

50% within    19.0 iterations
84% within    21.0 iterations
97.5% within  23.0 iterations
99.85% within 25.0 iterations


:ok
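
As a rough plausibility check on the simulated result, one can divide the backlog size by the fitted rate. The sketch below uses only the numbers reported above and plain arithmetic; it is a back-of-the-envelope estimate, not a replacement for the Monte Carlo.

```elixir
size = 100
rate = 5.468026377336737        # fitted parameter reported above
sd_rate = 0.21599806133969568   # its error reported above

# Expected number of iterations if every iteration completed exactly `rate` items
estimate = size / rate                        # approximately 18.3

# Shift caused by moving the rate one standard deviation down or up
plus  = size / (rate - sd_rate) - estimate    # approximately 0.75 (slower rate, more iterations)
minus = estimate - size / (rate + sd_rate)    # approximately 0.69 (faster rate, fewer iterations)

IO.puts "#{Float.round(estimate, 1)} (-#{Float.round(minus, 1)} +#{Float.round(plus, 1)})"
```

The asymmetric shifts agree with the `(-0.7 +0.8)` reported by the simulation. The simulated mean of 18.8 is somewhat higher than this estimate; a plausible reason is that the simulation counts whole iterations, so the final iteration usually completes more items than were left in the backlog.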

## References

[1] Poisson distribution, https://en.wikipedia.org/wiki/Poisson_distribution<br>
[2] Chi2Fit, https://hex.pm/packages/chi2fit

## Tear-down

In [8]:
Boyle.deactivate()

:ok