## Assumption Providers

```PyProtolinc``` lets users supply their own assumption sets, which are stored internally in
*assumption provider* objects. During the simulation, an *assumption set* is linked to a specific state transition and provides information on how probable that transition is. It can be understood as a (multi-dimensional) table of probabilities in which each dimension is linked to a *risk factor*.

### Constant Rate Providers

These can be used in simple situations (e.g. testing). When constructing a ```ConstantRateProvider```, a float constant is passed in. Later, the provider returns this constant, either as a single value or as a vector filled with it.

In [1]:
import pyprotolinc._actuarial as act
import numpy as np

const_prvdr = act.ConstantRateProvider(0.4)
const_prvdr.get_rate()

0.4

In [2]:
# return a vector of the given length
const_prvdr.get_rates(7)

array([0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4])
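
Conceptually, such a provider ignores all risk factors and simply repeats one number. The following minimal NumPy sketch mimics that behaviour. The class `ConstantTable` is a hypothetical stand-in written for illustration and is not part of ```PyProtolinc```.

```python
import numpy as np


class ConstantTable:
    """Hypothetical stand-in: returns the same rate regardless of risk factors."""

    def __init__(self, rate: float):
        self.rate = float(rate)

    def get_rate(self) -> float:
        # a single value
        return self.rate

    def get_rates(self, length: int) -> np.ndarray:
        # a vector of the requested length filled with the constant
        return np.full(length, self.rate)


table = ConstantTable(0.4)
rates = table.get_rates(7)
```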

### Standard Rate Providers and Risk Factors

While not obvious from the previous example, a *rate provider* will in general depend on *risk factors*, which determine the entries to be selected from an assumptions table.
The following *risk factors* are currently supported by ```PyProtolinc```:

In [3]:
[rf for rf in act.CRiskFactors]

[<CRiskFactors.Age: 0>,
 <CRiskFactors.Gender: 1>,
 <CRiskFactors.CalendarYear: 2>,
 <CRiskFactors.SmokerStatus: 3>,
 <CRiskFactors.YearsDisabledIfDisabledAtStart: 4>]

To create a *StandardRateProvider* with a **1D lookup** we can proceed as follows.

In [4]:
std_prvdr_1d = act.StandardRateProvider(rfs=[act.CRiskFactors.Age], 
                                        values=np.array([0.1, 0.2, 0.3]),
                                        offsets=np.zeros(1, dtype=int))

We have now created a provider which depends on the risk factor *Age*.  It essentially prescribes:

  * age=0 -> 0.1
  * age=1 -> 0.2
  * age=2 -> 0.3

We can now query the object as follows.

In [5]:
std_prvdr_1d.get_rate([0]), std_prvdr_1d.get_rate(np.array([1], dtype=int))

(0.1, 0.2)

The next query is for five datapoints of ages 0, 0, 1, 0, 2:

In [6]:
ages = np.array([0, 0, 1, 0, 2], dtype=int)
std_prvdr_1d.get_rates(len(ages), age=ages)

array([0.1, 0.1, 0.2, 0.1, 0.3])
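
The result above reads like plain array indexing: each age selects the table entry at that position. A minimal NumPy sketch of this reading follows. It only illustrates the semantics and is not the library's actual implementation. The variable names are chosen for this example.

```python
import numpy as np

values_1d = np.array([0.1, 0.2, 0.3])           # rates for ages 0, 1, 2
ages_1d = np.array([0, 0, 1, 0, 2], dtype=int)  # one age per data point

# integer-array indexing picks one table entry per data point
looked_up_1d = values_1d[ages_1d]
```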

We can also make this example two-dimensional.

In [7]:
from pyprotolinc.models.risk_factors import Gender, SmokerStatus

std_prvdr_2d = act.StandardRateProvider(rfs=[act.CRiskFactors.Gender, act.CRiskFactors.Age], 
                                        values=np.array([[0.1, 0.2, 0.3],
                                                         [1.1, 1.2, 1.3]]),
                                        offsets=np.zeros(2, dtype=int))

The ```values``` array passed in is 2D: the first dimension (the *rows*) corresponds to the first risk factor (Gender)
and the second dimension (the *columns*) to the second risk factor (Age).

In [8]:
genders = np.array([Gender.M, Gender.F, Gender.M, Gender.F, Gender.M], dtype=int)
std_prvdr_2d.get_rates(len(ages), age=ages, gender=genders)

array([0.1, 1.1, 0.2, 1.1, 0.3])

The first entry of the returned vector corresponds to a 0-year-old of Gender=M (index 0), while the second and **fourth** entries correspond to a 0-year-old of Gender=F (index 1).
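
The same selection can be reproduced with NumPy by passing one index array per dimension, in the order of the risk factors. This is a sketch of the lookup semantics only, not of how ```PyProtolinc``` implements it. The constants `M` and `F` are stand-ins for `Gender.M` and `Gender.F`, assumed to equal the indexes 0 and 1 mentioned above.

```python
import numpy as np

M, F = 0, 1  # assumed stand-ins for Gender.M and Gender.F (indexes 0 and 1)

values_2d = np.array([[0.1, 0.2, 0.3],    # row 0: gender M, ages 0..2
                      [1.1, 1.2, 1.3]])   # row 1: gender F, ages 0..2
ages_2d = np.array([0, 0, 1, 0, 2])
genders_2d = np.array([M, F, M, F, M])

# one index array per dimension, ordered like the risk factors (Gender, Age)
looked_up_2d = values_2d[genders_2d, ages_2d]
```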

In the next 3D example we will demonstrate the use of a non-zero **offset**.

In [9]:
values3D = np.array([
                     [[0.1, 0.2, 0.3],
                      [1.1, 1.2, 1.3]],
                   
                     [[-0.1, -0.2, -0.3],
                      [-1.1, -1.2, -1.3]]
])

We want to use the risk factor *CalendarYear*. In this example the first dimension corresponds to the calendar year:
index 0 is 2019 (the first group of six values above) and index 1 is 2020 (the second group of six values).
The second dimension is gender and the third is age, where this time the ages start at 20.

In [10]:
std_prvdr_3d = act.StandardRateProvider(rfs=[act.CRiskFactors.CalendarYear, act.CRiskFactors.Gender, act.CRiskFactors.Age], 
                                        values=values3D,
                                        offsets=np.array([2019, 0, 20], dtype=int))

In [11]:
ages = np.array([20, 20, 21, 20, 22], dtype=int)
genders = np.array([Gender.M, Gender.F, Gender.M, Gender.F, Gender.M], dtype=int)
calendaryears = np.array([2019, 2019, 2020, 2020, 2020], dtype=int)

std_prvdr_3d.get_rates(len(ages), age=ages, gender=genders, calendaryear=calendaryears)

array([ 0.1,  1.1, -0.2, -1.1, -0.3])

The third value returned (-0.2) corresponds to age=21, gender=M and calendaryear=2020. Given the offset of 2019, calendaryear=2020 maps to index 1, so the value is read from
the second group of six values, in its first row (gender M) and second column (age=21 with an offset of 20). There we find -0.2, as expected.
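
The offset arithmetic can be checked with a short NumPy sketch: subtract each dimension's offset from the risk factor values and use the results as indexes. The helper `lookup_with_offsets` is hypothetical and written only for this illustration. It is not a ```PyProtolinc``` function, and the gender values 0 and 1 are assumed to stand for M and F.

```python
import numpy as np


def lookup_with_offsets(values, offsets, *risk_factor_values):
    """Hypothetical helper: subtract each dimension's offset, then index the table."""
    indexes = tuple(np.asarray(rf) - off
                    for rf, off in zip(risk_factor_values, offsets))
    return values[indexes]


values_3d = np.array([[[0.1, 0.2, 0.3], [1.1, 1.2, 1.3]],         # year 2019
                      [[-0.1, -0.2, -0.3], [-1.1, -1.2, -1.3]]])  # year 2020
offsets_3d = np.array([2019, 0, 20])      # CalendarYear, Gender, Age

years = np.array([2019, 2019, 2020, 2020, 2020])
genders_3d = np.array([0, 1, 0, 1, 0])    # assumed: 0 = M, 1 = F
ages_3d = np.array([20, 20, 21, 20, 22])

looked_up_3d = lookup_with_offsets(values_3d, offsets_3d,
                                   years, genders_3d, ages_3d)
```

For the third data point the indexes are (2020 - 2019, 0, 21 - 20) = (1, 0, 1), which selects -0.2.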

Currently, the data can have at most four dimensions.