# Example 3: Mortality Simulation

We use this example to demonstrate a few additional features of *pyprotolinc*:

  * Use of a custom state model
  * Use of a customized product
  * (Use of another standard table, still a todo)


### A Custom State Model

The following model is in fact already part of *pyprotolinc* (```pyprotolinc.models.model_mortality.MortalityStates```), but since *pyprotolinc* is (also) meant to be used as a library,
it supports the integration of user-provided state models. We start by declaring an ```IntEnum``` containing the states.

In [1]:
from enum import IntEnum, unique

import numpy as np
from pyprotolinc.results import CfNames
from pyprotolinc.product import register_product
from pyprotolinc.results import ProbabilityVolumeResults
from pyprotolinc.models import check_states, register_state_model


@unique
class MortalityStates2(IntEnum):
    ACTIVE = 0      # the "alive state"
    DEATH = 1       # insured has died
    LAPSED = 2      # policy has lapsed
    MATURED = 3     # policy has reached the end of its term

    @classmethod
    def to_std_outputs(cls):
        return {
            ProbabilityVolumeResults.VOL_ACTIVE: cls.ACTIVE,
            ProbabilityVolumeResults.VOL_DEATH: cls.DEATH,
            ProbabilityVolumeResults.VOL_LAPSED: cls.LAPSED,
            ProbabilityVolumeResults.VOL_MATURED: cls.MATURED,

            ProbabilityVolumeResults.MV_ACTIVE_DEATH: (cls.ACTIVE, cls.DEATH),
            ProbabilityVolumeResults.MV_ACT_LAPSED: (cls.ACTIVE, cls.LAPSED),
            ProbabilityVolumeResults.MV_ACT_MATURED: (cls.ACTIVE, cls.MATURED),
        }

# check and register the model
register_state_model(MortalityStates2)

Note that the last statement above registers the model with *pyprotolinc*. The class method ```to_std_outputs``` is required to map the states of the custom state model to the standard output model: it specifies which state belongs to which *volume vector* (= summed head count probability) in the standard output format and which state transition corresponds to which volume movement.
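To make the shape of this mapping concrete, here is a minimal, self-contained sketch. It does not import *pyprotolinc*: the enum ```StdOutputs``` is a hypothetical stand-in for ```ProbabilityVolumeResults``` with only three members, chosen so that the snippet runs on its own. It only illustrates that volume entries point to a single state while movement entries point to a (from-state, to-state) pair.

```python
from enum import IntEnum, unique


@unique
class States(IntEnum):
    ACTIVE = 0
    DEATH = 1
    LAPSED = 2
    MATURED = 3


@unique
class StdOutputs(IntEnum):
    # hypothetical stand-in for ProbabilityVolumeResults (assumption)
    VOL_ACTIVE = 0
    VOL_DEATH = 1
    MV_ACTIVE_DEATH = 2


mapping = {
    StdOutputs.VOL_ACTIVE: States.ACTIVE,
    StdOutputs.VOL_DEATH: States.DEATH,
    StdOutputs.MV_ACTIVE_DEATH: (States.ACTIVE, States.DEATH),
}

# volumes map to one state, movements to a (from_state, to_state) tuple
volumes = {k.name: v.name for k, v in mapping.items()
           if not isinstance(v, tuple)}
movements = {k.name: (v[0].name, v[1].name) for k, v in mapping.items()
             if isinstance(v, tuple)}

print(volumes)
print(movements)
```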

### Configuration

Loading the configuration from the current working directory, we see that the above state model is selected there:

In [2]:
from pyprotolinc.main import get_config_from_file, project_cashflows
run_config = get_config_from_file(config_file='config.yml')
print(run_config.state_model_name)

MortalityStates2


### Product Definition

With these tools at hand we can now come to the product definition. Note that the product references the above state model in a class variable.

In [3]:
from pyprotolinc.product import calc_term_end_indicator, calc_term_start_indicator, calc_terminal_months, calc_maturity_transition_indicator

class Product_MortalityTerm2:
    """ Simple product that pays out on death."""

    STATES_MODEL = MortalityStates2

    def __init__(self, portfolio):
        self.portfolio = portfolio
        self.length = len(self.portfolio)

        # sum insured divided by 12, stored as an (n, 1)-array
        self.sum_insured_per_month = self.portfolio.sum_insured[:, None] / 12.0

        self.year_last_month, self.last_month = calc_terminal_months(self.portfolio.df_portfolio)

    def get_bom_payments(self, time_axis):
        """ Return the 'conditional payments', i.e. those payments that are due if an
            insured is in the corresponding state at the given time. """

        multiplier_term_end = calc_term_end_indicator(time_axis,
                                                      self.year_last_month,
                                                      self.last_month)
        multiplier_term_start = calc_term_start_indicator(time_axis,
                                                          self.portfolio.policy_inception_yr,
                                                          self.portfolio.policy_inception_month
                                                          )
        multiplier_term = multiplier_term_end * multiplier_term_start

        return {
            self.STATES_MODEL.ACTIVE: [
                (CfNames.PREMIUM,
                 0.0005 * multiplier_term * self.sum_insured_per_month * 12.0
                 )
            ]
        }

    def get_state_transition_payments(self, time_axis):
        # a flat mortality benefit in this product

        multiplier_term_end = calc_term_end_indicator(time_axis,
                                                      self.year_last_month,
                                                      self.last_month)
        multiplier_term_start = calc_term_start_indicator(time_axis,
                                                          self.portfolio.policy_inception_yr,
                                                          self.portfolio.policy_inception_month
                                                          )
        multiplier_term = multiplier_term_end * multiplier_term_start
        return {
            (self.STATES_MODEL.ACTIVE, self.STATES_MODEL.DEATH): [
                (CfNames.DEATH_PAYMENT,
                 -multiplier_term * self.sum_insured_per_month * 12.0
                 )
             ]
        }

    def contractual_state_transitions(self, time_axis):
        """ This method returns a datastructure which encodes
            when and for which records contractual state transitions
            are due.

            Returns: Iterable consisting of three-tuples where
              - first member = from-state
              - second member = to-state
              - third member is a binary matrix of the structure "insured x time"
                where a "1" represents a contractual move.    
        """
        # for the mortality term product there is only the transition
        # ACTIVE -> MATURED
        return [
            (self.STATES_MODEL.ACTIVE,
             self.STATES_MODEL.MATURED,
             calc_maturity_transition_indicator(time_axis, self.year_last_month, self.last_month)
             )
        ]

register_product("TERM2", Product_MortalityTerm2)

In the final statement we register the product under the name "TERM2". For a more in-depth understanding of the structure of the product please refer to the concepts section of the documentation. In short, each product definition must provide methods for:

  * *payments due at the beginning of a month* when in a certain state (```get_bom_payments```)
  * *payments due at the end of the month* when a certain state transition occurs (```get_state_transition_payments```)
  * *state transitions* that do not originate from the biometric projection assumptions (```contractual_state_transitions```)
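
The three methods listed above can be summarized as an informal interface. The sketch below is only an illustration: ```ProductLike``` and ```EmptyProduct``` are hypothetical names that are not part of *pyprotolinc*, and the library itself may check products differently.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ProductLike(Protocol):
    """Hypothetical description of the three product methods."""

    def get_bom_payments(self, time_axis: Any) -> dict: ...

    def get_state_transition_payments(self, time_axis: Any) -> dict: ...

    def contractual_state_transitions(self, time_axis: Any) -> list: ...


class EmptyProduct:
    """Toy product without payments or contractual transitions."""

    def get_bom_payments(self, time_axis):
        return {}

    def get_state_transition_payments(self, time_axis):
        return {}

    def contractual_state_transitions(self, time_axis):
        return []


# a runtime-checkable protocol only verifies that the methods exist,
# not their signatures or return values
is_product = isinstance(EmptyProduct(), ProductLike)
print(is_product)
```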
  

The following cells demonstrate the principle on a small portfolio.

In [4]:
# load a portfolio
from pyprotolinc.portfolio import Portfolio
portfolio = Portfolio("portfolio/portfolio_small.xlsx", states_model=MortalityStates2)
portfolio.df_portfolio

INFO - 2022-07-03 22:01:06,883 - pyprotolinc.portfolio - Reading portfolio data from file portfolio/portfolio_small.xlsx.


|   | DATE_PORTFOLIO | ID | DATE_OF_BIRTH | DATE_START_OF_COVER | SUM_INSURED | CURRENT_STATUS | SEX | PRODUCT | PRODUCT_PARAMETERS | SMOKERSTATUS | RESERVING_RATE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-12-31 | 1 | 1976-04-23 | 2022-01-01 | 120000 | ACTIVE | m | TERM2 | 10 | S | 0.04 |
| 1 | 2021-12-31 | 2 | 1962-09-01 | 2015-10-01 | 100000 | ACTIVE | m | TERM2 | 10 | N | 0.03 |


The portfolio contains two insureds in the active state at 2021-12-31, both with *PRODUCT*
set to "TERM2" and *PRODUCT_PARAMETERS* set to 10. The latter parameter indicates the duration of the policy in years.

To test the product definition we still need a time-axis object. For this test we create one with a total of 150 months.

In [5]:
from pyprotolinc.runner import TimeAxis
time_axis = TimeAxis(portfolio.portfolio_date, 150)

Now we can test the product.

In [6]:
prod = Product_MortalityTerm2(portfolio)
bom_pay = prod.get_bom_payments(time_axis)

# this outputs the premium vectors (index 0 is the first and only
# payment type parametrized for the active state above)
bom_pay[prod.STATES_MODEL.ACTIVE][0]

(<CfNames.PREMIUM: 0>,
 array([[ 0., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60.,
         60., 60., 60., 60.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50.,
         50., 50., 50., 50., 50., 50., 

We see that "60" occurs 120 times for the first insured and "50" occurs 46 times for the second. Since the TimeAxis starts in December 2021, this means that for insured #1 there are up to 120 payments (one per month for ten years) to be made in the future, and for insured #2 there are 45 after the December 2021 entry. Looking at the portfolio explains these numbers:

In [7]:
portfolio.df_portfolio[["DATE_PORTFOLIO", "DATE_START_OF_COVER", "PRODUCT_PARAMETERS"]]

|   | DATE_PORTFOLIO | DATE_START_OF_COVER | PRODUCT_PARAMETERS |
|---|---|---|---|
| 0 | 2021-12-31 | 2022-01-01 | 10 |
| 1 | 2021-12-31 | 2015-10-01 | 10 |


For insured #1 (row 0) the case is clear: the cover only starts on January 1st, 2022, so the full 10 years of the term are in the future and there must be 120 payments (conditional on being active). For the insured in the second row the cover started in October 2015. That means that at the end of December 2021 the policy has already existed for six years and three months, i.e. for 75 months. Hence there remain 120 - 75 = 45 months.
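
The month counting can be reproduced with the standard library alone. The helper ```months_between``` below is a hypothetical illustration and not the function *pyprotolinc* uses internally; it counts calendar months from the month of the cover start up to and including the month of the portfolio date.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Calendar months from the month of `start` up to and
    including the month of `end` (may be <= 0 if start is later)."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


TERM_MONTHS = 10 * 12                      # PRODUCT_PARAMETERS = 10 years
portfolio_date = date(2021, 12, 31)
cover_starts = [date(2022, 1, 1), date(2015, 10, 1)]

# months already elapsed are floored at zero for covers starting later
elapsed = [max(0, months_between(s, portfolio_date)) for s in cover_starts]
remaining = [TERM_MONTHS - e for e in elapsed]

print(elapsed)    # [0, 75]
print(remaining)  # [120, 45]
```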

### Assumptions

Let's have a look at the assumptions next.

In [8]:
with open('mortality_assumptions_simple.yml', 'r') as f:
    print(f.read())


assumptions_spec:

  be:
    # active -> death
    - [0, 1, ["Scalar", 0.0015]]

    # active -> lapse
    - [0, 2, ["Scalar", 0.05]]

  res:
    # active -> death
    - [0, 1, ["Scalar", 0.0015]]

    # active -> lapse
    - [0, 2, ["Scalar", 0.0]]



The state transition (0, 1) is death, while (0, 2) corresponds to lapse. We use simple scalar assumptions for now.
To work with the DAV2008T table later on, we first need to download it by running ```pyprotolinc download_dav_tables``` in a command shell.

### Run

Now we run the projection with our custom state model and term product:

In [9]:
project_cashflows(run_config);

INFO - 2022-07-03 22:01:07,375 - pyprotolinc.main - Multistate run with config: {'model_name': 'GenericMultiState', 'years_to_simulate': 121, 'portfolio_path': 'portfolio/portfolio_small.xlsx', 'assumptions_path': 'mortality_assumptions_simple.yml', 'steps_per_month': 1, 'state_model_name': 'MortalityStates2', 'timestep_duration': 0.08333333333333333, 'outfile': 'results/ncf_out_generic.csv', 'portfolio_cache': 'portfolio/portfolio_cache', 'profile_out_dir': '.', 'portfolio_chunk_size': 1024, 'use_multicore': False}
DEBUG - 2022-07-03 22:01:07,382 - pyprotolinc.portfolio - Porfolio file not found in cache.
INFO - 2022-07-03 22:01:07,383 - pyprotolinc.portfolio - Reading portfolio data from file C:\Users\marti\programming\PyProtolinc\examples\03_mortality\portfolio\portfolio_small.xlsx.
INFO - 2022-07-03 22:01:07,407 - pyprotolinc.portfolio - Created directory for portfolio cache C:\Users\marti\programming\PyProtolinc\examples\03_mortality\portfolio\portfolio_cache
INFO - 2022-07-03 22:

Let's inspect the result.

In [10]:
import pandas as pd
pd.read_csv("results/ncf_out_generic.csv", index_col=0).head()

|   | YEAR | QUARTER | MONTH | PREMIUM | ANNUITY_PAYMENT1 | ANNUITY_PAYMENT2 | DEATH_PAYMENT | DI_LUMPSUM_PAYMENT | RESERVE_BOM(ACTIVE) | RESERVE_BOM(DEATH) | ... | MV_ACTIVE_DIS1 | MV_ACT_DIS2 | MV_ACT_LAPSED | MV_ACT_MATURED | MV_DIS1_DEATH | MV_DIS1_DIS2 | MV_DIS1_ACT | MV_DIS2_DEATH | MV_DIS2_DIS1 | MV_DIS2_ACT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 4 | 12 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2022 | 1 | 1 | 110.0 | 0.0 | 0.0 | -27.5 | 0.0 | -6044.40823 | 0.0 | ... | 0.0 | 0.0 | 0.008333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2022 | 1 | 2 | 109.527917 | 0.0 | 0.0 | -27.381979 | 0.0 | -5955.167198 | 0.0 | ... | 0.0 | 0.0 | 0.008298 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2022 | 1 | 3 | 109.057859 | 0.0 | 0.0 | -27.264465 | 0.0 | -5866.393392 | 0.0 | ... | 0.0 | 0.0 | 0.008262 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2022 | 2 | 4 | 108.589819 | 0.0 | 0.0 | -27.147455 | 0.0 | -5778.084686 | 0.0 | ... | 0.0 | 0.0 | 0.008227 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


The premium of 110 can be explained as follows: it is obtained as 0.0005 * 220000. The factor 0.0005 is introduced in the function ```get_bom_payments``` and 220000 is the *sum insured* of the full portfolio, i.e. of both policies together.
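
The first rows of the result can be checked by hand. The sketch below assumes that the annual scalar rates are scaled linearly by the time step of 1/12 (the ```timestep_duration``` shown in the log) and that expected cash flows are weighted by the probability of still being active. This is an inference from the printed figures, not a description of the internals of *pyprotolinc*.

```python
sum_insured = [120_000, 100_000]    # both policies of the portfolio
premium_factor = 0.0005             # from get_bom_payments
q_death, q_lapse = 0.0015, 0.05     # annual best-estimate rates
dt = 1 / 12                         # assumed monthly time step

total_si = sum(sum_insured)

# month 1 (January 2022): both insureds are active with probability 1
premium_1 = premium_factor * total_si
death_1 = -q_death * dt * total_si

# probability of remaining active over one month (assumed linear scaling)
stay_active = 1 - (q_death + q_lapse) * dt

# month 2: the same cash flows weighted by the remaining active volume
premium_2 = premium_1 * stay_active
death_2 = death_1 * stay_active

print(round(premium_1, 6), round(death_1, 6))
print(round(premium_2, 6), round(death_2, 6))
```

Under these assumptions the values agree with the PREMIUM and DEATH_PAYMENT columns of the first two projection months.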

Note that the reserves are negative, indicating high profitability.

### Re-Run with DAV2008T

In order to rerun with other assumptions we can simply swap out the ```assumptions_path``` attribute of
the ```run_config``` object:

In [11]:
run_config.assumptions_path = 'mortality_assumptions_simple_dav2008t.yml'

The corresponding file looks as follows:

In [12]:
with open(run_config.assumptions_path, 'r') as f:
    print(f.read())


assumptions_spec:

  be:
    # active -> death
    - [0, 1, ["DAV2008T", "estimate_type:BE",
              "base_directory:tables/Germany_Endowments_DAV2008T"]]

    # active -> lapse
    - [0, 2, ["Scalar", 0.05]]

  res:
    # active -> death
    - [0, 1, ["DAV2008T", "estimate_type:LOADED",
              "base_directory:tables/Germany_Endowments_DAV2008T"]]

    # active -> lapse
    - [0, 2, ["Scalar", 0.0]]



As one can see we have parametrized the mortality assumption by the DAV2008T tables and we are assuming no
lapse for the reserve calculations.
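
Each entry of the specification has the form ```[from_state, to_state, [provider_name, parameters...]]```, where string parameters use a ```key:value``` notation. The helper ```parse_provider_spec``` below is hypothetical and only illustrates how such an entry could be taken apart; it is not the parser used by *pyprotolinc*.

```python
def parse_provider_spec(spec):
    """Split ["Name", "key:value", ...] into (name, params).

    Hypothetical helper for illustration only.
    """
    name, *raw_params = spec
    params = {}
    for item in raw_params:
        if isinstance(item, str) and ":" in item:
            # "key:value" strings become dictionary entries
            key, value = item.split(":", 1)
            params[key] = value
        else:
            # anything else (e.g. a scalar rate) is kept positionally
            params.setdefault("args", []).append(item)
    return name, params


entries = [
    [0, 1, ["DAV2008T", "estimate_type:BE",
            "base_directory:tables/Germany_Endowments_DAV2008T"]],
    [0, 2, ["Scalar", 0.05]],
]

parsed = [(f, t, *parse_provider_spec(spec)) for f, t, spec in entries]
for row in parsed:
    print(row)
```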

In [13]:
pd.DataFrame(project_cashflows(run_config)).head()

INFO - 2022-07-03 22:01:07,784 - pyprotolinc.main - Multistate run with config: {'model_name': 'GenericMultiState', 'years_to_simulate': 121, 'portfolio_path': 'portfolio/portfolio_small.xlsx', 'assumptions_path': 'mortality_assumptions_simple_dav2008t.yml', 'steps_per_month': 1, 'state_model_name': 'MortalityStates2', 'timestep_duration': 0.08333333333333333, 'outfile': 'results/ncf_out_generic.csv', 'portfolio_cache': 'portfolio/portfolio_cache', 'profile_out_dir': '.', 'portfolio_chunk_size': 1024, 'use_multicore': False}
INFO - 2022-07-03 22:01:07,832 - pyprotolinc.portfolio - Porfolio loaded from cache
INFO - 2022-07-03 22:01:07,832 - pyprotolinc.portfolio - Portolio rows: 2
DEBUG - 2022-07-03 22:01:07,832 - pyprotolinc.portfolio - Splitting portfolio for product TERM2.
DEBUG - 2022-07-03 22:01:07,832 - pyprotolinc.portfolio - Initializing portfolio from dataframe
DEBUG - 2022-07-03 22:01:07,845 - pyprotolinc.portfolio - Initializing portfolio from dataframe
INFO - 2022-07-03 22:0

|   | YEAR | QUARTER | MONTH | PREMIUM | ANNUITY_PAYMENT1 | ANNUITY_PAYMENT2 | DEATH_PAYMENT | DI_LUMPSUM_PAYMENT | RESERVE_BOM(ACTIVE) | RESERVE_BOM(DEATH) | ... | MV_ACTIVE_DIS1 | MV_ACT_DIS2 | MV_ACT_LAPSED | MV_ACT_MATURED | MV_DIS1_DEATH | MV_DIS1_DIS2 | MV_DIS1_ACT | MV_DIS2_DEATH | MV_DIS2_DIS1 | MV_DIS2_ACT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 4 | 12 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2022 | 1 | 1 | 110.0 | 0.0 | 0.0 | -73.688333 | 0.0 | 2949.569942 | 0.0 | ... | 0.0 | 0.0 | 0.008333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2022 | 1 | 2 | 109.504822 | 0.0 | 0.0 | -73.35476 | 0.0 | 2949.892834 | 0.0 | ... | 0.0 | 0.0 | 0.008296 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2022 | 1 | 3 | 109.011875 | 0.0 | 0.0 | -73.022698 | 0.0 | 2950.213167 | 0.0 | ... | 0.0 | 0.0 | 0.008258 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2022 | 2 | 4 | 108.521148 | 0.0 | 0.0 | -72.692139 | 0.0 | 2950.53093 | 0.0 | ... | 0.0 | 0.0 | 0.008221 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


Calculating the ratio of the claims in this and the previous projection we find that the latest claims are 2.68 times as high as the previous ones:

-73.7 / -27.5 = 2.68

Looking at the portfolio we find that we have:

  * a male smoker born in 1976 (aged 45 at year-end 2021) *and*
  * a male non-smoker born in 1962 (aged 59 at year-end 2021)
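
The two ages can be verified from the dates of birth in the portfolio. The function ```completed_age``` is a hypothetical helper written for this check; the notebook itself derives the ages in years from ```portfolio.initial_ages // 12```.

```python
from datetime import date


def completed_age(birth: date, at: date) -> int:
    """Age in completed years at the date `at`."""
    years = at.year - birth.year
    # subtract one year if the birthday has not yet occurred in `at.year`
    if (at.month, at.day) < (birth.month, birth.day):
        years -= 1
    return years


portfolio_date = date(2021, 12, 31)
births = [date(1976, 4, 23), date(1962, 9, 1)]

ages = [completed_age(b, portfolio_date) for b in births]
print(ages)  # [45, 59]
```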
  
We check the mortality rates we would expect to be applied: 

<img src="img/dav_extract.png"/>

We can validate that these are the rates that are used by ```pyprotolinc```.

In [14]:
from pyprotolinc.assumptions.dav2008t import DAV2008T
dav2008t = DAV2008T(base_directory="tables/Germany_Endowments_DAV2008T")
dav2008_provider = dav2008t.rates_provider(estimate_type="BE")
rates = dav2008_provider.get_rates(age=portfolio.initial_ages // 12,
                                   smokerstatus=portfolio.smokerstatus,
                                   gender=portfolio.gender)
rates

array([0.003013, 0.005227])

To demonstrate that this explains the increase in the claims level we calculate the sum-insured weighted average rate and
divide it by 0.0015:

In [15]:
rates.dot(portfolio.sum_insured) / portfolio.sum_insured.sum() / 0.0015

2.6795757575757575

So the increase in claims is indeed explained by the higher mortality rates in the DAV2008T table compared to our previous
simple assumption.