In [1]:
import jupyter_black

jupyter_black.load()

These tutorials are written to help you get familiar with the common functionality in the `chainladder` package that most actuaries can use in their day-to-day work. We will use the datasets that come included with the package, so you can follow along and reproduce the results shown here.

Keep in mind that these tutorials are written only to demonstrate the functionality of the package. The user should always follow all applicable laws, the Code of Professional Conduct, and applicable Actuarial Standards of Practice, and exercise their best actuarial judgment. These tutorials do not encourage or recommend any particular workflow for analyzing a dataset or rendering an actuarial opinion.

The tutorials assume that you have a basic understanding of commonly used actuarial terms and can independently perform an actuarial analysis in another tool, such as Microsoft Excel or other actuarial software. It is also assumed that you have some familiarity with Python, and basic experience with packages that are popular in the Python community, such as `pandas` and `numpy`.

All tutorials and exercises rely on `chainladder` v0.8.18 or later. If you have trouble reconciling the results from your workflow with this tutorial, verify the versions of the packages installed in your work environment and check the release notes in case patches have been issued since.

In [2]:
import pandas as pd
import numpy as np
import chainladder as cl

print("pandas: " + pd.__version__)
print("numpy: " + np.__version__)
print("chainladder: " + cl.__version__)

pandas: 2.1.4
numpy: 1.26.3
chainladder: 0.8.18


# Working with Triangles

## Importing Data

Let's begin by loading a raw triangle dataset into a `pandas.DataFrame`. We'll use the `raa` dataset, which is available from the repository. Note that this dataset is currently in `csv` format.

In [3]:
raa_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/raa.csv"
)
raa_df.head(20)

Unnamed: 0,development,origin,values
0,1981,1981,5012.0
1,1982,1982,106.0
2,1983,1983,3410.0
3,1984,1984,5655.0
4,1985,1985,1092.0
5,1986,1986,1513.0
6,1987,1987,557.0
7,1988,1988,1351.0
8,1989,1989,3133.0
9,1990,1990,2063.0


The dataset has three columns:
* development: the valuation time, in this case the valuation year
* origin: the accident date, in this case the accident year
* values: the amounts recorded for the specific accident date at the specific valuation time (such as incurred losses, paid losses, or claim counts). In this case, they are just "values" in the triangle, with no specific metric or unit attached

A loss triangle is a table of loss experience showing total losses (or any other measure that matures over time from an origin date) for a given period (origin) at regular valuation dates (development), reflecting the change in amounts as claims mature and emerge. Each older period has one more entry than the next younger period, which gives the data its triangular shape. Loss triangles can be used to determine loss development for a given risk.

Let's put our data into the `chainladder.Triangle` format.

In [4]:
raa = cl.Triangle(
    data=raa_df,
    origin="origin",
    development="development",
    columns="values",
    cumulative=True,
)
raa

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1981,5012,8269.0,10907.0,11805.0,13539.0,16181.0,18009.0,18608.0,18662.0,18834.0
1982,106,4285.0,5396.0,10666.0,13782.0,15599.0,15496.0,16169.0,16704.0,
1983,3410,8992.0,13873.0,16141.0,18735.0,22214.0,22863.0,23466.0,,
1984,5655,11555.0,15766.0,21266.0,23425.0,26083.0,27067.0,,,
1985,1092,9565.0,15836.0,22169.0,25955.0,26180.0,,,,
1986,1513,6445.0,11702.0,12935.0,15852.0,,,,,
1987,557,4020.0,10946.0,12314.0,,,,,,
1988,1351,6947.0,13112.0,,,,,,,
1989,3133,5395.0,,,,,,,,
1990,2063,,,,,,,,,


In the above example,
* `data` is the single `DataFrame` that contains the columns referenced by all other arguments to the `Triangle` constructor. In our example, the dataset `raa_df`.
* `origin` is the representation of the accident, reporting, or more generally the origin period of the triangle that will map to the `origin` dimension. In our example, the `origin` column.
* `development` is the representation of the development/valuation periods of the triangle that will map to the `development` dimension. In our example, the `development` column.
* `columns` is the representation of the numeric data of the triangle that will map to the `columns` dimension. If `None`, then a single 'Total' key will be generated. In our example, the `values` column.
* `cumulative` is the indicator of whether the triangle is cumulative or incremental. In our example, while it is not obvious from looking at the raw data, our triangle dataset is actually a cumulative triangle, so we set this to `True`.
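
To make the mapping from the long-format `DataFrame` to a triangle more concrete, here is a minimal plain-Python sketch of the pivot. It is not how `chainladder` is implemented; the `to_triangle` helper is hypothetical, and it assumes annual origin periods valued at each year-end. The records are a handful of RAA values copied from this tutorial.

```python
# Long-format records (origin year, valuation year, cumulative value),
# copied from a few cells of the RAA data used in this tutorial.
records = [
    (1981, 1981, 5012.0),
    (1981, 1982, 8269.0),
    (1981, 1983, 10907.0),
    (1982, 1982, 106.0),
    (1982, 1983, 4285.0),
    (1983, 1983, 3410.0),
]


def to_triangle(rows):
    """Pivot long records into {origin: {age_in_months: value}} (hypothetical helper)."""
    triangle = {}
    for origin, valuation, value in rows:
        # Assumption: an annual origin period valued at its own year-end is 12 months old.
        age = (valuation - origin + 1) * 12
        triangle.setdefault(origin, {})[age] = value
    return triangle


triangle = to_triangle(records)
for origin, row in sorted(triangle.items()):
    print(origin, row)
```

Each older origin year ends up with one more age than the next, which is the triangular shape described earlier.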
  
## Triangle Attributes

Now that we have our `Triangle` object declared within the `chainladder` package, we can get a lot of its attributes. First, let's get the latest diagonal of the Triangle with `.latest_diagonal`.

In [5]:
raa.latest_diagonal

Unnamed: 0,1990
1981,18834
1982,16704
1983,23466
1984,27067
1985,26180
1986,15852
1987,12314
1988,13112
1989,5395
1990,2063
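
Conceptually, the latest diagonal is the last observed value in each origin row. The sketch below illustrates that idea in plain Python on the three most recent RAA origin years; the `latest_diagonal` function here is a hypothetical helper, not the package's implementation.

```python
# Three most recent RAA origin years; None marks cells not yet observed.
rows = {
    1988: [1351.0, 6947.0, 13112.0],
    1989: [3133.0, 5395.0, None],
    1990: [2063.0, None, None],
}


def latest_diagonal(tri):
    """Return the last observed (non-missing) value of each origin row."""
    latest = {}
    for origin, values in tri.items():
        observed = [v for v in values if v is not None]
        latest[origin] = observed[-1]
    return latest


diagonal = latest_diagonal(rows)
print(diagonal)
```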


Another attribute that is commonly used is `.link_ratio` to get the LDFs of the triangle.

In [6]:
raa.link_ratio

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
1981,1.6498,1.319,1.0823,1.1469,1.1951,1.113,1.0333,1.0029,1.0092
1982,40.4245,1.2593,1.9766,1.2921,1.1318,0.9934,1.0434,1.0331,
1983,2.637,1.5428,1.1635,1.1607,1.1857,1.0292,1.0264,,
1984,2.0433,1.3644,1.3489,1.1015,1.1135,1.0377,,,
1985,8.7592,1.6556,1.3999,1.1708,1.0087,,,,
1986,4.2597,1.8157,1.1054,1.2255,,,,,
1987,7.2172,2.7229,1.125,,,,,,
1988,5.1421,1.8874,,,,,,,
1989,1.722,,,,,,,,
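
Each link ratio is simply one cumulative value divided by the value at the prior age for the same origin year. A minimal sketch of that arithmetic, using the first three ages of the 1981 and 1982 RAA rows (the `link_ratios` helper is an illustration only):

```python
# First three ages (12, 24, 36 months) of two RAA origin years.
cumulative = {
    1981: [5012.0, 8269.0, 10907.0],
    1982: [106.0, 4285.0, 5396.0],
}


def link_ratios(values):
    """Age-to-age factors: each cumulative value divided by the previous one."""
    return [round(nxt / prev, 4) for prev, nxt in zip(values, values[1:])]


ratios = {origin: link_ratios(values) for origin, values in cumulative.items()}
print(ratios)
```

The very large 12-24 ratio for 1982 comes from the small starting value of 106.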


Another useful feature is the `.heatmap()` method.

In [7]:
raa.link_ratio.heatmap()

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
1981,1.6498,1.319,1.0823,1.1469,1.1951,1.113,1.0333,1.0029,1.0092
1982,40.4245,1.2593,1.9766,1.2921,1.1318,0.9934,1.0434,1.0331,
1983,2.637,1.5428,1.1635,1.1607,1.1857,1.0292,1.0264,,
1984,2.0433,1.3644,1.3489,1.1015,1.1135,1.0377,,,
1985,8.7592,1.6556,1.3999,1.1708,1.0087,,,,
1986,4.2597,1.8157,1.1054,1.2255,,,,,
1987,7.2172,2.7229,1.125,,,,,,
1988,5.1421,1.8874,,,,,,,
1989,1.722,,,,,,,,


Here are some other attributes that might be useful to the user:
* `is_cumulative`: returns True if the data across the development periods is cumulative, or False if it is incremental.
* `is_ultimate`: returns True if the ultimate values are contained in the triangle.
* `is_val_tri`: returns True if the development period is stated as a valuation date as opposed to an age, i.e. Schedule P style triangle (True) or the more commonly used triangle by development age (False).
* `is_full`: returns True if the triangle has been "squared".

In [8]:
print("Is triangle cumulative?", raa.is_cumulative)
print("Does triangle contain ultimate projections?", raa.is_ultimate)
print("Is this a valuation triangle?", raa.is_val_tri)
print('Has the triangle been "squared"?', raa.is_full)

Is triangle cumulative? True
Does triangle contain ultimate projections? False
Is this a valuation triangle? False
Has the triangle been "squared"? False


We can also inspect the triangle to understand its data granularity with `origin_grain` and `development_grain`. The supported `grains` are:
* monthly: denoted with `M`
* quarterly: denoted with `Q`
* semi-annually: denoted with `S`
* annually: denoted with `Y`

In [9]:
print("Origin grain:", raa.origin_grain)
print("Development grain:", raa.development_grain)

Origin grain: Y
Development grain: Y


## Manipulating Triangles
There are also useful methods for manipulating triangles. For example, you can convert a cumulative triangle into an incremental one with `.cum_to_incr()`.

In [10]:
raa.cum_to_incr()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1981,5012,3257.0,2638.0,898.0,1734.0,2642.0,1828.0,599.0,54.0,172.0
1982,106,4179.0,1111.0,5270.0,3116.0,1817.0,-103.0,673.0,535.0,
1983,3410,5582.0,4881.0,2268.0,2594.0,3479.0,649.0,603.0,,
1984,5655,5900.0,4211.0,5500.0,2159.0,2658.0,984.0,,,
1985,1092,8473.0,6271.0,6333.0,3786.0,225.0,,,,
1986,1513,4932.0,5257.0,1233.0,2917.0,,,,,
1987,557,3463.0,6926.0,1368.0,,,,,,
1988,1351,5596.0,6165.0,,,,,,,
1989,3133,2262.0,,,,,,,,
1990,2063,,,,,,,,,


You can also convert an incremental triangle to a cumulative one with `.incr_to_cum()`.

In [11]:
raa.cum_to_incr().incr_to_cum()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1981,5012,8269.0,10907.0,11805.0,13539.0,16181.0,18009.0,18608.0,18662.0,18834.0
1982,106,4285.0,5396.0,10666.0,13782.0,15599.0,15496.0,16169.0,16704.0,
1983,3410,8992.0,13873.0,16141.0,18735.0,22214.0,22863.0,23466.0,,
1984,5655,11555.0,15766.0,21266.0,23425.0,26083.0,27067.0,,,
1985,1092,9565.0,15836.0,22169.0,25955.0,26180.0,,,,
1986,1513,6445.0,11702.0,12935.0,15852.0,,,,,
1987,557,4020.0,10946.0,12314.0,,,,,,
1988,1351,6947.0,13112.0,,,,,,,
1989,3133,5395.0,,,,,,,,
1990,2063,,,,,,,,,
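
The two conversions are inverses of each other: one takes successive differences along each origin row, the other takes a running sum. A plain-Python sketch on the 1982 RAA row follows; the helper names mirror the `chainladder` methods for readability, but this is only an illustration of the arithmetic.

```python
from itertools import accumulate

# Cumulative values for RAA origin year 1982, ages 12 through 84.
cumulative_1982 = [106.0, 4285.0, 5396.0, 10666.0, 13782.0, 15599.0, 15496.0]


def cum_to_incr(values):
    """First value is kept; every later value becomes the change from the prior age."""
    return [values[0]] + [nxt - prev for prev, nxt in zip(values, values[1:])]


def incr_to_cum(values):
    """Running sum along the row restores the cumulative values."""
    return list(accumulate(values))


incremental_1982 = cum_to_incr(cumulative_1982)
round_trip = incr_to_cum(incremental_1982)
print(incremental_1982)
print(round_trip == cumulative_1982)
```

Note the negative increment at age 84, where the cumulative amount decreased from 15,599 to 15,496.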


Another useful one is to convert a development triangle to a valuation triangle (Schedule P style), with `.dev_to_val()`.

In [12]:
raa.dev_to_val()

Unnamed: 0,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990
1981,5012.0,8269.0,10907.0,11805.0,13539.0,16181.0,18009.0,18608.0,18662.0,18834
1982,,106.0,4285.0,5396.0,10666.0,13782.0,15599.0,15496.0,16169.0,16704
1983,,,3410.0,8992.0,13873.0,16141.0,18735.0,22214.0,22863.0,23466
1984,,,,5655.0,11555.0,15766.0,21266.0,23425.0,26083.0,27067
1985,,,,,1092.0,9565.0,15836.0,22169.0,25955.0,26180
1986,,,,,,1513.0,6445.0,11702.0,12935.0,15852
1987,,,,,,,557.0,4020.0,10946.0,12314
1988,,,,,,,,1351.0,6947.0,13112
1989,,,,,,,,,3133.0,5395
1990,,,,,,,,,,2063


And of course, you can convert it back with `.val_to_dev()`.

In [13]:
raa.dev_to_val().val_to_dev()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1981,5012,8269.0,10907.0,11805.0,13539.0,16181.0,18009.0,18608.0,18662.0,18834.0
1982,106,4285.0,5396.0,10666.0,13782.0,15599.0,15496.0,16169.0,16704.0,
1983,3410,8992.0,13873.0,16141.0,18735.0,22214.0,22863.0,23466.0,,
1984,5655,11555.0,15766.0,21266.0,23425.0,26083.0,27067.0,,,
1985,1092,9565.0,15836.0,22169.0,25955.0,26180.0,,,,
1986,1513,6445.0,11702.0,12935.0,15852.0,,,,,
1987,557,4020.0,10946.0,12314.0,,,,,,
1988,1351,6947.0,13112.0,,,,,,,
1989,3133,5395.0,,,,,,,,
1990,2063,,,,,,,,,
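
The switch between the two layouts is only a relabeling of the development axis. As a rough sketch, assuming annual origin and development grains with year-end valuations (the `valuation_year` helper is hypothetical):

```python
def valuation_year(origin_year, age_months):
    """Map an annual origin year and a development age to the valuation year."""
    # Assumption: age 12 corresponds to the end of the origin year itself.
    return origin_year + age_months // 12 - 1


# First three ages of RAA origin year 1982, keyed by development age.
row_1982 = {12: 106.0, 24: 4285.0, 36: 5396.0}

# The same values keyed by valuation year instead.
by_valuation = {valuation_year(1982, age): value for age, value in row_1982.items()}
print(by_valuation)
```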


## 4-D Triangle

The triangle described so far is a two-dimensional (accident date by valuation date) structure that spans multiple cells of data. This is a useful structure for exploring individual triangles, but becomes more problematic when working with **sets** of triangles. `pandas` does not have a triangle `dtype`, but if it did, working with sets of triangles would be much more convenient. To facilitate working with more than one triangle at a time, `chainladder.Triangle` acts like a `pandas.DataFrame` (with an index and columns) where each cell (row x col) is an individual triangle. This structure manifests itself as a four-dimensional space. Let's take a look at another sample dataset, `clrd`.

In [14]:
clrd_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/clrd.csv"
)
clrd_df.head()

Unnamed: 0,GRCODE,GRNAME,AccidentYear,DevelopmentYear,DevelopmentLag,IncurLoss,CumPaidLoss,BulkLoss,EarnedPremDIR,EarnedPremCeded,EarnedPremNet,Single,PostedReserve97,LOB
0,86,Allstate Ins Co Grp,1988,1988,1,367404,70571,127737,400699,5957,394742,0,281872,wkcomp
1,86,Allstate Ins Co Grp,1988,1989,2,362988,155905,60173,400699,5957,394742,0,281872,wkcomp
2,86,Allstate Ins Co Grp,1988,1990,3,347288,220744,27763,400699,5957,394742,0,281872,wkcomp
3,86,Allstate Ins Co Grp,1988,1991,4,330648,251595,15280,400699,5957,394742,0,281872,wkcomp
4,86,Allstate Ins Co Grp,1988,1992,5,354690,274156,27689,400699,5957,394742,0,281872,wkcomp


Let's load the data into the sets of triangles.

In [15]:
clrd = cl.Triangle(
    data=clrd_df,
    origin="AccidentYear",
    development="DevelopmentYear",
    columns=[
        "IncurLoss",
        "CumPaidLoss",
        "BulkLoss",
        "EarnedPremDIR",
        "EarnedPremCeded",
        "EarnedPremNet",
    ],
    index=["GRNAME", "LOB"],
    cumulative=True,
)
clrd

Unnamed: 0,Triangle Summary
Valuation:,1997-12
Grain:,OYDY
Shape:,"(775, 6, 10, 10)"
Index:,"[GRNAME, LOB]"
Columns:,"[IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]"


In this example, `data`, `origin`, `development`, and `cumulative` are used the same way as before. However, we need to use `columns` a bit differently, and declare a new argument, `index`.

* `columns` is the list of triangle metrics that we will have. Think of these as the types of triangles that can describe a segment, e.g. paid losses, incurred losses, closed claim counts, or even exposure data. Note that even though exposure data do not develop over time, they can still be presented in a triangle format. In our example, `IncurLoss` is financial incurred loss (not just reported, as in paid + case + IBNR), `CumPaidLoss` is paid loss, `BulkLoss` is bulk and IBNR reserves, `EarnedPremDIR` is direct and assumed premium earned, `EarnedPremCeded` is ceded premium earned, and `EarnedPremNet` is net premium earned.
* `index` is our portfolio segments. In our example, a combination of `GRNAME`, the Company Name, and `LOB`, the line of business.

Since 4D structures do not fit nicely on 2D screens, we see a summary view instead that describes the structure rather than the underlying data itself. 

We see 5 rows of information:
* Valuation: the valuation date.
* Grain: the granularity of the data, `O` stands for origin, and `D` stands for development, `OYDY` represents triangles with accident year by development year.
* Shape: contains 4 numbers, represents the 4-D structure. This sample triangle represents a collection of 775x6 or 4,650 triangles that are themselves 10 accident years by 10 development periods.
    * 775: the number of segments, which is the combination of `index`, that represents the data segments. In this case, it is each of the `GRNAME` and `LOB` combination.
    * 6: the number of triangles for each segment, which is also the columns `[IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]`.
    * 10: the number of accident periods.
    * 10: the number of development periods.
* Index: the segmentation level of the triangles.
* Columns: the value types recorded in the triangles.

Now we have a 4D triangle, let's do some `pandas`-style operations. First, we can filter.

In [16]:
clrd[clrd["LOB"] == "wkcomp"]

Unnamed: 0,Triangle Summary
Valuation:,1997-12
Grain:,OYDY
Shape:,"(132, 6, 10, 10)"
Index:,"[GRNAME, LOB]"
Columns:,"[IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]"


Note that only the shape changed, from `(775, 6, 10, 10)` to `(132, 6, 10, 10)`.

Next, you can use `.loc` to filter by index name.

In [17]:
clrd.loc["Allstate Ins Co Grp"]

Unnamed: 0,Triangle Summary
Valuation:,1997-12
Grain:,OYDY
Shape:,"(2, 6, 10, 10)"
Index:,[LOB]
Columns:,"[IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]"


Let's see what LOB `Allstate Ins Co Grp` writes.

In [18]:
clrd.loc["Allstate Ins Co Grp"].index

Unnamed: 0,LOB
0,prodliab
1,wkcomp


Since we have `.loc`, we must also have `.iloc` by index location. You can even chain them together. 

Let's get `Allstate Ins Co Grp`'s `prodliab` by calling `iloc[0]` and get the `CumPaidLoss` triangle.

In [19]:
clrd.loc["Allstate Ins Co Grp"].iloc[0]["CumPaidLoss"]

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988,1501,3916.0,8834.0,17450.0,22495.0,28687.0,31311.0,32039.0,36357.0,36358.0
1989,1697,5717.0,10442.0,18125.0,23284.0,30092.0,34338.0,41094.0,41164.0,
1990,1373,4002.0,10829.0,16695.0,21788.0,25332.0,34875.0,34893.0,,
1991,1069,4594.0,6920.0,9996.0,13249.0,19221.0,19256.0,,,
1992,1134,3068.0,5412.0,8210.0,19164.0,19187.0,,,,
1993,979,3079.0,6407.0,16113.0,16131.0,,,,,
1994,1397,2990.0,25688.0,26030.0,,,,,,
1995,1016,21935.0,22095.0,,,,,,,
1996,9852,10071.0,,,,,,,,
1997,319,,,,,,,,,


`iloc[...]` can take up to four positional arguments: index, columns, origin, and development. For example, to get `Allstate Ins Co Grp`'s `prodliab` segment, we can first look up its index position, then pass it along with the positions of `CumPaidLoss` (column 1) and origin `1990` (origin 2).

In [20]:
clrd.index[clrd.index["GRNAME"] == "Allstate Ins Co Grp"]

Unnamed: 0,GRNAME,LOB
21,Allstate Ins Co Grp,prodliab
22,Allstate Ins Co Grp,wkcomp


In [21]:
clrd.iloc[21, 1, 2, :]

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1990,1373,4002,10829,16695,21788,25332,34875,34893,,


We can also use other `pandas`-style filters, for example, getting the four `CumPaidLoss` diagonals between 1990 and 1993 with the help of `.valuation`.

In [22]:
paid_tri = clrd.loc["Allstate Ins Co Grp"].iloc[0]["CumPaidLoss"]
paid_tri[(paid_tri.valuation >= "1990") & (paid_tri.valuation < "1994")]["CumPaidLoss"]

Unnamed: 0,12,24,36,48,60,72
1988,,,8834.0,17450.0,22495.0,28687.0
1989,,5717.0,10442.0,18125.0,23284.0,
1990,1373.0,4002.0,10829.0,16695.0,,
1991,1069.0,4594.0,6920.0,,,
1992,1134.0,3068.0,,,,
1993,979.0,,,,,


Another commonly used filter is `.development`. Let's get the three columns between age 36 and age 60.

In [23]:
paid_tri[(paid_tri.development >= 36) & (paid_tri.development <= 60)]

Unnamed: 0,36,48,60
1988,8834.0,17450.0,22495.0
1989,10442.0,18125.0,23284.0
1990,10829.0,16695.0,21788.0
1991,6920.0,9996.0,13249.0
1992,5412.0,8210.0,19164.0
1993,6407.0,16113.0,16131.0
1994,25688.0,26030.0,
1995,22095.0,,
1996,,,
1997,,,


With complete flexibility to slice subsets of triangles, we can use basic arithmetic to derive new triangles, which are commonly used as diagnostics to explore trends.

In [24]:
clrd["CaseIncurLoss"] = clrd["IncurLoss"] - clrd["BulkLoss"]
clrd["CaseIncurLoss"]

Unnamed: 0,Triangle Summary
Valuation:,1997-12
Grain:,OYDY
Shape:,"(775, 1, 10, 10)"
Index:,"[GRNAME, LOB]"
Columns:,[CaseIncurLoss]


Note that even though `clrd["CaseIncurLoss"]` is declared as a new variable, it actually comes with all 775 "indexes", i.e. we have 775 `clrd["CaseIncurLoss"]` triangles. But we can use `.sum()` to see the sum of them.

In [25]:
clrd["CaseIncurLoss"].sum()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988,7778398,9872876.0,10537707.0,10973808.0,11175391.0,11265524.0,11288288.0,11305023.0,11323995.0,11327627.0
1989,8734319,10844720.0,11822136.0,12279311.0,12481505.0,12567543.0,12608487.0,12633539.0,12639258.0,
1990,9325252,11913461.0,12985113.0,13459843.0,13646077.0,13718445.0,13755879.0,13768960.0,,
1991,9564486,12159826.0,13216383.0,13659541.0,13821032.0,13903084.0,13964163.0,,,
1992,10539619,13125930.0,14120971.0,14563964.0,14755405.0,14850140.0,,,,
1993,11402448,14043343.0,15095232.0,15576086.0,15775057.0,,,,,
1994,12411107,15005424.0,16095699.0,16650937.0,,,,,,
1995,12686394,15140099.0,16223016.0,,,,,,,
1996,12627293,14956778.0,,,,,,,,
1997,12705993,,,,,,,,,


Let's look at the (sum of) Paid to (sum of) Incurred ratio triangle. Does it look like the ratios are changing over time? Using `.heatmap()` usually helps with spotting trends.

In [26]:
(clrd["CumPaidLoss"].sum() / clrd["CaseIncurLoss"].sum()).heatmap()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988,0.46,0.7151,0.8376,0.8987,0.9373,0.96,0.9739,0.9811,0.9865,0.9891
1989,0.4683,0.7344,0.8406,0.9038,0.9427,0.9643,0.9765,0.9843,0.9884,
1990,0.491,0.7394,0.846,0.9086,0.9438,0.965,0.978,0.9848,,
1991,0.486,0.737,0.844,0.9085,0.9473,0.9672,0.977,,,
1992,0.4876,0.7434,0.8518,0.9125,0.9483,0.9661,,,,
1993,0.4958,0.7548,0.8581,0.9176,0.9512,,,,,
1994,0.5033,0.7594,0.8602,0.9158,,,,,,
1995,0.5103,0.767,0.8636,,,,,,,
1996,0.522,0.7671,,,,,,,,
1997,0.5078,,,,,,,,,


Compare the result to `(clrd["CumPaidLoss"] / clrd["CaseIncurLoss"]).sum()`, which looks odd. Can you figure out why? It is because this version first computes the paid-to-incurred ratio for each of the 775 index segments, and then sums those ratios, rather than dividing total paid by total incurred.

In [27]:
(clrd["CumPaidLoss"] / clrd["CaseIncurLoss"]).sum()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988,183.17,292.57,331.87,398.92,494.88,473.44,485.36,490.86,489.8,491.31
1989,200.69,303.32,373.15,420.3,463.58,484.43,497.13,505.38,503.29,
1990,215.34,369.6,378.76,476.51,518.93,505.84,513.9,520.72,,
1991,164.69,327.69,396.14,444.79,485.42,503.48,514.19,,,
1992,218.25,340.39,411.63,476.67,499.92,516.52,,,,
1993,231.65,353.0,423.39,483.57,517.4,,,,,
1994,235.05,355.94,436.77,498.18,,,,,,
1995,235.37,369.36,445.79,,,,,,,
1996,245.99,384.58,,,,,,,,
1997,268.78,,,,,,,,,
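
The difference between the two expressions is the order of operations. The sketch below uses two made-up segments (the company names and amounts are hypothetical, chosen only for illustration) to show why a sum of ratios can land far above 1 while the ratio of sums stays between 0 and 1.

```python
# Hypothetical paid and case-incurred amounts for two segments at one cell.
paid = {"Company A": 50.0, "Company B": 900.0}
case_incurred = {"Company A": 100.0, "Company B": 1000.0}

# Ratio of sums: aggregate first, then divide (what .sum() / .sum() does).
ratio_of_sums = sum(paid.values()) / sum(case_incurred.values())

# Sum of ratios: divide segment by segment, then add the quotients together.
sum_of_ratios = sum(paid[name] / case_incurred[name] for name in paid)

print(round(ratio_of_sums, 4))
print(round(sum_of_ratios, 4))
```

With hundreds of segments, the sum of ratios grows roughly in proportion to the number of segments, which is consistent with the values in the hundreds shown in the output above.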


## Triangle Adjustments

Another adjustment we can make to the triangle is to apply a trend. We can do that by calling `chainladder.Trend()`, which is actually an estimator. It takes in a few variables:
- `trends`: the list containing the annual trends expressed as a decimal. For example, 5% decrease should be stated as -0.05.
- `dates`: a list-like of (start, end) dates to correspond to the trend list.
- `axis` (options: [‘origin’, ‘valuation’]): the axis on which to apply the trend.

Let's say we want a 5% trend from `1992-12-31` to `1991-01-01`. You can then call the `.trend_` attribute to view the trend factors.

In [28]:
cl.Trend(trends=[0.05], dates=[("1992-12-31", "1991-01-01")], axis="origin").fit(
    clrd["CumPaidLoss"].sum()
).trend_

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988,1.098,1.098,1.098,1.098,1.098,1.098,1.098,1.098,1.098,1.098
1989,1.098,1.098,1.098,1.098,1.098,1.098,1.098,1.098,1.098,
1990,1.098,1.098,1.098,1.098,1.098,1.098,1.098,1.098,,
1991,1.05,1.05,1.05,1.05,1.05,1.05,1.05,,,
1992,1.0,1.0,1.0,1.0,1.0,1.0,,,,
1993,1.0,1.0,1.0,1.0,1.0,,,,,
1994,1.0,1.0,1.0,1.0,,,,,,
1995,1.0,1.0,1.0,,,,,,,
1996,1.0,1.0,,,,,,,,
1997,1.0,,,,,,,,,


Multipart trend is also possible, since `trends` and `dates` can accept lists.

In [29]:
cl.Trend(
    trends=[0.05, -0.10],
    dates=[("1992-12-31", "1991-01-01"), ("1990-12-31", "1989-01-01")],
    axis="origin",
).fit(clrd["CumPaidLoss"].sum()).trend_

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988,0.8972,0.8972,0.8972,0.8972,0.8972,0.8972,0.8972,0.8972,0.8972,0.8972
1989,0.9882,0.9882,0.9882,0.9882,0.9882,0.9882,0.9882,0.9882,0.9882,
1990,1.098,1.098,1.098,1.098,1.098,1.098,1.098,1.098,,
1991,1.05,1.05,1.05,1.05,1.05,1.05,1.05,,,
1992,1.0,1.0,1.0,1.0,1.0,1.0,,,,
1993,1.0,1.0,1.0,1.0,1.0,,,,,
1994,1.0,1.0,1.0,1.0,,,,,,
1995,1.0,1.0,1.0,,,,,,,
1996,1.0,1.0,,,,,,,,
1997,1.0,,,,,,,,,


`chainladder.Triangle()` objects are awesome, but what if you need to get back out to `pandas`? `.to_frame()` is a very handy method to know. It converts the `chainladder.Triangle()` objects back to a `pandas.DataFrame()` object.

In [30]:
clrd["CumPaidLoss"].sum().to_frame()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988-01-01,3577780.0,7059966.0,8826151.0,9862687.0,10474698.0,10814576.0,10994014.0,11091363.0,11171590.0,11203949.0
1989-01-01,4090680.0,7964702.0,9937520.0,11098588.0,11766488.0,12118790.0,12311629.0,12434826.0,12492899.0,
1990-01-01,4578442.0,8808486.0,10985347.0,12229001.0,12878545.0,13238667.0,13452993.0,13559557.0,,
1991-01-01,4648756.0,8961755.0,11154244.0,12409592.0,13092037.0,13447481.0,13642414.0,,,
1992-01-01,5139142.0,9757699.0,12027983.0,13289485.0,13992821.0,14347271.0,,,,
1993-01-01,5653379.0,10599423.0,12953812.0,14292516.0,15005138.0,,,,,
1994-01-01,6246447.0,11394960.0,13845764.0,15249326.0,,,,,,
1995-01-01,6473843.0,11612151.0,14010098.0,,,,,,,
1996-01-01,6591599.0,11473912.0,,,,,,,,
1997-01-01,6451896.0,,,,,,,,,


Sometimes you just want the content copied to your clipboard. You can call `.to_clipboard()` and paste the result anywhere you like.

In [31]:
clrd["CumPaidLoss"].sum().to_clipboard()

Other data I/O methods that you might want to know are `.to_json()` and `.to_pickle()`. The inverses, `chainladder.read_json()` and `chainladder.read_pickle()`, are also available, but we won't explore them any further here.

Now that we feel comfortable going in and out of `chainladder`, let's jump back into `chainladder` and explore some of the functions that an actuary often performs when working with triangles.

# Triangle Development

## Compute Loss Development Factors

Actuaries often spend lots of time trying to fine-tune their development factors, so let's explore the ways that `chainladder` can help us do that. `chainladder.Development()` is another helpful estimator, and has many useful attributes such as `.ldf_` or `.cdf_`.

Let's look at the dataset that we are already familiar with `clrd["CumPaidLoss"].sum()`.

In [32]:
cl.Development().fit(clrd["CumPaidLoss"].sum()).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8645,1.2309,1.1091,1.055,1.0283,1.0158,1.0089,1.0059,1.0029


In [33]:
cl.Development().fit(clrd["CumPaidLoss"].sum()).cdf_

Unnamed: 0,12-Ult,24-Ult,36-Ult,48-Ult,60-Ult,72-Ult,84-Ult,96-Ult,108-Ult
(All),2.8549,1.5312,1.244,1.1216,1.0631,1.0338,1.0178,1.0088,1.0029


Remember `incr_to_cum()` from earlier? It works with development factors too!

In [34]:
cl.Development().fit(clrd["CumPaidLoss"].sum()).ldf_.incr_to_cum()

Unnamed: 0,12-Ult,24-Ult,36-Ult,48-Ult,60-Ult,72-Ult,84-Ult,96-Ult,108-Ult
(All),2.8549,1.5312,1.244,1.1216,1.0631,1.0338,1.0178,1.0088,1.0029
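
The relationship between the two sets of factors is a cumulative product taken from the oldest age backwards. As a sketch of that arithmetic, using the rounded LDFs displayed above (so the results may differ from the displayed CDFs in the last decimal place); `ldf_to_cdf` is a hypothetical helper:

```python
# Age-to-age factors (12-24 through 108-120) as displayed above, rounded to 4 decimals.
ldf = [1.8645, 1.2309, 1.1091, 1.055, 1.0283, 1.0158, 1.0089, 1.0059, 1.0029]


def ldf_to_cdf(factors):
    """Cumulative product from the last factor back to the first (no tail assumed)."""
    cdf = []
    running = 1.0
    for factor in reversed(factors):
        running *= factor
        cdf.append(running)
    return list(reversed(cdf))


cdf = ldf_to_cdf(ldf)
print([round(c, 3) for c in cdf])
```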


You may have noticed that these attributes have a trailing underscore (\_). This is scikit-learn’s API convention. As its documentation states, “attributes that have been estimated from the data must always have a name ending with trailing underscore”. For example, the coefficients of some regression estimator would be stored in a `coef_` attribute after `fit` has been called. In summary, the trailing underscore is a scikit-learn convention denoting that an attribute is estimated from the data, i.e. that it is a fitted attribute.
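
To illustrate the convention, here is a toy estimator written in plain Python. The class `VolumeWeightedFactor` is hypothetical and belongs to neither `chainladder` nor scikit-learn; it only shows that hyperparameters set in the constructor carry no underscore, while values estimated in `fit` do.

```python
class VolumeWeightedFactor:
    """Hypothetical toy estimator that follows the trailing-underscore convention."""

    def __init__(self, n_periods=-1):
        # Hyperparameter chosen by the user: no trailing underscore.
        self.n_periods = n_periods

    def fit(self, prior, later):
        # Optionally restrict to the most recent n_periods observations.
        if self.n_periods > 0:
            prior, later = prior[-self.n_periods:], later[-self.n_periods:]
        # Estimated from the data: the name ends with a trailing underscore.
        self.ldf_ = sum(later) / sum(prior)
        return self


model = VolumeWeightedFactor()
has_ldf_before_fit = hasattr(model, "ldf_")

# Ages 12 and 24 of the first three RAA origin years.
model.fit([5012.0, 106.0, 3410.0], [8269.0, 4285.0, 8992.0])
print(has_ldf_before_fit, round(model.ldf_, 4))
```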

Now, you might ask, how are the averages calculated? By default, `chainladder.Development()` uses all of the data and volume-weights the factors. Note that since the `.average` attribute is not estimated, it has no trailing underscore.

In [35]:
cl.Development().fit(clrd["CumPaidLoss"].sum()).average

'volume'

Other averaging methods are `simple` and `regression`. The `regression` option is the OLS estimate of the development factor, where the regression equation is Y = mX + 0 (a line through the origin).

In [36]:
simple_ldf = cl.Development(average="simple").fit(clrd["CumPaidLoss"].sum()).ldf_
simple_ldf

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8782,1.2333,1.1099,1.0555,1.0286,1.0158,1.0089,1.006,1.0029


In [37]:
regression_ldf = (
    cl.Development(average="regression").fit(clrd["CumPaidLoss"].sum()).ldf_
)
regression_ldf

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8516,1.2285,1.1083,1.0546,1.0281,1.0157,1.0089,1.0058,1.0029
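
For intuition, the three averages can be written out by hand for a single development age. The sketch below uses the 12-24 column of the RAA triangle from earlier in this tutorial rather than the `clrd` paid triangle fitted above, so its numbers are not comparable to the outputs in this section. It is a plain-Python illustration of the formulas, not the package's implementation.

```python
# RAA cumulative values at age 12 and age 24 for origin years 1981-1989.
age_12 = [5012.0, 106.0, 3410.0, 5655.0, 1092.0, 1513.0, 557.0, 1351.0, 3133.0]
age_24 = [8269.0, 4285.0, 8992.0, 11555.0, 9565.0, 6445.0, 4020.0, 6947.0, 5395.0]

# Volume-weighted: total at the later age divided by total at the earlier age.
volume = sum(age_24) / sum(age_12)

# Simple: the straight average of the individual link ratios.
simple = sum(y / x for x, y in zip(age_12, age_24)) / len(age_12)

# Regression through the origin (Y = mX + 0): m = sum(x * y) / sum(x * x).
regression = sum(x * y for x, y in zip(age_12, age_24)) / sum(x * x for x in age_12)

print(round(volume, 4), round(simple, 4), round(regression, 4))
```

The simple average is pulled up sharply by the 1982 link ratio of about 40, while the volume-weighted and regression estimates give that small origin year much less weight.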


Remember, you can do simple arithmetic with any Triangle object.

In [38]:
simple_ldf - regression_ldf

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),0.0267,0.0047,0.0016,0.0009,0.0005,0.0001,0.0001,0.0001,


We can also vary the `average` used for each age-to-age factor. Here we have 9 age-to-age factors, so we can supply an array of averages to use. Here we set the averages to `volume`, `simple`, `regression`, `volume`, `simple`, `regression`, `volume`, `simple`, `regression` for 12-24, 24-36, etc., respectively.

In [39]:
cl.Development(average=["volume", "simple", "regression"] * 3).fit(
    clrd["CumPaidLoss"].sum()
).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8645,1.2333,1.1083,1.055,1.0286,1.0157,1.0089,1.006,1.0029


By default, `chainladder.Development()` sets `n_periods`, the number of most recent periods to include in the calculation of averages, to `-1` (use all data). Let's try using only the most recent 3 periods.

In [40]:
cl.Development(average="simple", n_periods=3).fit(clrd["CumPaidLoss"].sum()).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.7862,1.2146,1.1032,1.0526,1.0268,1.0155,1.0089,1.006,1.0029


## Discarding Problematic Link Ratios

From time to time, there might be certain data points that we may want to exclude from the calculation of loss development factors. For example, let's say we want to discard valuation 1991 (the diagonal).

In [41]:
cl.Development(drop_valuation="1991").fit(clrd["CumPaidLoss"].sum()).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8576,1.2287,1.108,1.0539,1.0283,1.0158,1.0089,1.0059,1.0029


Or that we want to drop a specific `origin` year's `age`.

In [42]:
cl.Development(drop=("1992", 24)).fit(clrd["CumPaidLoss"].sum()).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8645,1.2306,1.1091,1.055,1.0283,1.0158,1.0089,1.0059,1.0029


Or we can calculate the averages using the Olympic average method, discarding the highest and/or lowest factor, or the highest and/or lowest n factors, at each development age.

In [43]:
cl.Development(drop_high=[True, True, False, True], drop_low=[1, 2, 0, 3]).fit(
    clrd["CumPaidLoss"].sum()
).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8761,1.2379,1.1091,1.0574,1.0283,1.0158,1.0089,1.0059,1.0029
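
The idea behind an Olympic average can be sketched in a few lines: sort the link ratios for one development age, discard some from each end, and average what remains. The example below applies a simple average to the RAA 12-24 link ratios shown earlier. It is only an illustration with a hypothetical helper, and does not reproduce the cell above, which fits the `clrd` paid triangle with the default volume-weighted average.

```python
def trimmed_simple_average(ratios, drop_high=1, drop_low=1):
    """Simple average after discarding the highest and lowest n link ratios."""
    ordered = sorted(ratios)
    kept = ordered[drop_low:len(ordered) - drop_high]
    if not kept:
        raise ValueError("No link ratios left after dropping.")
    return sum(kept) / len(kept)


# RAA 12-24 link ratios for origin years 1981-1989.
ratios_12_24 = [1.6498, 40.4245, 2.637, 2.0433, 8.7592, 4.2597, 7.2172, 5.1421, 1.722]

untrimmed = trimmed_simple_average(ratios_12_24, drop_high=0, drop_low=0)
olympic = trimmed_simple_average(ratios_12_24, drop_high=1, drop_low=1)
print(round(untrimmed, 4), round(olympic, 4))
```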


## Setting Development Factors Manually

Sometimes, we want to manually set the age-to-age factors. `chainladder.DevelopmentConstant()` does exactly that. Note that we must also specify the `style` as `ldf` or `cdf`.

In [44]:
manual_patterns = {
    12: 1.8,
    24: 1.25,
    36: 1.10,
    48: 1.05,
    60: 1.03,
    72: 1.02,
    84: 1.01,
    96: 1.005,
    108: 1.002,
}
cl.DevelopmentConstant(patterns=manual_patterns, style="ldf").fit(
    clrd["CumPaidLoss"].sum()
).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8,1.25,1.1,1.05,1.03,1.02,1.01,1.005,1.002


Finally, before we go further, in `scikit-learn`, there are two types of estimators: transformers and predictors. A transformer transforms the input data (X) in some ways, and a predictor predicts a new value (or values, Y) by using the input data X.

`chainladder.Development()` and `chainladder.DevelopmentConstant()` are both transformers. The returned object is a means to create development patterns, which are used to estimate ultimates, but it is not itself an IBNR estimation model, or a predictor.

In addition to `fit`, transformers come with the `transform` and `fit_transform` methods. These return a `chainladder.Triangle` object, augmented with additional information for use in a subsequent IBNR model (a predictor). For example, `drop_high` can take an array of boolean values indicating whether the highest factor should be dropped in each LDF calculation.

Consider this example. Calling `cl.Development().fit(clrd["CumPaidLoss"].sum()).ldf_` again does not give the loss development factors that we set manually, because it fits a fresh estimator with default settings. Using `fit_transform()` with `DevelopmentConstant`, on the other hand, returns a triangle whose underlying attributes have been updated, so we get the manually selected loss development factors.

In [45]:
cl.Development().fit(clrd["CumPaidLoss"].sum()).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8645,1.2309,1.1091,1.055,1.0283,1.0158,1.0089,1.0059,1.0029


In [46]:
transformed_paid = cl.DevelopmentConstant(
    patterns=manual_patterns, style="ldf"
).fit_transform(clrd["CumPaidLoss"].sum())
transformed_paid.ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120
(All),1.8,1.25,1.1,1.05,1.03,1.02,1.01,1.005,1.002


One of the major benefits of `chainladder` is that it can handle several (or all) triangles simultaneously. While this can be a convenient shorthand, all these estimators will use the same assumptions across every triangle, as expected.

In [47]:
clrd_lob = cl.load_sample("clrd").groupby("LOB").sum()["CumPaidLoss"]
print("Fitting to " + str(len(clrd_lob.index)) + " industries simultaneously.")
cl.Development().fit_transform(clrd_lob).ldf_

Fitting to 6 industries simultaneously.



Unnamed: 0,Triangle Summary
Valuation:,2261-12
Grain:,OYDY
Shape:,"(6, 1, 1, 9)"
Index:,[LOB]
Columns:,[CumPaidLoss]


## Correlation Tests

`chainladder` also has functionality to test for possible violations of the chainladder assumptions. The two main tests are:

1. The `valuation_correlation` test:
    * This tests the assumption that `origin` years are independent. Strictly speaking, it tests for correlation across calendar periods (diagonals) and, by extension, origin periods (rows).
    * An additional parameter, `total`, controls whether the valuation correlation is calculated in total across all origins (`True`) or for each origin separately (`False`).
    * The test uses a Z-statistic.
2. The `development_correlation` test:
    * This tests the chainladder assumption that subsequent development factors (columns) are not correlated.
    * The test uses a T-statistic.

In [48]:
print(
    "Are valuation years correlated? I.e., are origin years correlated?",
    clrd["CumPaidLoss"]
    .sum()
    .valuation_correlation(p_critical=0.1, total=True)
    .z_critical.values,
)
print(
    "Are development periods correlated?",
    clrd["CumPaidLoss"].sum().development_correlation(p_critical=0.5).t_critical.values,
)

Are valuation years correlated? I.e., are origin years correlated? [[False]]
Are development periods correlated? [[ True]]
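
To build some intuition for what "correlated development factors" means, the sketch below computes a rank correlation between the 12-24 and 24-36 link ratios. It is only an illustration and not the library's implementation: using Spearman's coefficient is an assumption made here, `ranks` and `spearman` are hypothetical helpers, and the paid amounts are copied from the observed part of the triangle displayed later in this tutorial (`cl_mod.full_triangle_`).

```python
# Cumulative paid losses at ages 12, 24 and 36 for origins 1988-1995.
paid = {
    1988: (3577780, 7059966, 8826151),
    1989: (4090680, 7964702, 9937520),
    1990: (4578442, 8808486, 10985347),
    1991: (4648756, 8961755, 11154244),
    1992: (5139142, 9757699, 12027983),
    1993: (5653379, 10599423, 12953812),
    1994: (6246447, 11394960, 13845764),
    1995: (6473843, 11612151, 14010098),
}


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equally long, tie-free samples."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


f_12_24 = [c24 / c12 for c12, c24, _ in paid.values()]
f_24_36 = [c36 / c24 for _, c24, c36 in paid.values()]
rho = spearman(f_12_24, f_24_36)
print(round(rho, 3))
```

A value near zero would suggest that adjacent development factors move independently; values near +1 or -1 would suggest the opposite. A formal test also needs a significance threshold, which is the role `p_critical` plays in the calls above.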



# Extending Development Patterns with Tail

## Setting Tail Factors Manually

Often, a tail factor is necessary to supplement our loss development factors because our triangle is too "small" to capture all of the development. `chainladder.TailConstant()` is a useful estimator with `tail` and `decay` parameters that allow us to fine-tune our tail factors.
* `tail`: The constant to apply to all LDFs within a triangle object.
* `decay`: An exponential decay constant that allows for decay over future development periods. A decay rate of 0.5 sets the development portion of each successive LDF to 50% of the previous LDF.

In [49]:
cl.TailConstant(tail=1.005, decay=0.50).fit(clrd["CumPaidLoss"].sum()).cdf_

Unnamed: 0,12-Ult,24-Ult,36-Ult,48-Ult,60-Ult,72-Ult,84-Ult,96-Ult,108-Ult,120-Ult,132-Ult
(All),2.8692,1.5388,1.2502,1.1272,1.0684,1.039,1.0229,1.0138,1.0079,1.005,1.0025


In [50]:
cl.TailConstant(tail=1.005, decay=0.50).fit(
    clrd["CumPaidLoss"].sum()
).cdf_ / cl.Development().fit(clrd["CumPaidLoss"].sum()).cdf_

Unnamed: 0,12-Ult,24-Ult,36-Ult,48-Ult,60-Ult,72-Ult,84-Ult,96-Ult,108-Ult,120-Ult,132-Ult
(All),1.005,1.005,1.005,1.005,1.005,1.005,1.005,1.005,1.005,,
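
The two outputs above can be cross-checked by hand. The sketch below uses only the rounded `cdf_` values displayed above, so it is an approximate check rather than a reproduction of the library's internal calculation.

```python
# Rounded values copied from the `cdf_` output above.
cdf_120_ult = 1.005  # the selected tail
cdf_132_ult = 1.0025  # development still remaining one year later

# Age-to-age factor implied for the first year of the tail.
ldf_120_132 = cdf_120_ult / cdf_132_ult

# Share of the development portion (factor minus 1) left after that year.
portion_ratio = (cdf_132_ult - 1) / (cdf_120_ult - 1)

print(round(ldf_120_132, 4), round(portion_ratio, 2))
```

With `decay=0.50`, about half of the tail's development portion emerges in the first year and half remains, which is consistent with the description of `decay` given earlier.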


## Compute Tail Factors

`chainladder.TailCurve()` is another class of tail transformers. Like the `chainladder.Development()` and `chainladder.TailConstant()` estimators, it comes with `fit`, `transform` and `fit_transform` methods. A tail estimator lets you define a tail in the absence of data, or when you believe development will continue beyond your latest evaluation period.

Here, we can extend our development factors from 120 months to 144 months. 

In [51]:
clrd["CumPaidLoss"].sum().development.max()

120

In [52]:
tail = cl.TailCurve()
tail.fit(clrd["CumPaidLoss"].sum()).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120,120-132,132-144
(All),1.8645,1.2309,1.1091,1.055,1.0283,1.0158,1.0089,1.0059,1.0029,1.0012,1.0013


Two columns are added beyond 120 months. The first, 120-132, is one extra year of development, included because it is typical for actuaries to track IBNR run-off over a 1-year time horizon from the valuation date. The last column holds all remaining development to ultimate (compare the `132-Ult` column of the `cdf_` output shown earlier). The explicit tail extension is currently fixed at one year and cannot be extended further, although a subsequent version of `chainladder` may address this.

Curve fitting takes selected development patterns and extrapolates them using either an `exponential` or `inverse_power` fit. In most cases, the `inverse_power` produces a thicker (more conservative) tail.

In [53]:
exp = cl.TailCurve(curve="exponential").fit(clrd["CumPaidLoss"].sum())
exp.tail_

Unnamed: 0,120-Ult
(All),1.002558


In [54]:
inv_power = cl.TailCurve(curve="inverse_power").fit(clrd["CumPaidLoss"].sum())
inv_power.tail_

Unnamed: 0,120-Ult
(All),1.026366
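
To see why the two curves behave so differently, here is a minimal sketch of one way such fits can be set up: regress `log(ldf - 1)` on the period index (exponential) or on the log of the period index (inverse power), then multiply the extrapolated factors together. The regressor choice, the unweighted least squares and the 100 extrapolated periods are assumptions made for this sketch, and the inputs are the rounded LDFs printed earlier, so the results need not match `tail_` exactly.

```python
import math

# LDFs from the `cl.Development()` fit shown earlier (12-24 through 108-120).
ldfs = [1.8645, 1.2309, 1.1091, 1.055, 1.0283, 1.0158, 1.0089, 1.0059, 1.0029]
periods = list(range(1, len(ldfs) + 1))
log_dev = [math.log(f - 1.0) for f in ldfs]


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tail_factor(transform, extrap_periods=100):
    """Fit the curve, then compound the extrapolated age-to-age factors."""
    slope, intercept = ols([transform(t) for t in periods], log_dev)
    tail = 1.0
    first = len(ldfs) + 1
    for t in range(first, first + extrap_periods):
        tail *= 1.0 + math.exp(intercept + slope * transform(t))
    return tail


tail_exponential = tail_factor(float)  # regressor: period index
tail_inverse_power = tail_factor(math.log)  # regressor: log of period index
print(round(tail_exponential, 4), round(tail_inverse_power, 4))
```

The point of the sketch is the ordering: a power-law decay fades much more slowly than an exponential one, so the inverse power tail comes out thicker.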


When fitting a tail, all of the data is used by default. However, we can specify which development patterns to include in the curve fitting process with `fit_period`, which takes a `(start, stop)` tuple of periods. `None` can be used to leave `start` or `stop` open. For example, `(48, None)` will use development factors for age 48 and beyond. Alternatively, passing a list of booleans `[True, False, …]` excludes (`False`) individual development patterns from the fit.

By default, patterns are also generated for 100 periods beyond the end of the triangle. With `extrap_periods`, we can specify how far beyond the triangle the tail should be projected before the age-to-age factor drops to 1.0.

Note that even though we can extrapolate the curve many years beyond the end of the triangle for computational purposes, the resultant development factors compress everything in `ldf_` beyond one year into a single age-to-ultimate factor.

Let's fit the curve only to the development patterns from age 36 onward, ignoring the earlier ones. Let's also allow our tail extrapolation to go 50 periods beyond the end of the triangle. Note that both `fit_period` and `extrap_periods` follow the `development_grain` of the underlying triangle being fit.

In [55]:
cl.TailCurve(fit_period=(36, None), extrap_periods=50).fit(
    clrd["CumPaidLoss"].sum()
).ldf_

Unnamed: 0,12-24,24-36,36-48,48-60,60-72,72-84,84-96,96-108,108-120,120-132,132-144
(All),1.8645,1.2309,1.1091,1.055,1.0283,1.0158,1.0089,1.0059,1.0029,1.0016,1.002


# IBNR Models

## Chainladder Model

Now that we have set and transformed the triangles' loss development factors, we can move on to the IBNR estimators, the final stage in analyzing reserve estimates in the `chainladder` package. These estimators have a `predict` method as opposed to a `transform` method.

The most popular method, the chainladder method, can be called with `chainladder.Chainladder()`. The basic chainladder method is entirely specified by its development pattern selections. For this reason, the `chainladder.Chainladder()` estimator takes no additional assumptions, i.e. no additional arguments are needed.

In [58]:
cl_mod = cl.Chainladder().fit(clrd["CumPaidLoss"].sum())
cl_mod

All IBNR models come with common attributes. The first is the `.ultimate_` attribute, which gives the ultimate estimates produced by the underlying model.

In [59]:
cl_mod.ultimate_

Unnamed: 0,2261
1988,11203949
1989,12529085
1990,13678774
1991,13884829
1992,14832204
1993,15951756
1994,17103605
1995,17428397
1996,17568511
1997,18419602


Note that ultimates are measured at a valuation date far into the future. The library is extraordinarily conservative in picking this date and sets it to December 31, 2261. The date is set globally and can be viewed through the `ULT_VAL` option. It is a very common maximum time value across multiple Python packages and holds no meaning beyond being a commonly chosen value.

In [62]:
cl.options.get_option("ULT_VAL")

'2261-12-31 23:59:59.999999999'
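
One plausible reason a date in 2261 keeps turning up (an assumption on our part, since the library does not state it here) is that timestamps stored as signed 64-bit nanosecond counts since 1970, as `pandas` does, run out during the year 2262. The sketch below reproduces that limit with the standard library only, truncated to microseconds.

```python
from datetime import datetime, timedelta

# Largest signed 64-bit integer, read as nanoseconds since 1970-01-01.
INT64_MAX = 2**63 - 1

# `datetime` works in microseconds, so drop the last three digits.
max_timestamp = datetime(1970, 1, 1) + timedelta(microseconds=INT64_MAX // 1000)

# The last calendar year that fits entirely below the limit.
last_full_year = max_timestamp.year - 1

print(max_timestamp.date(), last_full_year)
```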

If, for some reason, year `2261` is not far enough into the future for you, you can change this to whatever value you like.

In [67]:
cl.options.set_option("ULT_VAL", "2050-12-31 23:59:59.999999999")
print(cl.options.get_option("ULT_VAL"))
cl.options.set_option("ULT_VAL", "2261-12-31 23:59:59.999999999")  # Resetting it back

2050-12-31 23:59:59.999999999


Another commonly used attribute that is shared across all models is the `.ibnr_` attribute, which is calculated as the difference between `.ultimate_` and `.latest_diagonal`.

In [68]:
cl_mod.ibnr_

Unnamed: 0,2261
1988,
1989,36186.0
1990,119217.0
1991,242415.0
1992,484933.0
1993,946618.0
1994,1854279.0
1995,3418299.0
1996,6094599.0
1997,11967706.0
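
The relationship is easy to verify by hand. The sketch below copies a few ultimates from the `.ultimate_` output above and the matching latest-diagonal values from the paid triangle (they also appear on the latest diagonal of `full_triangle_` further below), then takes the difference.

```python
# Ultimates copied from `cl_mod.ultimate_`.
ultimate = {1989: 12529085, 1990: 13678774, 1996: 17568511, 1997: 18419602}

# Latest observed cumulative paid losses for the same origin years.
latest_diagonal = {1989: 12492899, 1990: 13559557, 1996: 11473912, 1997: 6451896}

# IBNR is what is still expected to emerge after the latest diagonal.
ibnr = {origin: ultimate[origin] - latest_diagonal[origin] for origin in ultimate}
print(ibnr)
```

The differences agree with the corresponding rows of the `ibnr_` output.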


Other attributes that actuaries might be interested in are the `.full_triangle_` and `.full_expectation_` attributes. While `.full_expectation_` is based entirely on the `.ultimate_` values and development patterns, `.full_triangle_` is a blend of the existing triangle (the actuals) and the model's projections for the cells not yet observed. These are useful for conducting an analysis of actual results vs. model expectations.

In [69]:
cl_mod.full_triangle_

Unnamed: 0,12,24,36,48,60,72,84,96,108,120,132,9999
1988,3577780,7059966,8826151,9862687,10474698,10814576,10994014,11091363,11171590,11203949,11203949,11203949
1989,4090680,7964702,9937520,11098588,11766488,12118790,12311629,12434826,12492899,12529085,12529085,12529085
1990,4578442,8808486,10985347,12229001,12878545,13238667,13452993,13559557,13639268,13678774,13678774,13678774
1991,4648756,8961755,11154244,12409592,13092037,13447481,13642414,13763816,13844727,13884829,13884829,13884829
1992,5139142,9757699,12027983,13289485,13992821,14347271,14573249,14702934,14789366,14832204,14832204,14832204
1993,5653379,10599423,12953812,14292516,15005138,15430219,15673254,15812728,15905684,15951756,15951756,15951756
1994,6246447,11394960,13845764,15249326,16088634,16544409,16804993,16954539,17054207,17103605,17103605,17103605
1995,6473843,11612151,14010098,15538906,16394152,16858582,17124115,17276500,17378061,17428397,17428397,17428397
1996,6591599,11473912,14122731,15663829,16525951,16994115,17261782,17415392,17517770,17568511,17568511,17568511
1997,6451896,12029756,14806895,16422650,17326537,17817381,18098015,18259066,18366403,18419602,18419602,18419602


In [70]:
cl_mod.full_expectation_

Unnamed: 0,12,24,36,48,60,72,84,96,108,120,132,9999
1988,3924445,7317247,9006475,9989278,10539078,10837640,11008339,11106301,11171590,11203949,11203949,11203949
1989,4388605,8182687,10071707,11170750,11785578,12119452,12310340,12419888,12492899,12529085,12529085,12529085
1990,4791310,8933543,10995903,12195797,12867041,13231552,13439957,13559557,13639268,13678774,13678774,13678774
1991,4863486,9068117,11161544,12379512,13060868,13430870,13642414,13763816,13844727,13884829,13884829,13884829
1992,5195326,9686843,11923107,13224178,13952024,14347271,14573249,14702934,14789366,14832204,14832204,14832204
1993,5587475,10418017,12823076,14222354,15005138,15430219,15673254,15812728,15905684,15951756,15951756,15951756
1994,5990937,11170284,13749009,15249326,16088634,16544409,16804993,16954539,17054207,17103605,17103605,17103605
1995,6104703,11382404,14010098,15538906,16394152,16858582,17124115,17276500,17378061,17428397,17428397,17428397
1996,6153781,11473912,14122731,15663829,16525951,16994115,17261782,17415392,17517770,17568511,17568511,17568511
1997,6451896,12029756,14806895,16422650,17326537,17817381,18098015,18259066,18366403,18419602,18419602,18419602
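
To illustrate what "entirely based on `.ultimate_` values and development patterns" means, the sketch below recomputes one cell of `full_expectation_`: the 1988 origin at age 12. It uses the rounded LDFs printed earlier for `cl.Development()`, so it is an approximate check rather than the library's exact calculation.

```python
import math

# Rounded age-to-age factors, 12-24 through 108-120.
ldfs = [1.8645, 1.2309, 1.1091, 1.055, 1.0283, 1.0158, 1.0089, 1.0059, 1.0029]

# Age-to-ultimate factor at 12 months (no tail was fitted for `cl_mod`).
cdf_12_ult = math.prod(ldfs)

ultimate_1988 = 11203949  # from `cl_mod.ultimate_`
expected_1988_at_12 = ultimate_1988 / cdf_12_ult

reported = 3924445  # 1988 at age 12 in `cl_mod.full_expectation_`
relative_gap = abs(expected_1988_at_12 - reported) / reported
print(round(expected_1988_at_12), relative_gap < 1e-3)
```

The small gap that remains comes from rounding the LDFs to four decimals.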


And of course, you can back-test the model by comparing its expectations with the actuals in the upper-left (observed) part of our triangle.

In [76]:
residuals = cl_mod.full_expectation_ - cl_mod.full_triangle_
residuals[residuals.valuation < clrd.valuation_date]

Unnamed: 0,12,24,36,48,60,72,84,96,108
1988,346665,257281.0,180324.0,126591.0,64380.0,23064.0,14325.0,14938.0,
1989,297925,217985.0,134187.0,72162.0,19090.0,662.0,-1289.0,-14938.0,
1990,212868,125057.0,10556.0,-33204.0,-11504.0,-7115.0,-13036.0,,
1991,214730,106362.0,7300.0,-30080.0,-31169.0,-16611.0,,,
1992,56184,-70856.0,-104876.0,-65307.0,-40797.0,,,,
1993,-65904,-181406.0,-130736.0,-70162.0,,,,,
1994,-255510,-224676.0,-96755.0,,,,,,
1995,-369140,-229747.0,,,,,,,
1996,-437818,,,,,,,,


With this, we can also forecast the IBNR run-off for future periods. Let's say we want the next three years.

In [79]:
cl_mod.full_triangle_.dev_to_val().cum_to_incr().loc[..., "1998":"2000"]

Unnamed: 0,1998,1999,2000
1988,,,
1989,36186.0,,
1990,79711.0,39507.0,
1991,121402.0,80911.0,40102.0
1992,225978.0,129685.0,86432.0
1993,425081.0,243035.0,139474.0
1994,839308.0,455775.0,260584.0
1995,1528808.0,855246.0,464431.0
1996,2648819.0,1541098.0,862122.0
1997,5577860.0,2777139.0,1615756.0
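
The run-off amounts are simply differences between consecutive cumulative values in `full_triangle_`. As a small check, the sketch below does this for origin 1990, using values copied from the `full_triangle_` output above and relabelling development ages 96, 108 and 120 by their valuation years.

```python
# Origin 1990: cumulative paid at ages 96, 108 and 120, keyed by valuation year.
cumulative_1990 = {1997: 13559557, 1998: 13639268, 1999: 13678774}

years = sorted(cumulative_1990)
run_off = {
    year: cumulative_1990[year] - cumulative_1990[previous]
    for previous, year in zip(years, years[1:])
}
print(run_off)
```

The results match the 1990 row above to within 1, the difference being due to the displayed cumulative values having been rounded to whole numbers.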


Most of the attributes shown above for the `chainladder.Chainladder()` model also apply to the other actuarial models inside `chainladder`, so we will not demonstrate them repeatedly.

Let's look at another model, the `chainladder.ExpectedLoss()` model, which is used when we already know the ultimate loss (but then why would we estimate ultimate losses?). The Expected Loss model requires one input assumption, the `apriori`, a scalar multiplier that is applied to an exposure vector to produce an a priori ultimate estimate vector for the model.

Let's assume that our `apriori` is 80%, and that this multiplier should be applied to `clrd["EarnedPremDIR"]`. But first, let's revisit `clrd["EarnedPremDIR"].sum()`.

In [87]:
clrd["EarnedPremDIR"].sum()

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1988,14759891,14759891.0,14759891.0,14759891.0,14759891.0,14759891.0,14759891.0,14759891.0,14759891.0,14759891.0
1989,16251494,16251494.0,16251494.0,16251494.0,16251494.0,16251494.0,16251494.0,16251494.0,16251494.0,
1990,17967080,17967080.0,17967080.0,17967080.0,17967080.0,17967080.0,17967080.0,17967080.0,,
1991,19662971,19662971.0,19662971.0,19662971.0,19662971.0,19662971.0,19662971.0,,,
1992,21208358,21208358.0,21208358.0,21208358.0,21208358.0,21208358.0,,,,
1993,22951940,22951940.0,22951940.0,22951940.0,22951940.0,,,,,
1994,24758613,24758613.0,24758613.0,24758613.0,,,,,,
1995,26121518,26121518.0,26121518.0,,,,,,,
1996,26810956,26810956.0,,,,,,,,
1997,27076444,,,,,,,,,


## Expected Loss Model

Remember that `chainladder.ExpectedLoss()` applies the scalar to a vector, not a triangle. Since the premium does not develop over time, any valuation of it gives the same vector, so we will use the `.latest_diagonal` premium vector.

In [84]:
cl.ExpectedLoss(apriori=0.80).fit(
    clrd["CumPaidLoss"].sum(), sample_weight=clrd["EarnedPremDIR"].latest_diagonal.sum()
).ultimate_

Unnamed: 0,2261
1988,11807913
1989,13001195
1990,14373664
1991,15730377
1992,16966686
1993,18361552
1994,19806890
1995,20897214
1996,21448765
1997,21661155


A very common question that one might ask is what is the difference between:

```python
cl.ExpectedLoss(apriori=0.80).fit(..., sample_weight=weight_vector)
```

versus 

```python
cl.ExpectedLoss(apriori=1.00).fit(..., sample_weight=weight_vector*0.80)
```

Here is where `chainladder` follows `scikit-learn`'s implementation philosophy closely. The `apriori=...` passed to the estimator `chainladder.ExpectedLoss()` is a model parameter, whereas the inputs passed to `fit()` are the data that the model is applied to. Both forms yield identical results, but only the first is conceptually correct: the second modifies the data instead of expressing the assumption through the model parameter `apriori`.
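
The arithmetic behind both forms can be checked with plain Python. The sketch below uses earned premium figures copied from the `clrd["EarnedPremDIR"].sum()` output above; `expected_loss` is a hypothetical helper written for this illustration, not a `chainladder` function.

```python
# Earned premium by origin year, copied from the output above.
premium = {1988: 14759891, 1989: 16251494, 1997: 27076444}


def expected_loss(apriori, weights):
    """Expected-loss ultimate: the a priori scalar times the exposure."""
    return {origin: apriori * weight for origin, weight in weights.items()}


# Form 1: the assumption lives in the model parameter.
form_1 = expected_loss(0.80, premium)

# Form 2: the assumption is baked into the data instead.
scaled_premium = {origin: weight * 0.80 for origin, weight in premium.items()}
form_2 = expected_loss(1.00, scaled_premium)

print({origin: round(value) for origin, value in form_1.items()})
```

Both forms give the same numbers, and the rounded values agree with the corresponding rows of the `ultimate_` output above. The distinction is about where the assumption is recorded, not about the result.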

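Let's apply the same model to `clrd["IncurLoss"].sum()` just to make sure that we get the exact same ultimates, since the Expected Loss ultimate depends only on the a priori and the exposure vector, not on the loss data.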

In [91]:
cl.ExpectedLoss(apriori=0.80).fit(
    clrd["IncurLoss"].sum(), sample_weight=clrd["EarnedPremDIR"].latest_diagonal.sum()
).ultimate_


Unnamed: 0,2261
1988,11807913
1989,13001195
1990,14373664
1991,15730377
1992,16966686
1993,18361552
1994,19806890
1995,20897214
1996,21448765
1997,21661155


## Bornhuetter-Ferguson

The `chainladder.BornhuetterFerguson()` estimator is another method that shares many of the attributes of the `chainladder.Chainladder()` estimator. It takes one input assumption, the a priori (`apriori`). As with `chainladder.ExpectedLoss()`, this is a scalar multiplier applied to an exposure vector to produce an a priori ultimate estimate vector for the model.

In [92]:
cl.BornhuetterFerguson(apriori=0.80).fit(
    clrd["IncurLoss"].sum(), sample_weight=clrd["EarnedPremDIR"].latest_diagonal.sum()
).ultimate_


Unnamed: 0,2261
1988,11396981
1989,12722246
1990,13891129
1991,14077556
1992,14978689
1993,15934742
1994,16908850
1995,16860625
1996,16303462
1997,15825251
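
For readers who want the textbook formula behind these numbers, here is a minimal sketch of the Bornhuetter-Ferguson calculation for a single origin year. All inputs are hypothetical (the incurred triangle and its development factors are not reproduced in this section), and `bornhuetter_ferguson` is an illustrative helper rather than the library's code.

```python
def bornhuetter_ferguson(latest, cdf, apriori_ultimate):
    """Latest losses plus the a priori ultimate times the unreported share."""
    pct_unreported = 1.0 - 1.0 / cdf
    return latest + apriori_ultimate * pct_unreported


# Hypothetical single origin year.
latest = 6_000_000.0  # losses reported to date
cdf = 2.0  # age-to-ultimate factor
premium = 20_000_000.0  # exposure
apriori_ultimate = 0.80 * premium

bf_ultimate = bornhuetter_ferguson(latest, cdf, apriori_ultimate)

# The same result, seen as a credibility blend of two ultimates.
chainladder_ultimate = latest * cdf
blend = (1 / cdf) * chainladder_ultimate + (1 - 1 / cdf) * apriori_ultimate

print(bf_ultimate, blend)
```

The blend view explains why the Bornhuetter-Ferguson ultimates sit closer to the a priori for immature origin years (large CDF) and closer to the chainladder result for mature ones.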


## Benktander

The `chainladder.Benktander()` method is similar to the Bornhuetter-Ferguson method, but allows for the specification of one additional assumption, `n_iters`, the number of iterations to recalculate the ultimates. The Benktander method generalizes both the Bornhuetter-Ferguson and the Chainladder estimator through this assumption.

- When `n_iters = 1`, the result is equivalent to the Bornhuetter-Ferguson estimator.
- When `n_iters` is sufficiently large, the result converges to the Chainladder estimator.
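
The two limiting cases can be seen in a small sketch of the iteration for one origin year. The inputs are hypothetical and `benktander` is an illustrative helper under the usual textbook formulation, not the library's implementation.

```python
def benktander(latest, cdf, apriori_ultimate, n_iters):
    """Re-apply the Bornhuetter-Ferguson step, feeding each result back in."""
    ultimate = apriori_ultimate
    for _ in range(n_iters):
        ultimate = latest + (1.0 - 1.0 / cdf) * ultimate
    return ultimate


# Hypothetical single origin year.
latest, cdf, apriori_ultimate = 6_000_000.0, 2.0, 16_000_000.0

one_iteration = benktander(latest, cdf, apriori_ultimate, n_iters=1)
many_iterations = benktander(latest, cdf, apriori_ultimate, n_iters=200)
chainladder_ultimate = latest * cdf

print(one_iteration, many_iterations, chainladder_ultimate)
```

Each pass shrinks the weight on the a priori by the unreported share, so after enough passes only the chainladder ultimate (`latest * cdf`) remains.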