These tutorials introduce common functionality in the `chainladder` package that most actuaries can use in their day-to-day work. They use the datasets included with the package, so you can follow along and reproduce the results shown here.

Keep in mind that these tutorials only demonstrate the functionality of the package. Users should always follow all applicable laws, the Code of Professional Conduct, and applicable Actuarial Standards of Practice, and exercise their best actuarial judgment. The tutorials do not encourage or recommend any particular workflow for analyzing a dataset or rendering an actuarial opinion.

The tutorials assume that you have a basic understanding of commonly used actuarial terms and can independently perform an actuarial analysis in another tool, such as Microsoft Excel or other actuarial software. They also assume some familiarity with Python and basic experience with popular packages in the Python community, such as `pandas` and `numpy`.

All tutorials and exercises rely on `chainladder` v0.8.18 or later. If you have trouble reconciling your results with this tutorial, verify the versions of the packages installed in your work environment and check the release notes in case updates or patches have been issued since.

In [2]:
import pandas as pd
import numpy as np
import chainladder as cl

print("pandas: " + pd.__version__)
print("numpy: " + np.__version__)
print("chainladder: " + cl.__version__)

pandas: 2.1.4
numpy: 1.26.3
chainladder: 0.8.18
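
When checking an installed version against the minimum stated above, comparing version strings directly can mislead, because `"0.8.9"` sorts after `"0.8.18"` as text. The following is a minimal sketch of a numeric comparison. The helper `version_tuple` and the constant `MINIMUM` are hypothetical names introduced for illustration, and `installed` is a stand-in for `cl.__version__`.

```python
import re

# Minimum chainladder version assumed by these tutorials.
MINIMUM = (0, 8, 18)


def version_tuple(version):
    """Hypothetical helper: turn '0.8.18' into (0, 8, 18).

    Only the leading digits of the first three dot-separated pieces are used,
    so suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


installed = "0.8.18"  # stand-in for cl.__version__

if version_tuple(installed) < MINIMUM:
    print("chainladder is older than v0.8.18; results may differ.")
else:
    print("chainladder version is recent enough for these tutorials.")
```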


# Working with Triangles

Let's begin by loading a raw triangle dataset into a `pandas.DataFrame`. We'll use the `raa` dataset, which is available from the repository in `csv` format.

In [3]:
raa_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/raa.csv"
)
raa_df.head(20)

Unnamed: 0,development,origin,values
0,1981,1981,5012.0
1,1982,1982,106.0
2,1983,1983,3410.0
3,1984,1984,5655.0
4,1985,1985,1092.0
5,1986,1986,1513.0
6,1987,1987,557.0
7,1988,1988,1351.0
8,1989,1989,3133.0
9,1990,1990,2063.0


The dataset has three columns:
* development: the valuation time; in this case, the valuation year
* origin: the accident date; in this case, the accident year
* values: the amounts recorded for a specific accident date at a specific valuation time (such as incurred losses, paid losses, or claim counts); in this case, they are just "values" within the triangle, with no specific metric or unit associated with them

A loss triangle is a table of loss experience that shows total losses for each period (origin) at regular valuation dates (development), reflecting how amounts change as claims mature and emerge. Each period has one more entry than the next younger period, which gives the table its triangular shape. The same layout works for any other measure that matures over time from an origin date. Loss triangles can be used to determine loss development for a given risk.
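
To make the triangular shape concrete, here is a minimal sketch that pivots a few long-format records into a small triangle using plain `pandas`. It illustrates the layout only and is not how `chainladder` builds its triangles internally. The values are taken from the first three origin years of the `raa` data; the column name `age_months` and the age formula (valuation year minus origin year, plus one, times 12) are assumptions made for this illustration.

```python
import pandas as pd

# Long-format records: one row per (origin year, valuation year) pair.
long_df = pd.DataFrame(
    {
        "origin": [1981, 1981, 1981, 1982, 1982, 1983],
        "development": [1981, 1982, 1983, 1982, 1983, 1983],
        "values": [5012.0, 8269.0, 10907.0, 106.0, 4285.0, 3410.0],
    }
)

# Assumed convention: an origin year valued at its own year-end is 12 months old.
long_df["age_months"] = (long_df["development"] - long_df["origin"] + 1) * 12

# One row per origin period, one column per development age.
triangle = long_df.pivot(index="origin", columns="age_months", values="values")
print(triangle)

# Each origin period has one fewer observation than the period before it.
print(triangle.notna().sum(axis=1))
```

The missing cells in the lower right are valuations that have not happened yet, which is what produces the triangle.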

Let's put our data into the `chainladder.Triangle` format.

In [6]:
raa = cl.Triangle(
    data=raa_df,
    origin="origin",
    development="development",
    columns="values",
    cumulative=True,
)
raa

Unnamed: 0,12,24,36,48,60,72,84,96,108,120
1981,5012,8269.0,10907.0,11805.0,13539.0,16181.0,18009.0,18608.0,18662.0,18834.0
1982,106,4285.0,5396.0,10666.0,13782.0,15599.0,15496.0,16169.0,16704.0,
1983,3410,8992.0,13873.0,16141.0,18735.0,22214.0,22863.0,23466.0,,
1984,5655,11555.0,15766.0,21266.0,23425.0,26083.0,27067.0,,,
1985,1092,9565.0,15836.0,22169.0,25955.0,26180.0,,,,
1986,1513,6445.0,11702.0,12935.0,15852.0,,,,,
1987,557,4020.0,10946.0,12314.0,,,,,,
1988,1351,6947.0,13112.0,,,,,,,
1989,3133,5395.0,,,,,,,,
1990,2063,,,,,,,,,


In the above example,
* `data` is the single `DataFrame` that contains the columns referenced by all other arguments to the Triangle constructor. In our example, this is the dataset `raa_df`.
* `origin` is the representation of the accident, reporting, or more generally the origin period of the triangle that will map to the `origin` dimension. In our example, this is the `origin` column.
* `development` is the representation of the development/valuation periods of the triangle that will map to the `development` dimension. In our example, this is the `development` column.
* `columns` is the representation of the numeric data of the triangle that will map to the `columns` dimension. If `None`, then a single 'Total' key will be generated. In our example, this is the `values` column.
* `cumulative` indicates whether the triangle is cumulative or incremental. Although it is not obvious from the raw data, our dataset is a cumulative triangle, so we set this to `True`.
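
To illustrate the difference between cumulative and incremental values, the sketch below uses the first four entries of the 1981 row from the triangle above. It is plain Python arithmetic rather than a `chainladder` call, and the variable names are chosen only for this illustration.

```python
from itertools import accumulate

# First four cumulative values for origin year 1981 (ages 12, 24, 36, 48).
cumulative_1981 = [5012.0, 8269.0, 10907.0, 11805.0]

# Incremental amounts are the period-to-period changes in the cumulative values;
# the first period is unchanged because there is nothing before it.
incremental_1981 = [cumulative_1981[0]] + [
    later - earlier
    for earlier, later in zip(cumulative_1981, cumulative_1981[1:])
]
print(incremental_1981)

# A running sum of the incremental amounts recovers the cumulative values.
rebuilt = list(accumulate(incremental_1981))
print(rebuilt)
```

If the same numbers were passed to the constructor with the wrong `cumulative` setting, they would be interpreted as period-by-period amounts instead of running totals, so the flag should match how the source data was recorded.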