# Chain Ladder Analysis for Claims Reserving

## Introduction

In this notebook, I'll perform a Chain Ladder analysis on a cumulative claims triangle to project ultimate claims and estimate reserves, focusing on IBNR (Incurred But Not Reported) claims.

## Set Up Our Environment

Loading all of the necessary libraries for this project.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

## Import the Data

Load the data from a CSV file into a pandas DataFrame. Rows are accident years (AY) and columns 0 to 4 are development periods.

In [84]:
# Reading the CSV file
df = pd.read_csv(r"C:\Users\muthomig\Downloads\ibnr.csv")
df

,AY,0,1,2,3,4
0,2014,100,150.0,180.0,200.0,225.0
1,2015,112,173.0,215.0,245.0,
2,2016,118,185.0,233.0,,
3,2017,122,195.0,,,
4,2018,125,,,,


## Set Accident Year (AY) as the Index

Set the accident year (AY) as the index of the DataFrame for easier data manipulation and analysis. Additionally, I remove the AY column from the DataFrame, as it's no longer needed after being set as the index.

In [87]:
df.set_index(df['AY'], inplace=True)
del df['AY']
df

AY,0,1,2,3,4
2014,100,150.0,180.0,200.0,225.0
2015,112,173.0,215.0,245.0,
2016,118,185.0,233.0,,
2017,122,195.0,,,
2018,125,,,,


## Calculate Chain Ladder Development Factors

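Calculate the development factors for each period in the dataset. The development factors indicate how much claims tend to grow from one period to the next. Each factor is a volume-weighted ratio: the sum of claims at the next development period divided by the sum of claims at the current period, restricted to the accident years for which the next period has already been observed.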

In [90]:
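factors = []
for col in df.columns[:-1]:
    next_col = str(int(col) + 1)  # column labels are strings: '0', '1', ...
    # Numerator: all observed claims at the next development period.
    # Denominator: claims at this period, excluding the most recent accident
    # years, which have no value at the next period yet.
    factors.append(df[next_col].sum() / df[col][:-int(col) - 1].sum())
factors = np.array(factors)
factors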

array([1.55530973, 1.23622047, 1.12658228, 1.125     ])
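
The loop above depends on the column labels being the strings '0' to '4' and on the rows being sorted by accident year. As a rough sketch of the same volume-weighted calculation done by position instead, the helper below masks out missing cells explicitly. The name `development_factors` and the inline `triangle` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Illustrative copy of the cumulative triangle shown earlier (rows = accident years).
triangle = pd.DataFrame(
    [[100, 150, 180, 200, 225],
     [112, 173, 215, 245, np.nan],
     [118, 185, 233, np.nan, np.nan],
     [122, 195, np.nan, np.nan, np.nan],
     [125, np.nan, np.nan, np.nan, np.nan]],
    index=pd.Index([2014, 2015, 2016, 2017, 2018], name="AY"),
    columns=["0", "1", "2", "3", "4"],
    dtype=float,
)


def development_factors(tri):
    """Volume-weighted age-to-age factors (hypothetical helper)."""
    values = tri.to_numpy(dtype=float)
    factors = []
    for j in range(values.shape[1] - 1):
        # Keep only accident years observed at both period j and period j + 1.
        both = ~np.isnan(values[:, j]) & ~np.isnan(values[:, j + 1])
        factors.append(values[both, j + 1].sum() / values[both, j].sum())
    return np.array(factors)


factors_check = development_factors(triangle)
print(factors_check.round(6))
```

Because the mask is built from the data itself, this version does not need to know how many trailing rows to drop for each column.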

## Calculate the Cumulative Development Factors (CDFs)

A CDF is the product of all development factors from a given period onward, so it is the multiplier that takes claims at that period to their ultimate value.

In [108]:
cum_factors = np.cumprod(factors[::-1])[::-1]
cum_factors

array([2.43684698, 1.56679209, 1.26740506, 1.125     ])

## Appending the Final Cumulative Factor

Append the value 1 to the cumulative development factors (CDFs) array. The oldest accident year is already at the final development period, so its cumulative factor is 1. This assumes that no further development is expected beyond the last period in the triangle (no tail factor).

In [112]:
# Append the value 1
cum_factors = np.append(cum_factors, 1)

# Print the updated array
cum_factors

array([2.43684698, 1.56679209, 1.26740506, 1.125     , 1.        ])
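
To make the reversed cumulative product less opaque, the following sketch checks it against the definition directly: the CDF at period j is the product of every factor from j onward. The factor values are copied from the output above (rounded to 8 decimals), and the variable names are illustrative.

```python
import numpy as np

# Age-to-age factors as printed above (rounded to 8 decimals).
factors = np.array([1.55530973, 1.23622047, 1.12658228, 1.125])

# Definition: CDF at period j = product of every factor from period j onward.
cdf_by_product = np.array([factors[j:].prod() for j in range(len(factors))])

# The reversed cumulative product used in the notebook gives the same values.
cdf_by_cumprod = np.cumprod(factors[::-1])[::-1]

# A trailing 1.0 encodes the assumption of no development beyond the last period.
cdfs = np.append(cdf_by_cumprod, 1.0)
print(cdfs.round(6))
```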

## Extracting the Last Non-NaN Value for Each Row (Incurred Claims)

Loop through each row of the DataFrame df and extract the last non-NaN value. This value is the latest incurred claim amount for that accident year (AY), and it is the starting point for estimating the ultimate claim.

In [116]:
# Latest incurred value for each accident year
Incurred = []

# Loop through each row in the DataFrame
for i, row in df.iterrows():
    # Get the last non-NaN value from the row
    last_value = row.dropna().iloc[-1]  # Get the last valid entry
    Incurred.append(last_value)  # Add it to the result list

# Display the result
print(Incurred)

[225.0, 245.0, 233.0, 195.0, 125.0]
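
As a possible alternative to the explicit loop, the latest value per row can be obtained by forward-filling along each row and taking the last column. This is only a sketch: the inline `triangle` is an illustrative copy of the data, and the result is a Series that keeps the accident year as its index.

```python
import numpy as np
import pandas as pd

# Illustrative copy of the cumulative triangle shown earlier.
triangle = pd.DataFrame(
    [[100, 150, 180, 200, 225],
     [112, 173, 215, 245, np.nan],
     [118, 185, 233, np.nan, np.nan],
     [122, 195, np.nan, np.nan, np.nan],
     [125, np.nan, np.nan, np.nan, np.nan]],
    index=pd.Index([2014, 2015, 2016, 2017, 2018], name="AY"),
    columns=["0", "1", "2", "3", "4"],
    dtype=float,
)

# Forward-fill along each row so the last column holds the latest observed value.
latest = triangle.ffill(axis=1).iloc[:, -1]
print(latest.tolist())
```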


## Reversing the Incurred Claims Array

Reverse the Incurred array so that it lines up with the CDFs array. The CDFs are ordered from the earliest development period (largest factor) to the latest, while Incurred is ordered from the oldest accident year (most developed) to the newest. After reversing, the largest CDF is paired with the most recent, least developed accident year.

In [120]:
# Reverse the Incurred array so that the highest CDF matches with the lowest Incurred
Incurred_loss = Incurred[::-1]
Incurred_loss

[125.0, 195.0, 233.0, 245.0, 225.0]
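
Reversing works here because the triangle is square and sorted by accident year. A more explicit way to pair each accident year with its CDF, sketched below, is to look the CDF up by the number of development periods observed so far. The counts in `n_observed` are read off the triangle above, and all names are illustrative.

```python
import numpy as np
import pandas as pd

# CDFs ordered by development period 0..4, as printed earlier (rounded).
cdfs = np.array([2.43684698, 1.56679209, 1.26740506, 1.125, 1.0])

# Number of observed development periods per accident year, read off the triangle.
n_observed = pd.Series(
    [5, 4, 3, 2, 1],
    index=pd.Index([2014, 2015, 2016, 2017, 2018], name="AY"),
)

# An accident year whose latest observation is at period k needs the CDF at position k.
latest_period = n_observed.to_numpy() - 1
cdf_per_ay = pd.Series(cdfs[latest_period], index=n_observed.index, name="CDF")
print(cdf_per_ay)
```

With this lookup the accident-year labels stay attached to the factors, so no reordering step is needed.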

## Calculating Ultimate Claims and Creating a DataFrame

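Calculate the ultimate claim amounts by multiplying the cumulative development factors (CDFs) with the reversed incurred claims (Incurred_loss). The ultimate values represent the total expected claims for each accident year once all claims have fully developed. Note that the rows of the resulting DataFrame run from the most recent accident year (2018, row 0) to the oldest (2014, row 4), and that the new DataFrame replaces the original triangle stored in df.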

In [124]:
# Calculate Ultimate values by multiplying CDFs with Incurred loss
Ultimate = cum_factors * Incurred_loss

# Create a DataFrame with the required columns
df = pd.DataFrame({
    'CDFs': cum_factors,
    'Incurred': Incurred_loss,
    'Ultimate': Ultimate
})
df

,CDFs,Incurred,Ultimate
0,2.436847,125.0,304.605873
1,1.566792,195.0,305.524457
2,1.267405,233.0,295.30538
3,1.125,245.0,275.625
4,1.0,225.0,225.0


## Calculating IBNR (Incurred But Not Reported)

Calculate the IBNR (Incurred But Not Reported) values for each accident year by subtracting the incurred claims from the ultimate claims. IBNR represents the reserves that an insurer needs to set aside for claims that have occurred but have not yet been reported.

In [128]:
df['IBNR'] = df['Ultimate'].astype(float) - df['Incurred'].astype(float)
df

,CDFs,Incurred,Ultimate,IBNR
0,2.436847,125.0,304.605873,179.605873
1,1.566792,195.0,305.524457,110.524457
2,1.267405,233.0,295.30538,62.30538
3,1.125,245.0,275.625,30.625
4,1.0,225.0,225.0,0.0
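
The table above is indexed 0 to 4, which hides which accident year each row belongs to. As an illustrative sketch, the same figures can be laid out with the accident year as the index. The values are copied from the outputs above (CDFs rounded to 8 decimals), and the name `result` is an assumption.

```python
import pandas as pd

# Values copied from the tables above, listed in accident-year order.
result = pd.DataFrame(
    {
        "CDF": [1.0, 1.125, 1.26740506, 1.56679209, 2.43684698],
        "Incurred": [225.0, 245.0, 233.0, 195.0, 125.0],
    },
    index=pd.Index([2014, 2015, 2016, 2017, 2018], name="AY"),
)

# Ultimate = latest incurred x CDF; IBNR = ultimate - latest incurred.
result["Ultimate"] = result["CDF"] * result["Incurred"]
result["IBNR"] = result["Ultimate"] - result["Incurred"]
print(result.round(2))
```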


## Calculating Total IBNR

Calculate the total IBNR by summing the individual IBNR values across all accident years. This gives an overall estimate of the reserves that need to be set aside for claims that have occurred but have not yet been reported.

In [132]:
total_ibnr = df['IBNR'].sum()

# Display the total IBNR
print("Total IBNR:", total_ibnr)

Total IBNR: 383.0607094762064


Thank you for taking the time to view my project!