# Regression models for IBNR estimates

After estimating IBNR with traditional methods, we now model the run-off triangle data with additional methods in order to meet the project's main objective.

The models presented next are ordinary linear regression, ridge regression and lasso regression. Each takes as input the variables `X` and `Y` built for every company over the required timespan.

## Model construction

This method is based on Kremer's (1982) approach as presented in Verrall (1985). Under the assumption of lognormal, identically distributed errors (together with the usual assumptions of a regression model), the chain-ladder procedure in its multiplicative form can be described by the following equation:

$$E(Z_{i,j})=U_iS_j$$

where $U_i$ is a parameter for row $i$ and $S_j$ is a parameter for column $j$. Then, if $Y_{i,j} = \ln(Z_{i,j})$ and $U_i = e^{\alpha_i+\mu}\sum^t_{j=1}e^{\beta_j}$, we have the equation

$$y = X\beta + \varepsilon$$

where $X$ is the design matrix, assumed to have full column rank (so that $X^{\top}X$ is non-singular).
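
Written out cell by cell, the regression takes the usual log-linear form. The identification constraint $\alpha_1 = \beta_1 = 0$ is an assumption made here; it is chosen because the design matrix built in the code below drops the first row and column dummies and keeps an intercept:

$$Y_{i,j} = \mu + \alpha_i + \beta_j + \varepsilon_{i,j}, \qquad \alpha_1 = \beta_1 = 0$$

Each row of $X$ therefore contains a 1 for the intercept $\mu$, at most one indicator for the accident row $i$ and at most one indicator for the development column $j$.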

In [2]:
import os
import itertools
import pandas as pd
import re
import math
import numpy as np

from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import LinearRegression

In [3]:
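def columnas(valores,variable):
    """Return the numeric index of each label plus one 0/1 dummy column per distinct index."""
    y = [re.findall("\\d+", j)[0] for j in valores]
    y = [int(i) for i in y]
    todas = sorted(set(y))  # sorted so the dummy columns come out in index order
    df = pd.DataFrame()
    df[f"y_{variable}"] = y
    for k in todas:
        df[f"{variable}_{k}"] = ([1 if k == j else 0 for j in y])
    return df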

def matrix_X(df_triangulo):
    """Build the dummy-coded design matrix for a k x k triangle, keyed by `y_ii`."""
    k = len(df_triangulo.columns)
    alpha = [f'a_{i}' for i in range(1,k+1)]  # accident-row labels
    mu    = [f'u_{i}' for i in range(1,k+1)]  # development-column labels
    lists = [alpha, mu]
    # One row per (row, column) cell of the full square
    df    = pd.DataFrame(list(itertools.product(*lists)), columns=['a', 'u'])

    alpha    = columnas(valores  = df.a, variable = 'a')
    mu       = columnas(valores=df.u, variable = 'u')
    df_col= pd.concat([alpha, mu], axis=1)

    # Cell key: row index followed by column index (row 2, column 3 -> 23)
    df_col['y_a'] = df_col['y_a'].astype(str) + df_col['y_u'].astype(str) 
    df_col['y_a'] = [int(i) for i in df_col['y_a']]
    # Drop the first column dummy and turn the first row dummy into the intercept
    df_col = df_col.drop(['y_u', 'u_1'], axis=1)
    df_col['a_1'] = 1
    df_col.rename(columns={'a_1': 'b0'}, inplace=True)
    df_col.rename(columns={'y_a': 'y_ii'}, inplace=True)
    #df_col = df_col.drop(['y_ii'], axis=1)
    return df_col

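def matrix_y(df_triangulo):
    """Return the log of every triangle cell in long format, keyed by `y_ii`."""
    k = len(df_triangulo.columns)
    d0 = pd.DataFrame()
    for i in range(k):
        for j in range(k):
            # Same key as in matrix_X: row index followed by column index.
            # Unobserved (NaN) cells stay NaN after the log.
            d1 = pd.DataFrame({'y_ii': [int(f'{i+1}{j+1}')], 'Y': [math.log(df_triangulo.iloc[i, j])]})
            d0 = pd.concat([d0, d1], axis=0)
    return d0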

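def triangulo(df, grcode, entreno):
    """Build the run-off triangle for insurer `grcode`, accumulated along development lags."""
    # Training triangles keep only development years up to 1997
    if entreno:
        df_triangulo = df[(df['GRCODE']== grcode ) & (df['DevelopmentYear']<=1997)].copy()
    else: 
        df_triangulo = df[df['GRCODE']== grcode].copy()
        
    # One value per (accident year, development lag) cell
    df_g         = df_triangulo.groupby(["AccidentYear", "DevelopmentLag"]).agg({'IncurLoss_B': ['max']})
    df_g.columns = ['Pagos']
    df_g         = df_g.reset_index()
    # Rows are accident years, columns are development lags
    pivot_data   = df_g.pivot(index='AccidentYear',columns='DevelopmentLag',values='Pagos').reset_index()
    pivot_data   = pivot_data.drop('AccidentYear', axis=1).cumsum(axis=1)
    
    return pivot_data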

In [4]:
input = pd.read_csv(os.path.normpath(os.getcwd() + os.sep + os.pardir)+"/data/ppauto_pos.csv")

input = input[input.DevelopmentYear <= 1997]

cleaning_cond = np.array(['Adriatic Ins Co', 'Aegis Grp', 'Agency Ins Co Of MD Inc',
       'Allegheny Cas Co', 'American Modern Ins Grp Inc',
       'Armed Forces Ins Exchange', 'Auto Club South Ins Co',
       'Baltica-Skandinavia Rein Co Of Amer', 'Bancinsure Inc',
       'Bell United Ins Co', 'Century-Natl Ins Co', 'Co-Operative Ins Co',
       'Consumers Ins Usa Inc', 'Cornerstone Natl Ins Co',
       'Federated Natl Ins Co', 'First Amer Ins Co',
       'Florists Mut Ins Grp', 'Harbor Ins Co', 'Homestead Ins Co',
       'Inland Mut Ins Co', 'Interstate Auto Ins Co Inc', 'Lancer Ins Co',
       'Lumber Ins Cos', 'Manhattan Re Ins Co', 'Mennonite Mut Ins Co',
       'Middle States Ins Co Inc', 'National Automotive Ins',
       'Nevada General Ins Co', 'New Jersey Citizens United Rcp Exch',
       'Nichido Fire & Marine Ins Co Ltd', 'Northwest Gf Mut Ins Co',
       'Ocean Harbor Cas Ins Co', 'Overseas Partners Us Reins Co',
       'Pacific Ind Ins Co', 'Pacific Pioneer Ins Co',
       'Pacific Specialty Ins Co', 'Penn Miller Grp',
       'Pennsylvania Mfg Asn Ins Co', 'Pioneer State Mut Ins Co',
       'Protective Ins Grp', 'San Antonio Reins Co',
       'Seminole Cas Ins Co', 'Southern Group Ind Inc',
       'Southern Mut Ins Co', 'Southland Lloyds Ins Co', 'Star Ins Grp',
       'Sterling Ins Co', 'Usauto Ins Co', 'Vanliner Ins Co',
       'Wea Prop & Cas Ins Co', 'Wellington Ins Co', 'State Farm Mut Grp', 'United Services Automobile Asn Grp',
       'US Lloyds Ins Co', 'Toa-Re Ins Co Of Amer', 'FL Farm Bureau Grp'])

input = input[~input.GRNAME.isin(cleaning_cond)]

Lista_entidades_ceros = input[input.IncurLoss_B <= 0]["GRCODE"].unique()
input = input[~input.GRCODE.isin(Lista_entidades_ceros)]
#input.IncurLoss_B = input.IncurLoss_B+1 #deal with NaN from log transformation.

In [5]:
df_data        = input #pd.read_csv('medmal_pos.csv')
df_trg_entreno = triangulo(df_data, grcode=43, entreno=True)
df_trg_prueba  = triangulo(df_data, grcode=43, entreno=False)
df_trg_entreno

DevelopmentLag,1,2,3,4,5,6,7,8,9,10
0,607.0,1254.0,1836.0,2434.0,3048.0,3663.0,4278.0,4892.0,5506.0,6120.0
1,2254.0,5113.0,8092.0,10856.0,13682.0,16699.0,19689.0,22667.0,25645.0,
2,5843.0,13267.0,21574.0,30245.0,39311.0,48237.0,57004.0,65769.0,,
3,11422.0,27515.0,46163.0,65258.0,83911.0,102380.0,120787.0,,,
4,19933.0,44095.0,72834.0,101163.0,129234.0,156956.0,,,,
5,24604.0,56734.0,90309.0,123078.0,156800.0,,,,,
6,40735.0,84679.0,127190.0,168902.0,,,,,,
7,43064.0,86769.0,129678.0,,,,,,,
8,41837.0,83141.0,,,,,,,,
9,44436.0,,,,,,,,,


In [6]:
Y = matrix_y(df_trg_entreno)
X = matrix_X(df_trg_entreno)

Y_X          = pd.merge(Y, X, on='y_ii', how='inner')
data_entreno = Y_X[Y_X['Y'].notna()]
data_entreno = data_entreno.drop(['y_ii'], axis=1)
data_entreno.head()

Unnamed: 0,Y,b0,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,u_2,u_3,u_4,u_5,u_6,u_7,u_8,u_9,u_10
0,6.408529,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,7.134094,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
2,7.515345,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
3,7.797291,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
4,8.022241,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
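
The dummy coding shown in the output above can be reproduced on a small example. The following is a minimal sketch, not the notebook's own `matrix_X`: it assumes a made-up 3x3 triangle and uses `pandas.get_dummies` with `drop_first=True`, so that the first row and the first column are absorbed by the intercept. The triangle values and the variable names `triangle`, `long_df` and `X_toy` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy 3x3 cumulative run-off triangle (made-up values, for illustration only).
triangle = pd.DataFrame(
    [[100.0, 150.0, 175.0],
     [110.0, 168.0, np.nan],
     [120.0, np.nan, np.nan]],
    index=[1, 2, 3],    # accident-row index i
    columns=[1, 2, 3],  # development-column index j
)

# Long format: one record per observed cell (i, j); future cells are skipped.
rows = [
    {"i": i, "j": j, "Z": triangle.loc[i, j]}
    for i in triangle.index
    for j in triangle.columns
    if not np.isnan(triangle.loc[i, j])
]
long_df = pd.DataFrame(rows)
long_df["Y"] = np.log(long_df["Z"])  # response on the log scale

# Dummy coding: the first row (a_1) and first column (u_1) are dropped,
# so their effect is carried by the intercept.
X_toy = pd.concat(
    [
        pd.get_dummies(long_df["i"], prefix="a", drop_first=True),
        pd.get_dummies(long_df["j"], prefix="u", drop_first=True),
    ],
    axis=1,
).astype(float)

# Full column rank (including the intercept) means every parameter is identifiable.
design = np.column_stack([np.ones(len(X_toy)), X_toy.to_numpy()])
print(X_toy.columns.tolist())
print(np.linalg.matrix_rank(design))
```

The rank check matters because a triangle only observes part of the square: the last row and the last column each contribute a single cell, which is just enough to identify their parameters.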


In [7]:
Y_prueba      = matrix_y(df_trg_prueba)                   
x_prueba      = Y_X[Y_X['Y'].notna()].drop(['Y'], axis=1)
data_prueba_  = pd.merge(Y_prueba, x_prueba, on='y_ii', how='inner')
data_prueba=data_prueba_.drop(['y_ii'], axis=1)
y_ii = data_prueba_['y_ii']

x_entreno = data_entreno.drop('Y', axis=1)  # Features
y_entreno = data_entreno['Y']  # Target variable
x_prueba  = data_prueba.drop('Y', axis=1)  # Features
y_prueba  = data_prueba['Y']  # Target variable

## Models considered:

The next steps require estimating three types of models:

### Ordinary Least Squares regression

The simple linear regression consists of generating a regression model (equation of a line) that explains the linear relationship between two variables. The dependent or response variable is identified as $Y$, and the predictor or independent variable is identified as $X$.

The simple linear regression model is described according to the equation:
$$Y =\beta_{0} + \beta_{1}X_{1}+\varepsilon$$

Here, $\beta_0$ is the y-intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error. The random error represents the difference between the value fitted by the line and the actual value. It captures the effect of all those variables that influence $Y$ but are not included in the model as predictors. Its estimated counterpart is known as the residual.

In the vast majority of cases, the population values of $\beta_0$ and $\beta_1$ are unknown. They are therefore estimated from a sample. These estimates are known as regression coefficients or least-squares coefficient estimates, since they take the values that minimize the sum of squared residuals, resulting in the line that passes closest to all points.
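
As a minimal sketch of what "minimizing the sum of squared residuals" means in practice, the example below fits the same synthetic data twice: once by solving the least-squares problem directly with `numpy.linalg.lstsq`, and once with scikit-learn's `LinearRegression`. The data, the true coefficients and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 1.5 + 2.0*x1 - 0.5*x2 + noise (made-up coefficients).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.5 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=50)

# Direct least squares: prepend a column of ones for the intercept.
X1 = np.column_stack([np.ones(len(X_demo)), X_demo])
beta_hat, *_ = np.linalg.lstsq(X1, y_demo, rcond=None)

# Same fit through scikit-learn (the intercept is handled internally).
ols = LinearRegression().fit(X_demo, y_demo)
beta_sklearn = np.concatenate([[ols.intercept_], ols.coef_])

print(beta_hat)
print(beta_sklearn)
```

Both routes solve the same minimization, so the two coefficient vectors agree up to numerical precision.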

### Ridge regression

Ridge regularization penalizes the sum of the squared coefficients. This penalty has the effect of proportionally shrinking all coefficients in the model towards zero without letting them reach it. The degree of penalization is controlled by the hyperparameter $\lambda$. When $\lambda=0$, the penalty is null, and the result is equivalent to that of a linear model fitted by ordinary least squares (OLS). As $\lambda$ increases, the penalty becomes stronger, and the magnitudes of the coefficients decrease.


$$\sum^n_{i=1}(y_i - \beta_0 - \sum^p_{j=1} \beta_j x_{ij})^2 + \lambda \sum^p_{j=1} \beta_j^2 = \text{residual squared sum} + \lambda \sum^p_{j=1} \beta_j^2$$

The main advantage of applying ridge over ordinary least squares (OLS) fitting is the reduction of variance. Generally, in situations where the relationship between the response variable and predictors is approximately linear, least squares estimates have little bias but can still suffer from high variance (small changes in the training data have a significant impact on the resulting model). This problem is accentuated as the number of predictors introduced into the model approaches the number of training observations, reaching the point where, if $p>n$, it is not possible to fit the model by ordinary least squares. By using an appropriate value of λ, the ridge method can reduce variance without significantly increasing bias, thus achieving lower total error.

The disadvantage of the ridge method is that the final model includes all predictors. This is because, although the penalty forces the coefficients to tend toward zero, they never become exactly zero (except in the limit $\lambda \to \infty$). This method minimizes the influence on the model of the predictors less related to the response variable, but they still appear in the final model. Although this is not a problem for the accuracy of the model, it is a challenge for its interpretation.
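
The shrinkage effect can be illustrated with a short sketch. It assumes a model without intercept, for which the ridge objective above has the closed-form solution $(X^{\top}X + \lambda I)^{-1}X^{\top}y$, and compares it with scikit-learn's `Ridge`, whose `alpha` argument plays the role of $\lambda$. The data and the helper `ridge_closed_form` are hypothetical and made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with five predictors (made-up coefficients).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=40)

def ridge_closed_form(X, y, lam):
    """Ridge coefficients without intercept: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The coefficient norm shrinks as the penalty grows, but no coefficient hits zero.
lams = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_closed_form(X_demo, y_demo, lam)) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lambda={lam:>6}: ||beta|| = {norm:.4f}")

# scikit-learn gives the same coefficients when the intercept is switched off.
ridge_sk = Ridge(alpha=10.0, fit_intercept=False).fit(X_demo, y_demo)
closed = ridge_closed_form(X_demo, y_demo, 10.0)
```

With $\lambda = 0$ the closed form reduces to the ordinary least-squares solution, which matches the statement that ridge and OLS coincide when the penalty is null.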

### Lasso regression

Lasso regularization penalizes the sum of the absolute values of the regression coefficients. This penalty is known as l1 and has the effect of forcing the coefficients of the predictors to tend towards zero. Since a predictor with a regression coefficient of zero does not influence the model, lasso manages to exclude the less relevant predictors. Similar to ridge, the degree of penalization is controlled by the hyperparameter λ. When λ = 0, the result is equivalent to that of a linear model by ordinary least squares. As λ increases, the penalty becomes stronger, and more predictors are excluded.

$$\sum^n_{i=1}(y_i - \beta_0 - \sum^p_{j=1} \beta_j x_{ij})^2 + \lambda \sum^p_{j=1} |\beta_j| = \text{residual squared sum} + \lambda \sum^p_{j=1} |\beta_j|$$

The main practical difference between lasso and ridge is that the former manages to make some coefficients exactly zero, thus performing predictor selection, while the latter does not exclude any. This represents a significant advantage of lasso in scenarios where not all predictors are important for the model, and it is desired that the least influential ones be excluded.

On the other hand, when there are highly correlated predictors (linearly), ridge reduces the influence of all of them simultaneously and proportionally, while lasso tends to select one of them, giving it all the weight and excluding the others. In the presence of correlations, this selection varies a lot with small perturbations (changes in the training data), so lasso solutions are very unstable if predictors are highly correlated, which is taken into account in the next steps.
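
The difference between the two penalties can be seen in a minimal sketch on synthetic data, where only the first two of five predictors influence the response. The data, the penalty value and the variable names are assumptions made for illustration. Note that scikit-learn's `Lasso` scales the residual sum of squares by $1/(2n)$, so its `alpha` is not numerically identical to the $\lambda$ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first two predictors matter (made-up coefficients).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=0.5).fit(X_demo, y_demo)

# Lasso sets the irrelevant coefficients exactly to zero; ridge only shrinks them.
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))

print("lasso coefficients:", np.round(lasso_demo.coef_, 3))
print("ridge coefficients:", np.round(ridge_demo.coef_, 3))
print("exact zeros (lasso, ridge):", n_zero_lasso, n_zero_ridge)
```

In this setting the lasso keeps the two relevant predictors, with coefficients shrunk towards zero, while ridge retains all five.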

In [8]:
Regresion_lineal = LinearRegression()
Regresion_lineal.fit(x_entreno, y_entreno)
LR_coef = Regresion_lineal.coef_
y_pred = Regresion_lineal.predict(x_prueba)

mse = mean_squared_error(y_prueba, y_pred)   # Keep in mind that the exponential must be applied to the results
mape = mean_absolute_percentage_error(y_prueba, y_pred)   # Keep in mind that the exponential must be applied to the results
[mse, mape]

[0.002230701729399499, 0.0038343781268415497]

In [9]:
alpha = 0.00001
ridge_model = Ridge(alpha = alpha) # set up the ridge regression
ridge_model.fit(x_entreno, y_entreno) # fit the ridge regression
ridge_coef = ridge_model.coef_ # ridge regression coefficients
y_pred = ridge_model.predict(x_prueba)

mse = mean_squared_error(y_prueba, y_pred)   # Keep in mind that the exponential must be applied to the results
mape = mean_absolute_percentage_error(y_prueba, y_pred)   # Keep in mind that the exponential must be applied to the results
[mse, mape]

[0.002230702303162153, 0.003834869524506071]

In [10]:
lasso_model = Lasso(alpha = alpha) # set up the lasso regression
lasso_model.fit(x_entreno, y_entreno) # fit the lasso regression
lasso_coef = lasso_model.coef_ # lasso regression coefficients
y_pred = lasso_model.predict(x_prueba)

mse = mean_squared_error(y_prueba, y_pred)   # Keep in mind that the exponential must be applied to the results
mape = mean_absolute_percentage_error(y_prueba, y_pred)   # Keep in mind that the exponential must be applied to the results
[mse, mape]

[0.0022309333284786215, 0.0038438188771605904]

In [173]:
np.exp(y_pred)

array([   523.69588732,   1138.91125798,   1800.60267558,   2481.25822179,
         3197.47927039,   3891.61694156,   4565.05488313,   5164.35159533,
         5680.89885172,   6130.        ,   2295.86135051,   4992.94037277,
         7893.76848391,  10877.73455923,  14017.61834235,  17060.68950208,
        20013.01389388,  22640.30616852,  24904.82821343,   6338.44464237,
        13784.57638498,  21793.22133009,  30031.39467575,  38700.02770899,
        47101.37916013,  55252.19572384,  62505.6592827 ,  13208.63690653,
        28725.57459323,  45414.72929312,  62582.19649294,  80646.69538382,
        98154.20820492, 115139.6332668 ,  20832.59948126,  45305.8400049 ,
        71627.89564198,  98704.34348534, 127195.58545728, 154808.35164171,
        25717.11423791,  55928.47230123,  88422.12785315, 121847.05415536,
       157018.49424527,  38030.54940143,  82707.20070384, 130758.92070899,
       180187.806828  ,  40167.54770632,  87354.65256788, 138106.47672126,
        39993.8249981 ,  

## Cross validation

Having implemented the three models and checked that they give similar results, we need to train, validate and test in order to find the best model for IBNR. We therefore run a cross-validation over a list of 10 insurers: an outer loop holds out one insurer at a time for testing, and an inner loop holds out one of the remaining insurers for validation and uses the rest for training.

With this in mind, the next loop estimates the three models for every split. The process is complete once every insurer has served as test data and every remaining insurer has served as validation data.
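
The splitting scheme can be sketched on its own, independently of the regressions. The generator below is a hypothetical helper (`nested_leave_one_out` is not part of the notebook) and the insurer codes are placeholders; it only enumerates which insurer plays which role in each split.

```python
import numpy as np

def nested_leave_one_out(codes):
    """Yield (test_code, validation_code, training_codes) for every split."""
    for i, test_code in enumerate(codes):
        remaining = np.delete(codes, i)          # everything except the test insurer
        for j, valid_code in enumerate(remaining):
            train_codes = np.delete(remaining, j)  # everything except test and validation
            yield test_code, valid_code, train_codes

# Placeholder codes standing in for ten insurers.
insurer_codes = np.arange(10)
splits = list(nested_leave_one_out(insurer_codes))

print(len(splits))  # 10 test insurers x 9 validation insurers
print(splits[0])
```

With 10 insurers this produces 90 splits, each with 8 training insurers, which matches the `modelo_i-j` keys with `i` from 0 to 9 and `j` from 0 to 8.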

In [74]:
df_CV = input[input["GRCODE"].isin(list(input["GRCODE"].unique()[0:10]))]
lista_aseguradoras = df_CV["GRCODE"].unique()
mejore_modelos_test_full = {}
mejores_mape = {}

for i in range(len(lista_aseguradoras)): # loop over the test insurers
    print("aseguradora de test:", i)
    conj_test = lista_aseguradoras[i] # code of the test insurer
    datos_test = df_CV[df_CV["GRCODE"].isin([conj_test])] # data of the test insurer
    conj_entre_valid = np.delete(lista_aseguradoras, i, axis=0) # validation and training set

    for j in range(len(conj_entre_valid)): # loop over the training and validation data

        conj_vali = conj_entre_valid[j] # code of the validation insurer
        conj_entre = np.delete(conj_entre_valid, j, axis=0) # training insurers
        datos_train = df_CV[df_CV["GRCODE"].isin(conj_entre)] # data of the training insurers
        datos_validacion = df_CV[df_CV["GRCODE"].isin([conj_vali])] # data of the validation insurer

        # Build the triangles that feed the regressions
        df_data        = input #pd.read_csv('medmal_pos.csv')
        df_trg_entreno = triangulo(datos_train, entreno=True, grcode=43)
        df_trg_prueba  = triangulo(datos_validacion, entreno=False, grcode=conj_vali)

        Y_prueba      = matrix_y(df_trg_prueba)                   
        x_prueba      = Y_X[Y_X['Y'].notna()].drop(['Y'], axis=1)
        data_prueba_  = pd.merge(Y_prueba, x_prueba, on='y_ii', how='inner')
        data_prueba=data_prueba_.drop(['y_ii'], axis=1)
        y_ii = data_prueba_['y_ii']

        x_entreno = data_entreno.drop('Y', axis=1)  # Features
        y_entreno = data_entreno['Y']  # Target variable
        x_prueba  = data_prueba.drop('Y', axis=1)  # Features
        y_prueba  = data_prueba['Y']  # Target variable

        # Note: mape_1, mape_2 and mape_3 hold mean squared errors on the log scale
        Regresion_lineal.fit(x_entreno, y_entreno)
        LR_coef = Regresion_lineal.coef_
        y_pred_1 = Regresion_lineal.predict(x_prueba)
        mape_1 = mean_squared_error(y_prueba, y_pred_1)
        Regresion_lineal_sum = [Regresion_lineal.intercept_, Regresion_lineal.coef_, Regresion_lineal.score(x_prueba, y_prueba)]

        ridge_model.fit(x_entreno, y_entreno)
        ridge_coef = ridge_model.coef_
        y_pred_2 = ridge_model.predict(x_prueba)
        mape_2 = mean_squared_error(y_prueba, y_pred_2)
        ridge_sum = [ridge_model.intercept_, ridge_model.coef_, ridge_model.score(x_prueba, y_prueba)]

        lasso_model.fit(x_entreno, y_entreno)
        lasso_coef = lasso_model.coef_
        y_pred_3 = lasso_model.predict(x_prueba)
        mape_3 = mean_squared_error(y_prueba, y_pred_3)
        lasso_sum = [lasso_model.intercept_, lasso_model.coef_, lasso_model.score(x_prueba, y_prueba)]

        results = [mape_1, mape_2, mape_3]
        models = [Regresion_lineal_sum, ridge_sum, lasso_sum]

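        # Model selection. As nested here, the ridge/lasso comparison is only reached when
        # linear regression has the lowest error; the inner test then fails and lasso_model
        # is assigned. Otherwise modelo_test keeps its value from the previous iteration.
        if (mape_1<mape_2) & (mape_1<mape_3):
            modelo_test=Regresion_lineal
            if (mape_2<mape_1) & (mape_2<mape_3):
                modelo_test=ridge_model
            else:
                modelo_test=lasso_model
                    
        y_pred = modelo_test.predict(x_prueba)
        mejores_mape[i,j] = mean_squared_error(y_prueba, y_pred)  # stored value is the MSE

        mejore_modelos_test_full["modelo_"+str(i)+"-"+str(j)] = modelo_test # store the selected model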


aseguradora de test: 0
aseguradora de test: 1
aseguradora de test: 2
aseguradora de test: 3
aseguradora de test: 4
aseguradora de test: 5
aseguradora de test: 6
aseguradora de test: 7
aseguradora de test: 8
aseguradora de test: 9
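
Each pass of the loop keeps one of three fitted candidates. A compact way to pick the candidate with the lowest validation error is to collect the errors in a dictionary and take the key with the minimum value. The sketch below is an illustration under assumptions: `pick_best` is a hypothetical helper and the error values are made up, not taken from the run above.

```python
def pick_best(errors):
    """Return the key with the smallest error (ties go to the first key)."""
    return min(errors, key=errors.get)

# Made-up validation errors, for illustration only.
example_errors = {"linear": 0.031, "ridge": 0.027, "lasso": 0.029}
best = pick_best(example_errors)
print(best)
```

A single `min` call compares every candidate against every other, which avoids chains of pairwise comparisons.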


In [75]:
list(dict(sorted(mejores_mape.items(), key=lambda item: item[1])).keys())[0]

(1, 0)

In [90]:
mejore_modelos_test_full #winner is number 10

{'modelo_0-0': Lasso(alpha=1e-05),
 'modelo_0-1': Lasso(alpha=1e-05),
 'modelo_0-2': Lasso(alpha=1e-05),
 'modelo_0-3': Lasso(alpha=1e-05),
 'modelo_0-4': Lasso(alpha=1e-05),
 'modelo_0-5': Lasso(alpha=1e-05),
 'modelo_0-6': Lasso(alpha=1e-05),
 'modelo_0-7': Lasso(alpha=1e-05),
 'modelo_0-8': Lasso(alpha=1e-05),
 'modelo_1-0': Lasso(alpha=1e-05),
 'modelo_1-1': Lasso(alpha=1e-05),
 'modelo_1-2': Lasso(alpha=1e-05),
 'modelo_1-3': Lasso(alpha=1e-05),
 'modelo_1-4': Lasso(alpha=1e-05),
 'modelo_1-5': Lasso(alpha=1e-05),
 'modelo_1-6': Lasso(alpha=1e-05),
 'modelo_1-7': Lasso(alpha=1e-05),
 'modelo_1-8': Lasso(alpha=1e-05),
 'modelo_2-0': Lasso(alpha=1e-05),
 'modelo_2-1': Lasso(alpha=1e-05),
 'modelo_2-2': Lasso(alpha=1e-05),
 'modelo_2-3': Lasso(alpha=1e-05),
 'modelo_2-4': Lasso(alpha=1e-05),
 'modelo_2-5': Lasso(alpha=1e-05),
 'modelo_2-6': Lasso(alpha=1e-05),
 'modelo_2-7': Lasso(alpha=1e-05),
 'modelo_2-8': Lasso(alpha=1e-05),
 'modelo_3-0': Lasso(alpha=1e-05),
 'modelo_3-1': Lasso

In [77]:
winner = mejore_modelos_test_full['modelo_1-0']

As we can see, the best model is a Lasso regression with the following coefficients:

In [78]:
winner.coef_

array([0.        , 1.47844251, 2.49413039, 3.22837566, 3.68398512,
       3.89455874, 4.28571065, 4.34024567, 4.33566825, 4.44036036,
       0.77618137, 1.23415607, 1.5547379 , 1.80827364, 2.00466563,
       2.16418097, 2.28739579, 2.38248208, 2.45789794])

In [79]:
df_data        = input
df_trg_prueba  = triangulo(df_data, grcode=43, entreno=False)

Y_prueba      = matrix_y(df_trg_prueba)                   
x_prueba      = Y_X[Y_X['Y'].notna()].drop(['Y'], axis=1)
data_prueba_  = pd.merge(Y_prueba, x_prueba, on='y_ii', how='inner')
data_prueba=data_prueba_.drop(['y_ii'], axis=1)
y_ii = data_prueba_['y_ii']

X_test  = data_prueba.drop('Y', axis=1)  # Features
Y_test  = data_prueba['Y']  # Target variable


Y_pred = winner.predict(X_test)

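Recall that this model gives an error of about 0.002 on the held-out data. Although the dictionary is named `mejores_mape`, the stored value is the mean squared error on the log scale: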

In [80]:
mejores_mape[(1,0)]

0.0022309333284786215

Looking at the predictions, the errors are small in comparison with the other models implemented. The model's predictions are then expressed in USD by applying the exponential. The model predicts an IBNR of USD 180,109.086 for the largest development year, compared to an actual IBNR of USD 168,902. The difference is 0.53\% when measured on the log scale, as in the table below, or roughly 6.6\% of the actual value in USD.

In [82]:
pd.DataFrame({"Test" : np.exp(Y_test), "Predicted" : np.exp(Y_pred), "Percentage difference" : np.round(100*(Y_test-Y_pred)/Y_pred, 2)})

Unnamed: 0,Test,Predicted,Percentage difference
0,607.0,523.674041,2.36
1,1254.0,1138.026387,1.38
2,1836.0,1799.070652,0.27
3,2434.0,2478.99206,-0.23
4,3048.0,3194.363303,-0.58
5,3663.0,3887.552493,-0.72
6,4278.0,4559.87461,-0.76
7,4892.0,5157.798995,-0.62
8,5506.0,5672.308827,-0.34
9,6120.0,6116.634925,0.01


In [88]:
print("Best Lasso estimator prediction MAPE: ",  np.round((100*(Y_test-Y_pred)/Y_pred).mean(), 3), "%")

Best Lasso estimator prediction MAPE:  0.006 %
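
The figure above is a signed mean of percentage differences between log values. As a minimal sketch of how the error looks once the exponential has been applied, the example below recomputes the mean absolute percentage error for the ten rows shown in the table above, on both scales, using scikit-learn's `mean_absolute_percentage_error`. It covers only those ten rows, not the full test set, and the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Values copied from the "Test" and "Predicted" columns of the table above.
actual = np.array([607.0, 1254.0, 1836.0, 2434.0, 3048.0,
                   3663.0, 4278.0, 4892.0, 5506.0, 6120.0])
predicted = np.array([523.674041, 1138.026387, 1799.070652, 2478.99206,
                      3194.363303, 3887.552493, 4559.87461, 5157.798995,
                      5672.308827, 6116.634925])

# MAPE on the log scale, the scale on which the models are fitted.
mape_log = mean_absolute_percentage_error(np.log(actual), np.log(predicted))

# MAPE on the original monetary scale.
mape_original = mean_absolute_percentage_error(actual, predicted)

print(f"log-scale MAPE:      {100 * mape_log:.3f} %")
print(f"original-scale MAPE: {100 * mape_original:.3f} %")
```

Because the logarithm compresses large values, a small relative error in logs corresponds to a larger relative error in monetary units, so the two figures should not be compared directly.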


# Conclusions

As we have seen, the Lasso regression model performs very well compared with the other methodologies, even surpassing the chain-ladder method, for which we estimated an error between 0.1\% and 7\%, versus a consistent 0.006\% for Lasso (a mean signed percentage difference computed on log values).

We can conclude that Lasso regularization brings more consistent estimates and more precise predictions to IBNR regression. This is very useful from a business perspective, since a company seeks to optimize its liabilities with respect to its assets, and claims are the largest source of risk for an insurer.

However, the model could be revisited with respect to the strength of the Lasso penalty, and implementing elastic net alternatives to Lasso regularization could be a matter of further study. Furthermore, this tuning was carried out on a very small sample of companies in order to keep the processing time of the run-off triangle data reasonable.

Lastly, even though the Lasso model is the most accurate, the traditional chain-ladder procedure is a practical method that consumes less time and fewer resources and is still fairly precise. The trade-off between resources and precision therefore has to be taken into account when selecting a predictive strategy.

## References 
- Verrall, R. J. (1994). Statistical methods for the chain ladder technique. In Casualty Actuarial Society Forum (Vol. 1, pp. 393-446).