## Session Descriptions

Welcome to the CAS Python Workshop


### Today's Modules

| Module | Duration |
|:-------------------------------|---------:|
| The Basic Triangle | 10 min |
| The Multidimensional Triangle | 10 min |
| Large Triangles | 5 min |
| Creating Triangles | 5 min |
| Scikit-learn Review | 5 min |
| Models in `chainladder` | 5 min |
| Assumption Setting | 5 min |
| Composite Estimators | 5 min |
| Assumption Tuning | 5 min |
| Fitting vs Diagnostics | 5 min |
| Predictions | 5 min |
| Simulations | 5 min |
| Conclusion | 5 min |


In [None]:
%pip install chainladder

### The `chainladder` library

The `chainladder` library is inspired by the R package of the same name and adopts the design principles of pandas for data manipulation and scikit-learn for modeling.







In [None]:
import pandas as pd
import numpy as np
import chainladder as cl

import matplotlib.pyplot as plt
plt.style.use('ggplot')
%config InlineBackend.figure_format = 'retina'

In [None]:
cl.__version__

### The Basic Triangle

The `chainladder` package includes many toy datasets to demonstrate library usage.

In [None]:
pd.read_html('https://chainladder-python.readthedocs.io/en/latest/utilities.html')[0]

Let's load one of these sample datasets to show a basic Triangle.

In [None]:
genins = cl.load_sample('genins')
genins

It looks like a `DataFrame`, but it's not.

In [None]:
type(genins)

DataFrames are the best tool for working with tabular data, but Triangles (as a concept) don't really behave like a DataFrame.  The Triangle object does many things a DataFrame won't do easily (and vice versa). For example, you can convert it to a valuation triangle using `dev_to_val` or back using `val_to_dev`.

In [None]:
genins.dev_to_val()
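Conceptually, `dev_to_val` relabels each cell by the date it was valued at rather than by its development age. Here is a minimal pandas sketch of that idea. It uses a hypothetical three-year triangle in long format, with annual origins and development ages in months. It illustrates the concept and is not chainladder's implementation.

```python
import pandas as pd

# Hypothetical cumulative amounts: one row per (origin year, development age) cell
cells = pd.DataFrame({
    'origin': [2001, 2001, 2001, 2002, 2002, 2003],
    'age':    [12, 24, 36, 12, 24, 12],
    'amount': [100.0, 150.0, 160.0, 110.0, 170.0, 120.0],
})

# With annual origins, a cell aged `age` months is valued at the end of
# calendar year origin + age / 12 - 1
cells['valuation'] = cells['origin'] + cells['age'] // 12 - 1

# Development view: columns are development ages (like genins)
dev_view = cells.pivot(index='origin', columns='age', values='amount')

# Valuation view: the same cells, with valuation years as columns (like dev_to_val)
val_view = cells.pivot(index='origin', columns='valuation', values='amount')
print(val_view)
```

In the valuation view, the rightmost column holds the latest diagonal.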

You can convert it to an incremental triangle using `cum_to_incr` and back using `incr_to_cum`.

In [None]:
genins.cum_to_incr()
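Under the hood, this conversion is just a difference, or a running sum, along the development axis. Here is a NumPy sketch on a hypothetical 3x3 cumulative triangle, where `NaN` marks cells that have not been observed yet. It shows the concept and is not the library's code.

```python
import numpy as np

# Hypothetical cumulative triangle; NaN marks future (unobserved) cells
cum = np.array([
    [100.0, 150.0, 160.0],
    [110.0, 170.0, np.nan],
    [120.0, np.nan, np.nan],
])

# cum_to_incr: keep the first age, then difference adjacent development ages
incr = np.concatenate([cum[:, :1], np.diff(cum, axis=1)], axis=1)

# incr_to_cum: running sum along development. NaNs only appear at the end of
# each row, so they propagate into exactly the unobserved cells.
back = np.cumsum(incr, axis=1)

print(incr)
print(back)
```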

Link ratios are a property of a Triangle that you may want to see.

In [None]:
genins.link_ratio

Or quickly visualize the link ratio highs/lows.

In [None]:
genins.link_ratio.heatmap()
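Each link ratio shown above is the value at the next development age divided by the value at the current age, for the same origin. Here is a NumPy sketch on a hypothetical triangle:

```python
import numpy as np

# Hypothetical cumulative triangle; NaN marks unobserved cells
cum = np.array([
    [100.0, 150.0, 160.0],
    [110.0, 170.0, np.nan],
    [120.0, np.nan, np.nan],
])

# Age-to-age (link) ratios: column k+1 divided by column k, row by row.
# Cells without a following observation come out as NaN.
link = cum[:, 1:] / cum[:, :-1]
print(np.round(link, 3))
```

The most recent origin has no observed ratios at all, so it carries no information for pattern selection.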

The latest diagonal is another useful property.

In [None]:
genins.latest_diagonal

Grabbing the latest calendar period's incremental losses from the Triangle can be done with method chaining.

In [None]:
genins.cum_to_incr().latest_diagonal

Triangles have dedicated `origin` and `development` accessors, similar to pandas `str` and `dt` accessors.  These can be used as boolean filters.

In [None]:
genins[genins.origin<='2005']

In [None]:
genins[genins.development>72]

Implicitly, triangles have an additional time dimension representing different valuations.  You can compare the `valuation` against a date given as text, which will be converted to a datetime, or you can access parts of the valuation, like `year`, and compare them to integers.

In [None]:
# Keep cells valued in 2004, plus cells valued from 2007 onward except 2009
genins[genins.valuation.year.isin([2004]) |
       ((genins.valuation >= '2007') & (genins.valuation.year != 2009))]

Arithmetic operations between triangles are also possible.

In [None]:
genins / genins.latest_diagonal * 100

This is a triangle with 10 origin periods and 10 development lags.  Let's examine its shape.

In [None]:
genins.shape

### The Multidimensional Triangle

Woah, why so many dimensions?  Most actuaries work with more than one triangle, so a `Triangle` can hold a whole set of them, much like a DataFrame of triangles. In this way, a triangle has 4 axes. The first two are `index` and `columns`, just like a pandas DataFrame.
The last two are `origin` and `development`.

As mentioned above, the triangle also has an implicit axis of `valuation` not shown in the shape.

![](https://chainladder-python.readthedocs.io/en/latest/_images/triangle_graphic.PNG)

In [None]:
clrd = cl.load_sample('clrd')
clrd

You can think of a multidimensional Triangle as a pandas DataFrame of individual Triangles.  You can manipulate the Triangle using much of the same functionality we learned with pandas.  

In [None]:
print(clrd.columns)
clrd.index

You can subset "columns" of a Triangle.

In [None]:
clrd['CumPaidLoss']

You can apply boolean filtering to get access to subsets of Triangles.

In [None]:
clrd['CumPaidLoss'][clrd['LOB']=='comauto'].sum()

Groupby operations work as well.

In [None]:
clrd['CumPaidLoss'].groupby('LOB').sum().latest_diagonal

You can convert your Triangle into a DataFrame at any time using `to_frame`. The `T` property used below is a shorthand that returns the transposed DataFrame.

In [None]:
clrd['CumPaidLoss'].groupby('LOB').sum().latest_diagonal.T

Once the Triangle is converted to a DataFrame (here via `T`), you can plot it with the pandas `plot` method we learned previously.

In [None]:
(clrd['CumPaidLoss'].sum() / clrd['EarnedPremDIR'].sum()).T.plot();

You can derive new columns for your `Triangle`.

In [None]:
clrd['CaseIncurLoss'] = clrd['IncurLoss'] - clrd['BulkLoss']
clrd

You can create calculated fields, like those in an Excel pivot table, by assigning a function to a column.

In [None]:
clrd['LossRatio'] = lambda clrd: clrd['IncurLoss'] / clrd['EarnedPremDIR']

In [None]:
clrd.groupby('LOB').sum().latest_diagonal['LossRatio'].T.plot();
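One reason to define a loss ratio as a calculated field rather than a stored column is that ratios do not aggregate: the ratio of sums is not the sum of ratios. Here is a small pandas illustration with hypothetical numbers:

```python
import pandas as pd

# Hypothetical company-level data
df = pd.DataFrame({
    'LOB': ['comauto', 'comauto', 'ppauto'],
    'IncurLoss': [80.0, 30.0, 70.0],
    'EarnedPremDIR': [100.0, 20.0, 100.0],
})
# A stored ratio, computed before aggregation: 0.8, 1.5, 0.7
df['LossRatio'] = df['IncurLoss'] / df['EarnedPremDIR']

summed = df.groupby('LOB').sum()

# Misleading: summing stored ratios gives 2.3 for comauto
wrong = summed['LossRatio']

# Meaningful: recompute the ratio after aggregating, 110 / 120 for comauto
right = summed['IncurLoss'] / summed['EarnedPremDIR']
print(pd.DataFrame({'summed ratios': wrong, 'ratio of sums': right}))
```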

`loc` and `iloc` slicing functionality are also available and work across each of the 4 triangle axes.

In [None]:
clrd.loc['Aegis Grp', 'CumPaidLoss'].iloc[-1, 0, :, ::-1]

Between the Triangle-specific functionality and the pandas-style functionality, you can query any diagnostic information from your triangles.

Which company has the highest direct earned premium for accident year 1994?

In [None]:
(clrd['EarnedPremDIR']
  .groupby('GRNAME').sum()
  .latest_diagonal.loc[:, :, '1994']
  .to_frame(origin_as_datetime=True)
  .nlargest(1))

For the five largest personal auto companies, which has the lowest loss ratio (based on the latest diagonal)?

In [None]:
largest_5 = (
  clrd.loc[clrd['LOB']=='ppauto', 'EarnedPremDIR']
      .groupby('GRNAME').sum()
      .latest_diagonal
      .sum(axis='origin') # sum can be used along any axis
      .to_frame(origin_as_datetime=True)
      .nlargest(5)
)

print('Top Five:\n', largest_5, '\n\nLowest Loss Ratio:\n')

(clrd[clrd['LOB']=='ppauto']
  .loc[list(largest_5.index)]
  .latest_diagonal.sum('origin')['LossRatio']
  .to_frame(origin_as_datetime=True)
  .sort_values()
  )[:1]

**Exercise:**

For the `wkcomp` line, which company holds the highest ratio of "BulkLoss" to case reserves?

* Develop a 'CaseReserve' column as the difference between CaseIncurLoss and CumPaidLoss
* Limit the Triangle to just 'wkcomp'
* Take the `latest_diagonal` and sum across all origin periods by specifying axis=2 or axis="origin" 
* Take the ratio of BulkLoss to CaseReserve
* Convert `to_frame` and sort

In [None]:
# Your work here

**Exercise:**

For the `ppauto` line, which company exhibited the highest prior year development in 1997?

* Subset the triangle to LOB == 'ppauto'
* For the denominator, slice out the cumulative diagonal at the 1996 valuation
* For the numerator, slice out the incremental latest diagonal using `cum_to_incr` and `latest_diagonal`
* Additionally for the numerator, exclude accident year 1997, since it has no prior-year development
* Convert `to_frame` and sort

In [None]:
# Your work here

### Large Triangles

Triangles, as a format, are memory-inefficient.  Almost half of the cells in a Triangle are empty, yet each empty cell still takes up 64 bits of memory.

In [None]:
print(genins.values.dtype)
print(genins)
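To put numbers on "almost half": in an n x n triangle, a cell is observed only when its origin index plus its development index is below n. Here is a NumPy sketch for a 10 x 10 triangle the size of genins:

```python
import numpy as np

n = 10  # origin periods = development ages, like genins

# Cell (i, j) is observed when i + j < n; everything else is empty (NaN)
i, j = np.indices((n, n))
observed = (i + j) < n
tri = np.where(observed, 1.0, np.nan)  # a float64 array, 8 bytes per cell

empty = np.isnan(tri).sum()
print(empty, empty / tri.size, empty * tri.itemsize)  # 45 0.45 360
```

So 45 of the 100 cells, or 360 bytes, hold nothing. The waste grows with the number of triangles stacked in the `index` and `columns` axes.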

Unlike pandas, the Triangle will automatically switch to a sparse array representation of the Triangle data when it is large and has a high degree of sparsity.

In [None]:
prism = cl.load_sample('prism')
prism.values
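Sparse storage only pays off when most cells are empty. Below is a rough byte count. It assumes a COO (coordinate) layout that stores each non-empty value plus one 8-byte integer coordinate per axis. Real sparse libraries add their own overhead, and the claim-level dimensions are hypothetical.

```python
import numpy as np

def dense_bytes(shape, itemsize=8):
    # Every cell is stored, empty or not
    return int(np.prod(shape)) * itemsize

def coo_bytes(nnz, ndim, itemsize=8, index_size=8):
    # Only non-empty cells are stored, each with one coordinate per axis
    return nnz * (itemsize + ndim * index_size)

# genins-sized triangle: 1 x 1 x 10 x 10 with 55 observed cells
print(dense_bytes((1, 1, 10, 10)), coo_bytes(55, 4))  # 800 2200

# Hypothetical claim-level triangle: 5,000 claims x 4 measures x 120 x 120 months,
# with about 10 non-empty incremental cells per claim and measure
print(dense_bytes((5000, 4, 120, 120)), coo_bytes(5000 * 4 * 10, 4))
```

For a small triangle, dense storage wins. At claim level, sparse storage is roughly 8 MB against 2.3 GB. That is consistent with the Triangle only switching representations when the data are both large and sparse.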

However, as a practitioner, you do not have to think about this at all. `chainladder` exposes the same operations regardless of whether the data is sparse or not.

In [None]:
prism

In [None]:
prism.sum()['Paid'].iloc[..., -10:, :10]

This is a claim-level Triangle.  In practice, it is useful to bring Triangle data in at a very granular level, because this gives the greatest flexibility in aggregating the Triangle as the analysis proceeds.

In [None]:
prism.index

Incremental triangles are inherently sparser than cumulative ones, so storing data as incremental lets you push substantially more data through the Triangle.  Cumulative triangles can easily be derived with the `incr_to_cum` method.

Here we also use the `grain` method to reaggregate from an `OMDM` (Origin Month, Development Month) grain to an `OYDY` (Origin Year, Development Year) grain.

In [None]:
print(prism.is_cumulative)
prism['Incurred'].sum().grain('OYDY').incr_to_cum().link_ratio
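Why is incremental data so much sparser? A single claim with two payments has only two non-zero incremental cells. Once accumulated, every cell after the first payment is non-zero. Here is a NumPy sketch with a hypothetical claim observed over 120 development months:

```python
import numpy as np

ages = 120
incr = np.zeros(ages)
incr[[2, 14]] = [500.0, 250.0]  # hypothetical payments in development months 3 and 15

cum = np.cumsum(incr)  # every month from the first payment onward is now non-zero
print(np.count_nonzero(incr), np.count_nonzero(cum))  # 2 118
```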

Notice how much longer our calculations take when we accumulate the Triangle at an earlier step.  This is because the Triangle is keeping track of far more data elements than it needs.

In [None]:
prism.incr_to_cum()['Incurred'].sum().grain('OYDY').link_ratio

Notice how much RAM a cumulative Triangle takes up relative to an incremental Triangle.  It's generally good practice to defer accumulation of large triangles until absolutely necessary.

In [None]:
prism.incr_to_cum().values

Let's try another example and look at how reported counts trend by accident quarter for different deductible amounts.

In [None]:
(prism['reportedCount'].groupby('Deductible').sum()
                       .sum('development').grain('OQDM')
                       .loc[:, :, :'2014'].T).plot(
    ylabel='Reported Count');

The `Triangle` extends the functionality you've learned about in our pandas tutorials.  Some things to remember to make working with Triangles as easy as possible:

* Dealing with the first two axes (`index`, `columns`) feels a lot like dealing with a `pd.DataFrame`
* Dealing with the last two axes (`origin`, `development`) feels a lot like using pandas accessors (`str`, `dt`)
* Triangles have domain-specific methods not found in pandas (`grain`, `incr_to_cum`, `dev_to_val`, etc.)
* Triangles are designed to scale up to fairly large datasets: the more granular the data you push into a Triangle, the more ways you can slice and aggregate it later.

### Creating Triangles

You'll want to try the library out on your own data.  To do so, you need to instantiate a `Triangle` from data you have on hand.  This data must be a DataFrame in **long** format.

![](https://chainladder-python.readthedocs.io/en/latest/_images/triangle_bad_good.PNG)

At a minimum, the DataFrame must also:

1. Have “date-like” columns for the origin and development periods of the triangle.
2. Have one or more numeric columns representing the amounts of the triangle.

The reason for these restrictions is that the Triangle infers a lot of useful properties from your DataFrame. For example, it will determine the grain and `valuation_date` of your triangle, which in turn are used to derive many other properties of your triangle without further prompting from you.
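If your data arrives as a wide, triangle-shaped table, pandas `melt` can reshape it into long format. Here is a sketch with hypothetical column names. The final `cl.Triangle` call is left as a comment and uses the same kind of arguments as the constructor calls in this section.

```python
import pandas as pd

# Hypothetical wide triangle: one row per accident year, one column per valuation date
wide = pd.DataFrame({
    'AccidentYear': ['2019', '2020', '2021'],
    '2019-12-31': [100.0, None, None],
    '2020-12-31': [150.0, 110.0, None],
    '2021-12-31': [160.0, 170.0, 120.0],
})

# Long format: one row per (origin, valuation) cell; drop cells not yet observed
long_df = (wide.melt(id_vars='AccidentYear', var_name='ValuationDate', value_name='Paid')
               .dropna(subset=['Paid']))
print(long_df)

# The long frame could then be passed to the constructor, e.g.:
# cl.Triangle(long_df, origin='AccidentYear', development='ValuationDate',
#             columns='Paid', cumulative=True)
```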

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/prism.csv')
df.head()

At a minimum, you'll want to feed the `Triangle` some `data` and column labels for each of its four axes (`index`, `columns`, `origin`, and `development`).

A good convention is to store different measures (paid, incurred, counts, etc.) as `columns` and arbitrary dimensions (Policy Number, State, etc) in the `index`.

In [None]:
cl.Triangle(
    data=df,
    index=['ClaimNo', 'Line', 'Type', 'ClaimLiability', 'Limit', 'Deductible'],
    columns=['reportedCount', 'closedPaidCount', 'Paid', 'Incurred'],
    origin='AccidentDate',
    development='PaymentDate',
)

From our data, the `Triangle` infers that it has an `origin_grain` of **M**onth and a `development_grain` also of **M**onth.  It inferred these from the available elements in 'AccidentDate' and 'PaymentDate'.

Let's reproduce the same triangle, but with a coarser accident grain.

In [None]:
df['AccidentYear'] = df['AccidentDate'].str[:4] # Now only the year is known.

cl.Triangle(
    data=df,
    index=['ClaimNo', 'Line', 'Type', 'ClaimLiability', 'Limit', 'Deductible'],
    columns=['reportedCount', 'closedPaidCount', 'Paid', 'Incurred'],
    origin='AccidentYear',
    development='PaymentDate',
)

Generally, the datetime inference is pretty good, but if for whatever reason it doesn't do what you want, you can be explicit about the datetime format.

In [None]:
cl.Triangle(
    data=df,
    index=['ClaimNo', 'Line', 'Type', 'ClaimLiability', 'Limit', 'Deductible'],
    columns=['reportedCount', 'closedPaidCount', 'Paid', 'Incurred'],
    origin='AccidentYear',
    development='PaymentDate',
    origin_format='%Y', development_format='%Y-%m-%d'
)

The Triangle infers a lot of information during construction, but not everything can be inferred. In particular, it has no way of knowing whether the data is incremental or cumulative.  You can optionally declare this explicitly with the `cumulative` argument during construction.  Leaving it unspecified usually won't cause problems, but certain calculations need to know whether the Triangle is cumulative.

In [None]:
cl.Triangle(
    data=df,
    index=['ClaimNo', 'Line', 'Type', 'ClaimLiability', 'Limit', 'Deductible'],
    columns=['reportedCount', 'closedPaidCount', 'Paid', 'Incurred'],
    origin='AccidentDate',
    development='PaymentDate',
    cumulative=False
)

### Scikit-learn Review

We want to use our triangles to estimate unpaid claims.  To do so, we will construct models.  Just as Triangles emulate much of the `pandas` syntax, chainladder models emulate the `scikit-learn` modeling syntax.

Recall from scikit-learn that model specification and model fitting are separate from each other.  When specifying a model, you can configure any number of model **hyperparameters**. Hyperparameters are just model parameters that can be set in the absence of seeing any data.

In [None]:
from sklearn.linear_model import LinearRegression

# Model specification
model_12_to_24 = LinearRegression(
    fit_intercept=False # Specify any hyperparameters
)

At this point, the model is just a specification; it has not seen any data whatsoever.  Hyperparameters differ from estimated model parameters in that they can be set before seeing any data.

In [None]:
X=genins.loc[..., :'2009', 12].values[0, 0]
y=genins.loc[..., :'2009',  24].values.flatten()
print(X.flatten())
print(y)   

# Model Fitting
model_12_to_24.fit(
    X=X, 
    y=y, 
    sample_weight=(1 / X).flatten())

After a model is `fit`, then additional model parameters that are estimated from the data become available.  

**Note** To distinguish estimated parameters from hyperparameters, the scikit-learn convention is to give estimated parameters a **trailing underscore**.

In [None]:
# Model Diagnostics
model_12_to_24.coef_ # A roundabout way of developing a volume-weighted link-ratio
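Why is this coefficient a volume-weighted link ratio? Weighted least squares through the origin minimizes $\sum_i w_i (y_i - b x_i)^2$, whose solution is $b = \sum_i w_i x_i y_i / \sum_i w_i x_i^2$. With $w_i = 1/x_i$, as in the `sample_weight` above, this collapses to $\sum_i y_i / \sum_i x_i$. The NumPy check below uses hypothetical data and solves the weighted problem with `np.linalg.lstsq` rather than the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1e5, 1e6, size=9)       # hypothetical values at 12 months
y = x * rng.uniform(1.5, 3.5, size=9)   # hypothetical values at 24 months

# Volume-weighted link ratio: total at 24 months over total at 12 months
volume_weighted = y.sum() / x.sum()

# Weighted least squares with w = 1/x: scale each row by sqrt(w),
# then solve an ordinary least-squares problem without an intercept
sqrt_w = np.sqrt(1 / x)
b_wls = np.linalg.lstsq((sqrt_w * x)[:, None], sqrt_w * y, rcond=None)[0][0]

print(volume_weighted, b_wls)
```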

Finally, depending on whether your model is a predictor or a transformer, you can either make predictions off of your model or transform your data based on the model.

The key difference between Transformers and Predictors:

* Transformers alter your design matrix, `X`
* Predictors make predictions of your response, `y`
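A pair of hypothetical, stripped-down estimators makes the distinction concrete. Both follow the conventions described above: hyperparameters are set in `__init__`, and estimated parameters with a trailing underscore are set in `fit`. They are illustrations only, not scikit-learn or chainladder classes.

```python
import numpy as np

class MeanCenterer:
    """Hypothetical transformer: fit learns a parameter, transform returns a new X."""

    def fit(self, X, y=None):
        self.mean_ = np.mean(X, axis=0)  # estimated from data: trailing underscore
        return self

    def transform(self, X):
        return np.asarray(X) - self.mean_


class RatioPredictor:
    """Hypothetical predictor: fit learns a ratio, predict returns estimates of y."""

    def __init__(self, weighted=True):
        self.weighted = weighted  # hyperparameter: chosen before seeing data

    def fit(self, X, y):
        X, y = np.ravel(X), np.ravel(y)
        self.coef_ = y.sum() / X.sum() if self.weighted else np.mean(y / X)
        return self

    def predict(self, X):
        return np.ravel(X) * self.coef_


X = np.array([[100.0], [300.0]])
y = np.array([150.0, 600.0])
print(MeanCenterer().fit(X).transform(X).ravel())
print(RatioPredictor().fit(X, y).predict([[1000.0]]))
```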

In [None]:
model_12_to_24.predict(np.array([[1000]]))

### Models in `chainladder`

The chainladder package comes with several scikit-learn compliant estimators dedicated to actuarial reserving.  These estimators focus on different aspects of a reserving analysis:

* **Loss Development** - estimators for setting loss development patterns (e.g. `Development`, `MunichAdjustment`)
* **Tail Factors** - estimators for extrapolating tail factors (e.g. `TailCurve`, `TailConstant`)
* **Adjustments** - estimators for altering the data of a Triangle (e.g. `BerquistSherman`, `BootstrapODPSample`)
* **Workflow** - estimators for building composite estimators (e.g. `Pipeline`, `VotingChainladder`)
* **IBNR Models** - estimators for predicting unpaid claim estimates (e.g. `Chainladder`, `BornhuetterFerguson`)

Each of these estimators is a Transformer with the exception of the IBNR models which are Predictors.  

Let's fit a basic Chainladder model.  With no further configuration, the `Chainladder` model will fit volume-weighted factors without a tail.

In [None]:
# Model Specification
model = cl.Chainladder()

# Model Fitting
model.fit(genins)

# Model diagnostics
print(f'IBNR estimated as {"${:,}".format(int(model.ibnr_.sum()))}')

Besides `ibnr_`, other useful properties of IBNR models include the selected age-to-age factors, stored in `ldf_`.

In [None]:
model.ldf_

Age-to-ultimates are stored in the `cdf_` attribute.

In [None]:
model.cdf_
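The `ldf_`, `cdf_`, `ultimate_` and `ibnr_` attributes are tied together by simple arithmetic. Below is a NumPy sketch of the volume-weighted chainladder with no tail, using a hypothetical 4x4 triangle. It illustrates the method and is not chainladder's code.

```python
import numpy as np

# Hypothetical 4x4 cumulative triangle: rows are origins, columns are development ages
tri = np.array([
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 160.0, 180.0, np.nan],
    [120.0, 185.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])
n_ages = tri.shape[1]

# ldf: volume-weighted age-to-age factors, using origins observed at both ages
ldf = np.empty(n_ages - 1)
for k in range(n_ages - 1):
    both = ~np.isnan(tri[:, k + 1])
    ldf[k] = tri[both, k + 1].sum() / tri[both, k].sum()

# cdf: age-to-ultimate factors, the product of all remaining ldfs.
# With no tail, the oldest age has a cdf of exactly 1.0.
cdf = np.append(np.cumprod(ldf[::-1])[::-1], 1.0)

# Latest diagonal: last observed cell in each row (observed cells are contiguous)
latest_age = (~np.isnan(tri)).sum(axis=1) - 1
latest = tri[np.arange(tri.shape[0]), latest_age]

# ultimate = latest * cdf at the current age; IBNR is the difference
ultimate = latest * cdf[latest_age]
ibnr = ultimate - latest
print(np.round(ldf, 4), np.round(cdf, 4), np.round(ibnr, 1), sep='\n')
```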

We can explore the `full_triangle_` or the `full_expectation_`.

In [None]:
model.full_triangle_

Finally, let's explore the `ultimate_` of the model.

In [None]:
(model.ultimate_/1e6).plot(
    title="Genins Ultimates ($M)",
    kind='bar', legend=False, xlabel='Accident Year');

All of these properties are themselves triangles, so they can be manipulated for further diagnostics.

In [None]:
(model.full_triangle_
      .dev_to_val()
      .cum_to_incr()
      .loc[..., '2011':'2020']
      .sum('origin')
      .T/1e6
 ).plot(
     marker='^', legend=False, linestyle='--',
     title="Expected IBNR Run-off",
     xlabel='Calendar Year', color='blue',
     ylabel=" ($ Millions)"
);

**Exercise**

Can you produce a heatmap of the difference between the expected development and the Triangle itself?
* Limit `model.full_expectation_` to the `valuation_date` of genins
* Take a difference between the expectation and the genins Triangle
* Call the `heatmap` method on the difference

In [None]:
# Your work here

### Assumption Setting (Hyperparameters)

The `Chainladder` model has so far used volume-weighted development patterns.  It's almost always understood that the actuary will want to deviate from these patterns.  For this, we can use the `Development` transformer.

In [None]:
cl.Development?

In [None]:
cl.Development(
    average='simple', 
    n_periods=5
).fit(genins).cdf_

In [None]:
cl.Development(
    average='volume', 
    n_periods=3
).fit(genins).cdf_
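The `average` and `n_periods` options above differ in how each origin's link ratio is weighted. The sketch below uses a hypothetical helper on made-up values. `'regression'` is sketched as least squares through the origin, which is our reading of that option, so treat that definition as an assumption.

```python
import numpy as np

def average_factor(current, following, average='volume', n_periods=None):
    """Hypothetical helper: one age-to-age factor from paired values.

    `current` and `following` hold values at two adjacent ages for origins
    observed at both, oldest origin first. `n_periods` keeps only the most
    recent origins.
    """
    x = np.asarray(current, dtype=float)
    y = np.asarray(following, dtype=float)
    if n_periods is not None:
        x, y = x[-n_periods:], y[-n_periods:]
    if average == 'simple':
        return np.mean(y / x)                 # every origin counts equally
    if average == 'volume':
        return y.sum() / x.sum()              # larger origins count more
    if average == 'regression':
        return (x * y).sum() / (x * x).sum()  # least squares through the origin
    raise ValueError(f'unknown average: {average}')

current, following = [100.0, 1000.0], [200.0, 1100.0]
for avg in ['simple', 'volume', 'regression']:
    print(avg, round(average_factor(current, following, avg), 4))
print(average_factor(current, following, 'simple', n_periods=1))
```

All three are weighted least squares through the origin, with weights $1/x^2$, $1/x$ and $1$ respectively. This is why the small origin pulls the simple average up to 1.55, while the volume-weighted factor stays near 1.18.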

We can estimate each lag independently by passing a vector of assumptions in.

In [None]:
cl.Development(
    average=['volume']*3 +['regression']*6, 
    n_periods=[5, 5, 4, 4, 3, 3, 3, 3, 3],
    drop_valuation='2008'
).fit(genins).cdf_

If we use a `Development` estimator to transform our Triangle, we can easily examine the `link_ratio` selections and omissions we've made.

In [None]:
cl.Development(
    average=['volume']*3 +['regression']*6, 
    n_periods=[5, 5, 4, 4, 3, 3, 3, 3, 3],
    drop_valuation='2008'
).fit_transform(genins).link_ratio

### Composite Estimators (Pipeline)
The Development estimator only produces a set of patterns. It doesn't actually estimate IBNR.   To do that you need to couple it with an IBNR model.

In [None]:
# Model Specification
dev = cl.Development(
    average=['regression']*3 +['volume']*6, 
    n_periods=[5, 5, 4, 4, 3, 3, 3, 3, 3],
    drop_valuation='2008'
)
model = cl.Chainladder()

Notice that the `Chainladder` model requires a Triangle to be passed into its `fit` method.  We cannot pass in our `Development` estimator since it is not a Triangle.

In [None]:
type(dev) is cl.Triangle

Instead, we use our `Development` estimator to transform our triangle.

In [None]:
print(type(dev.fit_transform(genins)) is cl.Triangle)

# Model Fitting
model.fit(X=dev.fit_transform(genins))

print(f'IBNR estimated as {"${:,}".format(int(model.ibnr_.sum()))}')

Similar to scikit-learn, chainladder provides a `Pipeline` to make composite estimators much easier to build up.

In [None]:
# Model Specification
composite_model = cl.Pipeline(
    #       Name,    Estimator
    steps=[('dev',   dev),
           ('model', cl.Chainladder())])

# Fit your estimator
composite_model.fit(genins)

The `Pipeline` estimator exposes all features of the model from any step.  To access a specific step, you can reference the `named_steps` property of the `Pipeline`.

In [None]:
print(f'IBNR estimated as {"${:,}".format(int(composite_model.named_steps.model.ibnr_.sum()))}')

print('Selected development assumptions:')
pd.DataFrame(
    {'average': composite_model.named_steps.dev.average,
     'n_periods': composite_model.named_steps.dev.n_periods},
    index=genins.link_ratio.development)

Let's throw a Tail into the mix.  We can specify where the tail attaches using an `attachment_age`.  This allows for smoothing out patterns in the edge of a Triangle with limited data.

In [None]:
# Model Specification
composite_model = cl.Pipeline(
    steps=[('dev', dev),
           ('tail', cl.TailCurve('inverse_power', attachment_age=96)),
           ('model', cl.Chainladder())])

# Model Fitting
composite_model.fit(genins)

# Model Diagnostics
pd.concat(
    (genins.latest_diagonal.to_frame(origin_as_datetime=True).rename(columns={'2010': 'Latest'}), 
     composite_model.named_steps.model.ibnr_.to_frame(origin_as_datetime=True).rename(columns={'2261': 'IBNR'})
     ), axis=1
).plot(kind='bar', stacked=True, title='Genins with Inverse Power Tail');
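The curve-fitting idea behind a tail can be sketched in a few steps. First, fit a straight line to $\log(\text{ldf} - 1)$ against age, which is an exponential decay. An inverse power curve would use $\log$ of age on the x-axis instead. Then extrapolate factors past the last age and multiply them together. This is a simplified illustration with hypothetical factors, not `TailCurve`'s implementation. It also ignores `attachment_age`, which lets the fitted curve replace observed factors from an earlier age.

```python
import numpy as np

# Hypothetical selected age-to-age factors at ages 12, 24, ..., 96 months
ages = np.arange(12, 108, 12)
ldf = np.array([3.0, 1.6, 1.25, 1.12, 1.06, 1.03, 1.015, 1.008])

# Exponential decay: assume log(ldf - 1) is linear in age
slope, intercept = np.polyfit(ages, np.log(ldf - 1), deg=1)

# Extrapolate factors beyond the last observed age and multiply them up
future_ages = np.arange(ages[-1] + 12, ages[-1] + 12 * 100, 12)
future_ldf = 1 + np.exp(intercept + slope * future_ages)
tail_factor = future_ldf.prod()
print(round(slope, 4), round(tail_factor, 4))
```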

### Assumption Tuning (GridSearch)
Similar to scikit-learn's `GridSearchCV`, `GridSearch` lets you try various permutations of hyperparameters.  This is an excellent approach for testing assumption sensitivity.

To use GridSearch you have to:

* Supply an `estimator`, this can be any estimator including `Pipeline`.
* Specify a search space dictionary to the `param_grid` argument. If a Pipeline is used, then the parameter name must have the step name prepended (e.g. `stepname__parameter`)
* Provide a `scoring` function that grabs the diagnostic info you're interested in scenario testing.

**Note** All of this can be done with loops, but `GridSearch` provides a nice shorthand for accomplishing the same as well as parallelism when your CPU has multiple cores.

In [None]:
# Model Specification
grid = cl.GridSearch(
    estimator=cl.Pipeline(steps=[
        ('dev', cl.Development()),
        ('tail', cl.TailCurve()),
        ('model', cl.CapeCod())
    ]),
    
    param_grid={'dev__average': ['simple', 'volume', 'regression'],
                'dev__n_periods': list(range(3, 10, 2)),
                'tail__curve':['exponential', 'inverse_power'],
                'tail__attachment_age':[96, 120],
                'model__decay': [0, .5, 1.0],
                'model__trend': [-.025, -.01, 0, .01, .025],
               },
    
    scoring=lambda x : x.named_steps.model.ibnr_.sum(),
    n_jobs=-1 # Use all cores for parallel processing
)
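The grid above spans 3 x 4 x 2 x 2 x 3 x 5 = 720 scenarios. The loop-based alternative mentioned in the note can be sketched with `itertools.product`, which enumerates the same combinations. The commented loop assumes a hypothetical `pipe` that supports scikit-learn's `set_params`.

```python
from itertools import product

# Same search space as the GridSearch above
param_grid = {
    'dev__average': ['simple', 'volume', 'regression'],
    'dev__n_periods': list(range(3, 10, 2)),
    'tail__curve': ['exponential', 'inverse_power'],
    'tail__attachment_age': [96, 120],
    'model__decay': [0, .5, 1.0],
    'model__trend': [-.025, -.01, 0, .01, .025],
}

# One scenario per combination of hyperparameter values
scenarios = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
print(len(scenarios))  # 720

# A plain loop would then look roughly like this (sketch only):
# for params in scenarios:
#     pipe.set_params(**params).fit(genins, sample_weight=exposure)
#     score = pipe.named_steps.model.ibnr_.sum()
```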


`CapeCod` requires an exposure vector that itself needs to be a `Triangle` object.  To instantiate a Triangle as a vector, you can optionally exclude the
`development` argument.

In [None]:
exposure = cl.Triangle(
    pd.DataFrame({'origin': genins.origin,
                  'premium': [6000000]*len(genins.origin)}),
    origin='origin', columns='premium', cumulative=True)
exposure.development_grain = 'Y'

# Model Fitting
grid.fit(genins, sample_weight=exposure)

In [None]:
# Model Diagnostics
grid.results_.rename(columns={'score': 'IBNR'})

The choice of `TailCurve` produces a bimodal distribution of IBNR.

In [None]:
# More Model Diagnostics
ax = grid.results_['score'].plot(
    kind='hist', bins=100, ylim=(None, 50),
    title=f'CapeCod IBNR ({len(grid.results_)} Scenarios)')

ax.annotate("Inverse Power Tail", xy=(3.0e7, 42))
ax.annotate("Exponential Tail", xy=(1.6e7, 42));

The CapeCod `trend` parameter seems to have a very predictable impact on the IBNR estimate.

In [None]:
# More Model Diagnostics
(grid.results_.groupby('model__trend')['score'].mean()/1e6).plot(
    marker='D', color='teal', 
    title='IBNR Estimate', ylabel='($Millions)');

### Fitting vs Diagnostics
We can take advantage of the multidimensional nature of our Triangles to estimate patterns at one level, but apply them at another.

In [None]:
# Model Specification
pipe = cl.Pipeline(steps=[
    ('dev', cl.Development(average='volume', groupby='LOB')),
    ('tail', cl.TailCurve('exponential')),
    ('model', cl.Chainladder())])

# Model Fitting
pipe.fit(clrd['CumPaidLoss'])

# Model Diagnostics
pipe.named_steps.model.cdf_.to_frame(origin_as_datetime=True)

However, we have IBNR estimates at the original index grain of the Triangle.  This can be very useful for summarizing results by region, program, or even individual claim without having to resort to a separate allocation process.

In [None]:
# More Model Diagnostics
pd.pivot_table(
    pipe.named_steps.model.ibnr_.sum('origin').to_frame(origin_as_datetime=True).reset_index(),
    index='GRNAME', columns='LOB', values=0).fillna(0).round(0)
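Conceptually, factors estimated on LOB totals are broadcast back to every company row in that LOB, and the company-level results add back up to the LOB total. Here is a pandas sketch with hypothetical companies, a single origin period and made-up age-to-ultimate factors:

```python
import pandas as pd

# Hypothetical company-level latest diagonal (a single origin period, for simplicity)
companies = pd.DataFrame({
    'GRNAME': ['Company A', 'Company B', 'Company C'],
    'LOB': ['comauto', 'comauto', 'ppauto'],
    'latest': [100.0, 300.0, 200.0],
})

# Hypothetical age-to-ultimate factors estimated on LOB totals
lob_cdf = pd.DataFrame({'LOB': ['comauto', 'ppauto'], 'cdf': [1.5, 1.25]})

# Broadcast each LOB's factor to every company in that LOB
result = companies.merge(lob_cdf, on='LOB', how='left')
result['ibnr'] = result['latest'] * (result['cdf'] - 1)
print(result)

# Company-level IBNR adds up to what the LOB total would produce
print(result.groupby('LOB')['ibnr'].sum())
```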

Let's build a CapeCod estimate of the same.  There are only two things we need to change.

1. We have to swap out `Chainladder` for `CapeCod`
2. `CapeCod` requires an exposure vector. We can pass this to the fit method using the `sample_weight` argument.

**Note** CapeCod, BornhuetterFerguson, and Benktander all require a `sample_weight`
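These methods need exposure because they blend the data with an expected loss ratio (ELR). In the Bornhuetter-Ferguson method, IBNR is the ELR times exposure times the share of losses still to emerge, $1 - 1/\text{cdf}$. Cape Cod estimates the ELR from the data itself, as reported losses divided by "used-up" exposure. Below is a NumPy sketch of these textbook formulas with hypothetical inputs. It omits the `decay` and `trend` adjustments that `CapeCod` supports.

```python
import numpy as np

latest = np.array([170.0, 180.0, 185.0, 130.0])   # hypothetical latest diagonal
cdf = np.array([1.0, 1.03, 1.15, 1.72])            # hypothetical age-to-ultimate factors
exposure = np.array([250.0, 260.0, 280.0, 300.0])  # hypothetical earned premium

pct_unreported = 1 - 1 / cdf  # share of ultimate losses not yet reported

def bf_ibnr(elr):
    # Bornhuetter-Ferguson: expected losses times the share still to emerge
    return elr * exposure * pct_unreported

# Cape Cod (simplest form: no decay weighting, no trend):
# ELR = reported losses / exposure "used up" so far
cape_cod_elr = latest.sum() / (exposure / cdf).sum()
ibnr = bf_ibnr(cape_cod_elr)
print(round(cape_cod_elr, 4), np.round(ibnr, 1))
```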

In [None]:
# Model Specification
pipe = cl.Pipeline(steps=[
    ('dev', cl.Development(average='volume', groupby='LOB')),
    ('tail', cl.TailCurve('exponential')),
    ('model', cl.CapeCod(decay=0.99, trend=.01))
])

# Model Fitting
pipe.fit(clrd['CumPaidLoss'], sample_weight=clrd['EarnedPremDIR'].latest_diagonal)

# Model Diagnostics
pd.pivot_table(
    pipe.named_steps.model.ibnr_.sum('origin').to_frame(origin_as_datetime=True).reset_index(),
    index='GRNAME', columns='LOB', values=0).fillna(0).round(0)

### Predictions

All IBNR models have a `predict` method, which lets you apply a fitted model to other Triangles, such as in a roll-forward analysis.

To use models across analyses, you can save the model to disk using `to_pickle` and retrieve the model using `read_pickle`.

Here, we mimic last year's analysis by dropping the latest diagonal from `genins`.

In [None]:
last_time = genins[genins.valuation<genins.valuation_date]
last_time

In [None]:
# Model Specification
last_time_model = cl.Pipeline(steps=[
    ('dev', cl.Development(average='simple')),
    ('tail', cl.TailConstant(1.05, decay=0.99)),
    ('model', cl.Chainladder())]
)

# Model Fitting
last_time_model.fit(last_time)

# Save model to disk
last_time_model.to_pickle('last_time.pickle')

Notice how our Tail pattern extrapolates lags at least one year into the future.  This allows us to use the model in a roll-forward context.

In [None]:
last_time_model.named_steps.model.cdf_

Here is the current Triangle, which has one additional diagonal.

In [None]:
genins

Imagine this is an independent analysis and we are grabbing our old model from disk.

In [None]:
# Retrieve model from disk
last_time_model = cl.read_pickle('last_time.pickle')

# Make predictions on new Triangle
last_time_model.predict(genins).ibnr_

### Simulations (BootstrapODPSample)

There are many other useful Transformers besides Development and Tail estimators.  We can also use `BootstrapODPSample` to generate Triangle simulations from an existing Triangle.

In [None]:
# Model Specification
boot = cl.BootstrapODPSample(n_sims=10000, random_state=42)

# Model Fitting
boot.fit(genins)

# Use transform to generate "transformed" triangles
simulations  = boot.transform(genins)
simulations

In [None]:
cl.Chainladder().fit(simulations).ibnr_.sum('origin').plot(kind='hist', bins=100);

Let's do the same with BornhuetterFerguson.  It is entirely appropriate to consider our apriori loss ratio as being sampled from a distribution too.

In [None]:
# Model Specification
bf = cl.BornhuetterFerguson(
    # Normal Distribution with mean 0.8 and std 0.10
    apriori=0.80, 
    apriori_sigma=0.10 
)

# Model Fitting
bf.fit(simulations, sample_weight=exposure)

# Model Diagnostics
(((bf.full_triangle_ - simulations + genins) /
   exposure).loc[..., '2009', :132]
           .to_frame(origin_as_datetime=True).sample(4000).T.plot(
               color='blue', legend=False, alpha=0.005,
               title='AY2009 Loss Ratio Uncertainty'));

### Conclusion

**Data manipulation** using the Triangle follows pandas syntax closely, but extends the functionality to match the actuarial science domain.
* Slicing, grouping, aggregations, arithmetic all follow pandas style.
* The first two Triangle axes (`index`,`columns`) behave like pandas `index` and `columns`
* The last two Triangle axes (`origin`, `development`) behave like pandas `str` or `dt` column accessors

**Model construction** follows scikit-learn syntax closely.  The separation of model specification, model estimation, and model prediction is a key feature of the library:
* Composing complex models and swapping between different models is very easy.
* Models can be fit at a different (often higher) level of aggregation than results.
* Models can be saved and used in other analyses.

`chainladder` will reinforce your `pandas` and `scikit-learn` skills and vice versa.

#### Additional Resources

* Visit the [Documentation](https://chainladder-python.readthedocs.io/en/latest/) for more tutorials and examples.
* Visit the [Discussion Forum](https://github.com/casact/chainladder-python/discussions) to ask usage questions.
* Visit the [Source Code Repository](https://github.com/casact/chainladder-python) to view the implementation or contribute to it.
