# Stochastic Modeling for Car Insurance

The stochastic model used here for a car insurance company's total cost of damages from traffic accidents goes back to Van der Laan and Louter, "A statistical model for the costs of passenger car traffic accidents", Journal of the Royal Statistical Society (1986).

For every $k = 1, 2, 3, \ldots$, let the random variable $X_k$ denote the damage amount in US dollars of the $k$-th traffic accident of a policyholder during the year 2019.

We assume that $X_1, X_2, \ldots$ is an i.i.d. (independent and identically distributed) sequence of exponentially distributed random variables with a mean claim size of $\$1{,}000$. The (random) total number of accidents $N$ in 2019 is assumed to be Poisson distributed with a mean of 20 claims.

It is assumed that the number of accidents is independent of the US dollar amount of damages for each accident. That is, the random variable $N$ is independent of the random variables $X_1$, $X_2$,...

The total cost for the insurance company by the end of 2019 is thus given by the <b>random sum</b>

$$S_N := X_1 + X_2 + \dots + X_N = \sum_{k = 1}^{N} X_k.$$

The goal of our analysis is to approximate via simulation

1. the expected total cost $\mathbb{E}[S_N]$ for the insurance company in 2019, and

2. the probabilities that the total cost will not exceed $K$ USD, i.e.,

$$\mathbb{P}[S_N \leq K] \quad \text{for} \quad K = \$20000,\, \$40000,\, \$60000.$$

In [1]:
import numpy as np
import math

<b>Step 1:</b><br>
First, we write a function which simulates the random variable $S_N$.

In [2]:
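def randomSum(averageClaimSize,averageNumberOfClaims):
    # Draw the random number of claims N ~ Poisson(averageNumberOfClaims)
    numberOfClaims = np.random.poisson(averageNumberOfClaims)
    # Sum N i.i.d. exponential claim sizes with mean averageClaimSize (the sum is 0 if N = 0)
    sampleRandomSum = sum([np.random.exponential(averageClaimSize) for x in range(numberOfClaims)])
    return sampleRandomSum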

In [3]:
## Testing our function
randomSum(1000,20)

17573.49494592012
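
Because every call draws fresh random numbers, the value above changes from run to run. The following is a minimal sketch of a reproducible variant, assuming NumPy's `Generator` interface (`np.random.default_rng`). The function name `random_sum_seeded` and the seed `2019` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def random_sum_seeded(average_claim_size, average_number_of_claims, rng):
    """Draw one sample of S_N using the supplied random generator (hypothetical helper)."""
    # N ~ Poisson(average_number_of_claims)
    number_of_claims = rng.poisson(average_number_of_claims)
    # Draw all N exponential claims at once; an empty draw sums to 0.0
    return rng.exponential(average_claim_size, size=number_of_claims).sum()

# Fixing the seed makes the sequence of samples repeatable across runs
rng = np.random.default_rng(2019)
samples = np.array([random_sum_seeded(1000, 20, rng) for _ in range(10_000)])
print(samples.mean())
```

Passing the generator explicitly keeps the random state out of global scope, so two generators created with the same seed produce the same samples.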

<b>Step 2:</b><br>We write a simulator function which uses the function <tt>randomSum()</tt> to simulate $M \in \mathbb{N}$ samples from the random variable $S_N$.

In [4]:
def simulator(averageClaimSize,averageNumberOfClaims,M):
    # Collect M independent samples of the random sum S_N in a list
    samples = [randomSum(averageClaimSize,averageNumberOfClaims) for x in range(M)]
    return samples

In [5]:
## Testing our function
simulator(1000,20,10)

[17487.14992525941,
 25117.98492050578,
 21517.244128519527,
 6605.160519893929,
 19651.82458916048,
 27314.87145852899,
 13052.734037078937,
 20828.19367793959,
 9200.006651935206,
 12402.87463689419]

<b>Step 3:</b><br>By <b>Wald's identity</b>, the expectation of the random sum $S_N$ is given by the formula

\begin{equation}
\mathbb{E}[S_N] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = 20 \cdot \$1,000 = \$20,000.
\end{equation}
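
For the setting here, where $N$ is independent of the i.i.d. claims $X_1, X_2, \ldots$, the identity can be sketched by conditioning on the number of claims:

$$
\begin{aligned}
\mathbb{E}[S_N] &= \sum_{n=0}^{\infty} \mathbb{E}[S_N \mid N = n] \, \mathbb{P}[N = n] \\
&= \sum_{n=0}^{\infty} \mathbb{E}[X_1 + \dots + X_n] \, \mathbb{P}[N = n] \\
&= \mathbb{E}[X_1] \sum_{n=0}^{\infty} n \, \mathbb{P}[N = n] = \mathbb{E}[N] \cdot \mathbb{E}[X_1].
\end{aligned}
$$

The second line uses the independence of $N$ and the claims, and the third line uses that the claims are identically distributed.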

We'll check via the empirical mean that

$$ \$20,000 = \mathbb{E}[S_N] \approx \frac{1}{M} \sum_{m=1}^M s^{(m)}_N$$

where $s^{(1)}_N, s^{(2)}_N, \ldots, s^{(M)}_N$ denote $M$ independent realizations (samples) of the random variable $S_N$. We use $M = 10, 100, 1000, 10000$ simulations.

We write a function <tt>MCsimulation()</tt> which uses the function <tt>simulator()</tt> to compute the empirical mean.

In [6]:
def MCsimulation(averageClaimSize,averageNumberOfClaims,M):
    # The empirical mean of M samples estimates E[S_N]
    empiricalMean = sum(simulator(averageClaimSize,averageNumberOfClaims,M))/M
    return empiricalMean

In [7]:
## Testing our function
print(MCsimulation(1000,20,10))
print(MCsimulation(1000,20,100))
print(MCsimulation(1000,20,1000))
print(MCsimulation(1000,20,10000))

20036.18846339164
20883.690092824807
20335.455520416715
20036.131589115816


In [8]:
## Computing the absolute error
print(np.absolute(MCsimulation(1000,20,10)-20000))
print(np.absolute(MCsimulation(1000,20,100)-20000))
print(np.absolute(MCsimulation(1000,20,1000)-20000))
print(np.absolute(MCsimulation(1000,20,10000)-20000))
print(np.absolute(MCsimulation(1000,20,20000)-20000))
print(np.absolute(MCsimulation(1000,20,50000)-20000))

1056.798981134103
1120.1173330961828
40.29723563662992
2.0122029191916226
12.064044342310808
24.561721672627755
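
The errors above do not shrink monotonically, which is expected: each one is a single random draw whose typical size is the standard error $\sqrt{\mathrm{Var}[S_N]/M}$. As a rough sketch, for a Poisson number of claims the variance is $\mathrm{Var}[S_N] = \mathbb{E}[N] \cdot \mathbb{E}[X_1^2]$, and for exponential claims $\mathbb{E}[X_1^2] = 2\,\mathbb{E}[X_1]^2$, so $\mathrm{Var}[S_N] = 20 \cdot 2 \cdot 1000^2 = 4 \cdot 10^7$. The helper name `standard_error` is an illustrative choice.

```python
import math

def standard_error(M, mean_claim=1000.0, mean_count=20.0):
    """Standard error of the empirical mean of M samples of S_N (hypothetical helper)."""
    # Compound Poisson with exponential claims: Var[S_N] = E[N] * 2 * E[X_1]^2
    variance_total = mean_count * 2.0 * mean_claim ** 2
    return math.sqrt(variance_total / M)

for M in (10, 100, 1000, 10000, 20000, 50000):
    print(M, round(standard_error(M), 1))
```

This gives roughly 2000, 632, 200 and 63 for $M = 10, 100, 1000, 10000$, and each absolute error printed above lies within about two standard errors of zero.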


<b>Step 4:</b><br>The desired probabilities $\mathbb{P}[S_N \leq K]$ for $K = \$20000,\, \$40000,\, \$60000$ can be written as expectations of an indicator function

$$ \mathbb{P}[S_N \leq K] = \mathbb{E}[1_{\{S_N \leq K\}}].$$

We once more use the empirical mean to approximate

$$ \mathbb{E}[1_{\{S_N \leq K\}}] \approx \frac{1}{M} \sum_{m=1}^M 1_{\{s^{(m)}_N \leq K \}}$$

with $M$ independent realizations (samples) from the random variable $S_N$ (again denoted by $s^{(1)}_N, s^{(2)}_N, \ldots, s^{(M)}_N$).

We'll write a function <tt>MCprobEstimation()</tt> which estimates the probabilities $\mathbb{P}[S_N \leq K]$ for $K = \$20000,\, \$40000,\, \$60000$ as described, based on $M$ simulations of $S_N$.

In [9]:
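def MCprobEstimation(averageClaimSize, averageNumberOfClaims, K, M):
    # Indicator 1_{s <= K} for each of the M simulated total costs
    indicatorArray = [1 if x <= K else 0 for x in simulator(averageClaimSize,averageNumberOfClaims,M)]
    # The fraction of samples not exceeding K estimates P[S_N <= K]
    empiricalProb = sum(indicatorArray)/M
    return empiricalProb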

In [10]:
## Testing our function
MCprobEstimation(1000, 20, 20000, 10)

0.7

Testing our function for all $K = \$20000,\, \$40000,\, \$60000$ with varying $M = 100, 1000, 10000$ simulations:

In [11]:
print(MCprobEstimation(1000,20,20000,100))
print(MCprobEstimation(1000,20,20000,1000))
print(MCprobEstimation(1000,20,20000,10000))

0.54
0.586
0.5337


In [12]:
print(MCprobEstimation(1000,20,40000,100))
print(MCprobEstimation(1000,20,40000,1000))
print(MCprobEstimation(1000,20,40000,10000))

1.0
0.993
0.9969


In [13]:
print(MCprobEstimation(1000,20,60000,100))
print(MCprobEstimation(1000,20,60000,1000))
print(MCprobEstimation(1000,20,60000,10000))

1.0
1.0
1.0
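
As a cross-check on the simulated probabilities, the same quantity can be computed numerically by conditioning on $N$: given $N = n \geq 1$, the sum of $n$ exponential claims follows an Erlang distribution, so $\mathbb{P}[S_N \leq K] = \sum_{n \geq 0} \mathbb{P}[N = n] \, \mathbb{P}[X_1 + \dots + X_n \leq K]$. The sketch below is not part of the original notebook. The function name `compound_poisson_cdf` and the truncation point `n_max = 200` are assumptions made for illustration.

```python
import math

def compound_poisson_cdf(K, mean_claim=1000.0, mean_count=20.0, n_max=200):
    """Approximate P[S_N <= K] by truncating the sum over N at n_max (hypothetical helper)."""
    x = K / mean_claim
    poisson_pmf = math.exp(-mean_count)   # P[N = 0]
    total = poisson_pmf                   # N = 0 gives S_N = 0 <= K
    for n in range(1, n_max + 1):
        poisson_pmf *= mean_count / n     # P[N = n] from P[N = n - 1]
        # Erlang survival function: sum_{j < n} e^{-x} x^j / j!
        term = math.exp(-x)
        survival = term
        for j in range(1, n):
            term *= x / j
            survival += term
        total += poisson_pmf * max(0.0, 1.0 - survival)
    return total

for K in (20000, 40000, 60000):
    print(K, compound_poisson_cdf(K))
```

With a mean of 20 claims, the Poisson mass beyond `n_max = 200` is negligible, so the truncation should not affect the comparison with the Monte Carlo estimates.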
