In [21]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.sandbox.regression import gmm

plt.style.use('ggplot')

# 2. Method of Moments

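The method of moments (MOM) rests on the law of large numbers: as the number of observations increases, the sample mean converges to the mean of the underlying distribution. The same holds for higher-order sample moments, provided the corresponding theoretical moments exist.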

Under this assumption, we can equate the moments observed in the data to their theoretical counterparts and solve the resulting system of equations for the parameters of the distribution. This requires at least as many moment equations as there are parameters in the chosen distribution.

The steps involved in fitting are:
- Choose a well-defined distribution
- Calculate the sample moments from the data
- Set up a system of equations by equating the sample moments to the theoretical moments of the distribution
- Solve the system for the parameters of the distribution
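
As a rough illustration of these steps when there is more than one parameter, the sketch below fits a two-parameter gamma distribution by matching the first two moments. The gamma distribution, the simulated data, and all variable names are assumptions chosen for illustration; they are not part of the claims dataset used in this notebook.

```python
import numpy as np

# Hypothetical data: simulated gamma draws stand in for real observations.
rng = np.random.default_rng(0)
true_shape, true_scale = 2.0, 3.0
x = rng.gamma(shape=true_shape, scale=true_scale, size=200_000)

# Step 2: sample moments.
m1 = x.mean()           # first raw moment
m2 = (x ** 2).mean()    # second raw moment
var = m2 - m1 ** 2      # variance implied by the two raw moments

# Steps 3-4: for a gamma distribution, mean = shape * scale and
# variance = shape * scale**2. Solving these two equations gives:
scale_hat = var / m1
shape_hat = m1 ** 2 / var

print(f"shape: {shape_hat:.3f}, scale: {scale_hat:.3f}")
```

Two parameters need two moment equations, which is why both the mean and the variance are used here. With a large simulated sample, the estimates should land close to the values used to generate the data.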


$ E_{f}[X] = \dfrac{1}{n} \sum_{i=1}^{n} x_i $

$ E_{f}[X^k] = \dfrac{1}{n} \sum_{i=1}^{n} x_{i}^k $

$ VAR_{f}(X) = E_{f}[X^2] - E_{f}[X]^2 $
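
The right-hand sides of these equations are plain averages, so they are easy to compute directly. The helper `sample_moment` below is a hypothetical name introduced for this sketch, and the input values are toy numbers rather than the claims data.

```python
import numpy as np

def sample_moment(x, k):
    """Return the k-th raw sample moment, (1/n) * sum(x_i ** k)."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** k)

x = np.array([1.0, 2.0, 3.0, 4.0])  # toy values for illustration

m1 = sample_moment(x, 1)   # sample mean
m2 = sample_moment(x, 2)   # mean of the squared values
variance = m2 - m1 ** 2    # variance from the two raw moments

print(m1, m2, variance)
```

Note that this variance divides by $ n $, so it matches `np.var(x)` with its default `ddof=0` rather than the unbiased estimator that divides by $ n - 1 $.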

In [3]:
df = pd.read_csv('data/sample_claims.csv')

For example, suppose we choose to model our dataset with an exponential distribution, which has the PDF:

$ f_X(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0 $

The first two moments of an exponential distribution are then:

$ E_{f}[X] = \dfrac{1}{\lambda} $

$ E_{f}[X^2] = \dfrac{2}{\lambda^2} $

$ VAR_{f}(X) = E_{f}[X^2] - E_{f}[X]^2 = \dfrac{1}{\lambda^2} $

The equations above can be derived either by using the moment generating function (MGF) or by simply evaluating the integral:

$ E_{f}[X^k] = \int_{0}^{\infty} x^{k} \lambda e^{-\lambda x} dx $
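
One way to sanity-check these closed forms is to approximate the integral numerically. The sketch below uses a hand-written trapezoidal rule on a truncated interval; the function name `exponential_moment`, the rate `lam = 0.5`, the upper limit, and the grid size are all arbitrary assumptions made for illustration.

```python
import numpy as np

def exponential_moment(k, lam, upper=200.0, n=400_001):
    """Approximate E[X^k], the integral of x^k * lam * exp(-lam * x) over [0, upper]."""
    x = np.linspace(0.0, upper, n)
    y = x ** k * lam * np.exp(-lam * x)
    dx = x[1] - x[0]
    # Trapezoidal rule: full weight on interior points, half weight on the ends.
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

lam = 0.5  # arbitrary rate chosen for the check
m1 = exponential_moment(1, lam)
m2 = exponential_moment(2, lam)

print(m1, 1 / lam)        # numerical vs. closed-form first moment
print(m2, 2 / lam ** 2)   # numerical vs. closed-form second moment
```

The upper limit only needs to be large enough that the integrand has effectively decayed to zero; for much smaller rates it would have to be increased accordingly.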

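Using the method of moments, we equate the first theoretical moment to the sample mean, $ \dfrac{1}{\lambda} = \bar x $, which gives the estimate:

$ \hat\lambda = \dfrac{1}{\bar x} $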
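
Before applying the estimator to real data, it can help to confirm that it recovers a known rate from simulated data. This is a minimal sketch: the true rate, the seed, and the sample size are arbitrary assumptions. Note that NumPy parameterizes the exponential distribution by its scale, which is $ 1/\lambda $.

```python
import numpy as np

rng = np.random.default_rng(42)

true_lambda = 0.0005  # arbitrary rate used to generate the data
x = rng.exponential(scale=1 / true_lambda, size=100_000)

# Method-of-moments estimate: the reciprocal of the sample mean.
lambda_hat = 1 / x.mean()

print(f"true: {true_lambda}, estimated: {lambda_hat:.6f}")
```

With a sample this large, the estimate should fall close to the true rate, and the gap should shrink further as the sample size grows.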

In [15]:
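# Sample mean of each column (here, the single `claims` column)
df_mean = df.mean()

# Method-of-moments estimate of the exponential rate: lambda = 1 / sample mean
exp_lambda = 1 / df_mean

print(exp_lambda)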

claims    0.000334
dtype: float64


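The fitted model therefore assumes that the claims follow $ EXPO(\lambda = 0.000334) $, which corresponds to a mean claim size of roughly 3,000 in the units of the data.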