## Exploratory data analysis
#### Pablo Gonzalez B.

### Variable Description

`GRCODE` NAIC company code (including insurer groups and single insurers)

`GRNAME` NAIC company name (including insurer groups and single insurers)

`AccidentYear` Accident year (1988 to 1997)

`DevelopmentYear` Development year (1988 to 1997)

`DevelopmentLag` Development lag in years, counted from the accident year (`DevelopmentYear - AccidentYear + 1`, so lag 1 is the accident year itself)

`IncurLoss_` Incurred losses and allocated expenses reported at year end

`CumPaidLoss_` Cumulative paid losses and allocated expenses at year end

`BulkLoss_` Bulk and IBNR reserves on net losses and defense and cost containment expenses reported at year end

`PostedReserve97_` Posted reserves in year 1997 taken from the Underwriting and Investment Exhibit – Part 2A, including net losses unpaid and unpaid loss adjustment expenses

`EarnedPremDIR_` Premiums earned at incurral year - direct and assumed

`EarnedPremCeded_` Premiums earned at incurral year - ceded

`EarnedPremNet_` Premiums earned at incurral year - net

`Single` 1 indicates a single entity, 0 indicates a group insurer
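The descriptions above imply two relations that can be checked on the data before any plotting. The sketch below assumes that lag 1 is the accident year itself (so `DevelopmentLag == DevelopmentYear - AccidentYear + 1`) and that net earned premium equals direct and assumed minus ceded. The `F2` column suffix is taken from the plotting cells further down; the helper `check_definitions` and the toy rows are hypothetical.

```python
import pandas as pd


def check_definitions(df: pd.DataFrame, suffix: str = 'F2') -> dict:
    """Share of rows consistent with two *assumed* identities (hypothetical helper)."""
    # Assumption: lag 1 is the accident year itself.
    lag_ok = df['DevelopmentLag'] == df['DevelopmentYear'] - df['AccidentYear'] + 1
    # Assumption: net premium = direct and assumed - ceded.
    net = df[f'EarnedPremDIR_{suffix}'] - df[f'EarnedPremCeded_{suffix}']
    prem_ok = (df[f'EarnedPremNet_{suffix}'] - net).abs() < 1e-6
    return {'lag': float(lag_ok.mean()), 'net_premium': float(prem_ok.mean())}


# Toy rows for illustration only, not taken from the data file.
toy = pd.DataFrame({
    'AccidentYear': [1988, 1988, 1989],
    'DevelopmentYear': [1988, 1990, 1989],
    'DevelopmentLag': [1, 3, 1],
    'EarnedPremDIR_F2': [1000, 1000, 1200],
    'EarnedPremCeded_F2': [100, 100, 150],
    'EarnedPremNet_F2': [900, 900, 1050],
})
print(check_definitions(toy))  # {'lag': 1.0, 'net_premium': 1.0}
```

Running `check_definitions(data)` on the real file would give the fraction of rows satisfying each identity; a value below 1.0 would point either to a different definition or to rows worth inspecting.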

### Import libraries

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```

### Read data

```python
# Load the medical malpractice data and show (rows, columns)
data = pd.read_csv('medmal_pos.csv')
data.shape
```
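`data.shape` gives the overall size, but the file stacks the records of many companies, so a per-company view is more informative. A sketch, assuming each company should contribute exactly one row per (`GRCODE`, `AccidentYear`, `DevelopmentLag`) combination; `rows_per_company` and the toy frame are hypothetical stand-ins for a call on `data`.

```python
import pandas as pd


def rows_per_company(df: pd.DataFrame) -> pd.DataFrame:
    """Per company code: row count, distinct accident years and lags, duplicated keys."""
    grouped = df.groupby('GRCODE')
    summary = pd.DataFrame({
        'rows': grouped.size(),
        'accident_years': grouped['AccidentYear'].nunique(),
        'lags': grouped['DevelopmentLag'].nunique(),
    })
    # Rows repeating an earlier (company, accident year, lag) key break the
    # assumed one-row-per-cell layout.
    dup = df.duplicated(['GRCODE', 'AccidentYear', 'DevelopmentLag'])
    summary['duplicate_keys'] = dup.groupby(df['GRCODE']).sum()
    return summary


# Toy rows for illustration only: company 2 repeats one key.
toy = pd.DataFrame({
    'GRCODE': [1, 1, 1, 1, 2, 2, 2],
    'AccidentYear': [1988, 1988, 1989, 1989, 1988, 1988, 1988],
    'DevelopmentLag': [1, 2, 1, 2, 1, 2, 2],
})
print(rows_per_company(toy))
```

A company whose `rows` equals `accident_years * lags` and that has no duplicate keys covers a complete accident-year by lag grid.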

### Descriptive statistics, dtypes and non-null counts

```python
# Count, mean, std, min, quartiles and max of each numeric column
data.describe()
```

```python
# Column dtypes, non-null counts and memory usage
data.info()
```

```python
# Split column names by dtype: numeric (int64/float64) vs. object (text)
numeric_features = data.select_dtypes(include=['int64', 'float64']).columns.tolist()
non_numeric_features = data.select_dtypes(include=['object']).columns.tolist()

print(f'Numeric columns: {len(numeric_features)}, Non-numeric columns: {len(non_numeric_features)}')
```
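`select_dtypes` classifies columns by storage type only, so if `GRCODE`, `AccidentYear`, `DevelopmentYear`, `DevelopmentLag` and `Single` are stored as integers (as `data.info()` would show), they land in `numeric_features` even though they are identifiers, time indices or a flag rather than amounts, and their boxplots and correlations are hard to interpret. A sketch that separates them, assuming these five columns from the variable list are the only identifier-like ones; `ID_LIKE` and `split_features` are hypothetical.

```python
import pandas as pd

# Assumed identifier/categorical columns stored as numbers (from the variable list).
ID_LIKE = ['GRCODE', 'AccidentYear', 'DevelopmentYear', 'DevelopmentLag', 'Single']


def split_features(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (measure columns, identifier-like numeric columns)."""
    numeric = df.select_dtypes(include='number').columns.tolist()
    ids = [c for c in numeric if c in ID_LIKE]
    measures = [c for c in numeric if c not in ID_LIKE]
    return measures, ids


# Toy rows for illustration only.
toy = pd.DataFrame({
    'GRCODE': [7, 7],
    'GRNAME': ['Toy Insurer', 'Toy Insurer'],
    'AccidentYear': [1988, 1989],
    'IncurLoss_F2': [120.0, 95.5],
    'Single': [1, 1],
})
print(split_features(toy))  # (['IncurLoss_F2'], ['GRCODE', 'AccidentYear', 'Single'])
```

Using `measures` in place of `numeric_features` in the cells below would restrict the boxplot, histograms and heatmap to the loss, reserve and premium columns.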

```python
# Distinct values of the first text (object) column
data[non_numeric_features[0]].unique()
```
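`unique()` lists the distinct company names. Since companies are identified by both `GRCODE` and `GRNAME`, a quick check that the two map one-to-one avoids surprises when grouping by either. A sketch; `code_name_conflicts` and the toy rows are hypothetical.

```python
import pandas as pd


def code_name_conflicts(df: pd.DataFrame) -> dict:
    """Codes linked to several names and names linked to several codes."""
    names_per_code = df.groupby('GRCODE')['GRNAME'].nunique()
    codes_per_name = df.groupby('GRNAME')['GRCODE'].nunique()
    return {
        'codes_with_many_names': names_per_code[names_per_code > 1].index.tolist(),
        'names_with_many_codes': codes_per_name[codes_per_name > 1].index.tolist(),
    }


# Toy rows for illustration only: one name appears under two codes.
toy = pd.DataFrame({
    'GRCODE': [10, 10, 20, 30],
    'GRNAME': ['Alpha Mutual', 'Alpha Mutual', 'Beta Insurance', 'Beta Insurance'],
})
print(code_name_conflicts(toy))
# {'codes_with_many_names': [], 'names_with_many_codes': ['Beta Insurance']}
```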

### Outlier analysis and distributions of numerical features

```python
# One horizontal boxplot per numeric column, all on a shared value axis
plt.rcParams['figure.figsize'] = (26, 13)
sns.boxplot(data=data[numeric_features], orient='h');
```
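By default `sns.boxplot` extends the whiskers to 1.5 times the interquartile range (its `whis` parameter) and draws the points beyond them individually. Counting those points per column gives a numeric companion to the plot, which helps here because columns with very different scales share one axis. A sketch of the same 1.5 × IQR rule; `iqr_outlier_counts` is a hypothetical helper.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, whis: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - whis*IQR, Q3 + whis*IQR] per numeric column."""
    num = df.select_dtypes(include='number')
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    # Bounds are Series indexed by column name, so compare along the columns.
    outside = num.lt(lower, axis='columns') | num.gt(upper, axis='columns')
    return outside.sum()


# Toy values for illustration only.
toy = pd.DataFrame({'x': [1, 2, 3, 4, 100], 'y': [10, 11, 12, 13, 14]})
print(iqr_outlier_counts(toy))
```

In the toy frame only the value 100 lies beyond the upper whisker (Q3 + 1.5 × IQR = 7 for `x`), so `x` gets one outlier and `y` none.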

```python
# One histogram with KDE per numeric column, stacked vertically
fig, axes = plt.subplots(len(numeric_features), 1, figsize=(26, 4 * len(numeric_features)))

for ax, feature in zip(axes, numeric_features):
    sns.histplot(data=data, x=feature, kde=True, ax=ax)
fig.tight_layout()
plt.show()
```
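The histograms can be summarised numerically with the sample skewness (`DataFrame.skew`), and for right-skewed monetary amounts a `log1p` transform is a common way to make the shape readable. A sketch; the helper name and the choice of `log1p` are assumptions, and the transform is restricted here to columns with no negative values.

```python
import numpy as np
import pandas as pd


def skew_before_after_log(df: pd.DataFrame) -> pd.DataFrame:
    """Sample skewness of each numeric column, raw and after log1p."""
    num = df.select_dtypes(include='number')
    # log1p is only applied where every value is >= 0 (NaN counts as failing).
    nonneg = [c for c in num.columns if (num[c] >= 0).all()]
    return pd.DataFrame({
        'skew': num.skew(),
        'skew_log1p': np.log1p(num[nonneg]).skew(),
    })


# Toy values for illustration only.
toy = pd.DataFrame({'losses': [1, 10, 100, 1000, 10000], 'delta': [-5, 0, 5, 10, 15]})
print(skew_before_after_log(toy))
```

Columns left out of the transform (here `delta`, which has a negative value) show `NaN` in `skew_log1p`.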

### Correlations between numerical features

```python
# Spearman rank correlations between all numeric columns
plt.rcParams['figure.figsize'] = (22, 7)
sns.heatmap(data[numeric_features].corr(method='spearman'), annot=True);
```
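Spearman's coefficient, used in the heatmap, is Pearson's correlation computed on the ranks of the values. It therefore measures monotonic rather than linear association and is less sensitive to a few extreme values than Pearson's coefficient. The identity can be checked on synthetic data (the lognormal sample below is illustrative only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=2.0, size=200)  # synthetic heavy-tailed values
toy = pd.DataFrame({'x': x, 'y': x ** 3})          # monotonic but non-linear

spearman = toy.corr(method='spearman')
pearson_on_ranks = toy.rank().corr(method='pearson')  # rank() averages ties
print(np.allclose(spearman, pearson_on_ranks))  # True
print(toy.corr(method='pearson').loc['x', 'y'], spearman.loc['x', 'y'])
```

For this monotonic pair Spearman's value is 1 up to rounding, while Pearson's is lower because the relationship is not linear.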

### Scatterplots of selected correlated pairs, colored by `AccidentYear`

```python
# Incurred losses vs. reserves posted in 1997
plt.rcParams['figure.figsize'] = (18, 7)
sns.scatterplot(data=data, x='IncurLoss_F2', y='PostedReserve97_F2', hue='AccidentYear');
```

```python
# Incurred losses vs. net earned premium
sns.scatterplot(data=data, x='IncurLoss_F2', y='EarnedPremNet_F2', hue='AccidentYear');
```

```python
# Net earned premium vs. reserves posted in 1997
sns.scatterplot(data=data, x='EarnedPremNet_F2', y='PostedReserve97_F2', hue='AccidentYear');
```
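Coloring by `AccidentYear` shows whether the pooled pattern also holds within individual accident years; the same question can be answered numerically by computing the correlation per year. A sketch using Spearman's coefficient, as in the heatmap; `corr_by_year` and the toy rows are hypothetical.

```python
import pandas as pd


def corr_by_year(df: pd.DataFrame, x: str, y: str, by: str = 'AccidentYear') -> pd.Series:
    """Spearman correlation between columns x and y within each group of `by`."""
    # Selecting [x, y] keeps the grouping column out of the frames passed to apply.
    return df.groupby(by)[[x, y]].apply(
        lambda g: g.corr(method='spearman').loc[x, y]
    )


# Toy rows for illustration only: the relation flips sign between years.
toy = pd.DataFrame({
    'AccidentYear': [1988] * 4 + [1989] * 4,
    'IncurLoss_F2': [1, 2, 3, 4, 1, 2, 3, 4],
    'EarnedPremNet_F2': [10, 20, 30, 40, 40, 30, 20, 10],
})
print(corr_by_year(toy, 'IncurLoss_F2', 'EarnedPremNet_F2'))
```

On the real data, `corr_by_year(data, 'IncurLoss_F2', 'EarnedPremNet_F2')` would return one coefficient per accident year for the pair in the second scatterplot.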