# Claims Risk Modelling

<small>This notebook explores risk modelling for pet insurance claims.
The goal is to identify patterns in claim frequency and severity, quantify risk drivers, and build models that support pricing, underwriting, and portfolio management decisions.
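Claim frequency and claim severity are the two quantities the rest of the analysis builds on. This sketch uses a common actuarial convention, which is an assumption here and not a definition taken from this notebook. Frequency is claims per policy-year of exposure, and severity is the average cost per claim. Their product is the expected claim cost per policy-year, often called the pure premium. The `toy_` tables and their values are invented for illustration:

```python
import pandas as pd

# Hypothetical toy data: one row per policy with its earned exposure in years,
# and one row per claim. Values are illustrative only.
toy_policies = pd.DataFrame({
    "policy_id": [1, 2, 3, 4],
    "exposure_years": [1.0, 0.5, 2.0, 0.5],
})
toy_claims = pd.DataFrame({
    "policy_id": [1, 1, 3, 4],
    "claim_amount": [200.0, 400.0, 300.0, 100.0],
})

total_exposure = toy_policies["exposure_years"].sum()  # policy-years at risk
n_claims = len(toy_claims)
total_cost = toy_claims["claim_amount"].sum()

frequency = n_claims / total_exposure    # claims per policy-year
severity = total_cost / n_claims         # average cost per claim
pure_premium = frequency * severity      # expected claim cost per policy-year

print(f"frequency={frequency:.2f}, severity={severity:.2f}, pure_premium={pure_premium:.2f}")
```

Splitting cost into frequency and severity lets each part be modelled against different drivers. For example, age may affect how often a pet claims more than how much each claim costs.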

The dataset includes:
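- **Customers** (contact details, location, account creation date)
- **Pets** (breed, age, species, household)
- **Products** (cover type, premium, limits)
- **Policies** (policy–pet relationships)
- **Claims** (dates, types, costs, outcomes)
- **Claim payments** (payment dates, amounts, methods)
- **Vet Visits** (invoice line items with treatment codes and costs)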
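These tables connect through shared ID columns that are visible in the previews further down. `claims.policy_id` points to `policies`, and `policies.pet_id` and `policies.product_id` point to `pets` and `products`. `pets.customer_id` points to `customers`. Building a claim-level modelling table therefore means walking those keys. Below is a minimal sketch on hypothetical miniature tables. The `toy_` names and values are invented, and the column names match the previews:

```python
import pandas as pd

# Hypothetical miniature tables keeping only the key columns plus one attribute each.
toy_pets = pd.DataFrame({"pet_id": [10, 11], "customer_id": [1, 2], "species": ["Cat", "Dog"]})
toy_products = pd.DataFrame({"product_id": [3, 4], "product_name": ["Lifetime Premier", "Lifetime Elite"]})
toy_policies = pd.DataFrame({"policy_id": [100, 101, 102], "pet_id": [10, 11, 11], "product_id": [3, 4, 4]})
toy_claims = pd.DataFrame({"claim_id": [1, 2, 3], "policy_id": [100, 102, 102], "claim_amount": [250.0, 80.0, 120.0]})

# Walk the relationships: claim -> policy -> pet, and policy -> product.
# Left joins keep every claim even if a lookup row were missing.
claim_level = (
    toy_claims
    .merge(toy_policies, on="policy_id", how="left")
    .merge(toy_pets, on="pet_id", how="left")
    .merge(toy_products, on="product_id", how="left")
)
print(claim_level[["claim_id", "species", "product_name", "claim_amount"]])
```

The result has one row per claim, with pet and product attributes attached as candidate risk drivers.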

This notebook forms part of a wider analytics portfolio demonstrating data engineering, modelling, and business interpretation skills.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, classification_report
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", None)

In [2]:
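# Save the current figure as a 300-dpi PNG with a timestamped filename
from pathlib import Path

IMAGE_DIR = Path(r"C:\Users\leebe\Documents\GitHub\Professional-Portfolio\Pet Insurance Analytics Project\images")

def save_fig(name):
    IMAGE_DIR.mkdir(parents=True, exist_ok=True)  # create the images folder if it does not exist yet
    timestamp = dt.datetime.now().strftime("%Y%m%d_%H%M%S")
    plt.savefig(IMAGE_DIR / f"{name}_{timestamp}.png", dpi=300, bbox_inches="tight")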
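The naming scheme in `save_fig` can also be pulled out into a small pure function. That makes it reusable with a relative output folder and testable without writing any files. `timestamped_png` and the `"images"` folder below are hypothetical names, not part of the project:

```python
import datetime as dt
from pathlib import Path


def timestamped_png(name, out_dir, now=None):
    """Build <out_dir>/<name>_YYYYmmdd_HHMMSS.png, the same pattern save_fig uses.

    `now` can be passed in so the result is deterministic in tests.
    """
    now = now or dt.datetime.now()
    return Path(out_dir) / f"{name}_{now.strftime('%Y%m%d_%H%M%S')}.png"


path = timestamped_png("claim_severity", "images", now=dt.datetime(2024, 3, 5, 14, 7, 9))
print(path.name)  # claim_severity_20240305_140709.png
```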

In [15]:
# Load the raw CSV extracts from the project's data folder
from pathlib import Path

DATA_DIR = Path(r"c:\Users\leebe\Documents\VScode\portfolio project\Pet Insurance Analytics Project\data\raw")

customers = pd.read_csv(DATA_DIR / "customers.csv")
pets = pd.read_csv(DATA_DIR / "pets.csv")
policies = pd.read_csv(DATA_DIR / "policies.csv")
products = pd.read_csv(DATA_DIR / "products.csv")
line_items = pd.read_csv(DATA_DIR / "invoice_line_items.csv")
claims = pd.read_csv(DATA_DIR / "claims.csv")
claim_payments = pd.read_csv(DATA_DIR / "claim_payments.csv")

print("Datasets loaded successfully.")

Datasets loaded successfully.
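By default `read_csv` loads the date columns (`created_at`, `birthdate`, `start_date`, `end_date`, `claim_date`, `payment_date`) as plain strings. The cell above does not do this, but one option is to pass `parse_dates` so that date arithmetic works immediately. The sketch below reads two rows copied from the claims preview from memory, so it runs without the real file:

```python
import io

import pandas as pd

# Two rows in the same layout as claims.csv, read from memory instead of disk.
sample_csv = io.StringIO(
    "claim_id,policy_id,claim_date,claim_amount,diagnosis,status\n"
    "1,106001,2022-06-08,732.65,Dental Issue,Rejected\n"
    "2,42260,2022-10-20,240.94,Skin Condition,Closed\n"
)

# parse_dates converts the ISO date strings to datetime64 values, so exposure,
# claim lags, and seasonality can be computed without a separate conversion step.
sample = pd.read_csv(sample_csv, parse_dates=["claim_date"])
print(sample.dtypes)
print((sample["claim_date"].iloc[1] - sample["claim_date"].iloc[0]).days)
```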


In [18]:
tables = {
    "customers": customers,
    "pets": pets,
    "policies": policies,
    "products": products,
    "line_items": line_items,
    "claims": claims,
    "claim_payments": claim_payments,
}

for name, df in tables.items():
    print(f"\n=== {name.upper()} ({df.shape[0]} rows, {df.shape[1]} cols) ===")
    display(df.head())


=== CUSTOMERS (60000 rows, 9 cols) ===


| | customer_id | first_name | last_name | email | phone | address | city | country | created_at |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Johnny | Gonzalez | robertstimothy@example.net | (885)906-5118 | 06784 Lauren Divide Suite 513, Port Richard, N... | West James | Greece | 2019-05-09 |
| 1 | 2 | Daniel | Adkins | heathercollins@example.net | (752)430-0473x853 | 77498 Michelle Mountain Suite 831, South Sheil... | North Kristen | Faroe Islands | 2019-10-16 |
| 2 | 3 | Meredith | Gardner | kathryn99@example.org | 5784713734 | 7968 Barrett Drive Suite 823, Port Michael, NE... | South Tiffany | Saudi Arabia | 2024-06-14 |
| 3 | 4 | Sarah | Logan | crystalpacheco@example.org | 730.754.9758x61338 | USNV Johnson, FPO AA 23718 | Port Taylormouth | Monaco | 2021-10-30 |
| 4 | 5 | Zachary | Miles | angelabentley@example.com | 805-201-5867x259 | 199 Garcia Ways Suite 881, West Ryanchester, S... | Smithport | Niue | 2023-05-29 |



=== PETS (75150 rows, 10 cols) ===


| | pet_id | customer_id | species | breed | gender | birthdate | age | weight_kg | microchip_id | neutered_spayed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Cat | British Shorthair | Male | 2017-06-21 | 8 | 5.0 | 2236684291238 | 1 |
| 1 | 2 | 2 | Cat | Scottish Fold | Female | 2007-03-06 | 18 | 2.8 | 7532758582321 | 0 |
| 2 | 3 | 3 | Cat | Bengal | Male | 2008-06-26 | 17 | 6.5 | 9381040561081 | 1 |
| 3 | 4 | 4 | Cat | American Shorthair | Male | 2012-12-02 | 13 | 2.4 | 7931814725250 | 1 |
| 4 | 5 | 5 | Dog | Papillon | Female | 2014-08-16 | 11 | 7.9 | 1115385790206 | 0 |
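Age is an obvious candidate risk driver in the pets table. Since the notebook imports `OneHotEncoder`, one option is to band age into categories before encoding it. The band edges below are an assumption made for illustration and do not come from the project. The ages are the five from the preview above:

```python
import pandas as pd

ages = pd.Series([8, 18, 17, 13, 11], name="age")

# Hypothetical band edges; bins are right-inclusive, and include_lowest keeps age 0 in the first band.
age_band = pd.cut(
    ages,
    bins=[0, 3, 7, 11, 15, 30],
    labels=["0-3", "4-7", "8-11", "12-15", "16+"],
    include_lowest=True,
)
print(age_band.value_counts().sort_index())
```

Ages above the top edge (30 here) would become `NaN`, so the upper edge should be set from the real maximum age.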



=== POLICIES (120000 rows, 7 cols) ===


| | policy_id | pet_id | product_id | start_date | end_date | age_at_policy_start | active |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 30289 | 3 | 2022-03-19 | 2024-03-08 | 1 | 0 |
| 1 | 2 | 45173 | 6 | 2021-11-17 | 2024-11-03 | 13 | 0 |
| 2 | 3 | 18891 | 15 | 2025-07-19 | 2026-04-08 | 1 | 1 |
| 3 | 4 | 53368 | 4 | 2025-02-02 | 2025-09-01 | 3 | 0 |
| 4 | 5 | 70238 | 10 | 2023-07-11 | 2024-10-05 | 7 | 0 |
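Claim frequency needs exposure, meaning the time each policy was actually at risk. `start_date` and `end_date` give each policy's term. Active policies such as policy 3 end in the future, though, so exposure is usually capped at a valuation date. The sketch uses the first three policy rows above. The valuation date of 2025-12-31 and the 365.25-day year are assumptions:

```python
import pandas as pd

toy_policies = pd.DataFrame({
    "policy_id": [1, 2, 3],
    "start_date": pd.to_datetime(["2022-03-19", "2021-11-17", "2025-07-19"]),
    "end_date": pd.to_datetime(["2024-03-08", "2024-11-03", "2026-04-08"]),
})

# Hypothetical valuation date: exposure is only earned up to this point,
# so still-running policies are not credited with future time.
valuation_date = pd.Timestamp("2025-12-31")

earned_end = toy_policies["end_date"].mask(toy_policies["end_date"] > valuation_date, valuation_date)
days_at_risk = (earned_end - toy_policies["start_date"]).dt.days.clip(lower=0)
toy_policies["exposure_years"] = days_at_risk / 365.25
print(toy_policies[["policy_id", "exposure_years"]].round(3))
```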



=== PRODUCTS (27 rows, 6 cols) ===


| | product_id | product_name | coverage_type | annual_limit | deductible | monthly_premium |
|---|---|---|---|---|---|---|
| 0 | 1 | Lifetime Essential | Lifetime | 2000 | 100 | 28.33 |
| 1 | 2 | Lifetime Plus | Lifetime | 4000 | 75 | 87.99 |
| 2 | 3 | Lifetime Premier | Lifetime | 1000 | 75 | 57.44 |
| 3 | 4 | Lifetime Elite | Lifetime | 2000 | 50 | 89.68 |
| 4 | 5 | Lifetime Ultra | Lifetime | 12000 | 75 | 66.43 |
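Each product pairs an `annual_limit` with a `deductible`, and together these determine how much of a claimed amount is actually paid. The data does not state the exact benefit rules. The sketch below therefore assumes a deductible taken off each claim and an annual limit that caps total payout within one policy year. `payouts_for_policy_year` is a hypothetical helper:

```python
def payouts_for_policy_year(claim_amounts, deductible, annual_limit):
    """Hypothetical benefit rule: deductible applied per claim, annual_limit caps the year's total."""
    remaining = annual_limit
    paid = []
    for amount in claim_amounts:
        eligible = max(amount - deductible, 0.0)  # nothing is paid below the deductible
        payout = min(eligible, remaining)         # never exceed what is left of the limit
        remaining -= payout
        paid.append(payout)
    return paid


# "Lifetime Premier" from the preview: annual_limit=1000, deductible=75
print(payouts_for_policy_year([300.0, 50.0, 900.0], deductible=75, annual_limit=1000))
# [225.0, 0.0, 775.0]
```

Under this rule, a low limit such as Lifetime Premier's 1000 truncates large claims. Modelled severity on the insurer's side can therefore differ noticeably from `claim_amount`.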



=== LINE_ITEMS (600000 rows, 5 cols) ===


| | line_item_id | visit_id | treatment_code | description | cost |
|---|---|---|---|---|---|
| 0 | 1 | 57096 | CYTO | Cytology | 189.68 |
| 1 | 2 | 14105 | URINE | Urinalysis | 170.64 |
| 2 | 3 | 18263 | NEUTER | Neuter/Spay | 456.15 |
| 3 | 4 | 61565 | URINE | Urinalysis | 89.03 |
| 4 | 5 | 82806 | FOLLOW | Follow-up Consultation | 64.11 |
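Line items belong to vet visits through `visit_id`, so visit-level cost comes from aggregating the line items. The treatment codes and costs below are taken from the preview. The visit IDs are invented so that some visits have several treatments:

```python
import pandas as pd

toy_items = pd.DataFrame({
    "visit_id": [501, 501, 502, 503, 503],  # hypothetical visit ids
    "treatment_code": ["CYTO", "URINE", "NEUTER", "FOLLOW", "URINE"],
    "cost": [189.68, 170.64, 456.15, 64.11, 89.03],
})

# One row per visit: total invoiced cost and number of line items on the invoice.
visit_costs = (
    toy_items.groupby("visit_id")
    .agg(visit_cost=("cost", "sum"), n_items=("treatment_code", "count"))
    .reset_index()
)
print(visit_costs)
```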



=== CLAIMS (200000 rows, 6 cols) ===


| | claim_id | policy_id | claim_date | claim_amount | diagnosis | status |
|---|---|---|---|---|---|---|
| 0 | 1 | 106001 | 2022-06-08 | 732.65 | Dental Issue | Rejected |
| 1 | 2 | 42260 | 2022-10-20 | 240.94 | Skin Condition | Closed |
| 2 | 3 | 310 | 2021-10-17 | 1095.07 | Feline Lower Urinary Tract Disease (FLUTD) | Pending |
| 3 | 4 | 31328 | 2025-09-16 | 253.66 | Ear Infection | Rejected |
| 4 | 5 | 59822 | 2025-12-03 | 238.97 | Eye Infection | Rejected |
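Before claims are joined to policies, it helps to confirm two things. Every claim's `policy_id` should exist in `policies`, and `policy_id` should be unique there. Otherwise a join can drop, blank out, or duplicate claims without raising any error. The sketch below uses `merge`'s `indicator` and `validate` parameters on toy keys. The policy IDs echo the preview above, and `999999` is a deliberately orphaned, invented key:

```python
import pandas as pd

toy_policies = pd.DataFrame({"policy_id": [310, 42260, 106001]})
toy_claims = pd.DataFrame({
    "claim_id": [1, 2, 3, 4],
    "policy_id": [106001, 42260, 310, 999999],  # 999999 has no matching policy
})

# indicator=True adds a _merge column saying which side(s) each row matched.
checked = toy_claims.merge(toy_policies, on="policy_id", how="left", indicator=True)
orphans = checked.loc[checked["_merge"] == "left_only", "claim_id"]
print(f"{len(orphans)} claim(s) reference a policy_id not present in policies")

# validate="many_to_one" raises a MergeError if policy_id is duplicated in toy_policies.
matched = toy_claims.merge(toy_policies, on="policy_id", how="inner", validate="many_to_one")
print(len(matched), "claims matched a policy")
```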



=== CLAIM_PAYMENTS (300000 rows, 5 cols) ===


| | payment_id | claim_id | payment_date | amount_paid | payment_method |
|---|---|---|---|---|---|
| 0 | 1 | 71681 | 2025-01-15 | 108.74 | Cheque |
| 1 | 2 | 135982 | 2025-03-31 | 84.41 | Bank Transfer |
| 2 | 3 | 53948 | 2025-08-18 | 409.11 | Bank Transfer |
| 3 | 4 | 32955 | 2023-11-27 | 300.08 | Cheque |
| 4 | 5 | 92789 | 2025-06-03 | 243.98 | Cheque |
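There are more payment rows (300,000) than claims (200,000), which suggests that some claims are paid in several instalments. Payments should therefore be summed per claim before they are compared with `claim_amount`. The claim amounts below come from the claims preview. The payments are hypothetical:

```python
import pandas as pd

toy_claims = pd.DataFrame({"claim_id": [1, 2, 3], "claim_amount": [732.65, 240.94, 1095.07]})
# Hypothetical payments: claim 3 paid in two instalments, claim 1 never paid.
toy_payments = pd.DataFrame({
    "claim_id": [2, 3, 3],
    "amount_paid": [200.00, 500.00, 300.00],
})

paid = toy_payments.groupby("claim_id", as_index=False)["amount_paid"].sum()
claim_paid = toy_claims.merge(paid, on="claim_id", how="left")
claim_paid["amount_paid"] = claim_paid["amount_paid"].fillna(0.0)  # no payment rows means nothing paid
claim_paid["paid_ratio"] = claim_paid["amount_paid"] / claim_paid["claim_amount"]
print(claim_paid.round(3))
```

Using a left join from claims keeps unpaid claims (for example, rejected or pending ones) in the table with zero paid. An inner join would drop them silently.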
