# Acquire Property and Casualty Dataset

This notebook downloads the agency performance dataset (`moneystore/agencyperformance`) from Kaggle for the property and casualty insurance project.

## Option #1: Obtain data via an API call

The Kaggle API and CLI tool provide easy ways to interact with Datasets on Kaggle. The commands available can make searching for and downloading Kaggle Datasets a seamless part of your data science workflow.

See: https://www.kaggle.com/docs/api

The Kaggle CLI tool authenticates with an API token, a `kaggle.json` file that you can download from the API section of your Kaggle account settings. The CLI looks for this token at `~/.kaggle/kaggle.json` on Linux, macOS, and other UNIX-based operating systems, and at `C:\Users\<Windows-username>\.kaggle\kaggle.json` on Windows. If the token is not there, an error is raised, so once you've downloaded your API token, move it from your Downloads folder to that location.
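The move can also be scripted. A minimal sketch, assuming the token was saved as `kaggle.json` in your browser's Downloads folder; `install_kaggle_token` is a hypothetical helper, not part of the Kaggle tooling. `Path.home()` resolves to `~` on UNIX-like systems and to `C:\Users\<Windows-username>` on Windows, so one path expression covers both locations above.

```python
import shutil
import stat
from pathlib import Path
from typing import Optional


# Hypothetical helper, not part of the kaggle package
def install_kaggle_token(src: Path, config_dir: Optional[Path] = None) -> Path:
    """Move a downloaded kaggle.json into the Kaggle config folder."""
    # Default: ~/.kaggle, which is C:\Users\<Windows-username>\.kaggle on Windows
    config_dir = config_dir or Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    dest = config_dir / "kaggle.json"
    shutil.move(str(src), str(dest))
    # Owner read/write only (chmod 600); Windows largely ignores these bits
    dest.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return dest
```

For example, `install_kaggle_token(Path.home() / "Downloads" / "kaggle.json")` moves the token into place. Restricting the permissions avoids the warning the Kaggle CLI prints when the key is readable by other users.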

To confirm the dataset is available on Kaggle, run the CLI command `kaggle datasets list -s agencyperformance`.

There are many other insurance datasets; to list them, run `kaggle datasets list -s insurance`.
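The same check can be run from Python. A minimal sketch, assuming the `kaggle` CLI is on your `PATH` and that its `--csv` output has a header row beginning with a `ref` column of `owner/dataset` handles; `parse_refs` and `dataset_is_listed` are hypothetical helper names.

```python
import csv
import subprocess


# Hypothetical helpers, not part of the kaggle package
def parse_refs(csv_text: str) -> list:
    """Return the 'ref' column (owner/dataset handles) from `kaggle datasets list --csv` output."""
    lines = csv_text.splitlines()
    # Skip any warning lines printed before the CSV header (assumed to start with "ref,")
    start = next((i for i, line in enumerate(lines) if line.startswith("ref,")), None)
    if start is None:
        return []
    return [row["ref"] for row in csv.DictReader(lines[start:])]


def dataset_is_listed(search: str, ref: str) -> bool:
    """Search Kaggle datasets via the CLI and report whether `ref` is among the results."""
    result = subprocess.run(
        ["kaggle", "datasets", "list", "-s", search, "--csv"],
        capture_output=True,
        text=True,
        check=True,
    )
    return ref in parse_refs(result.stdout)
```

With a token configured, `dataset_is_listed("agencyperformance", "moneystore/agencyperformance")` should return `True`. Parsing the CSV output rather than the default table format keeps the check independent of column widths.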

In [6]:
# Requires a Kaggle API token (kaggle.json) in the location described above
import kagglehub
from kagglehub import KaggleDatasetAdapter

# File within the dataset to load
file_path = "finalapi.csv"

# Load the latest version of the dataset directly into a pandas DataFrame.
# Additional arguments such as sql_query or pandas_kwargs can be passed; see
# https://github.com/Kaggle/kagglehub/blob/main/README.md#kaggledatasetadapterpandas
pdf_data = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "moneystore/agencyperformance",
    file_path,
    # low_memory=False reads the file in one pass, avoiding mixed-dtype warnings
    pandas_kwargs={"low_memory": False},
)

print(pdf_data.shape)
pdf_data.head()

(213328, 49)


Unnamed: 0,AGENCY_ID,PRIMARY_AGENCY_ID,PROD_ABBR,PROD_LINE,STATE_ABBR,STAT_PROFILE_DATE_YEAR,RETENTION_POLY_QTY,POLY_INFORCE_QTY,PREV_POLY_INFORCE_QTY,NB_WRTN_PREM_AMT,...,PL_BOUND_CT_ELINKS,PL_QUO_CT_ELINKS,PL_BOUND_CT_PLRANK,PL_QUO_CT_PLRANK,PL_BOUND_CT_eQTte,PL_QUO_CT_eQTte,PL_BOUND_CT_APPLIED,PL_QUO_CT_APPLIED,PL_BOUND_CT_TRANSACTNOW,PL_QUO_CT_TRANSACTNOW
0,3,3,BOILERMACH,CL,IN,2005,0,0,0,40.0,...,0,0,0,103,50,288,0,0,0,0
1,3,3,BOILERMACH,CL,IN,2006,0,0,0,151.0,...,0,0,0,103,50,288,0,0,0,0
2,3,3,BOILERMACH,CL,IN,2007,0,0,0,40.0,...,0,0,0,103,50,288,0,0,0,0
3,3,3,BOILERMACH,CL,IN,2008,0,0,0,69.0,...,0,0,0,103,50,288,0,0,0,0
4,3,3,BOILERMACH,CL,IN,2009,0,0,0,28.0,...,0,0,0,103,50,288,0,0,0,0
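`Unnamed: 0` is the name pandas gives a CSV column with a blank header, which usually means a row index was written into the file. A minimal sketch for dropping it, assuming it only repeats the row position (check this on your copy first); `drop_saved_index` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


# Hypothetical helper: drop a saved row index only if it is exactly 0, 1, 2, and so on
def drop_saved_index(df: pd.DataFrame) -> pd.DataFrame:
    """Drop a leftover 'Unnamed: 0' column when it matches the row positions."""
    col = "Unnamed: 0"
    if col in df.columns and np.array_equal(df[col].to_numpy(), np.arange(len(df))):
        return df.drop(columns=col)
    # Otherwise leave the frame unchanged so nothing is dropped silently
    return df
```

Usage: `pdf_data = drop_saved_index(pdf_data)`. Alternatively, adding `"index_col": 0` to `pandas_kwargs` makes pandas use that column as the index instead of a data column.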


## Option #2a: Download a .zip file of the data locally (using the Kaggle API)

In [18]:
# Downloads the dataset as a .zip file into the current working directory.
# Importing kaggle authenticates with the token in kaggle.json.
import kaggle
import pandas as pd
from zipfile import ZipFile

# Dataset page: https://www.kaggle.com/datasets/moneystore/agencyperformance
# (the file of interest is finalapi.csv)
dataset = "moneystore/agencyperformance"
kaggle.api.dataset_download_files(dataset, path=".", quiet=False)

# Read every CSV in the archive into a dict keyed by file name
with ZipFile(f"{dataset.split('/')[-1]}.zip") as zfile:
    dct_data = {
        f.filename: pd.read_csv(zfile.open(f), low_memory=False)
        for f in zfile.infolist()
    }

dct_data["finalapi.csv"]

Downloading agencyperformance.zip to /home/rich/Documents/Software/DevFile/Property&CasualtyInsurance


100%|██████████| 5.08M/5.08M [00:01<00:00, 4.52MB/s]





Unnamed: 0,AGENCY_ID,PRIMARY_AGENCY_ID,PROD_ABBR,PROD_LINE,STATE_ABBR,STAT_PROFILE_DATE_YEAR,RETENTION_POLY_QTY,POLY_INFORCE_QTY,PREV_POLY_INFORCE_QTY,NB_WRTN_PREM_AMT,...,PL_BOUND_CT_ELINKS,PL_QUO_CT_ELINKS,PL_BOUND_CT_PLRANK,PL_QUO_CT_PLRANK,PL_BOUND_CT_eQTte,PL_QUO_CT_eQTte,PL_BOUND_CT_APPLIED,PL_QUO_CT_APPLIED,PL_BOUND_CT_TRANSACTNOW,PL_QUO_CT_TRANSACTNOW
0,3,3,BOILERMACH,CL,IN,2005,0,0,0,40.00,...,0,0,0,103,50,288,0,0,0,0
1,3,3,BOILERMACH,CL,IN,2006,0,0,0,151.00,...,0,0,0,103,50,288,0,0,0,0
2,3,3,BOILERMACH,CL,IN,2007,0,0,0,40.00,...,0,0,0,103,50,288,0,0,0,0
3,3,3,BOILERMACH,CL,IN,2008,0,0,0,69.00,...,0,0,0,103,50,288,0,0,0,0
4,3,3,BOILERMACH,CL,IN,2009,0,0,0,28.00,...,0,0,0,103,50,288,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
213323,9998,9998,PERSUMBREL,PL,IN,2014,39,39,60,0.00,...,0,0,0,0,22,136,0,0,0,0
213324,9998,9998,PERSUMBREL,PL,IN,2015,15,15,18,0.00,...,0,0,0,0,22,136,0,0,0,0
213325,9998,9998,PERSUMBREL,PL,KY,2013,0,9,99999,132.19,...,0,0,0,0,22,136,0,0,0,0
213326,9998,9998,PERSUMBREL,PL,KY,2014,9,12,9,0.00,...,0,0,0,0,22,136,0,0,0,0
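If the archive holds only `finalapi.csv`, pandas can read it directly without building a dictionary, since `read_csv` infers zip compression from the `.zip` suffix. A minimal sketch under that single-file assumption; `read_single_csv_zip` is a hypothetical helper.

```python
from pathlib import Path
from zipfile import ZipFile

import pandas as pd


# Hypothetical helper: pandas can only read a zip that contains exactly one file
def read_single_csv_zip(zip_path: Path) -> pd.DataFrame:
    """Read a .zip archive holding a single CSV file without extracting it."""
    with ZipFile(zip_path) as zf:
        members = zf.namelist()
    if len(members) != 1:
        raise ValueError(f"expected one file in {zip_path}, found {members}")
    # Compression is inferred from the .zip suffix
    return pd.read_csv(zip_path, low_memory=False)
```

For example, `read_single_csv_zip(Path("agencyperformance.zip"))` should return the same frame as `dct_data["finalapi.csv"]`. The explicit member check gives a clearer error than pandas does if the archive ever gains extra files.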


## Option #2b: Download data locally (using kagglehub)

In [None]:
# Sign in interactively; only needed if no Kaggle API token (kaggle.json) is configured
import kagglehub

kagglehub.login()
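Besides `kaggle.json`, both the `kaggle` package and `kagglehub` can read credentials from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. A minimal sketch that reports which source is present before you decide whether to call `kagglehub.login()`; `find_kaggle_credentials` is a hypothetical helper, and the order in which it checks the sources is a choice of this sketch, not necessarily the libraries' own precedence.

```python
import os
from pathlib import Path
from typing import Optional


# Hypothetical helper, not part of kagglehub
def find_kaggle_credentials(config_dir: Optional[Path] = None) -> Optional[str]:
    """Return 'env', 'file', or None depending on where Kaggle credentials are found."""
    # Environment variables are checked first in this sketch
    if os.environ.get("KAGGLE_USERNAME") and os.environ.get("KAGGLE_KEY"):
        return "env"
    # Token file location described at the top of this notebook
    config_dir = config_dir or Path.home() / ".kaggle"
    if (config_dir / "kaggle.json").is_file():
        return "file"
    return None
```

Usage: `if find_kaggle_credentials() is None: kagglehub.login()` prompts for a sign-in only when nothing is stored.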

In [12]:
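# Download the data from Kaggle into a local folder and print its location

# Optional: uncomment the next two lines to store the download under the
# current working directory instead of the default kagglehub cache
# import os
# os.environ["KAGGLEHUB_CACHE"] = ""

import kagglehub

path = kagglehub.dataset_download("moneystore/agencyperformance")
print("Path to dataset files:", path)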

Downloading from https://www.kaggle.com/api/v1/datasets/download/moneystore/agencyperformance?dataset_version_number=1...


100%|██████████| 5.08M/5.08M [00:00<00:00, 12.6MB/s]

Extracting model files...





Path to dataset files: datasets/moneystore/agencyperformance/versions/1
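`kagglehub.dataset_download` returns a directory path rather than a DataFrame, so the CSV still has to be read. A minimal sketch, assuming the downloaded folder contains `finalapi.csv` (the file loaded in Option #1); `load_from_download` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd


# Hypothetical helper for reading a file from the folder kagglehub downloaded
def load_from_download(path: str, filename: str = "finalapi.csv") -> pd.DataFrame:
    """Read one CSV from the directory returned by kagglehub.dataset_download."""
    csv_path = Path(path) / filename
    if not csv_path.is_file():
        # List what was downloaded so a wrong filename is easy to spot
        found = sorted(p.name for p in Path(path).iterdir())
        raise FileNotFoundError(f"{filename} not in {path}; found {found}")
    return pd.read_csv(csv_path, low_memory=False)
```

Usage: `pdf_data = load_from_download(path)` after the cell above. If both options fetch the same dataset version, the shape should match the `(213328, 49)` printed in Option #1.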
