Looking for factors indicating fraud using insurance claims data.

Alpha-Insurance-Fraud-Detection

You have been hired by Alpha Insurance to develop predictive models that determine which automobile claims are fraudulent. You have been given data on approximately 5,000 auto claims, including a variable indicating whether the company believes each claim is fraudulent.

Author:

  • Robert Shea

Bryant University ~ Fall 2018

Hypothesis

The following variables are expected to be the most useful for detecting fraudulent claims:

  • Claim Amount - Claims with unusually high amounts are more likely to be fraudulent.
  • Claim Cause - Claims with more severe causes (fire and collision) are less likely to be fraudulent.
  • Claim Report Type - Fraudulent claims are more likely to be reported through channels that involve as little human interaction as possible.
  • Employment Status - Claimants who are not currently employed are more likely to file fraudulent claims.
  • Education - The higher the claimant's level of education, the less likely a claim is to be fraudulent. (This may also be linked with income.)

Process

Data Exploration

  • Univariate exploration
  • Bivariate exploration
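
As a rough illustration of the two exploration steps, the sketch below summarizes single variables and then relates each one to the fraud flag. It assumes pandas is available. The data frame, its column names (`claim_amount`, `claim_cause`, `fraud`), and every value in it are hypothetical stand-ins, not the Alpha Insurance data.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only
# and are not taken from the Alpha Insurance dataset.
claims = pd.DataFrame({
    "claim_amount": [1200.0, 850.0, 15000.0, 430.0, 9800.0, 700.0],
    "claim_cause": ["collision", "theft", "fire", "theft", "collision", "theft"],
    "fraud": [0, 1, 0, 1, 0, 0],
})

# Univariate exploration: look at one variable at a time.
amount_summary = claims["claim_amount"].describe()   # count, mean, quartiles, ...
cause_counts = claims["claim_cause"].value_counts()  # frequency of each category

# Bivariate exploration: relate each predictor to the fraud flag.
fraud_rate_by_cause = claims.groupby("claim_cause")["fraud"].mean()
mean_amount_by_fraud = claims.groupby("fraud")["claim_amount"].mean()

print(amount_summary)
print(cause_counts)
print(fraud_rate_by_cause)
print(mean_amount_by_fraud)
```

On real data, the per-category fraud rates and per-class means are a quick way to check whether a hypothesized variable separates fraudulent from legitimate claims at all.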

Transformations

  • Impute missing values
  • Handle outliers
  • Transform variables with functions
  • Transform variables with binning
  • Encoding
  • Balancing Sample
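
The transformation steps listed above could be chained roughly as follows. This is a minimal sketch assuming pandas and NumPy. The toy data, the column names, the percentile caps, and the bin thresholds are all hypothetical choices made for illustration, and random undersampling is only one possible way to balance the sample.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values and one extreme claim amount.
claims = pd.DataFrame({
    "claim_amount": [1200.0, 850.0, np.nan, 430.0, 98000.0, 700.0, 2100.0, 560.0],
    "employment_status": ["employed", "unemployed", "employed", None,
                          "employed", "retired", "employed", "unemployed"],
    "fraud": [0, 1, 0, 1, 0, 0, 0, 0],
})

# 1. Impute missing values: median for numeric, most frequent value for categorical.
claims["claim_amount"] = claims["claim_amount"].fillna(claims["claim_amount"].median())
claims["employment_status"] = claims["employment_status"].fillna(
    claims["employment_status"].mode()[0]
)

# 2. Handle outliers: cap values at the 5th and 95th percentiles (assumed cutoffs).
low = claims["claim_amount"].quantile(0.05)
high = claims["claim_amount"].quantile(0.95)
claims["claim_amount"] = claims["claim_amount"].clip(lower=low, upper=high)

# 3. Transform with a function: log(1 + x) reduces right skew in amounts.
claims["log_amount"] = np.log1p(claims["claim_amount"])

# 4. Transform with binning: fixed-width bands (thresholds are assumptions).
claims["amount_bin"] = pd.cut(
    claims["claim_amount"],
    bins=[0, 1000, 5000, np.inf],
    labels=["low", "medium", "high"],
)

# 5. Encoding: one-hot encode the categorical columns.
encoded = pd.get_dummies(claims, columns=["employment_status", "amount_bin"])

# 6. Balancing: randomly undersample the majority (non-fraud) class.
fraud_rows = encoded[encoded["fraud"] == 1]
legit_rows = encoded[encoded["fraud"] == 0].sample(n=len(fraud_rows), random_state=0)
balanced = pd.concat([fraud_rows, legit_rows])

print(balanced["fraud"].value_counts())
```

Undersampling discards legitimate claims, so on a dataset of about 5,000 rows it is worth checking how many rows remain before modeling.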

Modeling

  • Regression
  • Decision Tree
  • Neural Network
  • Other
  • Model Selection
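
One way to compare the listed model families is cross-validated scoring, sketched below with scikit-learn. Several things here are assumptions rather than details from this project: that "Regression" means logistic regression, that ROC AUC is the selection metric, and all hyperparameters. The data is synthetic, generated by `make_classification` with roughly 10% positives to mimic an imbalanced fraud flag.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared claims data (not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

# Candidate models; scaling is added where the estimator is sensitive to it.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Model selection: mean ROC AUC over 5 stratified folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_name)
```

Accuracy is a poor selection metric when fraud is rare, which is why a ranking metric such as ROC AUC is used in this sketch.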
