Looking for factors indicating fraud using insurance claims data.
Alpha Insurance-CCD.ipynb
Alpha Insurance-Individual Variables.ipynb
Alpha Insurance-Modeling.ipynb
Alpha Insurance.ipynb
Anova Test.txt
Plotly Example.ipynb
Undersample Code.ipynb


You have been hired by Alpha Insurance to develop predictive models that determine which automobile claims are fraudulent. You have been given data on approximately 5,000 auto claims, including a variable that indicates whether the company believes each claim is fraudulent.


  • Robert Shea

Bryant University ~ Fall 2018


These variables appear to be the best for detecting fraudulent claims:

  • Claim Amount - Uncommonly high claim amounts are more likely to be fraudulent.
  • Claim Cause - Claims with more severe causes (fire and collision) are less likely to be fraudulent.
  • Claim Report Type - Fraudulent claims tend to be reported through channels that involve as little human interaction as possible.
  • Employment Status - Claimants who are not currently employed are more likely to report fraudulent claims.
  • Income - Claims from claimants with a higher level of education are less likely to be fraudulent, and education level may also be linked with income.
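
One way to check hypotheses like these is to compare the share of flagged claims across the levels of a variable. The sketch below is illustrative only: the column names (employment_status, fraud), the toy values, and the helper fraud_rate_by are assumptions, not taken from the Alpha Insurance data or notebooks.

```python
import pandas as pd

# Toy data; column names and values are hypothetical,
# not taken from the Alpha Insurance dataset.
claims = pd.DataFrame({
    "employment_status": ["Employed", "Unemployed", "Employed",
                          "Unemployed", "Employed", "Retired"],
    "fraud": [0, 1, 0, 0, 1, 0],
})


def fraud_rate_by(df, column, target="fraud"):
    """Return the share of flagged claims within each level of `column`."""
    return df.groupby(column)[target].mean().sort_values(ascending=False)


rates = fraud_rate_by(claims, "employment_status")
print(rates)
```

A large gap between levels is only a hint that the variable is worth testing; with few claims per level, the difference may not hold up.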


Data Exploration

  • Univariate exploration
  • Bivariate exploration
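
As a rough illustration of the two exploration steps above, the following sketch summarizes each variable on its own and then against the fraud indicator. The column names (claim_amount, claim_cause, fraud) and the values are made up for the example; the notebooks may use different names and techniques.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
claims = pd.DataFrame({
    "claim_amount": [1200.0, 8500.0, 950.0, 15000.0, 2300.0, 700.0],
    "claim_cause": ["Collision", "Theft", "Hail", "Theft", "Fire", "Collision"],
    "fraud": [0, 1, 0, 1, 0, 0],
})

# Univariate: look at one variable at a time.
amount_summary = claims["claim_amount"].describe()               # count, mean, quartiles
cause_share = claims["claim_cause"].value_counts(normalize=True)  # share per category

# Bivariate: relate each variable to the fraud indicator.
amount_by_fraud = claims.groupby("fraud")["claim_amount"].median()
cause_vs_fraud = pd.crosstab(claims["claim_cause"], claims["fraud"])

print(amount_summary)
print(cause_share)
print(amount_by_fraud)
print(cause_vs_fraud)
```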


Data Preparation

  • Impute missing values
  • Handle outliers
  • Transform variables with functions
  • Transform variables with binning
  • Encoding
  • Balancing Sample
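
The preparation steps listed above could be sketched in pandas roughly as follows. Everything specific here is an assumption made for illustration: the column names, the median imputation, the 1.5 × IQR cap, the log transform, the three quantile bins, one-hot encoding, and random undersampling of the majority class. The notebooks in this repository may make different choices at each step.

```python
import numpy as np
import pandas as pd

# Toy data; all column names and values are hypothetical.
claims = pd.DataFrame({
    "claim_amount": [1200.0, np.nan, 950.0, 150000.0, 2300.0, 700.0, 1800.0, 1100.0],
    "claim_cause": ["Collision", "Theft", "Hail", "Theft", "Fire", "Collision", "Hail", "Fire"],
    "fraud": [0, 1, 0, 1, 0, 0, 0, 0],
})

# 1. Impute missing values with the column median.
claims["claim_amount"] = claims["claim_amount"].fillna(claims["claim_amount"].median())

# 2. Handle outliers by capping values more than 1.5 * IQR beyond the quartiles.
q1, q3 = claims["claim_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
claims["claim_amount"] = claims["claim_amount"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 3. Transform with a function: log(1 + x) compresses the right tail.
claims["log_amount"] = np.log1p(claims["claim_amount"])

# 4. Transform with binning: three quantile-based bins.
claims["amount_bin"] = pd.qcut(claims["claim_amount"], q=3, labels=["low", "mid", "high"])

# 5. Encoding: one-hot encode the categorical columns.
encoded = pd.get_dummies(claims, columns=["claim_cause", "amount_bin"])

# 6. Balancing: randomly undersample the majority (non-fraud) class,
#    then shuffle the combined rows.
fraud_rows = encoded[encoded["fraud"] == 1]
legit_rows = encoded[encoded["fraud"] == 0].sample(n=len(fraud_rows), random_state=0)
balanced = pd.concat([fraud_rows, legit_rows]).sample(frac=1, random_state=0)

print(balanced["fraud"].value_counts())
```

Undersampling discards data, so on a set of about 5,000 claims it is worth checking how many fraudulent claims remain before relying on it.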


Modeling

  • Regression
  • Decision Tree
  • Neural Network
  • Other
  • Model Selection
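
A minimal sketch of comparing the model families listed above, assuming scikit-learn is available. The data is synthetic (make_classification), "Regression" is interpreted here as logistic regression, and cross-validated ROC AUC is used as the selection metric; all three are assumptions, and the hyperparameters are arbitrary rather than taken from the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared claims data (roughly 20% positives).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Candidate models; scaling is added where the estimator is sensitive to it.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Model selection: mean ROC AUC over 5-fold cross-validation.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_name)
```

If the training data has been undersampled, the evaluation folds should still reflect the original class balance; otherwise the scores may overstate how well a model would do on real claims.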

