# Flagging suspicious healthcare claims with Amazon SageMaker
by Vikrant Kahlir, Elena Ehrlich, and Hanif Mahboobi | on 10 FEB 2020 | in Amazon SageMaker, Artificial Intelligence

This notebook is a reproduction of the original authors' work.

## Introduction

The National Health Care Anti-Fraud Association (NHCAA) estimates that healthcare fraud costs the nation approximately USD 68 billion annually, or 3% of the nation’s USD 2.26 trillion in healthcare spending. This is a conservative estimate; other estimates range as high as 10% of annual healthcare expenditure, or USD 230 billion.

Healthcare fraud inevitably results in higher premiums and out-of-pocket expenses for consumers, as well as reduced benefits or coverage.

Labeling a claim as fraudulent can require a complex and detailed investigation. This post demonstrates how to train an Amazon SageMaker model to flag anomalous post-payment Medicare inpatient claims and target them for further investigation on suspicion of fraud. The solution doesn’t need labeled data; it uses unsupervised machine learning (ML) to build a model that flags suspicious claims.

Anomaly detection is a difficult problem due to the following challenges:

* The boundary between normal and abnormal data is often unclear, and anomaly detection methods are often application-specific. For example, in clinical data a small deviation could be an outlier, whereas in a marketing application only a significant deviation would justify calling a point an outlier.
* Noise in the data may appear as deviations in attribute values or as missing values. Noise can mask a true outlier or make an ordinary deviation look like one.
* It may be difficult to provide a clear justification for why a point is an outlier.

This solution uses Amazon SageMaker, which gives developers and data scientists the ability to build, train, and deploy ML models. Amazon SageMaker is a fully managed service that covers the entire ML workflow: label and prepare your data, choose an algorithm, train the model, tune and optimize it for deployment, make predictions, and take action.

The end-to-end implementation of this solution is available as an Amazon SageMaker Jupyter Notebook. For more information, see the GitHub repository.

## Solution overview

In this example, we use Amazon SageMaker to:

1. download the dataset and visualize it in a Jupyter notebook;
2. clean the data locally within the notebook and inspect a sample of it;
3. engineer features from the text columns using word2vec;
4. fit a principal component analysis (PCA) model to the preprocessed dataset;
5. score the entire dataset;
6. apply a threshold to the scores to identify suspicious or anomalous claims.
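
Step 2 in the list above (data cleaning) is where the noise and missing values described earlier get handled. The sketch below is a minimal, hypothetical illustration in plain Python, not the notebook's code: it assumes records arrive as dictionaries of strings, drops rows with blank required fields or unparseable amounts, normalizes free text, and removes rows that become duplicates after normalization. The `clean_records` function and the `description`/`payment` field names are illustrative assumptions, not the dataset's actual schema.

```python
from typing import Dict, List


def clean_records(
    rows: List[Dict[str, str]],
    text_fields: List[str],
    numeric_fields: List[str],
) -> List[Dict[str, object]]:
    """Drop incomplete, unparseable, and duplicate rows; normalize the rest."""
    cleaned: List[Dict[str, object]] = []
    seen = set()
    for row in rows:
        # Skip rows where any required field is missing or blank.
        if any(not (row.get(f) or "").strip() for f in text_fields + numeric_fields):
            continue
        record: Dict[str, object] = {}
        # Collapse repeated whitespace and lowercase free text.
        for f in text_fields:
            record[f] = " ".join(row[f].split()).lower()
        try:
            # Strip a leading currency symbol and thousands separators before parsing.
            for f in numeric_fields:
                record[f] = float(row[f].strip().lstrip("$").replace(",", ""))
        except ValueError:
            continue  # treat unparseable amounts as noise and drop the row
        # Deduplicate on the normalized values.
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical rows; the field names are illustrative, not the dataset's schema.
sample_rows = [
    {"description": "  Heart   FAILURE ", "payment": "$1,234.50"},
    {"description": "heart failure", "payment": "1234.5"},  # duplicate once normalized
    {"description": "", "payment": "100"},                  # missing text value
    {"description": "pneumonia", "payment": "n/a"},         # unparseable amount
]
cleaned = clean_records(sample_rows, ["description"], ["payment"])
print(cleaned)  # [{'description': 'heart failure', 'payment': 1234.5}]
```

Dropping rows is only one policy; imputing missing values instead would keep more claims but risks hiding the very deviations the model is meant to surface.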
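
Step 3 in the list above turns free-text columns into numeric features that PCA can consume. A minimal sketch, assuming word vectors have already been trained with word2vec: tokenize each text value and average the vectors of the tokens that appear in the vocabulary. The `toy_vectors` table is hypothetical, hand-made data standing in for trained embeddings, and averaging is one common pooling choice, not necessarily the one the notebook uses.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def average_embedding(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    known = [vectors[tok] for tok in tokenize(text) if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 3-dimensional vectors standing in for a trained word2vec model.
toy_vectors = {
    "heart": [1.0, 0.0, 0.5],
    "failure": [0.0, 1.0, 0.5],
    "pneumonia": [0.5, 0.5, 0.0],
}

print(average_embedding("Heart Failure", toy_vectors, dim=3))  # [0.5, 0.5, 0.5]
print(average_embedding("Unknown code", toy_vectors, dim=3))   # [0.0, 0.0, 0.0]
```

Averaging yields a fixed-length vector regardless of text length, so each text column contributes the same number of features to the PCA input; the cost is that word order is discarded.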
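
Steps 4 to 6 in the list above (fit PCA, score, threshold) can be approximated locally for intuition. The sketch below is not the notebook's code: it assumes NumPy, computes PCA with `numpy.linalg.svd` rather than fitting a model on Amazon SageMaker, scores each row by its squared Mahalanobis distance in the top principal components (one common PCA-based anomaly score; the notebook's scoring rule may differ), and flags scores more than `k` standard deviations above the mean. The function names, the synthetic data, and the `k = 3.0` default are illustrative assumptions.

```python
import numpy as np


def pca_anomaly_scores(X: np.ndarray, n_components: int) -> np.ndarray:
    """Score each row by its squared Mahalanobis distance in the top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of vt are principal directions; s holds singular values in descending order.
    _, s, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n_components]
    # Sample variance along each principal direction.
    variances = (s[:n_components] ** 2) / (X.shape[0] - 1)
    projected = X_centered @ components.T
    # Whiten each component by its variance so every direction contributes comparably.
    return np.sum(projected ** 2 / variances, axis=1)


def flag_anomalies(scores: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag scores more than k standard deviations above the mean score (assumed rule)."""
    threshold = scores.mean() + k * scores.std()
    return scores > threshold


# Synthetic stand-in for the preprocessed claims matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[0] = 10.0  # plant one obvious outlier in the first row
scores = pca_anomaly_scores(X, n_components=3)
flags = flag_anomalies(scores)
print(np.flatnonzero(flags))
```

Whitening by each component's variance keeps a single high-variance direction from dominating the score. The multiplier `k` controls how many claims are routed to investigators, which is exactly the application-specific choice described in the challenges above.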