Medicare fraud detection

A machine-learning-based web application that helps detect fraudulent practitioners among Medicare providers.

Check the notebooks for more details about data processing and model training/evaluation.

Requirements

Python 3.9+
conda (to create the virtual environment)

Usage

Launching the app

Make sure that the requirements are installed:

pip install -r requirements.txt

Launch the app from a terminal:

python app.py

The input CSV file must list its columns (variables) in exactly this order:

1. Specialty
2. State
3. Total HCPCS
4. Male Patients Count
5. Average Age
6. Total Services
7. Total Patients
8. Average Risk Score
9. Charges Submitted
10. Charges Paid
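Because the column order matters, it can help to check a file before handing it to the app. The following is a minimal sketch, not part of the repository: it assumes the CSV has a header row whose labels match the names in the list above (the real file may spell them differently), and `check_column_order` is a hypothetical helper.

```python
import csv
import io

# Assumed header labels, copied from the list above. The actual file may use
# different spellings, so adjust EXPECTED_COLUMNS to match it.
EXPECTED_COLUMNS = [
    "Specialty",
    "State",
    "Total HCPCS",
    "Male Patients Count",
    "Average Age",
    "Total Services",
    "Total Patients",
    "Average Risk Score",
    "Charges Submitted",
    "Charges Paid",
]


def check_column_order(csv_text):
    """Return a list of (position, expected, found) mismatches; empty if OK."""
    reader = csv.reader(io.StringIO(csv_text))
    # Read only the header row; an empty file yields an empty header.
    header = [name.strip() for name in next(reader, [])]
    mismatches = []
    for i in range(max(len(header), len(EXPECTED_COLUMNS))):
        expected = EXPECTED_COLUMNS[i] if i < len(EXPECTED_COLUMNS) else None
        found = header[i] if i < len(header) else None
        if expected != found:
            # Positions are 1-based to match the numbered list above.
            mismatches.append((i + 1, expected, found))
    return mismatches
```

An empty result means the header matches the expected order. Each mismatch reports the 1-based column position, the expected name, and the name actually found.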

Creating the virtual environment

Make sure that conda is installed and run the following command:

conda env create -f medicare-fraud-env.yml

Activate the environment:

conda activate medicare-fraud

Data used

Overview

The dataset used describes the services and procedures that health care professionals provide to Medicare beneficiaries. The records in the dataset contain various provider-level attributes, such as National Provider Identifier (NPI), first and last name, gender, address, etc. In addition, the records contain information that describes a provider's Medicare activity in a given year. Examples include: the procedure performed, the average fee submitted to Medicare, the average amount paid by Medicare, and the location of service.

The dataset is aggregated at the NPI level (the NPI being the provider's identifier) and covers the years 2016 through 2019, with over one million records and 73 variables per year, which makes it well suited to data analysis and machine learning.

The Medicare provider fraud labels are derived from the List of Excluded Individuals/Entities (LEIE). The LEIE is maintained by the Office of Inspector General (OIG) in accordance with sections 1128 and 1156 of the Social Security Act and is updated monthly. The OIG has the authority to exclude providers from federally funded health care programs for various reasons. Excluded individuals cannot receive payment from federal health programs for any services, and must apply for reinstatement once their exclusion period has expired. The current LEIE data format contains 18 attributes that describe the provider and the reason for the exclusion.
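One plausible way to turn the exclusion list into labels is to flag every provider whose NPI appears in it. The sketch below illustrates that idea only. Matching on NPI is an assumption, `label_providers` is a hypothetical helper, and the notebooks document the labeling actually used.

```python
def label_providers(provider_npis, excluded_npis):
    """Map each provider NPI to 1 (appears in the exclusion list) or 0.

    Assumption: providers and exclusions are matched on NPI alone. The
    project's real labeling rules may differ.
    """
    excluded = set(excluded_npis)  # set lookup keeps the pass linear
    return {npi: int(npi in excluded) for npi in provider_npis}


# Made-up identifiers, for illustration only.
labels = label_providers(
    provider_npis=["1000000001", "1000000002", "1000000003"],
    excluded_npis=["1000000002"],
)
```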

Data sampling

After data processing and labeling, the final dataset includes information on 4,419,977 practitioners, of which 1,276 are fraudulent. That is just under 0.03% of all practitioners in the dataset.

The observed fraud rate (about 0.03%) does not reflect reality (about 10% worldwide). The negative class (non-fraudulent practitioners) was therefore undersampled to raise the fraud rate to 10%. The sample was drawn at random, and statistical tests were used to check that it was not biased.
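The arithmetic behind the undersampling can be sketched as follows. This is an illustration under stated assumptions, not the notebook code: `negatives_needed` and `undersample_negatives` are hypothetical helpers, and the fixed seed is only there to make the draw reproducible.

```python
import random


def negatives_needed(n_positive, target_rate):
    """Number of negatives so that positives make up target_rate of the sample.

    Solves target_rate = n_positive / (n_positive + n_negative) for n_negative.
    """
    return round(n_positive * (1 - target_rate) / target_rate)


def undersample_negatives(negative_ids, n_positive, target_rate, seed=0):
    """Draw a simple random sample of negatives, without replacement."""
    k = negatives_needed(n_positive, target_rate)
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    return rng.sample(negative_ids, k)


# With the 1,276 fraudulent practitioners reported above and a 10% target,
# 9 negatives are kept for every positive.
n_negative = negatives_needed(1276, 0.10)
```

With these figures, the sample keeps 11,484 non-fraudulent practitioners, for a total of 12,760 records.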

Check the notebooks for more details.

sandeshregmi/medicare-fraud-detection

Folders and files

Latest commit

History

Repository files navigation

Medicare fraud detection

Requirements

Usage

Launching the app

Creating the virtual environment

Data used

Overview

Data sampling

About

Resources

Stars

Watchers

Forks

Languages