# Background research

Health care fraud has a human face too. Individual victims are sadly easy to find: people who are exploited and subjected to unnecessary or unsafe medical procedures, whose medical records are compromised, or whose legitimate insurance information is used to submit falsified claims.

Some of the more common types of fraud committed by dishonest providers include:

- Billing for services that were never rendered—by using genuine patient information, sometimes obtained through identity theft, to fabricate entire claims or by padding otherwise legitimate claims with charges for procedures or services that did not take place.
- Billing for more expensive services or procedures than were actually provided or performed, commonly known as "upcoding"—i.e., falsely billing for a higher-priced treatment than was actually provided (which often requires the accompanying "inflation" of the patient's diagnosis code to a more serious condition consistent with the false procedure code).
- Performing medically unnecessary services solely for the purpose of generating insurance payments—this is seen very often in diagnostic-testing schemes such as nerve-conduction and genetic testing.
- Misrepresenting non-covered treatments as medically necessary covered treatments for purposes of obtaining insurance payments—this is widely seen in cosmetic-surgery schemes, in which non-covered cosmetic procedures such as "nose jobs" are billed to patients' insurers as deviated-septum repairs.
- Falsifying a patient's diagnosis and medical record to justify tests, surgeries or other procedures that aren't medically necessary.
- Unbundling: billing for each step of a procedure as if it were a separate procedure.
- Billing a patient more than the required co-pay amount for services that were prepaid or paid-in-full by the benefit plan under the terms of a managed care contract.
- Accepting kickbacks for patient referrals.
- Waiving patient co-pays or deductibles for medical or dental care and over-billing the insurance carrier or benefit plan (insurers often set their policy on co-pay waivers through the provider contracting process; under Medicare, routinely waiving co-pays is prohibited, and they may be waived only in cases of "financial hardship").

### Problem Statement

The goal of this project is to predict potentially fraudulent providers based on the claims they file. Along the way, we will identify the variables that are most helpful in detecting potentially fraudulent providers, and we will study fraudulent patterns in providers' claims to understand how such providers are likely to behave in the future.
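
Since the labels are per provider while the claims are per row, one possible framing is to roll the claims up to a single row per provider before modelling. The sketch below illustrates that idea on made-up toy data; the column names follow the tables in the Dataset section, while the feature names (`n_claims`, `n_patients`, `total_reimbursed`, `mean_reimbursed`, `is_fraud`) are hypothetical and not necessarily the features used later in this project.

```python
import pandas as pd

# Toy claim-level rows; column names mirror the claims tables, values are made up.
claims = pd.DataFrame({
    "Provider": ["PRV_A", "PRV_A", "PRV_B"],
    "ClaimID": ["CLM1", "CLM2", "CLM3"],
    "BeneID": ["BENE1", "BENE2", "BENE1"],
    "InscClaimAmtReimbursed": [100, 300, 50],
})

# Toy provider-level labels, shaped like the target table.
labels = pd.DataFrame({
    "Provider": ["PRV_A", "PRV_B"],
    "PotentialFraud": ["Yes", "No"],
})

# One row per provider: claim count, distinct patients, reimbursement summary.
provider_features = (
    claims.groupby("Provider")
    .agg(
        n_claims=("ClaimID", "count"),
        n_patients=("BeneID", "nunique"),
        total_reimbursed=("InscClaimAmtReimbursed", "sum"),
        mean_reimbursed=("InscClaimAmtReimbursed", "mean"),
    )
    .reset_index()
)

# Attach the label and encode it as 0/1 (hypothetical encoding for a classifier).
provider_table = provider_features.merge(labels, on="Provider", validate="one_to_one")
provider_table["is_fraud"] = (provider_table["PotentialFraud"] == "Yes").astype(int)
print(provider_table)
```

Which aggregates are actually informative is an open question at this point; the sketch only shows the claim-to-provider reshaping.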

# Dataset

In [1]:
import pandas as pd

pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 1000)

In [2]:
# load the four CSV files (inpatient claims, outpatient claims, beneficiaries, provider labels)
inpatient = pd.read_csv("../data/Train_Inpatientdata-1542865627584.csv")
outpatient = pd.read_csv("../data/Train_Outpatientdata-1542865627584.csv")
beneficiary = pd.read_csv("../data/Train_Beneficiarydata-1542865627584.csv")
target = pd.read_csv("../data/Train-1542865627584.csv")

In [3]:
# how do they look?
display(inpatient.head(3))
display(outpatient.head(3))
display(beneficiary.head(3))
display(target.head(3))

Unnamed: 0,BeneID,ClaimID,ClaimStartDt,ClaimEndDt,Provider,InscClaimAmtReimbursed,AttendingPhysician,OperatingPhysician,OtherPhysician,AdmissionDt,ClmAdmitDiagnosisCode,DeductibleAmtPaid,DischargeDt,DiagnosisGroupCode,ClmDiagnosisCode_1,ClmDiagnosisCode_2,ClmDiagnosisCode_3,ClmDiagnosisCode_4,ClmDiagnosisCode_5,ClmDiagnosisCode_6,ClmDiagnosisCode_7,ClmDiagnosisCode_8,ClmDiagnosisCode_9,ClmDiagnosisCode_10,ClmProcedureCode_1,ClmProcedureCode_2,ClmProcedureCode_3,ClmProcedureCode_4,ClmProcedureCode_5,ClmProcedureCode_6
0,BENE11001,CLM46614,2009-04-12,2009-04-18,PRV55912,26000,PHY390922,,,2009-04-12,7866,1068.0,2009-04-18,201,1970,4019,5853,7843.0,2768,71590.0,2724.0,19889.0,5849.0,,,,,,,
1,BENE11001,CLM66048,2009-08-31,2009-09-02,PRV55907,5000,PHY318495,PHY318495,,2009-08-31,6186,1068.0,2009-09-02,750,6186,2948,56400,,,,,,,,7092.0,,,,,
2,BENE11001,CLM68358,2009-09-17,2009-09-20,PRV56046,5000,PHY372395,,PHY324689,2009-09-17,29590,1068.0,2009-09-20,883,29623,30390,71690,34590.0,V1581,32723.0,,,,,,,,,,


Unnamed: 0,BeneID,ClaimID,ClaimStartDt,ClaimEndDt,Provider,InscClaimAmtReimbursed,AttendingPhysician,OperatingPhysician,OtherPhysician,ClmDiagnosisCode_1,ClmDiagnosisCode_2,ClmDiagnosisCode_3,ClmDiagnosisCode_4,ClmDiagnosisCode_5,ClmDiagnosisCode_6,ClmDiagnosisCode_7,ClmDiagnosisCode_8,ClmDiagnosisCode_9,ClmDiagnosisCode_10,ClmProcedureCode_1,ClmProcedureCode_2,ClmProcedureCode_3,ClmProcedureCode_4,ClmProcedureCode_5,ClmProcedureCode_6,DeductibleAmtPaid,ClmAdmitDiagnosisCode
0,BENE11002,CLM624349,2009-10-11,2009-10-11,PRV56011,30,PHY326117,,,78943,V5866,V1272,,,,,,,,,,,,,,0,56409.0
1,BENE11003,CLM189947,2009-02-12,2009-02-12,PRV57610,80,PHY362868,,,6115,,,,,,,,,,,,,,,,0,79380.0
2,BENE11003,CLM438021,2009-06-27,2009-06-27,PRV57595,10,PHY328821,,,2723,,,,,,,,,,,,,,,,0,


Unnamed: 0,BeneID,DOB,DOD,Gender,Race,RenalDiseaseIndicator,State,County,NoOfMonths_PartACov,NoOfMonths_PartBCov,ChronicCond_Alzheimer,ChronicCond_Heartfailure,ChronicCond_KidneyDisease,ChronicCond_Cancer,ChronicCond_ObstrPulmonary,ChronicCond_Depression,ChronicCond_Diabetes,ChronicCond_IschemicHeart,ChronicCond_Osteoporasis,ChronicCond_rheumatoidarthritis,ChronicCond_stroke,IPAnnualReimbursementAmt,IPAnnualDeductibleAmt,OPAnnualReimbursementAmt,OPAnnualDeductibleAmt
0,BENE11001,1943-01-01,,1,1,0,39,230,12,12,1,2,1,2,2,1,1,1,2,1,1,36000,3204,60,70
1,BENE11002,1936-09-01,,2,1,0,39,280,12,12,2,2,2,2,2,2,2,2,2,2,2,0,0,30,50
2,BENE11003,1936-08-01,,1,1,0,52,590,12,12,1,2,2,2,2,2,2,1,2,2,2,0,0,90,40


Unnamed: 0,Provider,PotentialFraud
0,PRV51001,No
1,PRV51003,Yes
2,PRV51004,No
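
The previews above show many empty fields, especially in the physician, diagnosis-code and procedure-code columns. A quick way to quantify this is the share of missing values per column. The sketch below runs on a small made-up frame so that it stays self-contained; on the real data the same expression could be applied to `inpatient`, `outpatient` or `beneficiary`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a claims table; values are made up, column names follow the preview.
claims = pd.DataFrame({
    "ClaimID": ["CLM1", "CLM2", "CLM3", "CLM4"],
    "AttendingPhysician": ["PHY1", "PHY2", None, "PHY3"],
    "OperatingPhysician": [None, "PHY2", None, None],
    "ClmProcedureCode_1": [np.nan, 7092.0, np.nan, np.nan],
})

# Fraction of missing values per column, largest first.
missing_share = claims.isna().mean().sort_values(ascending=False)
print(missing_share)
```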


In [8]:
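# join all tables into one (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
# assumption: the isOutpatient flag seen in the output is set here (0 = inpatient, 1 = outpatient)
claims = pd.concat([inpatient.assign(isOutpatient=0), outpatient.assign(isOutpatient=1)],
                   ignore_index=True, sort=False)
train = claims.merge(beneficiary, on='BeneID').merge(target, on='Provider')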
display(train[train['BeneID'] == 'BENE11011'])

train.shape

Unnamed: 0,BeneID,ClaimID,ClaimStartDt,ClaimEndDt,Provider,InscClaimAmtReimbursed,AttendingPhysician,OperatingPhysician,OtherPhysician,AdmissionDt,ClmAdmitDiagnosisCode,DeductibleAmtPaid,DischargeDt,DiagnosisGroupCode,ClmDiagnosisCode_1,ClmDiagnosisCode_2,ClmDiagnosisCode_3,ClmDiagnosisCode_4,ClmDiagnosisCode_5,ClmDiagnosisCode_6,ClmDiagnosisCode_7,ClmDiagnosisCode_8,ClmDiagnosisCode_9,ClmDiagnosisCode_10,ClmProcedureCode_1,ClmProcedureCode_2,ClmProcedureCode_3,ClmProcedureCode_4,ClmProcedureCode_5,ClmProcedureCode_6,isOutpatient,DOB,DOD,Gender,Race,RenalDiseaseIndicator,State,County,NoOfMonths_PartACov,NoOfMonths_PartBCov,ChronicCond_Alzheimer,ChronicCond_Heartfailure,ChronicCond_KidneyDisease,ChronicCond_Cancer,ChronicCond_ObstrPulmonary,ChronicCond_Depression,ChronicCond_Diabetes,ChronicCond_IschemicHeart,ChronicCond_Osteoporasis,ChronicCond_rheumatoidarthritis,ChronicCond_stroke,IPAnnualReimbursementAmt,IPAnnualDeductibleAmt,OPAnnualReimbursementAmt,OPAnnualDeductibleAmt,PotentialFraud
370,BENE11011,CLM38412,2009-02-14,2009-02-22,PRV52405,5000,PHY369659,PHY392961,PHY349768,2009-02-14,431,1068.0,2009-02-22,67.0,43491,2762.0,7843,32723.0,V1041,4254.0,25062.0,40390.0,4019.0,,331.0,,,,,,0,1914-03-01,,2,2,0,1,360,12,12,2,1,1,2,2,1,1,2,2,1,1,5000,1068,250,320,No
459,BENE11011,CLM144521,2009-01-18,2009-01-18,PRV52314,50,PHY379398,,,,78900,0.0,,,78969,78701.0,V5866,59389.0,2449,,,,,,,,,,,,1,1914-03-01,,2,2,0,1,360,12,12,2,1,1,2,2,1,1,2,2,1,1,5000,1068,250,320,No
682,BENE11011,CLM347780,2009-05-08,2009-05-08,PRV51012,50,PHY429635,,PHY322331,,37611,0.0,,,37500,,,,,,,,,,,,,,,,1,1914-03-01,,2,2,0,1,360,12,12,2,1,1,2,2,1,1,2,2,1,1,5000,1068,250,320,No
730,BENE11011,CLM507201,2009-08-04,2009-08-04,PRV51063,80,PHY345842,,,,311,0.0,,,29633,2724.0,3009,,,,,,,,,,,,,,1,1914-03-01,,2,2,0,1,360,12,12,2,1,1,2,2,1,1,2,2,1,1,5000,1068,250,320,No


(558211, 56)
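
The join above relies on each `BeneID` appearing once in the beneficiary table and each `Provider` appearing once in the label table; otherwise claim rows would be silently duplicated. The following sketch shows one way to check this, using the `validate` argument of `DataFrame.merge` on made-up toy frames. Note that it uses left joins rather than the default inner join used above, so that unmatched claims remain visible as missing values instead of being dropped.

```python
import pandas as pd

# Toy frames shaped like the real tables; all values are made up.
claims = pd.DataFrame({
    "BeneID": ["BENE1", "BENE1", "BENE2"],
    "ClaimID": ["CLM1", "CLM2", "CLM3"],
    "Provider": ["PRV_A", "PRV_B", "PRV_A"],
})
beneficiary = pd.DataFrame({"BeneID": ["BENE1", "BENE2"], "Gender": [1, 2]})
target = pd.DataFrame({"Provider": ["PRV_A", "PRV_B"], "PotentialFraud": ["No", "Yes"]})

# validate="many_to_one" raises a MergeError if the right-hand keys are not unique.
merged = (
    claims
    .merge(beneficiary, on="BeneID", how="left", validate="many_to_one")
    .merge(target, on="Provider", how="left", validate="many_to_one")
)

# A many-to-one left join must not change the number of claim rows.
assert len(merged) == len(claims)

# Claims without a matching beneficiary or label would show up as NaN here.
unmatched = merged[["Gender", "PotentialFraud"]].isna().any(axis=1).sum()
print(len(merged), unmatched)
```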

In [5]:
train.to_csv('../data/train.csv', index=False)