In [15]:
import pandas as pd

df = pd.read_csv('insurance_claims_cleaned_fixed.csv', dtype={'column_name': 'string'})

We will look at incident hour of the day, insured occupation, and insured education level, and how each relates to reported fraud.

# Let's first look at incident hour of the day as it relates to fraud reported

In [16]:
# Fraud rate by hour of the day, shown as row proportions (each row sums to 1)
pd.crosstab(df['incident_hour_of_the_day'], df['fraud_reported'], normalize='index')


incident_hour_of_the_day,No,Yes
0,0.769231,0.230769
1,0.724138,0.275862
2,0.677419,0.322581
3,0.773585,0.226415
4,0.826087,0.173913
5,0.727273,0.272727
6,0.681818,0.318182
7,0.8,0.2
8,0.833333,0.166667
9,0.906977,0.093023
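
A normalized crosstab hides how many claims sit behind each row, so a rate built on a handful of claims looks as solid as one built on hundreds. The sketch below shows one way to report the claim count next to each rate. The `toy` frame and the helper name `fraud_rate_with_counts` are hypothetical; only the column names and the `Yes`/`No` labels come from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the claims file.
toy = pd.DataFrame({
    "incident_hour_of_the_day": [0, 0, 0, 0, 1, 1, 9, 9, 9, 9],
    "fraud_reported": ["No", "Yes", "No", "No", "Yes", "No", "No", "No", "No", "Yes"],
})

def fraud_rate_with_counts(frame, by):
    """Return claim count, fraud count, and fraud rate for each value of `by`."""
    is_fraud = frame["fraud_reported"].eq("Yes")
    return is_fraud.groupby(frame[by]).agg(
        claims="size",      # number of claims in the group
        fraud="sum",        # number of claims flagged "Yes"
        fraud_rate="mean",  # share of claims flagged "Yes"
    )

summary = fraud_rate_with_counts(toy, "incident_hour_of_the_day")
print(summary)
```

On the real data the same helper could be called as `fraud_rate_with_counts(df, 'incident_hour_of_the_day')`.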


# Summary: Fraud Rates by Hour of the Day

## Key Findings

### **Highest Risk Hours (Fraud Rate > 30%)**
- **Hour 11 (11 AM)**: **40.0%** fraud rate - *Highest overall*
- **Hour 14 (2 PM)**: **34.9%** fraud rate
- **Hour 18 (6 PM)**: **34.1%** fraud rate
- **Hour 15 (3 PM)**: **33.3%** fraud rate
- **Hour 16 (4 PM)**: **32.7%** fraud rate
- **Hour 10 (10 AM)**: **32.6%** fraud rate
- **Hour 2 (2 AM)**: **32.3%** fraud rate
- **Hour 6 (6 AM)**: **31.8%** fraud rate

### **Lowest Risk Hours (Fraud Rate < 20%)**
- **Hour 9 (9 AM)**: **9.3%** fraud rate - *Lowest overall*
- **Hour 22 (10 PM)**: **13.2%** fraud rate
- **Hour 17 (5 PM)**: **14.8%** fraud rate
- **Hour 8 (8 AM)**: **16.7%** fraud rate
- **Hour 4 (4 AM)**: **17.4%** fraud rate
- **Hour 13 (1 PM)**: **17.4%** fraud rate

## Temporal Patterns

### **Business Hours Analysis (9 AM - 5 PM)**
- **Morning Rush (9-11 AM)**: Mixed pattern - Hour 9 has lowest fraud (9.3%), but Hour 11 has highest (40.0%)
- **Lunch/Afternoon (12-3 PM)**: Mostly high fraud activity - Hours 14-15 show 33-35% fraud rates, although Hour 13 is low (17.4%)
- **Late Afternoon (4-5 PM)**: Hour 16 shows high fraud (32.7%), but Hour 17 drops significantly (14.8%)

### **Off-Hours Analysis**
- **Late Night/Early Morning (12-6 AM)**: Variable rates ranging from 17.4% to 32.3%
- **Evening Hours (6-11 PM)**: Wide spread, from 13.2% (Hour 22) to 34.1% (Hour 18)

## Strategic Implications

### **Red Flag Hours** ⚠️
Claims occurring at **11 AM, 2 PM, 3 PM, and 6 PM** have the four highest fraud rates (33% or more) and should receive enhanced scrutiny. Hours 2, 6, 10, and 16 also exceed 30%.

### **Low-Risk Hours** ✅
Claims at **9 AM, 5 PM, and 10 PM** have the lowest fraud rates (all below 15%) and may need only standard processing.

### **Operational Insights**
- The **11 AM peak** may reflect coordination with business operations or a specific fraudulent scheme, though the table alone cannot confirm either
- The **5 PM dip** despite high business activity may indicate mostly genuine claims for end-of-workday incidents
- **6 PM (Hour 18)** shows elevated fraud (34.1%), possibly targeting after-hours claim processing
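
Each hourly rate rests on only part of the dataset, so it helps to see how wide the uncertainty around a single rate is before reading much into peaks and dips. The sketch below computes a Wilson score interval using only the standard library. The counts passed in (4 fraud cases out of 43 claims) are an assumption, chosen because they reproduce the 9.3% rate for Hour 9; the real counts have to be read from the data.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical counts: 4 of 43 gives a 9.3% rate, matching Hour 9 in the table.
low, high = wilson_interval(4, 43)
print(f"rate={4 / 43:.3f}, interval=({low:.3f}, {high:.3f})")
```

If the intervals for two hours overlap heavily, the difference between their rates may be noise rather than a real pattern.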

### **Risk Assessment Score**
**Average fraud rate across all hours: 24.7%**
- Hours with rates **>35%**: Extreme risk
- Hours with rates **25-35%**: High risk  
- Hours with rates **15-25%**: Moderate risk
- Hours with rates **<15%**: Low risk
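
The tier cut-offs above can be applied mechanically with `pd.cut`. This is a minimal sketch using five of the hourly rates quoted in this summary; the variable names are illustrative, and sending a rate that falls exactly on a boundary to the higher tier is an assumption, since the list above does not say which side boundaries belong to.

```python
import pandas as pd

# Rates quoted in the hourly summary (Hours 9, 17, 4, 2, 11).
hour_rates = pd.Series({9: 0.093, 17: 0.148, 4: 0.174, 2: 0.323, 11: 0.400})

# Assumption: left-closed bins, so a rate of exactly 0.25 counts as "High".
bins = [0.0, 0.15, 0.25, 0.35, float("inf")]
labels = ["Low", "Moderate", "High", "Extreme"]
hour_tiers = pd.cut(hour_rates, bins=bins, labels=labels, right=False)
print(hour_tiers)
```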

In [17]:
# Now let's look at insured occupation against fraud reported
# Crosstab of row proportions (each row sums to 1)
pd.crosstab(df['insured_occupation'], df['fraud_reported'], normalize='index')

insured_occupation,No,Yes
adm-clerical,0.830769,0.169231
armed-forces,0.753623,0.246377
craft-repair,0.702703,0.297297
exec-managerial,0.631579,0.368421
farming-fishing,0.698113,0.301887
handlers-cleaners,0.796296,0.203704
machine-op-inspct,0.763441,0.236559
other-service,0.830986,0.169014
priv-house-serv,0.830986,0.169014
prof-specialty,0.788235,0.211765


# Summary: Fraud Rates by Insured Occupation

## Key Findings

### **Highest Risk Occupations (Fraud Rate > 29%)**
- **Executive/Managerial**: **36.8%** fraud rate - *Highest overall*
- **Farming/Fishing**: **30.2%** fraud rate
- **Craft/Repair**: **29.7%** fraud rate
- **Transport/Moving**: **29.2%** fraud rate

### **Moderate Risk Occupations (Fraud Rate 23-29%)**
- **Tech Support**: **28.2%** fraud rate
- **Sales**: **27.6%** fraud rate
- **Armed Forces**: **24.6%** fraud rate
- **Machine Operator/Inspector**: **23.7%** fraud rate

### **Lower Risk Occupations (Fraud Rate < 23%)**
- **Other Service**: **16.9%** fraud rate - *Tied for lowest overall*
- **Private Household Service**: **16.9%** fraud rate - *Tied for lowest overall*
- **Administrative/Clerical**: **16.9%** fraud rate (marginally higher at 16.92%)
- **Handlers/Cleaners**: **20.4%** fraud rate
- **Professional Specialty**: **21.2%** fraud rate
- **Protective Services**: **22.2%** fraud rate

## Occupational Risk Analysis

### **White-Collar vs Blue-Collar Pattern**
- **Executive/Managerial** roles show the **highest fraud rate (36.8%)**, suggesting potential for complex financial fraud schemes
- **Administrative/Clerical** workers have one of the **lowest rates (16.9%)**, which may indicate more straightforward, verifiable claims
- **Professional specialty** workers show moderate-low fraud (21.2%), possibly due to stable employment and income

### **Industry Sector Insights**

#### **High-Risk Sectors**
- **Construction/Repair (29.7%)**: Physical nature of work may enable staged accidents
- **Agriculture (30.2%)**: Rural/remote work environments may complicate claim verification
- **Transportation (29.2%)**: Vehicle-related incidents provide fraud opportunities
- **Executive Leadership (36.8%)**: Access to resources and knowledge for sophisticated schemes

#### **Low-Risk Sectors**
- **Administrative Support (16.9%)**: Office-based, documented work environments
- **Service Industries (16.9%)**: Customer-facing roles with witnesses
- **Protective Services (22.2%)**: Law enforcement background may deter fraud

### **Fraud Risk Tiers**

#### **🔴 HIGH RISK (>30% fraud rate)**
1. **Executive/Managerial**: 36.8%
2. **Farming/Fishing**: 30.2%

#### **🟡 ELEVATED RISK (25-30% fraud rate)**
3. **Craft/Repair**: 29.7%
4. **Transport/Moving**: 29.2%
5. **Tech Support**: 28.2%
6. **Sales**: 27.6%

#### **🟢 MODERATE RISK (20-25% fraud rate)**
7. **Armed Forces**: 24.6%
8. **Machine Operator/Inspector**: 23.7%
9. **Protective Services**: 22.2%
10. **Professional Specialty**: 21.2%
11. **Handlers/Cleaners**: 20.4%

#### **✅ LOW RISK (<20% fraud rate)**
12. **Administrative/Clerical**: 16.9%
13. **Other Service**: 16.9%
14. **Private Household Service**: 16.9%

## Strategic Implications

### **Risk-Based Processing**
- **Executive/Managerial claims** require enhanced financial investigation and documentation
- **Manual labor occupations** need careful accident scene analysis and medical verification
- **Administrative roles** can follow standard processing due to low fraud probability

### **Investigation Focus Areas**
- **High-income professionals**: Look for financial motive and sophisticated schemes
- **Physical labor jobs**: Verify injury mechanisms and workplace safety records
- **Transportation workers**: Cross-reference vehicle records and route documentation

### **Fraud Prevention Strategies**
- Implement **occupation-specific risk scores** in claim processing systems
- Develop **targeted verification protocols** for high-risk occupation categories
- Consider **industry expertise** when assigning claims investigators
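
One possible form of an occupation-specific risk score is the ratio of an occupation's fraud rate to a baseline rate. The sketch below is an illustration, not a production scoring rule. The rates are the rounded figures from this summary, the baseline is taken as the unweighted mean of those 14 rates, and the four keys not visible in the table output (`protective-serv`, `sales`, `tech-support`, `transport-moving`) are assumed spellings.

```python
# Rounded fraud rates from the occupation summary above.
OCCUPATION_FRAUD_RATE = {
    "exec-managerial": 0.368, "farming-fishing": 0.302, "craft-repair": 0.297,
    "transport-moving": 0.292, "tech-support": 0.282, "sales": 0.276,
    "armed-forces": 0.246, "machine-op-inspct": 0.237, "protective-serv": 0.222,
    "prof-specialty": 0.212, "handlers-cleaners": 0.204, "adm-clerical": 0.169,
    "other-service": 0.169, "priv-house-serv": 0.169,
}

# Assumption: baseline = unweighted mean of the occupation rates.
BASELINE = sum(OCCUPATION_FRAUD_RATE.values()) / len(OCCUPATION_FRAUD_RATE)

def relative_risk(occupation, default=1.0):
    """Return rate / baseline; unknown occupations get a neutral score."""
    rate = OCCUPATION_FRAUD_RATE.get(occupation)
    if rate is None:
        return default
    return rate / BASELINE

for name in ("exec-managerial", "adm-clerical", "unknown-job"):
    print(name, round(relative_risk(name), 2))
```

A score above 1 marks an occupation with a higher-than-baseline fraud rate, and a score below 1 marks a lower one.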

**Unweighted average of the 14 occupation fraud rates above: 24.6%**

In [18]:
# Finally, let's look at insured education level against fraud reported

pd.crosstab(df['insured_education_level'], df['fraud_reported'], normalize='index')

insured_education_level,No,Yes
Associate,0.765517,0.234483
College,0.737705,0.262295
High School,0.775,0.225
JD,0.73913,0.26087
MD,0.736111,0.263889
Masters,0.776224,0.223776
PhD,0.736,0.264


# Summary: Fraud Rates by Insured Education Level

## Key Findings

### **Education Level Fraud Distribution**
- **PhD**: **26.4%** fraud rate - *Highest overall*
- **MD**: **26.4%** fraud rate
- **College**: **26.2%** fraud rate
- **JD**: **26.1%** fraud rate
- **Associate**: **23.4%** fraud rate
- **High School**: **22.5%** fraud rate
- **Masters**: **22.4%** fraud rate - *Lowest overall*

## Education-Based Risk Analysis

### **Surprising Pattern: Fraud Rate Does Not Fall with Education**
Contrary to expectations, **most higher education levels show slightly elevated fraud rates**, with Masters as the exception:
- **Advanced Professional Degrees (PhD, MD, JD)**: ~26% fraud rate
- **Undergraduate Education (College)**: 26.2% fraud rate
- **Graduate Degrees (Masters)**: 22.4% fraud rate, the lowest of any level
- **Basic Education (High School, Associate)**: ~22-23% fraud rate

### **Risk Tier Classification**

#### **🔴 HIGHEST RISK (>26% fraud rate)**
1. **PhD**: 26.4%
2. **MD**: 26.4%
3. **College**: 26.2%
4. **JD**: 26.1%

#### **🟢 MODERATE RISK (22-24% fraud rate)**
5. **Associate**: 23.4%
6. **High School**: 22.5%
7. **Masters**: 22.4%

## Strategic Insights

### **Counter-Intuitive Findings**
- **PhD holders** show the **highest fraud propensity** at 26.4%
- **Masters degree holders** show the **lowest fraud rate** at 22.4%
- Only a **4 percentage point difference** between the highest and lowest education levels (much smaller than the occupation spread)

### **Professional Degree Analysis**
- **Medical Doctors (MD)**: 26.4% - May exploit medical knowledge for fraudulent claims
- **Lawyers (JD)**: 26.1% - Legal expertise could enable sophisticated fraud schemes
- **PhD holders**: 26.4% - Academic training may facilitate complex deception strategies

### **Education Paradox Explained**
Possible, untested explanations for the slightly higher rates among some educated groups:
1. **Sophistication**: Advanced education enables more complex fraud schemes
2. **Overconfidence**: Higher education may lead to belief they can outsmart detection systems
3. **Financial Pressure**: Higher education often correlates with higher lifestyle expectations and debt
4. **Knowledge Advantage**: Understanding of insurance processes and legal loopholes

### **Risk Assessment Implications**

#### **Low Variability**: 
- Education shows **much less predictive power** than occupation or time-of-day
- **4 percentage point range** vs **20 percentage point range** for occupation
- Should be considered a **secondary risk factor**
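
The range comparison can be checked directly from the crosstab values. The sketch below uses the seven education rates from the table in this section and the two occupation extremes (Executive/Managerial and Other Service); the helper name `spread_in_points` is illustrative.

```python
# Fraud rates ("Yes" column) copied from the education crosstab output.
education_rates = {
    "Associate": 0.234483, "College": 0.262295, "High School": 0.225,
    "JD": 0.26087, "MD": 0.263889, "Masters": 0.223776, "PhD": 0.264,
}

# Only the highest and lowest occupation rates are needed for the spread.
occupation_extremes = {"exec-managerial": 0.368421, "other-service": 0.169014}

def spread_in_points(rates):
    """Difference between the highest and lowest rate, in percentage points."""
    values = list(rates.values())
    return (max(values) - min(values)) * 100

print(round(spread_in_points(education_rates), 1))      # 4.0
print(round(spread_in_points(occupation_extremes), 1))  # 19.9
```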

#### **Investigation Strategy**:
- **PhD/MD/JD claims**: Look for overly sophisticated explanations or medical/legal technicalities
- **College graduates**: Enhanced verification
- **Masters/High School**: Normal processing protocols

### **Fraud Prevention Recommendations**
- **Education should NOT be primary risk factor** due to small variance
- Focus resources on **occupation and temporal patterns** instead
- Consider education as **fraud sophistication indicator** rather than likelihood predictor
- Train investigators to recognize **credential-based manipulation tactics**

**Unweighted average of the seven education-level fraud rates: 24.8%**
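
An "average fraud rate" over groups can mean either the mean of the per-group rates or the overall share of fraudulent claims, and the two differ whenever group sizes differ. The sketch below illustrates the gap on a small, deliberately unbalanced toy frame; the data is hypothetical, and only the column names and labels match the notebook.

```python
import pandas as pd

# Hypothetical toy data: 2 PhD claims (1 fraud) and 8 Masters claims (1 fraud).
toy_edu = pd.DataFrame({
    "insured_education_level": ["PhD"] * 2 + ["Masters"] * 8,
    "fraud_reported": ["Yes", "No"] + ["Yes"] + ["No"] * 7,
})

is_fraud = toy_edu["fraud_reported"].eq("Yes")
group_rates = is_fraud.groupby(toy_edu["insured_education_level"]).mean()

unweighted_mean = group_rates.mean()  # every group counts equally
overall_rate = is_fraud.mean()        # every claim counts equally

print(group_rates)
print("unweighted mean of group rates:", unweighted_mean)
print("overall fraud rate:", overall_rate)
```

On the real data, `df['fraud_reported'].eq('Yes').mean()` would give the overall rate, which is the same number no matter which column the claims are grouped by.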

### **Key Takeaway**
Education level shows **minimal predictive value** for fraud likelihood but may indicate **fraud complexity and sophistication** when fraud does occur.