
# Case Study: Policyholder Data Transformation in the Insurance Industry Using NumPy

## Overview:
An insurance company stores policyholder data, including policy number, premium amount, insured amount, and claim amount.

Due to a data migration issue, the "Premium Amount" and "Claim Amount" columns have been swapped.

Your task is to correct this misalignment using NumPy.

## Learning Objectives:
- Understand how to manipulate structured data using **NumPy**.
- Perform **column swapping** in a NumPy array.
- Gain insights into **real-world insurance data handling**.

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Step 1: Create a sample dataset representing insurance policyholder records

# Columns: Policy Number, Premium Amount, Insured Amount, Claim Amount
# After the faulty migration, the Premium Amount and Claim Amount values are swapped.

policy_data = np.array([
    [1001, 12000, 500000, 5000],
    [1002, 15000, 750000, 7000],
    [1003, 18000, 1000000, 10000],
    [1004, 20000, 1200000, 12000],
    [1005, 25000, 1500000, 15000]
])

col = ["Policy Number", "Premium Amount", "Insured Amount", "Claim Amount"]

# Print Original Dataset
print("Original Policyholder Data:")
print()
df_before = pd.DataFrame(policy_data, columns=col)
print(df_before)

Original Policyholder Data:

   Policy Number  Premium Amount  Insured Amount  Claim Amount
0           1001           12000          500000          5000
1           1002           15000          750000          7000
2           1003           18000         1000000         10000
3           1004           20000         1200000         12000
4           1005           25000         1500000         15000


In [3]:
# Step 2: Swap the columns (Premium Amount and Claim Amount)

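# Column indices: 1 (Premium Amount) and 3 (Claim Amount)
# Indexing with a list (NumPy advanced indexing, not basic slicing) makes the
# right-hand side a copy of columns 3 and 1, so assigning it to columns 1 and 3
# swaps them safely in a single statement.
policy_data[:, [1, 3]] = policy_data[:, [3, 1]]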

# Print Corrected Dataset

print("\nCorrected Policyholder Data (After Swapping Premium and Claim Amount Columns):")
print()

df_after = pd.DataFrame(policy_data, columns=col)
print(df_after)




Corrected Policyholder Data (After Swapping Premium and Claim Amount Columns):

   Policy Number  Premium Amount  Insured Amount  Claim Amount
0           1001            5000          500000         12000
1           1002            7000          750000         15000
2           1003           10000         1000000         18000
3           1004           12000         1200000         20000
4           1005           15000         1500000         25000
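
The one-line swap in Step 2 relies on a detail of NumPy indexing that is easy to miss: indexing with a list returns a copy, while basic slices such as `arr[:, 1]` return views into the same memory. The sketch below, using a two-row subset of the data and illustrative variable names (`swapped`, `broken`, `reordered`) that are not part of the original notebook, contrasts the working swap with a tuple-style swap that silently corrupts the data.

```python
import numpy as np

# Two-row subset of the migrated data (premium and claim are swapped)
data = np.array([
    [1001, 12000, 500000, 5000],
    [1002, 15000, 750000, 7000],
])

# Advanced (list) indexing on the right-hand side returns a copy,
# so both columns are read before anything is written.
swapped = data.copy()
swapped[:, [1, 3]] = swapped[:, [3, 1]]

# Basic slices are views. A Python-style tuple swap overwrites column 1
# first, then copies the already-overwritten column 1 into column 3.
broken = data.copy()
broken[:, 1], broken[:, 3] = broken[:, 3], broken[:, 1]

# Non-mutating alternative: build a new array with the columns reordered.
reordered = data[:, [0, 3, 2, 1]]

print(swapped[:, [1, 3]].tolist())  # [[5000, 12000], [7000, 15000]]
print(broken[:, [1, 3]].tolist())   # [[5000, 5000], [7000, 7000]]
```

The `reordered` form leaves the original array untouched, which can be preferable when the uncorrected data must be kept for auditing.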


In [4]:
## Real-World Impact:
## - Insurance companies rely on large datasets for premium calculations and claim processing.
## - Misplaced columns can lead to incorrect financial reporting.
## - Correcting the column order helps ensure **data accuracy** before the data is fed into analytics or decision-making systems.
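
One way to back up the data-accuracy point is a basic sanity check before the array is passed downstream. The sketch below is a minimal example, not a method from the original notebook: the helper `find_invalid_rows` and its two rules (all amounts positive, claim not exceeding the insured amount) are assumptions chosen for illustration, and real validation rules would come from the business.

```python
import numpy as np

# Corrected records: policy number, premium, insured amount, claim amount
policy_data = np.array([
    [1001, 5000, 500000, 12000],
    [1002, 7000, 750000, 15000],
    [1003, 10000, 1000000, 18000],
    [1004, 12000, 1200000, 20000],
    [1005, 15000, 1500000, 25000],
])

PREMIUM, INSURED, CLAIM = 1, 2, 3  # column indices used in this case study


def find_invalid_rows(data):
    """Return indices of rows that break the assumed sanity rules."""
    # Assumed rule 1: every monetary amount is strictly positive.
    amounts_positive = (data[:, [PREMIUM, INSURED, CLAIM]] > 0).all(axis=1)
    # Assumed rule 2: a claim cannot exceed the insured amount.
    claim_within_cover = data[:, CLAIM] <= data[:, INSURED]
    return np.flatnonzero(~(amounts_positive & claim_within_cover))


invalid = find_invalid_rows(policy_data)
print(invalid.tolist())  # []
```

Note that these particular rules would not have detected the swapped columns in this dataset, since both orderings satisfy them; they only catch values that are implausible on their own.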


In [5]:
# Additional Task: Calculate Total Claim Amount Processed
total_claim_amount = np.sum(policy_data[:, 3])
print(f"\nTotal Claim Amount Processed: Rs {total_claim_amount}")



Total Claim Amount Processed: Rs 90000
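
The same idea extends to all monetary columns at once. As a small sketch, assuming the corrected array from Step 2, summing along `axis=0` returns one total per column; the names `totals` and `labels` are illustrative.

```python
import numpy as np

# Corrected records: policy number, premium, insured amount, claim amount
policy_data = np.array([
    [1001, 5000, 500000, 12000],
    [1002, 7000, 750000, 15000],
    [1003, 10000, 1000000, 18000],
    [1004, 12000, 1200000, 20000],
    [1005, 15000, 1500000, 25000],
])

# Skip column 0 (policy number) and sum each remaining column down the rows.
totals = policy_data[:, 1:].sum(axis=0)

labels = ["Premium Amount", "Insured Amount", "Claim Amount"]
for label, total in zip(labels, totals.tolist()):
    print(f"Total {label}: Rs {total}")
# Total Premium Amount: Rs 49000
# Total Insured Amount: Rs 4950000
# Total Claim Amount: Rs 90000
```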
