# ICIS Claim Data Processing Tutorial

## Step-by-Step Guide for ICIS Claim Data Processing

**Author**: Seokhoon Joo  

## Table of Contents
* [1. Setup and Data Loading](#1.-Setup-and-Data-Loading)
    * [1.1 Import Required Libraries](#1.1-Import-Required-Libraries)
    * [1.2 Load ICIS Claim Data](#1.2-Load-ICIS-Claim-Data)
    * [1.3 Load Main Disease Classification Data](#1.3-Load-Main-Disease-Classification-Data)
    * [1.4 Initialize ICIS Processor](#1.4-Initialize-ICIS-Processor)
* [2. Step-by-Step Processing](#2.-Step-by-Step-Processing)
    * [2.1 Data Validation](#2.1-Data-Validation)
    * [2.2 Data Cleansing](#2.2-Data-Cleansing)
    * [2.3 Data Preparation](#2.3-Data-Preparation)
    * [2.4 Data Calculations](#2.4-Data-Calculations)
    * [2.5 Merge Calculated Data](#2.5-Merge-Calculated-Data)
* [3. Complete Pipeline Processing](#3.-Complete-Pipeline-Processing)
    * [3.1 Pipeline Execution](#3.1-Pipeline-Execution)
    * [3.2 Results Validation](#3.2-Results-Validation)
    * [Appendix: Error Handling](#Appendix:-Error-Handling)

## 1. Setup and Data Loading

### 1.1 Import Required Libraries

In [2]:
import pandas as pd
from underwriter.icis import ICIS

### 1.2 Load ICIS Claim Data

In [3]:
claim = pd.read_csv('data/claim.csv')
print("Initial claim data:")
print("Shape:", claim.shape)
print("\nColumns:", claim.columns.tolist())
print("\nFirst few rows:")
display(claim.head())

Initial claim data:
Shape: (12, 14)

Columns: ['id', 'kcd0', 'kcd1', 'kcd2', 'kcd3', 'kcd4', 'inq_date', 'clm_date', 'hos_sdate', 'hos_edate', 'hos_day', 'hos_cnt', 'out_cnt', 'sur_cnt']

First few rows:


Unnamed: 0,id,kcd0,kcd1,kcd2,kcd3,kcd4,inq_date,clm_date,hos_sdate,hos_edate,hos_day,hos_cnt,out_cnt,sur_cnt
0,100000001,,C50,,,,20250101,20150102,20150102.0,20150102.0,0,0,1,0
1,100000001,,M51,,C44,,20250101,20150102,20150102.0,20150108.0,4,1,0,0
2,100000001,,M51,,C44,,20250101,20150102,20150102.0,20150108.0,4,1,0,0
3,100000001,S33,G551,,,,20250101,20150102,20150102.0,20150105.0,0,2,2,0
4,100000001,M512,,,,,20250101,20200901,,,0,0,0,2
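
The raw file stores dates as `YYYYMMDD` numbers, and `hos_sdate`/`hos_edate` load as floats because some values are missing, while later outputs in this tutorial show them as proper dates. The following is a minimal sketch of one way such values could be converted in plain pandas. The helper name `parse_yyyymmdd` is hypothetical and not part of `underwriter.icis`, whose internal conversion may differ.

```python
import pandas as pd

def parse_yyyymmdd(values: pd.Series) -> pd.Series:
    """Convert YYYYMMDD values (int, float, or NaN) to datetimes; missing values become NaT."""
    # Go through the nullable integer dtype so 20150102.0 becomes "20150102", not "20150102.0".
    # Missing values turn into the text "<NA>", which errors="coerce" maps to NaT.
    as_text = values.astype("Int64").astype(str)
    return pd.to_datetime(as_text, format="%Y%m%d", errors="coerce")

# Toy data mirroring the column types seen in the raw claim file.
sample = pd.DataFrame({
    "clm_date": [20150102, 20200901],
    "hos_sdate": [20150102.0, float("nan")],
})
parsed = sample.apply(parse_yyyymmdd)
print(parsed)
```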


### 1.3 Load Main Disease Classification Data

In [4]:
main = pd.read_csv('data/main.csv')
print("\nMain disease classification data:")
print("Shape:", main.shape)
print("\nColumns:", main.columns.tolist())
print("\nFirst few rows:")
display(main.head())


Main disease classification data:
Shape: (11, 3)

Columns: ['kcd', 'kcd_main', 'sub_chk']

First few rows:


Unnamed: 0,kcd,kcd_main,sub_chk
0,C50,C50,1
1,C73,C73,1
2,D12,D12,1
3,K20,K20,0
4,M51,M51,1


### 1.4 Initialize ICIS Processor

In [5]:
icis = ICIS(claim=claim, main=main)

## 2. Step-by-Step Processing

### 2.1 Data Validation

In [6]:
print("\n2.1 Data Validation")
print("-----------------")
try:
    icis.validate_columns()
    print("✓ Column validation successful")
except ValueError as e:
    print(f"✗ Validation error: {e}")


2.1 Data Validation
-----------------
✓ Column validation successful
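
To illustrate what a column check of this kind involves, here is a minimal sketch in plain pandas. The function `check_required_columns` and its message wording are assumptions made for illustration; they are not the implementation of `icis.validate_columns()`, although the appendix of this tutorial shows that the real method also raises a `ValueError` that lists the missing columns.

```python
import pandas as pd

def check_required_columns(df: pd.DataFrame, required: list) -> None:
    """Raise ValueError naming every required column that df lacks (hypothetical helper)."""
    missing = sorted(set(required) - set(df.columns))
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

required = ["id", "kcd0"]
frame = pd.DataFrame(columns=["id", "kcd0", "clm_date"])

check_required_columns(frame, required)  # all required columns present: returns silently

try:
    check_required_columns(frame.drop(columns=["id"]), required)
except ValueError as err:
    message = str(err)
    print(message)
```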


### 2.2 Data Cleansing

In [7]:
print("\n2.2 Data Cleansing")
print("----------------")

print("• Initial claim data shape:", icis.claim.shape)
display(icis.claim.head())

print("\n1) Removing duplicates...")
icis.drop_duplicates()
print("• Shape after deduplication:", icis.claim.shape)
display(icis.claim.head())

print("\n2) Forward filling KCD codes...")
icis.fill_kcd_forward()
print("• Shape after forward fill:", icis.filled.shape)
display(icis.filled.head())


2.2 Data Cleansing
----------------
• Initial claim data shape: (12, 14)


Unnamed: 0,id,kcd0,kcd1,kcd2,kcd3,kcd4,inq_date,clm_date,hos_sdate,hos_edate,hos_day,hos_cnt,out_cnt,sur_cnt
0,100000001,,C50,,,,2025-01-01,2015-01-02,2015-01-02,2015-01-02,0,0,1,0
1,100000001,,M51,,C44,,2025-01-01,2015-01-02,2015-01-02,2015-01-08,4,1,0,0
2,100000001,,M51,,C44,,2025-01-01,2015-01-02,2015-01-02,2015-01-08,4,1,0,0
3,100000001,S33,G551,,,,2025-01-01,2015-01-02,2015-01-02,2015-01-05,0,2,2,0
4,100000001,M512,,,,,2025-01-01,2020-09-01,NaT,NaT,0,0,0,2



1) Removing duplicates...
• Shape after deduplication: (11, 14)


Unnamed: 0,id,kcd0,kcd1,kcd2,kcd3,kcd4,inq_date,clm_date,hos_sdate,hos_edate,hos_day,hos_cnt,out_cnt,sur_cnt
0,100000001,,C50,,,,2025-01-01,2015-01-02,2015-01-02,2015-01-02,0,0,1,0
1,100000001,,M51,,C44,,2025-01-01,2015-01-02,2015-01-02,2015-01-08,4,1,0,0
2,100000001,S33,G551,,,,2025-01-01,2015-01-02,2015-01-02,2015-01-05,0,2,2,0
3,100000001,M512,,,,,2025-01-01,2020-09-01,NaT,NaT,0,0,0,2
4,100000001,S33,M54,M513,,,2025-01-01,2022-08-02,2022-08-02,2022-08-06,5,0,0,0



2) Forward filling KCD codes...
• Shape after forward fill: (11, 14)


Unnamed: 0,id,kcd0,kcd1,kcd2,kcd3,kcd4,inq_date,clm_date,hos_sdate,hos_edate,hos_day,hos_cnt,out_cnt,sur_cnt
0,100000001,C50,,,,,2025-01-01,2015-01-02,2015-01-02,2015-01-02,0,0,1,0
1,100000001,M51,C44,,,,2025-01-01,2015-01-02,2015-01-02,2015-01-08,4,1,0,0
2,100000001,S33,G551,,,,2025-01-01,2015-01-02,2015-01-02,2015-01-05,0,2,2,0
3,100000001,M512,,,,,2025-01-01,2020-09-01,NaT,NaT,0,0,0,2
4,100000001,S33,M54,M513,,,2025-01-01,2022-08-02,2022-08-02,2022-08-06,5,0,0,0
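
Judging by the output above, `fill_kcd_forward()` moves the non-missing codes of each row into the leftmost `kcd` columns (for example, `,M51,,C44,` becomes `M51,C44,,,`). The sketch below reproduces that visible effect with plain pandas. It is an assumption based only on the displayed rows, and `compact_left` is a hypothetical helper, not the library's code.

```python
import numpy as np
import pandas as pd

KCD_COLUMNS = ["kcd0", "kcd1", "kcd2", "kcd3", "kcd4"]

def compact_left(row: pd.Series) -> pd.Series:
    """Shift the non-missing codes of one row to the leftmost columns, keeping their order."""
    codes = row.dropna().tolist()
    padded = codes + [np.nan] * (len(row) - len(codes))
    return pd.Series(padded, index=row.index, dtype="object")

# Two rows taken from the deduplicated claim data shown above.
codes = pd.DataFrame(
    [
        [np.nan, "C50", np.nan, np.nan, np.nan],
        [np.nan, "M51", np.nan, "C44", np.nan],
    ],
    columns=KCD_COLUMNS,
)
compacted = codes.apply(compact_left, axis=1)
print(compacted)
```

A row-wise `apply` is easy to read but slow on large tables; a vectorized approach would be preferable for production-sized data.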


### 2.3 Data Preparation

In [8]:
print("\n2.3 Data Preparation")
print("------------------")

print("1) Setting medical care types...")
icis.set_type()
print("• Data with medical care types:")
display(icis.filled[['id', 'clm_date', 'type']].head())

print("\n2) Modifying hospital end dates...")
icis.set_hos_edate_mod()
print("• Data with modified hospital end dates:")
display(icis.filled[['id', 'hos_edate', 'hos_edate_mod']].head())

print("\n3) Converting to long format...")
icis.melt()
print("• Melted data shape:", icis.melted.shape)
display(icis.melted.head())

print("\n4) Processing KCD information...")
icis.set_sub_kcd()
icis.merge_main_info()
icis.filter_sub_kcd()
print("• Shape after KCD processing:", icis.melted.shape)
display(icis.melted.head())


2.3 Data Preparation
------------------
1) Setting medical care types...
• Data with medical care types:


Unnamed: 0,id,clm_date,type
0,100000001,2015-01-02,out
1,100000001,2015-01-02,hos
2,100000001,2015-01-02,out
3,100000001,2020-09-01,sur
4,100000001,2022-08-02,hos



2) Modifying hospital end dates...
• Data with modified hospital end dates:


Unnamed: 0,id,hos_edate,hos_edate_mod
0,100000001,2015-01-02,2015-01-02
1,100000001,2015-01-08,2015-01-05
2,100000001,2015-01-05,2015-01-05
3,100000001,NaT,NaT
4,100000001,2022-08-06,2022-08-06



3) Converting to long format...
• Melted data shape: (23, 13)


Unnamed: 0,id,inq_date,clm_date,hos_sdate,hos_edate,hos_edate_mod,hos_day,hos_cnt,out_cnt,sur_cnt,type,kcd_ord,kcd
0,100000001,2025-01-01,2015-01-02,2015-01-02,2015-01-02,2015-01-02,0,0,1,0,out,0,C50
1,100000001,2025-01-01,2015-01-02,2015-01-02,2015-01-08,2015-01-05,4,1,0,0,hos,0,M51
2,100000001,2025-01-01,2015-01-02,2015-01-02,2015-01-05,2015-01-05,0,2,2,0,out,0,S33
3,100000001,2025-01-01,2020-09-01,NaT,NaT,NaT,0,0,0,2,sur,0,M512
4,100000001,2025-01-01,2022-08-02,2022-08-02,2022-08-06,2022-08-06,5,0,0,0,hos,0,S33



4) Processing KCD information...
• Shape after KCD processing: (21, 16)


Unnamed: 0,id,inq_date,clm_date,hos_sdate,hos_edate,hos_edate_mod,hos_day,hos_cnt,out_cnt,sur_cnt,type,kcd_ord,kcd,sub_kcd,kcd_main,sub_chk
0,100000001,2025-01-01,2015-01-02,2015-01-02,2015-01-02,2015-01-02,0,0,1,0,out,0,C50,0,C50,1.0
1,100000001,2025-01-01,2015-01-02,2015-01-02,2015-01-08,2015-01-05,4,1,0,0,hos,0,M51,0,M51,1.0
2,100000001,2025-01-01,2015-01-02,2015-01-02,2015-01-05,2015-01-05,0,2,2,0,out,0,S33,0,S33,1.0
3,100000001,2025-01-01,2020-09-01,NaT,NaT,NaT,0,0,0,2,sur,0,M512,0,M51,1.0
4,100000001,2025-01-01,2022-08-02,2022-08-02,2022-08-06,2022-08-06,5,0,0,0,hos,0,S33,0,S33,1.0
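
The wide-to-long conversion in step 3 can be approximated with `DataFrame.melt`. The sketch below is a simplified illustration on toy data, assuming that empty codes are dropped and that `kcd_ord` is the numeric suffix of the original column name; the actual `icis.melt()` may order rows or handle columns differently.

```python
import numpy as np
import pandas as pd

# Toy wide-format data: one row per claim, one column per diagnosis slot.
wide = pd.DataFrame({
    "id": [100000001, 100000001],
    "clm_date": pd.to_datetime(["2015-01-02", "2022-08-02"]),
    "kcd0": ["M51", "S33"],
    "kcd1": ["C44", "M54"],
    "kcd2": [np.nan, "M513"],
})

long_form = (
    wide.melt(
        id_vars=["id", "clm_date"],
        value_vars=["kcd0", "kcd1", "kcd2"],
        var_name="kcd_ord",
        value_name="kcd",
    )
    .dropna(subset=["kcd"])  # drop empty diagnosis slots
    # Turn "kcd0", "kcd1", ... into integer positions 0, 1, ...
    .assign(kcd_ord=lambda d: d["kcd_ord"].str.replace("kcd", "", regex=False).astype(int))
    .sort_values(["id", "clm_date", "kcd_ord"])
    .reset_index(drop=True)
)
print(long_form)
```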


### 2.4 Data Calculations

In [8]:
print("\n2.4 Data Calculations")
print("------------------")

print("1) Setting date ranges...")
icis.set_date_range()

print("\n2) Calculating hospitalization days...")
icis.calc_hos_day()
print("• Hospitalized data shape:", icis.hospitalized.shape)
display(icis.hospitalized.head())

print("\n3) Calculating surgery counts...")
icis.calc_sur_cnt()
print("• Surgery data shape:", icis.underwent.shape)
display(icis.underwent.head())

print("\n4) Calculating elapsed days...")
icis.calc_elp_day()
print("• Elapsed days data shape:", icis.elapsed.shape)
display(icis.elapsed.head())


2.4 Data Calculations
------------------
1) Setting date ranges...

2) Calculating hospitalization days...
• Hospitalized data shape: (4, 3)


Unnamed: 0,id,kcd_main,hos_day
0,100000001,D12,1
1,100000001,M51,8
2,100000001,M54,5
3,100000001,S33,5



3) Calculating surgery counts...
• Surgery data shape: (1, 3)


Unnamed: 0,id,kcd_main,sur_cnt
0,100000001,M51,2



4) Calculating elapsed days...
• Elapsed days data shape: (7, 4)


Unnamed: 0,id,kcd_main,elp_day_si,elp_day_std
0,100000001,C50,,3653.0
1,100000001,C73,,246.0
2,100000001,D12,882.0,882.0
3,100000001,M51,328.0,246.0
4,100000001,M54,880.0,880.0
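
The tutorial does not show how `calc_hos_day()` arrives at its totals. One common technique for counting hospitalization days without double-counting overlapping stays is to expand each stay into calendar days and count the distinct days per group. The sketch below demonstrates that technique on toy data; treating both the start and end date as hospitalized days is an assumption, and the library may apply different rules.

```python
import pandas as pd

# Toy stays: the two M51 stays overlap on 2015-01-04 and 2015-01-05.
stays = pd.DataFrame({
    "id": [1, 1, 1],
    "kcd_main": ["M51", "M51", "S33"],
    "hos_sdate": pd.to_datetime(["2015-01-02", "2015-01-04", "2022-08-02"]),
    "hos_edate": pd.to_datetime(["2015-01-05", "2015-01-08", "2022-08-06"]),
})

# One row per calendar day of each stay (both endpoints included).
daily = pd.DataFrame(
    [
        (row.id, row.kcd_main, day)
        for row in stays.itertuples(index=False)
        for day in pd.date_range(row.hos_sdate, row.hos_edate, freq="D")
    ],
    columns=["id", "kcd_main", "day"],
)

# Overlapping stays share calendar days, so count distinct days per group.
hos_days = (
    daily.groupby(["id", "kcd_main"])["day"]
    .nunique()
    .rename("hos_day")
    .reset_index()
)
print(hos_days)
```

Here the two M51 stays cover 2015-01-02 through 2015-01-08, so they contribute 7 distinct days rather than the 9 obtained by summing the stays separately.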


### 2.5 Merge Calculated Data

In [9]:
print("\n2.5 Final Merge")
print("-------------")
step_result = icis.merge_calculated()
print("• Final result shape:", step_result.shape)
print("• Final columns:", step_result.columns.tolist())
display(step_result.head())


2.5 Final Merge
-------------
• Final result shape: (7, 6)
• Final columns: ['id', 'kcd_main', 'hos_day', 'sur_cnt', 'elp_day_si', 'elp_day_std']


Unnamed: 0,id,kcd_main,hos_day,sur_cnt,elp_day_si,elp_day_std
0,100000001,C50,0.0,0.0,,3653.0
1,100000001,C73,0.0,0.0,,246.0
2,100000001,D12,1.0,0.0,882.0,882.0
3,100000001,M51,8.0,2.0,328.0,246.0
4,100000001,M54,5.0,0.0,880.0,880.0
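
The merged output above keeps every `(id, kcd_main)` pair, shows `0.0` where a disease has no hospitalization or surgery record, and leaves missing elapsed days empty. A minimal sketch of a merge with that visible behavior is given below, using outer joins on toy tables. The fill rule is inferred from the displayed output and is not confirmed to be how `merge_calculated()` works.

```python
from functools import reduce

import pandas as pd

# Toy per-disease summaries (values are illustrative only).
hospitalized = pd.DataFrame({"id": [1, 1], "kcd_main": ["M51", "S33"], "hos_day": [8, 5]})
underwent = pd.DataFrame({"id": [1], "kcd_main": ["M51"], "sur_cnt": [2]})
elapsed = pd.DataFrame({
    "id": [1, 1, 1],
    "kcd_main": ["C50", "M51", "S33"],
    "elp_day_std": [100.0, 200.0, 300.0],
})

keys = ["id", "kcd_main"]

# Chain outer joins so no (id, kcd_main) pair from any table is lost.
merged = reduce(
    lambda left, right: left.merge(right, on=keys, how="outer"),
    [hospitalized, underwent, elapsed],
)

# A missing count means "no such event", so fill counts with 0; elapsed days stay as-is.
count_columns = ["hos_day", "sur_cnt"]
merged[count_columns] = merged[count_columns].fillna(0)
merged = merged.sort_values(keys).reset_index(drop=True)
print(merged)
```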


## 3. Complete Pipeline Processing

### 3.1 Pipeline Execution

In [10]:
print("\n3.1 Pipeline Execution")
print("--------------------")

# Initialize a new ICIS instance from the same claim and main data
icis_pipeline = ICIS(claim=claim, main=main)

try:
    # Process ICIS claim data using complete pipeline
    print("Processing ICIS claim data using icis.process()...")
    pipeline_result = icis_pipeline.process()
    print("\n✓ Processing completed successfully!")
    print("• Final result shape:", pipeline_result.shape)
    print("\nFirst few rows of the result:")
    display(pipeline_result.head())
except Exception as e:
    print(f"\n✗ Processing failed: {e}")


3.1 Pipeline Execution
--------------------
Processing ICIS claim data using icis.process()...

✓ Processing completed successfully!
• Final result shape: (7, 6)

First few rows of the result:


Unnamed: 0,id,kcd_main,hos_day,sur_cnt,elp_day_si,elp_day_std
0,100000001,C50,0.0,0.0,,3653.0
1,100000001,C73,0.0,0.0,,246.0
2,100000001,D12,1.0,0.0,882.0,882.0
3,100000001,M51,8.0,2.0,328.0,246.0
4,100000001,M54,5.0,0.0,880.0,880.0


### 3.2 Results Validation

In [11]:
print("\n3.2 Results Validation")
print("--------------------")
# Compare results
print("\nResults Comparison:")
print("• Step-by-step shape:", step_result.shape)
print("• Pipeline shape:", pipeline_result.shape)

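are_equal = step_result.equals(pipeline_result)
# Show a check mark only when the two results actually match
status = "✓" if are_equal else "✗"
print(f"\n{status} Results are identical: {are_equal}")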

if not are_equal:
    print("\nDifferences in columns:")
    print(set(step_result.columns) ^ set(pipeline_result.columns))


3.2 Results Validation
--------------------

Results Comparison:
• Step-by-step shape: (7, 6)
• Pipeline shape: (7, 6)

✓ Results are identical: True
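
`DataFrame.equals` returns only a single boolean. When two results differ, `pandas.testing.assert_frame_equal` can be more helpful because it raises an `AssertionError` describing the first mismatch it finds. The sketch below shows this on toy frames rather than on the tutorial's `step_result` and `pipeline_result`.

```python
import pandas as pd

left = pd.DataFrame({"kcd_main": ["C50", "M51"], "hos_day": [0.0, 8.0]})
right = left.copy()
right.loc[1, "hos_day"] = 7.0  # introduce a single differing value

# Identical frames: the assertion passes silently.
pd.testing.assert_frame_equal(left, left.copy())

# Differing frames: capture the diagnostic message instead of a bare False.
try:
    pd.testing.assert_frame_equal(left, right)
    report = ""
except AssertionError as err:
    report = str(err)
print(report)
```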


### Appendix: Error Handling

In [12]:
print("\nAppendix: Error Handling")
print("----------------")
# Example of error handling with invalid data
print("Testing error handling with invalid input...")

try:
    # Create invalid data for testing
    invalid_claim = claim.drop(columns=['id'])
    invalid_icis = ICIS(claim=invalid_claim, main=main)
    invalid_result = invalid_icis.process()
except ValueError as e:
    print(f"\n✓ Validation error caught successfully: {e}")
except RuntimeError as e:
    print(f"\n✓ Processing error caught successfully: {e}")
except Exception as e:
    print(f"\n✓ Unexpected error caught successfully: {e}")


Appendix: Error Handling
----------------
Testing error handling with invalid input...

✓ Validation error caught successfully: Missing required columns in claim DataFrame: ['id']
Required columns: ['clm_date', 'hos_cnt', 'hos_day', 'hos_edate', 'hos_sdate', 'id', 'inq_date', 'kcd0', 'kcd1', 'kcd2', 'kcd3', 'kcd4', 'out_cnt', 'sur_cnt']
