# Maternal Morbidity ICD Codes 



@Author: Georgia Liu 
@Last update: 2024-04-24

This notebook builds a crosswalk between the WHO ICD-10 codes for Maternal Morbidity (MM) and the HCUP CCSR codes, and produces a subset of the ICD-10-to-CCSR reference file restricted to Maternal Morbidity. We will use this subset to find MM-related data in the HCUP and MEPS databases.

Note that Maternal Morbidity is distinct from "Severe Maternal Morbidity" and "Maternal Mortality", each of which has its own specific ICD codes.

The steps are:
1. Get the unique ICD codes on Maternal Morbidity listed by the World Health Organization (WHO): https://app.box.com/file/1512322912485 (Chapter15_ICD_MM2012.xlsx)
2. Crosswalk the WHO ICD codes on Maternal Morbidity to the Clinical Classifications Software Refined (CCSR) codes for ICD-10-CM Diagnoses (HCUP CCSR codes) using the mapping file: https://app.box.com/file/1506401379868
3. Get the subset of the DXCCSR reference file (the code below reads DXCCSR-Reference-File-v2024-1.xlsx) that contains only the matched Maternal Morbidity ICD codes and their corresponding CCSR codes: Output/DX_ccsr_matched_on_MM.csv

A detailed summary of the HCUP, MEPS, and ICD-10 datasets: https://app.box.com/file/1506475220630?s=jk3qt0g8fdrauixxgp2vc2yffwo84u3u


In [435]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
from thefuzz import process, fuzz


# Read ICD codes on Maternal Morbidity


In [349]:
df_mm = pd.read_excel("Dataset/Chapter15_ICD_MM2012.xlsx", sheet_name='Obstetric codes', header=None, dtype=str)

df_mm = df_mm.iloc[10:].reset_index(drop=True)

new_header = df_mm.iloc[0] #grab the first row for the header
df_mm = df_mm[1:] #take the data less the header row
df_mm.columns = new_header #set the header row as the df header

df_mm.head(5)

list_dx = df_mm['ICD-10 Code,      2010 edition'].to_list()


In [378]:
# Drop missing cells (NaN floats left over from empty spreadsheet rows)
list_dx = [x for x in list_dx if not (isinstance(x, float) and math.isnan(x))]

# Normalize formatting: remove dots and spaces so codes look like "O001" rather than "O00.1"
list_dx = [element.replace(".", "").replace(" ", "") for element in list_dx]

# Remove duplicates and sort for reproducible output
list_dx = sorted(set(list_dx))
print("Total number of ICD codes for Maternal Morbidity (ICD-MM):", len(list_dx))


Total number of ICD codes for Maternal Morbidity (ICD-MM): 503
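
The cleaning steps above can also be wrapped in a small reusable helper. The following is a minimal sketch, not the notebook's actual code: the function name `normalize_icd_codes`, the sample values, and the upper-casing step are assumptions added for illustration.

```python
import math


def normalize_icd_codes(raw_codes):
    """Return sorted, de-duplicated ICD codes with dots and spaces removed.

    Hypothetical helper; the upper-casing is an extra assumption that the
    notebook itself does not apply.
    """
    cleaned = set()
    for code in raw_codes:
        # Skip missing cells (None or NaN floats from empty spreadsheet rows).
        if code is None or (isinstance(code, float) and math.isnan(code)):
            continue
        cleaned.add(str(code).replace(".", "").replace(" ", "").upper())
    return sorted(cleaned)


# Made-up sample values, not taken from the WHO file.
sample = ["O00.1", "O00.1 ", float("nan"), "o14", None, "O99.8"]
normalized = normalize_icd_codes(sample)
print(normalized)
```

Keeping the normalization in one function makes it easier to apply the same rules to both the WHO list and the CCSR list if their formatting ever diverges.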


# Read HCUP CCSR

In [369]:
df_dx_ccsr = pd.read_excel("Dataset/HCUP-CCSR/DXCCSR-Reference-File-v2024-1.xlsx", sheet_name="DX_to_CCSR_Mapping", header=None).iloc[1:].reset_index(drop=True)

new_header = df_dx_ccsr.iloc[0] #grab the first row for the header
df_dx_ccsr = df_dx_ccsr[1:] #take the data less the header row
df_dx_ccsr.columns = new_header #set the header row as the df header

df_dx_ccsr

| | ICD-10-CM Code | ICD-10-CM Code Description | CCSR Category | CCSR Category Description | Inpatient Default CCSR (Y/N/X) | Outpatient Default CCSR (Y/N/X) | Rationale for Default Assignment |
|---|---|---|---|---|---|---|---|
| 1 | A000 | Cholera due to Vibrio cholerae 01, biovar chol... | DIG001 | Intestinal infection | Y | Y | 06 Infectious conditions |
| 2 | A000 | Cholera due to Vibrio cholerae 01, biovar chol... | INF003 | Bacterial infections | N | N | 06 Infectious conditions |
| 3 | A001 | Cholera due to Vibrio cholerae 01, biovar eltor | DIG001 | Intestinal infection | Y | Y | 06 Infectious conditions |
| 4 | A001 | Cholera due to Vibrio cholerae 01, biovar eltor | INF003 | Bacterial infections | N | N | 06 Infectious conditions |
| 5 | A009 | Cholera, unspecified | DIG001 | Intestinal infection | Y | Y | 06 Infectious conditions |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 86852 | Z992 | Dependence on renal dialysis | FAC025 | Other specified status | X | Y | 99 Unacceptable PDX |
| 86853 | Z993 | Dependence on wheelchair | FAC025 | Other specified status | X | Y | 99 Unacceptable PDX |
| 86854 | Z9981 | Dependence on supplemental oxygen | FAC025 | Other specified status | X | Y | 99 Unacceptable PDX |
| 86855 | Z9989 | Dependence on other enabling machines and devices | FAC025 | Other specified status | X | Y | 99 Unacceptable PDX |


In [370]:
list_ccsr = df_dx_ccsr['ICD-10-CM Code'].sort_values(ascending=True).to_list()

# Delete duplicate 
list_ccsr = list(set(list_ccsr))

print("Total number of ICD codes on HCUP:", len(list_ccsr))

Total number of ICD codes on HCUP: 74987



# Cross-Matching DX codes on Maternal Morbidity to CCSR (HCUP)

In [379]:
set_dx = set(list_dx)
set_ccsr = set(list_ccsr)

def get_matched_pairs(set_dx, set_ccsr):
    """
    Create a list of matched pairs from the intersection of two sets.

    Args:
        set_dx (set): WHO Maternal Morbidity ICD-10 codes.
        set_ccsr (set): ICD-10-CM codes listed in the HCUP CCSR reference file.

    Returns:
        list: A list of (code, code) tuples, one per code present in both sets.
    """
    intersection = set_dx.intersection(set_ccsr)
    matched_pairs = [(x, x) for x in intersection]
    return matched_pairs

matched_pairs = get_matched_pairs(set_dx, set_ccsr)
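# Flattening the (x, x) pairs lists every matched code twice, so
# len(matched_list) is double the number of unique matched codes.
matched_list = [item for pair in matched_pairs for item in pair]

print("Numbers of exact matches for Maternal Morbidity ICD-10 codes and HCUP CCSR codes:", len(matched_list))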


Numbers of exact matches for Maternal Morbidity ICD-10 codes and HCUP CCSR codes: 328


In [417]:
df_matched_pairs = pd.DataFrame(matched_list, columns=['Maternal Morbidity Code from WHO'])
df_matched_pairs['Matched DX code for CCSR']= df_matched_pairs['Maternal Morbidity Code from WHO'] 
df_matched_pairs.head(5)
# Export this list to a csv file 
# df_matched_pairs.to_csv("Output/matched_pairs_DX_MM.csv", index=False)

| | Maternal Morbidity Code from WHO | Matched DX code for CCSR |
|---|---|---|
| 0 | O331 | O331 |
| 1 | O331 | O331 |
| 2 | O679 | O679 |
| 3 | O679 | O679 |
| 4 | O653 | O653 |
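
Every code appears twice in the table above because the flattened pair list repeats each match. A minimal sketch of an exact match that keeps one row per code is shown below. The sample codes are made up for illustration, and the variable names are assumptions rather than the notebook's own.

```python
# Made-up sample codes; the real sets come from the WHO and CCSR files.
who_codes = {"O331", "O679", "O00", "O141"}
ccsr_codes = {"O331", "O679", "O00101", "O1410", "A000"}

# Set intersection yields each exactly matched code once.
exact_matches = sorted(who_codes & ccsr_codes)

# Set difference yields the WHO codes that still need a partial match.
unmatched_who = sorted(who_codes - ccsr_codes)

# One (WHO code, CCSR-listed code) row per exact match.
exact_rows = [(code, code) for code in exact_matches]

print(len(exact_matches), "exact matches;", len(unmatched_who), "unmatched")
```

With this form, the number of exact matches plus the number of unmatched WHO codes always equals the size of the WHO set, which gives a quick consistency check.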


In [383]:
list_dx_new = [value for value in list_dx if value not in matched_list]
print("Numbers of Maternal Morbidity ICD-10 codes not fully matched:", len(list_dx_new))
list_ccsr_new = [value for value in list_ccsr if value not in matched_list]
print("Numbers of CCSR ICD-10 codes not fully matched:", len(list_ccsr_new))

Numbers of Maternal Morbidity ICD-10 codes not fully matched: 339
Numbers of CCSR ICD-10 codes not fully matched: 74823


In [409]:
# Cross-match the remaining Maternal Morbidity DX codes to CCSR (HCUP) codes by partial (prefix) match

list_dx_new = [value for value in list_dx if value not in matched_list]

list_ccsr_new = [value for value in list_ccsr if value not in matched_list]

list_dx_semi_matched = []
list_ccsr_semi_matched = []

for mm_value in list_dx_new:
    # Only 3- and 4-character WHO codes are matched by prefix; other lengths are skipped.
    if len(mm_value) not in (3, 4):
        continue
    for ccsr_code in list_ccsr_new:
        # A CCSR-listed code is a partial match when it starts with the WHO code.
        if ccsr_code.startswith(mm_value):
            list_dx_semi_matched.append(mm_value)
            list_ccsr_semi_matched.append(ccsr_code)
                    

In [411]:
combined_list = list(zip(list_dx_semi_matched, list_ccsr_semi_matched))
df_semi_matched_pairs = pd.DataFrame(combined_list, columns=['Maternal Morbidity Code','Matched CCSR Code'])

df_semi_matched_pairs.head(5)

| | Maternal Morbidity Code | Matched CCSR Code |
|---|---|---|
| 0 | O00 | O00111 |
| 1 | O00 | O0010 |
| 2 | O00 | O00109 |
| 3 | O00 | O00219 |
| 4 | O00 | O00101 |
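
The prefix match compares every remaining WHO code against every remaining CCSR-listed code. A possible alternative, sketched here under assumptions, indexes the longer codes by their 3- and 4-character prefixes first so each WHO code needs only one lookup. The function name `prefix_matches` and the sample codes are hypothetical, and the sketch assumes exact matches have already been removed from both inputs.

```python
from collections import defaultdict


def prefix_matches(short_codes, full_codes, lengths=(3, 4)):
    """Pair each short code with every full code that starts with it.

    Hypothetical helper: only short codes whose length is in `lengths`
    are considered, mirroring the 3- and 4-character rule in the notebook.
    """
    # Index the full codes by each prefix length of interest.
    by_prefix = defaultdict(list)
    for code in sorted(full_codes):
        for n in lengths:
            if len(code) >= n:
                by_prefix[code[:n]].append(code)

    pairs = []
    for short in sorted(short_codes):
        if len(short) in lengths:
            pairs.extend((short, full) for full in by_prefix.get(short, []))
    return pairs


# Made-up sample codes for illustration only.
short = {"O00", "O141", "O9"}
full = {"O00101", "O0010", "O1410", "O1414", "A000"}
pairs = prefix_matches(short, full)
print(pairs)
```

Sorting both inputs makes the pair order reproducible, unlike iteration over a plain `set`.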


In [422]:
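# Stack the exact and partial matches. The two frames use different column
# names, so the result has four columns, with NaN wherever a row's source
# frame lacked that column.
df_matched_pairs = pd.concat([df_matched_pairs,df_semi_matched_pairs], ignore_index = True)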
print(df_matched_pairs.shape)

(8528, 4)
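
The four columns in the shape above arise because the exact-match and partial-match frames carry different column names. The sketch below, using made-up rows, contrasts that behavior with a version that renames the partial-match columns before concatenating. The `COLUMNS` constant and frame names are assumptions for illustration, not the notebook's code.

```python
import pandas as pd

# Shared column names (taken from the exact-match frame in the notebook).
COLUMNS = ["Maternal Morbidity Code from WHO", "Matched DX code for CCSR"]

# Made-up rows for illustration.
exact = pd.DataFrame([("O331", "O331")], columns=COLUMNS)
partial = pd.DataFrame(
    [("O00", "O00101"), ("O00", "O0010")],
    columns=["Maternal Morbidity Code", "Matched CCSR Code"],
)

# Without renaming, concat keeps all four distinct columns and fills NaN.
unaligned = pd.concat([exact, partial], ignore_index=True)

# Renaming first lets the rows stack into the same two columns.
partial_aligned = partial.rename(columns=dict(zip(partial.columns, COLUMNS)))
crosswalk = pd.concat([exact, partial_aligned], ignore_index=True).drop_duplicates()

print(unaligned.shape, crosswalk.shape)
```

With aligned columns, a later merge keyed on `Matched DX code for CCSR` can see the partial matches as well as the exact ones.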


In [423]:
df_matched_pairs.to_csv('Output/matched_pairs_DX_MM_final.csv', index=False)

# Get subset of CCSR code on Maternal Morbidity based on partial match

In [430]:
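# Right-join the CCSR reference rows onto the crosswalk so that every crosswalk row is kept.
# Rows from the partial-match frame have no value in 'Matched DX code for CCSR',
# so they receive no CCSR fields from this merge.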
merged_data = pd.merge(df_dx_ccsr, df_matched_pairs, how='right', left_on='ICD-10-CM Code', right_on='Matched DX code for CCSR')


In [434]:
merged_data.to_csv("Output/DX_ccsr_matched_on_MM.csv", index=False)
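
Before relying on the exported file, it can help to check how many crosswalk rows actually picked up CCSR information. The sketch below uses the `indicator` option of `pandas.merge` on made-up data. The category labels such as `CAT001` are placeholders, not real CCSR assignments.

```python
import pandas as pd

# Made-up reference rows; category labels are placeholders.
reference = pd.DataFrame(
    {
        "ICD-10-CM Code": ["O00101", "O00101", "O331", "A000"],
        "CCSR Category": ["CAT001", "CAT002", "CAT003", "CAT004"],
    }
)

# Made-up crosswalk with one code ("O999") absent from the reference.
crosswalk = pd.DataFrame(
    {
        "Maternal Morbidity Code from WHO": ["O00", "O331", "O999"],
        "Matched DX code for CCSR": ["O00101", "O331", "O999"],
    }
)

merged = reference.merge(
    crosswalk,
    how="right",
    left_on="ICD-10-CM Code",
    right_on="Matched DX code for CCSR",
    indicator=True,  # adds a "_merge" column: "both" or "right_only"
)

# Crosswalk rows that found no CCSR reference row.
unmatched = merged[merged["_merge"] == "right_only"]
print(len(merged), "rows,", len(unmatched), "without CCSR information")
```

A code mapped to several CCSR categories produces one merged row per category, so the merged row count can exceed the crosswalk row count.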

In [432]:
df_dx_ccsr.shape

(86856, 7)