# The Cost of Being Black While Giving Birth

## Inspect and Clean Data

### Data Source
This dataset was downloaded from CDC WONDER (Multiple Cause of Death, 1999–2022), filtered for ICD‑10 codes O00–O99 (pregnancy, childbirth, and the puerperium).  
Source: https://wonder.cdc.gov/mcd.html

In [None]:

import pandas as pd

# Load the CSV exported from CDC WONDER (the header is on the first row here, so skiprows is not needed)
df = pd.read_csv(r'\maternal-mortality\data\Underlying Cause of Death, 1999-2020.csv')

# Preview
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1276 entries, 0 to 1275
Data columns (total 12 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Notes                       84 non-null     object 
 1   Race                        1192 non-null   object 
 2   Race Code                   1192 non-null   object 
 3   State                       1192 non-null   object 
 4   State Code                  1192 non-null   float64
 5   Year                        1192 non-null   float64
 6   Year Code                   1192 non-null   float64
 7   ICD-10 113 Cause List       1192 non-null   object 
 8   ICD-10 113 Cause List Code  1192 non-null   object 
 9   Deaths                      1192 non-null   float64
 10  Population                  1192 non-null   float64
 11  Crude Rate                  1192 non-null   object 
dtypes: float64(5), object(7)
memory usage: 119.8+ KB


In [41]:
df.head()

| | Notes | Race | Race Code | State | State Code | Year | Year Code | ICD-10 113 Cause List | ICD-10 113 Cause List Code | Deaths | Population | Crude Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Asian or Pacific Islander | A-PI | California | 6.0 | 2004.0 | 2004.0 | #Pregnancy, childbirth and the puerperium (O00... | GR113-105 | 15.0 | 4750147.0 | Unreliable |
| 1 | NaN | Asian or Pacific Islander | A-PI | California | 6.0 | 2004.0 | 2004.0 | Other complications of pregnancy, childbirth a... | GR113-107 | 15.0 | 4750147.0 | Unreliable |
| 2 | NaN | Asian or Pacific Islander | A-PI | California | 6.0 | 2005.0 | 2005.0 | #Pregnancy, childbirth and the puerperium (O00... | GR113-105 | 11.0 | 4884264.0 | Unreliable |
| 3 | NaN | Asian or Pacific Islander | A-PI | California | 6.0 | 2005.0 | 2005.0 | Other complications of pregnancy, childbirth a... | GR113-107 | 11.0 | 4884264.0 | Unreliable |
| 4 | NaN | Asian or Pacific Islander | A-PI | California | 6.0 | 2006.0 | 2006.0 | #Pregnancy, childbirth and the puerperium (O00... | GR113-105 | 11.0 | 5013374.0 | Unreliable |


In [42]:
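# Drop the Notes column (it only holds footnote text).
# The footnote rows themselves remain until rows without death counts are removed.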

df = df.drop(columns=['Notes'])

In [43]:
# Clean Column Names

df.columns = (
    df.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_")
    .str.replace(r"[^a-zA-Z0-9_]", "", regex=True)
)
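
To make the effect of this chain concrete, here is a minimal sketch that applies the same steps to the column names listed in the `df.info()` output above (without `Notes`, which has already been dropped). The names `raw_columns` and `cleaned` are illustrative only, and `regex=False` is added to the space replacement purely to make the literal match explicit.

```python
import pandas as pd

# Column names copied from the df.info() output, minus the dropped Notes column
raw_columns = pd.Index([
    "Race", "Race Code", "State", "State Code", "Year", "Year Code",
    "ICD-10 113 Cause List", "ICD-10 113 Cause List Code",
    "Deaths", "Population", "Crude Rate",
])

cleaned = (
    raw_columns
    .str.strip()                                     # trim surrounding whitespace
    .str.lower()                                     # lowercase everything
    .str.replace(" ", "_", regex=False)              # spaces become underscores
    .str.replace(r"[^a-zA-Z0-9_]", "", regex=True)   # drop remaining punctuation, e.g. the hyphen in "icd-10"
)

for before, after in zip(raw_columns, cleaned):
    print(f"{before!r} -> {after!r}")
```

Note that the hyphen is removed rather than replaced, which is why `ICD-10 113 Cause List` ends up as `icd10_113_cause_list`.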

In [45]:
# Convert crude_rate to numeric; non-numeric entries such as "Unreliable" become NaN

df['crude_rate'] = pd.to_numeric(df['crude_rate'], errors='coerce')
df['crude_rate'].dtype

dtype('float64')
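
Coercing with `errors='coerce'` discards the fact that a rate was flagged as "Unreliable", which CDC WONDER uses for rates based on fewer than 20 deaths. One possible way to keep that information is sketched below: record a boolean flag before coercing, and recompute the crude rate as deaths per 100,000 population. This is an illustration, not part of the original notebook. The column names `rate_unreliable` and `crude_rate_calc` are hypothetical, the first two rows reuse values from the preview above, and the third row is made up.

```python
import pandas as pd

# Small sample in the shape of the cleaned columns.
# Rows 0-1 use values shown in df.head(); row 2 is invented for illustration.
sample = pd.DataFrame({
    "deaths": [15.0, 11.0, 25.0],
    "population": [4750147.0, 4884264.0, 5000000.0],
    "crude_rate": ["Unreliable", "Unreliable", "0.5"],
})

# Record which rates were flagged before the text is lost (must run before coercion)
sample["rate_unreliable"] = sample["crude_rate"].eq("Unreliable")

# Same coercion as in the notebook: non-numeric entries become NaN
sample["crude_rate"] = pd.to_numeric(sample["crude_rate"], errors="coerce")

# Recompute the crude rate as deaths per 100,000 population
sample["crude_rate_calc"] = sample["deaths"] / sample["population"] * 100_000

print(sample)
```

Rates recomputed this way for flagged rows rest on small counts, so the `rate_unreliable` flag should travel with them into any later analysis.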

In [47]:
# Convert the year column from float to nullable integer (Int64)

df['year'] = df['year'].astype('Int64')
df['year'].dtype

Int64Dtype()
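
This cast succeeds even though the footnote rows have not been dropped yet, because pandas' nullable `Int64` dtype can hold missing values, unlike NumPy's `int64`. A minimal sketch with a made-up three-element series (the names `years` and `years_int` are illustrative):

```python
import numpy as np
import pandas as pd

# Float years with one missing value, mimicking a footnote row
years = pd.Series([2004.0, 2005.0, np.nan], name="year")

# Nullable integer dtype: whole-number floats become ints, NaN becomes <NA>
years_int = years.astype("Int64")

print(years_int.dtype)
print(years_int.tolist())
```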

In [48]:
# Remove rows with missing death counts (this drops the 84 footnote rows, leaving 1192 data rows)

df = df.dropna(subset=['deaths'])
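
Before dropping rows on a single column, it can be worth confirming that the rows being removed carry no other data. The sketch below shows the idea on a toy frame in which two data rows are followed by two empty footnote-style rows. The frame and the names `toy`, `missing_deaths`, and `kept` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame: two data rows followed by two all-empty rows (illustrative values)
toy = pd.DataFrame({
    "state": ["California", "California", np.nan, np.nan],
    "year": pd.array([2004, 2005, pd.NA, pd.NA], dtype="Int64"),
    "deaths": [15.0, 11.0, np.nan, np.nan],
})

missing_deaths = toy["deaths"].isna()

# Sanity check: every row about to be dropped is empty in all columns
assert toy.loc[missing_deaths].isna().all().all()

kept = toy.dropna(subset=["deaths"])
print(len(toy), "->", len(kept))  # 4 -> 2
```

If the assertion fails on real data, some rows have data but no death count, and they deserve a closer look before being discarded.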

In [49]:
# Inspect DataFrame after cleaning

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1192 entries, 0 to 1191
Data columns (total 11 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   race                       1192 non-null   object 
 1   race_code                  1192 non-null   object 
 2   state                      1192 non-null   object 
 3   state_code                 1192 non-null   float64
 4   year                       1192 non-null   Int64  
 5   year_code                  1192 non-null   float64
 6   icd10_113_cause_list       1192 non-null   object 
 7   icd10_113_cause_list_code  1192 non-null   object 
 8   deaths                     1192 non-null   float64
 9   population                 1192 non-null   float64
 10  crude_rate                 488 non-null    float64
dtypes: Int64(1), float64(5), object(5)
memory usage: 112.9+ KB


In [50]:
df.head()

| | race | race_code | state | state_code | year | year_code | icd10_113_cause_list | icd10_113_cause_list_code | deaths | population | crude_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Asian or Pacific Islander | A-PI | California | 6.0 | 2004 | 2004.0 | #Pregnancy, childbirth and the puerperium (O00... | GR113-105 | 15.0 | 4750147.0 | NaN |
| 1 | Asian or Pacific Islander | A-PI | California | 6.0 | 2004 | 2004.0 | Other complications of pregnancy, childbirth a... | GR113-107 | 15.0 | 4750147.0 | NaN |
| 2 | Asian or Pacific Islander | A-PI | California | 6.0 | 2005 | 2005.0 | #Pregnancy, childbirth and the puerperium (O00... | GR113-105 | 11.0 | 4884264.0 | NaN |
| 3 | Asian or Pacific Islander | A-PI | California | 6.0 | 2005 | 2005.0 | Other complications of pregnancy, childbirth a... | GR113-107 | 11.0 | 4884264.0 | NaN |
| 4 | Asian or Pacific Islander | A-PI | California | 6.0 | 2006 | 2006.0 | #Pregnancy, childbirth and the puerperium (O00... | GR113-105 | 11.0 | 5013374.0 | NaN |


In [None]:
# Export cleaned data to new CSV
df.to_csv(r'\maternal-mortality\data\cleaned_maternal_mortality.csv', index=False)