# Calculating Metrics for Assessing Mortality Data Quality

The following code demonstrates simple examples of calculating metrics from the Data Quality Assessment Framework and Jurisdictional Playbook.

## Calculating Unsuitable Underlying Cause of Death

The following code:

1. Imports the required Python libraries
2. Loads synthetic sample data from a CSV file
3. Loads a list of unsuitable underlying cause of death codes
4. Flags each sample record whose underlying cause of death code starts with a code on the unsuitable list
5. Calculates the proportion of flagged records, overall and per certifier

In [2]:
import os
import pandas as pd
from IPython.display import display, HTML

# Load the death records data; only empty fields are treated as missing, so
# literal values such as "NA" or "N/A" stay as strings instead of becoming NaN
death_records = pd.read_csv("./data/NotionalDeathRecordData.csv", keep_default_na=False, na_values=[""])
display(death_records)

Unnamed: 0,Death Record Number,Sex,Date of Death,Race,Age,Hispanic origin,Place of Death,Disposition Place Name,Date Certified,Certifier Type,...,Entity Axis COD 7,Entity Axis Line 8,Entity Axis Sequence 8,Entity Axis COD 8,Entity Axis Line 9,Entity Axis Sequence 9,Entity Axis COD 9,Entity Axis Line 10,Entity Axis Sequence 10,Entity Axis COD 10
0,2024000121,F,1/6/2024,W,98,N,E,Valley Cemetery,1/8/2024,C,...,,,,,,,,,,
1,2024000122,F,1/7/2024,B,72,N,E,Valley Cemetery,1/8/2024,P,...,,,,,,,,,,
2,2024000123,F,1/8/2024,A,84,N,N,Valley Cemetery,1/14/2024,P,...,,,,,,,,,,
3,2024000124,F,1/9/2024,A,90,N,E,Valley Cemetery,1/10/2024,C,...,,,,,,,,,,
4,2024000125,M,1/10/2024,N,28,N,H,Evergreen Crematory,1/12/2024,C,...,,,,,,,,,,
5,2024000126,F,1/11/2024,,88,,R,Memorial Park,1/12/2024,C,...,,,,,,,,,,
6,2024000127,F,1/12/2024,W,83,N,E,National Cemetery,1/13/2024,C,...,,,,,,,,,,
7,2024000128,F,1/13/2024,A,79,N,R,Evergreen Crematory,1/15/2024,C,...,J984,,,,,,,,,
8,2024000129,F,1/14/2024,B,70,N,R,Evergreen Crematory,1/15/2024,,...,E149,6.0,6.0,I429,,,,,,
9,2024000130,M,1/15/2024,N,93,N,H,Hills Cemetery,1/21/2025,M,...,,,,,,,,,,


In [3]:
# Load the unsuitable causes of death data
unsuitable_causes = pd.read_csv("./data/unsuitable_COD_codes.csv")
display(unsuitable_causes)

Unnamed: 0,code,category,display
0,A419,Immediate and intermediate causes,"Sepsis, unspecified organism"
1,A480,Immediate and intermediate causes,Gas gangrene
2,A483,Immediate and intermediate causes,Toxic shock syndrome
3,C77,Immediate and intermediate causes,Secondary and unspecified malignant neoplasm o...
4,C78,Immediate and intermediate causes,Secondary malignant neoplasm of respiratory an...
...,...,...,...
393,R94,Unknown and ill-defined causes,Abnormal results of function studies
394,R96,Unknown and ill-defined causes,"Symptoms, signs and abnormal clinical and labo..."
395,R97,Unknown and ill-defined causes,Abnormal tumor markers
396,R98,Unknown and ill-defined causes,"Symptoms, signs and abnormal clinical and labo..."


In [5]:
# Extract the unsuitable codes as a tuple so they can be passed to str.startswith
unsuitable_codes = tuple(unsuitable_causes["code"])

# Check whether any unsuitable code is a prefix of the code in the record;
# missing (non-string) codes are treated as not unsuitable rather than raising an error
def is_unsuitable(code):
    return isinstance(code, str) and code.startswith(unsuitable_codes)

# Create a new column that is True when the underlying COD is unsuitable
death_records["Unsuitable Underlying"] = death_records["Underlying COD"].apply(is_unsuitable)
display(death_records)

Unnamed: 0,Death Record Number,Sex,Date of Death,Race,Age,Hispanic origin,Place of Death,Disposition Place Name,Date Certified,Certifier Type,...,Entity Axis Line 8,Entity Axis Sequence 8,Entity Axis COD 8,Entity Axis Line 9,Entity Axis Sequence 9,Entity Axis COD 9,Entity Axis Line 10,Entity Axis Sequence 10,Entity Axis COD 10,Unsuitable Underlying
0,2024000121,F,1/6/2024,W,98,N,E,Valley Cemetery,1/8/2024,C,...,,,,,,,,,,False
1,2024000122,F,1/7/2024,B,72,N,E,Valley Cemetery,1/8/2024,P,...,,,,,,,,,,False
2,2024000123,F,1/8/2024,A,84,N,N,Valley Cemetery,1/14/2024,P,...,,,,,,,,,,False
3,2024000124,F,1/9/2024,A,90,N,E,Valley Cemetery,1/10/2024,C,...,,,,,,,,,,False
4,2024000125,M,1/10/2024,N,28,N,H,Evergreen Crematory,1/12/2024,C,...,,,,,,,,,,False
5,2024000126,F,1/11/2024,,88,,R,Memorial Park,1/12/2024,C,...,,,,,,,,,,True
6,2024000127,F,1/12/2024,W,83,N,E,National Cemetery,1/13/2024,C,...,,,,,,,,,,False
7,2024000128,F,1/13/2024,A,79,N,R,Evergreen Crematory,1/15/2024,C,...,,,,,,,,,,False
8,2024000129,F,1/14/2024,B,70,N,R,Evergreen Crematory,1/15/2024,,...,6.0,6.0,I429,,,,,,,False
9,2024000130,M,1/15/2024,N,93,N,H,Hills Cemetery,1/21/2025,M,...,,,,,,,,,,True
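
Beyond a True/False flag, it can help to know which kind of unsuitable cause was matched. The following is a minimal sketch, not part of the original notebook, that uses the `category` column of the unsuitable codes list to label each record. The small data frames are illustrative stand-ins for the CSV files, and the `unsuitable_category` helper and the `Unsuitable Category` column name are hypothetical.

```python
import pandas as pd

# Illustrative stand-ins for the notebook's data (not the real files);
# the three list entries mirror rows shown in the unsuitable codes output
unsuitable_causes = pd.DataFrame({
    "code": ["A419", "C77", "R98"],
    "category": [
        "Immediate and intermediate causes",
        "Immediate and intermediate causes",
        "Unknown and ill-defined causes",
    ],
})
# Made-up underlying cause of death codes for four sample records
death_records = pd.DataFrame({"Underlying COD": ["A419", "C779", "I251", "R98"]})

# Map each unsuitable prefix to its category
category_by_prefix = dict(zip(unsuitable_causes["code"], unsuitable_causes["category"]))

def unsuitable_category(code):
    """Return the category of the first matching prefix, or None if nothing matches."""
    if not isinstance(code, str):
        return None
    for prefix, category in category_by_prefix.items():
        if code.startswith(prefix):
            return category
    return None

# Hypothetical column name holding the matched category (missing when suitable)
death_records["Unsuitable Category"] = death_records["Underlying COD"].apply(unsuitable_category)
print(death_records["Unsuitable Category"].value_counts())
```

A per-category count like this could show whether flagged records are mostly intermediate causes reported as underlying, or mostly ill-defined causes.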


In [6]:
# Calculate and print the proportion of records with an unsuitable underlying cause of death
# (the mean of a boolean column is the fraction of True values)
proportion = death_records["Unsuitable Underlying"].mean()
print(f"The proportion of records with an unsuitable underlying cause of death is {proportion:.2f}")

The proportion of records with an unsuitable underlying cause of death is 0.20
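
The arithmetic behind this figure can be checked by hand. The sketch below copies the ten `Unsuitable Underlying` flags from the table displayed above and shows that the mean of a boolean column equals the count of flagged records divided by the total; the variable names are only illustrative.

```python
import pandas as pd

# Flags copied from the "Unsuitable Underlying" column displayed above
flags = pd.Series([False, False, False, False, False, True, False, False, False, True])

n_unsuitable = int(flags.sum())  # True counts as 1, False as 0
n_records = len(flags)
proportion = flags.mean()        # same value as n_unsuitable / n_records

print(f"{n_unsuitable} of {n_records} records ({proportion:.0%}) have an unsuitable underlying cause of death")
```

Reporting the count next to the proportion makes it clear how many records the metric is based on.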


In [7]:
# Group the records by certifier, then calculate and display the proportion of unsuitable records for each certifier
certifier_proportions = death_records.groupby("Certifier Name", as_index=False)["Unsuitable Underlying"].mean()
display(certifier_proportions)

Unnamed: 0,Certifier Name,Unsuitable Underlying
0,Certifier 1,0.0
1,Certifier 2,0.4


In [8]:
# Filter for certifiers that have unsuitable underlying causes in records they've reported
filtered = certifier_proportions[certifier_proportions['Unsuitable Underlying'] > 0.0]
display(filtered)

Unnamed: 0,Certifier Name,Unsuitable Underlying
1,Certifier 2,0.4


In [9]:
# Print the certifier names in the resulting set; this could drive an automated notification process
for certifier_name in filtered['Certifier Name']:
    print(certifier_name)

Certifier 2
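
One way the notification idea could be taken a step further is to collect the flagged records for each certifier into a short message. This is only a sketch: the data is made up, the message wording is hypothetical, and how the message would actually be delivered is left out.

```python
import pandas as pd

# Illustrative data; record numbers, certifier assignments, and codes are made up
death_records = pd.DataFrame({
    "Death Record Number": [1001, 1002, 1003, 1004],
    "Certifier Name": ["Certifier 1", "Certifier 2", "Certifier 2", "Certifier 2"],
    "Underlying COD": ["I251", "A419", "C509", "R98"],
    "Unsuitable Underlying": [False, True, False, True],
})

# Keep only the records flagged as having an unsuitable underlying cause
flagged = death_records[death_records["Unsuitable Underlying"]]

# Build one plain-text message per certifier (hypothetical wording)
messages = {}
for certifier_name, group in flagged.groupby("Certifier Name"):
    lines = [
        f"  record {number}: underlying COD {code}"
        for number, code in zip(group["Death Record Number"], group["Underlying COD"])
    ]
    header = f"{certifier_name}: {len(group)} record(s) to review"
    messages[certifier_name] = header + "\n" + "\n".join(lines)

for message in messages.values():
    print(message)
```

Certifiers with no flagged records never appear in `messages`, so only those with something to review would be contacted.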
