# Calculating Metrics for Assessing Mortality Data Quality

The following code demonstrates simple examples of calculating metrics from the Data Quality Assessment Framework and Jurisdictional Playbook.

## Calculating Unsuitable Underlying Cause of Death

The following code:

1. Imports some Python libraries
2. Loads some synthetic sample data from a CSV
3. Loads a list of unsuitable underlying cause of death codes
4. Demonstrates how the underlying cause of death codes in sample data can be evaluated against the list of unsuitable underlying causes of death

In [None]:
import os
import pandas as pd
from IPython.display import display, HTML

# Load the death records data. With keep_default_na=False and na_values=[""], only empty
# fields are read as missing; literal strings such as "NA" or "N/A" are kept as text.
death_records = pd.read_csv("./data/NotionalDeathRecordData.csv", keep_default_na=False, na_values=[""])
display(death_records)

In [None]:
# Load the unsuitable causes of death data
unsuitable_causes = pd.read_csv("./data/unsuitable_COD_codes.csv")
display(unsuitable_causes)

In [None]:
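# Extract the unsuitable codes as a tuple of strings, which str.startswith accepts directly
unsuitable_codes = tuple(unsuitable_causes["code"].astype(str))

# Function to check if any unsuitable code is a prefix of the code in the record.
# Missing values are read in as NaN (not a string), so they are reported as not unsuitable
# instead of raising an AttributeError.
def is_unsuitable(code):
    return isinstance(code, str) and code.startswith(unsuitable_codes)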

# Create a new column that is True when the underlying COD is unsuitable
death_records["Unsuitable Underlying"] = death_records["Underlying COD"].apply(is_unsuitable)
display(death_records)
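
The prefix check above can be tried in isolation on a few toy values. This is a minimal sketch, not the notebook's data: the prefixes `R99` and `I46`, the sample codes, and the names `example_prefixes`, `example_records`, and `has_unsuitable_prefix` are all illustrative assumptions. The real list comes from `unsuitable_COD_codes.csv`.

```python
import pandas as pd

# Illustrative prefixes only (assumption); the real list is loaded from unsuitable_COD_codes.csv
example_prefixes = ("R99", "I46")

# Toy records (assumption); the last value is missing to show how gaps are handled
example_records = pd.DataFrame(
    {"Underlying COD": ["R99", "I469", "C349", "I219", None]}
)

def has_unsuitable_prefix(code, prefixes=example_prefixes):
    """Return True when code is a string that starts with any listed prefix."""
    # str.startswith accepts a tuple and checks each prefix in turn
    return isinstance(code, str) and code.startswith(prefixes)

example_records["Unsuitable Underlying"] = example_records["Underlying COD"].apply(
    has_unsuitable_prefix
)
print(example_records)
```

Because the match is on prefixes, a listed code such as `I46` also flags more specific codes such as `I469`, while a missing value is simply reported as not unsuitable.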

In [None]:
# Calculate the proportion of records with an unsuitable underlying cause of death and print
proportion = death_records["Unsuitable Underlying"].mean()
print(f"The proportion of records with an unsuitable underlying cause of death is {proportion:.2f}")

In [None]:
# Group the records by certifier, calculate the proportion of unsuitable records for each, and display
certifier_proportions = death_records.groupby("Certifier Name", as_index=False)["Unsuitable Underlying"].mean()
display(certifier_proportions)
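
A proportion alone can be hard to interpret when a certifier has reported only a handful of records. One possible extension, sketched here on toy data, is to report the record count and the number of unsuitable records next to the proportion. The certifier names, the values, and the output column names `records`, `unsuitable`, and `proportion` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data (assumption): two certifiers with different numbers of records
toy = pd.DataFrame({
    "Certifier Name": ["A. Example", "A. Example",
                       "B. Sample", "B. Sample", "B. Sample", "B. Sample"],
    "Unsuitable Underlying": [True, False, False, False, False, True],
})

# Named aggregation: each keyword becomes an output column
summary = (
    toy.groupby("Certifier Name")["Unsuitable Underlying"]
    .agg(records="count", unsuitable="sum", proportion="mean")
    .reset_index()
)
print(summary)
```

In this toy data both certifiers have one unsuitable record, but the proportions differ because the denominators differ, which is the context the extra columns are meant to expose.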

In [None]:
# Filter for certifiers that have unsuitable underlying causes in records they've reported
filtered = certifier_proportions[certifier_proportions["Unsuitable Underlying"] > 0.0]
display(filtered)

In [None]:
# Print the certifier names in the resulting set; this could drive an automated notification process
for certifier_name in filtered["Certifier Name"]:
    print(certifier_name)
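
As a rough sketch of how the list of names might feed a notification step, the following builds one message per flagged certifier. Everything here is an assumption made for illustration: the function `build_notifications`, its `threshold` parameter, the message wording, and the toy data. How messages would actually be delivered is out of scope.

```python
import pandas as pd

# Toy per-certifier proportions (assumption), shaped like certifier_proportions above
example_proportions = pd.DataFrame({
    "Certifier Name": ["A. Example", "B. Sample", "C. Placeholder"],
    "Unsuitable Underlying": [0.5, 0.0, 0.25],
})

def build_notifications(proportions, threshold=0.0):
    """Return one message per certifier whose proportion exceeds threshold."""
    flagged = proportions[proportions["Unsuitable Underlying"] > threshold]
    return [
        f"{name}: {share:.0%} of reported records have an unsuitable underlying cause of death"
        for name, share in zip(flagged["Certifier Name"], flagged["Unsuitable Underlying"])
    ]

messages = build_notifications(example_proportions)
for message in messages:
    print(message)
```

Raising `threshold` would limit notifications to certifiers above a chosen proportion; the default of `0.0` mirrors the filter used in the cell above.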