ICD-9 download to clean example (dta)

Mathew Kiang edited this page Sep 7, 2017 · 1 revision

This simple example shows how to use narcan to download a multiple cause of death (MCOD) file in dta format and clean it. The steps are: select only the necessary columns (trimming); remove rows we don't want (e.g., non-residents); clean and label ICD-9 codes so they are consistent across all years and columns; unite all contributory cause columns (i.e., those starting with record_) into a single regexable column; remap the race variable so it is consistent across 1979-2015; and finally add categorical variables.

## Test cleaning and processing an ICD-9 file
library(tidyverse)
library(narcan)

## Download if file doesn't exist
if (!file.exists('./raw_data/mort1998.dta.zip')) {
    download_mcod_dta(1998, './raw_data')
} 

## Load the dta file
mort_1998 <- haven::read_dta('./raw_data/mort1998.dta.zip')

## Trim columns, add year column, zap Stata metadata
df <- mort_1998 %>% 
    select(one_of(c("race", "hspanicr", "ager27", "restatus", "ucod")),
           starts_with("record_"), 
           starts_with("rnifla")) %>% 
    add_column(year = 1998) %>% 
    zap_dta_data()

## Drop nonresidents
df <- df %>% 
    filter(restatus %in% 1:3) %>% 
    select(-restatus)

## Clean codes
df <- clean_icd9_data(df)

## Unite contributory causes
df <- unite_records(df)

## Convert age, add hspanicr, remap race, and add categories
df <- df %>% 
    convert_age27() %>% 
    add_hspanicr_column() %>% 
    remap_race() %>% 
    mutate(race_cat = categorize_race(race), 
           hsp_cat  = categorize_hspanicr(hspanicr), 
           age_cat  = categorize_age_5(age))

## Reorder columns
df <- df %>% 
    select(year, race, race_cat, hspanicr, hsp_cat, 
           age, age_cat, ucod, f_records_all)
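
To make the "single regexable column" idea concrete without downloading a file, here is a minimal sketch on toy data. It does not use narcan and is not how unite_records() is implemented (its internals are not shown on this page). It assumes tidyr 1.0.0 or later for the na.rm argument, the ICD-9 values are made up for illustration, and the column names records_all and has_9650 are hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

## Toy data standing in for a trimmed MCOD file. Column names mirror the
## example above; the values are invented for illustration only.
toy <- tibble::tibble(
    restatus = c(1, 2, 4, 3),
    ucod     = c("4109", "9650", "4280", "E8500"),
    record_1 = c("4109", "9650", "4280", "E8500"),
    record_2 = c("4280", "3049", NA, "9650"),
    record_3 = c(NA, NA, NA, "3050")
)

## Keep residents only (restatus 1 to 3), then paste every record_ column
## into one space-delimited string per row, skipping missing values.
## `records_all` is a hypothetical name, not narcan's output column.
toy_united <- toy %>%
    filter(restatus %in% 1:3) %>%
    select(-restatus) %>%
    unite("records_all", starts_with("record_"), sep = " ", na.rm = TRUE)

## One regex now searches all contributory causes at once. The word
## boundaries keep "9650" from matching inside a longer code.
toy_united <- toy_united %>%
    mutate(has_9650 = str_detect(records_all, "\\b9650\\b"))
```

Collapsing the columns this way trades a wide table for a single string, so one str_detect() call replaces a search across every record_ column.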