Fast ICD-10 and ICD-9 comorbidities, decoding and validation in R

icd

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Comorbidities from ICD-9 and ICD-10 codes, manipulation and validation

Introduction

Calculate comorbidities and Charlson and van Walraven scores, and perform fast and accurate validation, conversion, manipulation, filtering and comparison of ICD-9 and ICD-10 codes. This package enables a workflow from raw lists of ICD codes in hospital databases to comorbidities. ICD-9 and ICD-10 comorbidity mappings from Quan (Deyo and Elixhauser versions), Elixhauser and AHRQ are included. Common ambiguities and code formats are handled.

icd is used by many researchers around the world who work in public health, epidemiology, clinical research, nutrition, journalism, health administration and more. I’m grateful for contact from people in these fields for their feedback and code contributions, and I’m pleased to say that icd has been used in works like the Pulitzer finalist work on maternal death by ProPublica.

Features

  • find comorbidities of patients based on ICD-9 or ICD-10 codes, e.g. Cancer, Heart Disease
    • several standard mappings of ICD codes to comorbidities are included (Quan, Deyo, Elixhauser, AHRQ, PCCC)
    • very fast assignment of ICD codes to comorbidities (using a novel matrix multiplication algorithm and C++ internally)
  • use your existing data format, minimizing requirements for pre-processing
  • summarize groups of ICD codes in natural language
  • Charlson and van Walraven score calculations
  • Hierarchical Condition Codes (HCC) from CMS
  • Clinical Classifications Software (CCS) comorbidities from AHRQ
  • Pediatric Complex Chronic Condition comorbidities
  • AHRQ ICD-10 procedure code classification
  • annual revisions of ICD-9-CM and ICD-10-CM
  • correct conversion between different representations of ICD codes, with and without decimal points or leading and trailing characters (this is not trivial for ICD-9-CM). An ICD-9 to ICD-10 cross-walk is not yet implemented
  • comprehensive test suite to increase confidence in accurate processing of ICD codes
  • all internal ICD and comorbidity data is extracted directly from public data or code, allowing end-to-end reproducibility
  • used, tested and benchmarked against other comorbidity calculators on hardware from laptops to big servers
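
The matrix multiplication idea mentioned in the list above can be illustrated in a few lines of base R. This is only a minimal sketch of the general technique, not the package's C++ implementation: the `toy_map` list, its three categories and the handful of codes in it are hypothetical, chosen purely for illustration.

```r
# Toy long-format data: one row per (visit, code) pair.
visits <- data.frame(
  visit_id = c("1000", "1000", "1001", "1002"),
  code     = c("40201", "25001", "34400", "4011"),
  stringsAsFactors = FALSE
)

# Hypothetical, heavily abbreviated comorbidity map (NOT a real icd map).
toy_map <- list(
  CHF       = c("40201", "4280"),
  DM        = c("25000", "25001"),
  Paralysis = c("34400")
)

codes     <- unique(unlist(toy_map))
visit_ids <- unique(visits$visit_id)

# Visit-by-code incidence matrix: 1 where a visit carries a mapped code.
vc <- matrix(0L, nrow = length(visit_ids), ncol = length(codes),
             dimnames = list(visit_ids, codes))
relevant <- visits$code %in% codes
vc[cbind(visits$visit_id[relevant], visits$code[relevant])] <- 1L

# Code-by-comorbidity matrix: 1 where a code belongs to a category.
cm <- vapply(toy_map, function(m) as.integer(codes %in% m),
             integer(length(codes)))
rownames(cm) <- codes

# One matrix product gives, per visit, the number of matching codes
# in each category; any count above zero means the comorbidity is present.
result <- (vc %*% cm) > 0
result
```

Codes that appear in no category (such as `4011` here) are dropped before the product, so the matrices stay small even when the raw data contain many distinct codes.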

Examples

For more, see the vignettes and the examples embedded in the help for each function. Here’s a taste:

# install.packages("icd")
library(icd)

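# Typical diagnostic code data, with many-to-many relationship
# (definition reconstructed from the printed output below)
patient_data <- data.frame(
  visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002, 1000),
  icd9 = c("40201", "2258", "7208", "25001", "34400", "4011", "4011", NA),
  stringsAsFactors = FALSE
)
patient_data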
#>   visit_id  icd9
#> 1     1000 40201
#> 2     1000  2258
#> 3     1000  7208
#> 4     1000 25001
#> 5     1001 34400
#> 6     1001  4011
#> 7     1002  4011
#> 8     1000  <NA>

# get comorbidities using Quan's application of Deyo's Charlson comorbidity groups
comorbid_charlson(patient_data)
#>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
#> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1002 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
#> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
#> 1001 FALSE FALSE      TRUE FALSE  FALSE       FALSE FALSE FALSE
#> 1002 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE

# or go straight to the Charlson scores:
charlson(patient_data)
#> 1000 1001 1002 
#>    2    2    0

# plot summary of Uranium Cancer Registry sample data using AHRQ comorbidities
plot_comorbid(icd.data::uranium_pathology)
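
To make the scores above less of a black box, here is a minimal base R sketch of how a Charlson score follows from the comorbidity flags: each category carries a weight, and a visit's score is the sum of the weights of its flagged categories. The weights are the original Charlson weights; the `flags` matrix is typed in by hand from the output above. This is an illustration only, and it omits any hierarchy rules (for example, not counting uncomplicated diabetes when complicated diabetes is also present) that a full implementation may apply.

```r
# Original Charlson weights, named to match the abbreviated column names above.
charlson_weights <- c(
  MI = 1, CHF = 1, PVD = 1, Stroke = 1, Dementia = 1, Pulmonary = 1,
  Rheumatic = 1, PUD = 1, LiverMild = 1, DM = 1, DMcx = 2, Paralysis = 2,
  Renal = 2, Cancer = 2, LiverSevere = 3, Mets = 6, HIV = 6
)

# Comorbidity flags copied by hand from the comorbid_charlson() output above.
flags <- matrix(
  FALSE, nrow = 3, ncol = length(charlson_weights),
  dimnames = list(c("1000", "1001", "1002"), names(charlson_weights))
)
flags["1000", c("CHF", "DM")] <- TRUE
flags["1001", "Paralysis"] <- TRUE

# Weighted sum per visit: logical flags become 0/1, then a matrix product.
scores <- drop((flags * 1) %*% charlson_weights)
scores
```

With these flags the weighted sums agree with the `charlson()` output shown above.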

Make “Table 1” summary data

Here we are using the US National Hospital Discharge Survey 2010 data from the nhds package. For the sake of example, let us compare emergency to other admissions. A real table would have more patient features; this primarily demonstrates how to get ICD codes into your Table 1.

NHDS 2010 comorbidities to demonstrate Table One creation. Presented as counts (percentage prevalence in group).

nhds <- nhds::nhds2010
# get the comorbidities using the Quan-Deyo version of the Charlson categories
cmb <- icd::comorbid_quan_deyo(nhds, abbrev_names = FALSE)
nhds <- cbind(nhds, cmb, stringsAsFactors = FALSE)
Y <- nhds$adm_type == "emergency"
tab_dat <- vapply(
  unname(unlist(icd_names_charlson)),
  function(x) {
    c(sprintf("%i (%.2f%%)", 
              sum(nhds[Y, x]), 
              100 * mean(nhds[Y, x])),
      sprintf("%i (%.2f%%)",
              sum(nhds[!Y, x]),
              100 * mean(nhds[!Y, x])))
  },
  character(2)
)
knitr::kable(t(tab_dat), col.names = c("Emergency", "Not emergency"))

|                                             | Emergency      | Not emergency |
|:--------------------------------------------|:---------------|:--------------|
| Myocardial Infarction                       | 2709 (3.70%)   | 1113 (1.42%)  |
| Congestive Heart Failure                    | 12349 (16.85%) | 5644 (7.21%)  |
| Periphral Vascular Disease                  | 3843 (5.25%)   | 3318 (4.24%)  |
| Cerebrovascular Disease                     | 5788 (7.90%)   | 3177 (4.06%)  |
| Dementia                                    | 2176 (2.97%)   | 729 (0.93%)   |
| Chronic Pulmonary Disease                   | 12216 (16.67%) | 7058 (9.02%)  |
| Connective Tissue Disease-Rheumatic Disease | 1529 (2.09%)   | 1143 (1.46%)  |
| Peptic Ulcer Disease                        | 1143 (1.56%)   | 636 (0.81%)   |
| Mild Liver Disease                          | 2171 (2.96%)   | 1149 (1.47%)  |
| Diabetes without complications              | 14399 (19.65%) | 9133 (11.67%) |
| Diabetes with complications                 | 2719 (3.71%)   | 1449 (1.85%)  |
| Paraplegia and Hemiplegia                   | 1446 (1.97%)   | 968 (1.24%)   |
| Renal Disease                               | 9387 (12.81%)  | 4669 (5.96%)  |
| Cancer                                      | 2780 (3.79%)   | 4008 (5.12%)  |
| Moderate or Severe Liver Disease            | 1080 (1.47%)   | 521 (0.67%)   |
| Metastatic Carcinoma                        | 2100 (2.87%)   | 1665 (2.13%)  |
| HIV/AIDS                                    | 25 (0.03%)     | 63 (0.08%)    |

How to get help

Look at the help files for details and examples of almost every function in this package. There are several vignettes showing the main features:

  • Introduction: vignette("introduction", package = "icd")
  • Charlson scores: vignette("charlson-scores", package = "icd")
  • Examples using ICD-10 codes: vignette("ICD-10", package = "icd")
  • CMS Hierarchical Condition Codes (HCC): vignette("CMS-HCC", package = "icd")
  • Pediatric Complex Chronic Conditions (PCCC): vignette("PCCC", package = "icd")
  • Working with ICD code ranges: vignette("ranges", package = "icd")
  • Comparing comorbidity maps: vignette("compare-maps", package = "icd")
  • Paper detailing efficient matrix method of comorbidities: vignette("efficiency", package = "icd")

Many users have emailed me directly for help, and I’ll do what I can, but it is often better to search or add to the list of issues on GitHub so we can help each other. Advanced users may look at the source code, particularly the extensive test suite, which exercises all the key functions.

?comorbid
?comorbid_hcc
?explain_code
?is_valid

# first show the list
vignette(package = "icd")
vignette("introduction", package = "icd")

Relevance

ICD-9 codes are still in heavy use around the world, particularly in the USA, where ICD-9-CM (Clinical Modification) was in widespread use until October 2015. ICD-10 has been used worldwide for reporting cause of death for more than a decade, and the WHO released ICD-11 in 2018. ICD-10-CM is now the primary coding scheme for US hospital admission and discharge diagnoses used for regulatory purposes and billing. A vast amount of electronic patient data is recorded with ICD-9 codes of some kind: this package enables their use in R alongside ICD-10.

Comorbidities

A common requirement for medical research involving patients is determining new or existing comorbidities. These are often reported in Table 1 of research papers to demonstrate the similarities or differences between groups of patients. This package is focussed on fast and accurate generation of this comorbidity information from raw lists of ICD-9 and ICD-10 codes.

ICD-9 codes

ICD-9 codes are not numbers, and great care is needed when matching individual codes and ranges of codes. It is easy to make mistakes, hence the need for this package. ICD-9 codes can be presented in “short” format, with up to five characters and no decimal point, or in “decimal” format, with a decimal point separating the code into two groups. There are also codes beginning with V and E, which have different validation rules. Zeroes after the decimal point are meaningful, so ICD-9 codes stored as numbers cannot be used in most cases. In addition, most clinical databases contain invalid codes, and some even mix decimal and short format codes in different places. This package primarily deals with ICD-9-CM (Clinical Modification) codes, but should be applicable or easily extensible to the original WHO ICD-9 system.
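
A short base R illustration of these points follows. The helper `short_to_decimal_sketch()` is hypothetical and written only for this example; it is not part of icd. It assumes that short codes are already zero-padded, that numeric and V codes have a three-character major part, and that E codes have a four-character major part. Real data need the more careful handling the package provides.

```r
# Storing ICD-9 codes as numbers silently changes them:
# trailing and leading zeroes are lost.
as.character(as.numeric("250.00"))  # becomes "250"
as.character(as.numeric("003.1"))   # becomes "3.1"

# Hypothetical helper: convert zero-padded short codes to decimal format.
short_to_decimal_sketch <- function(x) {
  x <- trimws(toupper(x))
  # E codes have a four-character major part; all others have three.
  major_len <- ifelse(startsWith(x, "E"), 4L, 3L)
  major <- substr(x, 1L, major_len)
  minor <- substr(x, major_len + 1L, nchar(x))
  # Only add the decimal point when there is a minor part.
  ifelse(nchar(minor) > 0L, paste(major, minor, sep = "."), major)
}

short_to_decimal_sketch(c("40201", "V1009", "E8801", "4011", "250"))
# gives "402.01" "V10.09" "E880.1" "401.1" "250"
```

Keeping codes as character strings from the moment they are read in avoids the first problem entirely.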

ICD-10 codes

ICD-10 has a somewhat simpler format, with consistent use of a letter, then two alphanumeric characters. However, especially for ICD-10-CM, there is a multitude of qualifiers, e.g. specifying recurrence or laterality, which vastly increase the number of possible codes. This package can check the validity of codes by syntax alone, or by whether the codes appear in a canonical list. The current ICD-10-CM master list is the 2016 set. There is no capability for converting between ICD-9 and ICD-10, but comorbidities can be generated from older ICD-9 codes and newer ICD-10 codes in parallel, and the comorbidities can then be compared.
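
The difference between the two kinds of validity can be sketched in base R. Both helpers below are hypothetical and are not the package's functions. The regular expression simply encodes the description above (a letter, two alphanumeric characters, then an assumed maximum of four further characters after an optional decimal point), and `toy_canonical` is a three-code stand-in for a real master list.

```r
# Syntactic check: a letter, two alphanumerics, then up to four more
# characters, optionally preceded by a decimal point (assumed pattern).
icd10_syntax_ok <- function(x) {
  grepl("^[A-Z][0-9A-Z]{2}(\\.?[0-9A-Z]{1,4})?$", toupper(trimws(x)))
}

# Toy stand-in for a canonical list, stored in short (no decimal) form.
toy_canonical <- c("I10", "E1165", "S72001A")

# Canonical check: strip the decimal point, then look the code up.
is_defined_sketch <- function(x) {
  gsub(".", "", toupper(trimws(x)), fixed = TRUE) %in% toy_canonical
}

codes <- c("I10", "S72.001A", "E11.65", "Z99.999", "10I")
data.frame(
  code      = codes,
  syntax_ok = icd10_syntax_ok(codes),
  defined   = is_defined_sketch(codes)
)
```

Here `Z99.999` passes the syntax check but is absent from the toy list, while `10I` fails both checks.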

Development version

The latest version is available on GitHub at jackwasey/icd, and can be installed with:

    #install.packages("remotes")
    remotes::install_github("jackwasey/icd")