CMS HCC risk adjustment models #31

Closed
jackwasey opened this issue Aug 29, 2014 · 37 comments

@jackwasey
Owner

http://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html

This model assigns ICD-9 codes to HCC codes, but it also needs age and gender as inputs.
It is a many-to-many mapping, but we should still be able to map from an individual's set of ICD-9 codes to a set of HCC codes (which could be considered comorbidities).
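To make the many-to-many point concrete, here is a minimal Python sketch that maps one person's set of ICD codes to a set of CCs. The crosswalk entries (`ICD_A`, CC 1, and so on) and the function names are hypothetical placeholders, not values from the CMS files. The age and sex inputs that the model also needs are left out.

```python
from collections import defaultdict

# Hypothetical many-to-many crosswalk; codes and CC numbers are placeholders.
CROSSWALK = [
    ("ICD_A", 1),
    ("ICD_A", 2),   # one ICD code may map to several CCs
    ("ICD_B", 2),   # several ICD codes may map to the same CC
    ("ICD_C", 3),
]

def build_lookup(pairs):
    """Turn (icd, cc) pairs into a dict of icd -> set of CCs."""
    lookup = defaultdict(set)
    for icd, cc in pairs:
        lookup[icd].add(cc)
    return lookup

def icd_set_to_ccs(codes, lookup):
    """Union of the CCs for every code a person has; unmapped codes are ignored."""
    ccs = set()
    for code in codes:
        ccs |= lookup.get(code, set())
    return ccs

lookup = build_lookup(CROSSWALK)
print(sorted(icd_set_to_ccs({"ICD_A", "ICD_B", "ICD_Z"}, lookup)))  # [1, 2]
```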

@jackwasey jackwasey modified the milestone: someday maybe Jan 24, 2015
@mongoose54

@jackwasey Do you happen to know if there is an implementation of HCC risk scoring available in a language other than SAS? Unfortunately, CMS provides the software in SAS, and its documentation is hard to follow well enough to rewrite the code in another language.

@jackwasey
Owner Author

I haven't seen another implementation. The SAS code is horrible: I just looked through it. The logic is actually very simple, and would be easy to implement using my R package as a basis. Also, the SAS code makes no attempt to identify errors in the data, which my package could do. This is on my to-do list, but I won't be able to get to it for quite a long time. Would you be interested in working on this?

There is one binary SAS data file in the v22 HCC software package which has the coefficients. Everything else is text, and shows some very simple logic which looks up CC codes from ICD-9-CM codes, applies the hierarchies, and then looks up coefficients based on per-patient flags provided by the user.
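The three steps described above (CC lookup, hierarchies, coefficient lookup) can be sketched as follows. This is a hedged Python illustration, not the CMS algorithm. The diagnosis codes, the segment name `community`, the demographic cell `F70_74`, and every coefficient are made-up placeholders. The hierarchy uses only two rules in the style of the v22 SAS code.

```python
# Every table below is a hypothetical placeholder, not CMS data.
CC_LOOKUP = {"DX1": {8}, "DX2": {12}, "DX3": {17}}   # diagnosis -> CCs
HIERARCHY = {8: {9, 10, 11, 12}, 17: {18, 19}}        # CC -> lower CCs it removes
COEFFICIENTS = {                                      # (segment, variable) -> coefficient
    ("community", "F70_74"): 0.4,
    ("community", "HCC8"): 2.5,
    ("community", "HCC12"): 0.15,
    ("community", "HCC17"): 0.3,
}

def lookup_ccs(dx_codes):
    """Step 1: union of the CCs for each diagnosis; unknown codes map to nothing."""
    ccs = set()
    for dx in dx_codes:
        ccs |= CC_LOOKUP.get(dx, set())
    return ccs

def apply_hierarchy(ccs):
    """Step 2: drop every CC that a more severe CC in the same set outranks."""
    dropped = set()
    for cc in ccs:
        dropped |= HIERARCHY.get(cc, set())
    return ccs - dropped

def score(dx_codes, segment, demographic_cell):
    """Step 3: add the coefficients for the demographic cell and each HCC.
    `segment` stands in for the per-patient flags the user supplies;
    variables missing from the table count as 0 in this sketch."""
    hccs = apply_hierarchy(lookup_ccs(dx_codes))
    variables = [demographic_cell] + [f"HCC{h}" for h in sorted(hccs)]
    return sum(COEFFICIENTS.get((segment, v), 0.0) for v in variables)

print(round(score(["DX1", "DX2", "DX3"], "community", "F70_74"), 2))  # 3.2
```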

It would be good to have some public test data to work against. Maybe the Vermont data has this.

@mongoose54
mongoose54 commented Mar 22, 2015

I am interested in implementing the CMS HCC model. I am pretty busy on my side, but I might be able to contribute something in the near future. For those interested, the National Bureau of Economic Research (NBER) has more description of HCC: http://www.nber.org/data/cms-risk-adjustment-models.html

@jackwasey
Owner Author

Thanks, that would be magnificent. Let me know if there is anything in the core code that could be tweaked to make it easier for you. @wmuprhyrd just implemented a different risk score (van Walraven), and it went very well. CMS HCC has additional complexity, but I think our platform is solid enough to handle it without any problems.

@anobel
Collaborator

anobel commented Apr 21, 2016

Hi @jackwasey and @mongoose54
I need to assign HCCs to ICD-9 codes for a project I'm working on, and have started to implement this in R. The repo is at https://github.com/anobel/icdtohcc; it is still quite basic (just importing/cleaning data so far), but I would be interested in integrating it into the icd package. I've used icd but have not looked into the code recently (especially since the update from icd9 to icd). Would love to collaborate/help.

@marksendak

Hi all, I just replied to an email from @anobel, but I'd love to help tackle this. I posted a simple conversion of SAS -> R on my blog about a year ago (http://healthydatascience.com/cms_hcc.html), but it's not generalizable and only builds HCC scores using a crosswalk from a single year.

@jackwasey
Owner Author

This is perfect material for the package. I've not worked with this mapping before. My initial impression is that it could be implemented with four new mappings: one each for ICD-9 and ICD-10, at both the higher and the lower category level. A function could then take a logical argument to determine whether to use the high- or low-level mapping.

This would not be much work at all, once each mapping is represented as a named list. I may well not understand the complexity of the hierarchy. Do we also need to be able to go from low-level to high level? Are there conditionals on assignment to groups which are based on things other than ICD codes?
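Here is a minimal sketch of the four-mapping idea. It assumes each mapping is a dictionary from code to a set of categories, which is Python's rough analogue of a named list. The `MAPPINGS` contents, the `categories` function, and its `high_level` flag are all hypothetical.

```python
# Hypothetical contents; real mappings would be built from the CMS files.
MAPPINGS = {
    ("icd9", True): {"ICD9_X": {"HIGH_1"}},
    ("icd9", False): {"ICD9_X": {"LOW_1", "LOW_2"}},
    ("icd10", True): {"ICD10_Y": {"HIGH_1"}},
    ("icd10", False): {"ICD10_Y": {"LOW_2"}},
}

def categories(codes, icd_version, high_level=True):
    """Categories for `codes` from one of the four mappings; `high_level`
    selects the higher- or lower-level mapping for that ICD version."""
    mapping = MAPPINGS[(icd_version, high_level)]
    result = set()
    for code in codes:
        result |= mapping.get(code, set())
    return result

print(sorted(categories(["ICD9_X"], "icd9", high_level=False)))
```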

Happy to help and accept pull requests.

@jackwasey jackwasey modified the milestones: v2.1, someday maybe Apr 24, 2016
@anobel
Collaborator

anobel commented Apr 24, 2016

I've written the code to create condition categories (CCs) with labels for both ICD-9 and ICD-10, for every year from 2007 to 2016, from the original CMS files. I've also applied these CCs to ICD codes, taking into account year and ICD version, which was straightforward. My progress is at icdtohcc.

The issue I can't seem to resolve is how to apply the hierarchy rules to convert CCs to HCCs. Basically, for each year, you need to identify whether a patient has one of the more severe CCs, and then zero out the less severe CCs. For example, if they have the CC for metastatic cancer and the CC for prostate cancer, you have to zero out the CC for prostate cancer. I've created a data frame of the hierarchy rules, but would love any help or insight into how best to apply the rules to the patient lists. I posted a question to Stack Overflow with no responses yet.

If we can get this figured out, it would be great to incorporate it into the icd package.

@michaelgao8
Contributor

From your stackoverflow question,

For every id/date combination, I need to check the hierarchy table for rules (for that year, as they change each year). If the condition category cc matches the hierarchy rule ifcc, each id/date in the df table needs to have the cc set to zero/NA/removed if it is in V2-V7 columns.

If the cc matches the ifcc for a given id/date, are you saying for the rest of the same id/date combinations, you want to essentially remove that row from the df?
I'm unclear as to how the cc can be set to 0 if it matches ifcc (since it then won't appear in v2-v7).

Sorry, I'm not a SO user yet, so can't comment there.

@anobel
Collaborator

anobel commented Apr 24, 2016

Yes, @michaelgao8, that's correct. If the cc matches the ifcc, then any cc that falls within v2-v7 should be removed from the patient list for that date/id combination.

I hope that makes sense. We can always reshape the data in any way that makes this simpler.
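Here is one way to apply the rule as described, sketched in Python with plain lists of dicts rather than data frames. The column names (`id`, `date`, `cc`, `year`, `ifcc`, `v2`..`v7`) follow the description above, but the rows themselves are invented. The sketch also assumes that dates are ISO strings and that the rule year is the calendar year of the encounter date.

```python
from collections import defaultdict

# Hypothetical year-specific rules table: one row per (year, ifcc), with the
# lower-ranked CCs it removes in columns v2..v7 (None where unused).
rules = [
    {"year": 2015, "ifcc": 8, "v2": 9, "v3": 10, "v4": 11, "v5": 12, "v6": None, "v7": None},
    {"year": 2015, "ifcc": 17, "v2": 18, "v3": 19, "v4": None, "v5": None, "v6": None, "v7": None},
]

# Long patient table: one row per id/date/cc (hypothetical data).
df = [
    {"id": 1, "date": "2015-03-01", "cc": 8},
    {"id": 1, "date": "2015-03-01", "cc": 12},  # removed: CC 8 on the same id/date
    {"id": 1, "date": "2015-06-01", "cc": 12},  # kept: no CC 8 on this date
    {"id": 2, "date": "2015-03-01", "cc": 18},  # kept: CC 17 absent
]

def drop_dominated(df, rules):
    """Remove each row whose cc is listed in v2..v7 of a rule whose ifcc is
    present for the same id/date, using the rules for the encounter's year."""
    # (year, ifcc) -> set of CCs to remove when ifcc is present
    drop = {}
    for r in rules:
        lower = {r[c] for c in ("v2", "v3", "v4", "v5", "v6", "v7") if r[c] is not None}
        drop[(r["year"], r["ifcc"])] = lower
    # CCs present for each id/date combination
    present = defaultdict(set)
    for row in df:
        present[(row["id"], row["date"])].add(row["cc"])
    # CCs to remove for each id/date; the year is read from the ISO date string
    removed = {}
    for (pid, date), ccs in present.items():
        year = int(date[:4])
        removed[(pid, date)] = set().union(*(drop.get((year, cc), set()) for cc in ccs))
    return [row for row in df if row["cc"] not in removed[(row["id"], row["date"])]]

for row in drop_dominated(df, rules):
    print(row)
```

This version collects the CCs present for each id/date first and then filters the table once, instead of rescanning the table for every rule.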

@marksendak

@anobel, I'll answer your question from icdtohcc here. I'm assuming you have a column with a patient identifier and a column for date of encounter. Some thoughts on how to do this efficiently:

  • Convert from wide to long format using a function like melt (more info here: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html). Your id.vars would be the patient identifier and the date columns and measure.vars would be the ~25 columns with diagnoses.
  • Index the long table by ICD code. You do this with setkey in data.table and this should make the searching much faster
  • Merge in the appropriate HCC mapping by date (so you match date of encounter with the crosswalk for that year)
  • Convert back to wide with a function like dcast. On the LHS of the formula, you'll have patient ID and year. For the fun.aggregate argument, you should be able to use the same custom function as in the cast example on the page above. This will count the number of observations for each patient for each year.

I'm a sucker for data.table. I saw before that you use tidyr, which may be as fast, but will be a different syntax. Let me know how this works out (or doesn't)!
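The join-then-cast steps in that list can be sketched without data.table. In the Python below, the crosswalk and rows are hypothetical. A dictionary keyed on (year, code) stands in for the keyed table and the merge, and a `Counter` plays the role of a `dcast` aggregate that counts observations. Patients with no mapped codes do not appear in the output.

```python
from collections import Counter

# Hypothetical year-specific crosswalk: (year, icd code) -> set of CCs.
crosswalk = {
    (2014, "DX1"): {8},
    (2015, "DX1"): {8},
    (2015, "DX2"): {12},
}

# Long diagnosis table: (patient, encounter year, icd code).
long_rows = [
    ("p1", 2015, "DX1"),
    ("p1", 2015, "DX2"),
    ("p1", 2015, "DX2"),
    ("p2", 2014, "DX1"),
    ("p2", 2014, "DX9"),  # not in the crosswalk, so it contributes nothing
]

def cc_counts(long_rows, crosswalk):
    """Join each row to the crosswalk for its year, then cast to one row per
    (patient, year) with a count column per CC."""
    counts = Counter()
    for patient, year, icd in long_rows:
        for cc in crosswalk.get((year, icd), ()):  # dict lookup acts as the keyed join
            counts[(patient, year, cc)] += 1
    columns = sorted({cc for _, _, cc in counts})
    wide = {}
    for (patient, year, cc), n in counts.items():
        wide.setdefault((patient, year), dict.fromkeys(columns, 0))[cc] = n
    return columns, wide

columns, wide = cc_counts(long_rows, crosswalk)
print(columns, wide)
```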

@anobel
Collaborator

anobel commented Apr 25, 2016

@mpdakkak I want to clarify the difference between cc and hcc, and the current status. I've converted all the patient data to long format and mapped from ICDs to CCs, and the merging is quite fast. The issue I can't seem to resolve is how to apply the hierarchy rules to the CCs to create HCCs. Check out the Stack Overflow question (and maybe give it an upvote to get more attention!)

@jackwasey
Owner Author

I'd just add that, much as I love ddply etc., I don't want to add a massive dependency load to the package. I wrote a couple of wrapper functions that convert long to wide and wide to long using base functions, but they also validate their arguments and guess which columns in a wide table hold ICD codes. See 'icd_wide_to_long' and 'icd_long_to_wide'. I'll have to leave the HCC discussion to you for now.
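For readers who want the shape of those conversions outside R, here is a rough Python sketch of a wide-to-long and long-to-wide round trip. It does not reflect how `icd_wide_to_long` or `icd_long_to_wide` are implemented. The code columns are passed explicitly instead of being guessed, and the function names and example codes are hypothetical.

```python
def wide_to_long(rows, id_col, code_cols):
    """One (id, code) pair per non-empty code cell, in row then column order."""
    pairs = []
    for row in rows:
        for col in code_cols:
            code = row.get(col)
            if code:  # skip None and empty strings
                pairs.append((row[id_col], code))
    return pairs

def long_to_wide(pairs):
    """Group codes by id in first-seen order, padding shorter rows with None."""
    grouped = {}
    for pid, code in pairs:
        grouped.setdefault(pid, []).append(code)
    width = max((len(codes) for codes in grouped.values()), default=0)
    return {pid: codes + [None] * (width - len(codes)) for pid, codes in grouped.items()}

wide = [
    {"visit": "v1", "dx1": "A1", "dx2": "B2"},
    {"visit": "v2", "dx1": "C3", "dx2": ""},
]
pairs = wide_to_long(wide, "visit", ["dx1", "dx2"])
print(pairs)
print(long_to_wide(pairs))
```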

@anobel
Collaborator

anobel commented Apr 27, 2016

That makes sense. I've been able to implement the hierarchy to generate HCCs from CCs. I've also removed all dependencies on external packages except stringr (which I see icd already depends on). I'm now using icd_wide_to_long() as well. I'll have to look into your code in more detail to work out a consistent way to integrate this.

@anobel
Collaborator

anobel commented Sep 1, 2016

Does this mean we can close this?

@iamsafy

iamsafy commented Oct 31, 2016

I just started to implement the SAS version of the HHS-HCC (2016) model in R. Can anyone help me convert all the SAS code, including the score calculation, to R?

@jackwasey
Owner Author

Thanks for your message. Glad to hear you're working on this. @anobel led the HCC code which is already in icd. Perhaps he is able to help you.

@anobel
Collaborator

anobel commented Nov 5, 2016

Thanks! I've implemented code to assign CC and HCC categories based on the CMS model, using both ICD-9 and ICD-10, for multiple years. However, the next step would be to actually use the HCCs to assign the year-specific CMS-HCC "score". I have not yet tackled this but would love any help. The SAS code is available from CMS (https://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html).
There is also a project here (https://github.com/healthactuary/cmshcc) that I wasn't able to get working but that may have some useful code.

If you have a moment, take a look at what we've already incorporated for HCC assignment in icd, and let's chat about how to extend this to add the scoring component.

@iamsafy

iamsafy commented Nov 7, 2016

Thanks @jackwasey and @anobel for the reply. I have gone through your project and it's really helpful. Once I complete the preliminary stages, I will start converting the HCC score.

@devonbrackbill

Hi, I'm wondering what the status is of converting the HCC scores into risk scores based on the coefficients from CMS's model in the SAS code (https://www.cms.gov/Medicare/Health-Plans/MedicareAdvtgSpecRateStats/Risk-Adjustors.html). I'm struggling to find the relevant coefficients in the SAS code. Has anyone made any progress on this?

@arunaryan

I have been working on my own version of this, but in T-SQL. Please refer to the script below:
https://github.com/arunaryan/health-analytics/blob/master/HCC_Risk_score.sql

@arunaryan

arunaryan commented Feb 2, 2017

After converting ICD codes to CCs and then to HCCs, there is another step that selects which HCC codes contribute to the score, following the hierarchy. Please refer to page 8 of the attached document.
HCC_risk_adjustment_051215.pdf

The website below is very helpful, as they have converted the SAS coefficient files into .csv for use.

http://www.nber.org/data/cms-risk-adjustment.html

@devonbrackbill

devonbrackbill commented Feb 2, 2017

@arunaryan Thanks, I was just about to write to you about where you were storing the coefficients when you run line 1296:
CROSS JOIN ref..HCCCoef hcc
because I wasn't seeing that table anywhere in your code. So am I correct you just build this table from the pdf somehow? Do you have the coefficients from that PDF in a machine readable format by any chance?

@arunaryan

@devonbrackbill The table is created from the coefficient files I found as .csv from http://www.nber.org/data/cms-risk-adjustment.html

Each row has a unique modelid and year, with a coeff for every possible HCC, interaction variable, and demographic variable in the HCC risk model. The code is still a work in progress, and I will add comments and the data model for the sproc. Apologies for the mess :)
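Suppose the coefficient table is laid out as described, with one row per model ID and year and one column per variable. Then, before any normalization, scoring a patient reduces to a dot product between their 0/1 variables and the matching row. A minimal Python sketch follows. The model IDs, year, variable names, and coefficient values are placeholders, not NBER or CMS values. Raising an error on unknown variable names is a design choice meant to catch data errors early.

```python
# Hypothetical wide coefficient table: one row per (modelid, year) and one
# column per model variable. Variable names and values are placeholders.
COEF_TABLE = [
    {"modelid": "CNA", "year": 2017, "F70_74": 0.4, "HCC8": 2.5, "HCC17": 0.3},
    {"modelid": "INS", "year": 2017, "F70_74": 1.1, "HCC8": 1.0, "HCC17": 0.4},
]
KEY_COLUMNS = {"modelid", "year"}

def coefficient_row(table, modelid, year):
    """Return the single row for this model and year, or raise."""
    matches = [r for r in table if r["modelid"] == modelid and r["year"] == year]
    if len(matches) != 1:
        raise ValueError(f"expected one row for {modelid}/{year}, found {len(matches)}")
    return matches[0]

def risk_score(indicators, table, modelid, year):
    """Dot product of a patient's 0/1 variables with the row's coefficients."""
    row = coefficient_row(table, modelid, year)
    unknown = set(indicators) - (set(row) - KEY_COLUMNS)
    if unknown:
        raise KeyError(f"variables not in model {modelid}: {sorted(unknown)}")
    return sum(row[var] * flag for var, flag in indicators.items())

patient = {"F70_74": 1, "HCC8": 1, "HCC17": 0}
print(round(risk_score(patient, COEF_TABLE, "CNA", 2017), 3))
```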

@devonbrackbill

devonbrackbill commented Feb 2, 2017

No, I think the mess is Medicare's fault! The problem with the NBER coefficients in the .csv file is that it's difficult to interpret what the coefficient names mean. Like what does SNPNE_MCAID_ORIGDIS_NEM68 mean? There are a bunch like that that are impossible to comprehend, unless I'm missing something obvious.

Though it looks like you made some headway on it in your SQL code.

@Aquaroyal72

Hi Jack,
I need some help using your R CMS-HCC model. I'm new to R programming, though I know SAS. Could you please help me?

Thank you,
Shailesh Patel

@anobel
Collaborator

anobel commented Mar 27, 2017

Hi Shailesh,
Is there a specific issue you're having or need help with? The HCC assignment is implemented in a similar fashion to the Elixhauser and Charlson assignment in the package, and those two are well documented.
thanks,
a

@Aquaroyal72

Hello everyone,
These are the SAS hierarchies for model v22:

```sas
%*imposing hierarchies;
/*Neoplasm 1 */ %SET0(CC=8 , HIER=%STR(9 ,10 ,11 ,12 ));
/*Neoplasm 2 */ %SET0(CC=9 , HIER=%STR(10 ,11 ,12 ));
/*Neoplasm 3 */ %SET0(CC=10 , HIER=%STR(11 ,12 ));
/*Neoplasm 4 */ %SET0(CC=11 , HIER=%STR(12 ));
/*Diabetes 1 */ %SET0(CC=17 , HIER=%STR(18 ,19 ));
/*Diabetes 2 */ %SET0(CC=18 , HIER=%STR(19 ));
/*Liver 1 */ %SET0(CC=27 , HIER=%STR(28 ,29 ,80 ));
/*Liver 2 */ %SET0(CC=28 , HIER=%STR(29 ));
/*Blood 1 */ %SET0(CC=46 , HIER=%STR(48 ));
/*SA1 */ %SET0(CC=54 , HIER=%STR(55 ));
/*Psychiatric 1 */%SET0(CC=57 , HIER=%STR(58 ));
/*Spinal 1 */ %SET0(CC=70 , HIER=%STR(71 ,72 ,103 ,104 ,169 ));
/*Spinal 2 */ %SET0(CC=71 , HIER=%STR(72 ,104 ,169 ));
/*Spinal 3 */ %SET0(CC=72 , HIER=%STR(169 ));
/*Arrest 1 */ %SET0(CC=82 , HIER=%STR(83 ,84 ));
/*Arrest 2 */ %SET0(CC=83 , HIER=%STR(84 ));
/*Heart 2 */ %SET0(CC=86 , HIER=%STR(87 ,88 ));
/*Heart 3 */ %SET0(CC=87 , HIER=%STR(88 ));
/*CVD 1 */ %SET0(CC=99 , HIER=%STR(100 ));
/*CVD 5 */ %SET0(CC=103 , HIER=%STR(104 ));
/*Vascular 1 */ %SET0(CC=106 , HIER=%STR(107 ,108 ,161 ,189 ));
/*Vascular 2 */ %SET0(CC=107 , HIER=%STR(108 ));
/*Lung 1 */ %SET0(CC=110 , HIER=%STR(111 ,112 ));
/*Lung 2 */ %SET0(CC=111 , HIER=%STR(112 ));
/*Lung 5 */ %SET0(CC=114 , HIER=%STR(115 ));
/*Kidney 3 */ %SET0(CC=134 , HIER=%STR(135 ,136 ,137 ));
/*Kidney 4 */ %SET0(CC=135 , HIER=%STR(136 ,137 ));
/*Kidney 5 */ %SET0(CC=136 , HIER=%STR(137 ));
/*Skin 1 */ %SET0(CC=157 , HIER=%STR(158 ,161 ));
/*Skin 2 */ %SET0(CC=158 , HIER=%STR(161 ));
/*Injury 1 */ %SET0(CC=166 , HIER=%STR(80 ,167 ));
```

How can I convert this to R? I tried, but I got lost.
Could someone please help? I would greatly appreciate it.

Thank you,
Shailesh Patel

@Aquaroyal72

How can I implement the above hierarchy in a table-driven way, using the elm column in my table?

Thank you,
Shailesh
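One table-driven approach is to store the `%SET0` calls above as data, with one entry per CC listing the CCs it zeroes out, and then apply those entries in the listed order. The Python sketch below transcribes exactly the v22 rules quoted above. `impose_hierarchies` and `rules_from_rows` are hypothetical names. The `(cc, lower_cc)` row layout is an assumption about how such a table might be stored, because the layout of the `elm` column isn't shown here. For these particular rules, whenever a CC's rule is skipped because a higher CC zeroed it, the higher CC's rule already lists the same lower CCs. The order of application therefore doesn't change the result.

```python
# The v22 %SET0 calls quoted above, transcribed as data:
# (cc, [ccs that are zeroed out when cc is flagged]), in the same order.
V22_HIERARCHY = [
    (8, [9, 10, 11, 12]), (9, [10, 11, 12]), (10, [11, 12]), (11, [12]),
    (17, [18, 19]), (18, [19]),
    (27, [28, 29, 80]), (28, [29]),
    (46, [48]), (54, [55]), (57, [58]),
    (70, [71, 72, 103, 104, 169]), (71, [72, 104, 169]), (72, [169]),
    (82, [83, 84]), (83, [84]),
    (86, [87, 88]), (87, [88]),
    (99, [100]), (103, [104]),
    (106, [107, 108, 161, 189]), (107, [108]),
    (110, [111, 112]), (111, [112]), (114, [115]),
    (134, [135, 136, 137]), (135, [136, 137]), (136, [137]),
    (157, [158, 161]), (158, [161]),
    (166, [80, 167]),
]

def impose_hierarchies(hcc, rules=V22_HIERARCHY):
    """Apply the rules in order, one %SET0 call at a time.

    `hcc` maps CC number -> 0/1 for one person; a modified copy is returned.
    """
    out = dict(hcc)
    for cc, lower in rules:
        if out.get(cc, 0) == 1:
            for low in lower:
                if low in out:
                    out[low] = 0
    return out

def rules_from_rows(rows):
    """Build the rule list from (cc, lower_cc) table rows, keeping first-seen order."""
    grouped = {}
    for cc, low in rows:
        grouped.setdefault(cc, []).append(low)
    return list(grouped.items())

person = {8: 1, 10: 1, 12: 1, 17: 1, 19: 1}
print(impose_hierarchies(person))  # {8: 1, 10: 0, 12: 0, 17: 1, 19: 0}
```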

@ekortemeier

Hi everyone!

This is really great work. I was wondering if there has been any progress in calculating a risk score from the HCC score in R? Also I know this was already mentioned, but this package (https://github.com/validatehealth/cmshcc) seems like it does all of the steps, according to these slides (http://ase.uva.nl/binaries/content/assets/subsites/amsterdam-school-of-economics/r-in-insurance/webster-risk-adjustment-in-r.pdf?1437549456225). Does anyone know how to use this package?

Thanks!
Emma Kortemeier

@jackwasey
Owner Author

Emma, and others, I took a brief look at the package 'cmshcc' by @healthactuary and @npritzl. It seems like a compact bit of code and a good fit for including in this package, or at least referencing. Those guys used dplyr, which I have avoided until now to keep the dependencies lower. We could see how we could work together: does the output from 'icd' make good input for 'cmshcc'? Could or should we bring that or similar code into 'icd'? Open to suggestions.

@healthactuary

Hi Jack, thanks for your response. We could look at using icd instead of dplyr to do the actual diagnosis grouping. Which function in icd would be used? We used to use a function from icd9, but the change to icd for ICD-10 caused us to use dplyr. We would be open to merging into the icd package if there is an equivalent to the dcast function in dplyr to further process the output of icd. Take care. @ekortemeier what do you think? You have used both packages, I believe.

@ekortemeier

It is great to hear from everyone! I have looked at both packages, but ended up using 'cmshcc' for calculating risk scores (thanks for your help Andrew!).
I wasn't able to figure out how to use the icd package to do what I wanted, i.e., produce a risk score starting from a dataset with diagnosis information (including ICD-10 codes). That being said, I am not sure whether the output from 'icd' would be suitable input for 'cmshcc'. I do think the icd_map_cc_hcc table could be very useful; I just wasn't sure how to use it. Hope that helps!

@jackwasey
Owner Author

@healthactuary icd_comorbid (and the family of related functions, such as icd9_comorbid_elix) should work.

I'm about to release a big new update which will dramatically improve performance on big data using matrix algebra. It will also use function names like comorbid_elix instead of the previous icd_comorbid_elix, but you can continue to use the previous function names.
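The comment doesn't say how the matrix algebra works. One common formulation, sketched here in plain Python, multiplies a patients-by-codes 0/1 matrix by a codes-by-comorbidities 0/1 map and then flags every nonzero entry. This illustrates the general idea only and is not a description of icd's implementation. The patients, codes, and code-to-comorbidity assignments are hypothetical.

```python
def incidence(row_labels, col_labels, pairs):
    """0/1 matrix (list of lists) with a 1 at (row, col) for each pair present."""
    r_index = {r: i for i, r in enumerate(row_labels)}
    c_index = {c: j for j, c in enumerate(col_labels)}
    m = [[0] * len(col_labels) for _ in row_labels]
    for r, c in pairs:
        if r in r_index and c in c_index:  # unknown labels are ignored
            m[r_index[r]][c_index[c]] = 1
    return m

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

patients = ["p1", "p2"]
codes = ["C1", "C2", "C3"]
comorbidities = ["CHF", "DM"]

P = incidence(patients, codes, [("p1", "C1"), ("p1", "C3"), ("p2", "C2")])  # patients x codes
M = incidence(codes, comorbidities, [("C1", "CHF"), ("C2", "DM"), ("C3", "DM")])  # codes x comorbidities
flags = [[int(v > 0) for v in row] for row in matmul(P, M)]  # patients x comorbidities
print(flags)  # [[1, 1], [0, 1]]
```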

And @ekortemeier - keep in touch: if you're working with ICD codes, then the community here can offer advice, and possibly extend icd if you want it to do something useful.

(closing this issue as it was implemented a while ago by @anobel - thanks!)

@jackwasey
Owner Author

Hi, all, following up with this.
@healthactuary just reread your comment: " Would be open to merging into the icd package if there is an equivalent to the dcast function in dplyr to further process the output of icd."

Would you please be able to give some example code using dplyr? Then I can see how this could work with icd.

@healthactuary

Hi @jackwasey. Thanks for your response and sorry for the delay!
The "get_hcc_grid" function in the cmshcc package uses dplyr and dcast to essentially just pivot the mapped HCC codes into columnar form. Then the HCC columns are manipulated to apply the hierarchy and interaction factors.
Is there a function in the icd package that would take a long-format combination of Medicare beneficiary ID and HCC codes and turn it into one row per Medicare ID (the primary key) with the HCC codes as columns?
The mapping of ICD-10 diagnoses to the condition categories (CC's) is done using a merge.


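```r
library(dplyr)     # distinct()
library(reshape2)  # dcast(); data.table also provides a dcast()

# DIAG is assumed to have exactly the columns HICNO and DX, so rbind() works
get_hcc_grid <- function(PERSON, DIAG, cmshcc_map) {
  # dummy rows: one per DX in the map, and one per person (DX = "DUMMY")
  dummy_HCC_DIAG <- data.frame(HICNO = "DUMMY", DX = cmshcc_map$DX, stringsAsFactors = FALSE)
  dummy_PERSON_DIAG <- data.frame(HICNO = PERSON$HICNO, DX = "DUMMY", stringsAsFactors = FALSE)
  DIAG <- rbind(DIAG, dummy_HCC_DIAG, dummy_PERSON_DIAG) # ensures that all HCC columns appear in the grid
  # map diagnoses to HCCs (inner join on DX), then keep one row per HICNO/HCC
  merge_df <- merge(DIAG, cmshcc_map, by = "DX")
  merge_df$DX <- NULL
  merge_df <- distinct(merge_df)
  merge_df$indicator <- 1
  # pivot: one row per HICNO, one 0/1 column per CMSHCC
  hcc_grid <- dcast(merge_df, HICNO ~ CMSHCC, value.var = "indicator", fill = 0)
  # drop the dummy person row and the dummy HCC column
  hcc_grid <- subset(hcc_grid, HICNO != "DUMMY")
  hcc_grid$DUMMY <- NULL
  hcc_grid
}
```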
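For the input and output shapes under discussion, here is a hedged Python sketch of the same pivot using plain dictionaries. The input is long (HICNO, DX) pairs, and the output is one 0/1 row per beneficiary, with a column for every HCC in the map. Pre-populating the grid with every person and every HCC takes the place of the dummy rows in `get_hcc_grid`, so people with no mapped diagnoses still get an all-zero row. The function name, the dict-based map format, and the example data are assumptions, and the sketch does not use any icd function.

```python
def hcc_grid(person_ids, diag, cmshcc_map):
    """Pivot long (hicno, dx) pairs to one 0/1 row per person.

    person_ids: every beneficiary ID that should appear, even with no HCCs
    diag: iterable of (hicno, dx) pairs
    cmshcc_map: dict of dx -> set of HCC numbers (hypothetical format)
    """
    # every HCC that appears anywhere in the map becomes a column
    columns = sorted(set().union(*cmshcc_map.values()))
    grid = {pid: dict.fromkeys(columns, 0) for pid in person_ids}
    for hicno, dx in diag:
        if hicno not in grid:
            continue  # diagnosis for someone not in the person file: skipped
        for hcc in cmshcc_map.get(dx, ()):
            grid[hicno][hcc] = 1  # duplicates collapse to a single 1, like distinct()
    return columns, grid

cmshcc_map = {"DX_A": {8}, "DX_B": {12}, "DX_C": {8, 19}}
diag = [("p1", "DX_A"), ("p1", "DX_B"), ("p1", "DX_A"), ("p2", "DX_Z"), ("p4", "DX_C")]
columns, grid = hcc_grid(["p1", "p2", "p3"], diag, cmshcc_map)
for pid, row in grid.items():
    print(pid, [row[c] for c in columns])
```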

@jackwasey
Owner Author

I don't use hierarchical condition codes myself. Would you please be able to give an example of an input data frame and the expected output data frame?
