-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cleanEHR #102
Comments
Editor checks:
Editor commentsThank you for your submission! I wanted to clarify one area before moving forward on review, which is understanding the limitations on access to the CCHIC data set. We aim to facilitate the access to open data, but we understand there are limitations in sharing personalized medical records. I am unsure of difference between the data contained in the package, the "toy" data set linked to from the README, and the extent of additional data available to researchers. Could you explain? Also, what are the requirements for receiving access to the broader data set? Is it a matter of signing the agreement to protect patient data or is the reason for research considered? (I note that the link from the README seems to go to a page for editing the access form, not the form itself.) Our reviewers will need access to the "toy" set, at least to review the package. Some additional initial notes (changes not needed immediately, may be addressed during the review):
Reviewers: |
@sinanshi Just checking in, can you answer my questions above? |
@noamross Apologies for the delay. Thank you for your comment. Regarding the data accessibility,
I hope I have answered your question. Thank you for the suggestion of using |
@noamross Just would like to know if we have answered the questions you posed previously? |
My apologies, I am bit behind on my queue here. Will get back in the next day or two. |
Hi @noamross - we would like to cite this library on a paper, do you have an estimation of when the review would be completed? Thanks!! 👍 |
Hi there, I've sorry this has dragged on. The editors have had a fair bit of discussion on this one. It's newer territory for us to be dealing with packages with these types of data limitations, but in the end I'm satisfied that its purpose is to expand data access, rather than just improve processing of a closed set. I'm fast-tracking to make up for the lost time and hope to have reviews to you faster than our usual three week turnaround. I have re-submitted my request for the toy dataset, which I believe goes to @docsteveharris. Please let me know if I can share this set with the reviewers I select or if they will need to submit this separately. In the meantime, there remain several lingering things from our automated
|
Reviewers: @haozhu233 @aammd |
Hi @haozhu233 and @aammd! A friendly reminder that your reviews are due in a week. |
@sinanshi @noamross Sorry for the delay. Please see my review below. I'm happy to discuss more about this package after this post. :) Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: Review CommentsThis package allows users to extract and clean a very unique piece of clinical data for the CC-HIC datasets. Since the original dataset contains multiple complicated forms of data, it could be a very complicated process to start everything from scratch. I can imagine this package can be very helpful for many useRs in that system. I personally think this package agrees with rOpenSci's fit policy and is nice to be listed. However, I also want to note that maybe rOpenHealth fits this topic better? I do have a few comments after trying out this package.
|
This is an intricate and impressive package! It involves Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: Review CommentsReview comments
*At the risk of seeming like a meat-based
warnings and automatic checks
|
Thank you for your reviews, @aammd and @haozhu233! @sinanshi and @dpshelio, please let us know if you have questions about any of the points in the reviews or the automated checks. When you have made changes, please summarize them in a response to reviewers here (Example). |
Thank you so much for your comments! Will have a look shortly. |
hey @aammd - est. hrs spent reviewing? |
@sinanshi Any updates on making changes in response to reviewers? |
Sorry for the delay. We are trying to modify the code based on the reviews. Will come back to you when the refactoring is done. |
@sinanshi Any updates on your changes? Do you have an expected time frame? |
@noamross sorry for spending such a long time to address the review. As we've received some very constructive reviews, we feel it is necessary to restructure our code slightly. That's why it takes longer than we expect. I think we can finish it this week. |
Dear reviewers, Thank you so much for your very helpful comments, and we are very sorry to be spending such a long to time to address these comments. We have made several re-structuring of the code, inspired by your comments. See https://github.com/CC-HIC/cleanEHR/tree/review
To Reviewer 1 @haozhu223
We are not clear about the nature of rOpenHealth, since we didn’t find too much information online. Is it a subsidiary of rOpenSci? Can we be listed in both organisation? We would love to explore the possibilities here.
The critical care data (CC-HIC) is one of the themes of National Institute for Health Research (NIHR) Health Informatics collaborative (HIC). It has other themes which covers a large area of health care topics. https://www.nihr.ac.uk/about-us/how-we-are-managed/our-structure/infrastructure/health-informatics-collaborative.htm
Thank you for the suggestion and you have the point. We do believe xml2 will be a better fit to the XML parser. However, the XML parser is the most complicated part of the code. The XML format we receive from hospitals do not follow strictly the standard. As a result, there are many corner cases that we fixed throughout the years of development. The XML parser is rather like a plugin which serves only with CCHIC dataset. Concerning the complexity of re-writing it with xml2, will it be OK if we leave the xml part this time, and redo them when the new xml standard is introduced in the coming years?
Thanks for pointing it out. ccTable is a Refclass which is served as the data cleaning and manipulation platform. Since the tables it held are usually very big, we intented to use refclass to avoid multiple copying. As we have adopted the OO way, we would like to keep all the ccTable related function, e.g. export.csv to be consistent. More documentation has been made, I hope it is clearer this time.
Thanks, we have added more documentation for the ccTable member methods, which are desigated as ccTable_methods.
As we mentioned above, the pipeline is removed from the package.
Amended, thanks.
New vignette has been made, thanks.
It is really a good point! We have the plan of making a website using shiny for data displaying and cleanEHR functionalities. It will be done as a separate project. To Reviewer2 @aammd
Thank you for your suggestion. We recognise that the naming is a huge problem. As the package started with a lot of experimental elements, naming has not been paid with too much attention. We amended the package to adopt the naming style using prefix where applicable. In particular, the episode.graph() function has been moved to the plot() method for ccEpisode.
sql.demographic.table has been renamed as ccd_demographic_table.
Amended accordingly.
Modified, thanks.
We didn’t manage to switch for all the cases, but we changed a couple of them.
This is a great idea and we moved it to plot() for ccEpisode class now.
Amended accordingly. Thanks.
In fact, the function can accept both list and the path of a YAML configuration file as a input. You are right, it might be confusing to use yaml.load in the vignette. In the new version of the vignette, we demonstrated it without using yaml.load.
We actually feel the data quality reporting and visualisation, where pander and ggplot are used, are crucial to the data cleaning process. For this reason, we feel it is perhaps necessary to make these functions loaded from the beginning. Please let me know if you feel otherwise. We can discuss about this.
That is a mistake. Amended!
Unnecessary files removed.
This is probably a warning from data.table. In some R version it happens. But you can still run through.
Amended. |
Thank you for the response, @sinanshi. @aammd and @haozhu233, let us know if the changes address your concerns and if there are further areas of follow-up. Let me comment on a few things:
|
@aammd and @haozhu233, do you have any comment on the response we made previously? |
Sorry for not following up on this earlier myself, @sinanshi. @aammd and @haozhu233, please do respond to the update. |
@haozhu233 thanks. |
@aammd and @haozhu233 Have you had a chance to review the changes? |
I think the changes @sinanshi made to the cleanEHR package addressed the issues I raised last time really well. I really like the improvements on documentation and I think the changes made it a lot easier for first-time users to understand. I also have a few notes here:
|
Hi @haozhu233. Thank you for your further comments.
|
@aammd Do you have any feedback on the change we've made? |
@sinanshi I'm very sorry for my late reply! Your response to my comments sounds great to me, and looking over the package quickly, everything looks awesome -- especially the vignettes. I'll go over things more in detail by Friday but I don't anticipate anything major |
I think @sinanshi et al did a great job addressing my comments and polishing what was already a truly impressive package. I especially like some of the new naming conventions, it makes it much easier to navigate the package. My comments are all very minor, mostly small tweaks to the vignettes and some comments on the output of
*I worked through both vignettes and I found them clear and everything seems to work.
library(cleanEHR)
data.path <- paste0(find.package("cleanEHR"), "/doc/sample_ccd.RData")
print(data.path) And I worry that it might be a little confusing for some users. Also, is it guaranteed to work on every machine? I'm not sure about that. Personally, I'm more familiar with the approach described in Wickham's "R packages" book, which argues that datasets used in a package should be placed in
IT would be much easier if this .Rdata file was called
dt <- ccd_demographic_table(ccd, dtype=TRUE)
Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
3: NAs introduced by coercion
4: NAs introduced by coercion
5: NAs introduced by coercion
6: In `[.data.table`(demogt, , `:=`("index", seq(nrow(demogt)))) :
Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if that is biting. If this message doesn't help, please report to datatable-help so the root cause can be fixed. I am no data.table expert so perhaps this is nothing to worry about?
Might be too technical for some of your users. Better perhaps to describe what a refclass does, rather than use the technical term. I'm confused about it myself -- does it have anything to do with "Reference Classes" (RC) which yet another object oriented system in R?
|
Thanks @aammd for your comments.
We had a double check and we can confirm that we only imported the necessary packages.
It is an old file that we forgot to delete. It is now called "Data cleaning and wrangling with cleanEHR".
Changed accordingly. Thank you for pointing that out.
We sort of expected the warning messages thought it is not pretty. It converts tree-style data
refclass is indeed the Reference Class.
YAML explanation added. The vignette has actually covered almost all the option of the YAML configuration. We mostly expect the users to use the template directly with a minor modification.
You are right, thanks. We have changed accordingly.
We have added some examples for the imputation function. After all, thank you for all these comments to improve the quality of our package. Sinan |
@sinanshi This looks great! I just read your comments and looked over your commit history on your master branch, and everything looks wonderful. Thank you for a very enjoyable dialogue; i hope it was helpful to you. Congratulations on producing such an ambitious and polished package! |
@aammd, It has been a great pleasure to work with you to improve our package. Thank you very much for you most valuable suggestions. |
Thanks @aammd and @haozhu233 for your thorough reviews and follow-ups. @sinanshi, we are almost there, just a couple of remaining notes from R CMD check:
You can fix these by, when when generating these files with You'll also want to add With those two things fixed, we can approve! To-dos after those fixes:
For submission to JOSS, you'll then want to generate a new Zenodo record with a DOI (you should be able to once you've joined the rOpenSci team and I have made you admin for the repo). Then, over on the JOSS thread, provide them with the updated DOI. |
@noamross Fixed the two issues. |
Approved! OK, you can go ahead and initiate the repo transfer and the rest of the checklist above. |
Badges added, CI links updated, new DOI obtained. |
Summary
What does this package do? (explain in 50 words or less):
cleanEHR works with the CCHIC critical care patient record, which is the UK first open access, high resolution longitudinal critical care database. The package is created to address various data reliability and accessibility problems of CCHIC database. It provides a unique platform that enables data manipulation, transformation, reduction, anonymisation, cleaning and validation.
Paste the full DESCRIPTION file inside a code block below:
URL for the package (the development repository, not a stylized html page):
https://github.com/CC-HIC/cleanEHR
Who is the target audience?
Clinical/medical researchers
Are there other R packages that accomplish the same thing? If so, what is different about yours?
No
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
with a high-level description in the package root or ininst/
.Detail
Does
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
The text was updated successfully, but these errors were encountered: