submission: EndoMineR #153
hi @sebastiz - Is this a pre-submission inquiry OR are you submitting a package? Usually with pre-submissions you only need to briefly introduce the pkg and we discuss whether we'd like you to submit the package or not, see examples here https://github.com/ropensci/onboarding/issues?q=is%3Aissue+label%3A0%2Fpresubmission |
Hi Scott
Well, it started as a pre-submission enquiry... but your guidelines were so useful that I ended up implementing all of them to develop the package.
I guess it's therefore now a submission. Is there anything else I need to do to make it a formal submission?
Thanks
Sebastian
________________________________
From: Scott Chamberlain <notifications@github.com>
Sent: 22 September 2017 21:55:05
To: ropensci/onboarding
Cc: sebastiz; Mention
Subject: Re: [ropensci/onboarding] Pre-submission enquiry for EndoMineR (#153)
|
thanks @sebastiz, I changed the title to remove the "pre-" part and removed the presubmission label. The first thing we (editors) will do is discuss fit for rOpenSci; we'll get back to you asap with that decision. If it's a fit, then we'll proceed as normal ... |
Hi @sckott, there are over 10 million electronic endoscopy records across the UK alone. There is no standardised, scripted approach that allows hospitals to analyse these records so that we can assess missed cancer rates, the quality of endoscopy and many other aspects, as documented. This is really what this work addresses. In terms of broad applicability, the basic functions of the package can be applied to any endoscopic procedure (colonoscopy, bronchoscopy, cystoscopy), so although it is designed as a gastroenterology package it is also useful for many other specialities to analyse their data. I have focused on gastroenterology to give the project a well-defined scope and audience. The files in data-raw are typical input files, which are always semi-structured text. The internal structure of the input files (i.e. the headings used) may change from hospital to hospital, which is why the Extractor function is designed to let the user define the extractable sections of a report. This gives the user the flexibility to extract the information they want and then apply the relevant downstream functions as needed. Please let me know if you have any further questions and I will answer them as soon as I can. |
thanks! Will get back to you soon |
@sebastiz sorry for delay on this. We think it's a good fit, so we'll go ahead with the review. Will assign an editor very soon ... |
Editor checks:
Editor comments
Thank you for the submission. I have done a quick scan through the package and have also copied the output of
can just be
I'd recommend scanning through the testing chapter in Hadley's book or, for a more detailed description, Richie Cotton's book on testing. There are many prebuilt expectations in testthat, and I see a lot of your code is doing things in a convoluted way (e.g. checking for classes).
There are also a lot of imports in your package. For example, I see both ggplot2 and lattice. Is there a good reason to import both? Can some imports be streamlined?
If you use RStudio, please format the code to make it more readable. You can 'tidy' the code easily by going to the Code menu and choosing "Reformat Code". There are various readability issues, for example spaces (or lack thereof) around assignment operators.
Your examples lack comments. This makes it hard for a new user to figure out what is going on.
I'm currently seeking reviewers and will update the thread once I have some folks lined up. Stay tuned. 📻
── GP EndoMineR ────────────────────────────────────────────────────────────────
It is good practice to
✖ write unit tests for all functions, and all package code
✖ add a "URL" field to DESCRIPTION. It helps users find
✖ avoid long code lines, it is bad for readability. Also,
✖ avoid sapply(), it is not type safe. It might return a
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
✖ not import packages as a whole, as this can cause name
──────────────────────────────────────────────────────────────────────────────── |
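The two goodpractice warnings about sapply() and 1:length(...) are easy to reproduce in base R. The snippet below is a minimal, generic illustration (not code from EndoMineR; the object names are made up) of why vapply() and seq_along() are usually the safer choices:

```r
# sapply() picks its return type from the results, so an empty input
# silently gives back list() rather than a typed vector
empty <- list()
sapply(empty, length)              # list()

# vapply() states the expected result type up front (one integer per element),
# so empty input yields integer(0) and a result of the wrong type raises an error
vapply(empty, length, integer(1))  # integer(0)

# 1:length(x) becomes 1:0, i.e. c(1, 0), when x is empty, so a loop would run twice
x <- character(0)
bad_idx <- 1:length(x)             # c(1, 0)

# seq_along(x) is integer(0) for an empty x, so the loop body never runs
good_idx <- seq_along(x)
for (i in good_idx) print(x[i])    # prints nothing
```

The same reasoning applies to 1:nrow(...) and 1:ncol(...), for which seq_len(nrow(df)) and seq_len(ncol(df)) are the usual replacements.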
OK. I am away for the next couple of days but I will get this done as soon as I am back |
I've made a series of changes. I have 98% test coverage, but I think the remaining 2% is incorrect, so I haven't covered it. I've also left two sapply-dependent functions as they are, although goodpractice doesn't like it. All the examples are now fully explained, and the vignette is more comprehensive. |
Hi @karthik is there anything else that needs to be done to move this forward? |
Hi @sebastiz Apologies from me. Nothing to do on your side. The reviewers I contacted never replied and I lost track. I will strive to assign new folks asap. |
@sebastiz New reviewers pinged, I'll update the thread in a few days. Thanks for your patience. |
Reviewer 1 is @RMHogervorst (assigned on 11/27/2017) |
Reviewer 2 is @jonclayden (assigned on 11/27/2017). Review due December 18th |
Hi @sebastiz, when I install the package from GitHub the vignette is not included. You could re-document the project in your RStudio session if you toggle "knit vignettes" and … and push your changes to GitHub. You would not believe how many times I've had this issue too. |
@RMHogervorst I have now done this. Another good place for the package explanation is https://sebastiz.github.io/EndoMineR/index.html |
@karthik, I am currently correcting the documentation and am getting through it as fast as I can. Thanks for checking in. |
@karthik Just to let you know that I am still working on this and should have it finished soon - just renaming functions and reflecting changes in the documentation as per the reviewers' comments |
@karthik @RMHogervorst @jonclayden here is my response to the reviewers following my amendments.
REVIEWER ONE
Design, data and documentation
This has now been amended. Where possible, all documented code has an input and an expected output. Function bodies have been removed from the explanatory text. The extractor function has been completely re-written and works as intended.
Unfortunately the dataset has to be this size because of the nature of selecting subsets to show the user how to process different aspects. I could provide several smaller datasets, but the overall data directory size may not change significantly as a result. The data has been compressed wherever possible, and datasets that were no longer in use have been removed, so the directory is now much smaller.
The documentation now contains several more images detailing how the functions are used.
The assumptions are necessary to use the package, and all medical records are organised by patient unique ID and the date a test was performed. NAs do appear in the new column when parsing is incorrect, so the user should be aware that further data preparation is necessary to obtain the required output. The package does not contain package-specific error messages at the moment, but these may be included in future iterations.
At the moment I am struggling to generate this. Clicking on the package in RStudio shows the documentation, but for some reason ?EndoMineR doesn't seem to access the man files. This is currently the subject of a Stack Exchange question.
All typos have now been corrected.
Packaging
The dependency list in DESCRIPTION has now been shortened, with many of the dependencies moved to Suggests.
The tests should now run without any of the warnings listed.
3. There is no package coverage badge in the README, as required by the packaging guide. There are also no community guidelines, which are stipulated above.
These have now been added.
Code
All tibbles are now returned as dataframes. I am not keen to split the functions that return a tibble (now a dataframe) and a plot as the dataframe is often the data that feeds the plot but in numerical form, in case the user wants to have access to it. Splitting into a separate function feels like creating a new function for the same unique task. I will separate if absolutely necessary but it doesn't feel logical.
I agree this was a complicated function. This has now been completely re-written and is significantly simpler
The functions have been renamed and shortened to be more logical and consistent
These have all been changed to take more meaningful argument names.
Such cleaning functions (HistolAll, EndoAll and BarrettsAll) have now been created, as suggested.
gsubs and grepls have been replaced where possible. Where this is not possible, they have been left in place. The functions run well over large datasets, so at present this doesn't seem to be a performance issue.
REVIEWER TWO
I very much hope that reading through the illustrated vignette and associated website will help the user to understand the package. The functions have been renamed, as have their arguments, to make the functions more intuitive to use. Some of the functions have been re-written altogether to make them simpler.
Patients always get a unique hospital number, which is the basis of attributing all data to a unique individual. The unique identifier is the same as a hospital number.
The vignettes have all been substantially reorganised so that the assumptions etc. form optional background reading.
This has been separated out
Thank you. I agree that the first iteration was maybe a little complex. I have spent considerable time re-writing the documentation to make it easier to digest. The vignettes have been separated out, and pkgdown has been used to create further documentation and tutorials.
Notes for EndoMineR
I have rewritten the functions that contain for-loops and any functions that required boilerplate code (especially the Extractor function). Many of the functions were re-written specifically to simplify their use, and I hope this is reflected in the code examples in the vignette as well as on the website.
General remarks
It is for all the above. I have incorporated this into the documentation.
3. On the one hand I like that the package is very focused; on the other hand I feel as if it is too specific on some points.
I have taken this on board, and I hope the reviewer agrees with the separation of the vignettes into: EndoMineR principles (explaining assumptions etc.), Package overview (explaining all the cleaning functions, both pathology and endoscopy), Analysis (containing all the analysis functions, e.g. patient flow and surveillance) and a disease-specific vignette (i.e. Barrett's). This seems more logical.
I agree with the concept of not re-inventing the wheel, but this function can be useful for beginning R users, especially as it is not a straightforward merge: it allows some variability in the date between endoscopy and pathology (which reflects real-world scenarios, where tissue samples can be received on a different date from the day of the endoscopy).
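As a rough illustration of what a date-tolerant merge involves, here is a minimal base-R sketch. It is not EndoMineR's own merge function (which this thread does not show); the column names HospNum and Date, the example rows and the 7-day window are all hypothetical. It joins the two tables on the patient identifier and then keeps only the endoscopy/pathology pairs whose dates fall within the window:

```r
# Hypothetical example tables keyed by hospital number (the unique patient identifier)
endo <- data.frame(
  HospNum = c("A1", "B2"),
  Date    = as.Date(c("2017-01-03", "2017-02-10")),
  Finding = c("Barrett's", "Normal"),
  stringsAsFactors = FALSE
)
path <- data.frame(
  HospNum   = c("A1", "B2"),
  Date      = as.Date(c("2017-01-05", "2017-03-30")),
  Histology = c("IM, no dysplasia", "Normal mucosa"),
  stringsAsFactors = FALSE
)

# Join on the patient identifier, then keep pairs where the pathology date is
# between 0 and max_days days after the endoscopy date
merge_with_tolerance <- function(endo, path, max_days = 7) {
  joined <- merge(endo, path, by = "HospNum", suffixes = c(".endo", ".path"))
  gap_days <- as.numeric(joined$Date.path - joined$Date.endo, units = "days")
  joined[gap_days >= 0 & gap_days <= max_days, ]
}

res <- merge_with_tolerance(endo, path)
res  # only patient A1 remains: pathology arrived 2 days after endoscopy
```

In real data a patient may have several endoscopies, so a fuller version would also need a rule for which pathology report to attach when more than one falls inside the window.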
Thank you. This is now a separate vignette
sapply has now been replaced with vapply
I have now renamed the arguments so they are more intuitive to understand.
This is now functional but most references have been removed anyway
The large overview was too big, so it has been removed. The smaller images have been embellished and added to. Hopefully this makes things clearer.
This has now been removed and re-written as part of the Extractor function
I haven't changed this, on the assumption that it is OK.
I received warnings (with goodpractice::gp()) that entire packages were
I have re-written the names of many of the functions so that they are more intuitive. Hopefully this makes more sense
The acronyms are now explained in the text as each is introduced. I hope this is now clearer.
I found some interesting things that could be optimized in the function 'SurveySankey': you use both dplyr's group_by AND data.table, which also has groupings, AND reshape2, which could be superseded by tidyr?
Vignette walk-through
Extractor
Extraction of sentences can also be done with the tokenizers package^[Although
cleaning
Thank you. I agree this is a terribly written sentence and I have changed it.
Thank you. I removed all function bodies from the explanatory text and replaced them with input and expected output to make things clearer.
I have re-written this function so that the arguments are clearer. I have also provided examples of what it extracts with a given input.
The Analysis Functions
The functions carry out several dplyr and ggplot functions to provide output in a standardised way that many users of this kind of package would probably find extremely useful
Thanks. I have explained how to use this in the vignette. Regarding the need for two functions, I am not keen to split the functions that return a tibble (now a dataframe) and a plot, as the dataframe is often the data that feeds the plot but in numerical form, in case the user wants to have access to it. Splitting into a separate function feels like creating a new function for the same unique task. I will separate if absolutely necessary, but it doesn't feel logical.
Patient flow functions
The images all load on the computers I have tried, so this may be a local issue.
I have done as suggested.
Assessment of quality functions
Thank you I have changed this
This should show a bar chart now.
Summary
I think the package has many, many timesavers but would be hard to use for R beginners. Rewriting the examples in the vignette to include a 'before' and 'after' would be very helpful, as would a quickstart. I would recommend focusing on the documentation first and functionality second. |
I'm looking forward to checking it out! It sounds as if you put in a lot of effort. Regarding package docs (you have to make a separate piece of roxygen with
|
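The reviewer's exact instructions for package-level documentation are cut off in this thread, but a common roxygen2 approach (offered here as a sketch, not as what EndoMineR actually contains) is a file holding the special "_PACKAGE" sentinel. The file name R/EndoMineR-package.R below is an assumption; running devtools::document() should then generate a package-level man page that ?EndoMineR opens, and usethis::use_package_doc() can create a similar file automatically.

```r
# R/EndoMineR-package.R  (hypothetical file name)
# The "_PACKAGE" string tells roxygen2 to build the package-level help page,
# taking the title and description from the DESCRIPTION file.
#' @keywords internal
"_PACKAGE"
```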
Thanks, @sebastiz. You've clearly done a lot of work on the package, and particularly the documentation, since I last looked at it. I just have a few small follow-up points:
|
@jonclayden thanks. These have now all been corrected. The tests seem to be fine when I run devtools::check(). Hopefully the top-level documentation is what is needed and should be accessible with ?EndoMineR. The CONTRIBUTING.md file has been added, but the URLs may not be quite correct yet until (and if) the package is taken up by rOpenSci. |
Great! This all looks good, so I think that's everything from me. |
Hi @RMHogervorst, just to let you know that the top-level package documentation has been corrected, as have the points in Jon Clayden's latest comments. Thanks |
The vignettes look beautiful! Beautiful work Sebastian ! Small note before you submit it to CRAN: |
Thank you @RMHogervorst, really appreciate that |
Hi @karthik can you tell me what the next steps are? |
Hi @karthik, I am keen to move forward with this as it has been accepted by both reviewers, and there are some conferences coming up at which I'd like to describe it. It would be great to say it had been accepted here when I do. Thanks |
Hi @sebastiz sorry this fell through the cracks. I will update with next steps in an hour or so. My apologies |
Approved! 🎉 Thank you for submitting and @RMHogervorst and @jonclayden for thorough and timely reviews. To-dos:
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing. |
OK great @karthik . I'll make the relevant changes. Happy to do tech-notes for this. I'd like to submit to JOSS. Can I still do this? |
@sebastiz Great to hear that you'd like to publish a technote on this. We can publish at any time (in contrast to blog posts that are scheduled). Please submit a draft by pull request at your convenience and we can review before publishing. Here are examples: https://ropensci.org/technotes/ Instructions on pull request and preview: https://github.com/ropensci/roweb2#contributing-a-blog-post. The only difference in YAML for technote is I will add the topicid before publishing. Don't hesitate to ask any questions here. |
Yes of course! When you submit to JOSS, mention in the created thread that it was reviewed here and it will get fast tracked. Feel free to @ tag me there. |
Hi @karthik I don't think I have any collaborator invitations on my github yet |
Hi @karthik Yes please resend. ?case sensitive? |
Canceled and resent. Please try now. |
OK thanks @karthik all done. |
Thanks @sebastiz! All set here. 🚀 |
Summary
The goal of EndoMineR is to extract as much information as possible from semi-structured text endoscopy reports and their associated pathology specimens. The package extracts, cleans and manipulates the data in a standardized way for the purpose of automating audit and further research in Gastroenterology.
URL for the package (the development repository, not a stylized html page):
https://github.com/sebastiz/EndoMineR
Please indicate which category or categories from our package fit policies this package falls under, and why? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
[e.g., "data extraction, because the package parses a scientific data file format"]
Data extraction and data munging, as the package takes free-text endoscopy and pathology reports, cleans them and presents the data in the formats needed by gastroenterologists
Gastroenterologists and pathologists, but with potential for many medical specialities that use electronic reporting systems
yours differ or meet our criteria for best-in-category?
There are no other similar R packages to my knowledge
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or in inst/.
Not yet, as this is a pre-publication enquiry
Not yet, as this is a pre-publication enquiry
Detail
Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
R CMD check passes
snake_case is not used universally, but this can be changed based on the outcome of the pre-submission enquiry
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names: