This package will not go on CRAN, as it is and will continue to be published as a Bioconductor package (with ROpenSci co-branding, pending success of this submission).
This is a really nice package, which fills an important gap in the tools available to work on sequence data in R. Genbank is a very large sequence database, and genbank-formatted records provide rich information about the sequences it contains. This package provides tools to fetch and parse genbank records, and a set of functions to extract information from the parsed records. Importantly, the package integrates well with
Clashes between ropensci and Bioconductor styles
One of the difficulties in reviewing the package is that
This is perhaps another example of the difference between the two projects. In
I have to say I also find the installation instructions a little daunting. Perhaps start by emphasizing that almost all users should use the current Bioconductor release, which can be installed easily with
I found the package-level and function documentation to be very good, both clear and complete. Though perhaps not technically documentation, the print function for genbank records is also well thought-out, providing the user with a clear indication of what each record contains.
The package vignette is also clear and does a good job of demonstrating basic usage and integrations for the package. I think it would also be very helpful to show how the features in a genbank record can be used in an analysis.
For instance, an example demonstrating the best way to use features in a given file and
For the example of reading data into an R session, I think it's better to store the data in
I appreciate the vignette section on the "limitations" of the package. The lack of standards as to exactly what goes into a genbank file means such limitations are inevitable, but this clear statement about how those problems are dealt with is helpful.
I have very little to comment on the code itself -- it's well written, clear and appropriately modular. I have not stress tested the package on very large files, but performance on typically-sized files has been good. I don't think there are any missing functions -- the package does a good job of "doing one job well", so I think the focus of the current package is good.
It would be good to have a code of conduct -- even if you envisage mostly working on the project yourself, it's useful to have an indication that would-be collaborators can expect to be treated well.
As mentioned above, it would be useful to have a clear indication of where bugs and/or questions about the package should be sent.
I think the
Finally, the reviewer recommendations ask for an estimate of the time taken to review the package.
Thank you for the thorough review. I have addressed and closed the issues you kindly filed in my genbankr repository. Please find my initial responses to the other points of your review below:
Unfortunately, in many cases my uses of camelCase are not cases of me adopting the Bioconductor style, but rather of my package providing methods for existing Bioconductor generics (e.g., getSeq, cdsBy, etc.). As such, I'm unable to change the function naming style of my package. In the interest of full disclosure, I would have chosen to use camelCase rather than snake_case anyway, as it is my preference, but that is moot as consistency is paramount and my hands are tied regarding the names of numerous methods my package provides.
The BiocInstaller package is not itself available on CRAN. The mechanism for initially installing it is the sourcing of the file you mention. Once that is installed, you are correct that
With respect to badges, these are available at
I will add some mention of the Bioconductor help site to the README; I agree that it isn't super clear now.
I have changed the vignette so that it uses system.file to retrieve the example genbank file (this is in devel only, not backported to the current release).
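A minimal sketch of that pattern, for readers unfamiliar with it. The file name `sample.gbk` is an assumption made for illustration and is not confirmed in this thread; `system.file` returns an empty string when the package or file cannot be found, so the sketch checks for that before parsing.

```r
# Locate a file shipped inside an installed package instead of hard-coding a path.
# NOTE: "sample.gbk" is an assumed file name; substitute whichever example
# file the installed package actually ships.
gbk_path <- system.file("sample.gbk", package = "genbankr")

# system.file() returns "" when the package or the file is not found,
# so guard the parsing step on a non-empty path.
if (nzchar(gbk_path) && requireNamespace("genbankr", quietly = TRUE)) {
  gb <- genbankr::readGenBank(gbk_path)
  print(gb)
} else {
  message("Example file not found; is genbankr installed?")
}
```

Because the path is resolved at run time, the same vignette code works regardless of where the package library lives on a given machine.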
I don't really have the applied subject-matter expertise to determine what a compelling, more realistic example would be, so I have not added one now. I will talk to people I know are using it and try to work one in for the next release. I will add the mini-applications the reviewer mentions as well for that release.
Thank you for your kind words. As mentioned, I have fixed the two bugs you identified in your issue. Please bring any future problems you find to my attention as well, and the turn-around time should be shorter when I'm not in the middle of a week of conferences.
I would have thought that once
I have modified the README.md file to contain a link at the top to the official Bioconductor splashpage for genbankr.
I've also added CONDUCT.md.
PRs to this github repository should actually be fine, so I haven't made any changes there.
Please let me know if there's anything else outstanding that I need to address.
Hmm, I missed this message somehow. My bad.
I'd prefer any example usage code that has any meat to it live somewhere where it's compiled (and thus automatically tested), e.g. the vignette or examples in the documentation. I suppose I can add a non-run example call to readGenBank to the readme. Going beyond that seems like it is approaching a vignette that never gets built.
I'll add the link to bioc support, I had meant to do that.