Remove R parsing code in favor of qiime.io? #1135
Comments
+1 |
Well, depends. I basically just threw the package meta stuff (doc, depend, etc) around the So, is there a real "need" for this? If so, I can think about budgeting time to make this happen, and migrate over relevant code from phyloseq. Sorry, I'm a little out of the loop. |
Thanks for the details, @joey711. I wasn't aware that this code was also ported to phyloseq. I think there is a real need for this code to be removed from QIIME and pushed into its own package. QIIME would then depend on qiime.io. I think it'd be a huge plus because it would make it easy for R users to load QIIME files (we get fairly frequent requests for this on the QIIME forum), instead of copying this functionality from QIIME or elsewhere. qiime.io will likely be very lightweight, which also encourages adoption as a dependency in other microbiome projects. The biggest plus that I see is having unit tests, as there are occasionally issues with this parsing code (e.g., #1132 #996 #236). Of course, we could add unit tests within QIIME, but I think there are many benefits to extracting the code into its own package. It'd also be great to have this hosted on CRAN. @danknights @kylebittinger what do you guys think? Other devs? I'd be happy to review code, offer feedback, suggest specific unit test cases, etc. but I'm not the best person to code this since my R experience is extremely limited. |
I think it's a great idea. |
@jrrideout @danknights @kylebittinger I have not incorporated any of the Sorry for the confusion. To clarify, I was just suggesting I might be willing to migrate some of the extra bells and whistles (and unit tests) from the phyloseq version of legacy-qiime importing code into the Here is a tutorial on that: Happy to discuss. I agree |
@joey711 thanks for the clarification. I think it'd be fine (and a really good idea) to take either the code from phyloseq or loaddata.r and make it into an R package that will generally be useful to others, not just within QIIME or phyloseq. I don't think we need to stick to the existing interfaces that are in loaddata.r, as long as it's relatively easy for QIIME and phyloseq code to be updated to use this new package. Does anyone else have interest/bandwidth to work on this? |
Paul, Jai, Greg, Dan, I am all for establishing qiime.io as an optional dependency and doing some I think it would be a great idea to make qiime.io a "back end" package for I added some unit tests to qiime.io a while back, so we should be "almost --Kyle On Wed, Oct 2, 2013 at 6:41 PM, Jai Ram Rideout notifications@github.com wrote:
|
I can try to take a look at this over the weekend. The count of redundant qiime-parsing code now appears to be 3. All the more reason to make this happen, IMHO. @kylebittinger (or others), what sort of bugs are we worried about introducing? I figured I would follow the output data types and (meager?) tests that we currently have, so that the difference is opaque to QIIME. Similarly, is there anything we should enhance? I already suggested adding the chunk-streaming to accommodate large files without memory-swapping, since I've already written and tested that. It also optionally spits progress dots to standard out. Anything else? I don't currently have any support in phyloseq or my own private R scripts for writing a QIIME-legacy file from an OTU abundance matrix (plus optional metadata). This isn't needed for the QIIME interface, since QIIME does that on its own, but it would augment the R interoperability with QIIME. And add validity to the "o" in
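For illustration, the chunk-streaming idea described above could look roughly like the following. This is a minimal Python sketch, not the phyloseq or qiime.io code. The function names (`iter_chunks`, `parse_otu_table_chunked`), the `chunk_size` and `progress` parameters, and the assumed file layout (tab-delimited, comment lines starting with `#`, the last `#` line naming the samples, numeric counts in every column after the OTU ID) are all assumptions made for the example.

```python
import io
import sys
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def parse_otu_table_chunked(handle, chunk_size=1000, progress=False):
    """Parse a tab-delimited OTU table a chunk of lines at a time.

    Hypothetical layout: lines starting with '#' are comments, and the
    last '#' line seen holds the column names.
    Returns (sample_ids, otu_ids, counts).
    """
    sample_ids, otu_ids, counts = [], [], []
    for chunk in iter_chunks(handle, chunk_size):
        for line in chunk:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("\t")
            if line.startswith("#"):
                # Keep overwriting: the last '#' line wins as the header.
                sample_ids = [f.strip() for f in fields[1:]]
                continue
            otu_ids.append(fields[0].strip())
            counts.append([float(x) for x in fields[1:]])
        if progress:
            sys.stdout.write(".")  # one progress dot per chunk
    if progress:
        sys.stdout.write("\n")
    return sample_ids, otu_ids, counts


# Small in-memory example; a real call would pass an open file handle.
example = io.StringIO(
    "# Example OTU table\n"
    "#OTU ID\tS1\tS2\n"
    "otu_a\t1\t0\n"
    "otu_b\t3\t5\n"
    "otu_c\t0\t2\n"
)
samples, otus, table = parse_otu_table_chunked(example, chunk_size=2)
```

In this sketch the saving comes from never holding the whole raw text in memory at once; the parsed counts are still accumulated in a list, so a very large table would need a sparse or on-disk structure instead.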
I agree with legacy output and chunk-streaming. Dan Knights, PhD On Thu, Oct 3, 2013 at 12:32 PM, Paul J. McMurdie
|
Legacy output would be great and probably easy enough to write. I also Looked over the chunk streaming section of the phyloseq code ( Thinking more about my fear of bugs, adding a few more tests for the output --Kyle On Thu, Oct 3, 2013 at 1:47 PM, danknights notifications@github.com wrote:
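As a rough idea of what a legacy writer might involve, here is a minimal Python sketch that writes a count matrix as a tab-delimited table. The function name `write_otu_table`, its parameters, and the exact header layout (`#OTU ID`, one column per sample, and an optional trailing `Consensus Lineage` column) are assumptions for illustration; the real format should be checked against what QIIME's parser accepts.

```python
import io


def write_otu_table(handle, sample_ids, otu_ids, counts, taxonomy=None):
    """Write counts (one row per OTU) as a tab-delimited table.

    taxonomy, if given, is a list with one lineage string per OTU and is
    written as a final column. The header layout is assumed, not verified.
    """
    if len(otu_ids) != len(counts):
        raise ValueError("need exactly one row of counts per OTU")
    if taxonomy is not None and len(taxonomy) != len(otu_ids):
        raise ValueError("need exactly one taxonomy entry per OTU")

    header = ["#OTU ID"] + list(sample_ids)
    if taxonomy is not None:
        header.append("Consensus Lineage")
    handle.write("\t".join(header) + "\n")

    for i, (otu_id, row) in enumerate(zip(otu_ids, counts)):
        if len(row) != len(sample_ids):
            raise ValueError("row length does not match number of samples")
        fields = [otu_id] + [str(value) for value in row]
        if taxonomy is not None:
            fields.append(taxonomy[i])
        handle.write("\t".join(fields) + "\n")


# Write to an in-memory buffer; a real call would pass an open file.
out = io.StringIO()
write_otu_table(
    out,
    sample_ids=["S1", "S2"],
    otu_ids=["otu_a", "otu_b"],
    counts=[[1, 0], [3, 5]],
    taxonomy=["k__Bacteria", "k__Archaea"],
)
text = out.getvalue()
```

Validating the row and taxonomy lengths up front is a deliberate choice here, since a silently misaligned table is harder to debug than an early error.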
|
Whoops, just noticed/remembered that it includes optional parallelization of the chunk parsing: @kylebittinger |
Thanks! On Thu, Oct 3, 2013 at 2:25 PM, Paul J. McMurdie
|
This all sounds good, and I agree that having writers (in addition to parsers) is a good idea. Regarding bugs, I can help come up with unit test cases for issues that we've run into in the past. A good place to start, though, is to add unit tests that mimic the ones in QIIME, highlighted here: https://github.com/qiime/qiime/blob/master/tests/test_parse.py#L145-L254 |
Still working on tests for this. What is the expected behavior if there are spaces at the beginning/end of a --Kyle On Thu, Oct 3, 2013 at 2:36 PM, Jai Ram Rideout notifications@github.com wrote:
|
I prefer to remove spaces at the beginning/end of headers. Is there a specific scenario where they arise, or is this just a general check? Sorry I've been slow on this. I'll try to budget some time this week. Would be nice to get this submitted to CRAN soon.
check_id_map.py flags leading/trailing whitespace in the header fields as a warning. This behavior is a bit odd, since the data fields (i.e. non-header fields) have leading/trailing whitespace stripped, but the headers don't. Are there any reasons why this behavior is in place? If not, I'd vote to have both QIIME and R strip leading/trailing whitespace from all fields (for consistency). |
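To make the proposal concrete, here is a minimal Python sketch of a mapping-file parser that strips leading/trailing whitespace from header and data fields alike. It is not QIIME's actual parser: the function name `parse_mapping_file_stripped` and the simplified rules (first `#` line is the header, later `#` lines are ignored, fields are tab-delimited) are assumptions for the example.

```python
import io


def parse_mapping_file_stripped(handle):
    """Parse a tab-delimited mapping file, stripping every field.

    Simplified, assumed rules: the first line starting with '#' is the
    header; any later '#' lines are treated as comments and skipped.
    Returns (header, rows).
    """
    header, rows = [], []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            if not header:
                # Drop the leading '#', then strip each header field.
                header = [f.strip() for f in line[1:].split("\t")]
            continue
        # Data fields get exactly the same treatment as header fields.
        rows.append([f.strip() for f in line.split("\t")])
    return header, rows


example = io.StringIO(
    "#SampleID \t Treatment\tDescription \n"
    "# a comment line\n"
    " S1\tcontrol \tfirst sample\n"
)
header, rows = parse_mapping_file_stripped(example)
```

Note that only whitespace at the edges of each field is removed; internal spaces, as in `first sample`, are preserved.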
+1 |
That sounds good to me. I skirted the issue for the current pull request. I included most of the tests in the QIIME python code. I did not Pull request is in! On Mon, Oct 14, 2013 at 4:30 PM, Jai Ram Rideout
|
Second pull request is in (to qiime.io) for stripping whitespace from What is on our to-do list before we get qiime.io onto CRAN? (I am excited Best, On Wed, Oct 16, 2013 at 10:18 AM, Kyle Bittinger kylebittinger@gmail.com wrote:
|
Thanks @kylebittinger! Yep, I think QIIME should be updated so that both parsers match. The QIIME mapping file docs should also be updated. |
I'll take a look. I'm guessing |
The qiime.io R package contains the parsing functionality that is in qiime/support_files/R/loaddata.r. I think we should remove loaddata.r and make qiime.io an optional QIIME dependency (like we are doing with randomForest, vegan, etc.). This is important because if there is a bug fix in one place, the other will have to be updated. For example, see #1132. loaddata.r isn't unit-tested, either, which makes it scary to maintain.
@joey711 it looks like qiime.io is still under development and doesn't have an official release yet. What are the remaining items that need to be in place before this is ready for its first release?