-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganize and clarify import tutorial #358
Conversation
Sorry, I've been super behind on notifications and @thermokarst is on vacation! Re 1: I agree, this seems to happen fairly often. Really we just need someone to write a transformer or two to convert between demux FASTA and the single-file (but still demux) seqs.fna from QIIME 1. Until then, that isn't supported yet. Re 2: Definitely, just change the type. I think that's something that our documentation could be much clearer about (apologies if you've addressed this already, I haven't actually looked yet). The commands are very much a black box right now, but if we explained what the format and type meant, then you could probably extrapolate some of the other imports without needing to enumerate all possible combinations. |
Okay, I will make a note that #1 is not supported (yet). Re 2: this makes me think that I should also add a section that's like "And more!", giving some examples of how people can figure out what other data types they can import. tbh, reading this tutorial makes it seem like these are the only file types you can import, and together with a lack of documentation for I'll draft that bit later this week or next, but it'll probably just be a short sentence about using |
You're a hero @cduvallet! I'm afraid not within the user docs, you'll be among the first to write about it. There is the too technical dev docs which you could use as a reference for why things are the way they are, but it also doesn't have any practical explanation of how an end-user would use this. |
Ok, I made those edits and it should be ready to incorporate! (Though github is telling me there are conflicts that must be resolved... Do I need to do that or do y'all take care of it?) That said, I think that these two issues we brought up are pretty big ones for QIIME 2 - what should I do to make sure that they're documented somewhere? Happy to open github issues in the right places, happy to let y'all manage that! To recap, they are:
|
Thanks so much @cduvallet!
I am happy to wrangle that! RE: point 1 - an issue in q2-types would be really helpful (it can cross-ref this issue for context).
That is the "hard way" approach - a more useful short-term solution could be a forum post or user doc resource that just includes the prose (like I can take point on writing up issues for the format description steps, if that sounds okay to you. Thanks! PS - I will start reviewing this PR shortly - so sorry this has been held up on our end for a while now, thank you for your patience! |
* master: DOC: longitudinal - added note about missing sample handling (qiime2#362) Add 16S processing overview tutorial (qiime2#345) DOC: Small typo fix in atacama tutorial (qiime2#361) MAINT: updating for 2018.11 (qiime2#356) q2-sample-classifier tutorial: link to API docs (qiime2#355) fixed overview tutorial flowchart (qiime2#352) HACK: drop ilr-phylogenetic from gneiss tutorial (qiime2#351) IMP: added feature-volatility to tutorial and fixed maturity-index tutorial(qiime2#350) BUG: hierarchy overwritten in gneiss tutorial (qiime2#349) DOC: Update gneiss tutorial (qiime2#346) MAINT: move maturity-index tutorial (qiime2#348) IMP: add 'upgrade by installing again' to native.rst (qiime2#347) MAINT: updated maturity tutorial to use pipeline (qiime2#340)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there @cduvallet! Made it halfway through this but have to head out the door. I love these changes, and am really excited for this! Sorry it took us so long to look at - things have been busy with the QIIME 2 manuscript plus our workshop travel schedule. Anyway, please stay tuned for part two some time on Monday. Have a great weekend!
|
||
Obtaining example data | ||
********************** | ||
``````````````````````` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the headers at this new level, the rendered text is a lot cleaner and easier to parse!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
After I offered to re-organize this tutorial, I realized that like 40% of my confusion was actually just being caused by the different header levels. Unfortunately rst format didn't let me make them all perfectly consistent (e.g. in sections with different numbers of sub-heading levels, I couldn't skip an unused level directly to a lower level), but hopefully this helps a little!
source/tutorials/importing.rst
Outdated
.. _`manifest file`: | ||
|
||
"Fastq manifest" formats | ||
~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually. | |
If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually using a "manifest" format. |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, though I find the focus on calling "manifest" a format very confusing. Maybe because the "manifest" format has something called a manifest file, but the manifest file itself is not an actual format (it's a file). I can see where this terminology comes from if I try to explain it from a CS point of view, but as a computational biologist it was very confusing to see the same word used for a format and a file.
But keeping things consistent with the rest of the QIIME 2 docs and code is more important than my confusion, so that's just a "for what it's worth!" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though I find the focus on calling "manifest" a format very confusing
Oops, sorry, wasn't trying to emphasize that with my suggestion. I guess when I first read it I was thinking, "well, a new user might see this and think they aren't able to use QIIME 2 with their demuxed data," so my suggestion was coming from a place of clarifying that "yep, we can handle that."
I am okay with this being left off (as in, as it was before), I think on second read it is pretty darn clear that this is a statement within the context of the manifest formats section --- I will gladly go with whatever you decide to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, just made a change that should hopefully address both our concerns! :)
New text:
If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually by first creating a "manifest file" and then using the
qiime tools import
command with different specifications than in the EMP or Casava import commands.
Okay, will do that now!
🎉
That sounds great @thermokarst! When those are finished, it will be crucial to link to them in this tutorial too.
Great, no worries! The delay between updates is quite nice actually - gives me time to work on other things guilt-free ;) Since the deadline for PRs for next release is on Tuesday, I'm guessing that this will wait till the next release? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part two of two. Thanks @cduvallet!
The changes requested are pretty light (I think) - we should be able to get this in before the release this week, but I recognize you have plenty of other commitments outside of QIIME 2! Options moving forward:
- If you sign off on my proposed tweaks here, I am happy to implement them.
- You take point on changes - if they get in before the release, great, if not, we could actually deploy the new docs whenever the change lands (we just don't make a habit of it, since it isn't entirely painless, but I think the improvements to this tutorial are worth it).
Release is planned for this Thursday.
source/tutorials/importing.rst
Outdated
.. _`casava import`: | ||
|
||
Casava 1.8 single-end demultiplexed fastq | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
Unfortunately, there is currently no way to import data files which contain quality-filtered FASTA sequences associated with sample IDs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the QIIME1DemuxFormat
? We have a tutorial that has a brief usage example: https://docs.qiime2.org/2018.8/tutorials/otu-clustering/
I could drop in a brief example of this format here in this section if that helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so this format does exist already? Does this mean the issue (#197) I filed in q2-types isn't necessary?
If that's the case, then yes having you put a brief example here would be great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think that issue makes sense to me, since we don't currently support importing per-sample demuxed FASTA. The QIIME1DemuxFormat
is a single-file demuxed format that uses a weird header syntax. It is great for QIIME 1 data, but I haven't really seen too many other tools that use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, I see. Yeah, some way to specify these differences might be useful, and to give users at least one option to import FASTA sequences (even if it's a pain to format the sequences in this way...)
Ok, apart from the QIIME1DemuxFormat example, I think it should be good to go! |
This looks great @cduvallet! If my changes look okay to you please let me know, otherwise amend away! Thanks again! |
Looks good! Added one sentence clarifying that other FASTA formats aren't supported, so that people don't try to go down those rabbit holes. But other than that, looks good to me! 😄 |
Good catch, I totally forgot about that! Okay, time to ! Thanks so much @cduvallet! |
Okay, I re-ordered the sections, added a TOC, and clarified a few of the sections that have consistently confused me.
A couple of questions for y'all:
Apologies for any typos/oversights, pretty sleepy tonight! 😪 😴