Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize and clarify import tutorial #358

Merged
merged 12 commits into from
Nov 7, 2018
Merged

Conversation

cduvallet
Copy link
Contributor

Okay, I re-ordered the sections, added a TOC, and clarified a few of the sections that have consistently confused me.

A couple of questions for y'all:

  1. is there a way to import fasta files? i.e. not representative sequences but just quality-filtered fasta files, somehow associated with their sample IDs... this is obviously not ideal, but I find that many data sharers tend to only share fasta files so they're still more common than we'd like...
  2. what happens if you have a rooted phylogenetic tree? can you import that?

Apologies for any typos/oversights, pretty sleepy tonight! 😪 😴

@ebolyen
Copy link
Member

ebolyen commented Oct 15, 2018

Sorry, I've been super behind on notifications and @thermokarst is on vacation!

Re 1: I agree, this seems to happen fairly often. Really we just need someone to write a transformer or two to convert between demux FASTA and the single-file (but still demux) seqs.fna from QIIME 1. Until then, that isn't supported yet.

Re 2: Definitely, just change the type. I think that's something that our documentation could be much clearer about (apologies if you've addressed this already, I haven't actually looked yet). The commands are very much a black box right now, but if we explained what the format and type meant, then you could probably extrapolate some of the other imports without needing to enumerate all possible combinations.

@cduvallet
Copy link
Contributor Author

Okay, I will make a note that #1 is not supported (yet).

Re 2: this makes me think that I should also add a section that's like "And more!", giving some examples of how people can figure out what other data types they can import. tbh, reading this tutorial makes it seem like these are the only file types you can import, and together with a lack of documentation for qiime tools import I've never actually looked beyond it.

I'll draft that bit later this week or next, but it'll probably just be a short sentence about using --show-importable-formats/types to dig around. Is there anywhere in the qiime2 docs where these are listed/described more thoroughly that I should also be pointing user to?

@ebolyen
Copy link
Member

ebolyen commented Oct 15, 2018

You're a hero @cduvallet!

I'm afraid not within the user docs, you'll be among the first to write about it. There is the too technical dev docs which you could use as a reference for why things are the way they are, but it also doesn't have any practical explanation of how an end-user would use this.

@cduvallet
Copy link
Contributor Author

Ok, I made those edits and it should be ready to incorporate! (Though github is telling me there are conflicts that must be resolved... Do I need to do that or do y'all take care of it?)

That said, I think that these two issues we brought up are pretty big ones for QIIME 2 - what should I do to make sure that they're documented somewhere? Happy to open github issues in the right places, happy to let y'all manage that! To recap, they are:

  1. There needs to be a way to import non-representative quality-filtered FASTA files which are somehow associated with their sample IDs.

  2. The QIIME 2 docs really need an explanation of what each data type and data formats are (even just one sentence describing each one would make the world of difference!)

@thermokarst thermokarst self-assigned this Nov 1, 2018
@thermokarst thermokarst self-requested a review November 1, 2018 21:22
@thermokarst
Copy link
Contributor

Thanks so much @cduvallet!

(Though github is telling me there are conflicts that must be resolved... Do I need to do that or do y'all take care of it?)

I am happy to wrangle that!

RE: point 1 - an issue in q2-types would be really helpful (it can cross-ref this issue for context).
RE: point 2 - we are with you 110%! I don't think we have any good issues floating out there, but here are the missing pieces that I see:

  1. The framework's base format definitions will need to be updated to support some kind of help or description text.
  2. Prose will need to be written for all of the formats. Most of them are defined in q2-types.
  3. Interfaces and docs will need to be made aware of this new format attribute so that they can be rendered out.

That is the "hard way" approach - a more useful short-term solution could be a forum post or user doc resource that just includes the prose (like FormatXYZ - this is for blah blah blah, and looks like the following example: XYZ). Maybe that could be the initial target - that would allow us to port the prose to the formats whenever the machinery is in place...

I can take point on writing up issues for the format description steps, if that sounds okay to you.

Thanks!

PS - I will start reviewing this PR shortly - so sorry this has been held up on our end for a while now, thank you for your patience!

* master:
  DOC: longitudinal - added note about missing sample handling (qiime2#362)
  Add 16S processing overview tutorial (qiime2#345)
  DOC: Small typo fix in atacama tutorial (qiime2#361)
  MAINT: updating for 2018.11 (qiime2#356)
  q2-sample-classifier tutorial: link to API docs (qiime2#355)
  fixed overview tutorial flowchart (qiime2#352)
  HACK: drop ilr-phylogenetic from gneiss tutorial (qiime2#351)
  IMP: added feature-volatility to tutorial and fixed maturity-index tutorial(qiime2#350)
  BUG: hierarchy overwritten in gneiss tutorial (qiime2#349)
  DOC: Update gneiss tutorial (qiime2#346)
  MAINT: move maturity-index tutorial (qiime2#348)
  IMP: add 'upgrade by installing again' to native.rst (qiime2#347)
  MAINT: updated maturity tutorial to use pipeline (qiime2#340)
Copy link
Contributor

@thermokarst thermokarst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there @cduvallet! Made it halfway through this but have to head out the door. I love these changes, and am really excited for this! Sorry it took us so long to look at - things have been busy with the QIIME 2 manuscript plus our workshop travel schedule. Anyway, please stay tuned for part two some time on Monday. Have a great weekend!

source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved

Obtaining example data
**********************
```````````````````````
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the headers at this new level, the rendered text is a lot cleaner and easier to parse!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

After I offered to re-organize this tutorial, I realized that like 40% of my confusion was actually just being caused by the different header levels. Unfortunately rst format didn't let me make them all perfectly consistent (e.g. in sections with different numbers of sub-heading levels, I couldn't skip an unused level directly to a lower level), but hopefully this helps a little!

source/tutorials/importing.rst Outdated Show resolved Hide resolved
.. _`manifest file`:

"Fastq manifest" formats
~~~~~~~~~~~~~~~~~~~~~~~~

If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

Suggested change
If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually.
If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually using a "manifest" format.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, though I find the focus on calling "manifest" a format very confusing. Maybe because the "manifest" format has something called a manifest file, but the manifest file itself is not an actual format (it's a file). I can see where this terminology comes from if I try to explain it from a CS point of view, but as a computational biologist it was very confusing to see the same word used for a format and a file.

But keeping things consistent with the rest of the QIIME 2 docs and code is more important than my confusion, so that's just a "for what it's worth!" :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though I find the focus on calling "manifest" a format very confusing

Oops, sorry, wasn't trying to emphasize that with my suggestion. I guess when I first read it I was thinking, "well, a new user might see this and think they aren't able to use QIIME 2 with their demuxed data," so my suggestion was coming from a place of clarifying that "yep, we can handle that."

I am okay with this being left off (as in, as it was before), I think on second read it is pretty darn clear that this is a statement within the context of the manifest formats section --- I will gladly go with whatever you decide to do.

Copy link
Contributor Author

@cduvallet cduvallet Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, just made a change that should hopefully address both our concerns! :)

New text:

If you don't have either EMP or Casava format, you need to import your data into QIIME 2 manually by first creating a "manifest file" and then using the qiime tools import command with different specifications than in the EMP or Casava import commands.

source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Show resolved Hide resolved
@cduvallet
Copy link
Contributor Author

RE: point 1 - an issue in q2-types would be really helpful (it can cross-ref this issue for context).

Okay, will do that now!

RE: point 2 - we are with you 110%!

🎉

Maybe that could be the initial target - that would allow us to port the prose to the formats whenever the machinery is in place... I can take point on writing up issues for the format description steps, if that sounds okay to you.

That sounds great @thermokarst! When those are finished, it will be crucial to link to them in this tutorial too.

Hey there @cduvallet! Made it halfway through this but have to head out the door. I love these changes, and am really excited for this! Sorry it took us so long to look at - things have been busy with the QIIME 2 manuscript plus our workshop travel schedule. Anyway, please stay tuned for part two some time on Monday. Have a great weekend!

Great, no worries! The delay between updates is quite nice actually - gives me time to work on other things guilt-free ;) Since the deadline for PRs for next release is on Tuesday, I'm guessing that this will wait till the next release?

Copy link
Contributor

@thermokarst thermokarst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part two of two. Thanks @cduvallet!

The changes requested are pretty light (I think) - we should be able to get this in before the release this week, but I recognize you have plenty of other commitments outside of QIIME 2! Options moving forward:

  1. If you sign off on my proposed tweaks here, I am happy to implement them.
  2. You take point on changes - if they get in before the release, great, if not, we could actually deploy the new docs whenever the change lands (we just don't make a habit of it, since it isn't entirely painless, but I think the improvements to this tutorial are worth it).

Release is planned for this Thursday.

.. _`casava import`:

Casava 1.8 single-end demultiplexed fastq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Unfortunately, there is currently no way to import data files which contain quality-filtered FASTA sequences associated with sample IDs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the QIIME1DemuxFormat? We have a tutorial that has a brief usage example: https://docs.qiime2.org/2018.8/tutorials/otu-clustering/

I could drop in a brief example of this format here in this section if that helps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so this format does exist already? Does this mean the issue (#197) I filed in q2-types isn't necessary?

If that's the case, then yes having you put a brief example here would be great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think that issue makes sense to me, since we don't currently support importing per-sample demuxed FASTA. The QIIME1DemuxFormat is a single-file demuxed format that uses a weird header syntax. It is great for QIIME 1 data, but I haven't really seen too many other tools that use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, I see. Yeah, some way to specify these differences might be useful, and to give users at least one option to import FASTA sequences (even if it's a pain to format the sequences in this way...)

source/tutorials/importing.rst Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
source/tutorials/importing.rst Outdated Show resolved Hide resolved
@thermokarst thermokarst assigned cduvallet and unassigned thermokarst Nov 5, 2018
@cduvallet
Copy link
Contributor Author

Ok, apart from the QIIME1DemuxFormat example, I think it should be good to go!

@thermokarst thermokarst assigned thermokarst and unassigned cduvallet Nov 6, 2018
@thermokarst
Copy link
Contributor

This looks great @cduvallet! If my changes look okay to you please let me know, otherwise amend away! Thanks again!

@thermokarst thermokarst assigned cduvallet and unassigned thermokarst Nov 6, 2018
@cduvallet
Copy link
Contributor Author

Looks good! Added one sentence clarifying that other FASTA formats aren't supported, so that people don't try to go down those rabbit holes.

But other than that, looks good to me! 😄

@thermokarst
Copy link
Contributor

Good catch, I totally forgot about that! Okay, time to :shipit:! Thanks so much @cduvallet!

@thermokarst thermokarst merged commit bdccea6 into qiime2:master Nov 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants