filtering feature tables tutorial addition #39
Conflicts: source/tutorials/import-sequence-data.rst source/tutorials/import.rst source/tutorials/index.rst
Initial pass through, I like the organization.
Both of these methods can also be applied to filter contingent on the maximum number of features or samples, using the ``--p-max-features`` and ``--p-max-samples`` parameters.

Identifier-based filtering
Is it worth trying to call this "Index-based filtering", or is that too technical?
Or "Identity-based filtering", which sounds more natural to me for some reason?
Went with Index-based, and indicated that this refers to identifiers. In QIIME 1 we usually refer to these as identifiers, but I think we should transition to this terminology (which will be important for consistency when we have real index/metadata support).
Metadata-based filtering
------------------------

Metadata-based filtering is similar to identifier-based filtering, except that the list of identifiers to keep is determined based on metadata rather than being provided by the user directly. This is achieved using the ``--m-sample-metadata-file`` or ``--m-feature-metadata-file`` parameter (for ``filter-samples`` or ``filter-features``, respectively) and the ``--p-where`` parameter. The user provides a description of the samples that should be retained based on their metadata using ``--p-where``, where the syntax for this description is the SQLite where-clause syntax.
This paragraph is a little confusing. Is it possible to introduce the --p-where clause first? Reading it the first time, it seemed like this was an augmentation of the identifier-filter, but then it went on to redefine the same parameters in the same way (felt like deja vu), causing me to assume that I had misread the previous section somehow.
This is achieved by providing a ``--p-where`` parameter in addition to a
``--m-sample-metadata-file``/``--m-feature-metadata-file`` (as described above).
Also I think we should link out "SQLite where-clause" to something like: https://www.tutorialspoint.com/sqlite/sqlite_where_clause.htm (a more canonical resource would be nice, but the SQLite homepage was a grammar definition, which isn't user-friendly)
How about the generic WHERE entry on Wikipedia? We aren't using any SQLite-only features, and there seem to be some nice examples of predicates here.
.. command-block::

   qiime feature-table filter-samples --i-table table.qza --m-sample-metadata-file sample-metadata.tsv --p-where "Subject='subject-1'" --o-filtered-table filtered-table

``--p-where`` expressions can be made more complex as follows. Here, the ``--p-where`` parameter is specifying that we want to retain only the samples whose ``Subject`` is ``subject-1`` *and* whose ``BodySite`` is ``gut`` in ``sample-metadata.tsv``.
I would emphasize the quotation marks as well here.
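To make the quoting point concrete, here is a minimal Python sketch, not QIIME 2's implementation. It assumes the predicate is evaluated as an ordinary SQL WHERE clause, uses ``shlex`` to mimic how a POSIX shell splits the command line, and runs the predicate against a small in-memory SQLite table. The rows, the ``metadata`` table name, and the ``ids_matching`` helper are hypothetical and are not taken from ``sample-metadata.tsv``.

```python
import shlex
import sqlite3

# Hypothetical metadata rows (sample ID, Subject, BodySite) for illustration only.
ROWS = [
    ("S1", "subject-1", "gut"),
    ("S2", "subject-1", "tongue"),
    ("S3", "subject-2", "gut"),
]


def ids_matching(where):
    """Return the sample IDs whose metadata satisfy a SQL WHERE predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (SampleID TEXT, Subject TEXT, BodySite TEXT)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", ROWS)
    # The predicate is pasted in verbatim; acceptable for a demo, not for untrusted input.
    query = "SELECT SampleID FROM metadata WHERE " + where + " ORDER BY SampleID"
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


# The outer double quotes keep the whole predicate together as one argument;
# the inner single quotes survive and mark SQL string literals.
argv = shlex.split("""--p-where "Subject='subject-1' AND BodySite='gut'" """)
where = argv[1]

print(where)                                # Subject='subject-1' AND BodySite='gut'
print(ids_matching(where))                  # ['S1']
print(ids_matching("Subject='subject-1'"))  # ['S1', 'S2']
```

Without the outer double quotes, the shell would split the predicate at its spaces and strip the single quotes, so the expression would not arrive as a single argument.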
curl -sL "https://docs.google.com/spreadsheets/d/1_3ZbqCtAYx-9BJYHoWlICkVJ4W_QGMfJRPLedt_0hws/export?gid=0&format=tsv" > sample-metadata.tsv
curl -sLO https://data.qiime2.org/2.0.6/tutorials/filtering-feature-tables/table.qza

Frequency-based filtering
Should this be "Total frequency-based filtering"? With just "frequency" I didn't really intuit what it was doing. Based on this Wikipedia example it looks like we are filtering the "marginal totals". Is that vocabulary useful here?
👍 Went with Total-frequency-based.
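For readers following the "marginal totals" discussion, a minimal Python sketch of what filtering on total frequency could look like. It is not the plugin's code: it assumes a feature table is a grid of counts with features as rows and samples as columns, and the table contents, function names, and the ``min_total`` threshold are all hypothetical.

```python
# Hypothetical feature table: outer keys are features, inner keys are samples,
# values are observed counts.
TABLE = {
    "feature-a": {"S1": 10, "S2": 0, "S3": 1},
    "feature-b": {"S1": 5, "S2": 2, "S3": 0},
}


def sample_totals(table):
    """Column margins: total frequency of each sample across all features."""
    totals = {}
    for counts in table.values():
        for sample, count in counts.items():
            totals[sample] = totals.get(sample, 0) + count
    return totals


def feature_totals(table):
    """Row margins: total frequency of each feature across all samples."""
    return {feature: sum(counts.values()) for feature, counts in table.items()}


def filter_samples_by_total(table, min_total):
    """Keep only the samples whose total frequency is at least min_total."""
    keep = {s for s, total in sample_totals(table).items() if total >= min_total}
    return {
        feature: {s: c for s, c in counts.items() if s in keep}
        for feature, counts in table.items()
    }


print(sample_totals(TABLE))    # {'S1': 15, 'S2': 2, 'S3': 1}
print(feature_totals(TABLE))   # {'feature-a': 11, 'feature-b': 7}
print(filter_samples_by_total(TABLE, min_total=2))
```

With ``min_total=2``, sample ``S3`` (total 1) is dropped while ``S1`` and ``S2`` are retained, which is the sense in which the filter acts on the margins rather than on individual cells.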
Addressed all of your comments, thanks @ebolyen and @thermokarst!
Fixes #21.