filtering feature tables tutorial addition #39

gregcaporaso · 2016-12-05T23:37:02Z

Fixes #21.

Conflicts: source/tutorials/import-sequence-data.rst source/tutorials/import.rst source/tutorials/index.rst

ebolyen

Initial pass through, I like the organization.

ebolyen · 2016-12-05T23:45:12Z

source/tutorials/table-filtering.rst

+
+Both of these methods can also be applied to filter contingent on the maximum number of features or samples, using the ``--p-max-features`` and ``--p-max-samples`` parameters.
+
+Identifier-based filtering


Is it worth trying to call this Index-based filtering or is that too technical?

Or Identity-based filtering which sounds more natural to me for some reason?

Went with Index-based, and indicated that this refers to identifiers. In QIIME 1 we usually refer to these as identifiers, but I think we should transition to this terminology (which will be important for consistency when we have real index/metadata support).

ebolyen · 2016-12-05T23:55:16Z

source/tutorials/table-filtering.rst

+Metadata-based filtering
+------------------------
+
+Metadata-based filtering is similar to identifier-based filtering, except that the list of identifiers to keep is determined based on metadata rather than being provided by the user directly. This is achieved using the ``--m-sample-metadata-file`` or ``--m-feature-metadata-file`` parameter (for ``filter-samples`` or ``filter-features``, respectively) and the ``--p-where`` parameter. The user provides a description of the samples that should be retained based on their metadata using ``--p-where``, where the syntax for this description is the SQLite where-clause syntax.


This paragraph is a little confusing. Is it possible to introduce the --p-where clause first? Reading it the first time, it seemed like this was an augmentation of the identifier-filter, but it then went on to redefine the same parameters in the same way (it felt like déjà vu), which made me assume that I had somehow misread the previous section.

This is achieved by providing a ``--p-where`` parameter in addition to a ``--m-sample-metadata-file``/``--m-feature-metadata-file`` (as described above).

Also I think we should link out SQLite where-clause to something like: https://www.tutorialspoint.com/sqlite/sqlite_where_clause.htm (a more canonical resource would be nice, but the SQLite homepage was a grammar definition, which isn't user-friendly)

How about the generic WHERE entry on Wikipedia? We aren't using any SQLite-only features, and there seem to be some nice examples of predicates there.
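
To make the where-clause discussion concrete, here is a minimal sketch of how such a predicate selects identifiers, using Python's built-in `sqlite3` module. This is an illustration only, not how `filter-samples` is implemented: the table name `metadata`, the `SampleID` column, the helper `ids_matching`, and the three rows are all made up for the example. The `Subject` and `BodySite` columns mirror the ones used in the tutorial text.

```python
import sqlite3

# Hypothetical sample metadata; the IDs and values are invented for illustration.
ROWS = [
    ("s1", "subject-1", "gut"),
    ("s2", "subject-1", "tongue"),
    ("s3", "subject-2", "gut"),
]


def ids_matching(where):
    """Return the sample IDs whose metadata satisfy a SQL WHERE predicate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE metadata (SampleID TEXT, Subject TEXT, BodySite TEXT)"
        )
        conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", ROWS)
        # The predicate is concatenated into the query as-is, so this helper
        # should only ever be given trusted strings.
        query = "SELECT SampleID FROM metadata WHERE " + where + " ORDER BY SampleID"
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


print(ids_matching("Subject='subject-1'"))                    # ['s1', 's2']
print(ids_matching("Subject='subject-1' AND BodySite='gut'"))  # ['s1']
```

The list returned by the predicate plays the role of the "identifiers to keep", which is what ties metadata-based filtering back to identifier-based filtering.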

ebolyen · 2016-12-05T23:55:52Z

source/tutorials/table-filtering.rst

+.. command-block::
+    qiime feature-table filter-samples --i-table table.qza --m-sample-metadata-file sample-metadata.tsv --p-where "Subject='subject-1'" --o-filtered-table filtered-table
+
+``--p-where`` expressions can be made more complex as follows. Here, the ``--p-where`` parameter is specifying that we want to retain only the samples whose ``Subject`` is ``subject-1`` *and* whose ``BodySite`` is ``gut`` in ``sample-metadata.tsv``.


I would emphasize the quotation marks as well here.
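
A small sketch of why the quotation marks matter, using Python's `shlex` module as an approximation of POSIX shell tokenization (real shells may differ in edge cases). The outer double quotes keep the inner single quotes intact, so the predicate still contains a quoted string literal when it reaches the program; without them the shell consumes the single quotes.

```python
import shlex

# With outer double quotes, the single quotes survive as part of the argument.
quoted = shlex.split('--p-where "Subject=\'subject-1\'"')

# Without outer double quotes, the single quotes are removed during tokenization.
unquoted = shlex.split("--p-where Subject='subject-1'")

print(quoted)    # ['--p-where', "Subject='subject-1'"]
print(unquoted)  # ['--p-where', 'Subject=subject-1']
```

In the second case the predicate arrives as `Subject=subject-1`, which no longer contains a string literal.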

ebolyen · 2016-12-05T23:58:39Z

source/tutorials/table-filtering.rst

+    curl -sL "https://docs.google.com/spreadsheets/d/1_3ZbqCtAYx-9BJYHoWlICkVJ4W_QGMfJRPLedt_0hws/export?gid=0&format=tsv" > sample-metadata.tsv
+    curl -sLO https://data.qiime2.org/2.0.6/tutorials/filtering-feature-tables/table.qza
+
+Frequency-based filtering


Should this be Total frequency-based filtering? With just frequency I didn't really intuit what it was doing. Based on this wikipedia example it looks like we are filtering the "marginal totals". Is that vocabulary useful here?

👍 Went with Total-frequency-based.
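
For readers unfamiliar with "marginal totals", a minimal sketch of the idea in plain Python follows. It is an illustration under stated assumptions, not the plugin's implementation: the nested-dict table, the helper names, and the `min_total` argument are hypothetical and do not correspond to any QIIME parameter.

```python
# Hypothetical feature table: sample ID -> {feature ID -> frequency}.
TABLE = {
    "s1": {"f1": 10, "f2": 0, "f3": 5},
    "s2": {"f1": 1, "f2": 2, "f3": 0},
    "s3": {"f1": 0, "f2": 40, "f3": 7},
}


def sample_totals(table):
    """Total frequency per sample (one marginal of the table)."""
    return {sid: sum(counts.values()) for sid, counts in table.items()}


def feature_totals(table):
    """Total frequency per feature (the other marginal of the table)."""
    totals = {}
    for counts in table.values():
        for fid, n in counts.items():
            totals[fid] = totals.get(fid, 0) + n
    return totals


def filter_samples_by_total(table, min_total):
    """Keep only samples whose total frequency is at least min_total."""
    totals = sample_totals(table)
    return {sid: counts for sid, counts in table.items() if totals[sid] >= min_total}


print(sample_totals(TABLE))                         # {'s1': 15, 's2': 3, 's3': 47}
print(feature_totals(TABLE))                        # {'f1': 11, 'f2': 42, 'f3': 12}
print(sorted(filter_samples_by_total(TABLE, 10)))   # ['s1', 's3']
```

Filtering on the total, rather than on any single cell, is the distinction that the "Total-frequency-based" heading is meant to signal.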

gregcaporaso · 2016-12-06T16:02:59Z

Addressed all of your comments, thanks @ebolyen and @thermokarst!

gregcaporaso added 5 commits November 30, 2016 15:21

adds importing sequence data tutorial

6a078fd

fixes qiime2#33

temp commit

d0cea72

temp commit

00c9ef4

Merge branch 'master' of github.com:qiime2/docs into issue-21

f8e9f87

Conflicts: source/tutorials/import-sequence-data.rst source/tutorials/import.rst source/tutorials/index.rst

DOC: adds filtering tutorial, fixes qiime2#21

311194c

gregcaporaso assigned ebolyen Dec 5, 2016

ebolyen reviewed Dec 6, 2016

addressed comments from @ebolyen and @thermokarst

116447d

ebolyen merged commit a767bcd into qiime2:master Dec 6, 2016

thermokarst mentioned this pull request Jan 6, 2017

where clause URL appears to yield a 404 qiime2/q2-feature-table#61

Closed
