Exposing unifrac.meta #292

wasade · 2020-08-20T17:45:51Z

This pull request is dependent on qiime2/q2-diversity-lib#26.

The known TODO items are:

adjust plugin_setup text to better reflect the method (currently effectively a copy/paste of beta_phylogenetic)
add support for meta specific arguments
augment testing to run through meta

wasade · 2020-11-10T00:58:15Z

Brief note, I'm refactoring some of the beta_phylogenetic tests so they can also use this method. Hoping to push next commit tomorrow

wasade · 2020-11-16T22:03:46Z

Expecting to push a commit or two today

wasade · 2020-11-16T23:52:31Z

Specifying an alpha for generalized unifrac, to use with meta, is not currently supported due to biocore/unifrac#117. This is at the moment reflected in the test code, though it may be necessary to do something in the library.

This is a bug but I'm not sure if I can do a minor unifrac release quickly enough for this. Possibly... the bug is simple.

wasade · 2020-11-17T17:36:22Z

Just had another reasonable item pop up on unifrac for some performance related commits that aren't in conda, so will try to get that release out today

wasade · 2020-11-17T20:51:24Z

@thermokarst, I think the build is not getting qiime2/q2-diversity-lib#26? I may be looking at the failures incorrectly though

thermokarst · 2020-11-17T23:09:33Z

@thermokarst, I think the build is not getting qiime2/q2-diversity-lib#26?

That is correct! BW is churning away now, hopefully we'll have new env files in the next few hrs. Worth noting, DockerHub's new rate limiting scheme is putting a damper on BW right now, so we might have some CI downtime while that is addressed. 🤞

wasade · 2020-11-17T23:20:28Z

Rate limits were meant to be broken...? 🚀

thermokarst · 2020-11-19T21:15:00Z

q2_diversity/plugin_setup.py

+    parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['IMPL'] |
+                                        beta.METRICS['PHYLO']['UNIMPL']),


The current implementation of beta_phylogenetic_meta in this PR doesn't actually have any discretely implement individual metrics (for example), if that is the intent, please remove the IMPL metrics:

Suggested change

parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['IMPL'] |

beta.METRICS['PHYLO']['UNIMPL']),

parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['UNIMPL']),

Not fully sure I follow. Unweighted, weighted, variance adjusted and generalized should be suitable for meta unifrac, and all of these are implemented in unifrac? I may be misunderstanding something here

The reason we set up q2-diversity-lib is so that we can implement per-metric-qiime2-methods (like what I linked to above). This is great for provenance etc. The IMPL metrics are the list of metrics that have their own q2-diversity-lib "methods". The UNIMPL are metrics that don't have their own individual div-lib methods. If you peek at beta you'll see this illustrated more clearly:

q2-diversity/q2_diversity/_beta/_pipeline.py

Lines 45 to 52 in 60de8d5

if metric in METRICS['NONPHYLO']['IMPL']:

metric = METRICS['NAME_TRANSLATIONS'][metric]

action = ctx.get_action('diversity_lib', metric)

dm, = action(table=table, n_jobs=n_jobs)

else:

action = ctx.get_action('diversity_lib', 'beta_passthrough')

dm, = action(table=table, metric=metric, pseudocount=pseudocount,

n_jobs=n_jobs)

thermokarst · 2020-11-19T21:17:18Z

q2_diversity/plugin_setup.py

+                         'aggregated'
+    },
+    output_descriptions={'distance_matrix': 'The resulting distance matrix.'},
+    name='Beta diversity (phylogenetic)',


Should we say something about the "meta"ness? Otherwise this is identical to beta-phylogenetic's description:

👍 good point :)

q2_diversity/tests/test_beta.py

ChrisKeefe · 2020-11-23T18:31:33Z

Super excited to see this getting added. Just wondering whether there's a reason we went with table and phylogeny instead of tables and phylogenies if users are likely to pass Lists of them. Same question applies to qiime2/q2-diversity-lib#26

ebolyen

Looks good to me, but I think the tests can be cleaned up pretty easily, just some minor changes there.

ebolyen · 2020-11-23T18:32:06Z

q2_diversity/plugin_setup.py

+    function=q2_diversity.beta_phylogenetic_meta,
+    inputs={'table': List[FeatureTable[Frequency |
+                          RelativeFrequency |
+                          PresenceAbsence]],


Quick question, can the list be polymorphic as described, or should it be monomorphic like List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence]] ?

It seems like this would depend on if it was weighted or not? We could constrain the input with a typemap if so:

In [1]: from qiime2.plugin import TypeMap, Str, Choices, Visualization, List In [2]: from q2_types.feature_table import FeatureTable, Frequency, RelativeFrequency, PresenceAbsence In [3]: T_table, P_metric, _ = TypeMap({ ...: (List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]], ...: Str % Choices("unweighted_unifrac")): Visualization, # a convention for no-op w.r.t. output ...: (List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency]], ...: Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')): Visualization ...: }) In [4]: T_table Out[4]: List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]]¹ | List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency]]² In [5]: P_metric Out[5]: Str % Choices('unweighted_unifrac')¹ | Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')²

Although... to be honest I kind of hate what that type will look like in our interfaces, so I am good ignoring this...

Turns out I needed coffee, I think the only constraint here is whether P/A can be used in combination with the others, which means this would probably be fine:

In [10]: T_table, P_metric, _ = TypeMap({ ...: (List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]], ...: Str % Choices("unweighted_unifrac")): Visualization, # a convention for no-op w.r.t. output ...: (List[FeatureTable[Frequency | RelativeFrequency]], ...: Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')): Visualization ...: }) In [11]: T_table Out[11]: List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]]¹ | List[FeatureTable[Frequency | RelativeFrequency]]² In [12]: P_metric Out[12]: Str % Choices('unweighted_unifrac')¹ | Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')²

It is less hideous...

Do you mean whether to allow a user to specify --i-table a_frequency_table.qza --i-table a_presenceabsence_table.qza ...? I don't believe there is enough data to guide whether this should or should not be restricted like this

Quick question, can the list be polymorphic as described, or should it be monomorphic like List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence]] ?

A quick note on this - only one of these are actually possible here, because the diversity-lib beta-phylogenetic-meta-passthrough only works with FeatureTable[Frequency].

Okay, though we may want to consider relaxing the underlying library

Filed an issue here: qiime2/q2-diversity-lib#30

ebolyen · 2020-11-23T18:33:34Z

q2_diversity/tests/test_beta.py

+                for id1 in actual.ids:
+                    for id2 in actual.ids:
+                        npt.assert_almost_equal(actual[id1, id2],
+                                                expected[id1, id2])


Could we refactor this into a test helper?

Sure. Note that most of this is not new, just indented, but it could definitely benefit from some refactor. Will clean up

ebolyen · 2020-11-23T18:33:51Z

q2_diversity/tests/test_beta.py

+                expected = skbio.DistanceMatrix([[0.00, 0.25, 0.25],
+                                                 [0.25, 0.00, 0.00],
+                                                 [0.25, 0.00, 0.00]],
+                                                ids=['S1', 'S2', 'S3'])


This can probably sit outside the subtest loop, since it never changes.

ebolyen · 2020-11-23T18:34:27Z

q2_diversity/tests/test_beta.py

+                ids = ('10084.PC.481', '10084.PC.593', '10084.PC.356',
+                       '10084.PC.355', '10084.PC.354', '10084.PC.636',
+                       '10084.PC.635', '10084.PC.607', '10084.PC.634')
+                expected = skbio.DistanceMatrix(data, ids=ids)


Same here, can be moved outside the loop

ebolyen · 2020-11-23T18:34:48Z

q2_diversity/tests/test_beta.py

+                for id1 in actual.ids:
+                    for id2 in actual.ids:
+                        npt.assert_almost_equal(actual[id1, id2],
+                                                expected[id1, id2])


ditto: test helper

ebolyen · 2020-11-23T18:36:31Z

I would also second @ChrisKeefe's point on plural for the inputs

wasade · 2020-11-23T18:40:52Z

For the inputs, the user is entering --i-table <path> --i-table <path> ... as I don't think there is a good way to describe multiple entries for a single argument right now. I think table makes sense here as other actions which take, for example, multiple metadata files I don't believe rely on the plural form

thermokarst · 2020-11-23T18:44:26Z

I agree with @ChrisKeefe & @ebolyen, I think tables and phylogenies would be more clear (and if we update, we should fix that over in q2-div-lib before the release ships).

--i-tables table1.qza table2.qza table3.qza and --i-phylogenies tree1.qza tree2.qza tree3.qza looks pretty nice to me.

ebolyen · 2020-11-23T18:44:35Z

You can actually do this now:

--i-tables <path1> <path2> <path3>

(I got tired of not being able to do data/* as input...)

Other list-likes use plural as well.

ebolyen · 2020-11-23T18:44:47Z

@thermokarst jinx!

wasade · 2020-11-23T18:46:52Z

Oh, nice!!! I think that's new from when we discussed a few months back? I agree much cleaner, and agree with using the plural form

thermokarst · 2020-11-23T18:47:24Z

if users are likely to pass Lists of them

I think that that is the point of the method, right @wasade? If that's the case, do we need to validate the length? Does it make sense to run only on a single table/tree pair?

thermokarst · 2020-11-23T19:06:14Z

I think that's new from when we discussed a few months back?

@wasade, this functionality has been around since mid-2019. I think what we were discussing a few months back was a syntax for making these pairs full-fledged tuples, explicitly saying "this table belongs with this tree."

wasade · 2020-11-23T19:07:39Z

@thermokarst, ah okay. I thought what I put in here was what was the recommendation from that chat :) but I must have misread. The "most correct" thing would be to validate that each tree/table in a common index position correspond to each other or if QIIME2 was able to ensure the paired relationships. Operating on a single tree/table is fine and is identifical to using regular unifrac

thermokarst · 2020-11-23T21:05:07Z

Discussed out of band with @wasade & @ebolyen, going to postpone this q2-diversity integration until a future release. In the meantime, the new unifac.meta functionality can be found in q2-diversity-lib (and will be published in the 2020.11 core distribution).

Some remaining questions to solve: do we expose this meta functionality in its own pipeline, do we expose it via the existing beta-phylogenetic pipeline, or do we only leave it exposed through q2-diversity-lib?

lizgehret · 2022-09-15T17:18:03Z

Hey @ebolyen circling back here - should we revisit this?

ebolyen · 2022-09-21T22:28:25Z

This one's definitely quite old. @wasade do you or your team have any preferences on the matter?

Rereading the thread, it seems like this is all already implemented in q2-div-lib, so this PR is mostly just copying it into the "diversity" namespace so to speak.

I'm inclined to close it, since it hasn't really come up since and it may make more sense to modify beta-phylogenetic at some point instead. Please feel free to re-open at any point :)

wasade · 2022-09-21T22:35:19Z

It would be very exciting for meta to be available to the user community. Is it now possible for people to use it? If not, then this should be reopened and we should work toward merge.

ebolyen · 2022-09-22T04:43:10Z

Yep! It's already available in the default install as qiime diversity-lib beta-phylogenetic-meta-passthrough. It's perhaps not ideally named, but it's definitely been around for a while now.

If we'd like it in this particular diversity plugin it may make more sense to put it in beta-phylogenetic outright, since that's probably what a user would expect. Although the inputs might get tricky to explain.

wasade · 2022-09-22T16:06:50Z

Is that easy to do?

ebolyen · 2022-09-22T17:01:06Z

My hunch is no, because we would need to account for the different input type signatures here, list vs singular.

I know we can't take a union of List[X] | X so either that changes, or typemap accidentally does let this work. But then we'd have to double check that the interfaces can tolerate that.

Alternatively, we make the breaking change for beta-phylogenetic to the input parameters and update various call-sites like core-metrics-phylogenetic.

What if we rename it in diversity-lib so that it doesn't have "passthrough", which isn't particularly illuminating for end-users? That would be fast and a breaking change that is unlikely to rile anyone up.

wasade · 2022-09-22T17:02:49Z

I'm in favor of easy :) We have students actively exploring this method right now directly via the unifrac API, and its historical precedence is impressive, so simplifying use is I think a bit + for the user base

ebolyen · 2022-09-22T17:06:30Z

Would one of them be able to create a PR to update the name (dropping 'passthrough') and fix this issue: qiime2/q2-diversity-lib#30?

This could make a good first-PR for someone.

wasade · 2022-09-22T17:13:17Z

@ahdilmore, would you be interested in sorting this out? Basically, this would be to help make UniFrac's meta method more readily accessible via QIIME 2

wasade added 2 commits August 20, 2020 10:43

WIP: exposing unifrac.meta

6dc0cb6

Remove print

7856db6

wasade added 2 commits November 5, 2020 17:14

Merge branch 'master' of github.com:qiime2/q2-diversity into meta

c1b6cfd

Close out WIP items

d3335b1

thermokarst assigned wasade Nov 16, 2020

TST: tsts for beta_phylogenetic_meta

e630a45

wasade changed the title ~~WIP: exposing unifrac.meta~~ Exposing unifrac.meta Nov 17, 2020

thermokarst assigned thermokarst and unassigned wasade Nov 19, 2020

thermokarst self-requested a review November 19, 2020 19:38

thermokarst suggested changes Nov 19, 2020

View reviewed changes

thermokarst assigned wasade and unassigned thermokarst Nov 19, 2020

Addressing @thermokarst's comments

3677904

thermokarst assigned thermokarst and unassigned wasade Nov 20, 2020

thermokarst removed their assignment Nov 23, 2020

ebolyen self-assigned this Nov 23, 2020

ebolyen requested changes Nov 23, 2020

View reviewed changes

thermokarst mentioned this pull request Nov 23, 2020

beta-phylogenetic-meta-passthrough: work with relative frequency and presence/absence table qiime2/q2-diversity-lib#30

Open

thermokarst assigned wasade and unassigned ebolyen Dec 16, 2020

ebolyen closed this Sep 21, 2022

		parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['IMPL'] \|
		beta.METRICS['PHYLO']['UNIMPL']),

	if metric in METRICS['NONPHYLO']['IMPL']:
	metric = METRICS['NAME_TRANSLATIONS'][metric]
	action = ctx.get_action('diversity_lib', metric)
	dm, = action(table=table, n_jobs=n_jobs)
	else:
	action = ctx.get_action('diversity_lib', 'beta_passthrough')
	dm, = action(table=table, metric=metric, pseudocount=pseudocount,
	n_jobs=n_jobs)

Exposing unifrac.meta #292

Exposing unifrac.meta #292

Conversation

wasade commented Aug 20, 2020 • edited Loading

wasade commented Nov 10, 2020

wasade commented Nov 16, 2020

wasade commented Nov 16, 2020

wasade commented Nov 17, 2020

wasade commented Nov 17, 2020

thermokarst commented Nov 17, 2020

wasade commented Nov 17, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ChrisKeefe commented Nov 23, 2020

ebolyen left a comment

Choose a reason for hiding this comment

ebolyen Nov 23, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ebolyen commented Nov 23, 2020

wasade commented Nov 23, 2020

thermokarst commented Nov 23, 2020

ebolyen commented Nov 23, 2020

ebolyen commented Nov 23, 2020

wasade commented Nov 23, 2020

thermokarst commented Nov 23, 2020

thermokarst commented Nov 23, 2020

wasade commented Nov 23, 2020

thermokarst commented Nov 23, 2020

lizgehret commented Sep 15, 2022

ebolyen commented Sep 21, 2022

wasade commented Sep 21, 2022

ebolyen commented Sep 22, 2022

wasade commented Sep 22, 2022

ebolyen commented Sep 22, 2022

wasade commented Sep 22, 2022

ebolyen commented Sep 22, 2022

wasade commented Sep 22, 2022

wasade commented Aug 20, 2020 •

edited

Loading

ebolyen Nov 23, 2020 •

edited

Loading