Exposing unifrac.meta #292
Brief note, I'm refactoring some of the
Expecting to push a commit or two today.
Specifying an alpha for generalized unifrac, to use with This is a bug but I'm not sure if I can do a minor unifrac release quickly enough for this. Possibly... the bug is simple.
Just had another reasonable item pop up on unifrac for some performance-related commits that aren't in conda, so I will try to get that release out today.
@thermokarst, I think the build is not getting qiime2/q2-diversity-lib#26? I may be looking at the failures incorrectly, though.
That is correct! BW is churning away now; hopefully we'll have new env files in the next few hours. Worth noting: DockerHub's new rate-limiting scheme is putting a damper on BW right now, so we might have some CI downtime while that is addressed. 🤞
Rate limits were meant to be broken...? 🚀
q2_diversity/plugin_setup.py
parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['IMPL'] |
                                    beta.METRICS['PHYLO']['UNIMPL']),
The current implementation of beta_phylogenetic_meta in this PR doesn't actually have any discretely implemented individual metrics (for example). If that is the intent, please remove the IMPL metrics:
Suggested change:
-parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['IMPL'] |
-                                    beta.METRICS['PHYLO']['UNIMPL']),
+parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['UNIMPL']),
Not fully sure I follow. Unweighted, weighted, variance-adjusted, and generalized should all be suitable for meta unifrac, and all of these are implemented in unifrac? I may be misunderstanding something here.
The reason we set up q2-diversity-lib is so that we can implement per-metric QIIME 2 methods (like what I linked to above). This is great for provenance, etc. The IMPL metrics are the metrics that have their own q2-diversity-lib "methods"; the UNIMPL metrics don't have their own individual div-lib methods. If you peek at beta, you'll see this illustrated more clearly:
q2-diversity/q2_diversity/_beta/_pipeline.py
Lines 45 to 52 in 60de8d5
if metric in METRICS['NONPHYLO']['IMPL']:
    metric = METRICS['NAME_TRANSLATIONS'][metric]
    action = ctx.get_action('diversity_lib', metric)
    dm, = action(table=table, n_jobs=n_jobs)
else:
    action = ctx.get_action('diversity_lib', 'beta_passthrough')
    dm, = action(table=table, metric=metric, pseudocount=pseudocount,
                 n_jobs=n_jobs)
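To see the IMPL/UNIMPL split in isolation, the branch logic from the excerpt above can be run against a stand-in context. This is only a sketch: `FakeContext`, `beta_dispatch`, and the trimmed `METRICS` contents are hypothetical placeholders, and the fake actions return a label instead of a distance matrix.

```python
# Hypothetical, trimmed-down metric table in the shape used by the
# excerpt above; the real q2-diversity table lists many more metrics.
METRICS = {
    'NONPHYLO': {
        'IMPL': {'braycurtis', 'jaccard'},
        'UNIMPL': {'canberra', 'chebyshev'},
    },
    'NAME_TRANSLATIONS': {'braycurtis': 'bray_curtis', 'jaccard': 'jaccard'},
}


class FakeContext:
    """Hypothetical stand-in for a QIIME 2 pipeline context.

    get_action returns a callable that reports which action ran instead
    of computing a real distance matrix.
    """

    def get_action(self, plugin, name):
        def action(**kwargs):
            # Mimic the tuple of outputs that QIIME 2 actions return.
            return ((plugin, name, kwargs.get('metric')),)
        return action


def beta_dispatch(ctx, table, metric, n_jobs=1, pseudocount=1):
    if metric in METRICS['NONPHYLO']['IMPL']:
        # Dedicated diversity-lib method: provenance names the metric itself.
        metric = METRICS['NAME_TRANSLATIONS'][metric]
        action = ctx.get_action('diversity_lib', metric)
        dm, = action(table=table, n_jobs=n_jobs)
    else:
        # No dedicated method: one generic action receives the metric name.
        action = ctx.get_action('diversity_lib', 'beta_passthrough')
        dm, = action(table=table, metric=metric, pseudocount=pseudocount,
                     n_jobs=n_jobs)
    return dm


print(beta_dispatch(FakeContext(), table=None, metric='braycurtis'))
# ('diversity_lib', 'bray_curtis', None)
print(beta_dispatch(FakeContext(), table=None, metric='canberra'))
# ('diversity_lib', 'beta_passthrough', 'canberra')
```

The point of the split shows up in the first element of each result: an IMPL metric is recorded under its own action name, while every UNIMPL metric is recorded under the same generic passthrough action.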
q2_diversity/plugin_setup.py
        'aggregated'
    },
    output_descriptions={'distance_matrix': 'The resulting distance matrix.'},
    name='Beta diversity (phylogenetic)',
👍 good point :)
Super excited to see this getting added. Just wondering whether there's a reason we went with
Looks good to me, but I think the tests can be cleaned up pretty easily, just some minor changes there.
    function=q2_diversity.beta_phylogenetic_meta,
    inputs={'table': List[FeatureTable[Frequency |
                                       RelativeFrequency |
                                       PresenceAbsence]],
Quick question, can the list be polymorphic as described, or should it be monomorphic like List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence]]?
It seems like this would depend on whether it was weighted or not? We could constrain the input with a TypeMap if so:
In [1]: from qiime2.plugin import TypeMap, Str, Choices, Visualization, List
In [2]: from q2_types.feature_table import FeatureTable, Frequency, RelativeFrequency, PresenceAbsence
In [3]: T_table, P_metric, _ = TypeMap({
...: (List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]],
...: Str % Choices("unweighted_unifrac")): Visualization, # a convention for no-op w.r.t. output
...: (List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency]],
...: Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')): Visualization
...: })
In [4]: T_table
Out[4]: List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]]¹ | List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency]]²
In [5]: P_metric
Out[5]: Str % Choices('unweighted_unifrac')¹ | Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')²
Although... to be honest I kind of hate what that type will look like in our interfaces, so I am good ignoring this...
Turns out I needed coffee. I think the only constraint here is whether P/A can be used in combination with the others, which means this would probably be fine:
In [10]: T_table, P_metric, _ = TypeMap({
...: (List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]],
...: Str % Choices("unweighted_unifrac")): Visualization, # a convention for no-op w.r.t. output
...: (List[FeatureTable[Frequency | RelativeFrequency]],
...: Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')): Visualization
...: })
In [11]: T_table
Out[11]: List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]]¹ | List[FeatureTable[Frequency | RelativeFrequency]]²
In [12]: P_metric
Out[12]: Str % Choices('unweighted_unifrac')¹ | Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')²
It is less hideous...
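The rule the second TypeMap encodes can be restated as a plain-Python check: presence/absence tables are accepted only for `unweighted_unifrac`, while all four metrics accept frequency and relative-frequency tables. This is an illustrative sketch, not QIIME 2's type system; `ALLOWED`, `check_tables`, and the string type labels are hypothetical.

```python
# Hypothetical restatement of the TypeMap above as a lookup table:
# each metric maps to the table semantic types it accepts.
ALLOWED = {
    'unweighted_unifrac': {'Frequency', 'RelativeFrequency', 'PresenceAbsence'},
    'weighted_unifrac': {'Frequency', 'RelativeFrequency'},
    'weighted_normalized_unifrac': {'Frequency', 'RelativeFrequency'},
    'generalized_unifrac': {'Frequency', 'RelativeFrequency'},
}


def check_tables(metric, table_types):
    """Raise ValueError if any table type in the list is not allowed."""
    allowed = ALLOWED[metric]
    # Checked per element, so a list may mix Frequency and RelativeFrequency,
    # matching List[FeatureTable[Frequency | RelativeFrequency]].
    bad = sorted(set(table_types) - allowed)
    if bad:
        raise ValueError(f'{metric} does not accept: {", ".join(bad)}')
    return True
```

For example, `check_tables('weighted_unifrac', ['Frequency', 'PresenceAbsence'])` raises, while the same list passes for `unweighted_unifrac`.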
Do you mean whether to allow a user to specify --i-table a_frequency_table.qza --i-table a_presenceabsence_table.qza ...? I don't believe there is enough data to guide whether this should or should not be restricted like this.
> Quick question, can the list be polymorphic as described, or should it be monomorphic like List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence]]?
A quick note on this: only one of these is actually possible here, because the diversity-lib beta-phylogenetic-meta-passthrough only works with FeatureTable[Frequency].
Okay, though we may want to consider relaxing that restriction in the underlying library.
Filed an issue here: qiime2/q2-diversity-lib#30
for id1 in actual.ids:
    for id2 in actual.ids:
        npt.assert_almost_equal(actual[id1, id2],
                                expected[id1, id2])
Could we refactor this into a test helper?
Sure. Note that most of this is not new, just indented, but it could definitely benefit from some refactoring. Will clean up.
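One possible shape for the requested test helper is a small mixin for `unittest.TestCase`. The class and method names are hypothetical; it relies only on `.ids` and `dm[id1, id2]` lookups, which the loop above already uses on `skbio.DistanceMatrix`. It swaps `numpy.testing.assert_almost_equal` for unittest's `assertAlmostEqual` to stay dependency-free, so the tolerance rule differs slightly (rounding to `places` rather than `1.5 * 10**-decimal`).

```python
import unittest


class DistanceMatrixAssertions:
    """Hypothetical mixin for unittest.TestCase subclasses.

    Compares two distance-matrix-like objects element-wise by ID, so the
    result does not depend on the order of the IDs.
    """

    def assertDistanceMatrixAlmostEqual(self, actual, expected, places=7):
        # Both matrices must describe the same set of samples.
        self.assertEqual(set(actual.ids), set(expected.ids))
        for id1 in actual.ids:
            for id2 in actual.ids:
                self.assertAlmostEqual(actual[id1, id2], expected[id1, id2],
                                       places=places)
```

A test class would then inherit from both, e.g. `class BetaPhylogeneticMetaTests(DistanceMatrixAssertions, TestPluginBase)`, and each duplicated double loop collapses to one `self.assertDistanceMatrixAlmostEqual(actual, expected)` call.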
expected = skbio.DistanceMatrix([[0.00, 0.25, 0.25],
                                 [0.25, 0.00, 0.00],
                                 [0.25, 0.00, 0.00]],
                                ids=['S1', 'S2', 'S3'])
This can probably sit outside the subtest loop, since it never changes.
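A sketch of the suggested structure, with the fixed expectation built once before the `subTest` loop. The loop variable, its values, and `compute_distances` are hypothetical stand-ins for whatever the real test iterates over; the expected values reuse the S1/S2/S3 distances from the snippet above, stored as a plain dict so the sketch runs without scikit-bio.

```python
import unittest


def compute_distances(variant):
    # Hypothetical stand-in for the call under test; the real test would
    # run the meta UniFrac action with per-variant options.
    return {('S1', 'S2'): 0.25, ('S1', 'S3'): 0.25, ('S2', 'S3'): 0.0}


class HoistedExpectedTests(unittest.TestCase):
    def test_variants(self):
        # Built once: the expectation is identical for every subtest.
        expected = {('S1', 'S2'): 0.25, ('S1', 'S3'): 0.25, ('S2', 'S3'): 0.0}
        for variant in ('a', 'b', 'c'):
            with self.subTest(variant=variant):
                actual = compute_distances(variant)
                for pair, value in expected.items():
                    self.assertAlmostEqual(actual[pair], value)
```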
ids = ('10084.PC.481', '10084.PC.593', '10084.PC.356',
       '10084.PC.355', '10084.PC.354', '10084.PC.636',
       '10084.PC.635', '10084.PC.607', '10084.PC.634')
expected = skbio.DistanceMatrix(data, ids=ids)
Same here, can be moved outside the loop
for id1 in actual.ids:
    for id2 in actual.ids:
        npt.assert_almost_equal(actual[id1, id2],
                                expected[id1, id2])
ditto: test helper
I would also second @ChrisKeefe's point on using the plural form for the inputs.
For the inputs, the user is entering
I agree with @ChrisKeefe & @ebolyen, I think
You can actually do this now:
(I got tired of not being able to do Other list-likes use plural as well.
@thermokarst jinx!
Oh, nice!!! I think that's new since we discussed it a few months back? I agree it's much cleaner, and agree with using the plural form.
I think that is the point of the method, right @wasade? If that's the case, do we need to validate the length? Does it make sense to run on only a single table/tree pair?
@wasade, this functionality has been around since mid-2019. I think what we were discussing a few months back was a syntax for making these pairs full-fledged tuples, explicitly saying "this table belongs with this tree."
@thermokarst, ah okay. I thought what I put in here was the recommendation from that chat :) but I must have misread. The "most correct" thing would be to validate that the tree and table at each index position correspond to each other, or for QIIME 2 to ensure the paired relationship. Operating on a single tree/table pair is fine and is identical to using regular unifrac.
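A minimal sketch of the length and pairing validation discussed here, assuming tables and phylogenies arrive as two parallel lists and that each can be reduced to a set of IDs (feature IDs for tables, tip names for trees). `validate_pairs` and these inputs are hypothetical; the check only confirms that the lists line up and that each table's features appear in the tree at the same index, which is necessary but not sufficient for a correct pairing. A single pair is accepted, since that is equivalent to regular UniFrac.

```python
from typing import List, Sequence, Set, Tuple


def validate_pairs(table_features: Sequence[Set[str]],
                   tree_tips: Sequence[Set[str]]) -> List[Tuple[int, int]]:
    """Check parallel lists of tables and trees (hypothetical helper).

    Returns (index, n_features) for each pair; raises ValueError otherwise.
    """
    if len(table_features) != len(tree_tips):
        raise ValueError(
            f'{len(table_features)} tables but {len(tree_tips)} phylogenies')
    if not table_features:
        raise ValueError('at least one table/phylogeny pair is required')
    checked = []
    for i, (features, tips) in enumerate(zip(table_features, tree_tips)):
        # Every feature in table i must be a tip of tree i.
        missing = features - tips
        if missing:
            raise ValueError(
                f'pair {i}: {len(missing)} feature(s) not found in the tree')
        checked.append((i, len(features)))
    return checked
```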
Discussed out of band with @wasade & @ebolyen; going to postpone this q2-diversity integration until a future release. In the meantime, the new Some remaining questions to solve: do we expose this
Hey @ebolyen, circling back here: should we revisit this?
This one's definitely quite old. @wasade, do you or your team have any preferences on the matter? Rereading the thread, it seems like this is all already implemented in q2-div-lib, so this PR is mostly just copying it into the "diversity" namespace, so to speak. I'm inclined to close it, since it hasn't really come up since and it may make more sense to modify
It would be very exciting for
Yep! It's already available in the default install as If we'd like it in this particular diversity plugin, it may make more sense to put it in
Is that easy to do?
My hunch is no, because we would need to account for the different input type signatures here, list vs. singular. I know we can't take a union of Alternatively, we make the breaking change for What if we rename it in diversity-lib so that it doesn't have "passthrough", which isn't particularly illuminating for end users? That would be fast and a breaking change that is unlikely to rile anyone up.
I'm in favor of easy :) We have students actively exploring this method right now directly via the unifrac API, and its historical precedent is impressive, so simplifying its use is, I think, a plus for the user base.
Would one of them be able to create a PR to update the name (dropping 'passthrough') and fix this issue: qiime2/q2-diversity-lib#30? This could make a good first PR for someone.
@ahdilmore, would you be interested in sorting this out? Basically, this would be to help make UniFrac's meta method more readily accessible via QIIME 2.
This pull request is dependent on qiime2/q2-diversity-lib#26.
The known TODO items are:
beta_phylogenetic
)meta
specific arguments