Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing unifrac.meta #292

Closed
wants to merge 6 commits into from
Closed

Exposing unifrac.meta #292

wants to merge 6 commits into from

Conversation

wasade
Copy link
Member

@wasade wasade commented Aug 20, 2020

This pull request is dependent on qiime2/q2-diversity-lib#26.

The known TODO items are:

  • adjust plugin_setup text to better reflect the method (currently effectively a copy/paste of beta_phylogenetic)
  • add support for meta specific arguments
  • augment testing to run through meta

@wasade
Copy link
Member Author

wasade commented Nov 10, 2020

Brief note, I'm refactoring some of the beta_phylogenetic tests so they can also use this method. Hoping to push next commit tomorrow

@wasade
Copy link
Member Author

wasade commented Nov 16, 2020

Expecting to push a commit or two today

@wasade
Copy link
Member Author

wasade commented Nov 16, 2020

Specifying an alpha for generalized unifrac, to use with meta, is not currently supported due to biocore/unifrac#117. This is at the moment reflected in the test code, though it may be necessary to do something in the library.

This is a bug but I'm not sure if I can do a minor unifrac release quickly enough for this. Possibly... the bug is simple.

@wasade
Copy link
Member Author

wasade commented Nov 17, 2020

Just had another reasonable item pop up on unifrac for some performance related commits that aren't in conda, so will try to get that release out today

@wasade wasade changed the title WIP: exposing unifrac.meta Exposing unifrac.meta Nov 17, 2020
@wasade
Copy link
Member Author

wasade commented Nov 17, 2020

@thermokarst, I think the build is not getting qiime2/q2-diversity-lib#26? I may be looking at the failures incorrectly though

@thermokarst
Copy link
Contributor

@thermokarst, I think the build is not getting qiime2/q2-diversity-lib#26?

That is correct! BW is churning away now, hopefully we'll have new env files in the next few hrs. Worth noting, DockerHub's new rate limiting scheme is putting a damper on BW right now, so we might have some CI downtime while that is addressed. 🤞

@wasade
Copy link
Member Author

wasade commented Nov 17, 2020

Rate limits were meant to be broken...? 🚀

@thermokarst thermokarst assigned thermokarst and unassigned wasade Nov 19, 2020
@thermokarst thermokarst self-requested a review November 19, 2020 19:38
Comment on lines 114 to 115
parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['IMPL'] |
beta.METRICS['PHYLO']['UNIMPL']),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation of beta_phylogenetic_meta in this PR doesn't actually have any discretely implement individual metrics (for example), if that is the intent, please remove the IMPL metrics:

Suggested change
parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['IMPL'] |
beta.METRICS['PHYLO']['UNIMPL']),
parameters={'metric': Str % Choices(beta.METRICS['PHYLO']['UNIMPL']),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully sure I follow. Unweighted, weighted, variance adjusted and generalized should be suitable for meta unifrac, and all of these are implemented in unifrac? I may be misunderstanding something here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we set up q2-diversity-lib is so that we can implement per-metric-qiime2-methods (like what I linked to above). This is great for provenance etc. The IMPL metrics are the list of metrics that have their own q2-diversity-lib "methods". The UNIMPL are metrics that don't have their own individual div-lib methods. If you peek at beta you'll see this illustrated more clearly:

if metric in METRICS['NONPHYLO']['IMPL']:
metric = METRICS['NAME_TRANSLATIONS'][metric]
action = ctx.get_action('diversity_lib', metric)
dm, = action(table=table, n_jobs=n_jobs)
else:
action = ctx.get_action('diversity_lib', 'beta_passthrough')
dm, = action(table=table, metric=metric, pseudocount=pseudocount,
n_jobs=n_jobs)

'aggregated'
},
output_descriptions={'distance_matrix': 'The resulting distance matrix.'},
name='Beta diversity (phylogenetic)',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say something about the "meta"ness? Otherwise this is identical to beta-phylogenetic's description:

Screen Shot 2020-11-19 at 2 16 27 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good point :)

q2_diversity/tests/test_beta.py Show resolved Hide resolved
q2_diversity/tests/test_beta.py Outdated Show resolved Hide resolved
@thermokarst thermokarst assigned wasade and unassigned thermokarst Nov 19, 2020
@thermokarst thermokarst assigned thermokarst and unassigned wasade Nov 20, 2020
@thermokarst thermokarst removed their assignment Nov 23, 2020
@ebolyen ebolyen self-assigned this Nov 23, 2020
@ChrisKeefe
Copy link
Collaborator

Super excited to see this getting added. Just wondering whether there's a reason we went with table and phylogeny instead of tables and phylogenies if users are likely to pass Lists of them. Same question applies to qiime2/q2-diversity-lib#26

Copy link
Member

@ebolyen ebolyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I think the tests can be cleaned up pretty easily, just some minor changes there.

function=q2_diversity.beta_phylogenetic_meta,
inputs={'table': List[FeatureTable[Frequency |
RelativeFrequency |
PresenceAbsence]],
Copy link
Member

@ebolyen ebolyen Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question, can the list be polymorphic as described, or should it be monomorphic like List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence]] ?

It seems like this would depend on if it was weighted or not? We could constrain the input with a typemap if so:

In [1]: from qiime2.plugin import TypeMap, Str, Choices, Visualization, List

In [2]: from q2_types.feature_table import FeatureTable, Frequency, RelativeFrequency, PresenceAbsence

In [3]: T_table, P_metric, _ = TypeMap({
   ...:     (List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]],
   ...:      Str % Choices("unweighted_unifrac")): Visualization,  # a convention for no-op w.r.t. output
   ...:     (List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency]],
   ...:      Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')): Visualization
   ...: })

In [4]: T_table
Out[4]: List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]]¹ | List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency]]²

In [5]: P_metric
Out[5]: Str % Choices('unweighted_unifrac')¹ | Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')²

Although... to be honest I kind of hate what that type will look like in our interfaces, so I am good ignoring this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out I needed coffee, I think the only constraint here is whether P/A can be used in combination with the others, which means this would probably be fine:

In [10]: T_table, P_metric, _ = TypeMap({
    ...:     (List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]],
    ...:      Str % Choices("unweighted_unifrac")): Visualization,  # a convention for no-op w.r.t. output
    ...:     (List[FeatureTable[Frequency | RelativeFrequency]],
    ...:      Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')): Visualization
    ...: })

In [11]: T_table
Out[11]: List[FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]]¹ | List[FeatureTable[Frequency | RelativeFrequency]]²

In [12]: P_metric
Out[12]: Str % Choices('unweighted_unifrac')¹ | Str % Choices('weighted_unifrac', 'weighted_normalized_unifrac', 'generalized_unifrac')²

It is less hideous...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean whether to allow a user to specify --i-table a_frequency_table.qza --i-table a_presenceabsence_table.qza ...? I don't believe there is enough data to guide whether this should or should not be restricted like this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question, can the list be polymorphic as described, or should it be monomorphic like List[FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence]] ?

A quick note on this - only one of these are actually possible here, because the diversity-lib beta-phylogenetic-meta-passthrough only works with FeatureTable[Frequency].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, though we may want to consider relaxing the underlying library

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed an issue here: qiime2/q2-diversity-lib#30

for id1 in actual.ids:
for id2 in actual.ids:
npt.assert_almost_equal(actual[id1, id2],
expected[id1, id2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we refactor this into a test helper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Note that most of this is not new, just indented, but it could definitely benefit from some refactor. Will clean up

expected = skbio.DistanceMatrix([[0.00, 0.25, 0.25],
[0.25, 0.00, 0.00],
[0.25, 0.00, 0.00]],
ids=['S1', 'S2', 'S3'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably sit outside the subtest loop, since it never changes.

ids = ('10084.PC.481', '10084.PC.593', '10084.PC.356',
'10084.PC.355', '10084.PC.354', '10084.PC.636',
'10084.PC.635', '10084.PC.607', '10084.PC.634')
expected = skbio.DistanceMatrix(data, ids=ids)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can be moved outside the loop

for id1 in actual.ids:
for id2 in actual.ids:
npt.assert_almost_equal(actual[id1, id2],
expected[id1, id2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: test helper

@ebolyen
Copy link
Member

ebolyen commented Nov 23, 2020

I would also second @ChrisKeefe's point on plural for the inputs

@wasade
Copy link
Member Author

wasade commented Nov 23, 2020

For the inputs, the user is entering --i-table <path> --i-table <path> ... as I don't think there is a good way to describe multiple entries for a single argument right now. I think table makes sense here as other actions which take, for example, multiple metadata files I don't believe rely on the plural form

@thermokarst
Copy link
Contributor

I agree with @ChrisKeefe & @ebolyen, I think tables and phylogenies would be more clear (and if we update, we should fix that over in q2-div-lib before the release ships).

--i-tables table1.qza table2.qza table3.qza and --i-phylogenies tree1.qza tree2.qza tree3.qza looks pretty nice to me.

@ebolyen
Copy link
Member

ebolyen commented Nov 23, 2020

You can actually do this now:

--i-tables <path1> <path2> <path3>

(I got tired of not being able to do data/* as input...)

Other list-likes use plural as well.

@ebolyen
Copy link
Member

ebolyen commented Nov 23, 2020

@thermokarst jinx!

@wasade
Copy link
Member Author

wasade commented Nov 23, 2020

Oh, nice!!! I think that's new from when we discussed a few months back? I agree much cleaner, and agree with using the plural form

@thermokarst
Copy link
Contributor

if users are likely to pass Lists of them

I think that that is the point of the method, right @wasade? If that's the case, do we need to validate the length? Does it make sense to run only on a single table/tree pair?

@thermokarst
Copy link
Contributor

I think that's new from when we discussed a few months back?

@wasade, this functionality has been around since mid-2019. I think what we were discussing a few months back was a syntax for making these pairs full-fledged tuples, explicitly saying "this table belongs with this tree."

@wasade
Copy link
Member Author

wasade commented Nov 23, 2020

@thermokarst, ah okay. I thought what I put in here was what was the recommendation from that chat :) but I must have misread. The "most correct" thing would be to validate that each tree/table in a common index position correspond to each other or if QIIME2 was able to ensure the paired relationships. Operating on a single tree/table is fine and is identifical to using regular unifrac

@thermokarst
Copy link
Contributor

Discussed out of band with @wasade & @ebolyen, going to postpone this q2-diversity integration until a future release. In the meantime, the new unifac.meta functionality can be found in q2-diversity-lib (and will be published in the 2020.11 core distribution).

Some remaining questions to solve: do we expose this meta functionality in its own pipeline, do we expose it via the existing beta-phylogenetic pipeline, or do we only leave it exposed through q2-diversity-lib?

@lizgehret
Copy link
Member

Hey @ebolyen circling back here - should we revisit this?

@ebolyen
Copy link
Member

ebolyen commented Sep 21, 2022

This one's definitely quite old. @wasade do you or your team have any preferences on the matter?

Rereading the thread, it seems like this is all already implemented in q2-div-lib, so this PR is mostly just copying it into the "diversity" namespace so to speak.

I'm inclined to close it, since it hasn't really come up since and it may make more sense to modify beta-phylogenetic at some point instead. Please feel free to re-open at any point :)

@ebolyen ebolyen closed this Sep 21, 2022
@wasade
Copy link
Member Author

wasade commented Sep 21, 2022

It would be very exciting for meta to be available to the user community. Is it now possible for people to use it? If not, then this should be reopened and we should work toward merge.

@ebolyen
Copy link
Member

ebolyen commented Sep 22, 2022

Yep! It's already available in the default install as qiime diversity-lib beta-phylogenetic-meta-passthrough. It's perhaps not ideally named, but it's definitely been around for a while now.

If we'd like it in this particular diversity plugin it may make more sense to put it in beta-phylogenetic outright, since that's probably what a user would expect. Although the inputs might get tricky to explain.

@wasade
Copy link
Member Author

wasade commented Sep 22, 2022

Is that easy to do?

@ebolyen
Copy link
Member

ebolyen commented Sep 22, 2022

My hunch is no, because we would need to account for the different input type signatures here, list vs singular.

I know we can't take a union of List[X] | X so either that changes, or typemap accidentally does let this work. But then we'd have to double check that the interfaces can tolerate that.

Alternatively, we make the breaking change for beta-phylogenetic to the input parameters and update various call-sites like core-metrics-phylogenetic.


What if we rename it in diversity-lib so that it doesn't have "passthrough", which isn't particularly illuminating for end-users? That would be fast and a breaking change that is unlikely to rile anyone up.

@wasade
Copy link
Member Author

wasade commented Sep 22, 2022

I'm in favor of easy :) We have students actively exploring this method right now directly via the unifrac API, and its historical precedence is impressive, so simplifying use is I think a bit + for the user base

@ebolyen
Copy link
Member

ebolyen commented Sep 22, 2022

Would one of them be able to create a PR to update the name (dropping 'passthrough') and fix this issue: qiime2/q2-diversity-lib#30?

This could make a good first-PR for someone.

@wasade
Copy link
Member Author

wasade commented Sep 22, 2022

@ahdilmore, would you be interested in sorting this out? Basically, this would be to help make UniFrac's meta method more readily accessible via QIIME 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants