Let collapse accept FeatureTable[PresenceAbsence] #104

Open
nbokulich opened this issue Jan 22, 2018 · 10 comments

@nbokulich
Member

nbokulich commented Jan 22, 2018

Improvement Description
Could be used to calculate the number of unique features belonging to each taxonomic group.

References
forum xref

@ebolyen
Member

ebolyen commented Jan 29, 2018

What would the output type be in that case?

@nbokulich
Member Author

The output would still be a FeatureTable[Frequency].

Just as collapse sums (I assume) the frequencies of features when collapsing by taxonomy, so it could sum the presence/absence matrix.

I am assuming that presence/absence == 1/0
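As a rough illustration of that idea (not the actual collapse implementation), summing a 1/0 table by taxon gives the number of distinct features per taxon in each sample. The function name, the dict-based table layout, and the example lineages below are all hypothetical.

```python
from collections import defaultdict


def collapse_presence_absence(table, taxonomy, level):
    """Sum a 1/0 presence/absence table by taxon, keeping `level` ranks.

    table:    {sample_id: {feature_id: 0 or 1}}   (assumed layout)
    taxonomy: {feature_id: "rank1; rank2; ..."}   (assumed layout)
    """
    collapsed = {}
    for sample_id, features in table.items():
        counts = defaultdict(int)
        for feature_id, present in features.items():
            ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
            taxon = ";".join(ranks[:level])
            # Absent features add 0, so the taxon still appears in the output.
            counts[taxon] += 1 if present else 0
        collapsed[sample_id] = dict(counts)
    return collapsed


# Made-up example data.
table = {
    "sample-1": {"f1": 1, "f2": 1, "f3": 0},
    "sample-2": {"f1": 0, "f2": 1, "f3": 1},
}
taxonomy = {
    "f1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "f3": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
}
result = collapse_presence_absence(table, taxonomy, level=2)
print(result)
```

Here each value in `result` is a count of unique features, not a sequence frequency, even though the numbers look like frequencies.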

@ebolyen
Member

ebolyen commented Jan 29, 2018

That makes me a little nervous as it wouldn't make sense to rarefy, add-pseudocount, or perform most other manipulations of the table. So that FeatureTable[Frequency] is not the same kind of FeatureTable[Frequency] as other tables.

@nbokulich
Member Author

Very good point.

I am not sure how much user demand there would be for this if we made a stand-alone method (e.g., features-per-taxon). It sounds like a potentially useful summary statistic, but something I haven't run into too frequently (and I'm not sure what practical use it would have — maybe test whether group A has greater diversity of ASVs belonging to taxon X than group B?)

@ebolyen
Member

ebolyen commented Jan 29, 2018

I suppose I'm not really sure what the goal is either. In principle a FeatureData[Taxonomy] is all you need to get that information, so maybe it could just be a utility method of some kind, rather than an extra output of collapse?
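A minimal sketch of what such a utility might look like, assuming the taxonomy is available as a plain mapping of feature ID to a semicolon-delimited lineage string. The function name and the example lineages are hypothetical, and this only gives study-wide counts, not per-sample counts.

```python
from collections import Counter


def features_per_taxon(taxonomy, level):
    """Count the features assigned to each taxon, keeping `level` ranks."""
    counts = Counter()
    for lineage in taxonomy.values():
        ranks = [rank.strip() for rank in lineage.split(";")]
        counts[";".join(ranks[:level])] += 1
    return counts


# Made-up example taxonomy.
taxonomy = {
    "f1": "d__Eukaryota; p__Chlorophyta; c__Chlorophyceae",
    "f2": "d__Eukaryota; p__Chlorophyta; c__Trebouxiophyceae",
    "f3": "d__Eukaryota; p__Charophyta; c__Charophyceae",
}
counts = features_per_taxon(taxonomy, level=2)
for taxon, n in sorted(counts.items()):
    print(taxon, n)
```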

@dannyw2594

Hello,

I think this feature is important for comparing the number of features in a large taxonomic group (e.g., a phylum) between two samples. When studying organisms like algae, our databases might only allow us to classify as far as phylum or family, so having the number of features in that group for comparison is important.

@nbokulich
Member Author

@dannyw2594 that could also be achieved by:

  1. filter on taxonomy to only include features that match that phylum in your feature table
  2. proceed with normal alpha diversity analyses
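The two steps above can be sketched in plain Python, using observed features (richness) as the simplest alpha diversity measure. This is only an illustration under assumed data layouts; the function names, the substring match, and the example data are hypothetical and do not reproduce any plugin's behavior.

```python
def filter_by_taxon(table, taxonomy, term):
    """Step 1: keep only features whose lineage string contains `term`."""
    keep = {fid for fid, lineage in taxonomy.items() if term in lineage}
    return {
        sample_id: {fid: n for fid, n in features.items() if fid in keep}
        for sample_id, features in table.items()
    }


def observed_features(table):
    """Step 2: count the features with a non-zero frequency in each sample."""
    return {
        sample_id: sum(1 for n in features.values() if n > 0)
        for sample_id, features in table.items()
    }


# Made-up example data: {sample_id: {feature_id: frequency}}.
table = {
    "sample-1": {"f1": 12, "f2": 0, "f3": 7},
    "sample-2": {"f1": 3, "f2": 5, "f3": 0},
}
taxonomy = {
    "f1": "d__Eukaryota; p__Chlorophyta; c__Chlorophyceae",
    "f2": "d__Eukaryota; p__Chlorophyta; c__Trebouxiophyceae",
    "f3": "d__Eukaryota; p__Charophyta; c__Charophyceae",
}
richness = observed_features(filter_by_taxon(table, taxonomy, "p__Chlorophyta"))
print(richness)
```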

That approach would be cumbersome if a user wanted to do this for many different taxa, e.g., compare the number of features belonging to each genus, but at the phylum level (and particularly on key phyla) it should be pretty easy to accomplish.

Do you think that covers your needs? Sorry I did not suggest this earlier. It now sounds like you have a particular set of phyla in mind, whereas before it sounded like you had much more expansive aims.

@dannyw2594

Well, that method would still be difficult. For example, with algae, the phylum Charophyta contains both algae and land plants. I would prefer to see the list of taxa along with feature counts so I can select the groups I want.
If I am the only one who has asked for this type of feature, I will figure something out.

@nbokulich
Member Author

Maybe we should move this back to the forum if discussion goes any further...

> Well, that method would still be difficult. For example, with algae, the phylum Charophyta contains both algae and land plants. I would prefer to see the list of taxa along with feature counts so I can select the groups I want.

You can pass multiple include and multiple exclude terms to filter-table (see docs). Hence, you can make a pretty complex filter here. E.g., include all Viridiplantae (and any other clades that algae are in if they are polyphyletic) but exclude all Embryophyta. These terms do not need to be the same taxonomic level, either. You can toggle "exact" vs. "contains" mode with the mode parameter to make this as explicit or broad as you want.
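To make the include/exclude logic concrete, here is a plain-Python sketch of the selection rule described above. It is an assumption-laden illustration, not the filter-table implementation: the function names are hypothetical, "contains" is modeled as a case-sensitive substring match, "exact" as equality with the whole lineage string, and the lineages are made up.

```python
def matches(lineage, term, mode):
    """Return True if `term` matches `lineage` under the given mode."""
    if mode == "exact":
        return lineage == term
    if mode == "contains":
        return term in lineage
    raise ValueError(f"unknown mode: {mode!r}")


def select_features(taxonomy, include=(), exclude=(), mode="contains"):
    """Keep features matching any include term and no exclude term."""
    selected = set()
    for fid, lineage in taxonomy.items():
        # With no include terms, every feature starts out included.
        included = not include or any(matches(lineage, t, mode) for t in include)
        excluded = any(matches(lineage, t, mode) for t in exclude)
        if included and not excluded:
            selected.add(fid)
    return selected


# Made-up lineages; the terms need not be at the same taxonomic level.
taxonomy = {
    "f1": "Eukaryota; Viridiplantae; Chlorophyta",
    "f2": "Eukaryota; Viridiplantae; Streptophyta; Embryophyta",
    "f3": "Eukaryota; Viridiplantae; Streptophyta; Charophyceae",
    "f4": "Bacteria; Proteobacteria",
}
algae = select_features(taxonomy, include=["Viridiplantae"], exclude=["Embryophyta"])
print(sorted(algae))
```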

This should only be problematic if:
a) your target groups are insanely polyphyletic and it would take a ridiculously complex filter to get what you want.
b) your ingroups/outgroups cannot be differentiated by available taxonomic information. E.g., if you need strain-level information to differentiate groups.

> What if I use the taxonomy summarize to download an Excel file with taxonomy and feature IDs, then import the feature IDs as a metadata table and use that to filter the feature table? Time-consuming, but it should do what I want.

If there will necessarily be a manual checking step (e.g., you cannot devise a reasonable taxonomy filter that captures all ingroups while excluding the many outgroups, such as land plants, that you do not want), then this might make the most sense.

If one of these solutions works for you, would you mind posting that solution to the forum thread? Other users searching that thread might be looking for a similar solution. Thanks!

@dannyw2594

Sure, I'm going to mess around with it this afternoon and I'll post what ends up working. Thanks for talking through this with me!
