Filtering low abundance Taxa #26

Closed
sarpiens opened this issue Apr 14, 2021 · 5 comments

Comments

@sarpiens

Hello,

I have an ASV count table with two populations, P1 and P2 (5 samples per group). I ran ancombc without filtering ASVs by abundance:

out = ancombc(phyloseq = pseq_A1_raw, formula = "population",
              p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000,
              group = "population", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5,
              max_iter = 100, conserve = TRUE, alpha = 0.05, global = FALSE)

However, some ASVs have a significant q_val, yet when I look at the corresponding rows in the count table they show only very small differences, because they are low-abundance ASVs.

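For example:

| ASV | P1_1 | P1_2 | P1_3 | P1_4 | P1_5 | P2_1 | P2_2 | P2_3 | P2_4 | P2_5 |
|---|---|---|---|---|---|---|---|---|---|---|
| ASV_2 | 2 | 2 | 11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| [...] | | | | | | | | | | |
| ASV_726 | 1 | 43 | 0 | 15 | 1 | 0 | 0 | 0 | 0 | 0 |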

I wonder whether it would be okay to apply a filter to remove low-abundance taxa. If so, should I apply it before or after running the differential abundance analysis with ancombc?

Thanks in advance

@FrederickHuangLin
Owner

Hi @sarpiens,

The zero_cut argument could help. The manual states that “Taxa with proportion of zeroes greater than zero_cut will be excluded in the analysis”, so setting zero_cut to a smaller value will remove more low-abundance taxa.
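To make the rule quoted from the manual concrete, here is a minimal base R sketch that applies it to the two example rows from the opening post. The helper `excluded_by_zero_cut` is a hypothetical name introduced only for illustration; it mimics the documented rule and is not the package's internal code.

```r
# Counts copied from the example rows in the opening post
# (columns: P1_1..P1_5, then P2_1..P2_5).
counts <- rbind(
  ASV_2   = c(2, 2, 11, 1, 0, 0, 0, 0, 0, 0),
  ASV_726 = c(1, 43, 0, 15, 1, 0, 0, 0, 0, 0)
)
colnames(counts) <- c(paste0("P1_", 1:5), paste0("P2_", 1:5))

# Proportion of samples in which each taxon has a zero count.
zero_prop <- rowMeans(counts == 0)

# Hypothetical helper (an assumption, not part of ANCOMBC): TRUE for taxa
# whose proportion of zeros is greater than zero_cut, i.e. taxa the
# documented rule would exclude.
excluded_by_zero_cut <- function(zero_prop, zero_cut) {
  zero_prop > zero_cut
}

print(zero_prop)
print(excluded_by_zero_cut(zero_prop, zero_cut = 0.90))
print(excluded_by_zero_cut(zero_prop, zero_cut = 0.50))
```

Both example ASVs have zeros in 6 of the 10 samples, so under this rule they are kept with zero_cut = 0.90 and excluded with zero_cut = 0.50.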

Best,
Huang

@sarpiens
Author

Thanks for the quick response,

The thing is that in some cases I also have ASVs that seem "truly" abundant in one group but absent in the other. For example:

| ASV | P1_1 | P1_2 | P1_3 | P1_4 | P1_5 |
|---|---|---|---|---|---|
| ASV_29 | 449 | 717 | 931 | 657 | 371 |

| ASV | P2_1 | P2_2 | P2_3 | P2_4 | P2_5 |
|---|---|---|---|---|---|
| ASV_29 | 0 | 0 | 0 | 1 | 0 |

So I'm worried that setting zero_cut to a smaller value would remove these ASVs from the analysis too. Because of this, I was thinking of applying a filtering step, such as pruning ASVs with fewer than 80 total counts, to remove low-abundance ASVs like ASV_2 or ASV_726 while keeping ASVs like ASV_29. But I don't know whether it would be better to apply this filter before or after the analysis with ancombc, because I'm worried that filtering these ASVs beforehand would interfere with the normalization process.

The original phyloseq object has 1303 ASVs; if I remove ASVs with fewer than 80 total counts across all samples, 437 ASVs remain.

Thanks in advance

@sarpiens
Author

I also wonder whether filtering abundant taxa would interfere with the normalization process. For example, if I wanted to repeat the analysis at higher taxonomic levels (genus, family, order, etc.), I might also want to remove ASVs with incomplete taxonomies, which account for a substantial part of the counts.

I'm new to CoDa and the ANCOM-BC paradigm, so any help is much appreciated!

Thanks in advance

@FrederickHuangLin
Owner

Thank you for your great suggestion, @sarpiens!

Yes, I think it makes a lot of sense to filter ASVs by their total observed abundance. For now, this can be done in the data preprocessing step; for example, QIIME2 has corresponding filtering steps when you generate the feature table (ASV/OTU table) from raw sequencing data (fastq files). We will make that feature available in the ANCOMBC function in the next update.
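As an illustration of the total-count filter discussed in this thread, here is a minimal base R sketch applied to the three example ASVs, using the 80-count threshold proposed above. The helper `filter_by_total` is a hypothetical name introduced here, not an ANCOMBC or QIIME2 function.

```r
# Counts copied from the examples in this thread
# (columns: P1_1..P1_5, then P2_1..P2_5).
counts <- rbind(
  ASV_2   = c(2, 2, 11, 1, 0, 0, 0, 0, 0, 0),
  ASV_726 = c(1, 43, 0, 15, 1, 0, 0, 0, 0, 0),
  ASV_29  = c(449, 717, 931, 657, 371, 0, 0, 0, 1, 0)
)
colnames(counts) <- c(paste0("P1_", 1:5), paste0("P2_", 1:5))

# Hypothetical helper (an assumption for illustration): keep taxa whose
# total count across all samples is at least min_total.
filter_by_total <- function(counts, min_total) {
  counts[rowSums(counts) >= min_total, , drop = FALSE]
}

total_counts <- rowSums(counts)
filtered <- filter_by_total(counts, min_total = 80)

print(total_counts)
print(rownames(filtered))

# With a phyloseq object `ps`, the same idea is commonly written as
#   ps_filtered <- phyloseq::prune_taxa(phyloseq::taxa_sums(ps) >= 80, ps)
# (not run here, since it needs the phyloseq package and a real object).
```

On these rows the totals are 16, 60, and 3126, so only ASV_29 passes the threshold.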

As for your second question: in theory, filtering taxa will not affect the subsequent normalization step.

Best,
Huang

@sarpiens
Author

Thanks a lot!
