Filtering low abundance Taxa #26

sarpiens · 2021-04-14T16:50:52Z

Hello,

I have an ASV count table with two populations, P1 and P2 (5 samples per group). I ran ancombc without filtering ASVs by abundance:

out = ancombc(phyloseq = pseq_A1_raw, formula = "population",
p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000,
group = "population", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5,
max_iter = 100, conserve = TRUE, alpha = 0.05, global = FALSE)

However, some ASVs get a significant q_val even though their rows in the count table show very small differences between groups, because they are low-abundance ASVs.

For example:
ASV       P1_1  P1_2  P1_3  P1_4  P1_5  P2_1  P2_2  P2_3  P2_4  P2_5
ASV_2        2     2    11     1     0     0     0     0     0     0
[...]
ASV_726      1    43     0    15     1     0     0     0     0     0

Would it be okay to apply a filter to remove low-abundance taxa? If so, should I apply it before or after running the differential abundance analysis with ancombc?

Thanks in advance

FrederickHuangLin · 2021-04-15T05:55:02Z

Hi @sarpiens,

The argument zero_cut could help. The manual states that “Taxa with proportion of zeroes greater than zero_cut will be excluded in the analysis”. Therefore, setting a smaller value for zero_cut will remove more low-abundance taxa.

Best,
Huang

sarpiens · 2021-04-15T09:18:40Z

Thanks for the quick response,

The thing is that in some cases I also have ASVs that seem "truly" abundant in one group but absent in the other. For example:

ASV       P1_1  P1_2  P1_3  P1_4  P1_5  P2_1  P2_2  P2_3  P2_4  P2_5
ASV_29     449   717   931   657   371     0     0     0     1     0

So I'm worried that setting a smaller value for zero_cut would remove these ASVs from the analysis too. Because of this, I was thinking of applying a filtering step, such as pruning ASVs that have fewer than 80 counts in total. That would remove low-abundance ASVs like ASV_2 or ASV_726 while keeping ASVs like ASV_29. But I don't know whether it would be better to apply this filter before or after the analysis with ancombc, because I'm worried that filtering these ASVs before the analysis would interfere with the normalization process.
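To make the zero_cut rule concrete, here is a minimal base-R sketch that computes the proportion of zeros for the three example rows posted in this thread and applies the rule as quoted from the manual. The helper `kept_at` is hypothetical and written only for illustration; it mirrors the documented rule and is not the internal ancombc implementation, which may differ (for example, in how it interacts with lib_cut).

```r
# Counts copied from the example rows in this thread
# (columns: P1_1..P1_5, then P2_1..P2_5).
counts <- rbind(
  ASV_2   = c(2, 2, 11, 1, 0, 0, 0, 0, 0, 0),
  ASV_726 = c(1, 43, 0, 15, 1, 0, 0, 0, 0, 0),
  ASV_29  = c(449, 717, 931, 657, 371, 0, 0, 0, 1, 0)
)
colnames(counts) <- c(paste0("P1_", 1:5), paste0("P2_", 1:5))

# Proportion of samples in which each taxon has a zero count.
zero_prop <- rowMeans(counts == 0)

# Hypothetical helper (not part of ANCOMBC): names of the taxa that a given
# zero_cut would retain, following the rule quoted from the manual
# (taxa with a zero proportion greater than zero_cut are excluded).
kept_at <- function(zero_cut) names(zero_prop)[zero_prop <= zero_cut]

print(zero_prop)
print(kept_at(0.90))
print(kept_at(0.50))
```

For these three rows only, ASV_2 and ASV_726 each have zeros in 6 of 10 samples and ASV_29 in 4 of 10, so a cut-off between those two proportions would separate them. Other ASVs in the full table may not separate this cleanly, which is the concern raised above.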

The original phyloseq object has 1303 ASVs. If I remove ASVs with fewer than 80 total counts across all samples, I keep 437 ASVs.
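The total-count filter described above can be sketched in base R as follows. It uses only the three example rows posted in this thread; the variable name `min_total` is an assumption made for illustration, and 80 is the threshold proposed in the thread rather than a recommended value.

```r
# Counts copied from the example rows in this thread
# (columns: P1_1..P1_5, then P2_1..P2_5).
counts <- rbind(
  ASV_2   = c(2, 2, 11, 1, 0, 0, 0, 0, 0, 0),
  ASV_726 = c(1, 43, 0, 15, 1, 0, 0, 0, 0, 0),
  ASV_29  = c(449, 717, 931, 657, 371, 0, 0, 0, 1, 0)
)
colnames(counts) <- c(paste0("P1_", 1:5), paste0("P2_", 1:5))

# Threshold proposed in the thread (illustrative, not a recommendation).
min_total <- 80

# Total observed count of each ASV across all samples.
totals <- rowSums(counts)

# Keep only ASVs whose total count reaches the threshold.
filtered <- counts[totals >= min_total, , drop = FALSE]

print(totals)
print(rownames(filtered))

# On a phyloseq object `ps` (assumed name), the analogous pre-filter would be:
#   ps_filtered <- phyloseq::prune_taxa(phyloseq::taxa_sums(ps) >= min_total, ps)
```

On these rows the totals are 16, 60, and 3126, so only ASV_29 passes the threshold. The filtered object would then be passed to ancombc in place of the raw one.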

Thanks in advance

sarpiens · 2021-04-15T14:54:39Z

I also wonder whether filtering abundant taxa would interfere with the normalization process. For example, if I wanted to repeat the analysis at higher taxonomic levels (genus, family, order, etc.), I would also want to filter out ASVs with incomplete taxonomies, which account for an important part of the counts.

I'm new to the CoDa and ANCOM-BC paradigm, so any help is much appreciated!

Thanks in advance

FrederickHuangLin · 2021-04-18T15:32:38Z

Thank you for your great suggestion, @sarpiens!

Yes, I think it makes a lot of sense to filter ASVs by their total observed abundance. For now, this can be done in the data-preprocessing step. For example, QIIME2 has corresponding filtering steps when you generate the feature table (ASV/OTU table) from raw sequencing data (fastq files). We will make that feature available in the ancombc function in the next update.

For your second question, yes, theoretically, filtering taxa will not affect the following normalization step.

Best,
Huang

sarpiens · 2021-04-20T10:49:58Z

Thanks a lot!

FrederickHuangLin closed this as completed Apr 20, 2021
