How do you define contrasts in the hierarchical multiple testing? #18

benkraj · 2017-06-06T16:12:45Z

Hi-

Within DESeq you normally apply some sort of contrast with the results function i.e. (from: https://www.bioconductor.org/packages/release/bioc/vignettes/phyloseq/inst/doc/phyloseq-mixture-models.html):
ddsTEST <- phyloseq_to_deseq2(ps.glom, ~ SeasonLoc)
ddsTEST = estimateSizeFactors(ddsTEST, geoMeans=geoMeans)
ddsTEST <- DESeq(ddsTEST, fitType="parametric")
resT = results(ddsTEST, contrast=c("SeasonLoc", "PermDry", "PermWet"))

Which then will give you log2fold changes and adjusted p-values between your defined groups (i.e. "PermDry" compared to "PermWet" in this example).

I understand with the hierarchical multiple testing it's accounting for higher level structure in the data based on phylogeny, but I don't see how I define which groups the procedure compares. In the workflow, which age bins is it finding have different Lachnospiraceae abundance as there are 3 bins (0-100, 100-200, 200-400)? Is there any sort of directionality you can get from the output (increasing/decreasing abundance)?

Let me know if I'm being unclear or you need more info.

Thanks for any help,
Ben

spholmes · 2017-06-07T15:09:29Z

Ben I think your question is about the structSSI component of the test which allows you to test different levels of taxonomy (ie is there a difference at the phyla level, is there a difference at the genus level for the phyla that are significantly different..etc). Reading the complete article about testing with hierarchical structure https://www.jstatsoft.org/article/view/v059i13 When doing testing for more than two groups, the only conclusion in a first round of tests is: there is a difference, the extra testing is done afterwards but looking at boxplots of the significant taxa gives the direction. Susan

…

On Tue, Jun 6, 2017 at 9:12 AM, benkraj ***@***.***> wrote: Hi- Within DESeq you normally apply some sort of contrast with the results function i.e.: ddsTEST <- phyloseq_to_deseq2(ps.glom, ~ SeasonLoc) resT = results(ddsTEST, contrast=c("SeasonLoc", "PermDry", "PermWet")) Which then will give you log2fold changes and adjusted p-values between your defined groups (i.e. "PermDry" compared to "PermWet" in this example). I understand with the hierarchical multiple testing it's accounting for higher level structure in the data based on phylogeny, but I don't see how I define which groups the procedure compares. In the workflow, which age bins is it finding have different Lachnospiraceae abundance as there are 3 bins (0-100, 100-200, 200-400)? Is there any sort of directionality you can get from the output (increasing/decreasing abundance)? Let me know if I'm being unclear or you need more info. Thanks for any help, Ben — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#18>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABJcvXVzsTI5kHCr4JcX2zvwLJCWY9R0ks5sBXp9gaJpZM4Nxk1V> .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

benkraj · 2017-06-07T20:24:01Z

Hi Susan-

Thanks for your response.

Yes I am after the structSSI portion of the analysis. I have read through that paper, and I think maybe I follow now. Is it best to define your group comparison (or rounds of group comparisons) using the treePValues command's "environments"?

i.e.
First round (comparing everything):

data("chlamydiae", package = "structSSI")
environments <- sample_data(chlamydiae)$SampleType
abundances <- otu_table(chlamydiae)
chl.tree <- get.edgelist(as.igraph(phy_tree(chlamydiae)))
chl.pval <- treePValues(chl.tree, abundances, environments)
chl.hfdr <- hFDR.adjust(chl.pval, chl.tree, alpha = 0.1)

Then say I want to compare only "Soil" vs "Skin" samples, I would just define a new environment, i.e.:

enviro2 <- as.factor(c("Soil", "Skin"))
chl.pval2 <- treePValues(chl.tree, abundances, enviro2)
chl.hfdr2 <- hFDR.adjust(chl.pval2, chl.tree, alpha = 0.1)

This would then be analogous to the contrasts in DESeq? It seems like it works, and then I wouldn't need to do any sort of phyloseq modification, i.e. subsetting the phyloseq to just my Soil and Skin samples.

In regards to the boxplots/ways to get fold change direction, if I understand correctly, I then could use basically what I had been before with DESeq to get the fold change information, but replace the p/adjusted p-values it calculates with the hierarchical multiple testing adjusted ones? Are there alternative helper functions I could use for this?

Does this seem to be correct in the approach?

Thanks,
Ben

spholmes · 2017-06-07T20:53:19Z

Ben Your approach is the best one, sorry not many helpers as everyone tailors their use of the tools to a different workflow so not much help having a completely general tool, but seems like your approach is good. If you are testing many different subsets for the same data, you may have to cost yourself a little for multiple testing too. Susan

…

On Wed, Jun 7, 2017 at 1:24 PM, benkraj ***@***.***> wrote: Hi Susan- Thanks for your response. Yes I am after the structSSI portion of the analysis. I have read through that paper, and I think maybe I follow now. Is it best to define your group comparison (or rounds of group comparisons) using the treePValues command's "environments"? i.e. First round (comparing everything): data("chlamydiae", package = "structSSI") environments <- sample_data(chlamydiae)$SampleType abundances <- otu_table(chlamydiae) chl.tree <- get.edgelist(as.igraph(phy_tree(chlamydiae))) chl.pval <- treePValues(chl.tree, abundances, environments) chl.hfdr <- hFDR.adjust(chl.pval, chl.tree, alpha = 0.1) Then say I want to compare only "Soil" vs "Skin" samples, I would just define a new environment, i.e.: enviro2 <- as.factor(c("Soil", "Skin")) chl.pval2 <- treePValues(chl.tree, abundances, enviro2) chl.hfdr2 <- hFDR.adjust(chl.pval2, chl.tree, alpha = 0.1) This would then be analogous to the contrasts in DESeq? It seems like it works, and then I wouldn't need to do any sort of phyloseq modification, i.e. subsetting the phyloseq to just my Soil and Skin samples. In regards to the boxplots/ways to get fold change direction, if I understand correctly, I then could use basically what I had been before with DESeq to get the fold change information, but replace the p/adjusted p-values it calculates with the hierarchical multiple testing adjusted ones? Are there alternative helper functions I could use for this? Does this seem to be correct in the approach? Thanks, Ben — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#18 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABJcvYl4TsK4JqiK4RKpNuXAl4idPvpYks5sBwbhgaJpZM4Nxk1V> .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How do you define contrasts in the hierarchical multiple testing? #18

How do you define contrasts in the hierarchical multiple testing? #18

benkraj commented Jun 6, 2017 •

edited

Loading

spholmes commented Jun 7, 2017 via email

benkraj commented Jun 7, 2017

spholmes commented Jun 7, 2017 via email

How do you define contrasts in the hierarchical multiple testing? #18

How do you define contrasts in the hierarchical multiple testing? #18

Comments

benkraj commented Jun 6, 2017 • edited Loading

spholmes commented Jun 7, 2017 via email

benkraj commented Jun 7, 2017

spholmes commented Jun 7, 2017 via email

benkraj commented Jun 6, 2017 •

edited

Loading