DESeq2 with phyloseq issue with tutorial code #1556
Hi @paddyhooper,
As shown in the tutorial, the DESeq2 version used at the time was v1.2.10, released many years ago, and much has changed since then. Issue #642 discusses adding a pseudocount vs. using an alternative geometric mean. If we choose the geometric mean approach on the demo dataset:
library(phyloseq)
library(DESeq2)
filepath = system.file("extdata", "study_1457_split_library_seqs_and_mapping.zip", package="phyloseq")
kostic = microbio_me_qiime(filepath)
kostic = subset_samples(kostic, DIAGNOSIS != "None")
# Calculate zero-tolerant geometric means before estimating size factors:
# only positive counts enter the log sum, but the divisor is the full row length
gm_mean = function(x, na.rm = TRUE){
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}
diagdds = phyloseq_to_deseq2(kostic, ~ DIAGNOSIS)
geoMeans = apply(counts(diagdds), 1, gm_mean)
diagdds = estimateSizeFactors(diagdds, geoMeans = geoMeans)
diagdds = estimateDispersions(diagdds, fitType = "parametric")
diagdds <- nbinomWaldTest(diagdds)
res = results(diagdds, cooksCutoff = FALSE)
alpha = 0.01
sigtab = res[which(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(kostic)[rownames(sigtab), ], "matrix"))
head(sigtab)
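To illustrate why the `gm_mean` helper above avoids the "every gene contains at least one zero" error, here is a minimal base-R sketch on a made-up count matrix (the matrix, `plain_gm`, and the hand-rolled size-factor step are assumptions for illustration, and the last step is a simplified median-of-ratios calculation, not DESeq2's exact implementation):

```r
# Hypothetical toy counts: 3 taxa x 4 samples, with a zero in every row
counts <- matrix(c(0, 10, 20, 40,
                   5,  0, 10, 20,
                   8, 16,  0, 32),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("taxon", 1:3), paste0("s", 1:4)))

# Plain geometric mean: log(0) is -Inf, so a single zero collapses the result to 0
plain_gm <- function(x) exp(mean(log(x)))

# Zero-tolerant geometric mean, as in the comment above
gm_mean <- function(x, na.rm = TRUE) {
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}

plain    <- apply(counts, 1, plain_gm)  # all 0: no taxon is usable as a reference
tolerant <- apply(counts, 1, gm_mean)   # all positive and finite

# Simplified median-of-ratios size factors, using only positive counts per sample
sf <- apply(counts, 2, function(cnts) {
  keep <- cnts > 0
  exp(median(log(cnts[keep]) - log(tolerant[keep])))
})

print(plain)
print(tolerant)
print(sf)
```

With the plain geometric mean every taxon gets a reference value of 0, which is the situation the error message describes; the zero-tolerant version gives each taxon a usable reference value, so a size factor can be computed for every sample.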
Hi there, Thanks very much for your response! I hadn't considered the package date on the tutorial. From your response it sounds like it could generally be a better option to use the alternative geometric mean on 'sparse' ASV datasets. Thanks for pointing me towards these responses, I will look into them. Kind regards,
Hi @joey711 and all,
Version info:
R Version: 4.1.0
RStudio Version 1.4.1717
I am hoping to use the DESeq2 plugin extension in phyloseq for my data. I am following this tutorial: https://joey711.github.io/phyloseq-extensions/DESeq2.html
When I try to run the script I am able to access the data and create the phyloseq object. However, when I run:
diagdds = phyloseq_to_deseq2(kostic, ~ DIAGNOSIS)
I get this warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
some variables in design formula are characters, converting to factors
When I then run:
diagdds = DESeq(diagdds, test="Wald", fitType="parametric")
I get this error:
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, :
every gene contains at least one zero, cannot compute log geometric means
I have seen on a few forums (https://www.biostars.org/p/440379/) that people have had similar issues and suggest adding a pseudocount to the data, which I think may come from previous discussions on issue 445?
estimateSizeFactors(dds_PvsN, type = 'iterate')
However, I wanted to check that I am not making a more fundamental mistake (quite likely!), as I was surprised to run into this issue using the sample data for the walkthrough.
Many thanks for your amazing work,
Paddy
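On the `estimateSizeFactors(..., type = 'iterate')` call quoted in the question: besides `"iterate"`, DESeq2's `estimateSizeFactors` also accepts `type = "poscounts"`, which, as far as I understand it, uses a zero-tolerant geometric mean per feature and so sidesteps the error without a pseudocount. A minimal sketch on simulated data (the count matrix, sample names, and `condition` column are all made up for illustration; it assumes a reasonably recent DESeq2 is installed):

```r
library(DESeq2)

set.seed(1)
# Hypothetical sparse counts: 50 features x 6 samples
cts <- matrix(rnbinom(300, mu = 20, size = 0.5), nrow = 50)
# Force at least one zero into every row to mimic sparse ASV data
for (i in seq_len(nrow(cts))) cts[i, sample(ncol(cts), 1)] <- 0L
colnames(cts) <- paste0("s", 1:6)
rownames(cts) <- paste0("asv", 1:50)

# Hypothetical two-group design
coldata <- data.frame(condition = factor(rep(c("A", "B"), each = 3)),
                      row.names = colnames(cts))

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ condition)

# The default type = "ratio" would stop here because every row contains a zero;
# "poscounts" is intended for this situation
dds <- estimateSizeFactors(dds, type = "poscounts")
print(sizeFactors(dds))
```

From there, `estimateDispersions` and `nbinomWaldTest` can be run as in the reply's code; whether `"poscounts"`, `"iterate"`, or user-supplied `geoMeans` suits a given dataset is a judgment call.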