Using transformed count data from DEseq2 #283
Comments
Adapting the Negative Binomial Tutorial to Use Variance-Stabilized Counts Only

The following example uses additional commands from DESeq2. An additional wrapper in phyloseq is not needed. The

In the phyloseq-included vignette example I use the publicly available data from a study on colorectal cancer: Genomic analysis identifies association of Fusobacterium with colorectal carcinoma.

Study ID:
Project Name:

Import data with phyloseq, convert to DESeq2

Start by loading phyloseq.

library("phyloseq")
packageVersion("phyloseq")

Define the file path, and import the published OTU count data into R.

filepath = system.file("extdata", "study_1457_split_library_seqs_and_mapping.zip",
    package = "phyloseq")
kostic = microbio_me_qiime(filepath)

Convert to DESeq2's DESeqDataSet class

In this example I'm using the major sample covariate, DIAGNOSIS.

Here is the summary of the data variable kostic

DESeq2 conversion

First load DESeq2.

library("DESeq2")
packageVersion("DESeq2")
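
The following two lines actually do all the complicated DESeq2 work. The function phyloseq_to_deseq2 converts the phyloseq data into a DESeqDataSet, using the design formula ~DIAGNOSIS.

diagdds = phyloseq_to_deseq2(kostic, ~DIAGNOSIS)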
diagdds

New Part Not Shown in Original Vignette: DESeq2 Variance Stabilization

You must step through the size factor and dispersion estimates prior to calling the getVarianceStabilizedData function.

diagdds = estimateSizeFactors(diagdds)
diagdds = estimateDispersions(diagdds)
diagvst = getVarianceStabilizedData(diagdds)
dim(diagvst)
kostic

As you can see, the dimensions of the variance stabilized count table, diagvst, match those of the OTU table in kostic.

# Save the untransformed data as a separate variable so you can go back to it
kostic0 = kostic
otu_table(kostic) <- otu_table(diagvst, taxa_are_rows = TRUE)

This modified kostic object now carries the variance-stabilized values in its OTU table.

diagdds = DESeq(diagdds)  #, fitType='local')
res = results(diagdds)
res = res[order(res$padj, na.last = NA), ]
alpha = 0.01
keepOTUs = rownames(res[res$padj > alpha, ])[1:50]
kosticTrimvs = prune_taxa(keepOTUs, kostic)
kosticTrim0 = prune_taxa(keepOTUs, kostic0)
plot_heatmap(kosticTrimvs, taxa.order = "Phylum", taxa.label = "Genus", sample.label = "DIAGNOSIS",
    sample.order = "DIAGNOSIS")

plot_heatmap(kosticTrim0, taxa.order = "Phylum", taxa.label = "Genus", sample.label = "DIAGNOSIS",
    sample.order = "DIAGNOSIS")

Comparing these two heatmaps gives you a rough idea of what the variance stabilizing transformation does to the counts. In this case it looks very similar to a log transformation, and probably is similar. |
Joey- The method below worked to export the otu_table from the kostic variable containing the variance-stabilized data, so I could visually check the values in Excel. Thanks for providing the detailed method for adding the vst counts to the OTU table in the kostic variable, which will allow me to use the full range of phyloseq's graphical options.
Thank you! |
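The export method referred to in the comment above is not shown in the thread. As a rough sketch only, one way to get a table into a spreadsheet-readable file is base R's write.csv. The matrix vst_mat and its values are made up for illustration; with phyloseq loaded, the real matrix could presumably be obtained with as(otu_table(kostic), "matrix").

```r
# Stand-in for a variance-stabilized table (taxa as rows, samples as columns).
# The names and values here are hypothetical, not taken from the kostic data.
vst_mat <- matrix(
  c(1.5, 2.25, 0.5, 3.75),
  nrow = 2,
  dimnames = list(c("OTU1", "OTU2"), c("SampleA", "SampleB"))
)

# Write to a temporary CSV file; the row names (taxa IDs) become the first column.
out_file <- tempfile(fileext = ".csv")
write.csv(vst_mat, file = out_file)

# Read the file back to confirm the round trip preserves names and values.
check <- as.matrix(read.csv(out_file, row.names = 1))
print(check)
```

The resulting file can be opened directly in Excel. A tempfile is used here only to keep the sketch self-contained; in practice you would pass your own file path.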
Thanks for the programming work and documentation on how to obtain the VSD in DESeq2 from a phyloseq object. I did a test run with a subset of my data yesterday, and everything worked out fine, but when I try using phyloseq_to_deseq2 with my larger data set (pruned to minimum taxa sum >0) I receive this error message:
I am not sure what to make of it. Looking on the web, I found a comment that this warning is a result of a zero-matrix being supplied, but when I look at the original DT (phyloseq object) and the dds_DT, the correct information seems to be there (see paste below). As with most species observation matrices, taxa are absent from some samples. I am wondering if this is a problem for generating VSD? Any help would be great!
|
Hi Barbara, One of the vignettes in the latest version of phyloseq (1.10.0+) includes an example for dealing with this. You can find it here: The quick answer is that you need to provide your own geometric means (calculated in a manner that is tolerant of zeros), or include a pseudocount at the step that calculates geometric means. The strict definition of the geometric mean includes a product of its terms, so one zero in the bunch will result in a zero value. For real data, and this data especially, we expect occasional zeros, so one of these approaches is necessary for the geometric mean to be useful. Hope that helps! joey |
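To make the zero problem concrete, here is a minimal base-R sketch of one zero-tolerant geometric mean. It is an illustration under stated assumptions, not necessarily the code in the vignette mentioned above: the helper name gm_mean and the toy matrix counts are hypothetical. Zeros are left out of the log-sum but still count toward the denominator.

```r
# Zero-tolerant geometric mean (hypothetical helper): zeros are skipped in the
# log-sum but still counted in the denominator, so a taxon with some zeros
# gets a positive value instead of 0.
gm_mean <- function(x, na.rm = TRUE) {
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}

# Made-up toy count matrix (taxa as rows, samples as columns).
counts <- matrix(
  c(4, 0, 16,
    1, 1, 1,
    0, 0, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)

# Strict geometric mean per taxon: a single zero forces the result to 0.
strict_gm <- apply(counts, 1, function(x) exp(mean(log(x))))

# Zero-tolerant version: one positive value per taxon.
geo_means <- apply(counts, 1, gm_mean)

print(strict_gm)
print(geo_means)
```

A vector like geo_means could then be supplied to DESeq2 through the geoMeans argument of estimateSizeFactors, computed from counts(diagdds) rather than from a toy matrix. The pseudocount route mentioned above is the alternative: add a small constant to the counts before taking the ordinary geometric mean.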
Joey, Yes, that is of great help. Thanks for the information and also the script. Thanks again for the prompt reply and useful information.
Cheers,
Barbara D Bahnmann MSc | Marie Curie PhD Fellow
On Wed, Nov 26, 2014 at 10:12 PM, Paul J. McMurdie <notifications@github.com
|
Hi Joey,
I wanted to do some further analyses on a phyloseq object I created in
Error in phyloseq(DT) :
I updated my phyloseq package in January and I thought this might be the
I checked that the row_names matched between the otu_table and taxa_table
rownames(taxa_table)=taxa_names(otu_table)
I did not see any information on how to use the intersect() to check the
Thanks in advance,
Barbara
Barbara D Bahnmann MSc | Marie Curie PhD Fellow |
Hi Barbara, Sorry you are having this issue. This sounds like a completely unrelated issue to the now-closed issue to which this comment has been added. Please re-post as a new issue. Copy and paste should be fine. Also, please note that your issue post is incomplete, as you have not indicated which version of phyloseq you were using before and after, and you did not post the code that you used to create the "new" phyloseq object from the original files. This latter case is more important, because it should have fixed your issue right away. Please indicate what the file formats are as well when you provide that code. Thanks in advance for your updated comment, and your continued interest in phyloseq! joey |
Hello, After going through this post and trying to save my variance stabilized counts in my original otu table, I am having problems with the conversion from a DESeq2 object (after variance stabilizing) back to a phyloseq object. I am using phyloseq version 1.16.2. Below is my code:
#load the data with low read samples pruned out
print(otu_tablef_no10_coralf_sm_866f_mayf) # 39 samples, 6037 taxa
#Try with updated gm_means, because my data has such a high prevalence of sparsely sampled OTUS
#save the untransformed data as a separate variable so you can go back to it
#replace the counts with variance stabilized counts
Error output:
Thank you very much for your help! All of these issues and vignettes on phyloseq are incredibly helpful. Jamie |
I'm having the same problem: error: Thank you again for all these helpful threads and the vignettes! |
I had the same problem when I tried this code: So I tried it this way... But when I get the matrix with the getVarianceStabilizedData function and use that as my otu table, it seems to work. |
Joey-
In your "Waste not, want not" paper you suggest that researchers interested in PCA and clustering ought to consider using variance stabilized data rather than proportions or rarefied data.
DESeq2 allows output of the transformed count matrix derived from the varianceStabilizingTransformation function using getVarianceStabilizedData(object). Do you have a convenient way to change that back into a phyloseq data object so that all of the available graphical and analytical functions of phyloseq can be used (barplots, heatmaps, etc.)?
Thanks,
Kristina