importing BIOM file #443
Comments
Binary file? That sounds like the HDF5-based "version 2" biom format. The original biom format was not binary; it was JSON-based, and it is the format currently supported in phyloseq. The biom-format team told me about the update probably a year ago. You are the first to post to the phyloseq issue tracker about it. I've been discussing adding biom-v2 support to the biom package on CRAN with @nosson (Joe Paulson). The more this becomes an issue, the faster one of us will support it. I personally have not encountered a need for HDF5-based storage of microbiome count data. If you have some way of storing your OTU table in biom v1 format, legacy QIIME format, or even just a tab-delimited table or any other format that can be read by R, then you will be able to use phyloseq. Hope that helps. I'll leave this issue open for now until there are further (and more helpful) comments about workarounds and/or actual biom-v2 support. Cheers joey |
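To tell the two flavours apart before importing, one option is to look at the first bytes of the file. The sketch below is not part of phyloseq or the biom package: `guess_biom_version` is a hypothetical helper name, and it assumes the HDF5 signature sits at offset 0 (the usual case) and that a JSON biom file starts with `{` after optional whitespace.

```r
# HDF5 files start with this fixed 8-byte signature: \x89 H D F \r \n \x1a \n
hdf5_signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))

# Hypothetical helper (an illustration only): guess the storage format of a
# BIOM file from its leading bytes, without parsing it.
guess_biom_version <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  head_bytes <- readBin(con, what = "raw", n = 64L)

  # Binary "version 2" files are HDF5 containers.
  if (length(head_bytes) >= 8L && identical(head_bytes[1:8], hdf5_signature)) {
    return("hdf5")
  }

  # "Version 1" files are JSON objects: first non-whitespace byte is "{" (0x7b).
  whitespace <- as.raw(c(0x20, 0x09, 0x0a, 0x0d))
  non_ws <- head_bytes[!(head_bytes %in% whitespace)]
  if (length(non_ws) > 0L && non_ws[1] == as.raw(0x7b)) {
    return("json")
  }

  "unknown"
}
```

A file reported as `"hdf5"` would be one that the JSON-only reader cannot parse; `"unknown"` would cover tab-delimited tables and anything else.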
I had to convert my biom file using "biom convert", which is distributed with QIIME; perhaps this is the issue? Maybe try:
and see if that works? All the best, Phill |
In QIIME 1.9 the biom files are all in the new format and are not read correctly into phyloseq. The only workaround I have at the moment is an install of QIIME 1.8 to make the biom files. |
Hi Ara, If you'd like, can you go ahead and test this function: This will import biom vs. 2.0 files. If you find it works well I'm sure
On Sun, Mar 15, 2015 at 9:55 PM, Ara notifications@github.com wrote:
|
Joey and Phil, I tried the command that Phil suggested to convert the BIOM file to JSON format, and it worked. Now I am able to import my data into phyloseq without any issues. Thanks! Alejandro |
@nosson Thanks! I will give that a try. |
@alejorojas2 @bioinfonm If you guys are willing to be testers for the gist I posted above I know Joey will import it into biom package. I just need to write the documentation. Is there a desire/need to "write" hdf5 biom2 files? |
@nosson Sounds good. Phyloseq is our main tech for processing and understanding our microbial data. So anything I can do to help out I am game for. I'll give it a try here around lunch! |
@nosson Does the biom file need to be a rich format? I dumped out a biom file with the default settings in QIIME 1.9
and then got this error in phyloseq using the biom 2.x function
|
@bioinfonm Mhmm, just for clarification: this is not a phyloseq error. We're using the R packages biom and rhdf5 to import this file. Converting it into a biom-class object (as Joey defined in the biom package) then lets us easily convert that to an object that can be parsed by phyloseq or metagenomeSeq. Do you think you could send me an MRE (minimal reproducible example)? The ones I have from biom-format.org worked, but having a simple real example like yours would help with debugging. I don't personally have the QIIME wrappers installed, so I can't create my own without lots of effort. If you can get something to me, my email is jpaulson@umiacs.umd.edu. |
@nosson Will do! I'll send the whole file over. Ara |
@nosson I am traveling right now, but as soon as I can test it I will let you know. About the need for hdf5: for me it was just by chance, since this is the first time that I have used qiime for my analysis and I think it is the default behavior now. |
So a similar problem cropped up on the qiime forum here: Looks like something goofy with the way QIIME is handling the hdf5 format. The solution at the moment that works is: |
@nosson I did try your function and it worked for me. I had to install the library for Then I ran the function and ran your command:
then I get this output, that actually makes sense based on my data:
So, If this is a
So it seems that your function creates a list instead of a biom file, or maybe I am missing something? Thanks! Alejandro |
@alejorojas2 Awesome! Glad to hear it works for you - it seems like some of the default outputs by Qiime are not 'true' HDF5 files (@bioinfonm's was not). Glad to hear it worked for you! This is not an issue as Phyloseq uses
import_biom <- function(BIOMfilename,
treefilename=NULL, refseqfilename=NULL, refseqFunction=readDNAStringSet, refseqArgs=NULL,
parseFunction=parse_taxonomy_default, parallel=FALSE, version=1.0, ...){
# initialize the argument-list for phyloseq. Start empty.
argumentlist <- list()
# Read the data
# BIOMfilename may already be a biom-class object; otherwise treat it as a file path
if(inherits(BIOMfilename, "biom")){ # or check if string first
x = BIOMfilename
} else {
x = read_biom(biom_file=BIOMfilename)
}
|
Hi there, I am following this thread with interest as I am also using the latest QIIME. The read_hdf5_biom function provided worked, but I was getting the same error as CarlyRae did (see comments here https://gist.github.com/nosson/324ac1fa3eab1bc7f845) for the import_biom2 function. The error suggests a list is not the expected input, as it is expecting the biom-class (subclass of List) as input. I modified the code to convert the list to a biom object... and it seems to work. But I just wanted to check with you guys if this was the Right Thing to Do :)
EDIT: I just realised I haven't got the latest bioconductor installed. I'll update everything and get back to you if there is still a problem. |
I upgraded to R 3.2 (Bioconductor version 3.1 (BiocInstaller 1.18.2)), installed the phyloseq library (‘1.13.2’), and still needed to add that line (x = biom(x)) to the import_biom2 function to get it to work. But I think there are still issues, as the taxonomy parsing isn't working correctly (I've tried default and the parse_taxonomy_qiime options too). I thought this all might work if I could install the dev version of phyloseq, but I can't seem to get this to work.
|
Hi @LBragg, Thanks for the research so far into the problem and apologies for the delay! If you could email me the MRE (minimal reproducible example) that you're working with, that'll really help me debug and fix up the code so that we can get biom-class 2.1 objects imported into |
@LBragg Thanks for the MRE - here's a link to the now updated import_biom2.R script https://gist.github.com/nosson/324ac1fa3eab1bc7f845 that should fix everything (sorry for the delay, been in Paris for the last week). |
@joey711 If you're happy with this, let me know and I can make import_biom flexible and modify |
Hey guys, all LGTM! |
The fixes work for me, the biom produced by qiime 1.9 can be imported. As an aside, I noticed when I passed a parameter (I was trying to pass the mapping file) to
|
The warning you showed seems like expected/desired behavior if the Btw everyone, @nosson et al, I was imagining we add a new argument to Thoughts? joey |
Sounds good to me. As for that issue I was describing, the |
Boom! I just made a large number of changes to the Check out: joey711/biom#11 |
Hi all, Sorry to be picking up on an earlier part of this thread so late in the game but I am just now running into my issue. I am trying to import my biom file, converted from hdf5 to json using Phil's earlier suggestion with the biom convert in QIIME, and receiving the following error:
I've tried using a txt file as well, and it just won't seem to work. Any help is appreciated, |
That's not a valid file path. the Your file path for your data should just be a standard relative or absolute Unix path string.
jsonbiomfile = "path/to/my/data/file.biom"
import_biom(jsonbiomfile) |
By the way, the early version of the replacement that will solve this HDF5 / Biom-V2 issue is now a separate repository. Plan is to post it on Bioconductor: https://github.com/joey711/biomformat The release version of phyloseq package cannot officially support it as an internal dependency until it is released on BioC. The devel version can do so, as soon as the first version of "biomformat" is on BioC. I will close this issue once the solutions are implemented enough in phyloseq and its doc. |
Hi Joey, My apologies for not catching that this was a tutorial-specific command. Your suggestion worked wonderfully. Thanks for the HDF5 info as well. D |
Hi all, |
@mafed interesting - can you perhaps post this in the biomformat package: https://github.com/joey711/biomformat/issues and i'll go ahead and fix this |
@jnpaulson sure, I'll do it immediately, thanks |
The documentation for the
This is not correct, is it? Shouldn't it say "Old versions" instead of "New versions"? |
Has the hdf5 biom issue been fixed? I still get the error "Error in nchar(content) : invalid multibyte string 1" when trying to import a biom generated with QIIME 1.9.1. The documentation here seems to imply that after April this problem was fixed. I've checked that my version of phyloseq is up to date. |
This issue should have been fixed for some time now. The dependency in phyloseq was migrated in a previous release version to point to "biomformat" on Bioconductor, which supports both JSON and HDF5 versions of the format, rather than "biom" on CRAN, which should be considered deprecated. All functionality from the "biom" package on CRAN is now subsumed within "biomformat" on Bioconductor. I don't know what QIIME did or did not do to fix the issue, but AFAIK biomformat is reading these formats just fine. I will close for now, but feel free @jmicrobe to post more details of your problem if it turns out it is not solved by biomformat/phyloseq. @jnpaulson feel free to re-open if you think there's something still TBD. |
Ok, I solved my own issue. But putting comments here in case someone does this too!
# Reproducible result:
data("GlobalPatterns")
MGS <- phyloseq_to_metagenomeSeq(GlobalPatterns)
MGS
p <- cumNormStatFast(MGS)
b <- MRexperiment2biom(MGS, norm = T)
write_biom(b, biom_file = "test.biom")
# Error message
biom1 <- import_biom2("test.biom")
# You have to read in the biom file first for import biom to work, needs an object not a file
biom2 <- read_biom("test.biom")
biom3 <- import_biom2(biom2) |
Hi everyone! I have my otu_table.biom constructed by QIIME 1.9 and my metadata file. I tried to import them into phyloseq, but so far I have not been able to. I converted my otu_table.biom to JSON with this command:
biom convert -i otu_table.biom -o otu_table_json.biom --table-type="OTU table" --to-json
However, I have not been able to import this file either. Please, if you have any advice, I'm new to bioinformatics :) Thanks in advance |
Sorry, I forgot to write the script that I used. It was:
jsonbiomfile = "C/Users/metagenomic/Desktop/otu_table_json.biom"
biom_file <- paste(jsonbiomfile, "otu_table_json.biom", sep = "")
read_biom(biom_file)
Is there something wrong?? |
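One thing that stands out in the script above (an observation, not something confirmed elsewhere in the thread) is that `paste()` appends the file name to a string that already ends in the file name. The base-R sketch below prints what that call builds and shows one way to assemble the path from a directory plus a file name; the directory `"C:/Users/metagenomic/Desktop"` is an assumption made for illustration.

```r
# What the script builds: the file name is appended a second time.
jsonbiomfile <- "C/Users/metagenomic/Desktop/otu_table_json.biom"
doubled_path <- paste(jsonbiomfile, "otu_table_json.biom", sep = "")
print(doubled_path)

# Building the path from a directory and a file name instead.
# The directory is assumed for illustration; note the colon after the
# drive letter, which the original string lacks.
biom_dir <- "C:/Users/metagenomic/Desktop"
biom_file <- file.path(biom_dir, "otu_table_json.biom")
print(biom_file)

# Checking that the file exists before handing the path to a reader.
if (!file.exists(biom_file)) {
  message("No file at: ", biom_file)
}
```

If the existence check reports a missing file, the problem is in the path rather than in the biom reader.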
This is what works for me from QIIME 1.9 .json files
# BIOM FILE
biom <- import_biom("milk_final.biom", parseFunction = parse_taxonomy_greengenes)
rank_names(biom)
# get error message, but looks fine, adds a rank1 at the end for whatever reason. You can turn off the parse_taxonomy_greengenes maybe...
# MAPPING FILE
map <- import_qiime_sample_data("map_milk_final_alpha.txt")
# TREE FILE
# the tree must be rooted. You can mid-point root it if you want, see: http://www.phytools.org/static.help/midpoint.root.html
# My tree was rooted with an outgroup in other R code
tree = read.tree("milk_final_rooted.tre")
GM <- merge_phyloseq(biom, tree, map)
When you merge all these files, the OTUs that are in the biom file, but not in the tree file are removed, but these OTUs are still 'linked' to the phyloseq object. It is best to filter them in case they cause some issue. What happens is when you do the alignment in QIIME some of the OTUs don't align (they are often classified as bacteria) and thus don't show up in the tree. It is a good filtering strategy.
GM_data = filter_taxa(GM, function(x) sum(x) != 0, TRUE)
sum(taxa_sums(GM_data) == 0)
ntaxa(GM_data)
sort(sample_sums(GM_data))
# difference in sequencing depth
max(sample_sums(GM_data))/min(sample_sums(GM_data))
Hope that works for you |
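The zero-sum filter and the depth ratio at the end of that script can be illustrated on a plain matrix, without phyloseq. This is only a base-R analogue under stated assumptions: the counts are made up, taxa are rows and samples are columns, and `rowSums`/`colSums` stand in for `taxa_sums`/`sample_sums`.

```r
# Made-up OTU counts (illustration only): taxa in rows, samples in columns.
otu_counts <- matrix(
  c(5, 0, 3,
    0, 0, 0,
    2, 7, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU_1", "OTU_2", "OTU_3"), c("S1", "S2", "S3"))
)

# Keep only taxa whose total count across samples is non-zero,
# the same predicate as function(x) sum(x) != 0 in the script.
keep <- rowSums(otu_counts) != 0
filtered <- otu_counts[keep, , drop = FALSE]

# Per-sample sequencing depth and the max/min depth ratio.
depth <- colSums(filtered)
depth_ratio <- max(depth) / min(depth)

print(filtered)
print(sort(depth))
print(depth_ratio)
```

A large depth ratio suggests the samples were sequenced very unevenly, which is what the last line of the script is checking.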
Thank you, Carly, for your answer!!! I tested this ####biom.json ####Metadata import #####merge two phyloseq objects And it worked!!! Thanks |
Hi Joey711,
I have been trying to import a biom file that I generated with qiime. The file is a binary file, which seems to be the new format for this kind of file. When I try to work with it in R I get this error:
This is my code so far:
COI_data <- import_biom(biom_file)
Error in nchar(content) : invalid multibyte string 1
COI_data <- import_qiime(BIOMfilename = biom_file)
Error in arglist[sapply(arglist, is.component.class)] :
invalid subscript type 'list'
Maybe I am missing something or should I use an older version of BIOM file?