Problem importing biom file using import_qiime() #302

Closed
anmwinter opened this issue Mar 2, 2014 · 16 comments

@anmwinter

Hi Joey,

A friend pointed me towards phyloseq as a tool I should be using for our upcoming papers. I had a few questions about the import_qiime function.

I am on OS X 10.7.5, running R 3.x and RStudio. I installed the newest version of phyloseq from the GitHub repo. I am using QIIME 1.8.

Here is how I brought my qiime files in.

biom_file = "otu_table.biom"    
map_file = "VLmapping.csv"      
tree_file = "rep_set_tree.tre"
villaluz <- import_qiime(biom_file, map_file, tree_file)
Processing map file...
Processing otu/tax file...

Reading and parsing file in chunks ... Could take some time. Please be patient...

Building OTU Table in chunks. Each chunk is one dot.
.
Error in `colnames<-`(`*tmp*`, value = character(0)) : 
  attempt to set 'colnames' on an object with less than two dimensions

I saw this error from a year ago here on the forums but didn't find an answer to that.

I can still proceed downstream:

villaluz
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 4977 taxa and 7 samples ]
sample_data() Sample Data:       [ 7 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 4977 taxa by 8 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 4977 tips and 4975 internal nodes ]
rank_names(villaluz)
[1] "Rank1"   "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"
get_taxa_unique(villaluz, "Phylum")
 [1] NA                 "WPS-2"            "ZB3"              "AC1"             
 [5] "Gemmatimonadetes" "GN04"             "GOUTA4"           "FCPU426"         
 [9] "Actinobacteria"   "Firmicutes"       "NKB19"            "Proteobacteria"  
[13] "BHI80-139"        "Nitrospirae"      "Lentisphaerae"    "OP3"             
[17] "Verrucomicrobia"  "Planctomycetes"   "LD1"              "BRC1"            
[21] "WS3"              "OP8"              "PAUC34f"          "GAL15"           
[25] "GN02"             "Chloroflexi"      "OD1"              "Elusimicrobia"   
[29] "[Thermi]"         "WS2"              "TM7"              "Armatimonadetes" 
[33] "OP11"             "TM6"              "Acidobacteria"    "SR1"             
[37] "Spirochaetes"     "WS1"              "FBP"              "Cyanobacteria"   
[41] "Bacteroidetes"    "TPD-58"           "[Caldithrix]"     "Fibrobacteres"   
[45] "Chlorobi"  

The main problem I am running into is that when I do anything with the tree, all I get for node IDs are the denovo numbers and not the actual OTUs that were assigned in QIIME.

I will keep my questions to a minimum so I don't clutter up the thread too much.

  1. Why am I getting this error using import_qiime
    Error in `colnames<-`(`*tmp*`, value = character(0)) :
    attempt to set 'colnames' on an object with less than two dimensions

even though I can see the phyla, run all the box plots of alpha diversity, and make bar plots?

  2. Why can't I get my tree nodes to assign their taxonomy correctly?

Thanks for the help and suggestions!
ara

@joey711
Owner

joey711 commented Mar 2, 2014

Ara,

It's very unclear to me what is going on, because in your example, your import command failed with an error, but then you show commands afterward that include your data as a phyloseq object with 4 components. Somehow you successfully imported your data. Furthermore, files with a .biom extension usually indicate biom-format, which should be imported using the import_biom function in phyloseq. Please see the documentation about the import functions, via something like:

library("phyloseq")
?import_qiime
?import_biom
?import

While you're at it, you said you have installed "the latest version of phyloseq", but the devel version on GitHub is updated often. Please always report the version number that is returned via the following:

library("phyloseq"); packageVersion("phyloseq")

Question 1

Well, you shouldn't be using import_qiime to import a biom-format file, even if it was produced by QIIME. Remember that an import function only cares about the file's format, not what produced it. The name import_qiime refers to the QIIME-legacy table format, not the QIIME workflow itself. These functions are well-documented. Please make use of that documentation, and also see the vignettes and many examples online at phyloseq's tutorials, demos, and extensions pages.

The colnames<- assignment error usually shows up in this context when there is something that fails in the taxonomy assignment. I wouldn't worry about it, for now, though, until you verify what file format you have (biom-format or qiime-legacy?), and that you have tried the correct importer. I suspect you already solved this problem because you showed results from a properly-imported phyloseq data object.
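For anyone unsure which format they have, here is a rough sketch of one way to check by looking at the file's first bytes instead of its extension. The helper name guess_table_format() is made up for illustration and is not part of phyloseq. It assumes that BIOM 1.0 files are JSON (so they start with "{"), that BIOM 2.x files are HDF5 (so they start with the HDF5 signature), and that legacy QIIME tables are text whose first line starts with "#". Treat the result as a hint, not a verdict.

```r
# Hypothetical helper (not part of phyloseq): guess a table's format from its
# first bytes rather than from its extension or the tool that wrote it.
guess_table_format <- function(path) {
  con <- gzfile(path, "rb")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  head_bytes <- readBin(con, what = "raw", n = 64)

  # HDF5 files (assumed here for BIOM 2.x) begin with this 8-byte signature.
  hdf5_sig <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))
  if (length(head_bytes) >= 8 && identical(head_bytes[1:8], hdf5_sig)) {
    return("biom-hdf5")
  }

  # Drop NUL bytes so rawToChar() cannot fail, then trim leading whitespace.
  txt <- rawToChar(head_bytes[head_bytes != as.raw(0)])
  txt <- sub("^[[:space:]]+", "", txt, useBytes = TRUE)

  if (startsWith(txt, "{")) return("biom-json")    # JSON object: BIOM 1.0 style
  if (startsWith(txt, "#")) return("legacy-text")  # '#' header: legacy table style
  "unknown"
}

# Small self-made files standing in for real tables (contents are invented).
json_file   <- tempfile(fileext = ".biom")
legacy_file <- tempfile(fileext = ".txt")
hdf5_file   <- tempfile(fileext = ".biom")
writeLines('{"id": null, "format": "Biological Observation Matrix 1.0.0"}', json_file)
writeLines(c("# QIIME v1.2.0 OTU table", "#OTU ID\tS1\tS2\tConsensus Lineage"), legacy_file)
writeBin(as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a, 0x00)), hdf5_file)

guess_table_format(json_file)
guess_table_format(legacy_file)
guess_table_format(hdf5_file)
```

A "biom-json" result points toward import_biom, and "legacy-text" toward import_qiime. What to do with an HDF5 file is not covered in this thread, so that case is only detected, not resolved.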

Question 2

Do you mean the internal nodes, or the tree tips? There is essentially no processing by phyloseq of the internal nodes of the tree. Whatever is in your tree file will be stored in the tree-nodes labels. These might have some formatting constraint set by the ape package in read.tree. I'm not sure. I do know that exotic complicated node labels might be truncated or have special symbols removed, but I don't have a MWE (see below) to demonstrate that. As far as phyloseq is concerned, the taxonomy is solely contained in the taxonomy table, which is returned by tax_table(). If that table looks valid, then you are fine in any taxonomy-related tasks in phyloseq. The labels on the tree nodes are kept in case they are useful in tree plotting, but they are considered just labels and no rules are enforced. For instance, sometimes node labels are bootstrap values rather than taxonomy. There are multiple node label plotting styles supported in phyloseq, and you should see the documentation about these in ?plot_tree, and, for instance, ?nodeplotboot and related.

I can't diagnose your problem further without very specific details. I'm not sure what you mean by "assign taxonomy correctly". It appears from your own example that the taxonomy table is present and valid, so I think the taxonomy is fine. If you have a tree-plotting or node-plot formatting issue, please provide more detail. See the following about producing a Minimal Working Example, which explains a nice guideline for posting more easily-diagnosable and reproducible problems in future posts.

http://jaredknowles.com/journal/2013/5/27/writing-a-minimal-working-example-mwe-in-r

Thanks for giving phyloseq a try!

joey

@joey711
Owner

joey711 commented Mar 2, 2014

p.s. Since you don't seem to be stuck (your data is imported), I will close this issue for now. Please respond with a clarified version of your issue that includes each step leading up to the place where you are stuck, and what the output is. Ideally your problem can be reproduced using an included example dataset, like ?GlobalPatterns, and if not, try to identify how your data differs.

Thanks again,

joey

@joey711 joey711 closed this as completed Mar 2, 2014
@joey711
Owner

joey711 commented Mar 2, 2014

I almost forgot that you might want to try:

tree_file = "rep_set_tree.tre"
read_tree_greengenes(tree_file)

especially if you are using/appending a GreenGenes-formatted tree.

@anmwinter
Author

Joey,

Thanks for the comments and pointers. All the examples included in phyloseq worked fine for me. Let me dig through your comments and try again.

ara

@michberr
Contributor

michberr commented Mar 3, 2014

Hi Ara,

I struggled with some of the same issues as you, so maybe I can help.

  1. Definitely use import_biom() rather than import_qiime(), since you're working with a biom-formatted table. Joey, the names of these import functions are a little confusing for new users because QIIME exclusively uses the biom format now. I tried using import_qiime() several times with my data before I realized it wouldn't work. Maybe you can add some documentation to the main demo that says to use the import_qiime() function only if your OTUs are in a txt file.
  2. The reason you're not getting taxonomy for some of your OTUs is that they weren't assigned in QIIME. This is not a phyloseq issue. Depending on the method you use for picking OTUs and then assigning taxonomy, you may have some OTUs that remain unassigned because QIIME couldn't find anything in the database that matched them. In fact, when you use the import_biom() function, phyloseq should spit out a bunch of warnings that say:
In parseFunction(i$metadata$taxonomy) :
  No greengenes prefixes were found. 
Consider using parse_taxonomy_default() instead if true for all OTUs. 
Dummy ranks may be included among taxonomic ranks now.

However, one unresolved issue is that when you import an OTU table which has some OTUs with unassigned taxonomy, phyloseq adds in an extra column called "Rank1" to your tax_table. This column is entirely empty for me at least, so I simply removed it with the line below.

#Get rid of weird 8th taxonomic rank that resulted from warnings above
tax_table(biomdata) <- tax_table(biomdata)[, -8]
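Since the position of the empty column can differ between datasets, a slightly safer variant is to drop rank columns by content instead of by index. The sketch below uses a toy character matrix with invented values, standing in for the taxonomy table, so it runs without phyloseq. The name is_empty_rank is just for illustration.

```r
# Toy taxonomy matrix standing in for a real taxonomy table;
# the values are invented for illustration.
tax <- matrix(
  c("Bacteria", "Firmicutes",     NA,
    "Bacteria", "Proteobacteria", NA,
    "Bacteria", NA,               NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), c("Kingdom", "Phylum", "Rank1"))
)

# TRUE for columns in which every entry is NA or an empty string.
is_empty_rank <- apply(tax, 2, function(x) all(is.na(x) | x == ""))

# Keep only the ranks that carry at least one assignment.
tax_clean <- tax[, !is_empty_rank, drop = FALSE]
colnames(tax_clean)
```

With a real object, the same logical vector could presumably be computed from as(tax_table(biomdata), "matrix") and used in place of the hard-coded -8. I have only checked the logic on the toy matrix above.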

@anmwinter
Author

I am getting all kinds of other weird issues cropping up now that I changed over from import_qiime. I'll start a new ticket.

@joey711
Owner

joey711 commented Mar 3, 2014

Thanks @michberr for clarifying and giving pointers.

I will re-open this issue as a documentation problem. The import_qiime function should now mention that it is only provided to support legacy QIIME tables, and should not be used on the output from recent versions of QIIME.

joey

@joey711 joey711 reopened this Mar 3, 2014
@joey711
Owner

joey711 commented Mar 3, 2014

I am updating the documentation for import_qiime now to avoid this confusion in the future...

@balaTHLkuopio

Hi Joey,

I am getting the warnings below when I try the Greengenes tree. Can I just ignore them and proceed with the analysis, or should I take them seriously because they might affect the results?

tree_file = "97_otus.tree"
read_tree_greengenes(tree_file)

Phylogenetic tree with 99322 tips and 99321 internal nodes.

Tip labels:
4479984, 698544, 564724, 4465919, 3618043, 823988, ...
Node labels:
, , 'k__Archaea', , '0.961, '0.972, ...

Rooted; includes branch lengths.
There were 13 warnings (use warnings() to see them)

warnings()
Warning messages:
1: In go.down() : NAs introduced by coercion
2: In go.down() : NAs introduced by coercion
3: In go.down() : NAs introduced by coercion
4: In go.down() : NAs introduced by coercion
5: In go.down() : NAs introduced by coercion
6: In go.down() : NAs introduced by coercion
7: In go.down() : NAs introduced by coercion
8: In go.down() : NAs introduced by coercion
9: In go.down() : NAs introduced by coercion
10: In go.down() : NAs introduced by coercion
11: In go.down() : NAs introduced by coercion
12: In go.down() : NAs introduced by coercion
13: In go.down() : NAs introduced by coercion

Best Regards,
Bala

@balaTHLkuopio

Hi Joey,

Actually, I tried it anyway despite the warnings, and paid the price with this error:

otufile <- system.file("extdata", "otu_table_with_taxa_closed_PASTURE.txt", package="phyloseq")

mapfile <- system.file("extdata", "mapping_file_twogroups.txt", package="phyloseq")
tree_file = "97_otus.tree"
trefile <- read_tree_greengenes(tree_file)
There were 13 warnings (use warnings() to see them)
test <- import_qiime(otufile, mapfile, trefile)
Processing map file...
Processing otu/tax file...
Reading file into memory prior to parsing...
Detecting first header line...
Header is on line 2
Converting input file to a table...
Defining OTU table...
Parsing taxonomy table...
Processing phylogenetic tree...
Error in cat(list(...), file, sep, fill, labels, append) :
argument 2 (type 'list') cannot be handled by 'cat'

I am using phyloseq_1.8.1 package and R 3.1.0. Please let me know if you want more details.

Do you think this error is caused by those warnings? It would be nice if you could help me solve this issue. Thanks in advance.

Best Regards,
Bala

@joey711
Owner

joey711 commented Jun 2, 2014

Bala,

Please indicate what version of QIIME you used, and what the file format is. For example, is it the legacy QIIME table format, or BIOM format? If it is the latter, then you should be using import_biom.

Also, 100,000 OTUs sounds a bit high... but that's a different topic...

@balaTHLkuopio

Hi Joey

I used the legacy QIIME table format, with QIIME 1.8. I modified the table to match the example data in the phyloseq extdata folder. Let me know if you need further details.

@balaTHLkuopio

Hi Joey, can I open a separate issue for this, or should I just leave it here?

@jsilve24

I am getting the same error when the following code is run. (Note: I was just following your tutorial found here: http://joey711.github.io/phyloseq-demo/HMP_import_example.html)

library(phyloseq)

otutax = '/Users/Justin/Research/data/_data_derived/HMP16s/otu_table_psn_v35.txt.gz'
map = '/Users/Justin/Research/data/_data_derived/HMP16s/v35_map_uniquebyPSN.txt.bz2'

tree = read_tree_greengenes('/Users/Justin/Research/data/_data_derived/HMP16s/rep_set_v13.tre.gz')
tree$tip.label <- gsub("'", "", tree$tip.label, fixed = TRUE)

HMPv35 <- import_qiime(otutax,map,tree)

I get the following error after the call to import_qiime

Processing map file...
Processing otu/tax file...
Reading file into memory prior to parsing...
Detecting first header line...
Header is on line 2  
Converting input file to a table...
Read 45383 rows and 4790 (of 4790) columns from 0.409 GB file in 00:00:07
Defining OTU table... 
Parsing taxonomy table...
Processing phylogenetic tree...

Error in cat(list(...), file, sep, fill, labels, append) : 
  argument 2 (type 'list') cannot be handled by 'cat'
In addition: There were 50 or more warnings (use warnings() to see the first 50)
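A note on the error text itself: base R's cat() raises exactly this message when one of its arguments is a list containing something other than length-one atomic values. The sketch below reproduces it without phyloseq or ape, using a made-up list (fake_tree) that loosely mimics a parsed tree object. Whether this is what happens inside import_qiime when an already-parsed tree is passed in place of a file path is my assumption; I have not confirmed it against the phyloseq source.

```r
# A list holding a matrix, loosely mimicking the structure of a parsed tree
# object (an assumption made for illustration; no phyloseq or ape needed).
fake_tree <- list(edge = matrix(1:4, nrow = 2), tip.label = c("a", "b"))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    cat("Processing phylogenetic tree...\n", fake_tree, "...\n")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# A character file path is an atomic vector, so cat() prints it without complaint.
cat("Processing phylogenetic tree...\n", "rep_set_tree.tre", "...\n")
```

If that assumption holds, passing the tree file name to import_qiime (rather than the object returned by read_tree_greengenes) might avoid the error, at the cost of skipping the Greengenes-specific parsing.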

@spholmes
Contributor

spholmes commented Mar 2, 2016

Justin,
What people usually do, if they can't get a more recent format, is use biom convert to turn their old QIIME files into the newer biom format and then import them; see #480.

I would separate the steps into 3 separate imports to pinpoint the error. Some of our tutorials were written for earlier versions of both QIIME and phyloseq, so some things have changed.

It might be worth trying a small test example and providing us with a minimum working example so we can try to see where it is going wrong.

Best
Susan
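The "3 separate imports" idea could look something like the sketch below. run_step() is a hypothetical helper, not a phyloseq function: it evaluates one step and records whether it failed, so one bad component does not hide the state of the others. The phyloseq part is guarded and uses the example file names as I recall them from the import_qiime help page, so treat those names as an assumption.

```r
# Hypothetical helper: evaluate one import step and report whether it failed,
# instead of letting the first error abort the whole script.
run_step <- function(label, expr) {
  tryCatch(
    # 'expr' is evaluated lazily, so any error it raises is caught here.
    list(step = label, ok = TRUE, value = expr, error = NA_character_),
    error = function(e) {
      list(step = label, ok = FALSE, value = NULL, error = conditionMessage(e))
    }
  )
}

# Stand-alone demonstration that needs no packages.
demo <- list(
  run_step("works", sqrt(16)),
  run_step("fails", stop("tree could not be parsed"))
)

# The same pattern applied to the three components of a legacy QIIME import.
if (requireNamespace("phyloseq", quietly = TRUE)) {
  library("phyloseq")
  # Example files assumed to ship with phyloseq (names from memory).
  otufile <- system.file("extdata", "GP_otu_table_rand_short.txt.gz", package = "phyloseq")
  mapfile <- system.file("extdata", "master_map.txt", package = "phyloseq")
  trefile <- system.file("extdata", "GP_tree_rand_short.newick.gz", package = "phyloseq")

  steps <- list(
    run_step("otu/tax table", import_qiime(otufilename = otufile)),
    run_step("sample map",    import_qiime_sample_data(mapfile)),
    run_step("tree",          read_tree(trefile))
  )
  for (s in steps) {
    cat(s$step, ":", if (s$ok) "ok" else paste("FAILED:", s$error), "\n")
  }
}
```

Whichever step reports FAILED is the one to reduce to a minimum working example.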

On Mon, Feb 29, 2016 at 12:32 PM, Justin Silverman <notifications@github.com> wrote:

I am getting the same error when the following code is run: (Note I was just following your tutorial found here: http://joey711.github.io/phyloseq-demo/HMP_import_example.html)


Susan Holmes
Professor, Statistics and BioX
John Henry Fellow in Undergraduate Education
Sequoia Hall,
390 Serra Mall
Stanford, CA 94305
http://www-stat.stanford.edu/~susan/

@apoosakkannu

Hi, I created a biom file using QIIME and imported it into phyloseq using the following command:

OTU_TAXA<-import_biom("D:/Czech/Triatomine2019/microbiome_data_20182019/data/otutable97_taxa_20182019.biom")

The import was successful, but I found 20 taxonomic ranks instead of 7 in my phyloseq object. Some of my OTUs have blank taxonomy; could that be the problem? If so, please let me know how to overcome it.

Thank you. I look forward to hearing from you.
