Importing Qiime2 biom file #821

Open
sbudree opened this issue Sep 16, 2017 · 21 comments

@sbudree

sbudree commented Sep 16, 2017

Hi,

I have created a feature table using Qiime2 and exported it as a biom file. However, this biom file cannot be imported into phyloseq. The import fails with:

Error in `colnames<-`(`*tmp*`, value = c("ta1", "ta0")) :
length of 'dimnames' [2] not equal to array extent
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Can you advise me on how to import the Qiime2 biom file into phyloseq?

@joey711
Owner

joey711 commented Sep 16, 2017

Can you share the file?

@sbudree
Author

sbudree commented Sep 17, 2017 via email

@joey711
Owner

joey711 commented Sep 17, 2017

Actually, can you post a link? Although you've responded via email, this is actually still on the phyloseq issues tracker:

#821

so email attachments don't work. Any file-hosting site will do. Some (like dropbox) make it easy to stop sharing the file after a certain amount of time, or when you say so.

@sbudree
Author

sbudree commented Sep 17, 2017

@ViridianaAvila

Hi,
Today I had the same error. I realized that, for some reason, biom files generated by QIIME2 do not include the taxonomy. I had to extract the taxonomy and the tree from the qza files, merge the taxonomy into a TSV version of the biom table, and then convert the result back to a new biom file. That solved my issue; I hope this information is useful. Here is what I did:

  1. Uncompress the qza files (table, tree and taxonomy). unzip will do.

  2. Enter the folder of the uncompressed table; you will find a feature-table.biom file.

  3. To make feature-table.biom easy to manipulate, convert it to .txt using:
    biom convert -i feature-table.biom -o otu_table.txt --to-tsv

  4. Open this table in Excel, R, or any other tool that can merge the taxonomy information into it by OTU ID, using the taxonomy.tsv file. (This taxonomy.tsv file comes from decompressing taxonomy.qza.)

  5. Once the taxonomy info is merged as a column in otu_table.txt, convert the file back to biom using the following command:
    biom convert -i otu_table.txt -o new_otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy

  6. Load it in R:
    biom_otu <- import_biom(BIOMfilename = "new_otu_table.biom", treefilename = "tree.nwk")
    (The tree file comes from decompressing unrooted-tree.qza or rooted-tree.qza.)
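
For readers who prefer to script the merge in step 4 rather than do it by hand, here is a minimal sketch using awk. It runs on toy files whose contents are made up for illustration. It assumes the two-line header layout that `biom convert --to-tsv` writes and the three-column layout of an exported taxonomy.tsv. Labelling features that have no taxonomy entry as "Unassigned" is a choice made here, not something from the thread.

```shell
# Toy stand-ins for the real files (contents are hypothetical).
workdir=$(mktemp -d)

# otu_table.txt in the layout written by `biom convert --to-tsv`.
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\n' > "$workdir/otu_table.txt"
printf 'f1\t10.0\t0.0\nf2\t3.0\t7.0\nf4\t1.0\t1.0\n' >> "$workdir/otu_table.txt"

# taxonomy.tsv in the layout exported from taxonomy.qza.
printf 'Feature ID\tTaxon\tConfidence\n' > "$workdir/taxonomy.tsv"
printf 'f1\tk__Bacteria; p__Firmicutes\t0.99\n' >> "$workdir/taxonomy.tsv"
printf 'f2\tk__Bacteria; p__Proteobacteria\t0.87\n' >> "$workdir/taxonomy.tsv"
printf 'f3\tk__Archaea\t0.91\n' >> "$workdir/taxonomy.tsv"

# First file: remember each Taxon string by feature ID (skipping the header).
# Second file: append a "taxonomy" column to the header and to every data row.
awk -F '\t' -v OFS='\t' '
  NR == FNR  { if (FNR > 1) tax[$1] = $2; next }
  /^# /      { print; next }
  /^#OTU ID/ { print $0, "taxonomy"; next }
  { print $0, (($1 in tax) ? tax[$1] : "Unassigned") }
' "$workdir/taxonomy.tsv" "$workdir/otu_table.txt" > "$workdir/otu_table_with_tax.txt"

cat "$workdir/otu_table_with_tax.txt"
```

The taxonomy row for f3 is dropped because f3 is not in the table, and f4 gets the placeholder label. The resulting file is what step 5 would take as input.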

@ksdiaz

ksdiaz commented Sep 29, 2017

Hello,

I've been having a similar issue as well. I can't use the biom convert workaround because biom throws an error at Step 5 of ViridianaAvila's instructions ("TypeError: can only join an iterable"). I suspect this has to do with the qiime format itself, as trying to validate the original biom file from the qiime qza reports it as an invalid file. So, I tried adding the taxonomy as a new column in the converted text file. Trying to load this into phyloseq like this:

otufile <- "feature-table-taxonomy.txt"
mapfile <- "phylGastrotricha_mapping.tsv"
treefile <- "phylGastro_tree.nwk"
qiimetable <- import_qiime(otufile, mapfile, treefile, parseFunction = parse_taxonomy_qiime)

gives me the following error:

Processing map file...
Processing otu/tax file...
Reading file into memory prior to parsing...
Detecting first header line...
Header is on line 2  
Converting input file to a table...
Defining OTU table... 
Adding new column 'Consensus Lineage' then assigning NULL (deleting it).Adding new column
 '#OTU ID' then assigning NULL (deleting it).Parsing taxonomy table...
Error in taxlist[[i]] : subscript out of bounds

Traceback in Rstudio shows me this:

Error in taxlist[[i]] : subscript out of bounds 
3. build_tax_table(taxlist) 
2. import_qiime_otu_tax(otufilename, parseFunction, verbose = verbose) 
1. import_qiime(otufile, mapfile, treefile, parseFunction = parse_taxonomy_qiime)

I've attached the table I was trying to import.

feature-table-taxonomy.txt

@jme6f4

jme6f4 commented Oct 4, 2017

Hi, Just came across this and had the same problem. I'm able to run the "moving pictures" tutorial fine (otu_table_mc2_w_tax_no_pynast_failures.biom), but not my own data. The only difference that I can spot is that my .biom file was produced by QIIME2.

@jackmen

jackmen commented Oct 17, 2017

Hi guys,

the approach of @ViridianaAvila works and is the one I have often used in the past. Be aware that you have to give a header to the taxonomy column in your otu_table.txt, and this header must match the name passed to "--process-obs-metadata". @ksdiaz : Your taxonomy column has no header. Name the taxonomy column in otu_table.txt "taxonomy" when using this command:

biom convert -i otu_table.txt -o new_otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy

@jme6f4 : You might want to check whether you made the same mistake.

Then in phyloseq simply do:

biom <- import_biom("new_otu_table.biom", parseFunction = parse_taxonomy_greengenes)
map <- import_qiime_sample_data("mapping_file.txt")
tree <- read_tree_greengenes("tree.nwk")
ps <- merge_phyloseq(biom, map, tree)

Also: the qiime2 taxonomy column header in taxonomy.tsv is "Taxon", and if that column is simply pasted into otu_table.txt, it is not recognized by the command above. I guess changing the option to "--process-obs-metadata Taxon" should also work fine.
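
If the merged table ended up with a "Taxon" header, the rename described above can be scripted. This is a small sketch on a toy file with made-up contents. It assumes the taxonomy column is the last one and that the header line starts with "#OTU ID".

```shell
workdir=$(mktemp -d)
printf '#OTU ID\tS1\tS2\tTaxon\n' > "$workdir/otu_table.txt"
printf 'f1\t10.0\t0.0\tk__Bacteria; p__Firmicutes\n' >> "$workdir/otu_table.txt"

# Rename the last header field to "taxonomy" only when it is "Taxon";
# data rows pass through unchanged.
awk -F '\t' -v OFS='\t' '
  /^#OTU ID/ && $NF == "Taxon" { $NF = "taxonomy" }
  { print }
' "$workdir/otu_table.txt" > "$workdir/otu_table_fixed.txt"

head -n 1 "$workdir/otu_table_fixed.txt"
```

Writing to a new file keeps the original table untouched in case the rename has to be redone.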

Good luck!

@HRRTPH

HRRTPH commented Nov 2, 2017

Hi everyone,
I am trying to import Qiime2 output files into Phyloseq. My commands are here:
otufile = system.file("extdata", "feature-table.biom", package="phyloseq")
mapfile = system.file("extdata", "G3_metadata.txt", package="phyloseq")
trefile = system.file("extdata", "GP_tree_rand_short.newick.gz", package="phyloseq")
rs_file = system.file("extdata", "dna-sequences.fasta", package="phyloseq")
qiimedata = import_qiime(otufile, mapfile, trefile, rs_file)

I used qiime tools export to get those files, except the mapfile, whose extension I changed from tsv to txt. After running the import_qiime command above, I got an error:

Processing map file...
Error in read.table(file = mapfilename, header = TRUE, sep = "\t", comment.char = "") :
no lines available in input
In addition: Warning message:
In file(file, "rt") : file("") only supports open = "w+" and open = "w+b": using the former

Does anyone know this error?

Please help!!!

Thank you very much,

Toan
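
A note on the error above: R's system.file() looks only inside the installed package named in its package argument (here phyloseq) and returns an empty string when the file is not found there. That is consistent with the file("") warning. Files exported from QIIME2 normally sit in your own working directory, so passing plain paths (for example mapfile = "G3_metadata.txt") is the likely fix. This is a reading of the error message and was not confirmed by the poster. As a quick sanity check before starting R, a sketch like the following reports inputs that are missing or empty. The helper name check_inputs and the toy files are hypothetical.

```shell
# Print one line for every path that does not exist or has zero size.
check_inputs() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
    fi
  done
}

workdir=$(mktemp -d)
printf '#SampleID\tTreatment\nS1\tcontrol\n' > "$workdir/G3_metadata.txt"
: > "$workdir/feature-table.biom"   # empty placeholder file

report=$(check_inputs "$workdir/G3_metadata.txt" \
                      "$workdir/feature-table.biom" \
                      "$workdir/tree.nwk")
echo "$report"
```

In this toy run the metadata file passes, while the empty biom file and the absent tree file are both reported.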

@Biancabrown

Hello,
I'm currently having the same issue. When I try to convert the txt file that I generated with the taxonomy header to a biom file, I get the following error message: "ValueError: could not convert string to float: NA". Any ideas?
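
The file was not posted, so the cause is a guess: the message suggests that at least one count cell contains the text NA, which can happen when a merge leaves gaps. A minimal sketch for locating such cells is shown below on a toy table with made-up contents. It assumes the first column is the feature ID and the last column is the taxonomy string.

```shell
workdir=$(mktemp -d)
table="$workdir/otu_table_with_tax.txt"
printf '# Constructed from biom file\n' > "$table"
printf '#OTU ID\tS1\tS2\ttaxonomy\n' >> "$table"
printf 'f1\t10.0\t0.0\tk__Bacteria\n' >> "$table"
printf 'f2\t3.0\tNA\tk__Archaea\n' >> "$table"

# Skip header lines, then report every count cell that is not a plain number.
bad_cells=$(awk -F '\t' '
  /^#/ { next }
  {
    for (i = 2; i < NF; i++)
      if ($i !~ /^[0-9]+(\.[0-9]+)?$/)
        printf "line %d, column %d: %s\n", NR, i, $i
  }
' "$table")

echo "$bad_cells"
```

Whether a flagged cell should become 0 or the whole row should be dropped depends on how the gap arose, so the sketch only reports locations.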

@Nourhanelsahly

Nourhanelsahly commented Jan 7, 2018

hello,
I have the same problem (importing biom to phyloseq). I followed the steps in the comments from @ViridianaAvila and @jackmen, and it went fine until I converted the merged file (otu and taxonomy) to biom format again. The new biom file didn't contain the taxonomy column (I converted it to txt to check). So, it's not imported to phyloseq.

Do you have any clue please?
I am attaching the merged txt file

output.txt

@Ajsnevets

Hi, not sure if you have sorted this out, but I was having similar problems. This may not be your issue, but it was one of mine. In your output.txt file, each cell is wrapped in quotation marks, and the convert function can't recognise these. Instead of writing a .txt file in R, I wrote it as .csv and used sep="\t" to get rid of the quotation marks. I then opened it in Excel, checked that the headers were aligned, and saved it as a .txt file. Next I opened it in Notepad and pasted #Constructed from BIOM file# at the start of the txt file, since the software inserted this after the first convert and I figured it might be important. At the moment you have "#OTU ID", and when you remove the quotation marks, the # will probably stop it reading anything after it. Mine starts with this: #Constructed from BIOM file# OTU ID Sample1 Sample2 and so on. Once I had this format sorted, everything else worked fine. Good luck.
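
The quotation marks can also be stripped without a round trip through Excel. The sketch below works on a toy file with made-up contents and assumes that no double quote in the table is meaningful, since it deletes all of them. When the file is written from R, write.table(..., quote = FALSE) avoids the quotes in the first place.

```shell
workdir=$(mktemp -d)
printf '"#OTU ID"\t"S1"\t"taxonomy"\n' > "$workdir/output.txt"
printf '"f1"\t10\t"k__Bacteria; p__Firmicutes"\n' >> "$workdir/output.txt"

# Delete every double-quote character; tabs and all other text are kept.
tr -d '"' < "$workdir/output.txt" > "$workdir/output_noquotes.txt"

if grep -q '"' "$workdir/output_noquotes.txt"; then
  echo "quotes remain"
else
  echo "no quotes left"
fi
```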

@brookeweigel

Hello! I am also having problems getting my QIIME2 data into phyloseq. So far, I tried all of the above steps, including making a new .biom file with OTU abundances + taxonomy. I had the same problem as @Nourhanelsahly, even after following the above changes from @Ajsnevets by changing the header to match the OTUID for each file. Please help! I would love to use phyloseq, and since QIIME2 is now widely used, I wish that it was a lot easier to transition from QIIME2 output to phyloseq. I really want to figure this out!

See .txt file below:
table.from_biom.txt

See code:
biom add-metadata -i core-metrics-results/rarified-feature-table/feature-table.biom -o table-with-taxonomy.biom --observation-metadata-fp core-metrics-results/rarified-feature-table/taxonomy.tsv --sc-separated taxonomy

Apart from the issue of getting a .biom file with both taxonomy and OTU abundances, I am having trouble importing my data into phyloseq. See below.
otufile = system.file("extdata", "table-with-taxonomy.biom", package="phyloseq")
mapfile = system.file("extdata", "Sea_Cucumber_metadata.txt", package="phyloseq")
trefile = system.file("extdata", "tree.nwk", package="phyloseq")
qiimedata = import_qiime(otufile, mapfile, trefile)

Here is my mapping file:
Sea_Cucumber_metadata.txt

I get this error:

Processing map file...
Error in read.table(file = mapfilename, header = TRUE, sep = "\t", comment.char = "") :
no lines available in input
In addition: Warning message:
In file(file, "rt") :
file("") only supports open = "w+" and open = "w+b": using the former

@Ajsnevets

Ajsnevets commented Mar 8, 2018

There is no taxonomy data in this table, so I guess you didn't manage to merge them?
The first line of table.from_biom.txt reads #[space]constructed from..[space]#[no space]OTU ID. This is one of several problems, but the current placement of your [spaces] is going to stop it from being read correctly. It has to be: #[no space]Constructed from BIOM file[no space]#[space]OTU ID Sample1 Sample2

Additionally, the column names in your "table.from_biom.txt" file are not lined up with the columns. Not sure how this got messed up.

(Side note) Your tables seem to have been rarefied. I am no expert, but my understanding is that DESeq2 allows you to look at the data without rarefying it, as is done with qiime2.

I would go back to the original table.qza file, unzip it, go through the folders until you find the .biom file, and then convert it with:
biom convert -i feature-table.biom -o otu_table.txt --to-tsv
Open the table in R, where you can merge the taxonomy information by OTU ID using the taxonomy.tsv file. (This taxonomy.tsv file comes from the decompression of taxonomy.qza.) If you cannot open it in R (read.table(file = 'otu.tsv', sep = '\t', header = TRUE)), then it is probably because of the #constructed from...# problem, so open the file in Notepad, fix it, and then open it in R.

@brookeweigel

Hi @Ajsnevets and everyone else trying to get QIIME2 data into phyloseq... after some exporting and merging in R (to get around the fact that after filtering out chloroplasts and mitochondria, my taxonomy files and OTU matrix have a different # of taxa), I was finally able to wrangle my QIIME2 data into Phyloseq without using any .biom files. Here is a pipeline that I wrote (see PDF below). It is pretty clunky and uses QIIME2 + R + excel, but at least it works!

It is also partially based on an answer at https://forum.qiime2.org/t/converting-biom-files-with-taxonomic-info-for-import-in-r-with-phyloseq/2542/5 that doesn't use any .biom files.

QIIME2_to_Phyloseq.pdf

@laylaeb

laylaeb commented Mar 14, 2018

@brookeweigel
MANY thanks...you made my day! It's working perfectly :)

@kspeeriful

As an update to the helpful instructions posted by @brookeweigel, you can shorten these steps by formatting the taxonomy file in R using the commands below:

#separate() comes from the tidyr package
library(tidyr)

#Read in the .tsv version of the feature table, which should now have a column header "OTUID", not "#OTU ID"
features <- read.table(file="feature-table.txt", header=TRUE)
head(features)

#Read in the .tsv version of the taxonomy table, which should also have a column header "OTUID", not "Feature ID"
tax <- read.table(file="taxonomy.tsv", sep='\t', header=TRUE)
head(tax)

#Keep only the taxonomy rows whose OTUID is also present in features
#The remaining table should have the same number of rows as the features data frame
tax_filtered <- tax[tax$OTUID %in% features$OTUID,]
head(tax_filtered)

#Separate the "Taxon" column in the tax_filtered data frame by semicolon so that each step of the taxonomy (e.g., kingdom, phylum, class, etc.) is its own column
tax_filtered <- separate(tax_filtered, Taxon, c("Kingdom","Phylum","Class","Order", "Family", "Genus","Species"), sep= ";", remove=TRUE)

#Write one outfile containing the OTUID and taxonomic info
write.csv(tax_filtered, file="taxonomy_phyloseq.csv")

@DeniRibicic

DeniRibicic commented Jan 18, 2019

Hi guys,

I am having a problem while importing QIIME2 biom file into phyloseq.

So, I have my own bash script that runs the QIIME2 analysis from raw reads through taxonomy assignment and exports the files of interest. Lastly, the taxonomy is added to the biom file after properly changing the header of the taxonomy.tsv file:

biom add-metadata -i exported/feature-table.biom -o exported/table-with-taxonomy.biom --observation-metadata-fp exported/taxonomy.tsv --sc-separated taxonomy

To make things short, I've been running hundreds and hundreds of samples from different projects this way pain-free. But only this one particular run gives me this "itch" with its biom file.
Getting the following error while trying to import_biom:

Error in read_biom(biom_file = BIOMfilename) : 
  Both attempts to read input file:
exported/table-with-taxonomy.biom
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.
In addition: Warning message:
In strsplit(msg, "\n") : input string 1 is invalid in this locale

I have double-, triple-, quadruple-checked my path and file names, so that's not the issue...

The additional warning message usually occurs when I am importing biom files, but it doesn't really affect my phyloseq workflow.

So, for some reason, it seems that the biom file itself could be corrupt. Does anyone have an idea how to inspect the biom file to check what might be wrong with it?
If someone wants to give it a try as well, I am uploading it to Google Drive:

https://drive.google.com/open?id=1n828eQmPWjfPpsURh1MH9KRMT0pi2rZl

There are 4 files; 1) original non-exported biom file; 3_table.qza, 2) exported biom file; feature-table.biom, 3) biom file with added taxonomy; table-with-taxonomy.biom and 4) taxonomy.tsv file

ps. when I continue using the .qza file in downstream QIIME2 analysis, it runs without a problem.

Any help would be appreciated,
Deni
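
One low-level check that needs no biom tooling is to look at the first bytes of the file. An HDF5 file (BIOM v2) normally begins with the 8-byte signature 89 48 44 46 0d 0a 1a 0a, and a JSON file (BIOM v1) normally begins with "{". The sketch below uses a hypothetical helper name, biom_kind, and toy files. It assumes there is no leading whitespace in the JSON case and no HDF5 user block before the signature, and it says nothing about whether the rest of the file is intact.

```shell
# Classify a file by its first bytes: "hdf5", "json", or "unknown".
biom_kind() {
  # Hex of the first 8 bytes, with od's spacing and newline removed.
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$sig" in
    894844460d0a1a0a) echo "hdf5" ;;
    7b*)              echo "json" ;;
    *)                echo "unknown" ;;
  esac
}

workdir=$(mktemp -d)
printf '\211HDF\r\n\032\n' > "$workdir/toy_v2.biom"     # HDF5 signature only
printf '{"id": null}\n' > "$workdir/toy_v1.biom"        # starts like JSON
printf 'not a biom file\n' > "$workdir/toy_bad.biom"

for f in toy_v2.biom toy_v1.biom toy_bad.biom; do
  echo "$f: $(biom_kind "$workdir/$f")"
done
```

An "unknown" result for a file that should be BIOM would point to a truncated or overwritten file. A matching signature only rules out that one failure mode.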

@andreanuzzo

Hi everyone,

I am not sure whether this issue has been solved yet. My current way of moving from Qiime2 to Phyloseq still hasn't betrayed me as of version 2019.4, so I am sharing it in case somebody finds it useful.

In command line I do as follows:

qiime tools export \
  table.qza \
  --output-dir biom

qiime tools export \
  taxonomy.qza \
  --output-dir biom

qiime tools export \
  rooted-tree-filtered.qza \
  --output-dir biom

#This is necessary because I have biom in python2
source deactivate
source activate qiime1-1.9.1

cd biom

biom convert \
  -i feature-table.biom \
  -o feature-json.biom \
  --table-type="OTU table" \
  --to-json

#This step is necessary for adding the metadata to the biom file
#(both header edits in one sed call; piping one "sed -i" into another is not needed)
sed -i -e 's/Taxon/taxonomy/' -e 's/Feature ID/FeatureID/' taxonomy.tsv

biom add-metadata \
  -i feature-json.biom \
  -o feature_w_tax.biom \
  --observation-metadata-fp taxonomy.tsv \
  --observation-header FeatureID,taxonomy,Confidence \
  --sc-separated taxonomy --float-fields Confidence
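
The header rename in this workflow can be tried on a toy file before touching real data. The sketch below uses made-up contents. It restricts both substitutions to line 1 and writes to a new file instead of editing in place. Both are choices made here for safety and portability, not part of the original commands.

```shell
workdir=$(mktemp -d)
printf 'Feature ID\tTaxon\tConfidence\n' > "$workdir/taxonomy.tsv"
printf 'f1\tk__Bacteria; p__Firmicutes\t0.99\n' >> "$workdir/taxonomy.tsv"

# Rename the two header fields on line 1 only; data rows are left as they are.
sed -e '1s/Feature ID/FeatureID/' -e '1s/Taxon/taxonomy/' \
  "$workdir/taxonomy.tsv" > "$workdir/taxonomy_renamed.tsv"

head -n 1 "$workdir/taxonomy_renamed.tsv"
```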

Once this is done, then the phyloseq object is being built by:

library(phyloseq)
library(tidyverse)

biom_path <- file.path('biom/feature_w_tax.biom')
tree_path <- file.path('biom/tree.nwk')
map_path <- file.path('mapping.txt')
tree <- read_tree(tree_path)

table <- import_biom(BIOMfilename = biom_path,
                      parseFunction = parse_taxonomy_default,   #I use SILVA, so I rename the taxtable afterwards
                      parallel = TRUE)
sample_map <- import_qiime_sample_data(map_path)

phylobj_full <- merge_phyloseq(table, sample_map, tree)

TBH I am planning to abandon Qiime2 and follow the Bioconductor workflow as soon as possible, but I hope this helps anybody who needs it!

@Nicheca

Nicheca commented Jul 31, 2019

Hello @brookeweigel !
Many thanks for sharing your code! I am struggling with sample_names(TAX), which returns NULL, whereas sample_names(OTU) includes my sample names. I do not know where my mistake could come from.
Any idea?
Thanks in advance!

@spholmes
Contributor

spholmes commented Jul 31, 2019 via email
