How to get a taxonomy table from taxizedb? #64

GossypiumH · 2023-02-27T15:50:15Z

Hello,

In January I ran into a problem with the taxize API because of the number of bacterial taxa (10k+) for which I want to retrieve the taxonomy (I posted about it here: ropensci/taxize#907).

People advised me to use taxizedb, since it works offline and should fix my problem. However, when I try a simple command such as:

test = classification(name2taxid(c(taxa$specie_ID)))

taxa is a data frame with a single column named specie_ID, as follows:

> head(taxa$specie_ID)
[1] "Staphylococcus sp." "Acinetobacter sp." "Cutibacterium sp." "Sphingomonas sp." "Paenarthrobacter sp."
[6] "Paracoccus sp."

However, I receive an error:

> test = classification(name2taxid(c(taxa$specie_ID)))
Error in name2taxid(c(taxa$specie_ID)) :
  Some of the input names are ambiguous, try setting out_type to 'summary'

When I set out_type to "summary", I get this:

> test = classification(name2taxid(c(taxa$specie_ID), out_type="summary"))
Error in `dplyr::summarize()`:
ℹ In argument: `taxids = paste(.data$tax_id, collapse = "|")`.
ℹ In group 1: `name = "Morganella sp."`.
Caused by error in `.data$tax_id`:
! Column `tax_id` not found in `.data`.
Backtrace:
taxizedb::classification(name2taxid(c(taxa$specie_ID), out_type = "summary"))
rlang:::abort_data_pronoun(x, call = y)

Apparently Morganella sp. is not recognized by taxizedb. I'm not particularly familiar with dplyr or with taxize, so I would just like to know how I could retrieve the taxonomy for each of my bacterial species, preferably as a table with columns like this:

Specie_ID Kingdom Phylum Class Order Family Genus

stitam · 2023-03-01T08:16:33Z

Thanks @GossypiumH for raising this issue.

The issue is caused by taxa that can be linked to multiple taxids:

taxizedb::name2taxid("morganella", out_type = "summary")
#> # A tibble: 3 × 2
#>   name       id    
#>   <chr>      <chr> 
#> 1 morganella 581   
#> 2 morganella 90690 
#> 3 morganella 108061

Created on 2023-03-01 with reprex v2.0.2

A very small change to your approach should solve your issue: Run classification() on the id column of the name2taxid() output, not the whole object (maybe this is what you wanted to do in the first place, so it's just a typo thing?):

test = classification(name2taxid(c("morganella", "escherichia"), out_type = "summary")$id)
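As far as I know, classification() returns a named list with one data frame per taxid (columns name, rank and id), so getting the wide table asked for above still needs a reshaping step. Here is a minimal base R sketch of that step. It uses a hand-written, abbreviated mock of the list so it runs without the local database; the mock lineages, the `ranks` vector and the helper `to_row()` are my own assumptions for illustration, and the rank labels in your NCBI dump may differ (for example "superkingdom" vs. "domain").

```r
# Hand-written mock of what classification() is assumed to return:
# a list named by taxid, each element a data frame with name, rank, id.
cl <- list(
  "581" = data.frame(
    name = c("Bacteria", "Pseudomonadota", "Gammaproteobacteria",
             "Enterobacterales", "Morganellaceae", "Morganella"),
    rank = c("superkingdom", "phylum", "class", "order", "family", "genus"),
    id   = c("2", "1224", "1236", "91347", "1903414", "581"),
    stringsAsFactors = FALSE
  ),
  "561" = data.frame(
    name = c("Bacteria", "Pseudomonadota", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae", "Escherichia"),
    rank = c("superkingdom", "phylum", "class", "order", "family", "genus"),
    id   = c("2", "1224", "1236", "91347", "543", "561"),
    stringsAsFactors = FALSE
  )
)

# Ranks to keep as columns (assumption: adjust to the labels in your database).
ranks <- c("superkingdom", "phylum", "class", "order", "family", "genus")

# Hypothetical helper: turn one lineage into a named vector, one entry per rank.
# Ranks missing from the lineage, or entries that are not data frames, give NA.
to_row <- function(lineage) {
  if (!is.data.frame(lineage)) {
    return(setNames(rep(NA_character_, length(ranks)), ranks))
  }
  setNames(lineage$name[match(ranks, lineage$rank)], ranks)
}

# Stack the rows and add the taxid as the first column.
tax_table <- data.frame(
  taxid = names(cl),
  do.call(rbind, lapply(cl, to_row)),
  stringsAsFactors = FALSE,
  row.names = NULL
)
print(tax_table)
```

With real data, `cl` would be the result of the classification() call above, and the taxid column can be joined back to the original names via the name2taxid() summary.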

However, taxa with multiple taxids will inflate the number of elements in your results, which can cause problems in your downstream analysis. Because of this, I would probably run name2taxid(out_type = "summary") first, resolve the taxa with multiple taxids (investigate them manually, choose one and remove the rest from the tibble), and then run classification() on the data set with distinct taxa. I imagine there shouldn't be many taxa with multiple taxids.
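The resolve step could look roughly like this. It is only a sketch in base R on a mock of the summary output (the morganella rows are copied from the reprex above, the escherichia row is added for illustration), and the `keep` vector is a hypothetical manual choice, not a recommendation; check each candidate taxid yourself before picking one.

```r
# Mock of name2taxid(..., out_type = "summary"): one row per (name, id) pair.
ids <- data.frame(
  name = c("morganella", "morganella", "morganella", "escherichia"),
  id   = c("581", "90690", "108061", "561"),
  stringsAsFactors = FALSE
)

# Names that map to more than one taxid.
n_per_name <- table(ids$name)
ambiguous <- names(n_per_name)[n_per_name > 1]

# Inspect these rows manually before deciding.
print(ids[ids$name %in% ambiguous, ])

# Hypothetical manual choice: one taxid per ambiguous name.
keep <- c(morganella = "581")

# Keep unambiguous rows as they are; for names listed in `keep`,
# keep only the row with the chosen taxid.
is_kept <- !(ids$name %in% names(keep)) | ids$id == keep[ids$name]
resolved <- ids[is_kept, ]

# Every name should now map to exactly one taxid.
stopifnot(!anyDuplicated(resolved$name))
print(resolved)
```

After this, `resolved$id` holds one taxid per name and can be passed to classification().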

Do you think this approach could be feasible?

GossypiumH · 2023-03-01T13:53:33Z

Hello,

Thank you for your reply! I will try your solution; I hope I will not have too many taxa with multiple taxids.

Cheers,
