Add parentnames to output of bold_identify() #36
So you want parentnames included in the output for the
I don't follow this. You want to get parentnames using a taxonomic ID rather than a taxonomic name? But isn't that what
Yes, the convenient thing would be to have parentnames as additional columns in the bold_identify output. In general, I would not expect the taxonomic position of a genus_species to be obvious to me (e.g. what is the class/order/family of Allomerus octospinosus?). Also, with parentname columns, I would be able to sort output tables by higher ranks (e.g. Insecta, Hymenoptera), which is quite useful. The reason I request using taxid rather than taxonomicidentification is that the same genus name is sometimes used in different kingdoms (famously, Anura is both a plant and a frog). Thanks for replying so quickly. I don't seem to be able to set up an email notification for when you have replied on GitHub. I'll look around.
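Once higher ranks sit in their own columns, sorting, filtering and tallying by rank is plain base R. A minimal sketch, assuming a hypothetical hits table with one row per hit and `phylum`/`class`/`order`/`family` columns (these column names are assumptions, not the actual bold_identify output). The two lineages are Paratergatis longimanus, from the parent data.frame posted in this thread, and the ant Allomerus octospinosus:

```r
# Hypothetical hits table: one row per hit, higher ranks as extra columns
# (column names are illustrative, not the actual bold_identify() output)
hits <- data.frame(
  taxonomicidentification = c("Paratergatis longimanus", "Allomerus octospinosus"),
  phylum = c("Arthropoda", "Arthropoda"),
  class  = c("Malacostraca", "Insecta"),
  order  = c("Decapoda", "Hymenoptera"),
  family = c("Xanthidae", "Formicidae"),
  stringsAsFactors = FALSE
)

# Sort by higher ranks: class first, then order within class
sorted <- hits[order(hits$class, hits$order), ]

# Filter: keep only insect hits
insects <- hits[hits$class == "Insecta", ]

# Tally: number of hits per order
order_counts <- table(hits$order)

print(sorted)
print(order_counts)
```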
Not sure we're on the same page here. Is this line of discussion talking about the
For email notifications, perhaps go to this page: https://github.com/settings/notifications
@dougwyu I started a new fxn. Reinstall like see let me know what you think
Fantastic, and thank you! I successfully ran the new command and was initially confused by all the additional rows, but I see what you've done: each ID is effectively its own little data frame. I think it might be more useful for the output to be wider, so that each of the original hits remains one line. Here is an image of what I'm thinking. It keeps most of the newly added information but makes the output easier to filter, sort, and tally. The number of returned taxids may differ per sequence(?), but it seems fine to settle on a fixed set of taxonomic ranks: phylum, class, order, family, subfamily, genus, species.
PS: this is what I ran:
testseq <- list(eb4909 = "GAATAAATAATATAAGATTTTGATTACTCCCTCCTTCTTTATTtttATTAATTTTAAGAAATTTTATTGGAACGGGTGTAGGAACCGGATGAACTTTATATCCTCCTTTATCATCTATTGTTGGACATGATTCACCTTCTGTAGATTTAGGAATTttttCTATCCATATTGCTGGAATTTCCTCAATTATAGGATCAATTAATTTTATTGTTACTATTTTAAATATACacacaAaaaCTCATTCACTAAATTTTCTTCCTTTATTCACATGATCAATTTTAATTACAGCAATTCTTCTTCTGTTATCATTACCAGTTCTTGCAGGAGCAATTACTATACTTCTTACAGATCGAAATCTTAATACATCTTtttttGATCCCGCAGGTGGgggggATCCAATTTTATACCAACACTTATTTT")
I thought about using wide format, but thought it made more sense to give back the results as I did, with repeated rows for each record. The data.frame for parents looks like this:
$`Paratergatis longimanus`
   taxid                   taxon  tax_rank tax_division parentid   parentname     taxonrep
1     20              Arthropoda    phylum      Animals        1         <NA>   Arthropoda
2     69            Malacostraca     class      Animals       20   Arthropoda Malacostraca
3    336                Decapoda     order      Animals       69 Malacostraca     Decapoda
4   1541               Xanthidae    family      Animals      336     Decapoda    Xanthidae
5 305321               Zosiminae subfamily      Animals     1541    Xanthidae         <NA>
6 322442            Paratergatis     genus      Animals   305321    Zosiminae         <NA>
7 503362 Paratergatis longimanus   species      Animals   322442 Paratergatis         <NA>
In your example above you only use two of those columns. When I was thinking of wide format, I thought it would be way too many columns to add if we used all of them, but if it's just two columns, adding them is more palatable.
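The trade-off between the long layout above and the wide one requested earlier can be seen by reshaping a single lineage. A minimal base-R sketch using only the taxid, taxon and tax_rank columns printed above, with the fixed rank set suggested in the thread; `lineage_to_wide()` is a hypothetical helper, not a bold function, and the `<rank>_taxid` column naming is an assumption:

```r
# Parent lineage for one identification, copied from the data.frame above
# (only the columns needed for the reshape)
parents <- data.frame(
  taxid    = c(20, 69, 336, 1541, 305321, 322442, 503362),
  taxon    = c("Arthropoda", "Malacostraca", "Decapoda", "Xanthidae",
               "Zosiminae", "Paratergatis", "Paratergatis longimanus"),
  tax_rank = c("phylum", "class", "order", "family",
               "subfamily", "genus", "species"),
  stringsAsFactors = FALSE
)

# Fixed set of ranks, as suggested earlier in the thread
ranks <- c("phylum", "class", "order", "family", "subfamily", "genus", "species")

# Hypothetical helper (not part of bold): collapse one lineage into a single
# row with a name column and a taxid column per rank; ranks absent from the
# lineage become NA
lineage_to_wide <- function(df, ranks) {
  out <- list()
  for (r in ranks) {
    hit <- df[df$tax_rank == r, , drop = FALSE]
    found <- nrow(hit) > 0
    out[[r]] <- if (found) hit$taxon[1] else NA_character_
    out[[paste0(r, "_taxid")]] <- if (found) hit$taxid[1] else NA_real_
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# A named list of lineages (one element per identification, like the output
# printed above) becomes one row per identification
lineages <- list(`Paratergatis longimanus` = parents)
wide <- do.call(rbind, lapply(names(lineages), function(nm) {
  data.frame(taxonomicidentification = nm,
             lineage_to_wide(lineages[[nm]], ranks),
             stringsAsFactors = FALSE)
}))
print(wide)
```

With name and taxid for all seven ranks this adds 14 columns, which is the "too many columns" concern; keeping only the name columns halves that. To keep each original hit on one line, the result could then be joined onto a hits table with base `merge(hits, wide, by = "taxonomicidentification", all.x = TRUE)`, assuming the hits carry that column; note that joining by name reintroduces the ambiguity raised for genus names shared across kingdoms, which joining by taxid would avoid.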
@dougwyu Try it again after reinstalling, and see the new parameter.
That works great! I have tried it with one sequence and with 5 sequences. Exactly what I need (and I suspect many others). Thanks very much, Scott.
Great!
Hi there,
When I use bold_identify, I get the lowest-level taxonomic identification for that sequence (the taxonomicidentification field), but it would be very useful to also get the parentnames for that identification. The BOLD APIs do provide this information if I use 3 different bold package commands (see below), but I would then have to do some programming in R (not my strength) to insert the parentnames into the bold_identify output table. It seems to me that this would be better done within the bold package, if you fancy it.
Also, maybe I have missed something, but I think it would be nicer to get parentnames from a taxid rather than from the taxonomicidentification field, given the (small) possibility of ambiguity.
thanks,
doug