Update `microorganisms` data set to latest taxonomy #135
Comments
Remember to add the taxons mentioned in #131 |
Hopefully this will be fixed by an update to the microorganisms data set, but I can't seem to match Clavispora (Candida) lusitaniae. I'm also hoping that Nakaseomyces glabratus and Pichia kudriavzevii will arrive with the update, with their old names redirected to these new names, as the lab is moving over to these new names. |
That’s true, it’s a discussion at our lab as well. Do you know if it’s formal already? Meaning, accepted by authoritative taxonomy sources? I’m well aware of the publications suggesting these changes, but do you know whether they are formally adopted already? |
Afraid I don't know for sure about the International Taxonomy groups definitely accepting all these changes. This is a useful paper detailing the most common changes from a medical perspective if you've not already seen it https://doi.org/10.1093/ofid/ofac559 It may well be a moving target unfortunately... |
Yes, I know that one. Unfortunately, Open Forum Infectious Diseases is ‘just’ a journal, not a taxonomically reliable source. The results/propositions of such papers must be ratified by taxonomic sources first. But I found on MycoBank, a great and reliable taxonomic source for fungi, that they do have new names for many Candida species already. I found a couple of inconsistencies though, I’ll share them here and hope that we both could have a look at it. Will be later this week probably. |
From #144, look up these:
Perhaps they are in MycoBank? |
So I've had a further look at this. I've taken the OFID paper as a starting point. I initially compared new and old OFID names to the MycoBank downloadable dataset but then discovered I've produced a reprex which may hopefully help with deciding which taxonomic names to go with. It's not perfect but hopefully can help.
In an ideal world I would probably use all the names within a biotech company's MALDI-TOF database and cross-reference each taxon name to its GBIF status. I suspect biotech companies are never going to give out that kind of information, and pulling it from the instrument, even if it is possible, is probably not allowed under the software licence because it is commercially sensitive. It's probably inevitable that biotech companies will progress to using "new" names, and to me it seems reasonable to use GBIF as the reference standard, even if the GBIF accepted taxonomic name isn't what we use most often in human/vet medicine, as long as we can convert all the synonyms to this reference taxonomic name.
I've corrected some spelling mistakes within the OFID paper names. Some of the old OFID names are the accepted taxonomic name in GBIF but do not have a species column when GBIF is interrogated and therefore return NA.
library(tidyverse)
library(rvest)
library(rgbif)

# print all tbl rows
options(pillar.print_max = Inf)

# download and extract OFID paper tables
url <- "https://academic.oup.com/ofid/article/10/1/ofac559/6974385"
tables <- url %>%
  read_html() %>%
  html_table() %>%
  .[seq(1, 24, 4)] # for some reason it is extracting each table 4 times

# clean OFID tables up
clean_ofid_tbls <- function(x) {
  janitor::clean_names(x) %>%
    # insert a space where two names were fused together (lower case followed by a capital)
    mutate(current_name = str_replace_all(current_name, "([a-z])(?=[A-Z][^A-Z]+)", "\\1 ")) %>%
    # one name per row
    separate_longer_delim(current_name, delim = regex("\\s(?=[A-Z])")) %>%
    separate_longer_delim(previous_name_s, delim = regex("\\s(?=var )")) %>%
    separate_longer_delim(previous_name_s, delim = ",") %>%
    mutate(
      # strip footnote letters and correct spelling mistakes in the OFID names
      current_name = str_replace_all(
        current_name,
        c("Nakaseomyces bracarensisa" = "Nakaseomyces bracarensis",
          "Nakaseomyces glabrataa" = "Nakaseomyces glabratus",
          "Nakaseomyces nivariensisa" = "Nakaseomyces nivariensis",
          "Paracoccidioides restrepoanaa" = "Paracoccidioides restrepoana",
          "Talaromyces marneffeib" = "Talaromyces marneffei",
          "Moesziomyces antarticus" = "Moesziomyces antarcticus",
          "Apiotricum domesticum" = "Apiotrichum domesticum",
          "Trematospheria grisea" = "Trematosphaeria grisea",
          "Rhizopus arrhizus var delemar" = "Rhizopus arrhizus var. delemar"
        )
      ),
      current_name = str_remove(current_name, "\\(varieties no longer recognized\\)"),
      current_name = case_when(current_name == "" ~ NA,
                               .default = current_name),
      # expand bare variety names to full names
      previous_name_s = str_replace_all(
        previous_name_s,
        c("var interdigitale" = "Trichophyton mentagrophytes var. interdigitale",
          "var mentagrophytes" = "Trichophyton mentagrophytes var. mentagrophytes",
          "genotype VIII" = "Trichophyton mentagrophytes genotype VIII",
          "var chinensis" = "Rhizopus microsporus var. chinensis",
          "var oligosporus" = "Rhizopus microsporus var. oligosporus",
          "var rhizopodiformis" = "Rhizopus microsporus var. rhizopodiformis"
        )
      )
    ) %>%
    drop_na() %>%
    rename(ofid_old = previous_name_s, ofid_new = current_name) %>%
    select(ofid_old, ofid_new)
}

ofid_tbls <- map(tables, clean_ofid_tbls)

# Check names against GBIF backbone dataset to see if they are a synonym or an accepted name
check_gbif_synonym <- function(x) {
  mutate(x,
         ofid_old_GBIF_ = rgbif::name_backbone_checklist(ofid_old)["status"],
         GBIF_ = rgbif::name_backbone_checklist(ofid_old)["species"],
         .after = ofid_old) %>%
    mutate(GBIF_ofid_new_match = case_when(GBIF_$species == ofid_new ~ TRUE,
                                           GBIF_$species != ofid_new ~ FALSE,
                                           # note: `== NA_character_` evaluates to NA, so this branch
                                           # never matches and rows without a GBIF species stay NA
                                           GBIF_$species == NA_character_ ~ FALSE,
           ),
           .after = ofid_new)
}

syn_status <- map(ofid_tbls, check_gbif_synonym)
syn_status
#> [[1]]
#> # A tibble: 41 × 5
#> ofid_old ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 Candida bra… SYNONYM Nakaseomyces… Nakaseo… TRUE
#> 2 Candida cat… SYNONYM Diutina cate… Diutina… TRUE
#> 3 Candida col… ACCEPTED Candida coll… Torulas… FALSE
#> 4 Candida fab… SYNONYM Cyberlindner… Cyberli… TRUE
#> 5 Candida fam… ACCEPTED <NA> Debaryo… NA
#> 6 Candida gla… SYNONYM Nakaseomyces… Nakaseo… TRUE
#> 7 Candida gui… SYNONYM Meyerozyma g… Meyeroz… TRUE
#> 8 Candida kru… SYNONYM Issatchenkia… Pichia … FALSE
#> 9 Candida kef… SYNONYM Kluyveromyce… Kluyver… TRUE
#> 10 Candida pse… SYNONYM Kluyveromyce… Kluyver… TRUE
#> 11 Candida lip… SYNONYM Yarrowia lip… Yarrowi… TRUE
#> 12 Candida lus… SYNONYM Clavispora l… Clavisp… TRUE
#> 13 Candida niv… SYNONYM Nakaseomyces… Nakaseo… TRUE
#> 14 Candida neo… SYNONYM Diutina neor… Diutina… TRUE
#> 15 Candida nor… SYNONYM Pichia norve… Pichia … TRUE
#> 16 Candida par… SYNONYM Wickerhamiel… Diutina… FALSE
#> 17 Candida pel… SYNONYM Wickerhamomy… Wickerh… TRUE
#> 18 Pichia anom… SYNONYM Wickerhamomy… Wickerh… TRUE
#> 19 Candida pse… SYNONYM Diutina pseu… Diutina… TRUE
#> 20 Candida rug… SYNONYM Diutina rugo… Diutina… TRUE
#> 21 Cryptococcu… SYNONYM Naganishia a… Naganis… TRUE
#> 22 Cryptococcu… SYNONYM Cutaneotrich… Cutaneo… FALSE
#> 23 Cryptococcu… SYNONYM Cutaneotrich… Cutaneo… TRUE
#> 24 Cryptococcu… SYNONYM Papiliotrema… Papilio… TRUE
#> 25 Pseudozyma … SYNONYM Moesziomyces… Moeszio… TRUE
#> 26 Pseudozyma … SYNONYM Moesziomyces… Moeszio… TRUE
#> 27 Pseudozyma … SYNONYM Dirkmeia chu… Dirkmei… TRUE
#> 28 Pseudozyma … ACCEPTED <NA> Triodio… NA
#> 29 Pseudozyma … SYNONYM Moesziomyces… Moeszio… TRUE
#> 30 Pseudozyma … ACCEPTED <NA> Ustilag… NA
#> 31 Geotrichum … SYNONYM Saprochaete … Magnusi… FALSE
#> 32 Geotrichum … SYNONYM Magnusiomyce… Magnusi… TRUE
#> 33 Saprochaete… SYNONYM Magnusiomyce… Magnusi… TRUE
#> 34 Pichia ohme… SYNONYM Kodamaea ohm… Kodamae… TRUE
#> 35 Trichosporo… SYNONYM Cutaneotrich… Cutaneo… TRUE
#> 36 Trichosporo… SYNONYM Cutaneotrich… Cutaneo… TRUE
#> 37 Trichosporo… SYNONYM Apiotrichum … Apiotri… TRUE
#> 38 Trichosporo… SYNONYM Apiotrichum … Apiotri… TRUE
#> 39 Trichosporo… SYNONYM Cutaneotrich… Cutaneo… TRUE
#> 40 Trichosporo… SYNONYM Apiotrichum … Apiotri… TRUE
#> 41 Trichosporo… ACCEPTED Trichosporon… Apiotri… FALSE
#>
#> [[2]]
#> # A tibble: 37 × 5
#> ofid_old ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 Acremonium … SYNONYM Sarocladium … Sarocla… TRUE
#> 2 Acremonium … SYNONYM Gliomastix r… Gliomas… TRUE
#> 3 Acremonium … SYNONYM Sarocladium … Sarocla… TRUE
#> 4 Arthroderma… SYNONYM Trichophyton… Trichop… TRUE
#> 5 Cerinosteru… SYNONYM Quambalaria … Quambal… TRUE
#> 6 Sporothrix … SYNONYM Quambalaria … Quambal… TRUE
#> 7 Fusarium di… SYNONYM Bisifusarium… Bisifus… TRUE
#> 8 Fusarium fa… ACCEPTED Fusarium fal… Neocosm… FALSE
#> 9 Acremonium … SYNONYM Fusarium fal… Neocosm… FALSE
#> 10 Fusarium ke… ACCEPTED Fusarium ker… Neocosm… FALSE
#> 11 Fusarium li… ACCEPTED Fusarium lic… Neocosm… FALSE
#> 12 Fusarium pe… ACCEPTED Fusarium pet… Neocosm… FALSE
#> 13 Fusarium so… ACCEPTED Fusarium sol… Neocosm… FALSE
#> 14 Geosmithia … SYNONYM Rasamsonia a… Rasamso… TRUE
#> 15 Penicillium… SYNONYM Rasamsonia a… Rasamso… TRUE
#> 16 Gibberella … ACCEPTED Gibberella f… Fusariu… FALSE
#> 17 Lecythophor… SYNONYM Coniochaeta … Conioch… TRUE
#> 18 Phialophora… SYNONYM Coniochaeta … Conioch… TRUE
#> 19 Microsporum… SYNONYM Paraphyton c… Paraphy… TRUE
#> 20 Microsporum… ACCEPTED Microsporum … Nannizz… FALSE
#> 21 Microsporum… ACCEPTED Microsporum … Lophoph… FALSE
#> 22 Microsporum… ACCEPTED Microsporum … Nannizz… FALSE
#> 23 Microsporum… ACCEPTED Microsporum … Nannizz… FALSE
#> 24 Microsporum… SYNONYM Trichophyton… Nannizz… FALSE
#> 25 Neosartorya… SYNONYM Aspergillus … Aspergi… TRUE
#> 26 Neosartorya… SYNONYM Aspergillus … Aspergi… TRUE
#> 27 Aspergillus… SYNONYM Aspergillus … Aspergi… TRUE
#> 28 Neosartorya… SYNONYM Aspergillus … Aspergi… TRUE
#> 29 Paecilomyce… SYNONYM Purpureocill… Purpure… TRUE
#> 30 Paecilomyce… SYNONYM Marquandomyc… Marquan… TRUE
#> 31 Penicillium… SYNONYM Talaromyces … Talarom… TRUE
#> 32 Penicillium… SYNONYM Talaromyces … Talarom… TRUE
#> 33 Trichophyto… SYNONYM Arthroderma … Arthrod… TRUE
#> 34 Trichophyto… ACCEPTED Trichophyton… Arthrod… FALSE
#> 35 Trichophyto… SYNONYM Trichophyton… Trichop… FALSE
#> 36 Trichophyto… SYNONYM Trichophyton… Trichop… TRUE
#> 37 Trichophyto… ACCEPTED Trichophyton… Trichop… FALSE
#>
#> [[3]]
#> # A tibble: 27 × 5
#> ofid_old ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 Emmonsia cr… ACCEPTED <NA> Emergom… NA
#> 2 Emmonsia he… SYNONYM Blastomyces … Blastom… TRUE
#> 3 Emmonsia pa… SYNONYM Blastomyces … Blastom… TRUE
#> 4 Emmonsia so… ACCEPTED Emmonsia soli Emergom… FALSE
#> 5 Emmonsia “s… ACCEPTED <NA> Blastom… NA
#> 6 Emmonsia “s… ACCEPTED <NA> Emergom… NA
#> 7 Emmonsia pa… SYNONYM Emergomyces … Emergom… TRUE
#> 8 Histoplasma… ACCEPTED <NA> Histopl… NA
#> 9 Histoplasma… ACCEPTED <NA> Histopl… NA
#> 10 Histoplasma… ACCEPTED <NA> Histopl… NA
#> 11 Histoplasma… ACCEPTED <NA> Histopl… NA
#> 12 Lacazia lob… SYNONYM Paracoccidio… Paracoc… TRUE
#> 13 Paracoccidi… ACCEPTED Paracoccidio… Paracoc… FALSE
#> 14 Paracoccidi… ACCEPTED Paracoccidio… Paracoc… FALSE
#> 15 Paracoccidi… ACCEPTED Paracoccidio… Paracoc… FALSE
#> 16 Paracoccidi… ACCEPTED Paracoccidio… Paracoc… FALSE
#> 17 Paracoccidi… ACCEPTED Paracoccidio… Paracoc… FALSE
#> 18 Penicillium… SYNONYM Talaromyces … Talarom… TRUE
#> 19 Sporothrix … ACCEPTED Sporothrix s… Sporoth… FALSE
#> 20 Sporothrix … ACCEPTED Sporothrix s… Sporoth… FALSE
#> 21 Sporothrix … ACCEPTED Sporothrix s… Sporoth… FALSE
#> 22 Sporothrix … ACCEPTED Sporothrix s… Sporoth… FALSE
#> 23 Sporothrix … ACCEPTED Sporothrix p… Sporoth… FALSE
#> 24 Sporothrix … ACCEPTED Sporothrix p… Sporoth… FALSE
#> 25 Sporothrix … ACCEPTED Sporothrix p… Sporoth… FALSE
#> 26 Sporothrix … ACCEPTED Sporothrix p… Sporoth… FALSE
#> 27 Sporothrix … ACCEPTED Sporothrix p… Sporoth… FALSE
#>
#> [[4]]
#> # A tibble: 9 × 5
#> ofid_old ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 Bipolaris au… SYNONYM Curvularia a… Curvula… TRUE
#> 2 Bipolaris ha… SYNONYM Curvularia h… Curvula… TRUE
#> 3 Bipolaris sp… SYNONYM Curvularia s… Curvula… TRUE
#> 4 Ochroconis g… SYNONYM Verruconis g… Verruco… TRUE
#> 5 Phialophora … SYNONYM Pleurostoma … Pleuros… TRUE
#> 6 Pseudallesch… ACCEPTED Pseudallesch… Scedosp… FALSE
#> 7 Ramichloridi… SYNONYM Rhinocladiel… Rhinocl… TRUE
#> 8 Ramichloridi… SYNONYM Myrmecridium… Myrmecr… TRUE
#> 9 Scedosporium… ACCEPTED Scedosporium… Lomento… FALSE
#>
#> [[5]]
#> # A tibble: 8 × 5
#> ofid_old ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 Leptosphaeri… SYNONYM Falciformisp… Falcifo… TRUE
#> 2 Leptosphaeri… SYNONYM Falciformisp… Falcifo… TRUE
#> 3 Scytalidium … SYNONYM Neoscytalidi… Neoscyt… TRUE
#> 4 Scytalidium … SYNONYM Neoscytalidi… Neoscyt… TRUE
#> 5 Hendersonula… SYNONYM Neoscytalidi… Nattras… FALSE
#> 6 Pyrenochaeta… SYNONYM Medicopsis r… Medicop… TRUE
#> 7 Pyrenochaeta… SYNONYM Nigrograna m… Nigrogr… TRUE
#> 8 Madurella gr… SYNONYM Trematosphae… Tremato… TRUE
#>
#> [[6]]
#> # A tibble: 13 × 5
#> ofid_old ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 Absidia cor… SYNONYM Lichtheimia … Lichthe… TRUE
#> 2 Mycocladus … SYNONYM Lichtheimia … Lichthe… TRUE
#> 3 Rhizopus az… SYNONYM Rhizopus mic… Rhizopu… TRUE
#> 4 Rhizopus de… SYNONYM Rhizopus arr… Rhizopu… FALSE
#> 5 Rhizopus mi… ACCEPTED Rhizopus mic… Rhizopu… TRUE
#> 6 Rhizopus mi… SYNONYM Rhizopus mic… Rhizopu… TRUE
#> 7 Rhizopus mi… SYNONYM Rhizopus mic… Rhizopu… TRUE
#> 8 Rhizopus mi… SYNONYM Rhizopus mic… Rhizopu… TRUE
#> 9 Rhizopus or… SYNONYM Rhizopus arr… Rhizopu… TRUE
#> 10 Rhizomucor … SYNONYM Mucor irregu… Mucor i… TRUE
#> 11 Saksenaea v… ACCEPTED Saksenaea va… Saksena… FALSE
#> 12 Saksenaea v… ACCEPTED Saksenaea va… Saksena… FALSE
#> 13 Saksenaea v… ACCEPTED Saksenaea va… Saksena… FALSE
Created on 2024-05-05 with reprex v2.1.0 |
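In the output above, `GBIF_ofid_new_match` is `NA` wherever GBIF returned no species. If a plain `FALSE` is preferred for those rows, the comparison could be made NA-safe. The following is a minimal sketch in base R, not a change to the reprex: `gbif_matches_ofid()` is a hypothetical helper name and the input vectors are made-up illustrative values.

```r
# Hypothetical helper: TRUE only when both names are present and identical.
# A missing name on either side gives FALSE instead of NA.
gbif_matches_ofid <- function(gbif_species, ofid_new) {
  !is.na(gbif_species) & !is.na(ofid_new) & gbif_species == ofid_new
}

# Illustrative inputs only (not taken from the GBIF results above)
gbif_species <- c("Nakaseomyces glabratus", NA, "Clavispora lusitaniae")
ofid_new <- c("Nakaseomyces glabratus", "Pichia kudriavzevii", "Pichia kudriavzevii")

gbif_matches_ofid(gbif_species, ofid_new)
```

Inside `case_when()`, the equivalent would be to test `is.na(GBIF_$species)` first, since `x == NA_character_` always evaluates to `NA`.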
In addition to the above, I have identified some other issues which came up when using AMR on some data. The following were the problems I identified from the output of
Although LPSN lists M. bovis taxonomically as M. tb, UK Mycobacterial reference labs still refer to it as M. bovis. This has clinical implications, as M. bovis is intrinsically resistant to pyrazinamide and therefore requires a longer treatment duration than standard short-course therapy for M. tb. Salmonellae are always difficult, but I thought Enteritidis was a serovar and so should be Salmonella enterica Enteritidis. This is the current WHO-accredited list of Salmonella serovars in case you don't have it: https://www.pasteur.fr/sites/default/files/veng_0.pdf
|
Thanks for the great work-up! GBIF is not very up to date with bacterial taxonomy: even if they release in November (which they do annually), there are still hundreds of outdated species according to LPSN, which strictly follows IJSEM publications. But I'll look deeper into what you mentioned here. Great to have this as a reference, so many thanks! |
NB Since a visit by the MALDI-TOF engineer in Aug 2023, our MALDI-TOF in Newcastle now only reports e.g. Pichia kudriavzevii rather than C. krusei, and e.g. Clavispora lusitaniae rather than Candida lusitaniae. |
NB Currently AMR accidentally parses Candida lusitaniae as Coryne. flaccumfaciens (using as.mo()) |
NB Currently AMR accidentally parses Scedosporium prolificans as Arachnia propionica |
I understand that clinical impact as well, didn't check M. bovis before. They base themselves on https://www.ncbi.nlm.nih.gov/pubmed/29205127. I'll think about a useful solution. For the fungi, such as the old Candida lusitaniae and Scedosporium prolificans mentioned, we will rely on MycoBank for AMR v3.0. It will be MUCH more comprehensive on fungi than what we had before, I think it's great! Still working on it this summer, hope to release a new version soon after. |
"We further recommend use of the infrasubspecific term 'variant' ('var.') and infrasubspecific designations that generally retain the historical nomenclature associated with the groups or otherwise convey such characteristics, e.g. M. tuberculosis var. bovis." (from https://www.ncbi.nlm.nih.gov/pubmed/29205127)
Yeah, but LPSN didn't do that, they just make it refer to M. tuberculosis... Pfff, taxonomy is so hard sometimes. |
Sorry Matthijs, I didn't see this reply before I posted #164. Also confirming troubles with some of the above-mentioned. I'm also having problems with Enterobius vermicularis and Trichuris trichiura: even if I manually code them to their mo codes (e.g. AN_ENTRB_VRMC) and parse them through as.mo, it changes them all to B_GRAMN... I'm currently getting around it by modifying the input to genus only (no problems observed for this).
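The genus-only workaround described above can be sketched as a small helper. This is a minimal illustration in base R, not part of AMR: `to_genus()` is a hypothetical name, and it simply keeps the first whitespace-separated word of each input before that input would be passed on to `as.mo()`.

```r
# Hypothetical helper (not part of the AMR package): keep only the genus,
# i.e. the first whitespace-separated word of each name.
to_genus <- function(x) {
  sub("\\s.*$", "", trimws(x))
}

to_genus(c("Enterobius vermicularis", "Trichuris trichiura"))
```

Note that this discards species-level information, so it is only a stopgap until the full names parse correctly.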
Hi @send2dan, @silverfoxdoc, and @theanita1,
Today I updated the microorganisms data set, now containing over 20,000 new fungal species compared to AMR v2.1.1 that is currently on CRAN!
Please check if all works as you expect, by running as.mo() or mo_name() on any fungal species that you feel should be supported. Otherwise, I'll add more species on your cues. It's still hard to keep the data set small with only microbial species, while also supporting all relevant fungal species. Simply adding all fungi (including mushrooms) would cause an addition of over 300,000 species. CRAN will never accept that amount of data and it would make the package super slow.
So I hope you like it! You can quickly check it by installing this way:
# raw source - this always works
remotes::install_github("msberends/AMR")
# compiled binaries - this should work within a couple of hours if all tests pass
install.packages("AMR", repos = "https://msberends.r-universe.dev")
Latest version is 2.1.1.9081 I believe.
This should also address #131, #157, and #164. |
[celebrate] WEIAND, Daniel (THE NEWCASTLE ... reacted to your message
|
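One way to run the check requested above is to compare what `mo_name()` returns against the names you expect. This is a minimal sketch: `check_expected()` is a hypothetical helper, and the expected names are the ones hoped for earlier in this thread, which is an assumption about the desired result rather than confirmed AMR output. The AMR call only runs if the package is installed.

```r
# Hypothetical helper: compare resolved names against the names you expect.
# `expected` is a named character vector: names are inputs, values are targets.
check_expected <- function(resolved, expected) {
  stopifnot(!is.null(names(expected)), length(resolved) == length(expected))
  resolved <- as.character(resolved)
  data.frame(
    input = names(expected),
    expected = unname(expected),
    resolved = resolved,
    ok = !is.na(resolved) & resolved == unname(expected),
    stringsAsFactors = FALSE
  )
}

# Assumed targets, taken from the names discussed in this thread
expected <- c(
  "Candida lusitaniae" = "Clavispora lusitaniae",
  "Candida krusei" = "Pichia kudriavzevii"
)

if (requireNamespace("AMR", quietly = TRUE)) {
  print(check_expected(AMR::mo_name(names(expected)), expected))
}
```

Rows with `ok == FALSE` are the ones worth reporting back in the issue, together with the `resolved` value.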