I am very happy you added the function name_backbone_checklist() to rgbif! 👍
I had no idea you were interested in something like this. It will make our own inborutils::gbif_species_name_match() unnecessary and we will deprecate it soon.
While using name_backbone_checklist() I found it slightly strange that the verbose arg is described as:
(logical) should the matching return non-exact matches
but the following occurs:
case1: verbose is FALSE
Fuzzy matches are returned, although I have always considered fuzzy matches to be non-exact matches.
See example below:
library(rgbif)
name_data <- data.frame(
  name = c(
    "Cirsium arvense (L.) Scop.", # a plant
    "Puma concuolor (Linnaeus, 1771)", # a mis-spelled big cat
    "Fake species (John Waller 2021)", # a fake species
    "Calopteryx" # Just a Genus
  ),
  description = c(
    "a plant",
    "a mis-spelled big cat",
    "a fake species",
    "just a GENUS"
  ),
  kingdom = c(
    "Plantae",
    "Animalia",
    "Johnlia",
    "Animalia"
  )
)
output <- name_backbone_checklist(name_data)
output[, c("scientificName", "verbatim_name", "matchType")]
#> # A tibble: 4 × 3
#>   scientificName                 verbatim_name                   matchType
#>   <chr>                          <chr>                           <chr>
#> 1 Cirsium arvense (L.) Scop.     Cirsium arvense (L.) Scop.      EXACT
#> 2 Puma concolor (Linnaeus, 1771) Puma concuolor (Linnaeus, 1771) FUZZY
#> 3 <NA>                           Fake species (John Waller 2021) NONE
#> 4 Calopteryx Leach, 1815         Calopteryx                      EXACT
Created on 2022-11-30 with reprex v2.0.2
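As a possible workaround for case 1, the default output can be restricted to exact matches by filtering on matchType. This is only a minimal sketch: the data frame below is a hand-typed mock of the columns shown above, not a live call to GBIF.

```r
# Hand-typed mock of the verbose = FALSE output shown above (not fetched from GBIF)
output <- data.frame(
  scientificName = c(
    "Cirsium arvense (L.) Scop.",
    "Puma concolor (Linnaeus, 1771)",
    NA,
    "Calopteryx Leach, 1815"
  ),
  verbatim_name = c(
    "Cirsium arvense (L.) Scop.",
    "Puma concuolor (Linnaeus, 1771)",
    "Fake species (John Waller 2021)",
    "Calopteryx"
  ),
  matchType = c("EXACT", "FUZZY", "NONE", "EXACT"),
  stringsAsFactors = FALSE
)

# Keep only the rows that GBIF labelled as exact matches
exact_only <- subset(output, matchType == "EXACT")
exact_only$verbatim_name
```

With real data, the same subset() call would be applied directly to the tibble returned by name_backbone_checklist().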
case2: verbose is TRUE and matchType is NONE
If verbose is TRUE I get, as expected, more rows. However, these new rows have matchType equal to EXACT or FUZZY, which seems to contradict the documentation of the verbose arg. So, filtering on matchType = EXACT gives different results depending on the value of the verbose arg. The only logical rule to identify these "suspect" exact matches in the output df is that they share a verbatim_index value with a row where matchType = NONE.
Example:
library(rgbif)
name_data <- data.frame(
  name = c(
    "Cirsium arvense (L.) Scop.", # a plant
    "Puma concuolor (Linnaeus, 1771)", # a mis-spelled big cat
    "Fake species (John Waller 2021)" # a fake species
  ),
  description = c(
    "a plant",
    "a mis-spelled big cat",
    "a fake species"
  ),
  kingdom = c(
    "Plantae",
    "Animalia",
    "Johnlia"
  )
)
output <- name_backbone_checklist(name_data, verbose = TRUE)
output[, c("scientificName", "verbatim_name", "matchType")]
#> # A tibble: 6 × 3
#>   scientificName                 verbatim_name                   matchType
#>   <chr>                          <chr>                           <chr>
#> 1 Cirsium arvense (L.) Scop.     Cirsium arvense (L.) Scop.      EXACT
#> 2 Cirsium arcense Scop.          Cirsium arvense (L.) Scop.      FUZZY
#> 3 Cirsium apoense Nakai          Cirsium arvense (L.) Scop.      FUZZY
#> 4 Puma concolor (Linnaeus, 1771) Puma concuolor (Linnaeus, 1771) FUZZY
#> 5 <NA>                           Fake species (John Waller 2021) NONE
#> 6 Faku Péringuey, 1916           Fake species (John Waller 2021) FUZZY
no_match_idx <- subset(output, matchType == "NONE")$verbatim_index
no_match_output_verbatim <- subset(
  output,
  verbatim_index %in% no_match_idx & matchType != "NONE"
)
no_match_output_verbatim
#> # A tibble: 1 × 26
#>   usageKey scientifi…¹ canon…² rank  status confi…³ match…⁴ kingdom phylum order
#>      <int> <chr>       <chr>   <chr> <chr>    <int> <chr>   <chr>   <chr>  <chr>
#> 1  1725165 Faku Périn… Faku    GENUS SYNON…      39 FUZZY   Animal… Arthr… Orth…
#> # … with 16 more variables: family <chr>, genus <chr>, species <chr>,
#> #   kingdomKey <int>, phylumKey <int>, classKey <int>, orderKey <int>,
#> #   familyKey <int>, genusKey <int>, speciesKey <int>, synonym <lgl>,
#> #   class <chr>, acceptedUsageKey <int>, verbatim_name <chr>,
#> #   verbatim_kingdom <chr>, verbatim_index <dbl>, and abbreviated variable
#> #   names ¹scientificName, ²canonicalName, ³confidence, ⁴matchType
Created on 2022-11-30 with reprex v2.0.2
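One way to make filtering independent of verbose might be to keep only the first row per verbatim_index. This sketch rests on an assumption that the documentation does not confirm: that the main match is always listed first for each input name, as it appears to be in the output above. The data frame is a hand-typed mock, and the verbatim_index values are inferred from the row order rather than copied from real output.

```r
# Hand-typed mock of the verbose = TRUE output shown above.
# verbatim_index values are an assumption inferred from the row grouping.
output <- data.frame(
  scientificName = c(
    "Cirsium arvense (L.) Scop.",
    "Cirsium arcense Scop.",
    "Cirsium apoense Nakai",
    "Puma concolor (Linnaeus, 1771)",
    NA,
    "Faku Péringuey, 1916"
  ),
  matchType = c("EXACT", "FUZZY", "FUZZY", "FUZZY", "NONE", "FUZZY"),
  verbatim_index = c(1, 1, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Keep the first row for each input name; duplicated() flags every later row
# that repeats an already-seen verbatim_index
main_matches <- output[!duplicated(output$verbatim_index), ]
main_matches$matchType
```

If that ordering assumption holds, the rows kept here correspond to the three input names, and the alternative candidates added by verbose = TRUE are dropped.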
I hope I explained the issue clearly enough. Improving the documentation of the verbose arg can help, but I am not sure it will be enough. Maybe reshaping the output with verbose = TRUE could help (see #515)?
damianooldoni
changed the title
Unclear documentation or suspect behavior of non-exact matches
name_backbone_checklist: unclear documentation or suspect behavior of non-exact matches
Nov 30, 2022
damianooldoni
changed the title
name_backbone_checklist: unclear documentation or suspect behavior of non-exact matches
name_backbone_checklist: unclear documentation or suspect behavior for non-exact matches
Nov 30, 2022
case3: verbose is TRUE and matchType is FUZZY
One exact match with verbose = FALSE, versus four exact matches with verbose = TRUE (plus three fuzzy matches, which I would expect, since verbose is meant to show non-exact matches).
Thanks a lot!