get_names result order #78

lisafisler · 2020-09-22T06:50:26Z

Hello,

My issue is quite simple: the get_names function gives me the correct names when I feed it species IDs (here from "itis", but it's the same with "col"), but the results come back in a weird order. For example, "ITIS:715228", which gives the species Megapodius decollatus, appears as the first element in the second request, although it should be second. This problem does not occur with the get_ids function, which returns results in the right order.

> library(tidyverse)
> library(taxadb)
> td_create("itis")
> get_names("ITIS:715228")
[1] "Megapodius decollatus"
> get_names(c("ITIS:553896", "ITIS:715228", NA))
[1] "Megapodius decollatus" "Falcipennis canadensis" NA

Thank you for your help with this issue.

For reference, here is my sessionInfo() output:

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] fr_CH.UTF-8/fr_CH.UTF-8/fr_CH.UTF-8/C/fr_CH.UTF-8/fr_CH.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.1 purrr_0.3.4 readr_1.3.1 tidyr_1.1.1 tibble_3.0.3
[8] ggplot2_3.3.2 tidyverse_1.3.0 taxadb_0.1.0

loaded via a namespace (and not attached):
[1] progress_1.2.2 tidyselect_1.1.0 haven_2.3.1 colorspace_1.4-1 vctrs_0.3.2 generics_0.0.2
[7] yaml_2.2.1 blob_1.2.1 rlang_0.4.7 pillar_1.4.6 glue_1.4.1 withr_2.2.0
[13] DBI_1.1.0 rappdirs_0.3.1 bit64_4.0.2 dbplyr_1.4.4 modelr_0.1.8 readxl_1.3.1
[19] lifecycle_0.2.0 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_0.3.6 memoise_1.1.0
[25] curl_4.3 fansi_0.4.1 broom_0.7.0 arkdb_0.0.5 Rcpp_1.0.5 backports_1.1.8
[31] scales_1.1.1 jsonlite_1.7.0 fs_1.5.0 bit_4.0.4 hms_0.5.3 digest_0.6.25
[37] stringi_1.4.6 duckdb_0.2.1 grid_4.0.2 cli_2.0.2 tools_4.0.2 magrittr_1.5
[43] RSQLite_2.2.0 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.1 xml2_1.3.2 prettyunits_1.1.1
[49] reprex_0.3.0 lubridate_1.7.9 assertthat_0.2.1 httr_1.4.2 rstudioapi_0.11 R6_2.4.1
[55] compiler_4.0.2


cboettig · 2020-09-22T16:47:21Z

Wow, that's crazy! Sorry about that. I'm having trouble reproducing this.

Does it do that without the NA entry too?

Here's my sessionInfo:


> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-openmp/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] taxadb_0.1.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        pillar_1.4.6      compiler_4.0.2    dbplyr_1.4.4      R.methodsS3_1.8.1 prettyunits_1.1.1 R.utils_2.10.1    tools_4.0.2       progress_1.2.2    bit_4.0.4         digest_0.6.25     packrat_0.5.0     MonetDBLite_0.6.1 RSQLite_2.2.0     jsonlite_1.7.1    evaluate_0.14     memoise_1.1.0     lifecycle_0.2.0   tibble_3.0.3     
[20] pkgconfig_2.0.3   rlang_0.4.7       DBI_1.1.0         rstudioapi_0.11   curl_4.3          yaml_2.2.1        xfun_0.17         duckdb_0.2.1      arkdb_0.0.6       dplyr_1.0.2       knitr_1.29        rappdirs_0.3.1    generics_0.0.2    vctrs_0.3.4       hms_0.5.3         bit64_4.0.5       tidyselect_1.1.0  glue_1.4.2        R6_2.4.1         
[39] rmarkdown_2.3     readr_1.3.1       purrr_0.3.4       blob_1.2.1        magrittr_1.5      codetools_0.2-16  ellipsis_0.3.1    htmltools_0.5.0   assertthat_0.2.1  stringi_1.5.3     crayon_1.3.4      R.oo_1.24.0

cboettig · 2020-09-22T17:02:35Z

P.S. You may already know this, but in the meantime you can use filter_id() etc. to get the full table, rather than relying on the ordering of get_names(). You might also try updating your packages to the latest versions (e.g. via update.packages()).
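A minimal sketch of why having the full table helps: with an ID column available, base R's match() can put the names back in the order of the input IDs, independent of the row order the database returns. The `hits` data frame below is a hypothetical stand-in for such a table (the column names taxonID and scientificName are assumptions for illustration), with its rows deliberately out of order.

```r
# Hypothetical stand-in for a lookup result whose rows came back
# in arbitrary order (column names are assumptions for illustration)
hits <- data.frame(
  taxonID = c("ITIS:715228", "ITIS:553896"),
  scientificName = c("Megapodius decollatus", "Falcipennis canadensis"),
  stringsAsFactors = FALSE
)

# The IDs in the order we asked for them, including a missing value
ids <- c("ITIS:553896", "ITIS:715228", NA)

# match() returns, for each input ID, the row position of that ID in `hits`
# (NA when it is absent), so indexing by it restores the input order
names_in_order <- hits$scientificName[match(ids, hits$taxonID)]
print(names_in_order)
```

Because the reordering is keyed on the IDs themselves, it gives the same result whatever order the rows arrive in.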

lisafisler · 2020-09-22T19:16:26Z

Unfortunately, it does the same with or without the NA. I have updated all the packages I could, with no change.

Thanks, yes, it works well with filter_id instead, even though it takes one more step to get to the information I want. Keep me posted if you find the problem; I will use filter_id in the meantime.

cboettig · 2020-09-22T22:46:37Z

Thanks. Some database backends don't guarantee consistent row ordering. I've added an additional command to enforce a consistent order; can you please test again with the dev version?

remotes::install_github("ropensci/taxadb")
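The general idea of enforcing a consistent order can be sketched in plain R: record each input's position, join against the lookup table, then sort on that recorded position. This is only an illustration using merge() and order(), not the actual change made in taxadb; the `lookup` table and the `sort_key` column are hypothetical.

```r
# Hypothetical lookup table, standing in for the database table
lookup <- data.frame(
  taxonID = c("ITIS:715228", "ITIS:553896"),
  scientificName = c("Megapodius decollatus", "Falcipennis canadensis"),
  stringsAsFactors = FALSE
)

ids <- c("ITIS:553896", "ITIS:715228", NA)

# Record each input's original position before joining
query <- data.frame(taxonID = ids, sort_key = seq_along(ids),
                    stringsAsFactors = FALSE)

# Left join keeps unmatched inputs (e.g. NA); merge() reorders rows
# (by default it sorts on the key), so its output order need not
# match the input order
joined <- merge(query, lookup, by = "taxonID", all.x = TRUE)

# Sorting on the recorded position restores the input order explicitly
joined <- joined[order(joined$sort_key), ]
print(joined$scientificName)
```

The extra sort is a cheap step relative to the join itself, which is one reason an explicit ordering step need not dominate query time.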

lisafisler · 2020-09-23T04:43:28Z

Great, thanks! It seems to have done the trick! Hurray :-)

The only issue I see is that get_names now seems a bit slower than with the previous version. It's only slightly slower, but since I can clearly see the difference with my small dataset of 3 species, I'm worried the slowdown would grow considerably with a huge dataset. Then again, maybe this extra step always takes about the same amount of time regardless of the number of species, in which case the total time wouldn't increase much and it wouldn't be a problem in the end.

cboettig · 2020-09-23T04:49:32Z

Thanks! Interesting that it's noticeably slower. I don't think you'll see that scale linearly with a very large number of names. Can you tell me what td_connect() shows?

If the speed of get_names is important to your workflow, you could be our first beta tester for https://github.com/cboettig/taxalight/ ? 😊

lisafisler · 2020-09-23T06:16:16Z

It gives the result almost instantly with the original version, and it takes approximately 3 seconds for one get_names request with the dev version.

> td_connect()
<duckdb_connection 27f60 driver=<duckdb_driver 05450 dbdir='/Users/lisafisler/Library/Application Support/taxadb/database/duckdb' read_only=FALSE>>

I don't really have a very big database; I was just concerned for people who do. But I'd be happy to test taxalight anyway! What are the main differences from taxadb?

cboettig · 2020-09-23T17:05:50Z

taxalight has only get_names(), get_ids(), and tl() (which returns the taxonomic table for the requested species and/or IDs). You can't run operations on the full taxonomic database with taxalight, such as asking "how many names are in the class Aves?". It's also stricter about matching, i.e. a scientific name must match case exactly, and there are no 'starts with' etc. options.
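To illustrate what "stricter matching" means in practice, here is a small base R sketch. It is not taxalight's or taxadb's implementation; it just contrasts an exact, case-sensitive lookup with a case-insensitive one and a 'starts with' prefix match, on a hypothetical vector of accepted names.

```r
# Hypothetical set of accepted scientific names
accepted <- c("Megapodius decollatus", "Falcipennis canadensis")

query <- c("Megapodius decollatus", "megapodius decollatus", "Megapodius")

# Exact matching: case and the full string must agree
exact <- query %in% accepted

# Case-insensitive matching: compare lower-cased strings
ignore_case <- tolower(query) %in% tolower(accepted)

# 'Starts with' matching: does any accepted name begin with the query?
# (startsWith() itself is case-sensitive)
starts <- vapply(query, function(q) any(startsWith(accepted, q)),
                 logical(1), USE.NAMES = FALSE)

print(data.frame(query, exact, ignore_case, starts))
```

Under exact matching only the first query succeeds; the looser modes each accept one more of the variants.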

At the moment, only accepted taxonomic identifiers and scientific names are available as queries. We can probably add querying by common name and by synonym identifier (for authorities that assign IDs to synonyms).

lisafisler · 2020-09-25T13:23:00Z

Thanks! I'll give it a go.

cboettig closed this as completed Mar 5, 2021
