Problem with validate_names() #121

Closed
PMassicotte opened this issue May 12, 2017 · 6 comments

@PMassicotte

Hi there.

I am trying to use this library to extract information about the species I have collected. I think I found a bug in validate_names() when a requested species is not found. As the second example shows, no valid name is found for the second species, so the last valid name is repeated in its place. One solution would be to always return a vector of the same length as the requested list and to return NA for any species that is not found.

                                                       
library(dplyr)                                         
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rfishbase)                                     

This works

df <- data_frame(
  myname = c("Oreochromis niloticus", "Salmo trutta")
)

df %>%
  mutate(valide_name = validate_names(myname))
#> Warning: FishBase says that 'Salmo trutta' can also be misapplied to other species
#>                     but is returning only the best match.  
#>                     See synonyms('Salmo trutta') for details
#> # A tibble: 2 × 2
#>                  myname           valide_name
#>                   <chr>                 <chr>
#> 1 Oreochromis niloticus Oreochromis niloticus
#> 2          Salmo trutta          Salmo trutta

This does not work (the unmatched name gets the previous valid name)

df <- data_frame(
  myname = c("Oreochromis niloticus", "Salmo truttaxxxx")
)

df %>%
  mutate(valide_name = validate_names(myname))
#> Warning in check_and_parse(resp): Bad Request (HTTP 400).
#> Warning: no results found for query https://fishbase.ropensci.org/synonyms?
#> SynSpecies=truttaxxxx&SynGenus=Salmo&limit=50&fields=SynGenus%2CSynSpecies
#> %2CValid%2CMisspelling%2CStatus%2CSynonymy%2CCombination%2CSpecCode
#> %2CSynCode%2CCoL_ID%2CTSN%2CWoRMS_ID
#> Warning: Unknown or uninitialised column: 'SpecCode'.

#> Warning: Unknown or uninitialised column: 'SpecCode'.
#> Warning: No match found for species 'Salmo truttaxxxx'
#> # A tibble: 2 × 2
#>                  myname           valide_name
#>                   <chr>                 <chr>
#> 1 Oreochromis niloticus Oreochromis niloticus
#> 2      Salmo truttaxxxx Oreochromis niloticus
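
The suggestion above (same-length output, NA for misses) can be sketched as follows. This is only an illustration, not rfishbase's code: `lookup_one()` and `validate_same_length()` are hypothetical names, and the lookup is a local stand-in for the real FishBase query. The first part shows one plausible reading of the repeated name, namely that a length-one result gets recycled across the rows.

```r
# Assumption: validate_names() returned only the one name it could match.
# A length-one column is silently recycled across rows (base R shown here).
recycled <- data.frame(
  myname = c("Oreochromis niloticus", "Salmo truttaxxxx"),
  valide_name = "Oreochromis niloticus",
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a single-name lookup; NOT rfishbase's API.
# Returns the matching name, or character(0) when nothing matches.
lookup_one <- function(name,
                       known = c("Oreochromis niloticus", "Salmo trutta")) {
  known[known == name]
}

# Look up each name on its own and pad misses with NA, so the output
# always has the same length as the input.
validate_same_length <- function(names) {
  vapply(names, function(nm) {
    hit <- lookup_one(nm)
    if (length(hit) == 0) NA_character_ else hit[[1]]
  }, character(1), USE.NAMES = FALSE)
}

validate_same_length(c("Oreochromis niloticus", "Salmo truttaxxxx"))
```

Because every input position produces exactly one output value, the result can be used directly inside mutate() without any recycling.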
@cboettig
Member

@sckott can you take a look at this when you get a chance?

@sckott
Contributor

sckott commented May 16, 2017

@PMassicotte can you share your sessionInfo()?

@PMassicotte
Author

PMassicotte commented May 16, 2017

@sckott

                                                       
sessionInfo()                                          
#> R version 3.4.0 (2017-04-21)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 18.1
#> 
#> Matrix products: default
#> BLAS: /usr/lib/openblas-base/libblas.so.3
#> LAPACK: /usr/lib/libopenblasp-r0.2.18.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
#>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
#>  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rfishbase_2.1.2 dplyr_0.5.0    
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.10       tidyr_0.6.2        digest_0.6.12     
#>  [4] rprojroot_1.2      assertthat_0.2.0   R6_2.2.1          
#>  [7] jsonlite_1.4       DBI_0.6-1          backports_1.0.5   
#> [10] magrittr_1.5       evaluate_0.10      httr_1.2.1        
#> [13] stringi_1.1.5      curl_2.6           lazyeval_0.2.0    
#> [16] rmarkdown_1.5.9000 tools_3.4.0        stringr_1.2.0     
#> [19] yaml_2.1.14        compiler_3.4.0     htmltools_0.3.6   
#> [22] knitr_1.15.1       tibble_1.3.0

sckott added a commit that referenced this issue May 18, 2017
give back NA when no result found in query
added utility fxn for defaulting to NA when result is NULL or of length 0
bump dev version
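
A helper of the kind the commit message describes might look roughly like this. The name `na_if_empty` and its `default` argument are assumptions made for illustration. The actual function in the referenced commit is not shown in this thread and may differ.

```r
# Hypothetical helper; the name and signature are not taken from rfishbase.
# Returns `default` when `x` is NULL or has length 0, otherwise returns `x`.
# (length(NULL) is already 0; the is.null() check just mirrors the wording
# of the commit message.)
na_if_empty <- function(x, default = NA_character_) {
  if (is.null(x) || length(x) == 0) default else x
}

na_if_empty(NULL)            # NA
na_if_empty(character(0))    # NA
na_if_empty("Salmo trutta")  # "Salmo trutta"
```

Wrapping each per-species result in such a helper keeps one value per requested name, which is what lets the fixed output show `<NA>` for 'Salmo truttaxxxx'.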
@sckott
Contributor

sckott commented May 18, 2017

@PMassicotte can you reinstall and try again? Let me know if it works for you now.

@PMassicotte
Author

Looks like the problem is fixed.

                                                       
library(dplyr)                                         
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rfishbase)                                     

This works

df <- data_frame(
  myname = c("Oreochromis niloticus", "Salmo trutta")
)

df %>%
  mutate(valide_name = validate_names(myname))
#> Warning: FishBase says that 'Salmo trutta' can also be misapplied to other species
#>                     but is returning only the best match.  
#>                     See synonyms('Salmo trutta') for details
#> # A tibble: 2 x 2
#>                  myname           valide_name
#>                   <chr>                 <chr>
#> 1 Oreochromis niloticus Oreochromis niloticus
#> 2          Salmo trutta          Salmo trutta

This works now

df <- data_frame(
  myname = c("Oreochromis niloticus", "Salmo truttaxxxx")
)

df %>%
  mutate(valide_name = validate_names(myname))
#> Warning in check_and_parse(resp): Bad Request (HTTP 400).
#> Warning: no results found for query https://fishbase.ropensci.org/synonyms?
#> SynSpecies=truttaxxxx&SynGenus=Salmo&limit=50&fields=SynGenus%2CSynSpecies
#> %2CValid%2CMisspelling%2CStatus%2CSynonymy%2CCombination%2CSpecCode
#> %2CSynCode%2CCoL_ID%2CTSN%2CWoRMS_ID
#> Warning: No match found for species 'Salmo truttaxxxx'
#> # A tibble: 2 x 2
#>                  myname           valide_name
#>                   <chr>                 <chr>
#> 1 Oreochromis niloticus Oreochromis niloticus
#> 2      Salmo truttaxxxx                  <NA>
                                                       
sessionInfo()                                          
#> R version 3.4.0 (2017-04-21)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 18.1
#> 
#> Matrix products: default
#> BLAS: /usr/lib/openblas-base/libblas.so.3
#> LAPACK: /usr/lib/libopenblasp-r0.2.18.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
#>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
#>  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rfishbase_2.1.2.1 dplyr_0.5.0      
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.10       tidyr_0.6.3        digest_0.6.12     
#>  [4] rprojroot_1.2      assertthat_0.2.0   R6_2.2.1          
#>  [7] jsonlite_1.4       DBI_0.6-1          backports_1.0.5   
#> [10] magrittr_1.5       evaluate_0.10      httr_1.2.1        
#> [13] rlang_0.1.1        stringi_1.1.5      curl_2.6          
#> [16] lazyeval_0.2.0     rmarkdown_1.5.9000 tools_3.4.0       
#> [19] stringr_1.2.0      yaml_2.1.14        compiler_3.4.0    
#> [22] htmltools_0.3.6    knitr_1.16         tibble_1.3.1

Thank you!

@sckott sckott added the bug label May 19, 2017
@sckott
Contributor

sckott commented May 19, 2017

great, glad it works 👍

@sckott sckott closed this as completed May 19, 2017
@cboettig cboettig modified the milestone: 2.2.0 Jul 12, 2017