RB: R package to search for records and download images from the RB (Jardim Botânico do Rio de Janeiro) herbarium
RB is not on CRAN, so please install it from GitHub:
# Install devtools
install.packages("devtools")
# Install finch (YOU MUST INSTALL THE VERSION FROM GITHUB IF USING R ON WINDOWS!)
devtools::install_github("gustavobio/finch")
# Install RB:
devtools::install_github("gustavobio/RB")
RB tools have three main usage scenarios:
- Searching for records in the RB database.
- Navigating images.
- Downloading images.
The main function here is search_rb. The first time you call it, the package will try to
download the DWCA (Darwin Core Archive) dataset provided by the Jardim Botânico
do Rio de Janeiro. This dataset is a little over 100 MB, so the download may take a while. Memory usage
is also high and will likely hurt performance on lower-spec computers:
> library(RB)
> download_rb_data(encoding = "UTF-8")
trying URL 'http://ipt.jbrj.gov.br/jbrj/archive.do?r=jbrj_rb&v=84.109'
downloaded 103.9 MB
Read 691354 rows and 45 (of 45) columns from 0.352 GB file in 00:00:13
Read 625625 rows and 7 (of 7) columns from 0.153 GB file in 00:00:07
> miconias <- search_rb("Miconia albicans")
This call to search_rb returns a data frame with 477 rows and 45 columns, including scientific name, family,
collector, collector notes, dates, determiner, county, state, and so on:
> str(miconias)
'data.frame': 477 obs. of 45 variables:
$ id : chr "urn:catalog:JBRJ:RB:741399" "urn:catalog:JBRJ:RB:1045255" "urn:catalog:JBRJ:RB:1045306" "urn:catalog:JBRJ:RB:1045294" ...
$ type : chr "Collection" "Collection" "Collection" "Collection" ...
$ modified : chr "2014-07-08 00:45:06.721283" "2015-11-30 17:52:29.673752" "2015-11-30 17:52:57.868959" "2015-11-30 19:04:47.934176" ...
$ rightsHolder : chr "" "RB" "RB" "RB" ...
$ institutionCode : chr "RB" "RB" "RB" "RB" ...
$ collectionCode : chr "" "RB" "RB" "RB" ...
$ basisOfRecord : chr "PreservedSpecimen" "PreservedSpecimen" "PreservedSpecimen" "PreservedSpecimen" ...
$ occurrenceID : chr "urn:catalog:JBRJ:RB:741399" "urn:catalog:JBRJ:RB:1045255" "urn:catalog:JBRJ:RB:1045306" "urn:catalog:JBRJ:RB:1045294" ...
$ catalogNumber : chr "RB00741399" "RB01045255" "RB01045306" "RB01045294" ...
$ recordNumber : chr "431" "488" "551" "538" ...
$ recordedBy : chr "N.L. Britton; H.H. Rusby" "P. Rosa; T.S.Pereira; A. Pintor & JF.A. Baumgratz" "P. Rosa; T.S.Pereira; A. Pintor & JF.A. Baumgratz" "P. Rosa; T.S.Pereira; A. Pintor & JF.A. Baumgratz" ...
...
The first argument of search_rb is a scientific name. See the help file (?search_rb) for a list of all possible arguments. You can combine them to refine your search:
# By scientific name and year
> miconias_2015 <- search_rb("Miconia albicans", year = 2015)
> dim(miconias_2015)
[1] 13 45
# By genus and county
> miconias_itirapina <- search_rb(genus = "Miconia", county = "Itirapina")
> dim(miconias_itirapina)
[1] 24 45
# By scientific name and collector
> myrcias_van <- search_rb("Myrcia guianensis", collector = "Staggemeier")
> dim(myrcias_van)
[1] 3 45
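Because search_rb returns an ordinary data frame, the results can also be summarized or narrowed down with base R after the search. The sketch below is only an illustration: it does not call the package, and instead builds a small stand-in data frame with made-up values, reusing two column names from the str() output above (catalogNumber and recordedBy).

```r
# Stand-in for a search_rb() result: a hand-made data frame that reuses two
# column names shown in the str() output. All values here are invented.
records <- data.frame(
  catalogNumber = c("RB0000001", "RB0000002", "RB0000003", "RB0000004"),
  recordedBy = c("Collector A", "Collector B", "Collector A", "Collector A"),
  stringsAsFactors = FALSE
)

# Count records per collector, most frequent first.
per_collector <- sort(table(records$recordedBy), decreasing = TRUE)

# Keep only the records from the most frequent collector.
top_collector <- names(per_collector)[1]
top_records <- records[records$recordedBy == top_collector, ]
```

The same subsetting pattern should apply to a real result such as miconias, as long as the column names match those reported by str().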
You can also open images in the default browser by passing the results of search_rb to open_rb_images:
> open_rb_images(myrcias_van)
This will open the images in new tabs in your browser. The maximum number of images is set by the argument max, which defaults to 5:
> open_rb_images(myrcias_van, max = 15)
You can also tweak image resolution using the argument width (the default width is 600 pixels):
> open_rb_images(myrcias_van, width = 3000)
If you are just browsing images (for instance, to check plant characteristics), you can do the search directly in open_rb_images:
> open_rb_images(scientific_name = "Miconia albicans")
This will open the first 5 matching images in the database. If you want 5 random images instead, use the argument random:
> open_rb_images(scientific_name = "Miconia albicans", random = TRUE, width = 3000) # files here will be large
Images can also be downloaded and stored locally, in folders under your current working directory. Use getwd() to see where that is, and setwd() to change it.
> getwd()
[1] "/Users/gustavo"
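If you would rather keep downloads out of your home directory, one option is to switch to a dedicated folder first and switch back afterwards. This is a minimal base R sketch, not part of the package: the folder name rb_downloads is hypothetical, and tempdir() is used only so the example does not touch real files.

```r
# Hypothetical target folder; replace tempdir() with a real path in practice.
target_dir <- file.path(tempdir(), "rb_downloads")
dir.create(target_dir, showWarnings = FALSE, recursive = TRUE)

# setwd() invisibly returns the previous working directory, so keep it.
old_dir <- setwd(target_dir)

# A call to download_rb_images() would go here.
current_dir <- getwd()

# Restore the original working directory when done.
setwd(old_dir)
```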
The function download_rb_images is the workhorse here. For instance, you can download images from a previous search:
> download_rb_images(myrcias_van)
3 images found. Continue? (y/n): y
|===================================================================| 100%
>
This call downloads the images at the default resolution (in this case, width = 3000) to a local directory.
Please be aware that if your search returns many results, downloading all images can strain the RB servers. Use discretion here. You could, for instance, download all images for a given family:
> download_rb_images(family = "Melastomataceae")
26890 images found. Continue? (y/n): y
Downloading 26890 images to /Users/gustavo/RB_images__30_Jun_17_22_37/:
| | 0%
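One way to go easier on the servers is to cap the number of records before downloading. The sketch below shows the idea with base R on a made-up stand-in data frame. The cap of 100 is arbitrary, and passing a subset of a search_rb result to download_rb_images is an assumption based on the earlier example that passes a full search result, so the download call is left commented out.

```r
# Stand-in for a large search_rb() result; the catalog numbers are invented.
records <- data.frame(
  catalogNumber = sprintf("RB%08d", 1:250),
  stringsAsFactors = FALSE
)

# Arbitrary cap on how many records to download in one go.
max_images <- 100
batch <- head(records, max_images)

# Assumption: a subset of a search result is accepted like the full result.
# download_rb_images(batch)
```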