BioSys is a data warehouse for biological survey data run by the Western Australian Department of Biodiversity, Conservation and Attractions (DBCA).
BioSys itself is accessible to DBCA staff behind a single sign-on (SSO) firewall. The BioSys API can be reached in two ways: by staff through a GUI behind the same SSO firewall, and from scripts, where both read and write access are protected by basicauth using a BioSys username and password. The BioSys API documentation provides a graphical API browser.
The BioSys API returns JSON dictionaries of projects, datasets, records, and other entities.
If a data consumer wishes to analyse data in a statistical package like R, the data need to be transformed from a nested list of lists (JSON) into a two-dimensional tabular structure.
The main purpose of this R package, somewhat uncreatively named biosysR, is to facilitate accessing and using BioSys data by providing helpers to access the API and flatten the API outputs into a tidy dplyr::tibble.
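To illustrate what "flattening" means here, the following is a minimal base R sketch and not the biosysR implementation. It assumes a parsed JSON response shaped as a list of named lists; the object `parsed`, its field values, and the helper `null_to_na` are hypothetical and exist only for this example.

```r
# Hypothetical stand-in for a parsed JSON response: a list of records,
# each a named list. The values are illustrative, not real BioSys data.
parsed <- list(
  list(id = 1, name = "Project A", code = "PA", site_count = 41),
  list(id = 2, name = "Project B", code = "PB", site_count = NULL)
)

# Replace NULL (JSON null) with NA so every record yields exactly one row
null_to_na <- function(x) lapply(x, function(v) if (is.null(v)) NA else v)

# Turn each record into a one-row data frame, then stack the rows
rows <- lapply(parsed, function(rec) {
  as.data.frame(null_to_na(rec), stringsAsFactors = FALSE)
})
flat <- do.call(rbind, rows)

print(flat)
```

The result is a plain two-dimensional data frame with one row per record and one column per field. Real API responses are more deeply nested than this toy example, which is where the package helpers come in.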
Install biosysR from GitHub:
# install.packages("devtools")
devtools::install_github("parksandwildlife/biosysR")
library(biosysR)
The BioSys API is only accessible with basicauth using a valid BioSys username and password. To get up and running, execute the following commands with your own BioSys username and password:
Sys.setenv(BIOSYS_UN = "USERNAME")
Sys.setenv(BIOSYS_PW = "PASSWORD")
See the package vignette for a comprehensive run-down on BioSys API authentication and setup options.
All examples assume that authentication credentials are available as environment variables.
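As a hedged sketch of how a script might check these environment variables before calling the API, the helper below reads `BIOSYS_UN` and `BIOSYS_PW` and stops early if either is missing. The function `get_biosys_credentials` is hypothetical and not part of biosysR.

```r
# Hypothetical helper, not part of biosysR: read the credentials from the
# environment variables BIOSYS_UN and BIOSYS_PW and fail early if unset.
get_biosys_credentials <- function() {
  un <- Sys.getenv("BIOSYS_UN", unset = "")
  pw <- Sys.getenv("BIOSYS_PW", unset = "")
  if (!nzchar(un) || !nzchar(pw)) {
    stop("Set BIOSYS_UN and BIOSYS_PW before calling the BioSys API.")
  }
  list(username = un, password = pw)
}
```

Calling `get_biosys_credentials()` at the top of a script surfaces missing credentials as a clear error rather than as a failed request later on. Defining the two variables in `~/.Renviron` instead of calling `Sys.setenv()` keeps the password out of scripts and version control.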
projects <- biosysR::biosys_projects()
dplyr::glimpse(projects)
#> Observations: 7
#> Variables: 13
#> $ id <chr> "1", "2", "3", "4", "7", "6", "5"
#> $ name <chr> "Berkeley Incidental Records", "Kimberley Is...
#> $ code <chr> "BER", "KI", "LCI", "KNC", "PRS", "SBS", "SCTI"
#> $ description <chr> "Incidental mainland records captured as par...
#> $ site_count <int> 41, 163, 208, 27, 104, 0, 64
#> $ dataset_count <int> 3, 9, 13, 10, 3, 2, 8
#> $ record_count <int> 154, 42118, 14561, 1621, 726, 3696, 3730
#> $ longitude <dbl> 127.8207, 125.5086, 126.8049, 128.5613, NA, ...
#> $ latitude <dbl> -14.48498, -14.60075, -15.54562, -16.08126, ...
#> $ datum <chr> "4326", "4326", "4326", "4326", "4326", "432...
#> $ timezone <chr> "Australia/Perth", "Australia/Perth", "Austr...
#> $ site_data_package <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL]
#> $ custodians <list> [2, 2, 2, 2, 2, [2, 9, 12], 2]
datasets <- biosysR::biosys_datasets(project_id = 6)
dplyr::glimpse(datasets)
#> Observations: 48
#> Variables: 7
#> $ id <chr> "101", "107", "118", "30", "45", "99", "108", "11...
#> $ record_count <int> 4582, 38, 3307, 33, 414, 426, 95, 42, 23, 1163, 6...
#> $ data_package <list> [["tabular-data-package", "BioSys Config", "anim...
#> $ name <chr> "Animal Observations", "Animal Observations", "An...
#> $ type <chr> "species_observation", "species_observation", "sp...
#> $ description <chr> "", "", "", "", "", "", "", "", "", "", "", "", "...
#> $ project_id <int> 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 3, 3, 4, 5, 2...
records <- biosysR::biosys_records(project_id = 6)
dplyr::glimpse(records)
#> Observations: 3,696
#> Variables: 54
#> $ id <chr> "147647", "147648", "147649", "147650",...
#> $ datetime <chr> "2016-03-17T16:00:00Z", "2016-03-17T16:...
#> $ species_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
#> $ name_id <chr> "-1", "-1", "-1", "-1", "-1", "-1", "-1...
#> $ file_name <chr> "SBY_2016-03_Seagrass_biosys.csv", "SBY...
#> $ file_row <chr> "2", "3", "4", "5", "6", "7", "8", "9",...
#> $ last_modified <chr> "2017-09-20T08:34:13.411874Z", "2017-09...
#> $ dataset <chr> "126", "126", "126", "126", "126", "126...
#> $ site <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
#> $ Impact <list> ["", "Epiphyte", "", "Epiphyte", "", "...
#> $ Level3Class <list> ["SAND", "RUBBLE", "Posidonia spp.", "...
#> $ RecordNo <list> ["0", "1", "2", "3", "4", "5", "6", "7...
#> $ Level4Class <list> ["SAND", "RUBBLE", "Posidonia australi...
#> $ Level2Class <list> ["SAND", "RUBBLE", "Posidoniaceae", "P...
#> $ Replicate <list> ["Transect 1", "Transect 1", "Transect...
#> $ Level1ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Level3ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Latitude <list> ["-26.20245", "-26.20245", "-26.20245"...
#> $ TotalPoints <list> ["6", "6", "6", "6", "6", "6", "6", "6...
#> $ SubstrateCode <list> ["SBC22022011171457536", "SBC220220111...
#> $ ZoneCode <list> ["SBY-GUZ-WG", "SBY-GUZ-WG", "SBY-GUZ-...
#> $ Level2ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Zone <list> ["Western Gulf", "Western Gulf", "West...
#> $ Level4ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Baseclassmodifiers <list> ["NA", "Rubble", "NA", "NA", "NA", "Ru...
#> $ Date <list> ["18/03/2016", "18/03/2016", "18/03/20...
#> $ SubstrateModifier <list> ["No relief", "No relief", "No relief"...
#> $ Level5Class <list> ["SAND", "RUBBLE", "Posidonia australi...
#> $ Level1Class <list> ["SAND", "RUBBLE", "SEAGRASS", "SEAGRA...
#> $ ImpactCode <list> ["", "IMP15042011111556150", "", "IMP1...
#> $ ClassLevel <list> ["Level 1", "Level 1", "Level 4", "Lev...
#> $ Substrate <list> ["Sand", "Sand", "Sand", "Sand", "Sand...
#> $ FeatureType <list> ["Point", "Point", "Point", "Point", "...
#> $ Region <list> ["Shark Bay Marine Park", "Shark Bay M...
#> $ Analysis <list> ["Random 6 pts", "Random 6 pts", "Rand...
#> $ PointNo <list> ["0", "1", "2", "3", "4", "5", "0", "1...
#> $ RegionCode <list> ["SBY", "SBY", "SBY", "SBY", "SBY", "S...
#> $ ImageName <list> ["SBY-GUZ-WG-ULS-T1-L_20160318134127",...
#> $ Survey <list> ["SBY-GUZ-WG-ULS-T1-2016318134127", "S...
#> $ Longitude <list> ["113.46631", "113.46631", "113.46631"...
#> $ Time <list> ["1:41:27 PM", "1:41:27 PM", "1:41:27 ...
#> $ BaseclassmodifiersCode <list> ["NA", "BMC22022011163936550", "NA", "...
#> $ SectorCode <list> ["SBY-GUZ", "SBY-GUZ", "SBY-GUZ", "SBY...
#> $ Sector <list> ["General Use Zone", "General Use Zone...
#> $ ImageNo <list> ["0", "0", "0", "0", "0", "0", "1", "1...
#> $ Projection <list> ["+proj=longlat +ellps=WGS84 +no_defs"...
#> $ Level5ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Site <list> ["Useless Loop South", "Useless Loop S...
#> $ Month <list> ["March", "March", "March", "March", "...
#> $ SubstrateModifierCode <list> ["SBM22022011171457583", "SBM220220111...
#> $ Year <list> ["2016", "2016", "2016", "2016", "2016...
#> $ CameraSide <list> ["Left", "Left", "Left", "Left", "Left...
#> $ SiteCode <list> ["SBY-GUZ-WG-ULS", "SBY-GUZ-WG-ULS", "...
#> $ ReplicateCode <list> ["SBY-GUZ-WG-ULS-T1", "SBY-GUZ-WG-ULS-...
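Most record fields above arrive as list columns of character values, which many analysis functions do not accept directly. The following base R sketch shows one possible way to simplify such columns and convert types. The data frame `records_mini` is a hypothetical two-row miniature built by hand from values shown in the output above, and `simplify_col` is an illustrative helper, not a biosysR function.

```r
# Hypothetical miniature of the records table: list columns holding one
# character value per row, mirroring the glimpse() output above.
records_mini <- data.frame(
  datetime = c("2016-03-17T16:00:00Z", "2016-03-17T16:00:00Z"),
  stringsAsFactors = FALSE
)
records_mini$Latitude    <- list("-26.20245", "-26.20245")
records_mini$Longitude   <- list("113.46631", "113.46631")
records_mini$Level1Class <- list("SAND", "RUBBLE")

# Unlist a list column only if every cell holds exactly one value;
# anything else is returned unchanged.
simplify_col <- function(col) {
  if (is.list(col) && all(lengths(col) == 1L)) unlist(col) else col
}
records_mini[] <- lapply(records_mini, simplify_col)

# Convert coordinates to numeric and the ISO 8601 timestamp to POSIXct (UTC)
records_mini$Latitude  <- as.numeric(records_mini$Latitude)
records_mini$Longitude <- as.numeric(records_mini$Longitude)
records_mini$datetime  <- as.POSIXct(
  records_mini$datetime,
  format = "%Y-%m-%dT%H:%M:%SZ",
  tz = "UTC"
)

str(records_mini)
```

Columns whose cells contain zero or several values are left as list columns by this sketch, so nothing is silently dropped or recycled.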
If the BioSys API is not accessible, the package supplies a sample of the available data as the datasets projects, datasets, and records.
data(projects)
data(datasets)
data(records)
dplyr::glimpse(projects)
#> Observations: 7
#> Variables: 13
#> $ id <chr> "1", "2", "3", "4", "7", "6", "5"
#> $ name <chr> "Berkeley Incidental Records", "Kimberley Is...
#> $ code <chr> "BER", "KI", "LCI", "KNC", "PRS", "SBS", "SCTI"
#> $ description <chr> "Incidental mainland records captured as par...
#> $ site_count <int> 41, 163, 208, 27, 104, 0, 64
#> $ dataset_count <int> 3, 9, 13, 10, 3, 2, 8
#> $ record_count <int> 154, 42118, 14561, 1621, 726, 3696, 3730
#> $ longitude <dbl> 127.8207, 125.5086, 126.8049, 128.5613, NA, ...
#> $ latitude <dbl> -14.48498, -14.60075, -15.54562, -16.08126, ...
#> $ datum <chr> "4326", "4326", "4326", "4326", "4326", "432...
#> $ timezone <chr> "Australia/Perth", "Australia/Perth", "Austr...
#> $ site_data_package <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL]
#> $ custodians <list> [2, 2, 2, 2, 2, [2, 9, 12], 2]
dplyr::glimpse(datasets)
#> Observations: 48
#> Variables: 7
#> $ id <chr> "101", "107", "118", "30", "45", "99", "108", "11...
#> $ record_count <int> 4582, 38, 3307, 33, 414, 426, 95, 42, 23, 1163, 6...
#> $ data_package <list> [["tabular-data-package", "BioSys Config", "anim...
#> $ name <chr> "Animal Observations", "Animal Observations", "An...
#> $ type <chr> "species_observation", "species_observation", "sp...
#> $ description <chr> "", "", "", "", "", "", "", "", "", "", "", "", "...
#> $ project_id <int> 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 3, 3, 4, 5, 2...
dplyr::glimpse(head(records))
#> Observations: 6
#> Variables: 54
#> $ id <chr> "147647", "147648", "147649", "147650",...
#> $ datetime <chr> "2016-03-17T16:00:00Z", "2016-03-17T16:...
#> $ species_name <chr> NA, NA, NA, NA, NA, NA
#> $ name_id <chr> "-1", "-1", "-1", "-1", "-1", "-1"
#> $ file_name <chr> "SBY_2016-03_Seagrass_biosys.csv", "SBY...
#> $ file_row <chr> "2", "3", "4", "5", "6", "7"
#> $ last_modified <chr> "2017-09-20T08:34:13.411874Z", "2017-09...
#> $ dataset <chr> "126", "126", "126", "126", "126", "126"
#> $ site <chr> NA, NA, NA, NA, NA, NA
#> $ Impact <list> ["", "Epiphyte", "", "Epiphyte", "", ""]
#> $ Level3Class <list> ["SAND", "RUBBLE", "Posidonia spp.", "...
#> $ RecordNo <list> ["0", "1", "2", "3", "4", "5"]
#> $ Level4Class <list> ["SAND", "RUBBLE", "Posidonia australi...
#> $ Level2Class <list> ["SAND", "RUBBLE", "Posidoniaceae", "P...
#> $ Replicate <list> ["Transect 1", "Transect 1", "Transect...
#> $ Level1ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Level3ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Latitude <list> ["-26.20245", "-26.20245", "-26.20245"...
#> $ TotalPoints <list> ["6", "6", "6", "6", "6", "6"]
#> $ SubstrateCode <list> ["SBC22022011171457536", "SBC220220111...
#> $ ZoneCode <list> ["SBY-GUZ-WG", "SBY-GUZ-WG", "SBY-GUZ-...
#> $ Level2ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Zone <list> ["Western Gulf", "Western Gulf", "West...
#> $ Level4ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Baseclassmodifiers <list> ["NA", "Rubble", "NA", "NA", "NA", "Ru...
#> $ Date <list> ["18/03/2016", "18/03/2016", "18/03/20...
#> $ SubstrateModifier <list> ["No relief", "No relief", "No relief"...
#> $ Level5Class <list> ["SAND", "RUBBLE", "Posidonia australi...
#> $ Level1Class <list> ["SAND", "RUBBLE", "SEAGRASS", "SEAGRA...
#> $ ImpactCode <list> ["", "IMP15042011111556150", "", "IMP1...
#> $ ClassLevel <list> ["Level 1", "Level 1", "Level 4", "Lev...
#> $ Substrate <list> ["Sand", "Sand", "Sand", "Sand", "Sand...
#> $ FeatureType <list> ["Point", "Point", "Point", "Point", "...
#> $ Region <list> ["Shark Bay Marine Park", "Shark Bay M...
#> $ Analysis <list> ["Random 6 pts", "Random 6 pts", "Rand...
#> $ PointNo <list> ["0", "1", "2", "3", "4", "5"]
#> $ RegionCode <list> ["SBY", "SBY", "SBY", "SBY", "SBY", "S...
#> $ ImageName <list> ["SBY-GUZ-WG-ULS-T1-L_20160318134127",...
#> $ Survey <list> ["SBY-GUZ-WG-ULS-T1-2016318134127", "S...
#> $ Longitude <list> ["113.46631", "113.46631", "113.46631"...
#> $ Time <list> ["1:41:27 PM", "1:41:27 PM", "1:41:27 ...
#> $ BaseclassmodifiersCode <list> ["NA", "BMC22022011163936550", "NA", "...
#> $ SectorCode <list> ["SBY-GUZ", "SBY-GUZ", "SBY-GUZ", "SBY...
#> $ Sector <list> ["General Use Zone", "General Use Zone...
#> $ ImageNo <list> ["0", "0", "0", "0", "0", "0"]
#> $ Projection <list> ["+proj=longlat +ellps=WGS84 +no_defs"...
#> $ Level5ClassCode <list> ["CBC22022011163926270", "CBC220220111...
#> $ Site <list> ["Useless Loop South", "Useless Loop S...
#> $ Month <list> ["March", "March", "March", "March", "...
#> $ SubstrateModifierCode <list> ["SBM22022011171457583", "SBM220220111...
#> $ Year <list> ["2016", "2016", "2016", "2016", "2016...
#> $ CameraSide <list> ["Left", "Left", "Left", "Left", "Left...
#> $ SiteCode <list> ["SBY-GUZ-WG-ULS", "SBY-GUZ-WG-ULS", "...
#> $ ReplicateCode <list> ["SBY-GUZ-WG-ULS-T1", "SBY-GUZ-WG-ULS-...
See the vignette for in-depth examples of authenticating, transforming, analysing and visualising BioSys data. (Note: work in progress)
vignette("biosysR")
Every contribution, constructive feedback, or suggestion is welcome!
Send us your ideas and requests as issues or submit a pull request.
Pull requests should eventually pass tests and checks (not introducing new ERRORs, WARNINGs or NOTEs apart from the "New CRAN package" NOTE):
devtools::document()
devtools::test()
pkgdown::build_site()
devtools::check(check_version = TRUE, force_suggests = TRUE, cran = TRUE)
Code coverage is automatically calculated and reported from TravisCI. To manually submit code coverage reports, run:
Sys.setenv(CODECOV_TOKEN=Sys.getenv("BIOSYS_CODECOV_TOKEN"))
covr::codecov()