# Exploring OGSL.ca/bio/ data using `robis` and `obistools`

`robis` and `obistools` are two packages maintained by the developers at OBIS. `robis` gives users access to the contents of the OBIS database, and `obistools` provides QA/QC checks for data held on your local machine.

In [None]:
# obistools is a github-only install
library(devtools)
# QC helper tools
devtools::install_github('iobis/obistools')
# Access and visualization of OBIS-held data
devtools::install_github('iobis/robis')

In [None]:
# Get your source file(s) from ogsl.ca/bio/ and put them into a data frame
library(tidyverse)

# You may want to use latin1 as the locale so that it handles accents correctly.
data <- read_csv('data/data_20200119-1640_fd181d2d/export.csv', locale = locale(encoding = "latin1"))

In [None]:
# Begin to explore it

# print a summary of the dataframe
data

In [None]:
# print the column names of the dataframe
names(data)

In [None]:
# explore the contents of individual columns to help with classification.

unique(data$"Institution propriétaire")
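
Listing unique values one column at a time gets tedious on a wide export. As a rough sketch, the number of distinct values per column can be tabulated in one pass to spot which columns are categorical. The `toy` data frame and its values are invented for illustration and stand in for the real export.

```r
# Toy stand-in for the export; the values are invented for illustration.
toy <- data.frame(
  Taxon = c("Morue", "Morue", "Crabe"),
  Collection = c("A", "A", "A"),
  stringsAsFactors = FALSE
)

# Number of distinct values in each column
n_distinct_values <- sapply(toy, function(col) length(unique(col)))
n_distinct_values
```

Columns with only a handful of distinct values are good candidates for a controlled-vocabulary mapping.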

In [None]:
# From exploring the data downloaded via ogsl.ca/bio, the occurrence data from the portal
# appears to be completely mappable to Darwin Core (DwC) and ready for OBIS ingestion,
# either as Occurrence or as Occurrence + MeasurementOrFact (MoF).

# Proposed column mapping (source column -> Darwin Core term):
#
#  Date + Format             -> map timezone + format to create ISO-8601 as eventDate
#  Emplacement                -> locality ? Basis of higherGeography? [Emplacement]
#  Longitude / Latitude       -> decimalLongitude, decimalLatitude, footprintWKT, geodeticDatum
#  Taxon                      -> vernacularName
#  Nom Latin                  -> scientificName
#  Nombre d'individus         -> individualCount
#  Poids                      -> dynamicProperties{weightInGrams:[Poids].toGrams()}
#                                     and/or MeasurementOrFact entry w/ weight(s)
#  Présence                   -> occurrenceStatus = present  /  absent
#  Biomasse                   ->      if mass in percentage : organismQuantity: [Biomasse] w/ organismQuantityType %biomass
#                                     else if by total mass:  dynamicProperties{weightInGrams:[Biomasse]}
#                                     else: MeasurementOrFact entry w/ individuals + weight(s)
#
#  Densité                    ->  != individualCount  , perhaps dynamicProperties{density:[Densité]}
#  Couverture                 ->  ?
#
#  Méthode d'échantillonnage  -> basisOfRecord map{ Visuel, (Chalut, Trappe à anguilles + other fishing methods) -> HumanObservation,
#                                    ?? -> LivingSpecimen,
#                                    ?? -> MachineObservation }
#  Provenance                 -> establishmentMeans map{ Exotique Envahissante -> invasive,
#                                                        Exotique Naturalisée  -> naturalized,
#                                                                              -> introduced,
#                                                                              -> managed,}
#  Collection                  -> Title? DatasetName?
#  Institution propriétaire    -> institutionCode
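
The categorical mappings in the notes above can be expressed as lookup tables. The following is a minimal sketch in base R. The source values `Visuel`, `Chalut`, `Trappe à anguilles`, `Exotique Envahissante` and `Exotique Naturalisée` come from the notes, while the helper `map_values` and the sample vectors are hypothetical. Values without a known mapping become `NA` so they can be reviewed by hand.

```r
# Lookup tables taken from the mapping notes
basis_lookup <- c(
  "Visuel"             = "HumanObservation",
  "Chalut"             = "HumanObservation",
  "Trappe à anguilles" = "HumanObservation"
)
means_lookup <- c(
  "Exotique Envahissante" = "invasive",
  "Exotique Naturalisée"  = "naturalized"
)

# Hypothetical helper: translate values through a lookup, NA when unmapped
map_values <- function(x, lookup) unname(lookup[x])

sampling_methods <- c("Visuel", "Chalut", "Sonar")    # "Sonar" is invented
provenance <- c("Exotique Envahissante", "Indigène")  # "Indigène" is invented

mapped_basis <- map_values(sampling_methods, basis_lookup)
mapped_means <- map_values(provenance, means_lookup)
mapped_basis
mapped_means
```

Keeping the vocabulary in a named vector makes it easy to extend once the `??` entries in the notes are resolved.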

### Migrating Columns to OBIS

In [None]:
library(lubridate) # Semi-intelligent, timezone-aware date parsing. A format could also be supplied manually

# Rename the columns that need straight renaming
obis_df <- rename(data, vernacularName=Taxon, locality=Emplacement, 
                  decimalLatitude=Latitude, decimalLongitude=Longitude, 
                  individualCount='Nombre d\'individus', scientificName='Nom Latin', 
                  Title=Collection, institutionCode='Institution propriétaire') %>%
  # Mutate to create the columns that need calculation steps
  mutate(eventDate = date(Date), basisOfRecord = 'HumanObservation', 
         occurrenceStatus='present', 
         # Depths are constants applied to every record
         minimumDepthInMeters=-150,     
         maximumDepthInMeters=0)  
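
The mapping notes call for combining the date with its format and timezone to produce an ISO 8601 value. Here is a minimal base R sketch, assuming a timestamp written as `YYYY-MM-DD HH:MM` in local Eastern time. Both the sample value and the `America/Toronto` zone are assumptions, not taken from the export.

```r
raw_date <- "2020-01-19 16:40"  # invented sample value

# Parse using an explicit format and timezone (both assumed)...
parsed <- as.POSIXct(raw_date, format = "%Y-%m-%d %H:%M", tz = "America/Toronto")

# ...then render as an ISO 8601 date-time in UTC
event_date <- format(parsed, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
event_date
```

Converting to UTC before formatting avoids having to write a local offset by hand.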




### Getting a Data Report 

In [None]:
# Produce a report in html format evaluating your dataset's fitness for OBIS. 
obistools::report(obis_df)

In [None]:
# We don't have matching IDs for our scientific names. We should go get them!
# Match the taxa using the World Register of Marine Species
matched_taxa <- obistools::match_taxa(obis_df$scientificName)

In [None]:
matched_taxa

In [None]:
# assign the matches back to the dataframe
obis_df <- mutate(obis_df, scientificNameID = matched_taxa$scientificNameID)
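
Not every name is guaranteed to find a match, so it is worth checking for missing identifiers before moving on. The sketch below uses a hand-built data frame, `mock_matches`, shaped like the `scientificName` / `scientificNameID` pair used above. The unmatched name and the placeholder identifiers are invented for illustration and are not real WoRMS LSIDs.

```r
# Mock of a name-matching result; the IDs are placeholders, not real LSIDs
mock_matches <- data.frame(
  scientificName = c("Gadus morhua", "Not a real taxon", "Homarus americanus"),
  scientificNameID = c("placeholder-id-1", NA, "placeholder-id-2"),
  stringsAsFactors = FALSE
)

# Names that still lack an identifier and need manual review
unmatched <- unique(mock_matches$scientificName[is.na(mock_matches$scientificNameID)])
unmatched
```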

In [None]:
# Check the report again.
obistools::report(obis_df)

### Spatial Checks

Check whether any points fall on land (i.e. at positive elevation) rather than at sea.

In [None]:
onland <- obistools::check_onland(obis_df)
onland
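
Alongside the on-land check, a quick range test catches coordinates that cannot be valid at all. This is a base R sketch on invented points and is not part of `obistools`; `coords` and `in_range` are hypothetical names.

```r
# Invented points: one valid, one with a bad latitude, one with a bad longitude
coords <- data.frame(
  decimalLongitude = c(-68.5, -68.5, -200),
  decimalLatitude  = c(48.4, 95, 10)
)

# TRUE where both coordinates fall inside the valid ranges
in_range <- with(coords,
  decimalLongitude >= -180 & decimalLongitude <= 180 &
  decimalLatitude  >=  -90 & decimalLatitude  <=  90
)

# Rows that need attention
coords[!in_range, ]
```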

In [None]:
# obistools::plot_map(onland, zoom=TRUE) # no points found to be on land, so nothing to plot!
obistools::plot_map(obis_df, zoom = TRUE) # plot the whole dataset instead.

In [None]:
depthreport <- obistools::check_depth(obis_df, report=TRUE)

In [None]:
depthreport
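
Some depth problems can be spotted without any bathymetry data. The following base R sketch flags negative depths and ranges where the minimum exceeds the maximum. It illustrates the idea only and is not how `check_depth` is implemented; the `depths` data frame is invented.

```r
# Invented depth ranges: one plausible, one negative, one inverted
depths <- data.frame(
  minimumDepthInMeters = c(0, -150, 80),
  maximumDepthInMeters = c(150, 0, 20)
)

negative <- depths$minimumDepthInMeters < 0 | depths$maximumDepthInMeters < 0
inverted <- depths$minimumDepthInMeters > depths$maximumDepthInMeters

# Label each row with the first problem found, or NA when none applies
depths$issue <- ifelse(negative, "negative depth",
                ifelse(inverted, "min greater than max", NA))
depths
```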

### Check presence / contents of date fields

In [None]:
obistools::check_eventdate(obis_df)
# To see what is reported when the eventDate column is missing entirely, try:
# obistools::check_eventdate(obis_df %>% select(-eventDate))

In [None]:
# What does it look like when there are malformed dates?
data_badformats <- data.frame(
  eventDate = c(
    "2016/01/02",
    "2016-01-02 13h00"),
  stringsAsFactors = FALSE)

obistools::check_eventdate(data_badformats)
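
To see why both values count as malformed, here is a deliberately simplified pattern for ISO 8601 dates and date-times. It is an illustration only: the helper `looks_iso8601` is hypothetical, covers just a few common forms, and is not the logic `check_eventdate` uses.

```r
# Hypothetical helper: TRUE for a few common ISO 8601 forms, FALSE otherwise
looks_iso8601 <- function(x) {
  # YYYY-MM-DD, optionally followed by Thh:mm[:ss] and Z or a +hh:mm offset
  pattern <- paste0(
    "^[0-9]{4}-[0-9]{2}-[0-9]{2}",
    "(T[0-9]{2}:[0-9]{2}(:[0-9]{2})?(Z|[+-][0-9]{2}:[0-9]{2})?)?$"
  )
  grepl(pattern, x)
}

iso_ok <- looks_iso8601(c("2016/01/02", "2016-01-02 13h00", "2016-01-02T13:00:00Z"))
iso_ok
```

The first value uses slashes instead of hyphens, and the second separates date and time with a space and writes the time as `13h00`, so neither fits the pattern.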

### Questions for our hands-on sessions:
* What sort of metadata do we collect about each collection?
* What is the format of the data as it's found in the ogsl.ca system? 
* Is there an even more direct conversion/mapping than this one?
* Could some of your data providers benefit from access to this sort of instant feedback about their data?
* Where do / could checks such as these occur in your biodiversity data ingestion pipeline?