# Accessing OBIS through R: `robis` package

`robis` is the main gateway for OBIS data through R. You can learn more about `robis` in the [OBIS manual.](https://manual.obis.org/access.html#r-package)

Hands-on outline:

1. Downloading occurrence records
2. Getting additional information about datasets
3. Downloading records for a particular region
4. Getting species lists
5. Filtering data
6. Obtaining time series
7. Getting extended measurements

Before starting, we need to install a few packages on Google Colab:

In [None]:
# Installation takes approximately 5 minutes, but you only need to do it once:
# the packages stay available to the other R notebooks while the session is alive.
install.packages("bspm")
suppressMessages(bspm::enable())
install.packages("robis")
install.packages("arrow")
install.packages("rnaturalearth")
install.packages("DBI")
install.packages("duckdb")
install.packages("h3jsr")
install.packages("sf")
install.packages("gifski")
devtools::install_github("iobis/obistools")
devtools::install_github("ropensci/mregions2")

### 1. Downloading occurrence records

To download records from OBIS we use the function `occurrence`. There are many arguments you can pass to download data for a specific species, taxonomic level or region.

We will start by getting data for three taxonomic entities:

<div style="display: flex; flex-direction: row; max-height: 200px; padding: 5px;">
<div>
<p>Species: <i>Lytechinus variegatus</i></p><img src="https://upload.wikimedia.org/wikipedia/commons/f/f5/Lytechinus_variegatus.jpg" height=200></img>
</div>
<div>
<p>Genus: <i>Lytechinus</i></p><img src="https://upload.wikimedia.org/wikipedia/commons/f/f5/Lytechinus_semituberculatus_12770656.jpg" height=200></img>
</div>
<div>
<p>Family: Toxopneustidae</p><img src="https://upload.wikimedia.org/wikipedia/commons/5/51/Toxopneustes_pileolus_%28Sea_urchin%29.jpg" height=200></img>
</div>
</div>

In [None]:
library(robis)
library(dplyr)

lych_var <- occurrence("Lytechinus variegatus")

nrow(lych_var)
head(lych_var)

We can make a map of the records. For that, we can use a handy function from the `obistools` package.

In [None]:
library(obistools)

plot_map(lych_var)

OBIS was created for, and is optimized for, marine data. It matches its taxonomic names (the species identity) with the World Register of Marine Species (WoRMS). Each species (or any other taxonomic level) has a unique ID called **AphiaID**. You can see, for example, the entry for _Lytechinus variegatus_ [here.](https://www.marinespecies.org/aphia.php?p=taxdetails&id=367850)

This is important because species names can change over time, for example when a species is reclassified into another group or when we discover that what was treated as one species is in fact several.

The `occurrence` function also accepts a `taxonid`. So, what we did in the first cell can also be done with:

In [None]:
# This is the same as
# lych_var <- occurrence("Lytechinus variegatus")
lych_var <- occurrence(taxonid = 367850)
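
Because the AphiaID is a plain integer, it is easy to build the link to the matching WoRMS page. The sketch below assumes that the URL pattern of the link above also holds for other IDs; `aphia_url` is a hypothetical helper, not a `robis` function.

```r
# Hypothetical helper: build the WoRMS taxon page URL for an AphiaID.
# Assumes the URL pattern of the Lytechinus variegatus link above.
aphia_url <- function(aphia_id) {
    sprintf("https://www.marinespecies.org/aphia.php?p=taxdetails&id=%d",
            as.integer(aphia_id))
}

aphia_url(367850)
```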

Let's try the other taxonomic levels:

In [None]:
lych_genus <- occurrence("Lytechinus")

nrow(lych_genus)
table(lych_genus$scientificName)

toxop_fam <- occurrence("Toxopneustidae")

nrow(toxop_fam)
table(toxop_fam$scientificName)

### 2. Getting dataset information

Records are organized in **datasets** that group data that was collected in a particular survey, study, monitoring, etc. We can get additional information about datasets using the `dataset` function (which can also be used to list all datasets for a specific `scientificName`).

In [None]:
# Get the number of records by dataset
lych_var_ds <- lych_var |> 
    group_by(dataset_id) |> 
    summarise(records = n())

# Keep the ID of the dataset with the highest number of records
high_n <- lych_var_ds[order(lych_var_ds$records, decreasing = TRUE), "dataset_id"][1,]

ds_info <- dataset(datasetid = high_n$dataset_id)

head(ds_info[,1:5])
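
The counting step can be tried without a download. The following is a minimal base R sketch on a made-up table (the dataset IDs are invented), showing how the dataset with the most records is picked.

```r
# Toy stand-in for an occurrence table (made-up dataset IDs)
toy_occ <- data.frame(
    dataset_id = c("ds-a", "ds-b", "ds-a", "ds-c", "ds-a", "ds-b"),
    scientificName = "Lytechinus variegatus"
)

# Count records per dataset and sort from largest to smallest
recs_by_ds <- sort(table(toy_occ$dataset_id), decreasing = TRUE)
recs_by_ds

# ID of the dataset contributing the most records
top_ds <- names(recs_by_ds)[1]
top_ds
```

With real data, `top_ds` would be the value passed to `dataset(datasetid = ...)`.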

### 3. Downloading records for a region

There are two ways to download data for a specific region. The first is to use the OBIS regions. You can explore them using the mapper, for example this one: https://mapper.obis.org/?areaid=27

This is the region of Trindade archipelago, in Brazil.

<img src="https://upload.wikimedia.org/wikipedia/commons/c/c2/Simone_Marinho_-_Trindade_-_2010_05_08_edited.jpg" height=200></img>

In [None]:
trindade <- occurrence(areaid = 27)

Another approach is to pass a geometry in Well-Known Text (WKT). You can draw polygons on this website: https://wktmap.com/

In [None]:
wkt_area <- "POLYGON ((-79.189453 27.293689, -79.584961 23.765237, -75.9375 22.43134, -73.959961 24.726875, -74.750977 27.176469, -79.189453 27.293689))"

occ_area <- occurrence(
    scientificname = "Acanthuridae",
    geometry = wkt_area
)

nrow(occ_area)

plot_map_leaflet(occ_area)
# If the leaflet map does not show correctly, use plot_map(occ_area)
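
If you only need a rectangular area, the WKT string can also be written in code instead of drawn. This is a small sketch, not part of `robis`: `bbox_to_wkt` is a hypothetical helper and the coordinates are illustrative.

```r
# Hypothetical helper: WKT polygon for a longitude/latitude bounding box.
# The ring is closed by repeating the first corner at the end.
bbox_to_wkt <- function(xmin, ymin, xmax, ymax) {
    xs <- c(xmin, xmax, xmax, xmin, xmin)
    ys <- c(ymin, ymin, ymax, ymax, ymin)
    paste0("POLYGON ((", paste(xs, ys, collapse = ", "), "))")
}

wkt_box <- bbox_to_wkt(-80, 22, -73, 28)
wkt_box
```

The resulting string could then be passed as `geometry = wkt_box`, in the same way as `wkt_area` above.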


### 4. Get list of species (checklist)

Sometimes we are only interested in knowing which species are present in a region. This information, called a checklist, can be easily obtained through the function `robis::checklist()`. Let's try with this same region.

In [None]:
check_area <- checklist(geometry = wkt_area)

head(check_area)

<span style="color: #cf0f4f;">NOTE: if you try to pass a very complex geometry, all functions will probably fail! Usually the WKT string should be shorter than 1,500 characters (you can check with `nchar(wkt_area)`).</span>
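
One simple way to shorten a WKT string is to drop decimal places that the query does not need. The sketch below is an assumption about what is acceptable for your use case: it truncates (does not round) every coordinate to three decimals, roughly 100 m, using a regular expression in base R.

```r
# The polygon used above
wkt_long <- "POLYGON ((-79.189453 27.293689, -79.584961 23.765237, -75.9375 22.43134, -73.959961 24.726875, -74.750977 27.176469, -79.189453 27.293689))"

# Is the string within the rule-of-thumb limit?
nchar(wkt_long) < 1500

# Keep only the first three decimals of each coordinate
wkt_short <- gsub("(-?[0-9]+\\.[0-9]{3})[0-9]+", "\\1", wkt_long)

nchar(wkt_long)
nchar(wkt_short)
wkt_short
```

For polygons with many vertices, reducing the number of vertices (for example with `sf::st_simplify()`) is likely to help more than trimming decimals.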

### 5. Filtering data

Once you have a dataset, you can filter it any way you want. The sky is the limit! You can, for example, keep only the records from a given period of time. Or you can filter records that come from a specific depth. Or even keep organisms that were sampled in areas with a certain bottom depth.

<span style="color: #cf0f4f;">NOTE: there is a difference between depth, which is the depth at which the organism was recorded, and bathymetry, which is the depth of the sea floor in the area where the organism was recorded.</span>

In [None]:
occ_1998_1999 <- occ_area |> 
    filter(date_year >= 1998 & date_year <= 1999)

head(occ_1998_1999)

occ_d10 <- occ_area |> 
    filter(depth >= 10)

head(occ_d10)

occ_b100 <- occ_area |> 
    filter(bathymetry >= 100)

head(occ_b100)

Check also the [quality flags of OBIS](https://manual.obis.org/dataquality.html) and how you can filter by those.
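
As an illustration of flag-based filtering, here is a sketch on a made-up table. It assumes that the flags come in a `flags` column holding comma-separated flag names such as `ON_LAND` and `NO_DEPTH`; check the column in your own download before relying on this layout. `has_flag` is a hypothetical helper.

```r
# Toy table; the layout of `flags` (comma-separated names) is an assumption
toy_occ <- data.frame(
    id = 1:4,
    flags = c(NA, "NO_DEPTH", "ON_LAND,NO_DEPTH", "ON_LAND"),
    stringsAsFactors = FALSE
)

# TRUE when a record carries the given flag; NA means the record has no flags
has_flag <- function(flags, flag) {
    !is.na(flags) & grepl(flag, flags, fixed = TRUE)
}

# Keep only the records that are not flagged as being on land
clean_occ <- toy_occ[!has_flag(toy_occ$flags, "ON_LAND"), ]
clean_occ$id
```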

### 6. Time-series

OBIS does not (yet) have a direct filter for time series data. As you learned, records can be organized in events, with parentEvents aggregating events of the same survey. But we can also use other techniques to find time series data, such as looking for datasets that sampled on several different dates.

In [None]:
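# Remove duplicated rows, then count the distinct dates per dataset;
# datasets with at least 3 dates are candidates for time series
occ_area_ts <- occ_area |>
    group_by(dataset_id,
             eventDate,
             round(decimalLongitude, 3),
             round(decimalLatitude, 3)) |>
    distinct() |>
    ungroup() |>
    group_by(dataset_id) |>
    distinct(eventDate) |>
    summarise(total = n()) |>
    filter(total >= 3)

# Inspect one candidate (the 4th here; adjust the index to the rows you get)
ds_info <- dataset(datasetid = occ_area_ts$dataset_id[4])
ds_info$title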
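
The idea behind this cell, counting on how many different dates each dataset sampled, can be seen more easily on a small example. This is a base R sketch with invented dataset IDs and dates, not the code used above.

```r
# Toy table with made-up dataset IDs and sampling dates
toy_occ <- data.frame(
    dataset_id = c("ds-a", "ds-a", "ds-a", "ds-a", "ds-b", "ds-b", "ds-c"),
    eventDate  = c("2001-05-01", "2001-05-01", "2002-05-03", "2003-05-02",
                   "2010-01-10", "2010-01-10", "2015-07-20"),
    stringsAsFactors = FALSE
)

# Number of distinct sampling dates per dataset
n_dates <- tapply(toy_occ$eventDate, toy_occ$dataset_id,
                  function(x) length(unique(x)))
n_dates

# Datasets sampled on at least three different dates
ts_candidates <- names(n_dates)[n_dates >= 3]
ts_candidates
```

Only `ds-a` has repeated sampling here, so it is the only candidate for a time series.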


A special case is that of tracking data. You can learn more about tracking data in OBIS on the Ocean Tracking Network [website](https://oceantrackingnetwork.org/). Those datasets contain multiple measurements for the same organism, each at a different point in time.

Let's download one of those [datasets from OTN](https://obis.org/dataset/78bf6b7f-555c-4bf7-8d81-a766c5bc736e).

In [None]:
otn_dataset <- occurrence(datasetid = "78bf6b7f-555c-4bf7-8d81-a766c5bc736e")

otn_dataset |> 
    select(scientificName, organismID, organismName, eventDate) |> 
    slice_head(n = 5)

As you can see, each organism has a unique ID and name that we can use to track all records pertaining to that animal.

In [None]:
# Get the number of records by organism
ind_recs <- otn_dataset |> 
    group_by(organismID, eventDate, decimalLongitude, decimalLatitude) |> 
    distinct(.keep_all = T) |> 
    ungroup() |> group_by(organismID) |> 
    summarise(total = n()) |> 
    arrange(desc(total))

# Select the individual with the most records
blue_shark1 <- otn_dataset |> 
    filter(organismID == ind_recs$organismID[1])
head(blue_shark1)

# Arrange by date and plot
blue_shark1 <- blue_shark1 |> 
    group_by(eventDate, decimalLongitude, decimalLatitude) |> 
    distinct() |> 
    ungroup() |> 
    mutate(eventDate = lubridate::as_date(eventDate)) |> 
    arrange(eventDate)

plot(blue_shark1[,c("decimalLongitude", "decimalLatitude")])
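
Once positions are sorted by date, the distance between consecutive positions gives a rough idea of how far the animal moved between records. The sketch below uses the haversine formula on a made-up track; `haversine_km` is a hypothetical helper and the mean Earth radius of 6371 km is an assumption of the spherical approximation.

```r
# Great-circle distance (km) between points given in decimal degrees,
# using the haversine formula on a sphere of radius 6371 km
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
    to_rad <- pi / 180
    dlat <- (lat2 - lat1) * to_rad
    dlon <- (lon2 - lon1) * to_rad
    a <- sin(dlat / 2)^2 +
        cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
    2 * radius * asin(sqrt(a))
}

# Made-up track, already sorted by date
track <- data.frame(
    decimalLongitude = c(0, 1, 1),
    decimalLatitude  = c(0, 0, 1)
)

# Distance from each position to the next one
n <- nrow(track)
step_km <- haversine_km(
    track$decimalLongitude[-n], track$decimalLatitude[-n],
    track$decimalLongitude[-1], track$decimalLatitude[-1]
)
round(step_km, 1)
```

The same call should work on `blue_shark1` after it has been arranged by `eventDate`.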


In [None]:

# Create an animated plot of movement
wrld <- rnaturalearth::ne_countries(returnclass = "sf")
wrld <- sf::st_as_sfc(wrld[,1])
lims_lon <- range(blue_shark1$decimalLongitude)
lims_lat <- range(blue_shark1$decimalLatitude)
lims_lon <- lims_lon + c(-0.1, 0.1)
lims_lat <- lims_lat + c(-0.1, 0.1)
coords <- c("decimalLongitude", "decimalLatitude")

png_path <- file.path(tempdir(), "frame%03d.png")
png(png_path)
for (i in seq_len(nrow(blue_shark1))) {
    plot(wrld, xlim = lims_lon, ylim = lims_lat, col = "grey70", main = blue_shark1$eventDate[i])
    points(blue_shark1[i,coords], col = "#044c98", pch = 20, cex = 2)
    if (i == 2) {
        points(blue_shark1[(i-1),coords], col = "#044b9889", pch = 20, cex = 2)
    } else if (i == 3) {
        points(blue_shark1[(i-1),coords], col = "#044b9889", pch = 20, cex = 2)
        points(blue_shark1[(i-2),coords], col = "#044b982f", pch = 20, cex = 2)
    } else if (i > 3) {
        points(blue_shark1[(i-1),coords], col = "#044b9889", pch = 20, cex = 2)
        points(blue_shark1[(i-2),coords], col = "#044b982f", pch = 20, cex = 2)
        points(blue_shark1[(i-3),coords], col = "#044b9812", pch = 20, cex = 2)
    }
}
dev.off()
png_files <- sprintf(png_path, seq_len(nrow(blue_shark1)))
gif_file <- tempfile(fileext = ".gif")
gifski::gifski(png_files, gif_file, delay = .1)
unlink(png_files)
utils::browseURL(gif_file)
# Note: if the file doesn't open, the line below saves a copy that you will
# see in the file explorer on the left side of Colab (refresh if needed).
# Then, just download it to your computer.
file.copy(gif_file, "movement.gif")

### 7. Extended measurements

OBIS is not only occurrence data! In fact, OBIS has more than 180 million measurements associated with records. Those can hold information like abundance of organisms, size, weight, or environmental measurements like temperature and salinity. To learn more about the extended Measurements or Facts (eMoF) [click here.](https://manual.obis.org/format_emof.html)

To get the eMoF we use the `occurrence` function with the option `mof = TRUE`. It will download a rather complicated format with a column containing lists. The `robis` package comes with the function `unnest_extension` to help us. 

In [None]:
# Code by Elizabeth Lawrence
canspp <- occurrence(areaid = 34, startdepth = 500, mof = TRUE, startdate = "2020-01-01")
# Expand the nested MeasurementOrFact column, carrying eventDate along
hi <- unnest_extension(df = canspp, extension = "MeasurementOrFact", fields = "eventDate")
# Convert the date and the measurement values to proper types
hi$eventDate <- as.Date(hi$eventDate)
hi$measurementValue <- as.numeric(hi$measurementValue)
# Keep a single type of measurement
filtered_data <- hi |>
    filter(measurementType == "Standardized weight of total capture for 15 min tow")

head(filtered_data)
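
To see what "a column containing lists" means, here is a conceptual base R sketch on a made-up table. It is not how `unnest_extension` is implemented; the column name `mof` and the values are invented for illustration.

```r
# Toy occurrence table with a list column: one small table per record
toy_occ <- data.frame(id = c("occ1", "occ2"), stringsAsFactors = FALSE)
toy_occ$mof <- list(
    data.frame(measurementType = c("length", "weight"),
               measurementValue = c("12.5", "300"),
               stringsAsFactors = FALSE),
    data.frame(measurementType = "length",
               measurementValue = "9.1",
               stringsAsFactors = FALSE)
)

# Bind the nested tables into one long table, carrying the record id along
flat <- do.call(rbind, lapply(seq_len(nrow(toy_occ)), function(i) {
    data.frame(id = toy_occ$id[i], toy_occ$mof[[i]], stringsAsFactors = FALSE)
}))

# Values arrive as text, so convert them before doing any arithmetic
flat$measurementValue <- as.numeric(flat$measurementValue)
flat
```

Each measurement ends up on its own row, which is the shape that makes filtering by `measurementType` straightforward.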