# Accessing OBIS data

OBIS holds more than 136 million records from over 5,000 datasets, along with more than 181 million measurements. This large volume of data drives research across a diverse range of topics.

Here we will explore how to access OBIS data through **R**, but all the operations can also be done through **Python**. Check the [`pyobis` module]() and the other notebooks in this repo. We will focus on getting the different types of data rather than on their direct application to, or integration with, remote sensing, although a few examples will be provided.

----

`robis` is the main gateway for OBIS data through R. You can learn more about `robis` in the [OBIS manual](https://manual.obis.org/access.html#r-package).

Hands-on outline:

1. Downloading occurrence records
2. Getting species lists
3. Events and time series
4. Extended measurements

In a second notebook, we will explore two other products:

1. Full export 
2. Gridded product 

Before starting, we need to install a few packages on Google Colab (if you are using Binder, just skip this cell):

In [None]:
# On your local computer you can use install.packages() for everything.
# Here we install some packages through the system package manager (apt),
# as installation with r2u is faster.
system("apt-get install r-cran-sf r-cran-terra") # Spatial packages
system("apt-get install r-cran-robis")           # OBIS API interface
system("apt-get install r-cran-arrow")           # To deal with full export
system("apt-get install r-cran-DBI")             # To deal with full export
system("apt-get install r-cran-duckdb")          # To deal with full export
system("apt-get install r-cran-rnaturalearth")   # For mapping
system("apt-get install r-cran-h3jsr")           # For indexing (H3 system)

# Enable bspm, then install additional packages from GitHub
system("apt-get install r-cran-bspm")
bspm::enable()
devtools::install_github("iobis/obistools")
devtools::install_github("ropensci/mregions2")

We also need to download some sample data.

In [None]:
# Install the AWS CLI, to get speciesgrids
system('curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"')
system('unzip awscliv2.zip')
system('sudo ./aws/install')
system('rm awscliv2.zip')
system('rm -r aws')

# Download some data that we will be using in this practical
#### Download species grids
fs::dir_create("speciesgrids")
system("aws s3 cp --recursive s3://obis-products/speciesgrids/h3_7 speciesgrids --no-sign-request")
#### Download full export sample

#### Download Copernicus data sample

### 1. Downloading occurrence records

To download records from OBIS we use the function `occurrence`. There are many arguments you can pass to download data for a specific species, taxonomic level or region.

We will start by getting data for three taxonomic entities:

<div style="display: flex; flex-direction: row; max-height: 200px; padding: 5px;">
<div>
<p>Species: <i>Lytechinus variegatus</i></p><img src="https://upload.wikimedia.org/wikipedia/commons/f/f5/Lytechinus_variegatus.jpg" height=200></img>
</div>
<div>
<p>Genus: <i>Lytechinus</i></p><img src="https://upload.wikimedia.org/wikipedia/commons/f/f5/Lytechinus_semituberculatus_12770656.jpg" height=200></img>
</div>
<div>
<p>Family: Toxopneustidae</p><img src="https://upload.wikimedia.org/wikipedia/commons/5/51/Toxopneustes_pileolus_%28Sea_urchin%29.jpg" height=200></img>
</div>
</div>

In [None]:
library(robis)
library(dplyr)

lych_var <- occurrence("Lytechinus variegatus")

nrow(lych_var)
head(lych_var)

We can make a map of the records. For that, we can use a handy function from the `obistools` package.

In [None]:
library(obistools)

plot_map(lych_var)
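Besides a name, `occurrence` accepts filters such as `geometry` (a WKT string) and `startdate`. The sketch below shows one possible way to restrict a query to a bounding box. The helpers `bbox_to_wkt` and `occurrence_in_bbox` are hypothetical names introduced here, not part of `robis`, and the bounding box used in the usage note is purely illustrative.

```r
# Hypothetical helper (not part of robis): turn a bounding box into a
# WKT polygon, the text format accepted by the `geometry` argument.
bbox_to_wkt <- function(xmin, ymin, xmax, ymax) {
  sprintf(
    "POLYGON ((%s %s, %s %s, %s %s, %s %s, %s %s))",
    xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, xmin, ymin
  )
}

# Hypothetical wrapper: records of one taxon inside a bounding box,
# optionally restricted to records from `startdate` onwards.
# `bbox` is a vector ordered as (xmin, ymin, xmax, ymax).
occurrence_in_bbox <- function(name, bbox, startdate = NULL) {
  robis::occurrence(
    scientificname = name,
    geometry = do.call(bbox_to_wkt, unname(as.list(bbox))),
    startdate = startdate
  )
}
```

For example, `occurrence_in_bbox("Lytechinus variegatus", c(-90, 20, -80, 30), startdate = as.Date("2000-01-01"))` would request only records inside that box from the year 2000 onwards. Filtering on the server side like this is usually faster than downloading everything and subsetting afterwards.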

OBIS stands out in that it was created for, and is optimized for, marine data. It matches taxonomic names (the species identity) against the World Register of Marine Species (WoRMS). Each species (or any other taxonomic level) has a unique ID called **AphiaID**. You can see, for example, the entry for _Lytechinus variegatus_ [here](https://www.marinespecies.org/aphia.php?p=taxdetails&id=367850).

This is important because species names can change over time, for example when a species is reclassified into another group or when what was thought to be one species turns out to be several.

The `occurrence` function also accepts a `taxonid`. So, what we did in the first cell can also be done with:

In [None]:
# This is the same as
# lych_var <- occurrence("Lytechinus variegatus")
lych_var <- occurrence(taxonid = 367850)
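To see why a stable ID matters, it can help to compare the names as originally supplied by data providers with the accepted name they were matched to. The sketch below is one way to do that. It assumes the output of `occurrence` carries the columns `originalScientificName`, `scientificName` and `aphiaID` (any that are absent are simply skipped), and `name_variants` is a hypothetical helper introduced here.

```r
# Hypothetical helper: list the distinct combinations of the name as
# provided by the data publisher, the matched name, and the AphiaID.
# The column names are assumptions about the `occurrence()` output;
# columns that are not present are skipped.
name_variants <- function(occ) {
  cols <- intersect(
    c("originalScientificName", "scientificName", "aphiaID"),
    names(occ)
  )
  variants <- unique(as.data.frame(occ)[, cols, drop = FALSE])
  rownames(variants) <- NULL
  variants
}
```

Calling `name_variants(lych_var)` would then return one row per distinct name combination found in the downloaded records.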

Let's try the other taxonomic levels:

In [None]:
lych_genus <- occurrence("Lytechinus")

nrow(lych_genus)
table(lych_genus$scientificName)

toxop_fam <- occurrence("Toxopneustidae")

nrow(toxop_fam)
table(toxop_fam$scientificName)

Records are organized in **datasets**, each grouping data collected in a particular survey, study, monitoring program, etc. We can get additional information about datasets using the `dataset` function (which can also be used to list all datasets for a specific `scientificName`).

In [None]:
# Get the number of records by dataset
lych_var_ds <- lych_var |> 
    group_by(dataset_id) |> 
    summarise(records = n())

# Dataset IDs ordered from most to fewest records
high_n <- lych_var_ds[order(lych_var_ds$records, decreasing = TRUE), "dataset_id"]

ds_info <- dataset(datasetid = high_n$dataset_id)

head(ds_info[,1:5])
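The record counts and the dataset metadata can be combined so that each count is shown next to a readable title. The sketch below is a minimal example using base R. It assumes the table returned by `dataset` has `id` and `title` columns, and `label_datasets` is a hypothetical helper introduced here.

```r
# Hypothetical helper: attach dataset titles to per-dataset record counts.
# Assumes `counts` has columns `dataset_id` and `records`, and that `info`
# (the output of `dataset()`) has columns `id` and `title`.
label_datasets <- function(counts, info) {
  labelled <- merge(
    as.data.frame(counts),
    as.data.frame(info)[, c("id", "title")],
    by.x = "dataset_id", by.y = "id",
    all.x = TRUE  # keep datasets even if no metadata was returned
  )
  # Largest datasets first
  labelled <- labelled[order(labelled$records, decreasing = TRUE), ]
  rownames(labelled) <- NULL
  labelled
}
```

With the objects created above, `head(label_datasets(lych_var_ds, ds_info))` would list the datasets contributing the most records together with their titles.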