# Using AphiaIDs to download OBIS occurrences for Bio/Eco EOVs

Created: 2024-09-13

The Global Ocean Observing System (GOOS) is a global network of ocean observing systems led by the Intergovernmental Oceanographic Commission of UNESCO. NOAA's U.S. Integrated Ocean Observing System (IOOS) is a part of this global network. GOOS has developed Essential Ocean Variables (EOVs) to help harmonize data that is collected across various ocean observing systems around the globe. The [GOOS Biology and Ecosystems Variables](https://goosocean.org/what-we-do/framework/essential-ocean-variables/) are focused on the abundance and distribution of specific groups of aquatic organisms that are important for ecosystems. 

The [IOOS Marine Life Data Network](https://ioos.github.io/marine_life_data_network/) has developed [lists of biological taxa](https://github.com/ioos/marine_life_data_network/tree/main/eov_taxonomy) and their identifiers (in this case, AphiaIDs) that can be used to query published biological occurrence data in the [Ocean Biodiversity Information System](https://www.obis.org/) (OBIS).

OBIS uses the [World Register of Marine Species](https://marinespecies.org) (WoRMS) to provide a taxonomic backbone for all of the records in the database, and WoRMS only contains marine species. Therefore, the lists developed by the IOOS Marine Life Data Network leverage the taxonomic scope of WoRMS to perform high-level queries without requiring a detailed list of every species of seabird, for example. Once a query is complete, users can begin analyzing OBIS occurrence data to assess the abundance, distribution, and other characteristics of the taxa in these BioEco Variables.

This notebook provides an example in R of how to use the Marine Life Data Network's list of AphiaIDs for the GOOS BioEco Variables to perform an OBIS query. It uses mangroves as the example, but the query could be edited for any of the BioEco EOVs.

In [None]:
library(readr)
library(robis)
library(dplyr)
library(htmlwidgets)

First, we will read the file with the mangrove AphiaIDs from the Marine Life Data Network GitHub repo.

**Note**: the acceptedTaxonIds in these files are based on what was up-to-date in the WoRMS database as of the date this script was written.

In [5]:
mangroves <- read.csv("https://raw.githubusercontent.com/ioos/marine_life_data_network/main/eov_taxonomy/mangroves.csv")

Now we will do a bit of cleanup to get a list of AphiaIDs for mangroves so we can run our [`robis`](https://iobis.github.io/robis/articles/getting-started.html) query using these as taxon identifiers.

In [3]:
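# Strip the LSID prefix so that only the AphiaID remains
mangroves$ID <- gsub("urn:lsid:marinespecies.org:taxname.", "", mangroves$acceptedTaxonId)
# Convert the remaining text to numbers
mangroves$ID <- as.numeric(mangroves$ID)
# Collapse all IDs into one comma-separated string for the query
mangroveIdentifiers <- paste(mangroves$ID, collapse = ", ")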
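
To make the cleanup step easier to follow, here is a minimal offline sketch of the same idea on a small vector. It assumes the identifiers follow the usual WoRMS LSID form `urn:lsid:marinespecies.org:taxname:<AphiaID>`; the numeric values below are made up for illustration and are not taken from the mangrove file. The names `lsids`, `ids`, and `id_string` are hypothetical. The sketch uses `as.integer()` rather than `as.numeric()` because R can print round doubles such as `100000` in scientific notation when they are pasted into a string.

```r
# Illustrative LSIDs only; the numeric parts are made up for this sketch.
lsids <- c(
  "urn:lsid:marinespecies.org:taxname:111111",
  "urn:lsid:marinespecies.org:taxname:222222",
  "urn:lsid:marinespecies.org:taxname:222222"
)

# Keep only the trailing run of digits after the last colon (the AphiaID).
ids <- as.integer(sub("^.*:([0-9]+)$", "\\1", lsids))

# Fail early if any identifier did not parse as a number.
stopifnot(!anyNA(ids))

# Drop duplicates, then build the comma-separated string used in the query.
ids <- unique(ids)
id_string <- paste(ids, collapse = ", ")
id_string
```

Checking for `NA` values before building the string is a cheap way to notice if an identifier in the file does not match the expected pattern.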

Using the AphiaIDs from the last step, let's query OBIS for mangrove occurrence data. This step may take some time to run: when this script was written, there were over 130,000 mangrove records in OBIS.

**Note**: You can edit this query to download less data if you are not planning to use every field. Here is how that might look.

`mangrove_occ <- robis::occurrence(taxonid = mangroveIdentifiers, fields = c("occurrenceID", "species", "decimalLongitude", "decimalLatitude", "date_year"))`

In [None]:
mangrove_occ <- robis::occurrence(taxonid = mangroveIdentifiers)
# let's check how many occurrences we got from OBIS
nrow(mangrove_occ)
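
Because the download can be slow, one option is to cache the result on disk and reuse it in later sessions. The sketch below is one possible way to do that with base R's `saveRDS()` and `readRDS()`. The helper `cached()` and the stand-in `fake_fetch()` are hypothetical names introduced here so the example runs offline; they are not part of `robis`.

```r
# Hypothetical helper: run `fetch` only when no cached copy exists at `path`.
cached <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Stand-in for the slow OBIS call so that the sketch runs without a network.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(species = c("Avicennia germinans", "Rhizophora mangle"))
}

cache_file <- tempfile(fileext = ".rds")
first <- cached(cache_file, fake_fetch)   # runs fake_fetch and writes the cache
second <- cached(cache_file, fake_fetch)  # reads the cache; no second fetch
```

In this notebook, the same pattern might look like `mangrove_occ <- cached("mangrove_occ.rds", function() robis::occurrence(taxonid = mangroveIdentifiers))`. Delete the cache file whenever you want a fresh download, since OBIS is updated over time.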

Now that we have all of our mangrove records from OBIS, we will map the global distribution of records using the `map_leaflet` function from the `robis` package. With the leaflet functionality, you can zoom in on records and click them to see the scientific name for each occurrence record.

In [None]:
m <- map_leaflet(mangrove_occ,
                 provider_tiles = "Esri.WorldGrayCanvas",
                 popup = function(x) { x["scientificName"] })
m

This next step is not required, but if you'd like to save this map to view it outside of R, here's how.

In [None]:

saveWidget(m, "mangroveMap.html", selfcontained = TRUE)

From here, you could further subset the data by species or year to run more detailed analyses of mangrove biology. You could also use the map of mangrove occurrences to identify geographic gaps where more occurrence data might be needed, or to identify errors in the data. Do you see the dot on the mangrove map in the Arctic?
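
As a rough illustration of those follow-up steps, the sketch below subsets by year, counts records per species, and flags records at unexpectedly high latitudes. It runs on a small made-up data frame (`occ`) standing in for `mangrove_occ`, using the column names from the `fields` example earlier (`species`, `decimalLatitude`, `decimalLongitude`, `date_year`). The year 2000 and the 40-degree latitude cutoff are arbitrary choices for illustration, not recommended thresholds.

```r
# Hypothetical stand-in for `mangrove_occ`; all values are made up.
occ <- data.frame(
  species = c("Rhizophora mangle", "Rhizophora mangle",
              "Avicennia germinans", "Avicennia germinans"),
  decimalLatitude = c(25.1, 78.2, -12.4, 18.9),
  decimalLongitude = c(-80.4, 15.6, 130.8, -66.1),
  date_year = c(1998, 2005, 2012, 2020)
)

# Subset to records from 2000 onward, skipping records with no year.
recent <- occ[!is.na(occ$date_year) & occ$date_year >= 2000, ]

# Count records per species in the subset.
species_counts <- table(recent$species)

# Flag records beyond an illustrative 40-degree latitude cutoff for review.
suspect <- occ[!is.na(occ$decimalLatitude) & abs(occ$decimalLatitude) > 40, ]

species_counts
suspect
```

Flagged records are candidates for a closer look rather than confirmed errors; checking the source dataset for each one is the safer next step.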

The key point of this notebook is that, because you used AphiaIDs to search across all families and genera of known mangrove species, the query should capture all of the mangrove data currently published to OBIS. The same approach can be repeated for any of the other lists of AphiaIDs for BioEco EOVs, and it is a starting point for many possible analyses.

For more information and code about how to get EOV data from OBIS, see the [NOAA GIS For The Ocean GitHub project](https://github.com/NOAA-GIS4Ocean/BioEco_EOV).