<img src="obis.jpg"/>
<h1> OBIS RESTful API Walkthrough</h1>
<p>The OBIS REST API can be accessed with the Python <b>requests</b> library.</p>
<p>The recommended version of Python is <b>Python 3.7+</b>, which can be downloaded from <a href="https://python.org/">python.org</a>.<br>The required Python libraries are listed in the accompanying environment.yml file.</p>
<hr>
<h2> Python Imports </h2>
<p>Always run the cell below to initialize your Python environment. The requests package is used to call the OBIS API, which returns JSON objects.<br>
    More information about requests can be found <a href="https://requests.readthedocs.io/">here</a>.</p>

In [None]:
# Import requests and set the OBIS API base URL. 
import requests
import json
import pandas as pd
import urllib.parse  # provides urlencode for building query strings

# Convenience function to pretty print JSON objects
def print_json(myjson):
    print(json.dumps(
        myjson,
        sort_keys=True,
        indent=4,
        separators=(',', ': ')
    ))
    

# Initialize the base URL for OBIS (note the trailing slash). This variable will be used for every API call
OBIS_URL = "https://api.obis.org/v3/"

## OCCURRENCE
The OBIS occurrence API takes several query parameters that determine which occurrence records are returned.

<h4> Designing Query Strings </h4>
<p>Long query strings are error-prone and difficult to debug when an issue occurs.

Using a Python dictionary along with <b>urllib</b>, we can simplify the process of creating complicated URL queries.</p>


In [None]:
# Here is a complete dictionary of terms we'll use to create our occurrence query string.
# The list includes definitions from https://api.obis.org/#/Occurrence/get_occurrence with a few expanded where needed.
query = {
    "scientificname": None,  # (string) Scientific name. Leave as None to include all taxa.
    "taxonid": None,  # (string) Taxon AphiaID.
    "datasetid": None,  # (string) Dataset UUID.
    "areaid": None,  # (string) Area ID.
    "instituteid": None,  # (string) Institute ID.
    "nodeid": None,  # (string) Node UUID.
    "startdate": None,  # (string) Start date formatted as YYYY-MM-DD.
    "enddate": None,  # (string) End date formatted as YYYY-MM-DD.
    "startdepth": None,  # (integer) Start depth, in meters.
    "enddepth": None,  # (integer) End depth, in meters.
    "geometry": None,  # (string) Geometry, formatted as WKT.
    "redlist": None,  # (boolean) Red List species only, true/false.
    "hab": None,  # (boolean) Harmful algal bloom (HAB) species only, true/false.
    "mof": 'true',  # (boolean) Include measurements, true/false.
    "measurementtype": None,  # (string) Measurement type to be present for occurrence.
    "measurementtypeid": None,  # (string) Measurement type ID to be present for occurrence.
    "measurementvalue": None,  # (string) Measurement value to be present for occurrence.
    "measurementvalueid": None,  # (string) Measurement value ID to be present for occurrence.
    "measurementunit": None,  # (string) Measurement unit to be present for occurrence.
    "measurementunitid": None,  # (string) Measurement unit ID to be present for occurrence.
    "exclude": None,  # (string) Comma-separated list of flags to exclude.
    "fields": None,  # (string) Comma-separated list of fields to include in the record set. Leave as None to return all fields.
    "after": None,  # (string) Occurrence UUID up to which to skip.
    "size": 10,  # (integer) Response size: how many results to return.
}

Note: When setting booleans for RESTful APIs in Python, use the lowercase strings 'true' or 'false' instead of Python's built-in True/False values, which would be sent as 'True'/'False'.
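Dropping unset (None) parameters and converting Python booleans to lowercase strings can be folded into one small helper. This is a minimal sketch: `obis_query_string` is a hypothetical name introduced here, not part of any library, and it only builds the string, which can then be appended to a URL and passed to `requests.get` as in the cells below.

```python
import urllib.parse


def obis_query_string(params):
    """Build a URL query string for the OBIS API (hypothetical helper).

    - Parameters whose value is None are dropped.
    - Python booleans are sent as the lowercase strings 'true'/'false'.
    - Everything else is converted to text by urlencode.
    """
    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue  # unset parameter: leave it out of the URL
        if isinstance(value, bool):
            value = 'true' if value else 'false'
        cleaned[key] = value
    return urllib.parse.urlencode(cleaned)


# None is dropped, True becomes 'true', and the space is encoded as '+'
print(obis_query_string({"scientificname": "Salmo salar", "taxonid": None,
                         "mof": True, "size": 10}))
# → scientificname=Salmo+salar&mof=true&size=10
```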

In [None]:
# Use a dictionary comprehension to remove any None (NULL) values
query = {key: value for key, value in query.items() if value is not None}
query_str = urllib.parse.urlencode(query, doseq=False)

# Construct the query URL and show the output
occur_query_string = f'{OBIS_URL}occurrence?{query_str}'
occur_query_string

In [None]:
# Now let's use our query string to get occurrence data

# occurrence?
req = requests.get(occur_query_string)
req.json()['results']

In [None]:
# Looking over the data output, it's clear we need to further refine our query to get only the results we need.

# Rewrite the query to return all 2014 Atlantic salmon occurrence records
query = {"scientificname": "Salmo salar",
         "startdate": "2014-01-01",
         "enddate": "2014-12-31",
         # "fields": "decimalLatitude,decimalLongitude",  # optionally limit the returned fields
        }
query = urllib.parse.urlencode(query, doseq=False)

# occurrence?
req = requests.get(f'{OBIS_URL}occurrence?{query}')
req.json()  # returns 30 Atlantic salmon occurrences
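Each call returns at most `size` records. The query dictionary earlier lists an `after` parameter ("Occurrence UUID up to which to skip"), which suggests a paging pattern: request a page, then pass the ID of its last record as `after` for the next page. The sketch below assumes each record stores its occurrence UUID in an `id` field and that an empty or short page marks the end; `iter_occurrences` and the `fetch` callable are hypothetical names. Passing the fetch function in keeps the paging logic independent of the HTTP library.

```python
def iter_occurrences(fetch, params, page_size=100, max_records=None):
    """Yield occurrence records page by page using the 'after' parameter.

    fetch(params) must return the decoded JSON of one /occurrence call.
    Assumption: each record has an 'id' field holding its occurrence UUID.
    """
    after = None
    yielded = 0
    while True:
        page_params = dict(params, size=page_size)
        if after is not None:
            page_params["after"] = after
        results = fetch(page_params).get("results", [])
        if not results:
            return  # no more records
        for record in results:
            yield record
            yielded += 1
            if max_records is not None and yielded >= max_records:
                return
        after = results[-1]["id"]  # assumed UUID field of the last record
        if len(results) < page_size:
            return  # a short page means the end was reached
```

In the notebook, `fetch` could be `lambda p: requests.get(f'{OBIS_URL}occurrence', params=p).json()`. Note that `params` must be a dictionary; the cell above turns `query` into an encoded string, so rebuild the dictionary first.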

### Using Occurrence Calls to Return Geometry
OBIS's occurrence API includes additional endpoints that can take your initial query and return geometry for the matching records.

 - `/occurrence/centroid` - returns a central point for the queried occurrence records.
 - `/occurrence/grid/{precision}` - returns a grid, at the given precision, covering all queried occurrence records.
 - `/occurrence/grid/{precision}/kml` - returns the same grid in KML format.
 - `/occurrence/grid/points` - returns the unique points of all occurrence records matched by the query.
 - `/occurrence/grid/point/{x}/{y}` - returns only the occurrence records at a specific location.
 - `/occurrence/grid/point/{x}/{y}/{z}` - returns only the occurrence records at a specific location and depth.
 - `/occurrence/tile/{x}/{y}/{z}` - presumably returns occurrence data for a single web-map tile (x/y tile coordinates at zoom level z).
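Because these endpoints take the same query as `/occurrence`, one set of filters can be sent to any of them. A minimal sketch: `occurrence_url` is a hypothetical helper that only builds the URL and does not check that a given sub-path exists.

```python
import urllib.parse


def occurrence_url(base_url, subpath="", **filters):
    """Build an OBIS occurrence URL such as .../occurrence/centroid?... (hypothetical helper).

    base_url: e.g. "https://api.obis.org/v3/" (trailing slash optional).
    subpath:  e.g. "centroid" or "grid/3"; empty for plain /occurrence.
    filters:  occurrence query parameters; None values are dropped.
    """
    path = "occurrence" + (f"/{subpath.strip('/')}" if subpath else "")
    query = urllib.parse.urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{base_url.rstrip('/')}/{path}"
    return f"{url}?{query}" if query else url


print(occurrence_url("https://api.obis.org/v3/", "centroid",
                     scientificname="Salmo salar", startdate="2014-01-01"))
# → https://api.obis.org/v3/occurrence/centroid?scientificname=Salmo+salar&startdate=2014-01-01
```

For example, `requests.get(occurrence_url(OBIS_URL, "grid/3", scientificname="Salmo salar")).json()` would request the grid for the same filters; the precision of 3 is an arbitrary example value.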

## TAXON
Provides basic WoRMS functionality for retrieving taxonomic data.

For a more complete suite of queries, use the WoRMS RESTful API directly. Examples of using WoRMS can be found here: <a href="Worms API Walkthrough.ipynb"> WoRMS Walkthrough</a>

In [None]:
# Let's query the taxon endpoint using the scientific name Salmo salar (Atlantic salmon)

scientific_name = 'Salmo salar'

# taxon/
req = requests.get(f'{OBIS_URL}taxon/{scientific_name}')
req.json()

In [None]:
# We get back a taxonID, which is the same as a WoRMS AphiaID
# Now feed that ID into the same taxon call
taxonID = 127186

# taxon/
req = requests.get(f'{OBIS_URL}taxon/{taxonID}')
req.json()

The taxon API call works with both scientific names and taxon IDs.
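The two taxon calls above can be chained so the ID does not have to be copied by hand. A sketch under two assumptions: the scientific name is percent-encoded in the URL path (it contains a space), and the name lookup returns a `results` list whose first record stores the AphiaID under `taxonID`, as the comment in the cell above suggests. Both helper names are hypothetical.

```python
import urllib.parse


def taxon_url(base_url, name_or_id):
    """Build a /taxon/{name or id} URL, percent-encoding the path segment."""
    segment = urllib.parse.quote(str(name_or_id), safe="")
    return f"{base_url.rstrip('/')}/taxon/{segment}"


def first_taxon_id(taxon_json):
    """Return the taxonID of the first record in a /taxon response, or None.

    Assumption: the response looks like {"results": [{"taxonID": ...}, ...]}.
    """
    results = taxon_json.get("results") or []
    return results[0].get("taxonID") if results else None


print(taxon_url("https://api.obis.org/v3/", "Salmo salar"))
# → https://api.obis.org/v3/taxon/Salmo%20salar
```

In the notebook this could be used as `first_taxon_id(requests.get(taxon_url(OBIS_URL, 'Salmo salar')).json())`, and the result passed back into `taxon_url`.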

## Checklist
Returns taxonomic records matching certain criteria. All of the query terms are described in depth on OBIS's website:
<a href="https://api.obis.org/#/Checklist"> here </a>

In [None]:
# Given a taxonID value, the checklist call behaves the same as the taxon API call
taxonID = 127186

# checklist?
req = requests.get(f'{OBIS_URL}checklist', params={'taxonid': taxonID})
print_json(req.json())

In [None]:
# We can change the query to include only records listed on the IUCN Red List

# checklist/redlist
req = requests.get(f'{OBIS_URL}checklist/redlist')
print_json(req.json())

In [None]:
# With the taxonID included, we can also check whether Atlantic salmon is on the IUCN Red List

# checklist/redlist?taxonid=
req = requests.get(f'{OBIS_URL}checklist/redlist', params={'taxonid': taxonID})
print_json(req.json())
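The Red List check above can be reduced to a yes/no answer. A minimal sketch, assuming that `/checklist/redlist?taxonid=...` returns a `total` of 0 (or an empty `results` list) when the taxon is not on the Red List; `is_redlisted` is a hypothetical helper that works on the decoded JSON, so it can be tried without a network call.

```python
def is_redlisted(checklist_json):
    """Return True if a /checklist/redlist response contains any records.

    Assumption: a zero 'total' or empty 'results' means "not on the Red List".
    """
    total = checklist_json.get("total")
    if total is not None:
        return total > 0
    return bool(checklist_json.get("results"))
```

After the cell above, `is_redlisted(req.json())` gives the answer for the chosen `taxonID`.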

In [None]:
# To check which species have been added recently, use the /checklist/newest call
# Note: this call takes more time to process

# /checklist/newest
req = requests.get(f'{OBIS_URL}checklist/newest')
print_json(req.json())

## Node
Get information on an OBIS node given its node ID. If no node ID is given, the *node* API call returns all of the OBIS node records.

In [None]:
# We are not sure which node ID to query, so let's get all of the OBIS nodes.

# node
req = requests.get(f'{OBIS_URL}node')
nodes_json = req.json()

# Count the number of OBIS nodes
nodes_json['total']

In [None]:
# Print all the names and IDs for each node
for node in nodes_json['results']:
    print(f'Name: {node["name"]} - ID: {node["id"]}')
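Instead of reading the ID off the printed list, the node records can be searched by name. A small sketch; `find_ids_by_name` is a hypothetical helper that relies only on the `name` and `id` fields already used in the loop above, so it should also work on the institute results later in this walkthrough, which expose the same two fields.

```python
def find_ids_by_name(records, text):
    """Return (name, id) pairs whose name contains `text` (case-insensitive)."""
    needle = text.lower()
    return [(r["name"], r["id"]) for r in records
            if r.get("name") and needle in r["name"].lower()]
```

For example, `find_ids_by_name(nodes_json['results'], 'ocean tracking')` should list the Ocean Tracking Network node together with its ID.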

In [None]:
# The Ocean Tracking Network (OTN) is an OBIS node, so let's return its record using the ID value:
nodeID = '68f83ea7-69a7-44fd-be77-3c3afd6f3cf8'
# node/{nodeID}
req = requests.get(f'{OBIS_URL}node/{nodeID}')
otn_json = req.json()

# Show OTN's OBIS node record
print_json(otn_json)

### Node Activities
Gets an OBIS node's reported activities.

In [None]:
# Getting the activity records for the Antarctic OBIS node
Antarctic_nodeID = 'dc6c6ea2-83f5-4b18-985a-9efff6320d69'

# node/{id}/activities
req = requests.get(f'{OBIS_URL}node/{Antarctic_nodeID}/activities')
activities = req.json()

# Show Antarctic OBIS node activities
print_json(activities)

## Dataset
Query information about datasets held by OBIS and its nodes.

In [None]:
# Now check how many OBIS-held datasets contain species listed on the IUCN Red List

# /dataset?
req = requests.get(f'{OBIS_URL}dataset', params={'redlist': 'true'})
datasets = req.json()
print('Number of Red List datasets in OBIS:', datasets['total'])

In [None]:
# That's quite a few datasets, but we only want a single specific dataset.
# Using the pandas library we can turn the returned datasets into an easily searchable DataFrame

# Feed the returned JSON's results values into the pandas DataFrame constructor
dataset_df = pd.DataFrame(datasets['results'])

# Say we want to narrow the datasets down to those whose abstracts mention turtles
# (case=False ignores capitalization; na=False treats missing abstracts as non-matches)
turtle_data = dataset_df[dataset_df['abstract'].str.contains('turtle', case=False, na=False)]
turtle_data.head()

In [None]:
# If we know the dataset ID beforehand, we can query OBIS directly for its dataset record
dataset_id = "ca78b5b9-d4e4-4ab0-bbe1-9f75659769e2"

# dataset/id
req = requests.get(f'{OBIS_URL}dataset/{dataset_id}')
req.json()

The **url** value returned in the dataset JSON points to the dataset's location.

## Institute
Returns institution records held in OBIS.

Institution records provide brief summaries including **id**, **name**, **country**, **parent** institution, **children** institutions and the number of **records** held in OBIS.

In [None]:
# Use the code in this cell to get the names and IDs of all the institutions listed on OBIS

# institute/
req = requests.get(f'{OBIS_URL}institute')
for inst in req.json()['results']:
    print(inst['name'], inst['id'])

In [None]:
# It appears that the Ocean Tracking Network's institution ID is 18704
# Let's query this ID directly
institution_id = 18704
# institute/id
req = requests.get(f'{OBIS_URL}institute/{institution_id}')
req.json()

## Area / Country
Area records represent the list of areas reported in OBIS. The country RESTful call returns a list of the country records in the OBIS system.

In [None]:
# Let's get all of the OBIS area records

# /area
req = requests.get(f'{OBIS_URL}area')
req.json()

In [None]:
# And the same for countries

# /country
req = requests.get(f'{OBIS_URL}country')
req.json()

In [None]:
# 59 results? It looks like the OBIS country API call does not return the complete set of country records

# With a little bit of code we can retrieve the first 9 missing records (country IDs 1 to 9)
for mid in range(1, 10):
    req = requests.get(f'{OBIS_URL}country/{mid}')
    country_json = req.json()
    if country_json['results']:
        print(country_json['results'])
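The country response above raises the question of whether a call returned everything it matched. A quick sketch, assuming (as in the node and dataset cells) that list responses carry both a `total` count and a `results` list; `is_truncated` is a hypothetical helper. It only catches page-size truncation: records that are missing from `total` itself, as may be the case for `/country`, cannot be detected this way.

```python
def is_truncated(response_json):
    """Return True if 'total' reports more records than 'results' contains.

    Only detects page-size truncation; records missing from 'total'
    itself are not detectable this way.
    """
    total = response_json.get("total")
    results = response_json.get("results") or []
    return total is not None and total > len(results)
```

For example, `is_truncated(requests.get(f'{OBIS_URL}country').json())` compares the reported total with the number of country records actually returned.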

## Statistics
Get basic statistics on all OBIS datasets. Every call to **statistics** can use the same query string arguments as the calls to **occurrence**, **institute**, **dataset** and **checklist**.

In [None]:
# The basic statistics call covers presence records and takes no additional arguments

# /statistics
req = requests.get(f'{OBIS_URL}statistics')
req.json()

## Other statistics records
Other basic statistics can be retrieved from the following sub-endpoints of the statistics call:

 - `/statistics/absence` - Get basic statistics for absence records.
 - `/statistics/dropped` - Get basic statistics for dropped records.
 - `/statistics/all` - Get basic statistics for all records (presence, absence, and dropped).
 - `/statistics/all/count` - Get count only for all records (presence, absence, and dropped).
 - `/statistics/years` - Get number of presence records per year.
 - `/statistics/env` - Get number of records per SST, SSS or depth bin.
 - `/statistics/qc` - Get a QC summary, including missing or invalid values, number of records on land, number of non-marine records and number of records without Aphia ID.
 - `/statistics/composition` - Get an overview of taxonomic composition.
 - `/statistics/outliers` - Get SST and SSS distribution quartiles.
 

source: https://api.obis.org/#/Statistics
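Because every statistics call accepts the same filters as **occurrence**, one helper can build the URL for any of the sub-endpoints listed above. A minimal sketch: `obis_url` is a hypothetical helper that does not validate the path or the filter names, and it passes filter values through unchanged, so booleans should be given as 'true'/'false'.

```python
import urllib.parse


def obis_url(base_url, path, **filters):
    """Join an OBIS base URL, an endpoint path and optional filters (hypothetical helper).

    base_url: e.g. "https://api.obis.org/v3/" (trailing slash optional).
    path:     e.g. "statistics", "statistics/years" or "statistics/qc".
    filters:  query parameters; None values are dropped.
    """
    query = urllib.parse.urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{base_url.rstrip('/')}/{path.strip('/')}"
    return f"{url}?{query}" if query else url


# Records per year, restricted to Atlantic salmon
print(obis_url("https://api.obis.org/v3/", "statistics/years", scientificname="Salmo salar"))
# → https://api.obis.org/v3/statistics/years?scientificname=Salmo+salar
```

The resulting URL can be passed straight to `requests.get`, for example `requests.get(obis_url(OBIS_URL, "statistics/qc", scientificname="Salmo salar")).json()`.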