# Demo

This Jupyter Notebook demonstrates the changes to the BIM species database and how the taxonomic information can be maintained.

## Setup

Add the `scripts` folder to the module search path, then load the functions and packages:

In [None]:
import os
import sys
module_path = os.path.abspath('../scripts')
if module_path not in sys.path:
    sys.path.append(module_path)

In [None]:
import sqlalchemy as db
import logging
import gbif_match
import vernacular_names
import exotic_status
import populate_scientificname_annex
from helpers import execute_sql_from_file, get_database_connection, get_config, setup_log_file

Define location of log file:

In [None]:
LOG_FILE_PATH = "./logs/transform_db.log"
setup_log_file(LOG_FILE_PATH)

Connect to (a copy of) the BIM database:

In [None]:
conn = get_database_connection()

Get access to the configuration details (server address, demo mode, etc.) stored in the config file `config.ini`:

In [None]:
config = get_config()

Is demo mode active?

In [None]:
demo = config.getboolean('demo_mode', 'demo')
demo
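
`config.get()` and `config.getboolean()` are methods of Python's standard `configparser.ConfigParser`, so `get_config()` presumably returns such an object (an assumption, since `helpers.py` is not shown here). The sketch below parses a hypothetical `config.ini` with the sections and options used in this notebook. All values are placeholders, and separate variable names are used so that the notebook's own `config` and `demo` are not overwritten.

```python
import configparser

# Hypothetical config.ini content: section and option names come from the
# calls in this notebook; every value is a placeholder.
EXAMPLE_CONFIG = """
[database]
user = bim_user
password = secret
host = localhost
port = 5432
dbname = bim_copy

[demo_mode]
demo = True

[transform_db]
scientificnames-limit = 100
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_CONFIG)

# getboolean() accepts yes/no, on/off, true/false and 1/0 (case-insensitive)
example_demo = example_config.getboolean('demo_mode', 'demo')
# get() always returns a string, so numeric options must be converted explicitly
example_limit = int(example_config.get('transform_db', 'scientificnames-limit'))
print(example_demo, example_limit)  # True 100
```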

Define the location of the annex file and of its demo version, which contains a small but representative subset of the annex names:

In [None]:
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.abspath('')))
# Full file with all names in official annexes
ANNEX_FILE_PATH = os.path.join(__location__, "../data/raw/official_annexes.csv")
# Annex demo version
ANNEX_FILE_PATH_DEMO = os.path.join(__location__, "../data/raw/official_annexes_demo.csv")
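
The same paths can also be built with `pathlib`. The sketch below is an equivalent alternative, not the notebook's own code: `resolve()` normalises the `..` segment and follows symbolic links, much like `os.path.realpath()`. Lowercase names are used to avoid overwriting the constants defined above.

```python
from pathlib import Path

# Resolve ../data/raw relative to the current working directory
data_raw = (Path.cwd() / ".." / "data" / "raw").resolve()
# Full annex file and its demo version
annex_file_path = data_raw / "official_annexes.csv"
annex_file_path_demo = data_raw / "official_annexes_demo.csv"
print(annex_file_path)
```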

Define dataset key of the [_Global Register of Introduced and Invasive Species - Belgium_](https://www.gbif.org/dataset/6d9e952f-948c-4483-9807-575348147c7e):

In [None]:
GRIIS_DATASET_UUID = "6d9e952f-948c-4483-9807-575348147c7e"

Finally, define a SQLAlchemy connection to inspect the changes made to the database during this demo:

In [None]:
user = config.get('database', 'user')
pwd = config.get('database', 'password')
host = config.get('database', 'host')
port = config.get('database', 'port')
dbname = config.get('database', 'dbname')
db_conn = f'postgresql://{user}:{pwd}@{host}:{port}/{dbname}'
db.create_engine(db_conn)

In [None]:
%load_ext sql
%sql $db_conn
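
The f-string above inserts the password verbatim, so a password containing characters such as `@`, `:` or `/` would break the connection URL. The SQLAlchemy documentation recommends percent-encoding such credentials with `urllib.parse.quote_plus`. A minimal sketch, where the helper `build_postgres_url` is hypothetical:

```python
from urllib.parse import quote_plus


def build_postgres_url(user, pwd, host, port, dbname):
    """Build a PostgreSQL connection URL with percent-encoded credentials.

    Without encoding, '@', ':' or '/' in the password would be read as
    URL delimiters.
    """
    return f"postgresql://{quote_plus(user)}:{quote_plus(pwd)}@{host}:{port}/{dbname}"


print(build_postgres_url("bim_user", "p@ss:word", "localhost", "5432", "bim_copy"))
# postgresql://bim_user:p%40ss%3Aword@localhost:5432/bim_copy
```

With SQLAlchemy 1.4 or later, `sqlalchemy.engine.URL.create()` builds the URL object from its separate parts, which avoids manual escaping.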

## Create the new tables

Create the following tables:

1. `scientificname`: scientific names
2. `taxonomy`: taxonomic backbone of all scientific names, populated entirely with information from the GBIF Backbone
3. `scientificnameannex`: all names (scientific names or expressions) listed in official annexes
4. `vernacularname`: vernacular names of all taxa in `taxonomy`, populated entirely with information from GBIF

In [None]:
message = "Step 2: create the new tables"
print(message)
logging.info(message)
execute_sql_from_file(conn, 'create_new_tables.sql')

These tables can be dropped and recreated if errors occur in any of the following steps.

## Populate the `scientificname` table based on the current content

We populate the `scientificname` table with the taxa in the existing `taxon` table. From `taxon` we select the fields:
1. `id`
2. `acceptedname`
3. `scientificnameauthorship`

and we store them as:
1. `deprecatedTaxonId`
2. `scientificName`
3. `authorship`

We select only the taxa in use, i.e. taxa referenced in at least one of the linked tables.

In [None]:
message = "Step 3: populate the scientificname table based on the actual content"
print(message)
logging.info(message)
execute_sql_from_file(conn, 'populate_scientificname.sql',
                      {'limit': config.get('transform_db', 'scientificnames-limit')})
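
The selection and renaming described above can be sketched in SQL. To keep the example self-contained it runs on an in-memory SQLite database instead of PostgreSQL. The linked table `observation` and its `taxon_id` column are hypothetical, and the actual query in `populate_scientificname.sql` (not shown here) may differ, e.g. by checking several linked tables and applying the configured `limit`.

```python
import sqlite3

# In-memory stand-in for the BIM database; 'observation' is a hypothetical
# linked table. The real tables live in the PostgreSQL schema "biodiv".
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE taxon (
    id INTEGER PRIMARY KEY,
    acceptedname TEXT,
    scientificnameauthorship TEXT
);
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    taxon_id INTEGER REFERENCES taxon(id)
);
CREATE TABLE scientificname (
    id INTEGER PRIMARY KEY,
    "deprecatedTaxonId" INTEGER,
    "scientificName" TEXT,
    authorship TEXT
);
INSERT INTO taxon VALUES
    (1, 'Rana ridibunda', 'Pallas, 1771'),
    (2, 'Sonchus', 'L.'),
    (3, 'Unused name', NULL);
INSERT INTO observation (taxon_id) VALUES (1), (2), (2);
""")

# Copy only taxa referenced by a linked table, renaming the columns.
# EXISTS avoids duplicates when a taxon is referenced several times.
con.execute("""
INSERT INTO scientificname ("deprecatedTaxonId", "scientificName", authorship)
SELECT t.id, t.acceptedname, t.scientificnameauthorship
FROM taxon AS t
WHERE EXISTS (SELECT 1 FROM observation AS o WHERE o.taxon_id = t.id)
""")

rows = con.execute(
    'SELECT "deprecatedTaxonId", "scientificName", authorship '
    'FROM scientificname ORDER BY "deprecatedTaxonId"'
).fetchall()
print(rows)  # [(1, 'Rana ridibunda', 'Pallas, 1771'), (2, 'Sonchus', 'L.')]
```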

Preview `scientificname` table:

In [None]:
%sql SELECT * FROM biodiv.scientificname LIMIT 10

Number of names in the `scientificname` table:

In [None]:
%sql SELECT COUNT(*) from biodiv.scientificname

## Populate the `scientificnameannex` table based on official annexes

Similarly to the previous step, we populate the `scientificnameannex` table with all names (scientific names or expressions) listed in official annexes. These are stored in an external file: [`official_annexes.csv`](https://github.com/inbo/speciesbim/blob/master/data/raw/official_annexes.csv). Where possible, typos were corrected and names were simplified.

In this demo we use a small but representative subset of these names: [`official_annexes_demo.csv`](https://github.com/inbo/speciesbim/blob/master/data/raw/official_annexes_demo.csv).

In [None]:
message = "Step 4: populate the scientificnameannex table based on official annexes"
print(message)
logging.info(message)
if not demo:
    populate_scientificname_annex.populate_scientificname_annex(conn, config_parser=config,
                                                                annex_file=ANNEX_FILE_PATH)
else:
    populate_scientificname_annex.populate_scientificname_annex(conn, config_parser=config,
                                                                annex_file=ANNEX_FILE_PATH_DEMO)

Preview `scientificnameannex` table:

In [None]:
%sql SELECT * FROM biodiv.scientificnameannex

## Populate `taxonomy` table with matches to GBIF Backbone and corresponding backbone tree

In this step, all scientific names in the `scientificname` table are matched against the [_GBIF Backbone Taxonomy_](https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c), or simply _GBIF Backbone_.
If a match is found, the taxon and its higher classification (its tree) are added to `taxonomy`. If the matched name is a synonym, the corresponding accepted taxon is added as well.

In this demo, we will focus on a small subset of names:
- _Mellitiosporium pteridium_: no match to GBIF Backbone will be found
- _Rana ridibunda_: synonym of _Pelophylax ridibundus_
- _Fallopia japonica_: exotic and synonym of _Reynoutria japonica_
- _Sonchus_: accepted genus
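
The GBIF API offers name matching through the species match endpoint (`https://api.gbif.org/v1/species/match?name=...`). This notebook does not show whether `gbif_match()` calls that endpoint directly or through a client library, so the sketch below is an assumption. It only interprets an already-fetched response (a Python dict) to decide, for cases like those listed above, whether there is a match, whether an accepted taxon must be added too, and which higher taxa form the backbone tree. The field names (`matchType`, `usageKey`, `synonym`, `acceptedUsageKey`, `kingdomKey` to `speciesKey`) follow the GBIF match response; the helpers `classify_match` and `backbone_tree` are hypothetical and the example keys are made up.

```python
# Higher ranks for which a GBIF match response can carry a "<rank>Key" field.
HIGHER_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def classify_match(response):
    """Return (status, usage_key, accepted_key) for a GBIF species-match response."""
    if response.get("matchType", "NONE") == "NONE":
        # No usable match: the name stays without a taxonomyId
        return "NO_MATCH", None, None
    usage_key = response["usageKey"]
    if response.get("synonym"):
        # Synonym: the accepted taxon has to be stored in taxonomy as well
        return "SYNONYM", usage_key, response["acceptedUsageKey"]
    return response.get("status", "ACCEPTED"), usage_key, usage_key


def backbone_tree(response):
    """List the (rank, key) pairs of the higher classification in the response."""
    return [(rank, response[f"{rank}Key"])
            for rank in HIGHER_RANKS if f"{rank}Key" in response]


# Illustrative responses; all keys are made up, not real GBIF keys.
no_match_response = {"matchType": "NONE", "confidence": 100, "synonym": False}
synonym_response = {"usageKey": 111, "acceptedUsageKey": 222, "rank": "SPECIES",
                    "status": "SYNONYM", "matchType": "EXACT", "synonym": True,
                    "kingdomKey": 201, "phylumKey": 202, "classKey": 203,
                    "orderKey": 204, "familyKey": 205, "genusKey": 206,
                    "speciesKey": 222}
genus_response = {"usageKey": 500, "rank": "GENUS", "status": "ACCEPTED",
                  "matchType": "EXACT", "synonym": False,
                  "kingdomKey": 501, "phylumKey": 502, "classKey": 503,
                  "orderKey": 504, "familyKey": 505, "genusKey": 500}

print(classify_match(no_match_response))  # ('NO_MATCH', None, None)
print(classify_match(synonym_response))   # ('SYNONYM', 111, 222)
print(classify_match(genus_response))     # ('ACCEPTED', 500, 500)
print(backbone_tree(genus_response))
```

GBIF also returns the `matchType` values `FUZZY` and `HIGHERRANK` (only a parent taxon matched); the sketch treats them like any other match, and how `gbif_match()` handles them is not documented here.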

In [None]:
message = "Step 5: populate taxonomy table with matches to GBIF Backbone and related backbone tree " +\
          "and update scientificname table"
print(message)
logging.info(message)
gbif_match.gbif_match(conn, config_parser=config, unmatched_only=False)

In [None]:
%sql SELECT * FROM biodiv.taxonomy

When a match is found, the `taxonomyId` field of `scientificname` is populated, linking the two tables.

In [None]:
%%sql
SELECT * FROM biodiv.scientificname
WHERE "scientificName" IN (
    'Mellitiosporium pteridium', -- no match to GBIF Backbone
    'Rana ridibunda', -- synonym of Pelophylax ridibundus
    'Fallopia japonica', -- exotic and synonym of Reynoutria japonica
    'Sonchus' -- accepted genus
)

Every time names are added or improved, this step can be repeated with the parameter `unmatched_only=True` in `gbif_match()`. However, we suggest updating the entire table (`unmatched_only=False`) at least once a year to incorporate taxonomic changes in the GBIF Backbone.
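
The trade-off between the two modes can be illustrated with a small sketch. It assumes (not confirmed by this notebook) that `unmatched_only=True` restricts matching to rows whose `taxonomyId` is still empty; `names_to_match` is a hypothetical helper. Under that assumption, names already matched keep their current match, which is why a periodic full run is needed to pick up Backbone changes, while names without any match (such as _Mellitiosporium pteridium_) are retried on every run.

```python
def names_to_match(rows, unmatched_only):
    """Return the scientificname rows to (re)match against the GBIF Backbone.

    Hypothetical helper: assumes that "unmatched" means taxonomyId is empty.
    """
    if unmatched_only:
        return [row for row in rows if row["taxonomyId"] is None]
    return list(rows)


example_rows = [
    {"scientificName": "Sonchus", "taxonomyId": 7},
    {"scientificName": "Mellitiosporium pteridium", "taxonomyId": None},
]
print([r["scientificName"] for r in names_to_match(example_rows, unmatched_only=True)])
# ['Mellitiosporium pteridium']
```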

This step also populates the `rank` table:

In [None]:
%sql SELECT * FROM biodiv.rank

## Vernacular names

In this step, we look up all vernacular names recorded at GBIF for every taxon in `taxonomy`. Only names in French, Dutch and English are kept; they are stored in the `vernacularname` table.
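
GBIF serves the vernacular names of a taxon at `https://api.gbif.org/v1/species/{usageKey}/vernacularNames`, where each record has a `vernacularName` and a `language` field. As far as we know, these records use three-letter language codes, while the notebook passes two-letter ISO 639-1 codes, so some conversion is needed. The sketch below assumes ISO 639-2/T codes (`fra`, `nld`, `eng`); the mapping and the helper `filter_vernacular_names` are hypothetical, and the example records are illustrative, not an actual API response.

```python
# Assumed mapping from the ISO 639-1 codes used in this notebook to the
# three-letter ISO 639-2/T codes assumed in GBIF vernacular-name records.
ISO_639_1_TO_639_2 = {"fr": "fra", "nl": "nld", "en": "eng"}


def filter_vernacular_names(records, languages):
    """Keep unique (name, language) pairs whose language is requested.

    Raises KeyError for a language code missing from the mapping.
    """
    wanted = {ISO_639_1_TO_639_2[code] for code in languages}
    seen = set()
    kept = []
    for record in records:
        pair = (record["vernacularName"].strip(), record.get("language"))
        if pair[1] in wanted and pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


# Illustrative records for Reynoutria japonica
knotweed_records = [
    {"vernacularName": "Japanese knotweed", "language": "eng"},
    {"vernacularName": "Japanse duizendknoop", "language": "nld"},
    {"vernacularName": "Renouée du Japon", "language": "fra"},
    {"vernacularName": "Japanischer Staudenknöterich", "language": "deu"},
    {"vernacularName": "Japanese knotweed ", "language": "eng"},  # duplicate
]
print(filter_vernacular_names(knotweed_records, ["fr", "nl", "en"]))
# [('Japanese knotweed', 'eng'), ('Japanse duizendknoop', 'nld'), ('Renouée du Japon', 'fra')]
```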

In [None]:
message = "Step 6: populate vernacular names from GBIF for each entry in the taxonomy table"
print(message)
logging.info(message)
# list of two-letter language codes (ISO 639-1)
languages = ['fr', 'nl', 'en']
vernacular_names.populate_vernacular_names(conn, config_parser=config, empty_only=False, filter_lang=languages)

Show the `vernacularname` table:

In [None]:
%sql SELECT * FROM biodiv.vernacularname

As in the previous step, we recommend updating this table with the parameter `empty_only=True` in `populate_vernacular_names()` every time names are added or improved.

## Add exotic status of taxa in `taxonomy`

The exotic status (`True` or `False`) of each taxon in `taxonomy` is filled in by consulting the GBIF checklist
[_Global Register of Introduced and Invasive Species - Belgium_](https://www.gbif.org/dataset/6d9e952f-948c-4483-9807-575348147c7e):

In [None]:
message = "Step 7: populate field exotic_be (values: True or False) from GRIIS checklist for each entry in " \
          "taxonomy table"
print(message)
logging.info(message)
exotic_status.populate_is_exotic_be_field(conn, config_parser=config, exotic_status_source=GRIIS_DATASET_UUID)
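
Name usages in a GBIF checklist such as GRIIS carry a `nubKey` that links them to the corresponding GBIF Backbone taxon; they can be listed page by page with `https://api.gbif.org/v1/species?datasetKey=6d9e952f-948c-4483-9807-575348147c7e`. Below is a minimal sketch of the comparison, assuming (not confirmed by this notebook) that a taxon is exotic when its backbone key occurs among those `nubKey` values. The real `populate_is_exotic_be_field()` may, for instance, treat synonyms differently, and `compute_exotic_be` is a hypothetical helper (named so that it does not shadow the imported `exotic_status` module).

```python
def compute_exotic_be(taxonomy_keys, griis_records):
    """Map each GBIF Backbone key in taxonomy to its exotic status in Belgium.

    Assumption: a taxon is exotic when a GRIIS record points to it via nubKey.
    """
    griis_backbone_keys = {
        record["nubKey"] for record in griis_records
        if record.get("nubKey") is not None
    }
    return {key: key in griis_backbone_keys for key in taxonomy_keys}


# Made-up keys: 900 stands for an exotic taxon, 500 for a native one.
example_griis_records = [{"nubKey": 900}, {"nubKey": None},
                         {"scientificName": "no backbone link"}]
print(compute_exotic_be([900, 500], example_griis_records))  # {900: True, 500: False}
```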


Exotic taxa:

In [None]:
%sql SELECT * FROM biodiv.taxonomy WHERE exotic_be IS TRUE

This step should be repeated every time the `taxonomy` table changes.