# Demo

This Jupyter notebook demonstrates how to improve and maintain the PostgreSQL species database of BIM.

## Setup

Load functions and packages:

In [14]:
import os
import logging
import gbif_match
import vernacular_names
import exotic_status
import populate_scientificname_annex
from helpers import execute_sql_from_file, get_database_connection, get_config, setup_log_file

Define location of log file:

In [15]:
LOG_FILE_PATH = "./logs/transform_db.log"
setup_log_file(LOG_FILE_PATH)
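
The body of `setup_log_file` is not shown in this notebook. As a rough sketch of what such a helper might do, the hypothetical `setup_log_file_sketch` below attaches a file handler to the root logger so that later `logging.info(...)` calls end up in the log file. The function name, the log format and the temporary directory are assumptions made for illustration only.

```python
import logging
import os
import tempfile


def setup_log_file_sketch(log_file_path):
    """Hypothetical stand-in for helpers.setup_log_file: send INFO and above to a file."""
    log_dir = os.path.dirname(log_file_path)
    if log_dir:
        # Create the log directory if it does not exist yet
        os.makedirs(log_dir, exist_ok=True)
    handler = logging.FileHandler(log_file_path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(handler)
    return handler


# Write to a temporary directory so the sketch does not touch ./logs
demo_log_path = os.path.join(tempfile.mkdtemp(), "logs", "transform_db.log")
demo_handler = setup_log_file_sketch(demo_log_path)
logging.info("Step 1: example message")
demo_handler.flush()

with open(demo_log_path, encoding="utf-8") as f:
    log_text = f.read()
```

After this runs, `log_text` holds one timestamped line containing the example message.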

We connect to (the copy of) the BIM database:

In [16]:
conn = get_database_connection()

Get access to the configuration details (server address, demo mode, etc.) stored in the config file `config.ini`:

In [17]:
config = get_config()

Is demo mode active?

In [18]:
demo = config.getboolean('demo_mode', 'demo')
print(demo)

True
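
For readers without access to `config.ini`, the sketch below shows how a `[demo_mode]` section with a `demo` key could be read using the standard `configparser` module. Only the section and key names come from the call above. The file content is a made-up example, and `get_config` may build its parser differently.

```python
import configparser

# Hypothetical config.ini content; only the [demo_mode] section and its
# `demo` key are taken from the notebook.
EXAMPLE_CONFIG = """
[demo_mode]
demo = True
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_CONFIG)

# getboolean accepts values such as true/false, yes/no, on/off and 1/0
demo_flag = example_config.getboolean("demo_mode", "demo")
print(demo_flag)
```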


Define the location of the annex file and of its demo version, which contains a small but significant subset of annex names:

In [19]:
# Directory the notebook is run from
__location__ = os.path.realpath(os.getcwd())
# Full file with all names in official annexes
ANNEX_FILE_PATH = os.path.join(__location__, "../data/raw/official_annexes.csv")
# Annex demo version
ANNEX_FILE_PATH_DEMO = os.path.join(__location__, "../data/raw/official_annexes_demo.csv")

### Step 1: drop new tables if they exist

This step is a kind of reset, quite useful during development, but not in production.
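
The content of `drop_new_tables_if_exists.sql` is not shown here. As a minimal sketch of why such a reset can be run repeatedly, the example below issues `DROP TABLE IF EXISTS` twice. It uses an in-memory `sqlite3` database purely as a stand-in for the PostgreSQL copy of BIM, and the single-column table is a placeholder rather than the real schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL BIM copy
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE scientificname (id INTEGER PRIMARY KEY)")

DROP_SQL = "DROP TABLE IF EXISTS scientificname"
demo_conn.execute(DROP_SQL)  # drops the table
demo_conn.execute(DROP_SQL)  # no error: the table is already gone

# List the tables that are left after the reset
remaining_tables = demo_conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Because the second `DROP` is a no-op rather than an error, the reset can be run at the start of every development run.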

In [20]:
message = "Step 1: Drop our new tables if they already exist (idempotent script)"
print(message)
logging.info(message)
execute_sql_from_file(conn, 'drop_new_tables_if_exists.sql')

Step 1: Drop our new tables if they already exist (idempotent script)


<cursor object at 0x0000000004F43580; closed: 0>

### Step 2: create the new tables

Create the following tables:

1. `scientificname`: all names from `taxon`
2. `taxonomy`: taxonomy backbone of all scientific names. This table is entirely populated with information from GBIF Backbone
3. `scientificnameannex`: all names (scientific names or expressions) contained in official annexes
4. `vernacularname`: vernacular names of all taxa in `taxonomy`. Table entirely populated with information from GBIF.

In [21]:
message = "Step 2: create the new tables"
print(message)
logging.info(message)
execute_sql_from_file(conn, 'create_new_tables.sql')

Step 2: create the new tables


<cursor object at 0x000000000F726200; closed: 0>

### Step 3: populate the scientificname table based on the actual content

We populate the `scientificname` table with taxa in `taxon`. From `taxon` we select the fields:
1. `id`
2. `acceptedname`
3. `scientificnameauthorship`

and we store them as:
1. `deprecatedTaxonId`
2. `scientificName`
3. `authorship`

We select only the taxa in use, i.e. taxa which are used in any of the linked tables.
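
The field mapping and the "in use" filter described above can be sketched in plain Python. The real work is done in `populate_scientificname.sql`, which is not shown, so this only illustrates the logic. The row ids and the `used_taxon_ids` set are hypothetical; the scientific names are borrowed from the log output of later steps.

```python
# Mapping from `taxon` fields to `scientificname` fields, as listed above
FIELD_MAP = {
    "id": "deprecatedTaxonId",
    "acceptedname": "scientificName",
    "scientificnameauthorship": "authorship",
}

# Hypothetical rows of `taxon` (ids are made up for illustration)
taxon_rows = [
    {"id": 1, "acceptedname": "Rana ridibunda",
     "scientificnameauthorship": "Pallas, 1771"},
    {"id": 2, "acceptedname": "Fallopia japonica",
     "scientificnameauthorship": "(Houtt.) Ronse Decraene"},
]

# Hypothetical set of taxon ids referenced by at least one linked table
used_taxon_ids = {1}


def to_scientificname_rows(rows, used_ids):
    """Rename the selected fields and keep only the taxa that are in use."""
    return [
        {new: row[old] for old, new in FIELD_MAP.items()}
        for row in rows
        if row["id"] in used_ids
    ]


scientificname_rows = to_scientificname_rows(taxon_rows, used_taxon_ids)
```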

In [22]:
message = "Step 3: populate the scientificname table based on the actual content"
print(message)
logging.info(message)
execute_sql_from_file(conn, 'populate_scientificname.sql',
                      {'limit': config.get('transform_db', 'scientificnames-limit')})

Step 3: populate the scientificname table based on the actual content


<cursor object at 0x000000000F7263C0; closed: 0>

### Step 4: populate the scientificnameannex table based on official annexes

Similarly to the previous step, we populate the `scientificnameannex` table with all names (scientific names or expressions)
in the official annexes, [`official_annexes.csv`](https://github.com/inbo/speciesbim/blob/master/data/raw/official_annexes.csv).
Some cleaning is performed: typos are corrected and taxa are simplified where possible.

In this demo we use a small but significant subset, [`official_annexes_demo.csv`](https://github.com/inbo/speciesbim/blob/master/data/raw/official_annexes_demo.csv).
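
To make the structure of the annex file more concrete, here is a small sketch that reads a CSV with the same column names as those reported in this step's output (`annex_code`, `scientific_name_original`, `scientific_name_corrected`, `page_number`, `remarks`). The two data rows, the annex code `X1` and the rule "use the corrected name when present, otherwise the original" are assumptions for illustration, not the behaviour of `populate_scientificname_annex`.

```python
import csv
import io

# Made-up rows; only the header matches the columns of the real annex file
EXAMPLE_CSV = """annex_code,scientific_name_original,scientific_name_corrected,page_number,remarks
X1,Rana ridibnuda,Rana ridibunda,1,typo corrected
X1,Fallopia japonica,,2,
"""


def read_annex_names(csv_file):
    """Return one dict per annex row, preferring the corrected name if given."""
    names = []
    for row in csv.DictReader(csv_file):
        corrected = row["scientific_name_corrected"].strip()
        name = corrected or row["scientific_name_original"].strip()
        names.append({"annex_code": row["annex_code"], "scientific_name": name})
    return names


annex_names = read_annex_names(io.StringIO(EXAMPLE_CSV))
```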

In [23]:
message = "Step 4: populate the scientificnameannex table based on official annexes"
print(message)
logging.info(message)
if not demo:
    populate_scientificname_annex.populate_scientificname_annex(conn, config_parser=config,
                                                                annex_file=ANNEX_FILE_PATH)
else:
    populate_scientificname_annex.populate_scientificname_annex(conn, config_parser=config,
                                                                annex_file=ANNEX_FILE_PATH_DEMO)

Step 4: populate the scientificnameannex table based on official annexes
Columns in C:\Users\damiano_oldoni\Documents\INBO\repositories\speciesbim\notebooks\../data/raw/official_annexes_demo.csv: annex_code, scientific_name_original, scientific_name_corrected, page_number, remarks
Number of taxa listed in official annexes and ordinances: 14
Total number of taxa inserted in scientificnameannex: 14
Table scientificnameannex populated in 1s.


### Step 5: populate `taxonomy` table with matches to GBIF Backbone and corresponding backbone tree

In this step all scientific names in the `scientificname` table are evaluated against the [_GBIF Backbone Taxonomy_](https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c), or simply _GBIF Backbone_.
If a match occurs, the taxon and its related tree are added to `taxonomy`. In case of synonyms, their corresponding accepted taxa are added as well.
Information about the match is also stored in the `scientificname` table.
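
The log printed by this step reports that parents are inserted before their children. That "parents first" recursion can be sketched as follows. The GBIF keys and their parent links are copied from that log, but the log is truncated at Anura, so treating key 952 as a root is a simplification. The function `add_with_parents` is hypothetical and does not query GBIF or the database.

```python
# Parent links as reported in the log of this step. The real chain continues
# above Anura (952); stopping there is a simplification for this sketch.
PARENT_OF = {
    2426662: 2426629,  # Rana ridibunda Pallas, 1771 -> Pelophylax Fitzinger, 1843
    2426629: 6746,     # Pelophylax -> Ranidae
    6746: 952,         # Ranidae -> Anura
    952: None,         # treated as root here
}


def add_with_parents(gbif_key, taxonomy, parent_of):
    """Append gbif_key to taxonomy after all of its ancestors (no duplicates)."""
    if gbif_key in taxonomy:
        return
    parent = parent_of[gbif_key]
    if parent is not None:
        # Not a root taxon: insert the parents first
        add_with_parents(parent, taxonomy, parent_of)
    taxonomy.append(gbif_key)


taxonomy_keys = []
add_with_parents(2426662, taxonomy_keys, PARENT_OF)
print(taxonomy_keys)  # [952, 6746, 2426629, 2426662]
```

Inserting ancestors first means every row can reference an already existing parent row.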

In [24]:
message = "Step 5: populate taxonomy table with matches to GBIF Backbone and related backbone tree " +\
          "and update scientificname table"
print(message)
logging.info(message)
# The call is identical in demo and full mode, so no branching on `demo` is needed
gbif_match.gbif_match(conn, config_parser=config, unmatched_only=False)

Step 5: populate taxonomy table with matches to GBIF Backbone and related backbone tree and update scientificname table
Number of taxa in scientificname table: 4 (demo mode)
Match names (scientificName + authorship) to GBIF Backbone (demo mode)
Timestamp used for this (whole) match process: 2020-08-28 22:55:26.640000
Try matching the "Rana ridibunda Pallas, 1771" name...
Recursively adding the taxon with GBIF key 2426662 (Rana ridibunda Pallas, 1771) to the taxonomy table
According to GBIF, this is *not* a root taxon, we'll insert parents first
    Recursively adding the taxon with GBIF key 2426629 (Pelophylax Fitzinger, 1843) to the taxonomy table
    According to GBIF, this is *not* a root taxon, we'll insert parents first
        Recursively adding the taxon with GBIF key 6746 (Ranidae) to the taxonomy table
        According to GBIF, this is *not* a root taxon, we'll insert parents first
            Recursively adding the taxon with GBIF key 952 (Anura) to the taxonomy table
      

### Step 6: vernacular names

We retrieve the vernacular names of all taxa in the `taxonomy` table for a selection of languages: French, Dutch and
English. The names are stored in the `vernacularname` table.
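
The language filter can be illustrated with a few lines of Python. The record layout below (`name` and `language` keys with two-letter codes) is an assumption for this sketch and may differ from what GBIF returns or from what `populate_vernacular_names` does internally. The German entry is added only to show a name being filtered out.

```python
def filter_vernacular_names(records, languages):
    """Keep only the records whose language code is in `languages`."""
    wanted = set(languages)
    return [record for record in records if record["language"] in wanted]


# Hypothetical records; the 'de' entry is there only to be filtered out
example_records = [
    {"name": "animals", "language": "en"},
    {"name": "animaux", "language": "fr"},
    {"name": "dieren", "language": "nl"},
    {"name": "Tiere", "language": "de"},
]

kept_records = filter_vernacular_names(example_records, ["fr", "nl", "en"])
kept_names = [record["name"] for record in kept_records]
```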

In [12]:
message = "Step 6: populate vernacular names from GBIF for each entry in the taxonomy table"
print(message)
logging.info(message)
# list of two-letter language codes (ISO 639-1)
languages = ['fr', 'nl', 'en']
vernacular_names.populate_vernacular_names(conn, config_parser=config, empty_only=False, filter_lang=languages)

Step 6: populate vernacular names from GBIF for each entry in the taxonomy table
We'll now load vernacular names for 17 entries in the taxonomy table. Languages: fr, nl, en
Now saving 'Animals'(en) for taxon with ID: 1 (source: Phthiraptera.info)
Now saving 'animals'(en) for taxon with ID: 1 (source: Integrated Taxonomic Information System (ITIS))
Now saving 'animaux'(fr) for taxon with ID: 1 (source: Integrated Taxonomic Information System (ITIS))
Now saving 'dieren'(nl) for taxon with ID: 1 (source: Belgian Species List)
Now saving 'animals'(en) for taxon with ID: 1 (source: World Register of Marine Species)
Now saving 'animals'(en) for taxon with ID: 1 (source: World Register of Introduced Marine Species (WRiMS))
Now saving 'animaux'(fr) for taxon with ID: 1 (source: World Register of Marine Species)
Now saving 'animaux'(fr) for taxon with ID: 1 (source: World Register of Introduced Marine Species (WRiMS))
Now saving 'dieren'(nl) for taxon with ID: 1 (source: World Register of Marin

### Step 7: add exotic status of taxa in `taxonomy`

The exotic status (`True` or `False`) for all taxa in `taxonomy` is filled by consulting the GBIF checklist
[_Global Register of Introduced and Invasive Species - Belgium_](https://www.gbif.org/dataset/6d9e952f-948c-4483-9807-575348147c7e).
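
Conceptually, setting the flag comes down to a membership test against the set of taxa listed in the checklist. The sketch below uses the four gbifIds that this step's output reports as exotic, plus key 952 (Anura, from step 5), which is not among them. Matching on `gbifId` and the dict-based rows are assumptions; the actual matching rule in `populate_is_exotic_be_field` is not shown in this notebook.

```python
# gbifIds reported as exotic in Belgium in the output of this step
exotic_keys = {2426661, 2426662, 2889173, 5334357}

# Simplified `taxonomy` rows (assumed layout): one exotic taxon and one that is not
taxonomy_rows = [
    {"gbifId": 2426661},  # Pelophylax ridibundus (Pallas, 1771)
    {"gbifId": 952},      # Anura, not reported as exotic
]


def set_exotic_flag(rows, exotic_ids):
    """Return copies of the rows with a boolean `exotic_be` field added."""
    return [dict(row, exotic_be=row["gbifId"] in exotic_ids) for row in rows]


flagged_rows = set_exotic_flag(taxonomy_rows, exotic_keys)
```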

In [13]:
message = "Step 7: populate field exotic_be (values: True or False) from GRIIS checklist for each entry in " \
          "taxonomy table "
print(message)
logging.info(message)
# GBIF datasetKey of checklist: Global Register of Introduced and Invasive Species - Belgium
griis_be = "6d9e952f-948c-4483-9807-575348147c7e"
exotic_status.populate_is_exotic_be_field(conn, config_parser=config, exotic_status_source=griis_be)


Step 7: populate field exotic_be (values: True or False) from GRIIS checklist for each entry in taxonomy table 
We'll now retrieve the GBIF checklist containing the exotic taxa in Belgium, datasetKey: 6d9e952f-948c-4483-9807-575348147c7e.
Retrieved 2891 exotic taxa in 32s.
We'll now update exotic_be field for 17 taxa of the taxonomy table.
Taxon Pelophylax ridibundus (Pallas, 1771) (gbifId: 2426661) is exotic in Belgium.
    Taxon Rana ridibunda Pallas, 1771 (gbifId: 2426662) is exotic in Belgium.
Taxon Reynoutria japonica Houtt. (gbifId: 2889173) is exotic in Belgium.
    Taxon Fallopia japonica (Houtt.) Ronse Decraene (gbifId: 5334357) is exotic in Belgium.
4 exotic taxa found in taxonomy.
Field exotic_be updated for 17 taxa in taxonomy in 0.05s.
