In [1]:
import os, sys

# Global Invasive and Alien Traits and Records (GIATAR) Database - query functions tutorial

Welcome to the tutorial for the query functions supplied with the GIATAR database!
The file containing the query functions, ```GIATAR_query_functions.py```, is available in the queries folder of the supporting GitHub repository and in the queries folder of the released database folder. (This tutorial should be stored alongside it.)

These functions simplify querying and joining the database and typically return pandas dataframes for ease of analysis. While the database contains considerably more information than is accessible through these tools, we hope they will streamline the most common operations for database users.
## Environment
The environment.yml file supplied with the code for this project will suffice for the query functions here. However, it contains a complete suite of packages for database updating, some of which may be tricky to install. These query functions rely mostly on basic Python packages (pandas, numpy, etc.), with the exception of pygbif, which can be installed with pip.
## Setting paths
### to functions
We suggest putting the ```GIATAR_query_functions.py``` file in your project directory. It is available in the queries folder of the DOI-released database, or on the project GitHub (```https://github.com/ncsu-landscape-dynamics/GIATAR-database```) for the most current version.
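If the file is kept in a separate folder instead, that folder can be added to Python's import path. The following is a minimal sketch; the relative `queries` path is a placeholder assumption and should be replaced with the real location on your machine.

```python
import sys
from importlib.util import find_spec
from pathlib import Path

# Placeholder location of the folder holding GIATAR_query_functions.py;
# replace with the real path on your machine.
queries_dir = Path("queries").resolve()

# Put the folder on the import path once, so that
# `import GIATAR_query_functions` can find the file.
if str(queries_dir) not in sys.path:
    sys.path.insert(0, str(queries_dir))

# find_spec returns None when the module cannot be located, so this check
# reports the problem without raising ImportError.
module_found = find_spec("GIATAR_query_functions") is not None
print(f"GIATAR_query_functions importable: {module_found}")
```

Checking with `find_spec` before importing gives a clear message when the path is wrong, rather than a traceback further down the notebook.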

### to data
```GIATAR_query_functions.py``` defines ```data_path``` in its first line of code - set this to the database directory of GIATAR. If you prefer, you can call ```create_dotenv(pathtodata)``` to create a .env file that permanently sets this path.

## usageKeys
Unique ID keys for species in the database are referred to as usageKeys, following the structure and naming of usageKeys from GBIF. Where possible, we have retained GBIF usageKeys as unique IDs for taxa - otherwise we have generated unique IDs that won't overlap with GBIF usageKeys.
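Because the functions accept a usageKey as either a string or an int, while ```get_usageKey``` returns a string, it can help to normalise keys before comparing them. The helper below is a hypothetical convenience, not part of ```GIATAR_query_functions.py```, and it assumes that a key is either a whole number or an arbitrary string.

```python
def normalize_usage_key(key):
    """Return a usageKey as a string (hypothetical helper, not part of GIATAR)."""
    # Whole-number floats such as 8180725.0 can appear when pandas stores
    # keys in a column that also contains missing values.
    if isinstance(key, float) and key.is_integer():
        return str(int(key))
    text = str(key).strip()
    # A numeric string with a trailing ".0" is treated as the same whole number.
    if text.endswith(".0") and text[:-2].isdigit():
        return text[:-2]
    return text


print(normalize_usage_key(3190653))    # int input
print(normalize_usage_key("3190653"))  # string input
print(normalize_usage_key(8180725.0))  # float input
```

All three calls produce a plain string, so keys taken from different tables can be compared directly.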

## functions
 returns species name as string - takes usageKey as string or int

### get_usageKey(species_name) 
 returns usageKey as string - takes species name as string

### get_all_species() 
 returns list of all species names in database - no inputs

### check_species_exists(species_name) 
 returns True or False - takes species name as string

### get_first_introductions(usageKey, check_exists=False, ISO3_only=False, import_additional_native_info=True) 


returns dataframe of first introductions - takes usageKey as string or int

check_exists=True will raise a KeyError if the species is not in the database

ISO3_only=True will return only records whose location is a 3-character ISO3 code. Other location info includes bioregions or other geonyms

import_additional_native_info=True will import additional native range info, first by checking whether native range info for a particular country is available from sources that reported later than the first introduction, and second by importing native range info from the file of native range info unique to GIATAR
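To illustrate what the ISO3_only option describes, the sketch below filters a small table down to 3-character location codes with pandas. The rows are copied from the "Apis mellifera" output shown later in this tutorial; the length test is an assumption about what counts as an ISO3 code and is not taken from the function's source.

```python
import pandas as pd

# A few rows copied from the "Apis mellifera" output in this tutorial.
records = pd.DataFrame(
    {
        "ISO3": ["ABW", "AFG", "AGO", "XK", "YEM"],
        "year": [2016.0, 1975.0, 1970.0, 2007.0, 1982.0],
    }
)

# Assumption: a valid entry is exactly three characters long,
# so the two-character code "XK" is dropped.
iso3_rows = records[records["ISO3"].str.len() == 3]
print(iso3_rows["ISO3"].tolist())
```

The same filter can be applied to any returned dataframe if you want to tighten or relax the location criteria yourself.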

### get_all_introductions(usageKey, check_exists=False, ISO3_only=True) 

returns dataframe of all introductions - takes usageKey as string or int

check_exists=True will raise a KeyError if the species is not in the database

ISO3_only=True will return only records whose location is a 3-character ISO3 code. Other location info includes bioregions or other geonyms

import_additional_native_info=True will import additional native range info, first by checking whether native range info for a particular country is available from sources that reported later than the first introduction, and second by importing native range info from the file of native range info unique to GIATAR
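Since a species can have several introduction records for the same country, a common follow-up step is to keep only the earliest record per country. The sketch below shows one way to do this with pandas on invented toy data; it is not a description of how ```get_first_introductions``` is implemented, and the column names simply follow the output shown later in this tutorial.

```python
import pandas as pd

# Toy records invented for illustration; these are not values from GIATAR.
all_records = pd.DataFrame(
    {
        "ISO3": ["AAA", "AAA", "BBB", "BBB", "CCC"],
        "year": [1990.0, 1975.0, 2001.0, 2010.0, 1988.0],
        "Source": ["src2", "src1", "src1", "src2", "src1"],
    }
)

# Sort by year, keep the first (earliest) row per country,
# then restore alphabetical order by country code.
earliest = (
    all_records.sort_values("year")
    .drop_duplicates(subset="ISO3", keep="first")
    .sort_values("ISO3")
    .reset_index(drop=True)
)
print(earliest)
```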

### get_ecology(species_name) 
 returns dictionary of dataframes of ecology info - takes species name as string. Ecology info returned by this function includes rainfall, air temperature, climate, latitude/altitude, water temperature and whether a pest utilizes wood packaging.

 Ecological info is formatted differently for different species - e.g. air temperature might include max, min, range or other info. We recommend spending time with the outputs to find the information you want.
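Because the result is a dictionary of dataframes whose contents vary by species, a quick overview of what came back is a useful first step. The sketch below uses a hypothetical helper, ```summarize_tables```, and a toy dictionary whose key and columns are invented; real keys and columns will differ.

```python
import pandas as pd


def summarize_tables(tables):
    """Map each table name to its (rows, columns) shape."""
    return {name: df.shape for name, df in tables.items()}


# Toy stand-in for a returned dictionary; the key and columns are hypothetical.
example = {
    "toy_air_temperature": pd.DataFrame(
        {"Parameter": ["min", "max"], "Value": [5.0, 30.0]}
    )
}

print(summarize_tables(example))
print(summarize_tables({}))  # an empty result has nothing to summarise
```

An empty dictionary is handled without special casing, which matters because some species have no ecology tables at all.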



### get_hosts_and_vectors(species_name) 
returns dictionary of dataframes of host and vector info - takes species name as string.

This tool returns hosts (plant hosts for herbivorous insects, animal hosts for diseases and parasites) and vectors (either zoonotic or plant vectors - mostly for diseases).
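Host names in the returned tables can carry a common name in parentheses, as in the 'CABI_tohostPlants' output for Icerya purchasi shown later in this tutorial. The sketch below, which assumes the 'Plant name' column seen in that output, strips the parenthetical part so that only the scientific name remains.

```python
import pandas as pd

# Rows copied from the 'CABI_tohostPlants' output for Icerya purchasi.
hosts = pd.DataFrame(
    {
        "Plant name": [
            "Acacia (wattles)",
            "Acacia confusa",
            "Acacia dealbata (acacia bernier)",
            "Viscum cruciatum",
        ]
    }
)

# Drop a trailing "(common name)" so that only the scientific name remains.
scientific = (
    hosts["Plant name"]
    .str.replace(r"\s*\([^)]*\)\s*$", "", regex=True)
    .str.strip()
)
print(scientific.tolist())
```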



### get_species_list(kingdom=None, phylum=None, taxonomic_class=None, order=None, family=None, genus=None)
 returns list of usageKeys matching taxonomic criteria - takes kingdom, phylum, taxonomic_class, order, family, genus as strings. This function can help select a group of organisms in the database matching the search criteria. Note that <code>class</code> is a reserved keyword in Python, so we refer to the taxonomic grouping as taxonomic_class
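The sketch below illustrates the keyword restriction and the idea of matching on every criterion that is supplied. The function ```toy_species_list``` and its three-row table are hypothetical stand-ins: the usageKeys are those shown in this tutorial's outputs, and the taxonomy values were filled in by hand rather than read from the database.

```python
import keyword

# `class` is a reserved keyword in Python, hence the name taxonomic_class.
assert keyword.iskeyword("class")

# usageKeys as shown in this tutorial's outputs; taxonomy filled in by hand.
TOY_TAXA = [
    {"usageKey": "1341976", "taxonomic_class": "Insecta", "order": "Hymenoptera"},
    {"usageKey": "2080592", "taxonomic_class": "Insecta", "order": "Hemiptera"},
    {"usageKey": "3190653", "taxonomic_class": "Magnoliopsida", "order": "Sapindales"},
]


def toy_species_list(taxonomic_class=None, order=None):
    """Return usageKeys whose rows match every criterion that is not None."""
    criteria = {"taxonomic_class": taxonomic_class, "order": order}
    active = {k: v for k, v in criteria.items() if v is not None}
    return [
        row["usageKey"]
        for row in TOY_TAXA
        if all(row[k] == v for k, v in active.items())
    ]


print(toy_species_list(taxonomic_class="Insecta"))
print(toy_species_list(taxonomic_class="Insecta", order="Hemiptera"))
```

Leaving a criterion as None means it is ignored, so calling the toy function with no arguments returns every key.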



### get_native_ranges(usageKey, ISO3=None) 
returns dataframe of native ranges - takes usageKey as string or int.

The GIATAR database stores native range information in several ways. Some better-studied species have native status recorded as a binary true/false at the country level. Many other species have native range information stored only as biogeographic zones, e.g. Palearctic. We provide functionality to map this biogeographic zone data to country-level true/false values using a crosswalk, which is available in the native ranges subfolder of the database.

When the user calls <code> get_native_ranges() </code> and ISO3 is set to None, the function returns all available information about the species.
If the user wishes to use the native-range to country-presence crosswalk, they should provide ISO3 as a Python list of ISO3 standard country codes, e.g. <code> ['USA','CHN'] </code> - the function will then use the crosswalk to provide true/false information on the native status of the species if biogeographic information is available. When both biogeographic information and more specific country-level true/false information are available, the function defaults to the more specific country-level information.

ISO3=list returns a dataframe of native ranges with True or False for whether the species is native to each listed country - takes a list of ISO3 country codes as input. See the examples below for context.
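The precedence rule described above can be sketched in plain Python. Everything here is a toy: the crosswalk contents, the function name ```toy_native_status``` and its arguments are assumptions made for illustration, and the real crosswalk file and logic in ```get_native_ranges``` may differ.

```python
# Toy crosswalk invented for illustration; the real one ships with the database.
TOY_CROSSWALK = {
    "palearctic": {"CHN", "FRA"},
    "nearctic": {"USA", "CAN"},
}


def toy_native_status(iso3_codes, country_records, bioregions):
    """Return {ISO3: True/False/None} for each requested country.

    country_records: explicit {ISO3: bool} entries, which take precedence.
    bioregions: names of biogeographic zones recorded as native range.
    """
    native_countries = set()
    for region in bioregions:
        native_countries |= TOY_CROSSWALK.get(region.lower(), set())

    status = {}
    for code in iso3_codes:
        if code in country_records:
            # Specific country-level information wins over the crosswalk.
            status[code] = country_records[code]
        elif bioregions:
            status[code] = code in native_countries
        else:
            # No information of either kind is available.
            status[code] = None
    return status


print(toy_native_status(["USA", "CHN", "FRA"], {"FRA": False}, ["Palearctic"]))
```

In this toy call, CHN is marked native through the crosswalk, USA is not, and FRA follows its explicit country record even though the crosswalk alone would mark it native.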







In [1]:
import GIATAR_query_functions as gqf
import pandas as pd



In [3]:
## Basic operations
# store the result under a name that does not mirror the function name
all_species = gqf.get_all_species()
all_species



['Fusarium solani',
 'Acridotheres cristatellus',
 'Macrorhynchia philippina',
 'Peronospora sp.',
 'Peronospora aquilegiicola',
 'Carthamus oxyacanthus',
 'Thunbergia erecta',
 'Butia capitata',
 'Helianthus debilis',
 'Helianthus debilis',
 'Leucophyllum frutescens',
 'Epiphyllum oxypetalum',
 'Trirachys sp.',
 'Trirachys sartus',
 'Trirachys sp.',
 'Trirachys sartus',
 'Ludwigia palustris',
 'Galphimia gracilis',
 'Terminalia muelleri',
 'Laelia rubescens',
 'Bonamia ostreae',
 'Galphimia glauca',
 'Tuberose mild mottle virus',
 'Euphorbia leucocephala',
 'Senna italica',
 'Emilia praetermissa',
 'Lycorma delicatula',
 'Ixora coccinea',
 'Peronosclerospora philippinensis',
 'Salmo salar',
 'Agave vivipara',
 'Bothriochloa bladhii',
 'Carpobrotus chilensis',
 'Magallana gigas',
 'Xanthomonas vasicola',
 'Xanthomonas vasicola',
 'Xanthomonas vasicola',
 'Xanthomonas vasicola',
 'Ostrea edulis',
 'Grapevine red blotch virus',
 'Diaporthe eres',
 'Cornus sericea',
 'Cornus sericea',
 'M

In [4]:
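# Only the last expression in a cell is displayed, so the True/False result
# of check_species_exists is not shown in the output below.
gqf.check_species_exists("Ailanthus altissima")
gqf.get_usageKey("Ailanthus altissima")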


'3190653'

In [2]:
# Let's pull some introduction records
gqf.get_first_introductions("Apis mellifera", import_additional_native_info=True)

# some species have multiple introduction records for some countries
#gqf.get_all_introductions("Apis mellifera", import_additional_native_info=False)

Unnamed: 0,usageKey,ISO3,year,Source,Reference,Native,Type
29216,1341976,ABW,2016.0,GBIF,Counts API,False,First report
29217,1341976,AFG,1975.0,GBIF,Counts API,False,First report
29218,1341976,AGO,1970.0,GBIF,Counts API,True,First report
29219,1341976,AIA,2017.0,GBIF,Counts API,False,First report
29220,1341976,ALA,2014.0,GBIF,Counts API,True,First report
...,...,...,...,...,...,...,...
29436,1341976,XK,2007.0,GBIF,Counts API,,First report
29437,1341976,YEM,1982.0,GBIF,Counts API,True,First report
29438,1341976,ZAF,1974.0,GBIF,Counts API,True,First report
29439,1341976,ZMB,1972.0,GBIF,Counts API,True,First report


In [3]:
gqf.get_ecology("Thrips tabaci")

{}

In [4]:
gqf.get_hosts_and_vectors('Icerya purchasi')

{'CABI_tohostPlants':      usageKey   code       section                         Plant name  \
 2715  2080592  28432  tohostPlants                   Acacia (wattles)   
 2716  2080592  28432  tohostPlants                     Acacia confusa   
 2717  2080592  28432  tohostPlants   Acacia dealbata (acacia bernier)   
 2718  2080592  28432  tohostPlants              Acalypha (Copperleaf)   
 2719  2080592  28432  tohostPlants    Albizia julibrissin (silk tree)   
 ...       ...    ...           ...                                ...   
 2779  2080592  28432  tohostPlants                    Syringa (lilac)   
 2780  2080592  28432  tohostPlants             Ulex europaeus (gorse)   
 2781  2080592  28432  tohostPlants   Vaccinium corymbosum (blueberry)   
 2782  2080592  28432  tohostPlants  Virgilia capensis (snowdrop tree)   
 2783  2080592  28432  tohostPlants                   Viscum cruciatum   
 
              Family  Context                              References  
 2715       Fabac

In [5]:
# the get_introductions functions contain native range information, but a user might want more details on the sources of that information
gqf.get_native_ranges("Apis mellifera")
# we include 

Unnamed: 0,ISO3,Source,Native,Reference,bioregion,DAISIE_region
0,ARG,ASFR,False,CABI ISC,,
1,BOL,ASFR,False,CABI ISC,,
2,BRA,ASFR,False,CABI ISC,,
3,CHL,ASFR,False,Fuentes et al. (2020),,
4,CHN,ASFR,False,Wan et al. (2016),,
5,COL,ASFR,False,CABI ISC,,
6,CRI,ASFR,False,CABI ISC,,
7,ECU,ASFR,False,CABI ISC,,
8,SLV,ASFR,False,CABI ISC,,
9,GUF,ASFR,False,CABI ISC,,


In [7]:
gqf.get_common_names("Pancratium maritimum").keys()

   DAISIE_idspecies                   name language  source   usageKey
0                 5    Murphy's Threadwort  English  DAISIE  8180725.0
1                 6      Long's Threadwort  English  DAISIE  7433870.0
2                 7        Great Crestwort  English  DAISIE  7425480.0
3                 8     Southern Crestwort  English  DAISIE  6096602.0
4                 9  Micheli's Balloonwort  English  DAISIE  5286305.0


dict_keys(['EPPO_names'])

In [8]:
gqf.get_usageKey("Pancratium maritimum")

'2853283'