# Common Name - Evaluation code
Author: 

### Fill out the cell below

In [1]:
eval_id = '2020v1'
notebook_name = "mSTSKx_2020v1.ipynb"
summary_name = 'stsk_r' # a short, memorable name to use for file names etc.

# Paths to occurrence record databases to use (from wildlife-wrangler).  Put paths as items in a tuple.
recent_dbs = ('P:/Proj3/USGap/Vert/USranges/2020v1/OccRecords/mstskx0GBIFr25GBIFf10.sqlite',)
historic_dbs = ('P:/Proj3/USGap/Vert/USranges/2020v1/OccRecords/mstskx0GBIFr26GBIFf10.sqlite',)

codeDir = 'T:/code/GAP-ranges/'
inDir = 'P:/Proj3/USGap/Vert/USRanges/2020v1/2001Ranges/'
outDir = 'P:/Proj3/USGap/Vert/USRanges/2020v1/Results/'
parameters_db = 'P:/Proj3/USGap/Vert/DBase/ranges-records.sqlite'
shucs_loc = 'P:/Proj3/USGap/Vert/Model/data/HucRng/Hucs'

There is a bug with mpl_toolkits (basemap raises a KeyError for PROJ_LIB).  The following code is a temporary workaround; see https://stackoverflow.com/questions/52911232/basemap-library-using-anaconda-jupyter-notebooks-keyerror-proj-lib/54087410#54087410

In [2]:
import os
os.environ['PROJ_LIB'] = r'c:\Users\nmtarr\AppData\Local\Continuum\miniconda3\envs\wrangler\Library\share'
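
The path above is specific to one user and one machine. As a hedged alternative, the directory could be derived from the active environment. This is a minimal sketch: `guess_proj_lib` is a hypothetical helper, and the directory layout it assumes (`Library/share` on Windows, `share/proj` elsewhere) may not match every conda installation.

```python
import os
import sys


def guess_proj_lib(env_prefix=sys.prefix):
    """Return a candidate PROJ_LIB directory inside a conda environment.

    Assumption: Windows conda environments keep the PROJ data under
    <env>/Library/share and other platforms under <env>/share/proj.
    """
    if os.name == "nt":
        return os.path.join(env_prefix, "Library", "share")
    return os.path.join(env_prefix, "share", "proj")


candidate = guess_proj_lib()
# Only override the variable when the directory really exists.
if os.path.isdir(candidate):
    os.environ["PROJ_LIB"] = candidate
```

This only works if the notebook kernel runs inside the environment where basemap is installed.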

### General Setup  -  Nothing to fill out here.

In [3]:
%matplotlib inline
import sqlite3
import pprint
import pandas as pd
pd.set_option('display.width', 600)
pd.set_option('display.max_colwidth', 80)
pd.set_option('display.max_rows', 100)
from IPython.display import Image
import os
os.chdir(codeDir)
import range_functions as functions
os.chdir(outDir)
import matplotlib.pyplot as plt
os.environ['PROJ_LIB'] = r'c:\Users\nmtarr\AppData\Local\Continuum\miniconda3\envs\wrangler\Library\share'
from datetime import datetime
t1 = datetime.now()
connName = sqlite3.connect(recent_dbs[0])
gap_id = connName.execute("""SELECT value FROM species_concept WHERE attribute = "gap_id";""").fetchone()[0]
common_name = connName.execute("""SELECT value FROM species_concept WHERE attribute = "common_name";""").fetchone()[0]
sci_name = connName.execute("""SELECT value FROM species_concept WHERE attribute = "scientific_name";""").fetchone()[0]
del connName
eval_db = outDir + gap_id + eval_id + '.sqlite'

## Evaluation Parameters

Evaluation parameters need to be set and justified in the cells within this section.  Values that are entered here will be used to update cells within the 'evaluations' table stored in evaluations.sqlite. The decisions about what values to use are primarily documented here, not in the evaluations database.

Note that the evaluation ID and species' GAP code are set in the cell above, not below.  I am proposing that evaluation parameter sets also be documented as unique entities in a database (i.e., evaluations.sqlite).  Each evaluation can be given a unique id that can be used in documentation, file naming, and for the names of the columns that will be added to the GAP range table to record the results of the evaluation.  In this example, the evaluation_id is __2020v1__.
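
To make the idea of "one record per evaluation and species" concrete, here is a minimal sketch using an in-memory database. The column names follow the record displayed later in this notebook, but the column types and the composite primary key are assumptions, not the actual schema of the parameters database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: names match the notebook, types and key are assumed.
conn.execute("""CREATE TABLE evaluations (
                    evaluation_id TEXT,
                    species_id TEXT,
                    years TEXT,
                    months TEXT,
                    min_count INTEGER,
                    error_tolerance INTEGER,
                    method TEXT,
                    filter_sets TEXT,
                    justification TEXT,
                    creator TEXT,
                    date TEXT,
                    notes TEXT,
                    PRIMARY KEY (evaluation_id, species_id));""")

key = ("2020v1", "mstskx")
# INSERT OR IGNORE creates the row once; rerunning it is harmless.
for _ in range(2):
    conn.execute("INSERT OR IGNORE INTO evaluations (evaluation_id, species_id) "
                 "VALUES (?, ?);", key)
# Parameters are then filled in with UPDATE statements keyed on both ids.
conn.execute("UPDATE evaluations SET method=?, error_tolerance=? "
             "WHERE evaluation_id=? AND species_id=?;",
             ("proportion in polygon", 20) + key)
conn.commit()

rows = conn.execute("SELECT evaluation_id, species_id, method, error_tolerance "
                    "FROM evaluations;").fetchall()
print(rows)
```

With a key on both ids, rerunning the notebook updates the existing record instead of adding a duplicate.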

### Filter Sets
Notes:

In [4]:
ids = set([])
r_s = set([])
f_s = set([])
occ_dbs = recent_dbs + historic_dbs
for db in occ_dbs:
    connection = sqlite3.connect(db)
    # Collect every distinct id in the database, not just the first row returned
    req_id = [row[0] for row in connection.execute("""SELECT DISTINCT request_id FROM occurrences;""").fetchall()]
    filt_id = [row[0] for row in connection.execute("""SELECT DISTINCT filter_id FROM occurrences;""").fetchall()]
    r_s = r_s | set(req_id)
    f_s = f_s | set(filt_id)
    ids = ids | set(req_id) | set(filt_id)
    connection.close()
filters_request = list(r_s)
filters_post = list(f_s)
ids = tuple(ids)
filter_sets = ', '.join(ids)
print(filter_sets)

GBIFr26, GBIFf10, GBIFr25


### Years
Justification: All available records with reasonably dependable locational information are desired.  Civilian GPS accuracy improved markedly in 2000, when Selective Availability was turned off.

In [5]:
start_year = 2000
end_year = 2020
years = str(list(range(start_year, end_year + 1, 1)))[1:-1]

### Months
Justification: Skunks are not migratory so all months are of interest.

In [6]:
months = "1,2,3,4,5,6,7,8,9,10,11,12"

### Evaluation Method
Justification: The restrictive nature of "proportion in polygon" is a good fit for the demonstration of the framework.

In [7]:
method = "proportion in polygon"

#### Error Tolerance
Justification: 20% is a rather modest requirement.  There is no indication that this species needs a different tolerance.

In [8]:
error_tolerance = 20
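
The sketch below shows one plausible reading of how "proportion in polygon" and the error tolerance interact. It is an assumption for illustration, not the logic in `range_functions`: a record's circle counts toward a HUC only when the share of the circle outside that HUC does not exceed the tolerance. The function name `attributable` and the example areas are hypothetical.

```python
def attributable(overlap_area, circle_area, error_tolerance=20):
    """Return True if enough of an occurrence circle lies inside one HUC.

    Assumption: error_tolerance is the maximum percentage of the circle
    that may fall outside the HUC.
    """
    if circle_area <= 0:
        raise ValueError("circle_area must be positive")
    percent_inside = 100.0 * overlap_area / circle_area
    return percent_inside >= 100 - error_tolerance


print(attributable(85.0, 100.0))  # 85% of the circle is inside the HUC
print(attributable(60.0, 100.0))  # 60% of the circle is inside the HUC
```

Under this reading, a tolerance of 20 means at least 80% of a circle must fall within a single HUC.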

### Credits

In [9]:
creator = "Nathan Tarr"
date = datetime.now()

### Justification

In [10]:
justification = "See " + notebook_name

### Notes

In [11]:
notes = """"""

### Write to evaluations.sqlite

In [12]:
connjup = sqlite3.connect(parameters_db)
cursorjup = connjup.cursor()

# Make a row for species-evaluation
sqlrow = """INSERT OR IGNORE INTO evaluations ("evaluation_id", "species_id") VALUES (?, ?);"""
vals = [eval_id, gap_id]
cursorjup.execute(sqlrow, vals)

# Filter sets
sqlfilters = """UPDATE evaluations SET filter_sets=? WHERE evaluation_id=? AND species_id=?;"""
vals = [filter_sets, eval_id, gap_id]
cursorjup.execute(sqlfilters, vals)

# Years
sqlyear = """UPDATE evaluations SET years=? WHERE evaluation_id=? AND species_id=?;"""
vals = [years, eval_id, gap_id]
cursorjup.execute(sqlyear, vals)

# Months
sqlmonths = """UPDATE evaluations SET months=? WHERE evaluation_id=? AND species_id=?;"""
vals = [months, eval_id, gap_id]
cursorjup.execute(sqlmonths, vals)

# Evaluation Method
sqlmethod = """UPDATE evaluations SET method=? WHERE evaluation_id=? AND species_id=?"""
vals = [method, eval_id, gap_id]
cursorjup.execute(sqlmethod, vals)

# Error Tolerance
sqltolerance = """UPDATE evaluations SET error_tolerance=? WHERE evaluation_id=? AND species_id=?"""
vals = [error_tolerance, eval_id, gap_id]
cursorjup.execute(sqltolerance, vals)

# Justification
sqljust = """UPDATE evaluations SET justification=? WHERE evaluation_id=? AND species_id=?"""
vals = [justification, eval_id, gap_id]
cursorjup.execute(sqljust, vals)

# Credits
sqlcreator = """UPDATE evaluations SET creator=? WHERE evaluation_id=? AND species_id=?"""
vals = [creator, eval_id, gap_id]
cursorjup.execute(sqlcreator, vals)

# Notes
sqlnotes = """UPDATE evaluations SET notes=? WHERE evaluation_id=? AND species_id=?"""
vals = [notes, eval_id, gap_id]
cursorjup.execute(sqlnotes, vals)

# Date
sqldate = """UPDATE evaluations SET date=? WHERE evaluation_id=? AND species_id=?"""
vals = [date, eval_id, gap_id]
cursorjup.execute(sqldate, vals)

connjup.commit()

### Evaluation Parameters
Display the record that was just written.

In [13]:
df1 = pd.read_sql_query(sql="SELECT * FROM evaluations WHERE evaluation_id=? AND species_id=?", con=connjup, params=(eval_id, gap_id))
print(df1.loc[0])

evaluation_id                                                                               2020v1
species_id                                                                                  mstskx
years              2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012...
months                                                                  1,2,3,4,5,6,7,8,9,10,11,12
min_count                                                                                     None
error_tolerance                                                                                 20
method                                                                       proportion in polygon
filter_sets                                                              GBIFr26, GBIFf10, GBIFr25
justification                                                              See mSTSKx_2020v1.ipynb
creator                                                                                Nathan Tarr
date      

### Filter Set Info

In [14]:
for db in occ_dbs:
    filters2 = filters_request + filters_post
    connection = sqlite3.connect(db)
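    for f in list(filters2):  # iterate over a copy because items are removed below
        try:
            set_info = pd.read_sql(sql="""SELECT * FROM {0};""".format(f), con=connection)
            set_info = set_info.T.iloc[1:]
            set_info.rename({'0': ''}, inplace=True, axis=1)
            print(set_info)
            filters2.remove(f)
        except Exception:
            # Not every filter set table exists in every database; skip those
            pass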
    del connection

                                                                                        0
request_id                                                                        GBIFr25
source                                                                               GBIF
lat_range                                                                            None
lon_range                                                                            None
years_range                                                                     2015,2020
months_range                                                                         1,12
geoissue                                                                            False
coordinate                                                                           True
country                                                                                US
geometry                                                                             None
creator   

# Occurrence Record Retrieval and Display
This repo depends on the wildlife-wrangler repo because occurrence data are retrieved here from the sqlite occurrence databases that it generates.  In this section, a connection is established and records are filtered according to the evaluation parameters.  Keep in mind that the occurrence record databases were themselves created with filters, so be mindful of how parameters set here compare to the ones set there. For example, the dates of the records included will be determined by whichever process had the more restrictive date range.
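
As a worked example of the date-range point, the GBIFr25 request shown above was limited to 2015-2020, while this evaluation asks for 2000-2020, so that database can only contribute records from 2015 onward. A minimal sketch of the intersection (`effective_years` is a hypothetical helper, not part of `range_functions`):

```python
def effective_years(request_range, eval_range):
    """Intersect two inclusive (start, end) year ranges; None if disjoint."""
    start = max(request_range[0], eval_range[0])
    end = min(request_range[1], eval_range[1])
    return (start, end) if start <= end else None


# Request filter (wildlife-wrangler) versus evaluation parameters (here).
print(effective_years((2015, 2020), (2000, 2020)))
```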

The first step in using occurrence records to evaluate GAP range is to build a database to hold the GAP 12-digit HUCs, the range for the species, and suitable occurrence records.  The database is also suitable for performing the necessary spatial queries.  The GAP range is retrieved from ScienceBase, and the HUCs would be too if they were available as a shapefile.

In [None]:
functions.make_evaluation_db(eval_db=eval_db, gap_id=gap_id, shucLoc=shucs_loc, inDir=inDir, outDir=outDir)

Copy records to the evaluation database, filtering out records from months or years that aren't wanted.

In [None]:
"""
if len(recent_dbs) == 1:
    recent_db = recent_dbs[0]
else:
    recent_db = functions.concat_dbs(recent_dbs)

if len(historic_dbs) == 1:
    historic_db = historic_dbs[0]
else:
    historic_db = functions.concat_dbs(historic_dbs)
    
# Load records into evaluation database
eval_db
"""

In [None]:
years = tuple([x.strip() for x in years.split(',')])
months = tuple([x.strip().zfill(2) for x in months.split(',')])
for occ_db in occ_dbs:
    print(occ_db)
    
    # Connect to the evaluation occurrence records database
    cursor, evconn = functions.spatialite(eval_db)

    # Attach occurrence database
    cursor.execute("ATTACH DATABASE ? AS occs;", (occ_db,))

    # Create table of occurrences that fit within evaluation parameters  --  IF EXISTS JUST APPEND
    if occ_db == occ_dbs[0]:
        cursor.execute("""CREATE TABLE evaluation_occurrences AS 
                       SELECT * FROM occs.occurrences 
                       WHERE STRFTIME('%Y', OccurrenceDate) IN {0} 
                       AND STRFTIME('%m', OccurrenceDate) IN {1};""".format(years, months))
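    else:
        cursor.execute("""INSERT INTO evaluation_occurrences
                          SELECT * FROM occs.occurrences 
                          WHERE STRFTIME('%Y', OccurrenceDate) IN {0} 
                          AND STRFTIME('%m', OccurrenceDate) IN {1};""".format(years, months))
    # Commit before the next iteration opens a new connection, so these rows are not lost
    evconn.commit()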
    
# Export occurrence circles as a shapefile (all seasons)
cursor.execute("""SELECT RecoverGeometryColumn('evaluation_occurrences', 'polygon_4326', 
                  4326, 'POLYGON', 'XY');""")
sql = """SELECT ExportSHP('evaluation_occurrences', 'polygon_4326', ?, 'utf-8');"""
subs = outDir + summary_name + "_circles"
cursor.execute(sql, (subs,))
'''
# Export occurrence 'points' as a shapefile (all seasons)
cursor.execute("""SELECT RecoverGeometryColumn('evaluation_occurrences', 'geom_xy4326', 
                  4326, 'POINT', 'XY');""")
subs = outDir + summary_name + "_points"
cursor.execute("""SELECT ExportSHP('evaluation_occurrences', 'geom_xy4326', ?, 'utf-8');""", (subs,))
'''
# Close db
evconn.commit()
evconn.close()
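
The year and month filter in the cell above is built by formatting Python tuples into the SQL text. That breaks for a single-element tuple, because `('2000',)` is not valid SQL. A hedged alternative is to bind the values with placeholders. The sketch below uses a toy in-memory `occurrences` table with made-up records; only the `STRFTIME` filter mirrors the notebook.

```python
import sqlite3

years = ("2000", "2001")
months = ("05", "06")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrences (occ_id INTEGER, OccurrenceDate TEXT);")
conn.executemany("INSERT INTO occurrences VALUES (?, ?);",
                 [(1, "2000-05-14T10:00:00"),   # kept
                  (2, "2000-12-01T08:30:00"),   # month not wanted
                  (3, "1999-06-20T09:15:00"),   # year not wanted
                  (4, "2001-06-02T17:45:00")])  # kept

# One "?" per value, so the values are bound rather than formatted in.
year_marks = ", ".join("?" * len(years))
month_marks = ", ".join("?" * len(months))
sql = ("SELECT occ_id FROM occurrences "
       "WHERE STRFTIME('%Y', OccurrenceDate) IN ({0}) "
       "AND STRFTIME('%m', OccurrenceDate) IN ({1}) "
       "ORDER BY occ_id;").format(year_marks, month_marks)
kept = [row[0] for row in conn.execute(sql, years + months)]
print(kept)
```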

### Display Occurrence Points and GAP Range Map
A presence shapefile must be created for display because the ScienceBase range shapefiles are seasonal, and seasonality is not the focus here.

In [None]:
gap_range = functions.download_GAP_range_CONUS2001v1(gap_id, inDir)

Display occurrence records over GAP range

In [None]:
gap_range2 = "{0}{1}_presence_4326".format(outDir, gap_id)

shp1 = {'file': gap_range2, 'column': None, 'alias': 'GAP range map - presence',
        'drawbounds': False, 'linewidth': .5, 'linecolor': 'm',
        'fillcolor': 'm', 'marker':'s'}

shp2 = {'file': '{0}{1}_circles'.format(outDir, summary_name), 'column': None,
        'alias': 'Occurrence records', 'drawbounds': True, 'linewidth': .75, 'linecolor': 'k',
        'fillcolor': None, 'marker':'o'}

# Display occurrence polygons
title="{1} ({0})".format(years[0] + " - " + years[-1], sci_name)
functions.MapShapefilePolygons(map_these=[shp1, shp2], title=title)

# GAP Known Range Data Evaluation
With all the data in a sqlite database with spatialite capabilities, we can perform the evaluation.

In [None]:
from importlib import reload
os.chdir(codeDir)
reload(functions)
os.chdir(outDir)

In [None]:
functions.compile_GAP_presence(eval_id=eval_id, gap_id=gap_id, eval_db=eval_db, cutoff_year=2015, parameters_db=parameters_db,
                         outDir=outDir, codeDir=codeDir)

In [None]:
connr = sqlite3.connect(eval_db)
df4 = pd.read_sql_query(sql="SELECT strHUC12RNG AS HUC12RNG, "
                                    "intGAPOrigin AS Origin, intGAPPresence AS Presence, "
                                    "intGAPReproduction AS Reproduction,"
                                    "intGAPSeason AS Season, eval_cnt, eval, "
                                    "validated_presence AS validated_pres FROM sp_range WHERE eval_cnt >=0", con=connr)
df4.set_index(["HUC12RNG"], inplace=True)
print("Tabular results of the evaluation")
print(df4)

In [None]:
print("Mapped results of the evaluation.")
shp3 = {'file': '{0}{1}_eval'.format(outDir, gap_id), 'column': 'eval',
        'alias': 'eval', 'column_colors': {1: 'b', 0: 'r'}, 
        'value_alias': {1:'Agreement', 0:'Disagreement'}, 'drawbounds': False, 
        'marker': "s"}
title="{0} -- {1}".format(common_name, eval_id)
functions.MapShapefilePolygons(map_these=[shp1, shp3], title=title)

In [None]:
dups0 = connr.execute("SELECT COUNT(occ_id) FROM evaluation_occurrences GROUP BY geom_xy4326, occurrenceDate;").fetchall()
dups1 = [x[0] for x in dups0]
dups2 = [x for x in dups1 if x > 1]
print(str(len(dups2)) + ' coordinate and date-time combinations had more than one record')
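
The query above returns one count per coordinate and date-time group, so the number of groups with duplicates differs from the number of redundant records. A minimal sketch of that distinction, using a toy table with made-up rows and plain-text geometry in place of the real `evaluation_occurrences`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for evaluation_occurrences; geometry is plain text here.
conn.execute("CREATE TABLE evaluation_occurrences "
             "(occ_id INTEGER, geom_xy4326 TEXT, occurrenceDate TEXT);")
conn.executemany("INSERT INTO evaluation_occurrences VALUES (?, ?, ?);",
                 [(1, "POINT(-90 35)", "2016-04-01"),
                  (2, "POINT(-90 35)", "2016-04-01"),
                  (3, "POINT(-90 35)", "2016-04-01"),
                  (4, "POINT(-91 36)", "2017-07-09")])

# HAVING keeps only the groups that contain more than one record.
groups = conn.execute("SELECT COUNT(occ_id) FROM evaluation_occurrences "
                      "GROUP BY geom_xy4326, occurrenceDate "
                      "HAVING COUNT(occ_id) > 1;").fetchall()
duplicated_groups = len(groups)
redundant_records = sum(n - 1 for (n,) in groups)
print(duplicated_groups, redundant_records)
```

Here three records share one location and date, which gives one duplicated group and two redundant records.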

After occurrence circles are attributed to HUCs, the results can be recorded in the species' range map table in terms of whether the two data sets agreed and whether they validate the GAP range data for any HUCs. For each evaluation, columns are added for 1) how many records could be attributed to each HUC, 2) whether there is agreement at that HUC (1 for yes, 0 for no, 'None' for no data for that HUC), and 3) whether the GAP range has been validated by the evaluation.
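
The agreement coding can be sketched as follows. This is an illustration under stated assumptions, not the logic of `compile_GAP_presence`: `score_huc` is a hypothetical helper, the HUC codes are made up, and the validation column is not modeled.

```python
def score_huc(in_gap_range, eval_cnt, min_count=1):
    """Return the agreement code for one HUC.

    Assumptions: 1 when records fall in a HUC that GAP marked as present,
    0 when records fall in a HUC that GAP did not, and None when fewer
    than min_count records were attributed to the HUC.
    """
    if eval_cnt is None or eval_cnt < min_count:
        return None
    return 1 if in_gap_range else 0


# (GAP presence, number of attributed records) for three made-up HUCs.
hucs = {"030101010101": (True, 3),
        "030101010102": (False, 2),
        "030101010103": (True, 0)}
results = {huc: score_huc(present, cnt) for huc, (present, cnt) in hucs.items()}
print(results)
```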

# Summary of Results

### How many records were available in the occurrence database?

In [None]:
count = connr.execute("SELECT COUNT(occ_id) FROM evaluation_occurrences;").fetchone()[0]
print(str(count) + " occurrence records were suitable for this evaluation of the range.")

### How many of the records were attributable to a HUC?

In [None]:
hucable = connr.execute("SELECT SUM(eval_cnt) FROM sp_range WHERE eval_cnt >=0").fetchall()[0]
print(str(hucable[0]) + " records were attributable to a HUC.")

### How many HUCs had records attributed to them?

In [None]:
containers = connr.execute("SELECT COUNT(eval_cnt) FROM sp_range WHERE eval_cnt >=0").fetchall()[0]
print(str(containers[0]) + " HUCs 'contained' records.")

### How many records were not used because of the minimum count?

In [None]:
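# min_count is not set earlier in this notebook; as an assumption, read it from the evaluations record (it may be NULL)
min_count = connjup.execute("SELECT min_count FROM evaluations WHERE evaluation_id=? AND species_id=?", (eval_id, gap_id)).fetchone()[0]
ones = connr.execute("SELECT SUM(eval_cnt) FROM sp_range WHERE eval_cnt < ?", (min_count,)).fetchall()[0]
if ones[0] is not None:
    print(str(ones[0]) + " records were in HUCs that were not validated because they didn't meet the minimum.")
else:
    print("None")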

### How many HUCs were validated?

In [None]:
validated = connr.execute("SELECT COUNT(validated_presence) FROM sp_range WHERE eval = 1").fetchall()[0]
print(str(validated[0]) + " HUCs were validated.")

### How many HUCs did GAP appear to omit?

In [None]:
missed = connr.execute("SELECT COUNT(eval) FROM sp_range WHERE eval = 0").fetchall()[0]
print(str(missed[0]) + " HUCs were missed.")

### What was the maximum number of occurrences attributable to a single HUC?

In [None]:
maxi = connr.execute("SELECT MAX(eval_cnt) FROM sp_range").fetchall()[0]
print("The maximum number of records attributed to a HUC was " + str(maxi[0]))

### Runtime

In [None]:
t2 = datetime.now()
print(t2 - t1)

# Next Steps
This is just a starting point that needs scrutiny.  It is currently hard-coded for a single species, so deploying it would require a redesign to accommodate large numbers of species, multiple users, many more occurrence records, and optimal methods for evaluation and range delineation, among other things.