In [1]:
"""
Reporting Gap Analysis
Author: Liam Megraw, RIT Environmental Science Technician
Date last edited: 7/20/2023
ESRI ArcGIS Pro Version 2.7
Default Python 3.x kernel

Description:
This code uses results from the RIT-developed computer 
vision model and iMapInvasives records to identify gaps in reporting
on a per-species basis. The first part compares all iMap and model 
records for the species of interest at the same time, while the 
second part compares them on a per-species basis.

Inputs required, stored in one geodatabase at the same projected coordinate reference system (CRS):
> Single point dataset of model prediction points with n species each
  having their own confidence score column (name must be input two cells below)
> 1 km grid for aoi named "reporting_analysis_grid_empty" (or alternatively 
created by uncommenting the 6th notebook cell)
> 7 iMapInvasives datasets for n species (name should follow default 
convention in all caps below)
    > PRESENCE_POINT (confirmed) 
    > PRESENCE_LINE (confirmed)
    > PRESENCE_POLYGON (confirmed)
    > PRESENCE_POINT_UNCONFIRMED
    > PRESENCE_LINE_UNCONFIRMED
    > PRESENCE_POLYGON_UNCONFIRMED
    > NOT_DETECTED_POLYGON
> n point datasets of model presence predictions at a threshold for 
the per-species approach. The name should follow this convention: 
SVI_Project_presences_species_threshold. For example, 
SVI_Project_presences_phrag_precision. Species names need to match 
the shortnames in the dictionary in notebook cell 4.

Scaling:
If the model is expanded to additional species, they will have to be 
added into the dictionary in cell 4. If these additional species have 
limited date ranges where results should be interpreted, that exception 
will need to be added into the date constraint dictionary in cell 4.

Outputs:
The final outputs are two polygon layers at a 1 km resolution with 
overall and per-species attributes detailing the type of records 
within a cell (model only, iMap only, or both), and if there is 
overlap, a comparison value between the two types of records.

How to Use:
These layers can be hosted on the ArcGIS Online Public and Manager Dashboards.
"""

In [2]:
"""
Pseudocode Overview

Assign workspace and input files/parameters
Create lists of input files
    iMap: point, line, polygon
    model: point
Define function to create field mappings for spatial joins
(Optionally) create state-wide fishnets at 1 km resolution

Thresholdless approach
    Effectively, for both model data and iMap data, and each imap geometry type:
            Spatial join records to fishnet
            Add & calculate fields
                Total join count for that species
                Overlap type if statement:
                    Cells where model data join count is above zero and iMap join count is above zero: both (i.e., overlap)
                    Cells where model data join count is above zero and iMap join count is zero: model only
                    Cells where model data join count is zero and iMap join count is above zero: iMap only
                Calculate comparison between model and iMap
Thresholded (per-species) approach
    For each species:
        Export iMap records to layers split by species, record type, and geometry
        Spatially join records to fishnet
        Calculate ratios and percentiles for each comparison type
Export results

"""

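The overlap-type rule in the outline above can be sketched in plain Python without arcpy. This is only an illustration: the function name `overlap_type` and the sample counts are hypothetical, and the notebook itself applies the same rule through SQL where clauses on the grid's join-count fields.

```python
def overlap_type(model_count, imap_count):
    """Classify one grid cell by which record sources fall inside it."""
    if model_count > 0 and imap_count > 0:
        return "both"
    if model_count > 0:
        return "model only"
    if imap_count > 0:
        return "iMap only"
    return None  # Neither source has records in this cell


# Hypothetical (model join count, iMap join count) pairs for four cells
cells = {"A": (3, 2), "B": (5, 0), "C": (0, 4), "D": (0, 0)}
labels = {name: overlap_type(m, i) for name, (m, i) in cells.items()}
print(labels)
```

Cells with no records of either kind are left unlabeled, which mirrors the null designation the notebook keeps for them.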

In [3]:
# #----- Get and set workspace to gdb -----
import arcpy
from arcpy import env
import os
def set_workspace():
    while True:
        gdb = input('Enter absolute geodatabase path: ')
        if os.path.exists(gdb):
            return gdb
        else:
            print("Geodatabase path incorrect")
    
arcpy.env.workspace = set_workspace()
cws = arcpy.env.workspace
arcpy.env.overwriteOutput = True # The environment property name starts with a lowercase "o"

def assign_layer_name(prompt):
   cws = arcpy.env.workspace
   while True:
      layer_name = input(prompt)
      if arcpy.Exists(os.path.join(cws, layer_name)):
         return layer_name
      else:
         print("No file of this name exists in your geodatabase, please verify.")

# Define necessary input files
# Each species must have their own column, named according to the dictionary in the cell below

model_pred = assign_layer_name(prompt="Enter model prediction dataset name: ")

# The value is used as a suffix and as a field in 
# the final per-species viewing layer
threshold = input('Enter threshold name (all lowercase): ')

def check_rag():
    rag_exist = input("Do you have a reporting analysis grid feature? (y/n):")
    if rag_exist == "y":
        def name_rag():
            default_choice = input("Is your grid named 'reporting_analysis_grid_empty'? (y/n): ")
            if default_choice == "y":
                rag_initial = 'reporting_analysis_grid_empty'
                if arcpy.Exists(os.path.join(cws, rag_initial)):
                    return rag_initial
                else:
                    print("No file of this name exists in your geodatabase; please verify.")
                    return name_rag() # Return the retry's result so a later valid answer is not lost
            elif default_choice == "n":
                rag_initial = input("Enter reporting grid name: ")
                if arcpy.Exists(os.path.join(cws, rag_initial)):
                    return rag_initial
                else:
                    print("No file of this name exists in your geodatabase; please verify.")
                    return name_rag()
            else:
                print("Incorrect entry, please try again")
                return name_rag()
        return name_rag()
    elif rag_exist == "n":
        # No grid is available, so the function returns None
        print("Use the 'create fishnet' tool to create a grid for your aoi and then re-run this code.")
    else:
        print("Incorrect entry, please try again")
        return check_rag()

rag_empty = check_rag()
print("Empty reporting grid name:", rag_empty)

Enter absolute geodatabase path: C:\Users\ltmsbi\Documents\ArcGIS\Projects\Final_Deployment\Final_Deployment.gdb
Enter model prediction dataset name: pred_finalDeployment_all
Enter threshold name (all lowercase): precision
Do you have a reporting analysis grid feature? (y/n):y
Is your grid named 'reporting_analysis_grid_empty'? (y/n): y
Empty reporting grid name: reporting_analysis_grid_empty


In [5]:
# Create dictionary of long names
# Names used for filtering in ArcGIS Online
species_fullnames = {
    "phrag": "'Phragmites, Unspecified'", # extra single quotes are intentional since these are used in a field calculation
    "knot": "'Knotweed, Unspecified'",
    "wp": "'Wild Parsnip'",
    "toh": "'Tree-of-Heaven (Ailanthus)'",
    "pl": "'Purple Loosestrife'"
}

# Extract only the keys to a list
species_shortnames = list(species_fullnames.keys())

# List shortnames of species and their date constraints
date_constraint_where_clauses = {
    "wp": "date LIKE '%-05' Or date LIKE '%-06' Or date LIKE '%-07'",
    "toh": "date LIKE '%-07' Or date LIKE '%-08' Or date LIKE '%-09' Or date LIKE '%-10'",
    "pl": "date LIKE '%-07' Or date LIKE '%-08' Or date LIKE '%-09' Or date LIKE '%-10'"
}
date_constrained_species = list(date_constraint_where_clauses.keys())

# IDs that iMap assigns to the various species of interest
jurisdiction_ids = {
    "phrag": 1277,
    "wp": 1182,
    "pl": 1265,
    "toh": 1167,
    "knot": (1074, 1191, 1278, 1479) # Includes Japanese knotweed, giant knotweed, bohemian knotweed, and knotweed species unknown
}
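Each date constraint above is a chain of `LIKE` patterns, one per two-digit suffix (assumed here to be the month at the end of the `date` text). When adding a species, a small helper such as the hypothetical `month_where_clause` below could build the clause from a list of month numbers rather than typing it by hand. It is a sketch and not part of the original notebook.

```python
def month_where_clause(months, field="date"):
    """Build an SQL where clause matching values of `field` that end in
    '-MM' for any of the given month numbers."""
    return " Or ".join(f"{field} LIKE '%-{month:02d}'" for month in months)


# Reproduces the wild parsnip ("wp") constraint for May through July
wp_clause = month_where_clause([5, 6, 7])
print(wp_clause)
```

The output has the same form as the hand-written entries, so it could be stored directly as a value in `date_constraint_where_clauses`.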

In [14]:
# Define functions for use in one or both layers

def create_SJ_FieldMappings(targetLayer, joinLayer): # Return field mappings for spatial joins when called 
    
    # List starting fields for spatial joins that'll be updated with each successive join
    keepFields = list()
    omitFields = ["OBJECTID", "Shape", "Shape_Area", "Shape_Length"]

    for field in arcpy.ListFields(targetLayer):
        if field.name not in omitFields:
            keepFields.append(field.name)
    fieldMappings = arcpy.FieldMappings() # Create field mapping variable; this will store all field mappings

    # Create list of field names to keep in the output file
    targetTable = []
    for i in arcpy.ListFields(targetLayer):
        if i.name in (keepFields):
            targetTable.append(i.name)

    # List of input feature classes for the spatial join
    f = [targetLayer, joinLayer]

    for k in targetTable: # loop through main table
        #print("Field: ",k)
        fieldMap = arcpy.FieldMap() # create an empty field map variable
        fieldMap.addInputField(targetLayer,k) # insert the target layer as the first input into the field map
        for feature in f: # loop through feature classes
            for field in arcpy.ListFields(feature): # loop through field of each feature class
                if k in field.name: # check if any field matches with our target field then append it as an input field
                    fieldMap.addInputField(feature,field.name) 
        fieldMappings.addFieldMap(fieldMap) # add the current field map to the main field map variable
    return(fieldMappings)

def generate_where_clauses(type, species=""): # Return a dictionary of where clauses to select records for calculations when called 
    if type == "THRESHOLDED":
        l_suffix = "_"+species
        points = "model"+l_suffix
        extras = ["",")"]
    elif type == "NOT_THRESHOLDED":
        l_suffix = ""
        points = "model_points"
        extras = [" And imap_nd = 0", " Or imap_nd > 0)"]

    iMap_cnfrm = "imap_cnfrm"+l_suffix
    iMap_uncnfrm = "imap_uncnfrm"+l_suffix
    iMap_nd = "imap_nd"+l_suffix
    # Handle exception for species with date-constrained panorama interpretation
    if species in date_constrained_species:
        model_points = "model_"+species+"_possible"
    else:
        model_points = "model_points"
    model_positives = "model"+l_suffix # Unused name in the thresholdless version

    whereClauses = { # Define SQL queries used to select records
                # These will be set to negative integers
                "model-only_conf": points+" > 0 And "+iMap_cnfrm+" = 0"+extras[0], 
                "model-only_unconf": points+" > 0 And "+iMap_cnfrm+" = 0 And "+iMap_uncnfrm+" = 0"+extras[0],
                # These will be set to 0
                "iMap-only_conf": points+" = 0 And ("+iMap_cnfrm+" > 0"+extras[1], 
                "iMap-only_unconf": points+" = 0 And ("+iMap_cnfrm+" > 0 Or "+iMap_uncnfrm+" > 0"+extras[1],
                # These will be set to positive floats
                "Overlap_conf": points+" > 0 And ("+iMap_cnfrm+" > 0"+extras[1], 
                "Overlap_unconf": points+" > 0 And ("+iMap_cnfrm+" > 0 Or "+iMap_uncnfrm+" > 0"+extras[1],
                # Cells with neither record type will retain a null designation during calculation
                }

    if type == "THRESHOLDED": # Add extra conditions
        whereClauses["model-only_nd"] = model_points+" > 0 And "+model_points+" >= "+model_positives+" And "+iMap_nd+" = 0"
        whereClauses["iMap-only_nd"] = model_points+" = 0 And "+iMap_nd+" > 0"
        whereClauses["Overlap_nd"] = model_points+" > 0 And "+model_points+" >= "+model_positives+" And "+iMap_nd+" > 0"
    
    return(whereClauses) # Return dictionary of where clauses when called

def generate_calc_field_dict(type, species=""): # Return a dictionary of fields to assign calculated value 
    fields_dict = generate_where_clauses(type, species) # Create a reference to the whereClause dict
    
    # Update dict entries with the fields where future-calculated values should be stored
    if type == "THRESHOLDED":
        s_suffix = "_"+species
        for field in ["model-only_nd", "iMap-only_nd", "Overlap_nd"]:
            fields_dict[field] = "NDc"+s_suffix
    elif type == "NOT_THRESHOLDED":
        s_suffix = "_overall"
    for field in ["model-only_conf", "iMap-only_conf", "Overlap_conf"]:
        fields_dict[field] = "Cc"+s_suffix
    for field in ["model-only_unconf", "iMap-only_unconf", "Overlap_unconf"]:
        fields_dict[field] = "CUc"+s_suffix
    
    return(fields_dict)

def generate_calc_expressions(type, species=""): # Return a dictionary of expressions to calculate comparison values 
    exp_dict = generate_where_clauses(type, species) # Create reference to main whereClause dict
    
    if type == "THRESHOLDED":
        l_suffix = "_"+species
        points = "!model"+l_suffix+"!"
        extra = ""
        model_positives = "!model"+l_suffix+"!"
        if species in date_constrained_species:
            model_possible = "!model_"+species+"_possible!"
        else:
            model_possible = "!model_points!"
    elif type == "NOT_THRESHOLDED":
        l_suffix = ""
        points = "!model_points!"
        extra = "+ !imap_nd!"
        
    # Generate proper layer names to reference in calculations 
    iMap_cnfrm = "!imap_cnfrm"+l_suffix+"!"
    iMap_uncnfrm = "!imap_uncnfrm"+l_suffix+"!"
    iMap_nd = "!imap_nd"+l_suffix+"!"
        

    # Update dictionary with the values being the expression used to calculate the field determined by generate_calc_field_dict()
    for field in ["model-only_conf", "model-only_unconf"]:
        exp_dict[field] = "-"+points
    for field in ["iMap-only_conf", "iMap-only_unconf"]:
        exp_dict[field] = "0"
    exp_dict["Overlap_conf"] = points+"/("+iMap_cnfrm+extra+")" # Effectively, "extra" adds imap_nd if thresholdless and doesn't for thresholded
    exp_dict["Overlap_unconf"] = points+"/("+iMap_cnfrm+" + "+iMap_uncnfrm+extra+")" # Do the same as above line including unconfirmed in the calc
    if type == "THRESHOLDED":
        exp_dict["iMap-only_nd"] = "0"
        exp_dict["model-only_nd"] = "-("+model_possible+" - "+model_positives+")"
        exp_dict["Overlap_nd"] = "("+model_possible+" - "+model_positives+")/"+iMap_nd
    
    return(exp_dict)

def add_calc_fields(dataset, calc_field_dict):
    # Get a unique list of fields by first converting to a set that only contains the unique values
    unique_comp_field_names = list(set(calc_field_dict.values()))
    # Get list of existing fields
    field_objs = arcpy.ListFields(dataset)
    field_names = list()
    for field in field_objs:
        field_names.append(field.name)
    for comp_field in unique_comp_field_names:
        if comp_field not in field_names:
            arcpy.management.AddField(dataset, comp_field, "FLOAT", field_alias=comp_field)

# Delete temporary files
# This way is necessary to delete the feature itself and not just its contents
def delete_tmp_features(feature_list):
    import os
    cws = arcpy.env.workspace
    for f in feature_list:
      f_path = os.path.join(cws, f)
      if arcpy.Exists(f_path):
        arcpy.Delete_management(f_path)
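As a plain-Python illustration of what the where clauses and expressions above produce together, the sketch below computes the thresholdless confirmed comparison (`Cc_overall`) for a single cell. The function name and sample counts are hypothetical, and counts are assumed to be non-negative integers. The notebook performs the same steps with `SelectLayerByAttribute` and `CalculateField` rather than a Python function.

```python
def confirmed_comparison(model_points, imap_cnfrm, imap_nd):
    """Return the thresholdless confirmed comparison value for one cell."""
    imap_total = imap_cnfrm + imap_nd
    if model_points > 0 and imap_total == 0:
        return -float(model_points)       # Model only: negative point count
    if model_points == 0 and imap_total > 0:
        return 0.0                        # iMap only: zero
    if model_points > 0 and imap_total > 0:
        return model_points / imap_total  # Overlap: model-to-iMap ratio
    return None                           # Neither record type: stays null


print(confirmed_comparison(4, 0, 0))
print(confirmed_comparison(0, 2, 1))
print(confirmed_comparison(6, 2, 1))
```

The sign of the result therefore encodes the overlap type: negative for model-only cells, zero for iMap-only cells, and positive for cells with both.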

In [7]:
# # Code to create a fishnet for the state if you do not already have one
# # New York state boundary coordinates in UTM Zone 18N projection 
# # (Coordinates are expressed in the order of x-min, y-min, x-max, y-max)
# aoi = ['4,481,032.099500 105,606.381800 4,985,489.904000 770,761.900100'] 
# cellsize = '1' # The width and height argument for the fishnet function
# fishnet_output_name = rag_empty
# # Create fishnet 
# arcpy.management.CreateFishnet(fishnet_output_name, '4,985,489.904000 105,606.381800', '4,481,032.099500 105,606.381800', cellsize, cellsize, '0', '0', {corner_coord}, 'NO_LABELS', aoi, 'POLYGON')

# For Thresholdless Only

In [13]:
# For thresholdless reporting analysis
# Create empty lists to ultimately populate a dataframe
out_features = []
join_features = []
field_names = []
tmp_ps_joins = []

# Assign names for overall model prediction join
# These join counts are used for species with no panorama date restrictions on interpretation
out_features.append("tmpRAG_model")
join_features.append(model_pred)
field_names.append("model_points")
# Assign names for joins of species with panorama date restrictions on interpretation
for species in date_constrained_species:
    print(species)
    ps_join_feature = "tmp_model_pred_"+species
    tmp_ps_joins.append(ps_join_feature)
    # Look up the clause defined alongside the species dictionaries instead of repeating it here
    whereClause = date_constraint_where_clauses[species]
    print(whereClause)
    ps_sel = arcpy.management.SelectLayerByAttribute(model_pred, "NEW_SELECTION", whereClause)
    arcpy.management.CopyFeatures(ps_sel, ps_join_feature)
    arcpy.management.SelectLayerByAttribute(model_pred, "CLEAR_SELECTION")

    out_features.append("tmpRAG_model_"+species)
    join_features.append(ps_join_feature)
    del ps_join_feature
    field_names.append("model_"+species+"_possible")

# Assign field names for iMap joins and to delete later
geo_names = {"point": "POINT",
             "line": "LINE",
             "poly": "POLYGON"
}
geometries = geo_names.keys()
type_names = {"cnfrm": "",
              "uncnfrm": "_UNCONFIRMED",
              "nd": ""
}
types = type_names.keys()
tmpCnfrm = list()
tmpUncnfrm = list()
for record_type in types:
    for geometry in geometries:
        if record_type == "nd":
            if geometry == "poly":
                imap_prefix = "NOT_DETECTED"
                imap_suffix = "nd"
            else:
                continue
        else:
            imap_prefix = "PRESENCE"
            imap_suffix = geometry+"_"+record_type
        out_features.append("tmpRAG_"+imap_suffix)
        join_features.append(imap_prefix+"_"+geo_names[geometry]+type_names[record_type])
        fname = "imap_"+imap_suffix
        # Add to list to delete
        field_names.append(fname)
        # Add to lists for summing calculations
        if record_type == "cnfrm":
            tmpCnfrm.append("!"+fname+"!")
        elif record_type == "uncnfrm":
            tmpUncnfrm.append("!"+fname+"!")
        del fname
# Combine/create lists to put in the dataframe organizing spatial join inputs/outputs
target_features = [rag_empty,]
for out_feature in out_features:
    target_features.append(out_feature)

zipped = list(zip(target_features, join_features, out_features, field_names))
import pandas as pd
ps_name_df = pd.DataFrame(zipped, columns=['Target_Feature', 'Join_Feature', 'Out_Feature', 'Field_Name'])
del target_features, join_features, out_features, zipped
# Re-name the last output feature name in the dataframe
RAG_tl = "reporting_analysis_grid_thresholdless"
ps_name_df.at[len(ps_name_df)-1, "Out_Feature"]=RAG_tl
ps_name_df

for i in range(0,len(ps_name_df)):
    target_feature = ps_name_df.at[i, "Target_Feature"]
    join_feature = ps_name_df.at[i, "Join_Feature"]
    out_feature = ps_name_df.at[i, "Out_Feature"]
    field_name = ps_name_df.at[i, "Field_Name"]
    fm = create_SJ_FieldMappings(target_feature, join_feature) # Create the field mappings for the join
    arcpy.analysis.SpatialJoin(target_feature, join_feature, out_feature, "JOIN_ONE_TO_ONE", "KEEP_ALL", fm) # Count the features within each grid cell
    arcpy.management.AlterField(out_feature, "JOIN_COUNT", field_name, field_name) # Rename join_count field
    arcpy.management.DeleteField(out_feature, "TARGET_FID") # Delete unnecessary field
    del target_feature, join_feature, out_feature, field_name, fm

# Calculate the total number of iMap features joined
print("Calculating total iMap features joined")

cName = "iMap_cnfrm"
uName = "iMap_uncnfrm"
arcpy.management.AddFields(RAG_tl, [
    [cName, 'SHORT'],
    [uName, 'SHORT']
])

tmpCnfrm = tuple(tmpCnfrm)
print("Summing:",tmpCnfrm)
arcpy.management.CalculateField(RAG_tl, cName, tmpCnfrm[0]+"+"+tmpCnfrm[1]+"+"+tmpCnfrm[2])
print("Summing:",tmpUncnfrm)
tmpUncnfrm = tuple(tmpUncnfrm)
arcpy.management.CalculateField(RAG_tl, uName, tmpUncnfrm[0]+"+"+tmpUncnfrm[1]+"+"+tmpUncnfrm[2])

print("Calculating comparison values")
type = "NOT_THRESHOLDED"
fieldDict = generate_calc_field_dict(type)
expDict = generate_calc_expressions(type)
wCs = generate_where_clauses(type)
add_calc_fields(RAG_tl, fieldDict)

for key in wCs: # Loop to calculate the reporting analysis values
    sel = arcpy.management.SelectLayerByAttribute(RAG_tl, "NEW_SELECTION", wCs[key]) # Make the selection
    arcpy.management.CalculateField(sel, fieldDict[key], expDict[key]) # Calculate values in field based on expression
    arcpy.management.SelectLayerByAttribute(RAG_tl, "CLEAR_SELECTION")
    del sel

# Remove fields we want to keep from the list of fields to delete
field_names.remove("model_points")
field_names.remove("imap_nd")
for species in date_constrained_species:
    field_names.remove("model_"+species+"_possible")
# Delete the now unnecessary per-geometry fields
for field in field_names:
    arcpy.management.DeleteField(RAG_tl, field)
    
del type, fieldDict, expDict, wCs, field_names, tmpCnfrm, tmpUncnfrm

print("Done!")

wp
toh
pl
Calculating total iMap features joined
Summing: ('!imap_point_cnfrm!', '!imap_line_cnfrm!', '!imap_poly_cnfrm!')
Summing: ['!imap_point_uncnfrm!', '!imap_line_uncnfrm!', '!imap_poly_uncnfrm!']
Calculating comparison values
Done!
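The cell above chains the spatial joins: the first join targets the empty grid, and each later join targets the previous join's output, so the counts accumulate in one table. A minimal sketch of that pairing follows. The helper name `chain_joins` is hypothetical, and the list of outputs is shortened for illustration.

```python
def chain_joins(start_grid, out_names):
    """Pair each join's target with its output: the first join targets the
    starting grid, and every later join targets the previous output."""
    targets = [start_grid] + out_names[:-1]
    return list(zip(targets, out_names))


steps = chain_joins(
    "reporting_analysis_grid_empty",
    ["tmpRAG_model", "tmpRAG_point_cnfrm", "reporting_analysis_grid_thresholdless"],
)
for target, out in steps:
    print(target, "->", out)
```

Only the last output needs to be kept. Every earlier output is an intermediate that can be deleted once the chain finishes.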


In [18]:
# Create list of all but the last output feature to delete
tmp_out_features = ps_name_df["Out_Feature"][:-1].to_list()
# Delete temporary files
delete_tmp_features(tmp_out_features)
delete_tmp_features(tmp_ps_joins)

# For Species-Based Approach

In [7]:
# Separate out iMap records by species and geometry

# Add model data layer names to processing list
# Create list of n + n*3 files to process, where n is the number of species
ps_records = [] # empty list to append per-species records onto
# Add names to list
for n in range(0,len(species_shortnames)):
    # Add n items for model data 
    ps_records.append('SVI_Project_presences_'+species_shortnames[n]+"_"+threshold)

    
# Define list of geometries in the iMap data
imap_geometries = ["POINT", "LINE", "POLYGON"]
# Define imap records types
imap_record_types = {
    "cnfrm": "_Conf", # Only the keys are used below; the confirmed iMap layer names carry no suffix
    "uncnfrm": "_Unconf"
}
# Add n*2*3 + n items for iMap data (accounts for 2 record and 3 geometry types, 
# plus not-detected records) 
for n in range(0,len(species_shortnames)):
    # Set up per-species query for selecting by attribute
    if species_shortnames[n] == "knot": # Compare string values with ==, not identity
        # Set initial SQL query
        idClause = "jurisdiction_species_id = "+str(jurisdiction_ids["knot"][0])
        # Add more conditions to query
        for ID in jurisdiction_ids["knot"][1:]:
            idClause = idClause + " Or jurisdiction_species_id = " + str(ID)
    else:
        idClause = "jurisdiction_species_id = "+str(jurisdiction_ids[species_shortnames[n]]) 
    
    # Copy a per-species subset of not detected polygons
    sel = arcpy.management.SelectLayerByAttribute("NOT_DETECTED_POLYGON", "NEW_SELECTION", idClause)
    imap_nd = "iMap_nd_"+species_shortnames[n]
    arcpy.management.CopyFeatures(sel, imap_nd)
    ps_records.append(imap_nd)
    # Remove variables and selections to save memory
    del sel, imap_nd
    arcpy.management.SelectLayerByAttribute("NOT_DETECTED_POLYGON", "CLEAR_SELECTION")
    
    # Copy a per-species subset for each record and geometry type
    for rt in range(0,len(imap_record_types)):
        if rt == 0: # Confirmed
            rt_suffix = ""
        elif rt == 1: # Unconfirmed
            rt_suffix = "_UNCONFIRMED"
        for g in range(0,len(imap_geometries)):
            sel = arcpy.management.SelectLayerByAttribute("PRESENCE_"+imap_geometries[g]+rt_suffix, "NEW_SELECTION", idClause)
            imap_subset = "iMap_"+imap_geometries[g].lower()+"_"+list(imap_record_types.keys())[rt]+"_"+species_shortnames[n]
            arcpy.management.CopyFeatures(sel, imap_subset)
            ps_records.append(imap_subset)
            # Remove variables and selections to save memory
            del sel, imap_subset
            arcpy.management.SelectLayerByAttribute("PRESENCE_"+imap_geometries[g]+rt_suffix, "CLEAR_SELECTION") # Clear the same layer that was selected from
        
print("Number of layers to process: "+str(len(ps_records)))
print(ps_records)

del imap_geometries, imap_record_types

Number of layers to process: 40
['SVI_Project_presences_phrag_precision', 'SVI_Project_presences_knot_precision', 'SVI_Project_presences_wp_precision', 'SVI_Project_presences_toh_precision', 'SVI_Project_presences_pl_precision', 'iMap_nd_phrag', 'iMap_point_cnfrm_phrag', 'iMap_line_cnfrm_phrag', 'iMap_polygon_cnfrm_phrag', 'iMap_point_uncnfrm_phrag', 'iMap_line_uncnfrm_phrag', 'iMap_polygon_uncnfrm_phrag', 'iMap_nd_knot', 'iMap_point_cnfrm_knot', 'iMap_line_cnfrm_knot', 'iMap_polygon_cnfrm_knot', 'iMap_point_uncnfrm_knot', 'iMap_line_uncnfrm_knot', 'iMap_polygon_uncnfrm_knot', 'iMap_nd_wp', 'iMap_point_cnfrm_wp', 'iMap_line_cnfrm_wp', 'iMap_polygon_cnfrm_wp', 'iMap_point_uncnfrm_wp', 'iMap_line_uncnfrm_wp', 'iMap_polygon_uncnfrm_wp', 'iMap_nd_toh', 'iMap_point_cnfrm_toh', 'iMap_line_cnfrm_toh', 'iMap_polygon_cnfrm_toh', 'iMap_point_uncnfrm_toh', 'iMap_line_uncnfrm_toh', 'iMap_polygon_uncnfrm_toh', 'iMap_nd_pl', 'iMap_point_cnfrm_pl', 'iMap_line_cnfrm_pl', 'iMap_polygon_cnfrm_pl', 'iMap
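The per-species ID query built in the cell above could also be produced by one helper that accepts either a single ID or a tuple of IDs, which removes the special case for knotweed. The name `species_id_clause` is hypothetical, and this is a sketch rather than the notebook's own code.

```python
def species_id_clause(ids, field="jurisdiction_species_id"):
    """Build an SQL where clause matching one or more iMap species IDs."""
    if isinstance(ids, int):
        ids = (ids,)  # Treat a single ID as a one-item tuple
    return " Or ".join(f"{field} = {species_id}" for species_id in ids)


print(species_id_clause(1277))
print(species_id_clause((1074, 1191, 1278, 1479)))
```

With this helper, any species mapped to several IDs in `jurisdiction_ids` would be handled the same way as knotweed without a name check.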

## Create analysis-ready version (1 layer with fields for all species)

In [9]:
# Spatial join repeatedly
input_grid = RAG_tl
tmpFeatures = list()
# Used for tracking iterations which enables field calculations to be performed after 
# confirmed and unconfirmed iMap records of all geometries are spatially joined
counter = 1
# Effectively, for iMap and model data, each species, and geometry/record type (if iMap data):
for i, r in enumerate(ps_records):
    # Define field naming convention
    if "SVI" in r:
        # Strip the 'SVI_Project_presences_' prefix and the '_<threshold>' suffix to keep only the species
        species = r[len("SVI_Project_presences_"):-(len(threshold)+1)]
        field_name = "model_"+species
    else:
        field_name = r
    print(field_name)
    fm = create_SJ_FieldMappings(input_grid, r) # Create the field mappings for this join

    # Perform the spatial join & clean the table
    out_grid = "tmp_grid_"+str(i)
    tmpFeatures.append(out_grid) # add to list for deleting later
    arcpy.analysis.SpatialJoin(input_grid,r,out_grid,"JOIN_ONE_TO_ONE","KEEP_ALL",fm)
    del fm
    # Rename Join_Count field
    arcpy.management.AlterField(out_grid, "Join_Count", field_name, field_name)
    # Delete TARGET_FID field
    arcpy.management.DeleteField(out_grid, "TARGET_FID")
    
    # Sum records only after confirmed or unconfirmed iMap records of all geometries are spatially joined
    if i >= len(species_shortnames): # Check if our iteration index is past processing model data
        # Determine the oscillation
        if "nd" in r:
            x = 4
        elif "point_uncnfrm" in r:
            x = 3
        # Effectively oscillate between running every 4th and 3rd entry, respectively 
        if (counter == x): # Check if the counter is at the oscillating xth entry we're targeting
            counter = 0 # Reset the count

            sum_fields = ps_records[i-2:i+1] # Select three fields of interest to sum
            sum_name = "iMap_"+r[13:] # Create desired field name
            express = "!"+sum_fields[0]+"!+!"+sum_fields[1]+"!+!"+sum_fields[2]+"!" # Create expression for calculating sum 
            arcpy.management.AddField(out_grid, sum_name, "SHORT") # Create the field
            arcpy.management.CalculateField(out_grid,sum_name,express) # Sum the fields
            for field in sum_fields: # Loop through the fields used when summing
                arcpy.management.DeleteField(out_grid, field) # Delete the field from the feature since it's no longer necessary
        counter += 1 # Add 1 to the count
    input_grid = "tmp_grid_"+str(i) # Make the next run's target grid the output of this iteration's spatial join
    if i == len(ps_records)-1: # Check if the last iteration
        RAG_final = "SVI_Proj_reporting_analysis_grid_"+threshold
        arcpy.management.CopyFeatures(out_grid, RAG_final) # Copy to a new feature
        del counter, input_grid, out_grid

# Delete unmerged presence points
delete_tmp_features(tmpFeatures)
del tmpFeatures

print("All done")

model_phrag
model_knot
model_wp
model_toh
model_pl
iMap_nd_phrag
iMap_point_cnfrm_phrag
iMap_line_cnfrm_phrag
iMap_polygon_cnfrm_phrag
iMap_point_uncnfrm_phrag
iMap_line_uncnfrm_phrag
iMap_polygon_uncnfrm_phrag
iMap_nd_knot
iMap_point_cnfrm_knot
iMap_line_cnfrm_knot
iMap_polygon_cnfrm_knot
iMap_point_uncnfrm_knot
iMap_line_uncnfrm_knot
iMap_polygon_uncnfrm_knot
iMap_nd_wp
iMap_point_cnfrm_wp
iMap_line_cnfrm_wp
iMap_polygon_cnfrm_wp
iMap_point_uncnfrm_wp
iMap_line_uncnfrm_wp
iMap_polygon_uncnfrm_wp
iMap_nd_toh
iMap_point_cnfrm_toh
iMap_line_cnfrm_toh
iMap_polygon_cnfrm_toh
iMap_point_uncnfrm_toh
iMap_line_uncnfrm_toh
iMap_polygon_uncnfrm_toh
iMap_nd_pl
iMap_point_cnfrm_pl
iMap_line_cnfrm_pl
iMap_polygon_cnfrm_pl
iMap_point_uncnfrm_pl
iMap_line_uncnfrm_pl
iMap_polygon_uncnfrm_pl
All done
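The summing step above relies on a counter to detect when all three geometries of a record type have been joined. An alternative sketch is to derive the groups from the layer names themselves. The helper names are hypothetical, and the approach assumes that species shortnames contain no underscores, so that every per-geometry layer name splits into exactly four parts.

```python
from collections import defaultdict


def sum_field_name(layer_name):
    """Map 'iMap_<geometry>_<record type>_<species>' to
    'iMap_<record type>_<species>'."""
    prefix, _geometry, record_type, species = layer_name.split("_")
    return f"{prefix}_{record_type}_{species}"


def group_for_summing(layer_names):
    """Group per-geometry layer names by the summed field they feed."""
    groups = defaultdict(list)
    for name in layer_names:
        if len(name.split("_")) == 4:  # Skips model and not-detected layers
            groups[sum_field_name(name)].append(name)
    return dict(groups)


records = [
    "SVI_Project_presences_phrag_precision",
    "iMap_nd_phrag",
    "iMap_point_cnfrm_phrag",
    "iMap_line_cnfrm_phrag",
    "iMap_polygon_cnfrm_phrag",
]
print(group_for_summing(records))
```

Each key is the name of a summed field, and each value lists the three per-geometry fields to add together and then delete.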


In [15]:
# Calculate per-species ND ratios, confirmed ratios, unconfirmed ratios, and ratio percentiles

for species in species_shortnames:
    print(species)
    print("***Checking key names")
    type = "THRESHOLDED"
    fieldDict = generate_calc_field_dict(type, species)
    expDict = generate_calc_expressions(type, species)
    wCs = generate_where_clauses(type, species)
    a = len(fieldDict)
    b = len(expDict)
    c = len(wCs)
    if a == b and b == c:
        pass
    else:
        print("******Error in key names of one or more of fieldDict, expDict, and wCs")
        quit()
    print("***Adding comparison fields")
    add_calc_fields(RAG_final, fieldDict)
    print("***Calculating comparison values")
    for key in wCs: # Loop to calculate the reporting analysis values
        sel = arcpy.management.SelectLayerByAttribute(RAG_final, "NEW_SELECTION", wCs[key]) # Make the selection
        arcpy.management.CalculateField(sel, fieldDict[key], expDict[key]) # Calculate values in field based on expression
        arcpy.management.SelectLayerByAttribute(RAG_final, "CLEAR_SELECTION")
        del sel
    del type, fieldDict, expDict, wCs

    # Calculate percentile field that describes the percentile of model-only record magnitude and imap-model overlap ratio ----------
    import scipy
    from scipy.stats import percentileofscore # For calculating the percentile of each value
    filters = [" < 0", " > 0"]
    comp_types = ["Cc_", "CUc_", "NDc_"]
    for comp_type in comp_types: # For each comparison type (Confirmed, Unconfirmed, Not detected)
        ratio_field = comp_type+species
        pct_field = "pct_"+ratio_field
        arcpy.management.AddField(RAG_final, pct_field, "FLOAT")
        for filter in filters: # For (effectively) both the overlap and model-only sets
            rga_values = [abs(row.getValue(ratio_field)) for row in arcpy.SearchCursor(RAG_final, ratio_field+filter)] # Return a list of all reporting gap analysis magnitudes (only positives or negatives, no nulls)
            cursor = arcpy.UpdateCursor(RAG_final, ratio_field+filter)
            if filter == " < 0": # Compare string values with ==, not identity
                x = -1 # Make model-only percentiles negative
            else:
                x = 1 # Keep overlap percentiles positive
            for i, row in enumerate(cursor): # For each row in the filtered dataset
                pct_v = x*scipy.stats.percentileofscore(rga_values, abs(row.getValue(ratio_field)), kind='mean') # Calculate the percentile of the reporting gap analysis value
        #         if i < 20: # View the first 20 ratio and percentile values
        #             print("r ",abs(row.getValue(ratio_field)))
        #             print("p ",pct_v)
                row.setValue(pct_field, pct_v) # Assign the percentile of that reporting gap analysis value
                cursor.updateRow(row) # Write the edited row back to the table
                del pct_v
            del rga_values, cursor
        # Add zeroes to cells where there's only iMap data
        cursor = arcpy.UpdateCursor(RAG_final, ratio_field+" = 0")
        for row in cursor:
            row.setValue(pct_field, 0)
            cursor.updateRow(row)
        del cursor
del filters
print("Analysis version of the reporting gap analysis complete")

phrag
***Checking key names
***Adding comparison fields
***Calculating comparison values
knot
***Checking key names
***Adding comparison fields
***Calculating comparison values
wp
***Checking key names
***Adding comparison fields
***Calculating comparison values
toh
***Checking key names
***Adding comparison fields
***Calculating comparison values
pl
***Checking key names
***Adding comparison fields
***Calculating comparison values
Analysis version of the reporting gap analysis complete
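
The signed percentile logic in the cell above can be checked outside ArcGIS with a small pure-Python sketch. It assumes the ratio values are available as a plain list with `None` for nulls. `percentile_of_score_mean` and `signed_percentiles` are hypothetical helper names, not part of the notebook; the first is a stand-in that mirrors the `kind='mean'` option of `scipy.stats.percentileofscore` by averaging the strict (<) and weak (<=) percentile ranks.

```python
def percentile_of_score_mean(values, score):
    """Percentile rank of score within values, averaging the strict and weak ranks."""
    n = len(values)
    strict = sum(v < score for v in values)   # values strictly below the score
    weak = sum(v <= score for v in values)    # values at or below the score
    return 100.0 * (strict + weak) / (2 * n)


def signed_percentiles(ratios):
    """Negative percentiles for model-only cells (ratio < 0), positive for
    overlap cells (ratio > 0), 0 for iMap-only cells, None for nulls."""
    # Each sign is ranked only against magnitudes of the same sign
    neg = [abs(r) for r in ratios if r is not None and r < 0]
    pos = [abs(r) for r in ratios if r is not None and r > 0]
    out = []
    for r in ratios:
        if r is None:
            out.append(None)
        elif r < 0:
            out.append(-percentile_of_score_mean(neg, abs(r)))
        elif r > 0:
            out.append(percentile_of_score_mean(pos, abs(r)))
        else:
            out.append(0.0)
    return out


# Made-up ratio values for illustration only
example_ratios = [-4.0, -1.0, None, 0.0, 0.5, 0.25, 0.5]
example_pcts = signed_percentiles(example_ratios)
print(example_pcts)
```

Because each sign is ranked separately, a percentile of -75 and one of 75 are not directly comparable magnitudes; each describes a position within its own group.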


## Create the viewing-ready version (1 layer with combined comp and pct fields)

In [None]:
# Section pseudocode
# For each species:
    # Select records with non-null values for that species
    # Copy to new feature
    # Collapse per-species ratio and percentile fields into two new ones 
    # Delete per-species ratio and percentile fields
    # Add & calculate common name, jurisdiction id, and op criteria fields
# Merge layers into one for upload into AGOL
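
The pseudocode above can be illustrated without arcpy by treating each grid cell as a dictionary. This is only a sketch under that assumption: `has_records`, `collapse_species_fields`, `build_viewing_rows`, the `cell_id` and `shortname` keys, and the sample values are all hypothetical, and the real notebook works on feature classes rather than dictionaries.

```python
COMP_TYPES = ("Cc_", "CUc_", "NDc_")  # Confirmed, unconfirmed, not detected


def has_records(row, species):
    """True if any per-species ratio field is non-null for this cell."""
    return any(row.get(comp_type + species) is not None for comp_type in COMP_TYPES)


def collapse_species_fields(row, species):
    """Copy per-species ratio and percentile values into shared field names."""
    out = {"cell_id": row["cell_id"], "shortname": species}
    for comp_type in COMP_TYPES:
        ratio_field = comp_type + species
        pct_field = "pct_" + ratio_field
        out[comp_type + "ps"] = row.get(ratio_field)
        out[comp_type + "ps_pct"] = row.get(pct_field)
    return out


def build_viewing_rows(grid_rows, species_list):
    """One output row per (cell, species) pair that has at least one record."""
    merged = []
    for species in species_list:
        for row in grid_rows:
            if has_records(row, species):
                merged.append(collapse_species_fields(row, species))
    return merged


# Made-up cells for illustration only
grid_rows = [
    {"cell_id": 1, "Cc_phrag": 0.5, "pct_Cc_phrag": 80.0},
    {"cell_id": 2, "Cc_knot": -3.0, "pct_Cc_knot": -50.0},
]
viewing_rows = build_viewing_rows(grid_rows, ["phrag", "knot"])
for viewing_row in viewing_rows:
    print(viewing_row)
```

The result is a long table with one shared set of fields, which is what makes a single species filter possible in a dashboard.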

In [16]:
merge_sets = list() # Updated to contain the per-species layers to merge

for species in species_shortnames: # For each species
    whereClause = ""
    # Create where clause used to select cells where there are records
    for i, comp_type in enumerate(comp_types):
        ratio_field = comp_type+species
        
        if i < 1:
            whereClause = ratio_field + " IS NOT NULL"
        else: 
            whereClause = whereClause + " or " + ratio_field + " IS NOT NULL"
        del ratio_field

    sel = arcpy.management.SelectLayerByAttribute(RAG_final, "NEW_SELECTION", whereClause) # Select cells where there are records
    
    ps_viewing_layer = "tmp_ps_view_"+species
    arcpy.management.CopyFeatures(sel, ps_viewing_layer) # Create a copy of records for just this species
    merge_sets.append(ps_viewing_layer) # Add to list for merging later

    
    for comp_type in comp_types:
        ratio_field = comp_type+species
        pct_field = "pct_"+ratio_field
        
        # Collapse per-species field values into two fields
        arcpy.management.CalculateField(ps_viewing_layer, comp_type+"ps", "!"+ratio_field+"!")
        arcpy.management.CalculateField(ps_viewing_layer, comp_type+"ps_pct", "!"+pct_field+"!")

        arcpy.management.DeleteField(ps_viewing_layer, [ratio_field, pct_field]) # Delete per-species fields
    
    # Add some attribute fields to enable filtering in the ArcGIS Online dashboard
    arcpy.management.CalculateField(ps_viewing_layer, "Common_Nam", species_fullnames[species])
    if species == "knot": # Compare string values with ==, not identity
        jsid = 1479 # Knotweed unspecified
    else:
        jsid = jurisdiction_ids[species]
    arcpy.management.CalculateField(ps_viewing_layer, "jurisdiction_species_id", jsid)
    arcpy.management.CalculateField(ps_viewing_layer, "op_criteria", "'"+threshold+"'")

arcpy.management.Merge(merge_sets, "SVI_Proj_RAG_1km_Viewing")

print("Viewing layer of the reporting gap analysis complete")

Viewing layer of the reporting gap analysis complete
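
The where clause built incrementally in the cell above could also be assembled with `str.join`, which removes the need for the index check. A minimal sketch, assuming the same `comp_types` prefixes used earlier; `build_not_null_clause` is a hypothetical helper name, and it writes the SQL keyword as `OR` rather than `or`, which SQL treats the same way.

```python
def build_not_null_clause(comp_types, species):
    """Join one 'field IS NOT NULL' test per comparison type with OR."""
    return " OR ".join(f"{comp_type}{species} IS NOT NULL" for comp_type in comp_types)


clause = build_not_null_clause(["Cc_", "CUc_", "NDc_"], "phrag")
print(clause)
# Cc_phrag IS NOT NULL OR CUc_phrag IS NOT NULL OR NDc_phrag IS NOT NULL
```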


In [17]:
# Delete temporary features and vars

# Make list of features to delete
tmpFeatures = list()
for species in species_shortnames:
    tmpFeatures.append("tmp_ps_view_"+species) # Add per-species layers
    tmpFeatures.append("iMap_nd_"+species) # Add iMap not-detected
    for rt in ['cnfrm', 'uncnfrm']:
        for geometry in ['point', 'line', 'polygon']:
            tmpFeatures.append("iMap_"+geometry+"_"+rt+"_"+species) # Add iMap confirmed and unconfirmed point, line, and polygon layers
# Delete the temporary features
delete_tmp_features(tmpFeatures)
# Delete user-defined variables
for obj in dir():
    if not obj.startswith("__"): # If not a system var
        del globals()[obj] # Delete
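
Before deleting anything, it can help to preview the list of temporary feature names. The sketch below rebuilds that list in plain Python so it can be printed and checked as a dry run. `temp_feature_names` is a hypothetical helper, not a function from this notebook, and it only generates names; it does not touch the geodatabase.

```python
from itertools import product


def temp_feature_names(species_shortnames):
    """Return the temporary feature names in the same order as the nested loops above."""
    names = []
    for species in species_shortnames:
        names.append("tmp_ps_view_" + species)  # Per-species viewing layer
        names.append("iMap_nd_" + species)      # iMap not-detected layer
        # product() varies the last argument fastest, matching the nested loops
        for record_type, geometry in product(["cnfrm", "uncnfrm"], ["point", "line", "polygon"]):
            names.append(f"iMap_{geometry}_{record_type}_{species}")
    return names


preview = temp_feature_names(["phrag", "knot"])
for name in preview:
    print(name)
```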