In [85]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# <img width=50px  src = 'https://apps.fs.usda.gov/lcms-viewer/images/lcms-icon.png'> Lab 7: LCMS Map Validation

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/redcastle-resources/lcms-training/blob/main/7-Map_Validation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/redcastle-resources/lcms-training/blob/main/7-Map_Validation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/redcastle-resources/lcms-training/blob/main/7-Map_Validation.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>



## 7.0: Overview and Introduction


This notebook teaches how to assess the map accuracy of LCMS outputs.

While this notebook teaches the specific steps LCMS takes to assess map accuracy, your assessment methods can be much simpler if you have a large training sample, a simple or systematic random sample design, and/or a map output that always takes the most confident prediction from the model.


### 7.0.1: Objective

In this tutorial, you learn how to assess the map accuracy of LCMS map outputs. 

This tutorial uses the following Google Cloud services:

- `Google Earth Engine`
- `Google Cloud Storage`

The steps performed include:

- Understanding the difference between model and map accuracy
- Simulating map accuracy with k-fold cross validation

### 7.0.2: Before you begin

#### If you are working in Workbench: Set your current URL under `workbench_url`
This gives the Map Viewer a URL at which to host the viewer we will be generating.
* This will be in your URL/search bar at the top of the browser window you are currently in
* It will look something like `https://1234567890122-dot-us-west3.notebooks.googleusercontent.com/` (See the image below)

![workspace url](img/workspace-url.png)

#### Set a folder to use for all exports under `export_path_root` 
* This folder should be an assets folder in an existing GEE project.
* By default, this folder is the same as the pre-baked folder (where outputs have already been created). 
* If you would like to create your own outputs, specify a different path for `export_path_root`, but leave the `pre_baked_path_root` as it was. This way, the pre-baked outputs can be shown at the end, instead of waiting for all exports to finish.
* It will be something like `projects/projectID/assets/newFolder`
* This folder does not have to already exist. If it does not exist, it will be created

**If you are working in Qwiklabs and wish to export:** Copy the project ID from the 'Start Lab' screen into the `projectID` field in `export_path_root`.

In [116]:
workbench_url = 'https://559cdf0b5fe9790f-dot-us-central1.notebooks.googleusercontent.com'
pre_baked_path_root  = 'projects/rcr-gee/assets/lcms-training'
export_path_root = pre_baked_path_root

print('Done')

Done


#### Installation
First, install necessary Python packages. Uncomment the first line to upgrade geeViz if necessary.

Note that for this module, we're also importing many data science packages such as pandas. 

In [375]:
#Module imports
#!python -m pip install geeViz --upgrade
try:
    import geeViz.getImagesLib as getImagesLib
except ImportError:
    !python -m pip install geeViz
    import geeViz.getImagesLib as getImagesLib

import geeViz.changeDetectionLib as changeDetectionLib
import geeViz.assetManagerLib as aml
import geeViz.taskManagerLib as tml
import geeViz.gee2Pandas as g2p
import inspect,operator,os,glob,json,warnings
import matplotlib.pyplot as plt
import pandas as pd  
import numpy as np

import lcms_scripts.accuracy_and_sampling_lib2 as asl
from importlib import reload

try:
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold
    from sklearn.metrics import accuracy_score,classification_report,balanced_accuracy_score,cohen_kappa_score
    from sklearn import metrics 
except ImportError:
    !pip install -U scikit-learn
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold
    from sklearn.metrics import accuracy_score,classification_report,balanced_accuracy_score,cohen_kappa_score
    from sklearn import metrics 
# from IPython.display import IFrame,display, HTML
ee = getImagesLib.ee
Map = getImagesLib.Map

# Can set the port used for viewing map outputs
Map.port = 1235
print('Done')


Done


#### Set up your work environment

Point to the folders and collections created in previous notebooks, and create a local folder to hold the accuracy files.

Currently, when running within Colab or Workbench, geeView uses a different project to authenticate through, so you may need to make your asset public to view from within Colab.

In [118]:
# Bring in all folders/collections that are needed
# These must already exist as they are created in previous notebooks

export_timeSync_folder = f'{export_path_root}/lcms-training_module-4_timeSync'

export_assembledLCMSOutputs_collection = f'{export_path_root}/lcms-training_module-6_assembledLCMSOutputs'

# This is the pre-made TimeSync data
# Creating this dataset is not covered in this set of notebooks
timeSync_featureCollection = 'projects/lcms-292214/assets/R8/PR_USVI/TimeSync/18_PRVI_AllPlots_TimeSync_Annualized_Table_secLC'

# The model options table (created in module 5.1, but stored in the lcms-training repository)
model_options_csv_filename = './tables/LCMS_model_options_table.csv'
change_thresholds_json_filename = './tables/LCMS_change_thresholds.json'
# Set up folder to put accuracy files
local_map_acc_folder = '/tmp/lcms-training/local_map_accuracy'

os.makedirs(local_map_acc_folder, exist_ok=True)

print('Done')

Done


In [119]:
# set up map
Map.clearMap()

# reset port if necessary
Map.port = 1235
Map.proxy_url = workbench_url

print('Done')

Done


## 7.1: Data Setup

### 7.1.1: Background

* While previous tutorials have provided a good depiction of the model error, for a number of reasons, this is not a statistically valid depiction of the map error. The following are reasons why the model error is not the same as the map error in the case of LCMS' current methods:
    * Our sample is not a simple random sample - rather it is a stratified random sample
    * For Change, we introduce additional logic that makes the default model output different than the map output
* LCMS also lacks sufficient training data to leave a subset out of the actual model calibration
    * This means we need to simulate the model error using either a bootstrapping technique or k-fold cross validation
* If your particular project has sufficient training data to leave a subset out of the actual model calibration, it is a simple or systematic random sample (not stratified random), and the default model predicted class is the same as your final map class, you can skip this step and use the model errors from the methods used in Module 5.

### 7.1.2: Bring in Reference Data

* First, we'll need to repeat steps from Module 5 and download our reference data to a local location

In [120]:
# Bring in raw TS data
timeSyncData = ee.FeatureCollection(timeSync_featureCollection)
timeSync_fields = timeSyncData.first().toDictionary().keys().getInfo()
# Now lets bring in all training data and prep it for modeling
assets = ee.data.listAssets({'parent': export_timeSync_folder})['assets']

# You may need to change the permissions for viewing model outputs in geeViz
# Uncomment this if needed
# for asset in assets:aml.updateACL(asset['name'],writers = [],all_users_can_read = True,readers = [])

# Read in each year of extracted TimeSync data
training_data = ee.FeatureCollection([ee.FeatureCollection(asset['name']) for asset in assets]).flatten()

# Bring in existing LCMS data for the class names, numbers, and colors
lcms_viz_dict = ee.ImageCollection("USFS/GTAC/LCMS/v2020-6").first().toDictionary().getInfo()
                                             
print('LCMS class code, names, and colors:',lcms_viz_dict)


# Get the field names for prediction
# Find any field that was not in the original TimeSync data and assume that is a predictor variable
all_fields = training_data.first().toDictionary().keys().getInfo()
predictor_field_names = [field for field in all_fields if field not in timeSync_fields]

# Filter out any plots with null predictor values (any training plot with missing predictor data will cause the model to fail entirely)
training_data = training_data.filter(ee.Filter.notNull(predictor_field_names))

print('Done')

LCMS class code, names, and colors: {'Change_class_names': ['Stable', 'Slow Loss', 'Fast Loss', 'Gain', 'Non-Processing Area Mask'], 'Change_class_palette': ['3d4551', 'f39268', 'd54309', '00a398', '1b1716'], 'Change_class_values': [1, 2, 3, 4, 5], 'Land_Cover_class_names': ['Trees', 'Tall Shrubs & Trees Mix (SEAK Only)', 'Shrubs & Trees Mix', 'Grass/Forb/Herb & Trees Mix', 'Barren & Trees Mix', 'Tall Shrubs (SEAK Only)', 'Shrubs', 'Grass/Forb/Herb & Shrubs Mix', 'Barren & Shrubs Mix', 'Grass/Forb/Herb', 'Barren & Grass/Forb/Herb Mix', 'Barren or Impervious', 'Snow or Ice', 'Water', 'Non-Processing Area Mask'], 'Land_Cover_class_palette': ['005e00', '008000', '00cc00', 'b3ff1a', '99ff99', 'b30088', 'e68a00', 'ffad33', 'ffe0b3', 'ffff00', 'aa7700', 'd3bf9b', 'ffffff', '4780f3', '1b1716'], 'Land_Cover_class_values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'Land_Use_class_names': ['Agriculture', 'Developed', 'Forest', 'Non-Forest Wetland', 'Other', 'Rangeland or Pasture', 'No

In [106]:
# Now, we'll crosswalk the training fields to numeric codes
# The TimeSync fields are a string by default
# They must be a number for modeling
# Set up lookup dictionaries to convert the names to numeric codes
land_cover_name_code_dict = ee.Dictionary({'TREES':1,
                             'TSHRUBS-TRE':2,
                             'SHRUBS-TRE':3,
                             'GRASS-TREE':4,
                             'BARREN-TRE':5,
                             'TSHRUBS':6,
                             'SHRUBS':7,
                             'GRASS-SHRU':8,
                             'BARREN-SHR':9,
                             'GRASS':10,
                             'BARREN-GRA':11,
                             'BARREN-IMP':12,
                             'WATER':14
                            })
land_use_name_code_dict = ee.Dictionary({'Agriculture':1,
                           'Developed':2,
                           'Forest':3,
                           'Non-forest Wetland':4,
                           'Other':5,
                           'Rangeland':6
                          })

change_code_dict = ee.Dictionary({'Debris': 3, 
                                  'Fire': 3, 
                                  'Growth/Recovery': 4, 
                                  'Harvest': 3, 'Hydrology': 3, 
                                  'Mechanical': 3, 
                                  'Other': 3, 
                                  'Spectral Decline': 2, 
                                  'Stable': 1, 
                                  'Structural Decline': 2, 
                                  'Wind/Ice': 3})

reference_field_dict = {'Land_Cover':{'field':'DOM_SEC_LC','name_code_dict':land_cover_name_code_dict},
                        'Land_Use':{'field':'DOM_LU','name_code_dict':land_use_name_code_dict},
                        'Change':{'field':'CP','name_code_dict':change_code_dict,
                                  'fields':['Slow Loss', 'Fast Loss', 'Gain']}
                       }
# Make a function that will get the code for a given name and set it
# We could also use the remap function to accomplish this
def set_class_code(plot,product):
    name_fieldName = reference_field_dict[product]['field']
    code_fieldName = ee.String(name_fieldName).cat('_Code')
    name = ee.String(plot.get(name_fieldName))
    code = reference_field_dict[product]['name_code_dict'].get(name)
    plot = plot.set(code_fieldName,code)
    return plot
                    
                    
    # print(name_fieldName,code_fieldName.getInfo(),name.getInfo(),code.getInfo())
            
# set_class_code(training_data.first(),'Land_Cover')
for product in list(reference_field_dict.keys()):
    print('Crosswalking:',product)
    training_data = training_data.map(lambda f:set_class_code(f,product))

# Now will download the training table to a local location

local_model_data_folder = '/tmp/lcms-training/local_modeling'
local_training_csv = os.path.join(local_model_data_folder,'timeSync_training_table.csv')


os.makedirs(local_model_data_folder, exist_ok=True)

# Download the training data from a featureCollection to a local CSV
# This function will automatically break the featureCollection into 5000 feature featureCollections
# if it is larger than the 5000 feature limit set by GEE
g2p.featureCollection_to_csv(training_data,local_training_csv,overwrite = False)

# Once the table is stored locally, read it in
training_df = pd.read_csv(local_training_csv)

training_df.describe()
print('Done')

Crosswalking: Land_Cover
Crosswalking: Land_Use
Crosswalking: Change
/tmp/lcms-training/local_modeling\timeSync_training_table.csv  already exists
Done


#### 7.1.2.1: Filter to predictors used

* Filter the model options table to only the rows for the non-correlated top 30 predictor set
* No single set of predictors works best across all models and all model performance metrics
* Any subset of predictors could be used here, but this one should work well
* Good options are: `Non-correlated Predictors`, `All Predictors Top 30`, or `Non-correlated Predictors Top 30`
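
The filtering and predictor-list parsing described above can be sketched on a toy table. This is only an illustration: the table values and the helper name `get_predictors` are made up here, and `ast.literal_eval` is shown as one possible way to turn the stringified `Var Imp` list back into a Python list (the notebook itself parses the string manually).

```python
import ast
import pandas as pd

# Toy stand-in for the model options table; all values are hypothetical
toy_options = pd.DataFrame({
    'Product Name': ['Change', 'Change', 'Land_Use'],
    'Model Name': ['All Predictors Top 30',
                   'Non-correlated Predictors Top 30',
                   'Non-correlated Predictors Top 30'],
    # Lists are stored as strings once the table is written to and read from CSV
    'Var Imp': ["['a', 'b', 'c']",
                "['swir2_LT_slope', 'swir2_LT_fitted']",
                "['red_LT_fitted', 'red_CCDC_fitted']"],
})

def get_predictors(options, model_name, product_name):
    """Return the predictor names for one model/product row (hypothetical helper)."""
    rows = options[(options['Model Name'] == model_name) &
                   (options['Product Name'] == product_name)]
    # literal_eval safely parses the string "['x', 'y']" into the list ['x', 'y']
    return ast.literal_eval(rows['Var Imp'].values[0])

change_predictors = get_predictors(toy_options, 'Non-correlated Predictors Top 30', 'Change')
print(change_predictors)
```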


In [111]:
predictor_set = 'Non-correlated Predictors Top 30'
model_options = pd.read_csv(model_options_csv_filename)

model_options = model_options[model_options['Model Name'] == predictor_set]

display(model_options)

print('Done')

Unnamed: 0,Product Name,Model Name,OOB Acc,Overall Acc,Balanced Acc,Kappa,Var Imp
3,Change,Non-correlated Predictors Top 30,0.892001,0.833756,0.286863,0.099097,"['swir2_LT_slope', 'swir2_LT_fitted', 'NDVI_LT..."
7,Land_Cover,Non-correlated Predictors Top 30,0.973405,0.707922,0.290145,0.512691,"['red_LT_fitted', 'green_CCDC_fitted', 'slope'..."
11,Land_Use,Non-correlated Predictors Top 30,0.994287,0.815904,0.627273,0.697383,"['red_LT_fitted', 'red_CCDC_fitted', 'NDVI_CCD..."


Done


### 7.1.2: K-Fold Setup

* LCMS does not have enough training samples to simply omit 20% or so from training our final models
* Since our assemblage process introduces differences between the model-predicted class and the final map class, and our sample is based on a stratified random sample design, we cannot simply use the out-of-bag samples from the random forest model
* We have to use a method that simulates the map accuracy, accounts for the likelihood of each sample's inclusion (strata weights), and allows us to introduce any assemblage rules that are not typically part of the underlying random forest model
* Our methods roughly follow guidance from [Stehman 2014](./lit/Stehman_2014.pdf)
* We first divide our reference sample into a train and test set over K-folds and train a random forest model for each fold
* E.g. if we have 5 folds, then for each fold, 80% of the training data are used to train the model, and the remaining 20% are held out for comparing the reference and predicted classes. This ensures all samples are held out once ($20\% \times 5 \text{ folds} = 100\%$). The code in this section uses `k = 10`, so each fold holds out roughly 10% of the samples.
![Cross_Validation](https://scikit-learn.org/stable/_images/grid_search_cross_validation.png)
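
As a small illustration of the idea, the sketch below runs scikit-learn's `GroupKFold` on made-up data (10 hypothetical plots with 3 yearly observations each). It checks two properties the approach relies on: every row is held out exactly once, and no plot appears in both the train and test sets of the same fold. The variable names are for illustration only.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 10 plots, each observed in 3 years (30 rows)
plot_ids = np.repeat(np.arange(10), 3)
X = np.zeros((len(plot_ids), 1))  # Predictor values do not matter for the split

n_folds = 5
gkf = GroupKFold(n_splits=n_folds)

held_out_counts = np.zeros(len(plot_ids), dtype=int)
shared_plot_found = False
for train_index, test_index in gkf.split(X, groups=plot_ids):
    # Count how many times each row lands in a test set
    held_out_counts[test_index] += 1
    # A plot should never appear on both sides of the split
    if set(plot_ids[train_index]) & set(plot_ids[test_index]):
        shared_plot_found = True

print('Times each row was held out:', set(held_out_counts.tolist()))
print('Any plot split across train and test:', shared_plot_found)
```

Grouping by plot matters because observations of the same plot in different years are not independent; keeping them together avoids testing a model on a plot it has already seen.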

In [112]:

products = ['Change','Land_Cover','Land_Use']
KFoldInfo = {}
# kfoldinfo_pickle_filename = pickleName+'.p'
# KFoldInfo['TrainingData'] = training_df.copy()

# strata = allTrainingData[stratColumn].squeeze()
groups = training_df['PLOTID'].squeeze()
k = 10
n_jobs = 4
gkf = GroupKFold(n_splits=k)
foldNum = 1
seed = 999
nTrees = 50

# Fit and Train model
# Set up a random forest model
rf = RandomForestClassifier(n_estimators = nTrees, random_state=seed,oob_score=False,n_jobs = n_jobs)
KFoldInfo['STRATUM'] = []
KFoldInfo['STRATUM_PIXEL_COUNT'] = []
KFoldInfo['STRATUM_PIXEL_PCT'] = []
for train_index, test_index in gkf.split(training_df, training_df, groups):
    KFoldInfo[foldNum] = {}
    
    print(f'Fold {foldNum} has {len(train_index)} training samples and {len(test_index)} test samples')

    # Indices of training and test samples
    KFoldInfo[foldNum]['Indices'] = {\
        'Train': train_index,
        'Test': test_index}

    # Run model and predict probabilities
    KFoldInfo[foldNum]['Probabilities'] = {}
    KFoldInfo[foldNum]['Predictions'] = {}
    KFoldInfo[foldNum]['Model'] = {}
    KFoldInfo[foldNum]['Ref'] = {}
    
    # Pull the train and test plots
    k_train,k_test = training_df.iloc[train_index], training_df.iloc[test_index]
    
    # Get the strata info
    KFoldInfo['STRATUM'].extend(k_test['STRATUM'])
    KFoldInfo['STRATUM_PIXEL_COUNT'].extend(k_test['STRATUM_PIXEL_COUNT'])
    KFoldInfo['STRATUM_PIXEL_PCT'].extend(k_test['STRATUM_PIXEL_PCT'])
    for product_name in products:
        print(f'Fitting model for fold {foldNum}, {product_name}')
        
        # Pull predictors from table from 5.1
        # Some parsing is needed to read it in properly
        predictor_variable_names = model_options[model_options['Product Name'] == product_name]['Var Imp'].values[0]
        predictor_variable_names = predictor_variable_names[1:-1]
        predictor_variable_names=predictor_variable_names.replace("'","").split(', ')
        
        # Get X and Y points for each group  
        kx_train = k_train[predictor_variable_names]
        ky_train = k_train[reference_field_dict[product_name]['field']+'_Code']
        
        kx_test = k_test[predictor_variable_names]
        ky_test = k_test[reference_field_dict[product_name]['field']+'_Code']
        
        rf.fit(kx_train,ky_train)
        
        # Get predicted classes or probabilities for each Test Point
        # For change, get the probabilities, for all others, get the classes
        if product_name in ['Land_Cover','Land_Use']:
            ky_pred = rf.predict(kx_test)
            
        else:
            ky_pred = rf.predict_proba(kx_test)
        KFoldInfo[foldNum]['Predictions'][product_name] = ky_pred
        KFoldInfo[foldNum]['Ref'][product_name] = ky_test
        
    foldNum+=1

print('Done')

Fold 1 has 17799 training samples and 1979 test samples
Fitting model for fold 1, Change
Fitting model for fold 1, Land_Cover
Fitting model for fold 1, Land_Use
Fold 2 has 17798 training samples and 1980 test samples
Fitting model for fold 2, Change
Fitting model for fold 2, Land_Cover
Fitting model for fold 2, Land_Use
Fold 3 has 17798 training samples and 1980 test samples
Fitting model for fold 3, Change
Fitting model for fold 3, Land_Cover
Fitting model for fold 3, Land_Use
Fold 4 has 17806 training samples and 1972 test samples
Fitting model for fold 4, Change
Fitting model for fold 4, Land_Cover
Fitting model for fold 4, Land_Use
Fold 5 has 17806 training samples and 1972 test samples
Fitting model for fold 5, Change
Fitting model for fold 5, Land_Cover
Fitting model for fold 5, Land_Use
Fold 6 has 17798 training samples and 1980 test samples
Fitting model for fold 6, Change
Fitting model for fold 6, Land_Cover
Fitting model for fold 6, Land_Use
Fold 7 has 17799 training samples 

### 7.1.3: Strata Weights Setup

* In order to account for the probability of including a sample from a given stratum, we need to know the proportion of the total sample location population for each stratum
* Note that our strata are not the same as our map classes
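
As a rough sketch of what a strata weight is, the snippet below converts per-stratum pixel counts into proportions of the total population. The counts are the ones printed by the code cell in this section; the variable names `strata_pixel_counts` and `strata_weights` are illustrative and are not used elsewhere in the notebook.

```python
# Pixel counts per stratum, copied from the output printed in this section
strata_pixel_counts = {4: 344785, 8: 3325898, 3: 110101, 5: 71185, 7: 3519261,
                       2: 113212, 1: 1450331, 9: 865643, 10: 258676, 11: 102288,
                       6: 92958}

# Weight of a stratum = its share of all pixels in the sampled population
total_pixels = sum(strata_pixel_counts.values())
strata_weights = {stratum: count / total_pixels
                  for stratum, count in sorted(strata_pixel_counts.items())}

for stratum, weight in strata_weights.items():
    print(f'Stratum {stratum}: {weight:.4f}')
```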

In [326]:
stratum = [i for i in KFoldInfo['STRATUM']]

stratum_counts = KFoldInfo['STRATUM_PIXEL_COUNT']
stratum_pct = [(n/100) for n in KFoldInfo['STRATUM_PIXEL_PCT']]
strata_dict = dict(set(zip(stratum,stratum_counts)))
strata_pct_dict = dict(set(zip(stratum,stratum_pct)))
print(strata_dict)


print('Done')

{4: 344785, 8: 3325898, 3: 110101, 5: 71185, 7: 3519261, 2: 113212, 1: 1450331, 9: 865643, 10: 258676, 11: 102288, 6: 92958}
Done


### 7.1.4: Organize Reference and Predicted Values

* Organize reference and predicted values into a simple dictionary by product

In [249]:
ref_pred_dict = {}

for product_name in products:
    preds = []
    refs = []
    for foldNum in range(1,k+1):
        predsFold = KFoldInfo[foldNum]['Predictions'][product_name]
        refsFold = KFoldInfo[foldNum]['Ref'][product_name]
        preds.extend(predsFold)
        refs.extend(refsFold)
        
    refs = pd.Series(refs)
    preds = pd.Series(preds)
    ref_pred_dict[product_name] = {'refs':refs,'preds':preds}
   
print(ref_pred_dict)

{'Change': {'refs': 0        1
1        1
2        1
3        1
4        1
        ..
19773    1
19774    1
19775    1
19776    4
19777    1
Length: 19778, dtype: int64, 'preds': 0        [0.76, 0.0, 0.02, 0.22]
1        [0.96, 0.0, 0.02, 0.02]
2        [0.92, 0.04, 0.0, 0.04]
3        [0.78, 0.0, 0.06, 0.16]
4         [0.9, 0.0, 0.06, 0.04]
                  ...           
19773     [0.46, 0.0, 0.2, 0.34]
19774    [0.86, 0.0, 0.02, 0.12]
19775    [0.82, 0.0, 0.02, 0.16]
19776    [0.64, 0.0, 0.08, 0.28]
19777    [0.42, 0.0, 0.44, 0.14]
Length: 19778, dtype: object}, 'Land_Cover': {'refs': 0         1
1         1
2         1
3         1
4        12
         ..
19773     1
19774     1
19775     1
19776     7
19777     1
Length: 19778, dtype: int64, 'preds': 0         1
1         1
2         1
3         1
4        12
         ..
19773     1
19774     1
19775     1
19776     1
19777     1
Length: 19778, dtype: int64}, 'Land_Use': {'refs': 0        3
1        3
2        6
3        3
4      

### 7.1.5: Apply Thresholds to Change

* Since the Change outputs use a specific set of thresholds and logic covered in Module 6, we must replicate this process to ensure the predicted Change classes reflect what would be on the map.
* There are many additional considerations for representing time series map accuracy
    * Often, since our input predictor data may not indicate a change occurred until the year after, we will allow for a prediction the year after to count as correct. These kinds of fuzzy logic exceptions are not included in this example, but may be appropriate to consider. 
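
To make the threshold logic concrete, here is a minimal sketch on four made-up probability rows. The thresholds in `toy_thresholds` are hypothetical (the real values are read from the Module 6 JSON file), and `assemble_change` is an illustrative helper, not a function from the LCMS libraries. A change class is assigned only if it is both the most probable class and above its threshold; otherwise the plot stays Stable.

```python
import numpy as np

# Hypothetical thresholds for illustration only
toy_thresholds = {'Slow Loss': 0.3, 'Fast Loss': 0.4, 'Gain': 0.3}

# Columns: Stable, Slow Loss, Fast Loss, Gain
probs = np.array([
    [0.76, 0.00, 0.02, 0.22],  # Stable is the max
    [0.42, 0.00, 0.44, 0.14],  # Fast Loss is the max and above its threshold
    [0.33, 0.00, 0.35, 0.32],  # Fast Loss is the max but below its threshold
    [0.30, 0.05, 0.10, 0.55],  # Gain is the max and above its threshold
])

def assemble_change(probs, thresholds, class_names=('Slow Loss', 'Fast Loss', 'Gain')):
    """Assign a change class only where it is the max and exceeds its threshold."""
    max_prob = probs.max(axis=1)
    out = np.ones(len(probs), dtype=int)  # Default everything to 1 (Stable)
    for index, name in enumerate(class_names, start=1):
        class_prob = probs[:, index]
        is_change = (class_prob == max_prob) & (class_prob > thresholds[name])
        out[is_change] = index + 1  # Class value is the column index + 1
    return out

assembled = assemble_change(probs, toy_thresholds)
max_conf = probs.argmax(axis=1) + 1
print('Assembled classes:     ', assembled)
print('Max-confidence classes:', max_conf)
```

The third row is where the two approaches differ: the most confident class is Fast Loss, but because it falls below the (hypothetical) threshold, the assembled output keeps it as Stable.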

In [250]:
product_name = 'Change'

# Open class thresholds used from Module 6
o = open(change_thresholds_json_filename,'r')
change_thresholds = json.load(o)
o.close()

# Pull the predictions from the k-fold cross validation
# Currently they contain the probability of all predictions
preds = np.array(ref_pred_dict[product_name]['preds'].values.tolist())
print('Input Change Predictions:',preds)

# Specify which classes to assemble and the index (value - 1) where they are located in the prediction
# probability table
change_classes = ['Slow Loss','Fast Loss','Gain']
change_indices = [1,2,3]

# Find the max probability
max_prob = np.max(preds, axis=1)
print(max_prob)

# Set up a constant output of 1 (Stable)
out = np.ones(len(max_prob))

# Find max conf class
change_class_max_conf_preds = preds.argmax(axis=1)+1

# Iterate across each class 
for change_class,change_index in list(zip(change_classes,change_indices)):
    
    # Pull the predictions for the class
    preds_class = preds[:,change_index]

    # Find if it is the max and above the threshold
    isMax = preds_class == max_prob
    aboveThresh = preds_class > change_thresholds[change_class]
    isChange = isMax & aboveThresh

    # If it is the max and above the threshold, recode the stable to the respective value (index +1)
    change_value = change_index+1
    out[isChange] = change_value

print('Output Change Predictions:',out)
# Reset the predictions for Change
ref_pred_dict[product_name]['preds'] = pd.Series(out)

print('Done')

Input Change Predictions: [[0.76 0.   0.02 0.22]
 [0.96 0.   0.02 0.02]
 [0.92 0.04 0.   0.04]
 ...
 [0.82 0.   0.02 0.16]
 [0.64 0.   0.08 0.28]
 [0.42 0.   0.44 0.14]]
[0.76 0.96 0.92 ... 0.82 0.64 0.44]
Output Change Predictions: [1. 1. 1. ... 1. 1. 3.]
Done


## 7.2 Compute Accuracy
### 7.2.1: Look at confusion matrices

* We will first look at the confusion matrices from the cross-validation
* These will be very similar to the matrices shown in Module 5

* Notice that the impact of the change assemblage (thresholds) on accuracy is very minor. While the overall accuracy does not change much, the map is much different (as shown in Module 6).
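
For readers less familiar with the accuracy columns in these tables, the sketch below computes overall, producers, and users accuracy from a small made-up confusion matrix. It assumes rows are reference classes and columns are predicted classes, which appears to match the tables in this section. The counts are unweighted toy values; the matrices produced by `asl.getConfusionMatrix` also take strata information as input, so their accuracies will not generally equal simple row or column ratios.

```python
import numpy as np

# Toy counts (hypothetical): rows = reference class, columns = predicted class
cm = np.array([[90,  5,  5],
               [10, 30, 10],
               [ 0, 10, 40]])

correct = np.diag(cm)

# Producers accuracy: of the plots that truly are class i, how many were mapped as i
producers = correct / cm.sum(axis=1)

# Users accuracy: of the plots mapped as class j, how many truly are j
users = correct / cm.sum(axis=0)

# Overall accuracy: share of all plots on the diagonal
overall = correct.sum() / cm.sum()

print('Producers accuracy:', np.round(producers, 3))
print('Users accuracy:    ', np.round(users, 3))
print('Overall accuracy:  ', round(float(overall), 3))
```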

In [385]:
reload(asl)
# Ignore divide by 0 warnings from NumPy
warnings.filterwarnings('ignore')

# Iterate across each product
for product in products:
    product_title = product.replace('_',' ')

    # Set up output filename
    cm_file = os.path.join(local_map_acc_folder,f'LCMS_{product}_Map_Confusion_Matrix.csv')

    # Get the class numbers, names, and values 
    class_numbers = list(set(ref_pred_dict[product]['refs']))
    class_names = lcms_viz_dict[f'{product}_class_names'][:-1]
    class_values = lcms_viz_dict[f'{product}_class_values'][:-1]
    lookup_dict = dict(list(zip(class_values,class_names)))

    # Get rid of any missing values
    assessment_classes = [n for n in class_values if n  in class_numbers]
    labels = [lookup_dict[n] for n in class_values if n  in class_numbers]

    # Get the confusion matrix
    print(f'Final Assembled {product_title} Map Confusion Matrix:')
    cm = asl.getConfusionMatrix(ref_pred_dict[product]['refs'],ref_pred_dict[product]['preds'],stratum,stratum_pct,strata_dict,assessment_classes,labels)
    display(cm)
    cm.to_csv(cm_file)
    
    # Show the confusion matrix of change if we simply used the max model confidence (only slightly different)
    if product == 'Change':
        print('Change Max Conf:')
        cm = asl.getConfusionMatrix(ref_pred_dict[product]['refs'],change_class_max_conf_preds,stratum,stratum_pct,strata_dict,assessment_classes,labels)
        display(cm)

print('Done')

Final Assembled Change Map Confusion Matrix:


Unnamed: 0,Stable,Slow Loss,Fast Loss,Gain,Producers Accuracy
Stable,2563,0,4,20,99
Slow Loss,2,0,0,0,0
Fast Loss,152,0,8,1,5
Gain,440,0,0,17,3
Users Accuracy,81,-99,65,43,81


Change Max Conf:


Unnamed: 0,Stable,Slow Loss,Fast Loss,Gain,Producers Accuracy
Stable,2567,0,4,17,99
Slow Loss,2,0,0,0,0
Fast Loss,153,0,8,0,5
Gain,442,0,0,15,3
Users Accuracy,81,-99,64,44,81


Final Assembled Land Cover Map Confusion Matrix:


Unnamed: 0,Trees,Shrubs & Trees Mix,Grass/Forb/Herb & Trees Mix,Barren & Trees Mix,Shrubs,Grass/Forb/Herb & Shrubs Mix,Barren & Shrubs Mix,Grass/Forb/Herb,Barren & Grass/Forb/Herb Mix,Barren or Impervious,Water,Producers Accuracy
Trees,1781,2,18,0,5,3,0,47,0,15,0,94
Shrubs & Trees Mix,136,0,3,0,0,0,0,9,0,0,0,0
Grass/Forb/Herb & Trees Mix,154,3,15,0,16,4,0,54,0,12,0,5
Barren & Trees Mix,4,0,0,0,0,0,0,0,0,0,0,0
Shrubs,50,4,6,0,11,9,0,27,0,2,0,9
Grass/Forb/Herb & Shrubs Mix,51,0,10,0,14,5,0,71,0,0,0,3
Barren & Shrubs Mix,6,0,0,0,0,0,0,0,0,1,0,0
Grass/Forb/Herb,83,0,20,0,1,21,0,186,0,20,0,55
Barren & Grass/Forb/Herb Mix,1,0,0,0,0,0,0,5,0,10,0,0
Barren or Impervious,41,0,2,0,0,0,0,13,1,220,0,80


Final Assembled Land Use Map Confusion Matrix:


Unnamed: 0,Agriculture,Developed,Forest,Non-Forest Wetland,Other,Rangeland or Pasture,Producers Accuracy
Agriculture,2,9,5,0,0,33,5
Developed,0,479,62,0,0,53,81
Forest,4,25,1741,9,1,90,93
Non-Forest Wetland,0,0,17,11,0,16,31
Other,0,2,17,0,5,0,58
Rangeland or Pasture,3,28,206,10,0,370,58
Users Accuracy,30,85,85,46,91,64,81


### 7.2.2: Run Stehman 2014 Accuracy Method

* Once the reference and predicted classes are organized for each LCMS output, we can run a method that reflects the method outlined in [Stehman 2014](https://www.tandfonline.com/doi/abs/10.1080/01431161.2014.930207) for each LCMS output product.
* The outputs are stored as `.txt` files in `local_map_acc_folder`.
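
The core idea can be sketched for overall accuracy alone. The function below is a simplified illustration of a stratified estimator of a proportion and its standard error, in the spirit of Stehman 2014; it is not the implementation in `asl.get_write_stratified_accuracies`, and it omits the ratio estimators needed for users and producers accuracy. The function name, toy plots, and stratum sizes are all hypothetical.

```python
import numpy as np
import pandas as pd

def stratified_overall_accuracy(refs, preds, strata, strata_sizes):
    """Estimate overall accuracy and its standard error from a stratified sample.

    strata_sizes maps each stratum to its population size (e.g. pixel count).
    """
    df = pd.DataFrame({
        'correct': (np.asarray(refs) == np.asarray(preds)).astype(float),
        'stratum': np.asarray(strata),
    })
    N = float(sum(strata_sizes[h] for h in df['stratum'].unique()))
    estimate, variance = 0.0, 0.0
    for h, grp in df.groupby('stratum'):
        N_h = float(strata_sizes[h])   # Population size of the stratum
        n_h = len(grp)                 # Sample size in the stratum
        y_bar = grp['correct'].mean()  # Sample proportion correct in the stratum
        s2 = grp['correct'].var(ddof=1) if n_h > 1 else 0.0
        estimate += N_h * y_bar
        variance += N_h ** 2 * (1 - n_h / N_h) * s2 / n_h
    return estimate / N, np.sqrt(variance) / N

# Toy example: stratum A is large and mapped well, stratum B is small and mapped poorly
refs   = [1, 1, 1, 1, 2, 2, 2, 2]
preds  = [1, 1, 1, 1, 2, 2, 1, 1]
strata = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
sizes  = {'A': 900, 'B': 100}

acc, se = stratified_overall_accuracy(refs, preds, strata, sizes)
naive_acc = float(np.mean(np.array(refs) == np.array(preds)))
print(f'Stratified accuracy: {acc:.3f} +/- {se:.3f}')
print(f'Unweighted accuracy: {naive_acc:.3f}')
```

In this toy case both strata are sampled equally even though stratum A covers far more of the population, so the unweighted accuracy understates how well the map performs overall. Weighting by stratum size corrects for that.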

In [386]:

reload(asl)
acc_results = {}
for product_name in products:
    print(f'Computing accuracy for: {product_name}')
    acc_file = os.path.join(local_map_acc_folder,f'LCMS_{product_name}_Map_Accuracy_Results.txt')
    o = open(acc_file,'w')
    accuracy, balanced_accuracy, users, producers, kappa, f1_score, areas, accuracy_error, usersError, producersError, area_errors = asl.get_write_stratified_accuracies(\
        ref_pred_dict[product_name]['refs'],            # The correct classifications
        ref_pred_dict[product_name]['preds'],      # The predicted classifications
        stratum,       # The strata of the same plots as above
        strata_dict,         # Dictionary of the number of pixels in each stratum - defined in LCMSVariables - used for weighting
        lcms_viz_dict[f'{product_name}_class_values'][:-1], # Class values - used for looping through classes for users/producers accuracies and areas
        lcms_viz_dict[f'{product_name}_class_names'][:-1],  # Class names - used for labeling each class in the output
        method = product_name,        # This is just a run name, used for printing out accuracies in file. Not really used anymore
        accFile = o)
    o.close()
    acc_results[product_name] = accuracy, balanced_accuracy, users, producers, kappa, f1_score, areas, accuracy_error, usersError, producersError, area_errors 

print('Done')

Computing accuracy for: Change
Computing accuracy for: Land_Cover
Computing accuracy for: Land_Use


### 7.2.3: Look at Accuracy Outputs

* Once finished, we can look at the results

In [387]:
# Get each output .txt file and open it 
results_files = glob.glob(os.path.join(local_map_acc_folder,'*Accuracy*txt'))
for results_file in results_files:
    o = open(results_file,'r')
    results = o.read()
    o.close()
    print(results)

Accuracy: 0.8137716820779104 +/- 0.003345112180960089
Balanced Accuracy: 0.2882303591059005
Kappa: 0.06061158523380341
F1 Score: 0.8137716820779104
Users Accuracy: 
Class Stable: 0.8191787855235716 +/- 0.0033958684877441743
Class Slow Loss: nan +/- 0.0
Class Fast Loss: 0.6509683893052343 +/- 0.000447825761355762
Class Gain: 0.43235039751683174 +/- 0.0006205535385353442
Producers Accuracy: 
Class Stable: 0.9904348583719218 +/- 0.0033958684877441743
Class Slow Loss: 0.0 +/- 0.0
Class Fast Loss: 0.05773522541938911 +/- 0.000447825761355762
Class Gain: 0.038275263150338454 +/- 0.0006205535385353442
Design-Based Area Estimation: 
Class Stable: 0.8135018712954257 +/- 0.003346061413220812
Class Slow Loss: 0.0010107002132115045 +/- 0.0002393632800430789
Class Fast Loss: 0.04889481523090165 +/- 0.001865172708520779
Class Gain: 0.13659261326046104 +/- 0.0029603884520563143

Accuracy: 0.7004417582378794 +/- 0.0037934945246005805
Balanced Accuracy: 0.20966407240850368
Kappa: 0.34618704079927115
F1

## 7.3 Copy outputs to Google Cloud Storage
### 7.3.1 Set up Google Cloud Storage Bucket

* This step will set up a location in Google Cloud Storage for the accuracy results `.txt` files to be copied to

In [388]:
pre_baked_accuracy_output_bucket = 'gs://lcms-training-outputs-accuracy'

accuracy_output_bucket = pre_baked_accuracy_output_bucket

# You may need to run these functions if running outside Workbench
# In order to avoid permissions issues, ensure the projectName is the same as the one you used for authenticating 
# to GEE
# !gcloud auth login
# !gcloud projects list
# !gcloud config set project projectName

!gsutil mb {accuracy_output_bucket}

Creating gs://lcms-training-outputs-accuracy/...
ServiceException: 409 A Cloud Storage bucket named 'lcms-training-outputs-accuracy' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.


### 7.3.2: Copy outputs to Google Cloud Storage

* This step copies the outputs to the bucket

In [389]:
!gsutil -m cp {local_map_acc_folder}/*.txt {accuracy_output_bucket}
!gsutil -m cp {local_map_acc_folder}/*.csv {accuracy_output_bucket}

Copying file://\tmp\lcms-training\local_map_accuracy\LCMS_Change_Map_Accuracy_Results.txt [Content-Type=text/plain]...
/ [0/3 files][    0.0 B/  5.0 KiB]   0% Done                                    
Copying file://\tmp\lcms-training\local_map_accuracy\LCMS_Land_Cover_Map_Accuracy_Results.txt [Content-Type=text/plain]...
/ [0/3 files][    0.0 B/  5.0 KiB]   0% Done                                    
Copying file://\tmp\lcms-training\local_map_accuracy\LCMS_Land_Use_Map_Accuracy_Results.txt [Content-Type=text/plain]...
/ [0/3 files][    0.0 B/  5.0 KiB]   0% Done                                    
/ [1/3 files][  5.0 KiB/  5.0 KiB]  99% Done                                    
/ [2/3 files][  5.0 KiB/  5.0 KiB]  99% Done                                    
/ [3/3 files][  5.0 KiB/  5.0 KiB] 100% Done                                    

Operation completed over 3 objects/5.0 KiB.                                      
Copying file://\tmp\lcms-training\local_map_accuracy\LCMS_Change_Map

In [100]:
# Open this link to view the accuracy outputs
print(f'https://console.cloud.google.com/storage/browser/{os.path.basename(accuracy_output_bucket)}')

https://console.cloud.google.com/storage/browser/lcms-training-outputs-accuracy


# Lab 7 challenge:



## Done with Module 7