maxwell-geospatial/center_weighted_accuracy_assessment

Center-Weighted Accuracy Assessment R Function Documentation

Overview

Two versions of the function are provided to support different input formats. The cwghtAccVec() function accepts reference and predicted data in vector format (for example, a shapefile or feature class) and the extent mask as a raster grid (for example, TIFF or IMG format). The cwghtAccRas() function accepts all data in raster format.

The purpose of the function is to generate accuracy assessment metrics using the center-weighted method described in the following paper: “Thematic Classification Accuracy Assessment with Inherently Uncertain Boundaries: An Argument for Center-Weighted Accuracy Assessment Metrics.”

Maxwell, A.E., and T.A. Warner, 2020. Thematic classification accuracy assessment with inherently uncertain boundaries: an argument for center-weighted accuracy assessment metrics, Remote Sensing, 12(12): 1-21. https://doi.org/10.3390/rs12121905.

Input Requirements

cwghtAccVec()

  • Reference data as a vector layer with an attribute column that defines the class to which each feature belongs. The background class does not need to be included. Classes should be coded numerically from 1 to the number of classes. 0 is reserved for the background class, which can be provided or generated within the process. The background class can be NULL or absent from the input data.
  • Predicted data as a vector layer with an attribute column that defines the class to which each feature was predicted. The background class does not need to be included. Classes should be coded numerically from 1 to the number of classes. 0 is reserved for the background class, which can be provided or generated within the process. The background class can be NULL or absent from the input data.
  • Raster mask covering the full extent of the validation area. Each cell within the desired extent should be coded to 1, and the cell size should be set to the defined processing cell size. This raster grid will be used as a template for all additional raster grids generated by the function.

Note: All data layers should have the same spatial reference. You should use a projected coordinate system as opposed to a geographic coordinate system.

cwghtAccRas()

  • Raster grids of unique values for each spatially contiguous area of the same class, one grid for the reference data and one for the predicted data. Values should be coded from 1 to the number of features in the dataset.
  • Raster grid of class codes. A unique numeric code should be assigned to each class from 1 to the number of classes. 0 is reserved for the background class, which can be provided or can be generated within the process.
  • Raster mask covering the full extent of the validation area. Each cell within the desired extent should be coded to 1 and the cell size should be set to the defined processing cell size. This raster grid will be used as a template for all raster grids generated by the function.

Note: All data layers should have the same spatial reference. You should use a projected coordinate system as opposed to a geographic coordinate system. All raster grids should have the same extent, origin, number of rows, number of columns, and cell size. R is less forgiving with cell alignment issues than GIS software packages.
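The alignment requirements in the note above can be checked before calling either function. The following is a minimal sketch, assuming the raster package is installed; it relies on compareRaster() and isLonLat() from that package. The helper name grids_aligned() and the small in-memory grids are hypothetical stand-ins and are not part of this repository.

```r
library(raster)

# Hypothetical helper (not part of the repository): TRUE when every grid
# matches the mask in extent, rows/columns, CRS, resolution, and origin.
grids_aligned <- function(mask, ...) {
  all(vapply(list(...), function(g) {
    compareRaster(mask, g, extent = TRUE, rowcol = TRUE, crs = TRUE,
                  res = TRUE, orig = TRUE, stopiffalse = FALSE)
  }, logical(1)))
}

# Small in-memory grids standing in for real inputs (10 m cells, UTM zone 17).
utm <- "+proj=utm +zone=17 +datum=WGS84 +units=m"
mask_demo <- raster(nrows = 10, ncols = 10,
                    xmn = 500000, xmx = 500100,
                    ymn = 4000000, ymx = 4000100,
                    crs = utm, vals = 1)
aligned_demo <- mask_demo

# Same size and cell size, but offset by half a cell in x.
shifted_demo <- raster(nrows = 10, ncols = 10,
                       xmn = 500005, xmx = 500105,
                       ymn = 4000000, ymx = 4000100,
                       crs = utm, vals = 1)

grids_aligned(mask_demo, aligned_demo)   # grids match the mask
grids_aligned(mask_demo, shifted_demo)   # half-cell shift is detected
isLonLat(mask_demo)                      # FALSE for a projected CRS
```

With real data, the grids read in with raster() would be passed in place of the demo objects.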

Function Parameters

cwghtAccVec()

  • reference: polygons of reference objects read in using st_read(). (No Default)
  • predicted: polygons of predicted objects read in using st_read(). (No Default)
  • mask: raster mask of validation extent read in using raster(). (No Default)
  • cellSize: processing cell size. Should be the same as the mask raster grid. (No Default)
  • referenceCls: field name of the column that specifies the class to which the reference data are assigned. Should be provided as a string. Field should contain numeric codes representing each class. (No Default)
  • predictedCls: field name of the column that specifies the class to which the predicted data are assigned. Should be provided as a string. Field should contain numeric codes representing each class. (No Default)
  • pwr: distance weighting exponent to apply. (Default is 1 or a linear weighting)
  • satDist: saturation threshold distance or distance from edge at which to stop increasing the weight. (Default is 10,000 map units, such as meters)
  • areaBased: Whether to generate count-based or area-based metrics. (Default is area-based or TRUE)
  • twoClass: Whether to perform a multi-class assessment or an assessment of a binary classification. (Default is TRUE or a binary classification)
  • grass_Dir: directory information for GRASS on your machine. (for example, “C:/Program Files/GRASS GIS 7.8”)
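To make the roles of pwr and satDist concrete, here is a minimal sketch of one plausible reading of those two parameters: the weight grows with distance from the feature edge raised to pwr, and stops growing once the distance reaches satDist. The function name center_weight() and this exact formula are assumptions made for illustration; the repository's functions may compute weights differently.

```r
# Hypothetical illustration only (not the repository's implementation):
# weight = min(distance from edge, satDist) ^ pwr
center_weight <- function(edgeDist, pwr = 1, satDist = 10000) {
  pmin(edgeDist, satDist)^pwr
}

d <- c(10, 50, 100, 500)                 # distances from the edge, in map units

center_weight(d)                         # linear weighting: 10 50 100 500
center_weight(d, pwr = 2)                # squared: 100 2500 10000 250000
center_weight(d, pwr = 1, satDist = 100) # saturates at 100: 10 50 100 100
```

Under this reading, a larger pwr emphasizes feature centers more strongly, and a smaller satDist limits how far into a feature the emphasis keeps increasing.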

cwghtAccRas()

  • referenceID: raster grid of unique IDs for each reference object read in using raster(). 0 is reserved for background. (No Default)
  • predictedID: raster grid of unique IDs for each predicted object read in using raster(). 0 is reserved for background. (No Default)
  • referenceCls: raster grid of class codes for each reference object read in using raster(). 0 is reserved for background. (No Default)
  • predictedCls: raster grid of class codes for each predicted object read in using raster(). 0 is reserved for background. (No Default)
  • mask: raster mask of validation extent read in using raster(). (No Default)
  • cellSize: processing cell size. Should be the same as the mask raster grid. (No Default)
  • pwr: distance weighting exponent to apply. (Default is 1 or a linear weighting)
  • satDist: saturation threshold distance or distance from edge at which to stop increasing the weight. (Default is 10,000 map units, such as meters)
  • areaBased: Whether to generate count-based or area-based metrics. (Default is area-based or TRUE)
  • twoClass: Whether to perform a multi-class assessment or an assessment of a binary classification. (Default is TRUE or a binary classification)
  • grass_Dir: directory information for GRASS on your machine. (for example, “C:/Program Files/GRASS GIS 7.8”)

Note: All raster grids should have the same extent, origin, number of rows, number of columns, and cell size. R is less forgiving with cell alignment issues than GIS software packages. You can code background cells to 0 or leave them as NULL or NoData since the function can perform the recoding.

Dependencies

The functions make use of the following packages:

  • raster: allows for management and processing of raster data.
  • sp/sf: for management and processing of vector data (sf is used where possible, but some operations require sp).
  • dplyr: for basic data manipulation.
  • fasterize/fasterRaster: to speed up processing of raster data and creation of distance surfaces. Note: fasterize will need to be downloaded/cloned and installed from GitHub (https://github.com/ecohealthalliance/fasterize). Note: fasterRaster will need to be downloaded/cloned and installed from GitHub (https://github.com/adamlilith/fasterRaster).
  • caret/rfUtilities/diffeR: for calculation of assessment metrics.
  • You will also need to have GRASS installed on your machine (https://grass.osgeo.org/news/2020_10_05_grass_gis_7_8_4_released/). This needs to be a stand-alone version outside of QGIS.
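Before running the functions, it can help to confirm which of the packages listed above are already installed. The sketch below uses only base R and does not load or install anything. The install commands in the comments are suggestions; using the remotes package for the GitHub installs is an assumption, not something this repository specifies.

```r
# Packages named in the Dependencies list above.
pkgs <- c("raster", "sp", "sf", "dplyr", "fasterize", "fasterRaster",
          "caret", "rfUtilities", "diffeR")

# system.file() returns "" when a package is not installed.
installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                    logical(1))
missing_pkgs <- names(installed)[!installed]
print(missing_pkgs)

# One possible way to install what is missing (assumes the remotes package):
# install.packages(c("raster", "sp", "sf", "dplyr", "caret", "rfUtilities", "diffeR"))
# remotes::install_github("ecohealthalliance/fasterize")
# remotes::install_github("adamlilith/fasterRaster")
```

GRASS itself is not an R package and must be installed separately, as described in the last item of the list.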

Values Returned

For Two-Class Assessment

  • Metrics calculated using the confusionMatrix() function from the caret package: confusion matrix, overall accuracy, Kappa, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Precision, Recall, F1 Score, Prevalence, Detection Rate, Detection Prevalence, Balanced Accuracy (Note: not all metrics will be meaningful for all classification problems).

For Multi-Class Assessment

  • Metrics calculated using the confusionMatrix() function from the caret package: confusion matrix, overall accuracy, Kappa, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Precision, Recall, F1 Score, Prevalence, Detection Rate, Detection Prevalence, Balanced Accuracy (Note: not all metrics will be meaningful for all classification problems).
  • Metrics calculated using the accuracy() function from the rfUtilities package: User’s and Producer’s Accuracy for all classes.
  • Metrics from the diffeR package: Overall Error, Allocation Disagreement, Quantity Disagreement, Exchange Disagreement, Shift Disagreement.
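For readers unfamiliar with the disagreement metrics in the last item, the sketch below computes overall error, quantity disagreement, and allocation disagreement by hand from a small confusion matrix, using the standard definitions in which quantity and allocation disagreement sum to the overall error. The matrix values are made up for illustration, and this is not how the functions call the package. Exchange and shift, which partition allocation disagreement, are left out for brevity.

```r
# Hypothetical 3-class confusion matrix: rows = predictions, columns = reference.
cm <- matrix(c(40,  5,  5,
                3, 30,  2,
                2,  5,  8),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("1", "2", "3"),
                             Reference  = c("1", "2", "3")))

p        <- cm / sum(cm)   # cell proportions
agree    <- diag(p)        # correctly classified proportion per class
pred_tot <- rowSums(p)     # proportion predicted as each class
ref_tot  <- colSums(p)     # proportion of each class in the reference

overall_error <- 1 - sum(agree)
quantity      <- sum(abs(pred_tot - ref_tot)) / 2
allocation    <- sum(2 * pmin(pred_tot - agree, ref_tot - agree)) / 2

overall_error   # 0.22
quantity        # 0.05
allocation      # 0.17
```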

The output will be provided as a list object with the following elements:

  • positive: which class is considered the positive case (generally irrelevant for multi-class problems).
  • table: confusion matrix as a table object. Columns represent the reference data and rows represent the predictions.
  • overall: vector of overall accuracy metrics (first value is overall accuracy and second value is Kappa).
  • byClass: class-level metrics calculated by the confusionMatrix() function provided by the caret package.
  • producers.accuracy: vector of producer’s accuracy for each class calculated using the rfUtilities package (will not be returned for two-class problem).
  • users.accuracy: vector of user’s accuracy for each class calculated using the rfUtilities package (will not be returned for two-class problem).
  • Pontius: data frame of assessment metrics calculated with the diffeR package (will not be returned for two-class problem).
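The relationship between the table, overall, users.accuracy, and producers.accuracy elements can be illustrated with base R alone. The sketch below builds a weighted confusion matrix with rows as predictions and columns as reference, matching the orientation described above, then derives overall, user's, and producer's accuracy from it. The cell data and weights are made up, and the code is an illustration of the general idea rather than the functions' internal implementation.

```r
# Hypothetical cells: reference class, predicted class, and a weight per cell.
cells <- data.frame(
  reference = factor(c(1, 1, 1, 0, 0, 0, 1, 0), levels = c(0, 1)),
  predicted = factor(c(1, 1, 0, 0, 0, 1, 1, 0), levels = c(0, 1)),
  weight    = c(30, 20, 10, 40, 40, 10, 30, 20)
)

# Weighted confusion matrix: rows = predictions, columns = reference.
cm <- xtabs(weight ~ predicted + reference, data = cells)

overall_accuracy <- sum(diag(cm)) / sum(cm)   # 0.9
users_accuracy   <- diag(cm) / rowSums(cm)    # per predicted class (rows)
producers_accuracy <- diag(cm) / colSums(cm)  # per reference class (columns)

# Unweighted comparison: every cell counts equally.
unweighted_accuracy <- mean(cells$reference == cells$predicted)   # 0.75
```

In this toy example the misclassified cells carry low weights, so the weighted overall accuracy (0.9) is higher than the unweighted value (0.75).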

Example Data

Example data have been provided to experiment with the functions:

  • predicted_shapes.shp: synthetic data as a vector shapefile representing predicted extents for a single class against a background. The “class” field provides the positive class code (1).
  • reference_shapes.shp: synthetic data as a vector shapefile representing reference extents for a single class against a background. The “class” field provides the positive class code (1).
  • extent.shp: validation extent as a vector shapefile.
  • raster_mask.tif: validation extent as a raster grid. All cells in the extent are coded to 1.
  • predicted_id.tif: raster grid of unique IDs for each predicted object.
  • reference_id.tif: raster grid of unique IDs for each reference object.
  • predicted_class.tif: raster grid of class codes for each predicted object.
  • reference_class.tif: raster grid of class codes for each reference object.

Note: These data are equivalent to the synthetic data used in Example 3 in the associated paper. All raster grids have the same extent, cell size, number of rows, number of columns, and number of cells. All data layers have the same spatial reference. Grids were produced at a 10 m cell size. Note that decreasing the cell size and/or increasing the size of the validation extent will increase the required processing time. These data can be used as a template for generating your own data for input to this function.

Example Code

The code below provides an example of how to perform the analysis on the provided example data, first with the vector-based method and then with the raster-based method. You will need to run the function definitions (for example, by sourcing the script that contains them) before calling the functions.

# Vector-based method
library(sf)      # provides st_read()
library(raster)  # provides raster()

setwd("YOUR FOLDER PATH HERE")
reference <- st_read("reference_shapes.shp")
predicted <- st_read("predicted_shapes.shp")
mask <- raster("raster_mask.tif")

cellSize <- 10
referenceCls <- "class"
predictedCls <- "class"
pwr <- 1
areaBased <- TRUE
twoClass <- TRUE
satDist <- 10000
grass_Dir <- "C:/Program Files/GRASS GIS 7.8"

vector_based_result <- cwghtAccVec(reference, predicted, mask, cellSize, referenceCls, predictedCls, pwr, areaBased, twoClass, satDist, grass_Dir)

# Raster-based method
library(raster)  # provides raster()

setwd("YOUR FOLDER PATH HERE")
referenceID <- raster("reference_id.tif")
predictedID <- raster("predicted_id.tif")
referenceCls <- raster("reference_class.tif")
predictedCls <- raster("predicted_class.tif")
mask <- raster("raster_mask.tif")

cellSize <- 10
pwr <- 1
areaBased <- TRUE
twoClass <- TRUE
satDist <- 10000
grass_Dir <- "C:/Program Files/GRASS GIS 7.8"

# The sections above refer to this function as cwghtAccRas(); use the name defined in the script you sourced.
raster_based_result <- cwghtAccRst(referenceID, predictedID, referenceCls, predictedCls, mask, cellSize, pwr, areaBased, twoClass, satDist, grass_Dir)
