GitHub - maxwell-geospatial/slopefailure_prob_models

Description

Slope failures, such as landslides, are naturally occurring geohazards that involve mass movements of rock, soil, and/or debris downslope. They can be triggered by intense precipitation and often occur in areas with steep slopes. These hazards are potentially life-threatening and can cause extensive damage to property and infrastructure. Therefore, it is important to estimate the likelihood of landslide occurrence using machine learning, spatial predictive modeling, and probabilistic mapping to better prepare for and mitigate this hazard. Landslide probability maps are used to assess where slope failures are more likely to occur and can be of use to communities and government agencies for hazard assessment.

This repo provides code examples for undertaking this modeling process using either the Python/ArcPy or R/SAGA data science environments. The provided code shows how to:

Generate terrain derivatives from digital terrain data to serve as predictor variables
Extract predictor variables from raster data at sample point locations to build training and validation datasets
Train and assess machine learning models using scikit-learn in Python and caret in R
Predict back to raster data to generate spatial predictive models of the probability of slope failure occurrence
Process large volumes of data using loops and tiling
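
As a rough illustration of the point-extraction step listed above, the following NumPy-only sketch looks up raster stack values at point coordinates. It is not the code in py_extract_preds.py (which uses arcpy). The function name, the north-up grid with square cells, and the toy values are all assumptions made for this example.

```python
import numpy as np

def sample_stack_at_points(stack, origin_x, origin_y, cell_size, xs, ys):
    """Return an (n_points, n_bands) table of raster values at point locations.

    stack: array of shape (bands, rows, cols) on a north-up grid (assumed).
    origin_x, origin_y: map coordinates of the grid's upper-left corner.
    cell_size: square cell size in map units.
    Points that fall outside the grid get NaN in every band.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Convert map coordinates to column/row indices
    cols = np.floor((xs - origin_x) / cell_size).astype(int)
    rows = np.floor((origin_y - ys) / cell_size).astype(int)
    _, n_rows, n_cols = stack.shape
    inside = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    out = np.full((xs.size, stack.shape[0]), np.nan)
    out[inside] = stack[:, rows[inside], cols[inside]].T
    return out

# Hypothetical 2-band, 3 x 4 grid with 10 m cells, upper-left corner at (500, 1030)
stack = np.arange(24, dtype=float).reshape(2, 3, 4)
table = sample_stack_at_points(
    stack, 500.0, 1030.0, 10.0,
    xs=[505.0, 535.0, 999.0],   # the third point lies outside the grid
    ys=[1025.0, 1005.0, 1025.0],
)
print(table)
```

Each row of the returned table corresponds to one sample point and each column to one predictor band, which is the layout expected when building a training table.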

There are several machine learning methods that can be used for probabilistic prediction. Here, we demonstrate the random forest (RF), support vector machine (SVM), and k-nearest neighbor (k-NN) methods.
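
The sketch below shows one way the three methods can produce class probabilities with scikit-learn. It is not taken from py_modeling.py. The data are synthetic stand-ins with two made-up predictors, and the hyperparameter values are arbitrary choices for illustration rather than tuned settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): label 1 = slope failure, 0 = no failure
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# SVM and k-NN are distance-based, so the predictors are rescaled first
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=42)),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 of predict_proba holds the probability of class 1
    prob = model.predict_proba(X_test)[:, 1]
    results[name] = (prob, roc_auc_score(y_test, prob))
    print(f"{name}: AUC = {results[name][1]:.3f}")
```

Note that SVC only exposes predict_proba when it is created with probability=True, which adds an internal calibration step during fitting.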

This work was undertaken by the WV GIS Technical Center (WVGISTC) and the WVU Department of Geology and Geography. This project was funded by FEMA (FEMA-4273-DR-WV-0031). The performance period for the project is 6/20/2018 to 6/4/2021.

Project Members

Caleb Malay, Graduate Research Assistant, WVGISTC
Dr. Aaron Maxwell, Assistant Professor, WVU Department of Geology and Geography/WVGISTC
Kurt Donaldson, Senior Project Manager, WVGISTC
Dr. Maneesh Sharma, GIS Project Lead, WVGISTC
Dr. Steve Kite, Emeritus Professor, WVU Geology and Geography
Shannon Maynard, Research Associate, WVGISTC

WV Landslide Tool

This work was reported in the following publication, which is available in open access:

Maxwell, A.E., Sharma, M., Kite, J.S., Donaldson, K.A., Thompson, J.A., Bell, M.L. and Maynard, S.M., 2020. Slope Failure Prediction Using Random Forest Machine Learning and LiDAR in an Eroded Folded Mountain Belt. Remote Sensing, 12(3): 486. https://doi.org/10.3390/rs12030486.

The slope failure inventory and resulting predictive models are viewable using the web-based WV Landslide Tool.

The LiDAR data used in this project can be downloaded using the WV Elevation and LiDAR Download Tool created and supported by the WVGISTC and West Virginia View.

Technologies

Python Packages

numpy: work with data arrays
pandas: work with data tables
matplotlib: data visualization and graphing
sklearn: implement machine learning training and validation
rasterio: read in and work with geospatial raster data
pyspatialml: use trained models to predict to raster grids and make spatial models

R Packages

caret: training and validating machine learning models using a consistent syntax
raster: read in and work with raster data in R
dplyr: data prep and manipulation
ggplot2: data visualization and graphing
pROC: generate ROC curves and the AUC metric
tmap: geospatial data visualization in R
spatialEco: generate terrain derivatives in R
RSAGA: connect to SAGA to generate additional terrain variables (to use, you must install SAGA on your system either stand-alone or within QGIS)

How To Use

The code provided here is not the original code used to generate our slope failure models and conduct the associated studies. Instead, we have generated more generic code that may be more applicable to other projects. Our original code also involved the collective use of Python, ArcPy, R, and SAGA. Here, we have tried to partition the examples so that the workflow can be accomplished using just Python/ArcPy or R/SAGA. This is so that users can undertake the process in the environment with which they are most familiar. Below, we provide descriptions of the provided scripts.

Python

Scripts

py_terrain_derivatives_separate.py: create terrain variables from digital terrain models to use as predictor variables in the predictive model. Results are written out to separate raster grids.
py_terrain_derivatives_stack.py: create terrain variables from digital terrain models to use as predictor variables in the predictive model. Results are written out as a single raster stack.
py_extract_preds.py: use arcpy to extract predictor variables at sample point locations to build training/validation datasets. You can edit this file to extract additional predictor variables. Our example is just for the terrain variables.
py_modeling.py: prepare training/validation data, optimize models, train models, and assess models using scikit-learn.
py_predict_grid.py: use trained model to predict to a raster stack and create a probabilistic model.
py_tile_looping_pnt_ext.py: large volumes of raster data cannot be used at once due to memory limitations. This script demonstrates how to loop through multiple tiles to generate training data and extract raster values at points.

Note that we have not yet added a predict-over-tiles loop for Python. We plan to do so in the future.
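
Until such a script is available, the general idea of predicting over tiles can be sketched as follows. This is not part of the repo and makes several assumptions: the stack is held in a NumPy array rather than read from disk, the function names are hypothetical, NoData cells are not handled, and a simple logistic function stands in for a trained model.

```python
import numpy as np

def predict_in_tiles(stack, predict_proba, tile_size=256):
    """Apply a per-pixel probability function to a raster stack tile by tile.

    stack: array of shape (bands, rows, cols).
    predict_proba: callable that takes an (n_pixels, n_bands) array and returns
        an (n_pixels, n_classes) array, as scikit-learn classifiers do.
    Returns a (rows, cols) float32 grid holding the probability of class 1.
    """
    n_bands, n_rows, n_cols = stack.shape
    out = np.empty((n_rows, n_cols), dtype=np.float32)
    for r0 in range(0, n_rows, tile_size):
        for c0 in range(0, n_cols, tile_size):
            tile = stack[:, r0:r0 + tile_size, c0:c0 + tile_size]
            h, w = tile.shape[1:]  # edge tiles may be smaller than tile_size
            # Flatten the tile to one row per pixel, one column per band
            pixels = tile.reshape(n_bands, -1).T
            out[r0:r0 + h, c0:c0 + w] = predict_proba(pixels)[:, 1].reshape(h, w)
    return out

def toy_model(pixels):
    """Hypothetical stand-in for a trained classifier's predict_proba method."""
    p = 1.0 / (1.0 + np.exp(-pixels[:, 0]))
    return np.column_stack([1.0 - p, p])

stack = np.random.default_rng(0).normal(size=(3, 500, 700))
prob_grid = predict_in_tiles(stack, toy_model, tile_size=256)
print(prob_grid.shape)
```

With a fitted scikit-learn classifier, its predict_proba method could be passed in place of toy_model. In a real workflow each tile would be read from and written to disk separately, so that only one tile is in memory at a time.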

Terrain Variables (Python)

Slope Gradient (Spatial Analyst Extension)
Slope Position
Topographic Roughness
Topographic Dissection
Mean Slope Gradient
Site Exposure Index
Heat Load Index
Linear Aspect
Surface Relief Ratio
Surface Area Ratio
Mean Curvature (Raster Functions/Surface Parameters)
Profile Curvature (Raster Functions/Surface Parameters)
Tangential Curvature (Raster Functions/Surface Parameters)

Unless otherwise noted, terrain metrics were calculated using the Geomorphometry and Gradient Metrics Toolbox.
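
To give a sense of what a terrain derivative computes, the sketch below estimates slope gradient from a DEM array using NumPy finite differences. This is a simplified stand-in and not the method used by the Spatial Analyst extension or the toolbox named above; GIS packages typically use 3 x 3 neighborhood estimators, so values may differ slightly. The function name and the toy DEM are assumptions for this example.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope gradient in degrees from a DEM using finite differences.

    dem: 2D array of elevations; cell_size: cell width in the same units
    as the elevations (for example, meters).
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Hypothetical planar surface rising 1 m per 1 m cell toward the east
dem = np.tile(np.arange(5, dtype=float), (4, 1))
slope = slope_degrees(dem, cell_size=1.0)
print(slope)
```

A surface that rises one unit per unit of horizontal distance has a slope of 45 degrees, which provides a quick sanity check for the function.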

R

Scripts

R_terrain_derivatives_single.R: create terrain variables from digital terrain models to use as predictor variables in the predictive model. Results are written out to separate raster grids.
R_terrain_derivatives_stack.R: create terrain variables from digital terrain models to use as predictor variables in the predictive model. Results are written out as a single raster stack.
R_extract_preds.R: extract predictor variables at sample point locations to build training/validation datasets. You can edit this file to extract additional predictor variables. Our example is just for the terrain variables.
R_modeling.R: prepare training/validation data, optimize models, train models, and assess models using caret and other R packages.
R_predidct_grid.R: use trained model to predict to a raster stack and create a probabilistic model.
R_tile_looping_pnt_ext.R: large volumes of raster data cannot be used at once due to memory limitations. This script demonstrates how to loop through multiple tiles to generate training data and extract values at points.
R_tile_looping_pred.R: large volumes of raster data cannot be used at once due to memory limitations. This script demonstrates how to loop through multiple tiles to generate predictions as grids.

Terrain Variables (R)

Slope Gradient (raster package)
Topographic Dissection (spatialEco package)
Surface Area Ratio (spatialEco package)
Surface Relief Ratio (spatialEco package)
TRASP Aspect Transform (spatialEco package)
Heat Load Index (spatialEco package)
Mean Slope Gradient (SAGA)
Profile Curvature (SAGA)
Plan Curvature (SAGA)
Cross-Sectional Curvature (SAGA)
Longitudinal Curvature (SAGA)
Topographic Roughness Index (SAGA)
Topographic Position Index (SAGA)

Example Data

We have also provided some example data to experiment with the process. Since these data are large, they could not be provided with the repo. Instead, they are available on the West Virginia View webpage. Click on the Landslides download option.

lsm_data2.csv: table of slope failures and predictor variables used to demonstrate the use of the scripts.
stack2.tif: raster stack with the same set of predictor variables as presented in the table, used to assess predicting back to raster data to make predictive models.
test_loop folder: folder that contains a set of random points, processing tiles, and a DEM to experiment with looping over tiles.

Additional Resources

If you are interested in learning more about machine learning and spatial predictive modeling using Python and/or R, some instructional materials have been provided on the West Virginia View webpage. The Open-Source GIScience course provides examples in Python while the Open-Source Spatial Analytics (R) course provides examples in R.
