Modeling Distributions of Bering Sea Snow Crab

This repository is part of a NOAA project that creates updated species distribution maps for snow crab in the northern Bering Sea. It serves as a tool for continuing to create updated species distribution models (SDMs) for snow crab by sex and maturity, which will be used to forecast next-season summer distributions. A portion of this work comprises a chapter of my Ph.D. dissertation and is under review at Nature Communications. The final models can be rerun using this script with a cleaned data set found here.

Models were developed using R v4.3.1 in RStudio v2023.09.0 on a Windows 11 machine with 16 GB of RAM, but they can be run using the latest versions of both. R can be downloaded here and RStudio here. Installation takes less than 20 minutes. Model runs take around 20 minutes, depending on the available computing power.

Data

Fishery-independent, fishery-dependent, and oceanographic data were used for this project, and future work will use ROMS output from the Bering10K. Data cleaning and matching processes can be seen here, though the data are not accessible through this repository and must be obtained independently. Some data are confidential and others must be requested. Specifically, we use:

The Alaska Department of Fish and Game Crab Observer Program data, which are confidential. These data include catch records of snow crab from the targeted, fixed-gear snow crab fishery, which typically runs from December to March. These data were incorporated as principal component analysis (PCA) scores, through these methods. These PCA results look like this for legal-size male snow crab:
The NOAA AFSC Eastern Bering Sea Bottom Trawl Survey data, which include catches of snow crab at the stations sampled by this survey. Data are available beginning in 1975, and surveys run during the summer months. These data may be available upon request.
The ERA5 Reanalysis monthly sea ice concentration values.
The NOAA Eastern Bering Sea sediment database, which provides a comprehensive set of grain sizes in the study region at a 1 km resolution. Sediment grain size is distributed like this for the study region:
The Bering10K ROMS output. In future work, we will use the latest CMIP6 runs to obtain temperature values for predicting next-season distributions. These outputs are publicly available.

Methods

Different types of species distribution models (SDMs) were compared in order to select the best model. Root mean square error (RMSE), Spearman's correlation coefficient, and percent deviance explained were used to compare the models. The training data covered 1995-2014, and the test data covered 2015-2019 and 2021. Ultimately, boosted regression trees were selected. This process can be replicated here. An example of this comparison is shown in the accompanying RMSE plot.
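As a rough illustration of the three comparison metrics and the year-based split, here is a minimal sketch in Python (standard library only). It is not the repository's R code: the function names, the `year` field on each record, and the Gaussian form of percent deviance explained (100 × (1 − residual SS / null SS)) are all assumptions made for the example. Bernoulli or Tweedie models would use their own deviance definitions.

```python
import math

# Year split described in the text: train on 1995-2014, test on 2015-2019 and 2021.
TRAIN_YEARS = set(range(1995, 2015))
TEST_YEARS = set(range(2015, 2020)) | {2021}


def split_by_year(records):
    """Split records (dicts with a hypothetical 'year' key) into train and test."""
    train = [r for r in records if r["year"] in TRAIN_YEARS]
    test = [r for r in records if r["year"] in TEST_YEARS]
    return train, test


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(observed, predicted):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(observed), average_ranks(predicted))


def percent_deviance_explained(observed, predicted):
    """Gaussian case only (assumption): 100 * (1 - residual SS / null SS)."""
    mean_obs = sum(observed) / len(observed)
    null_ss = sum((o - mean_obs) ** 2 for o in observed)
    resid_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 100.0 * (1.0 - resid_ss / null_ss)
```

Lower RMSE, higher Spearman correlation, and higher percent deviance explained on the test years would each favor a candidate model.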

Generalized Additive Models (GAMs)

Two types of GAMs were evaluated for use as SDMs for snow crab.

Delta-type GAMs first model presence-absence using a Bernoulli distribution. The abundance-only data are then log(x+1) transformed and modeled using a Gaussian distribution. Predicted abundance is conditional on presence-absence from the first model, meaning that the predictions from the two models are multiplied to obtain the final predictions.
Tweedie GAMs model the full set of log-transformed data using a Tweedie distribution with a log link. The Tweedie distribution used here is a type of compound Poisson-gamma model.
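The way the two parts of a delta-type model combine can be sketched as follows. This is a minimal Python illustration, not the repository's R code. The function names are hypothetical, and the sketch assumes the Gaussian prediction is back-transformed from the log(x+1) scale before multiplication, with no retransformation bias correction. The source states only that the two predictions are multiplied.

```python
import math


def delta_prediction(p_presence, log1p_abundance):
    """Combine the two parts of a delta-type model for one location.

    p_presence      -- predicted probability of presence (Bernoulli part)
    log1p_abundance -- predicted log(x + 1) abundance given presence (Gaussian part)
    """
    if not 0.0 <= p_presence <= 1.0:
        raise ValueError("p_presence must be a probability in [0, 1]")
    # Assumption: back-transform log(x + 1) to the original catch scale first.
    abundance_if_present = math.expm1(log1p_abundance)
    # Final prediction: abundance given presence, weighted by the chance of presence.
    return p_presence * abundance_if_present


def delta_predictions(p_list, log1p_list):
    """Apply delta_prediction element-wise to two equal-length sequences."""
    return [delta_prediction(p, m) for p, m in zip(p_list, log1p_list)]
```

For instance, a location with a presence probability of 0.5 and a conditional prediction of log(10 + 1) yields a combined prediction of 5 crab under these assumptions.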

Boosted Regression Trees (BRTs)

Delta-type BRTs were developed in a similar manner to the delta-type GAMs. Predicted abundance for these models was also conditional on presence-absence from the Bernoulli model. The final predicted distribution maps look like this:

Spatial Error

RMSE values were calculated spatially for each sex/maturity model using two different sets of train/test data. First, the original train/test data sets were used to calculate both spatial and overall RMSE. Then the models were retrained using training data that incorporated 2018 and test data that excluded 2018. Spatial and overall RMSE values were then recalculated and compared to the initial values. This can be replicated here.
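One way to picture the spatial versus overall RMSE calculation is the Python sketch below. It is not the repository's R code. The record layout (`station`, `observed`, `predicted`) and the choice to pool all test years at each station are assumptions made for illustration.

```python
import math
from collections import defaultdict


def overall_rmse(records):
    """RMSE across every test record, ignoring location."""
    squared = [(r["observed"] - r["predicted"]) ** 2 for r in records]
    return math.sqrt(sum(squared) / len(squared))


def spatial_rmse(records):
    """RMSE per station, pooling all test years at that station (assumption)."""
    squared_by_station = defaultdict(list)
    for r in records:
        squared_by_station[r["station"]].append((r["observed"] - r["predicted"]) ** 2)
    return {
        station: math.sqrt(sum(errors) / len(errors))
        for station, errors in squared_by_station.items()
    }


def rmse_change(initial, retrained):
    """Retrained minus initial RMSE for stations present in both results.

    Negative values mean the retrained model has lower error at that station.
    """
    return {s: retrained[s] - initial[s] for s in initial if s in retrained}
```

Running `spatial_rmse` once on the predictions from the original split and once on those from the split with 2018 moved into training, then passing both results to `rmse_change`, gives a per-station view of where the retrained models improved or worsened.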

SHAP values

Shapley values are commonly used in other fields to explain outputs from machine learning models, but have only recently been applied in SDM contexts. Here, we use the SHAP implementation in the fastshap package and combine the SHAP values from each part of the delta-type BRT using the mshap package. Values above zero indicate that a given variable has a positive effect on abundance, while values below zero indicate a negative effect. The larger the absolute SHAP value, the stronger the effect. SHAP values can also be visualized spatially, with the SHAP values of individual variables overlaid on the spatial SHAP values. The SHAP calculation process is found in this script.
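To make the sign and magnitude interpretation concrete, the sketch below computes exact Shapley values for a toy two-part model by enumerating every feature coalition against a single baseline row. It is written in Python and is only an illustration. It is neither the approximation that fastshap uses nor the method mshap uses to combine the two model parts. The toy model, its coefficients, and the feature names `temperature` and `ice` are invented for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        row = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(row)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_delta_model(row):
    """Hypothetical delta-type prediction: P(presence) times abundance if present."""
    p_presence = 1.0 if row["temperature"] < 2.0 else 0.25
    abundance_if_present = 10.0 + 5.0 * row["ice"]
    return p_presence * abundance_if_present


x = {"temperature": 1.0, "ice": 1.0}         # location being explained
baseline = {"temperature": 3.0, "ice": 0.0}  # reference conditions
phi = shapley_values(toy_delta_model, x, baseline)
```

In this toy case both values are positive, meaning each variable raises predicted abundance relative to the baseline, and together they sum to the difference between the prediction at `x` and the prediction at the baseline.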
