In [None]:
# Load file from the imported Galaxy history dataset

input_path <- "galaxy_inputs"

# Walk every directory under input_path; file_path ends up holding the last entry found
for (dir in list.dirs(input_path)) {
    for (file in list.files(dir)) {
        file_path <- file.path(dir, file)
    }
}
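
The nested loop above keeps only the last entry it visits. As a minimal sketch, assuming the input folder may contain more than one file, the files can instead be listed once and selected explicitly. The directory and file names below (`demo_dir`, `dataset_1`, `example.csv`) are hypothetical and exist only for illustration.

```r
# Hypothetical demo directory standing in for "galaxy_inputs"
demo_dir <- tempfile("galaxy_inputs_demo_")
dir.create(file.path(demo_dir, "dataset_1"), recursive = TRUE)
write.csv(data.frame(ecosystem = "Mangrove", criterion_A1 = "NT"),
          file.path(demo_dir, "dataset_1", "example.csv"),
          row.names = FALSE)

# List every file (directories are skipped) below demo_dir, with full paths
all_files <- list.files(demo_dir, recursive = TRUE, full.names = TRUE)

# Choose the file explicitly instead of relying on loop order
demo_file_path <- all_files[1]
demo_data <- read.csv(demo_file_path)
```

Listing the files first makes it easy to check how many were found before one is read.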

## Index overview

The Red List Index of ecosystems (RLIE) measures trends in ecosystem collapse risk. It uses the risk categories assigned in IUCN Red List of Ecosystems risk assessments. The index complements the Red List Index of species survival, providing comparable information about ecosystem risk. It is calculated for the overall risk category assigned to each ecosystem and separately for each criterion.

This information sheet provides the code used to calculate the index and an example of each step. 

**Reference:**  
Rowland, J. A., Bland, L. M., Keith, D. A., Bignoli, D. J., Burgman, M., Etter, A., Ferrer-Paris, J. R., Miller, R. M. and Nicholson, E. (2020) Ecosystem indices to support global biodiversity conservation. Conservation Letters. e12680  

## Set up functions

### 1) Order risk categories

The function *danger* orders the Red List of Ecosystems risk categories from lowest to highest risk. The categories are:

- NE = Not Evaluated
- DD = Data Deficient
- LC = Least Concern
- NT = Near Threatened
- VU = Vulnerable
- EN = Endangered
- CR = Critically Endangered
- CO = Collapsed

In [1]:
danger <- function(x){
 
  # Set up: values not found in the list below keep position 1 (the rank of "NE")
  position = 1
  
  # Define order of risk categories
  dangerzone <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR", "CO")
  
  # Find the position of x in the ordered risk categories
  for(i in seq_along(dangerzone)){
    if(x == dangerzone[i]) {
      position = i
    }
  }
  return(position)
}
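
As an illustration of the ordering, the following minimal sketch ranks several categories at once with `match()`. The helper name `rank_categories` is hypothetical and not part of the original code. Unlike *danger*, it returns `NA` for a value that is not in the list rather than defaulting to position 1.

```r
# Same ordering as in danger(), from lowest to highest risk
dangerzone <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR", "CO")

# rank_categories is a hypothetical helper used only for this illustration
rank_categories <- function(x) {
  # match() returns the position of each element of x in dangerzone
  match(x, dangerzone)
}

ranks <- rank_categories(c("LC", "VU", "CO"))
print(ranks)
```

Here "LC", "VU" and "CO" sit at positions 3, 5 and 8 of the ordered list.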

### 2) Define highest risk category

The function *maxcategory* uses the category ranks defined by the function *danger*. If the risk categories for each subcriterion are listed in separate columns, *maxcategory* selects the highest risk category across those columns (i.e. subcriteria) for each criterion.

In [2]:
maxcategory <- function (x) {
  
  # Set up
  value = 0
  position = 0
  highestvalue = NULL
  
  # Define highest risk category across columns
  for(i in seq_along(x)){
    if (danger(x[i]) > value){
      value = danger(x[i])
      position = i
      highestvalue = x[i]
    }
  }
  
  # Return highest risk category across columns
  category_list <- c(highestvalue, position)
  return(category_list)
}
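
The selection rule can be sketched for a plain character vector as follows. This is a simplified illustration rather than the function above: `highest_category` is a hypothetical name, and it returns only the category, not its position. The input values are the Criterion A subcriteria for mangroves in Peru from the example dataset shown later.

```r
# Same ordering as in danger(), from lowest to highest risk
dangerzone <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR", "CO")

# highest_category is a hypothetical helper used only for this illustration
highest_category <- function(x) {
  x <- as.character(unlist(x))
  # which.max() picks the first element with the highest rank
  x[which.max(match(x, dangerzone))]
}

peru_A <- highest_category(c("DD", "NT", "DD"))
print(peru_A)
```

Because "NT" ranks above "DD", the highest category for this row is "NT".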

#### Forests of the Americas example 

An example dataset of the Red List of Ecosystems assessments of forests across the Americas is available from GitHub. The assessments are from the continental-scale RLE assessments of 136 temperate and tropical forests across 51 countries/territories in the Caribbean and Americas (Ferrer-Paris et al., 2018). 

**Reference:**  
Ferrer-Paris, J. R., Zager, I., Keith, D. A., Oliveira-Miranda, M. A., Rodríguez, J. P., Josse, C., … Barrow, E. (2018). An ecosystem risk assessment of temperate and tropical forests of the Americas with an outlook on future conservation strategies. Conservation Letters, 12. https://doi.org/10.1111/conl.12623

The columns in the dataframe are:  
- ecosystem = type of ecosystem.  
- country = country containing part of the ecosystem distribution.   
- criterion_A = risk category assigned based on Criterion A - the change in ecosystem area.  
- criterion_B = risk category assigned based on Criterion B - whether the ecosystem is spatially restricted.  
- criterion_C = risk category assigned based on Criterion C - the change in abiotic conditions.  
- criterion_D = risk category assigned based on Criterion D - the change in biotic processes and interactions.  

In [4]:
# Load data
data <- read.csv(file_path) #fill in path to file

# View data
head(data)

# Set up - these will form the new columns with the highest risk category for each criterion and overall
n <- nrow(data)
overall <- character(n)
criterion_A <- character(n)
criterion_B <- character(n)
criterion_C <- character(n)
criterion_D <- character(n)

# Calculate overall risk category for criterion A
for(i in 1:n){
  A <- data[i, 3:5] # alter the numbers for the relevant columns in a dataset
  results <- maxcategory(A)
  criterion_A[i] <- results[1]
}

# Calculate overall risk category for criterion B
for(i in 1:n){
  B <- data[i, 6:8]  # alter the numbers for the relevant columns in a dataset
  results <- maxcategory(B)
  criterion_B[i] <- results[1]
}

# Calculate overall risk category for criterion C
for(i in 1:n){
  C <- data[i, 9:10]  # alter the numbers for the relevant columns in a dataset
  results <- maxcategory(C)
  criterion_C[i] <- results[1]
}

# Calculate overall risk category for criterion D
for(i in 1:n){
  D <- data[i, 11:13]  # alter the numbers for the relevant columns in a dataset
  results <- maxcategory(D)
  criterion_D[i] <- results[1]
}

# Add overall risk categories for each criterion to the dataframe
data$criterion_A <- unlist(criterion_A)
data$criterion_B <- unlist(criterion_B)
data$criterion_C <- unlist(criterion_C)
data$criterion_D <- unlist(criterion_D)

# Calculate overall risk category across columns
for(i in 1:n){
  overall_risk <- data[i, 14:17]  # alter the numbers for the relevant columns in a dataset
  results <- maxcategory(overall_risk)
  overall[i] <- results[1]
}

# Add overall risk category to the dataframe
data$overall <- unlist(overall)

# View output
head(data)

| | ecosystem | country | criterion_A1 | criterion_A2b | criterion_A3 | criterion_B1 | criterion_B2 | criterion_B3 | criterion_C2a | criterion_C2b | criterion_D1 | criterion_D2b | criterion_D3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Mangrove | Peru | DD | NT | DD | LC | NT | LC | DD | NT | DD | VU | DD |
| 2 | Mangrove | Ecuador | VU | EN | NT | LC | LC | LC | DD | LC | VU | NT | LC |
| 3 | Mangrove | Colombia | VU | LC | VU | LC | LC | LC | DD | VU | LC | LC | LC |
| 4 | Mangrove | Panama | VU | VU | EN | LC | LC | LC | DD | DD | VU | NT | LC |
| 5 | Mangrove | Nicaragua | DD | NT | DD | LC | LC | LC | DD | LC | LC | NT | LC |
| 6 | Mangrove | Mexico | EN | NT | EN | LC | LC | LC | DD | LC | VU | VU | LC |


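| | ecosystem | country | criterion_A1 | criterion_A2b | criterion_A3 | criterion_B1 | criterion_B2 | criterion_B3 | criterion_C2a | criterion_C2b | criterion_D1 | criterion_D2b | criterion_D3 | criterion_A | criterion_B | criterion_C | criterion_D | overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Mangrove | Peru | DD | NT | DD | LC | NT | LC | DD | NT | DD | VU | DD | NT | NT | NT | VU | VU |
| 2 | Mangrove | Ecuador | VU | EN | NT | LC | LC | LC | DD | LC | VU | NT | LC | EN | LC | LC | VU | EN |
| 3 | Mangrove | Colombia | VU | LC | VU | LC | LC | LC | DD | VU | LC | LC | LC | VU | LC | VU | LC | VU |
| 4 | Mangrove | Panama | VU | VU | EN | LC | LC | LC | DD | DD | VU | NT | LC | EN | LC | DD | VU | EN |
| 5 | Mangrove | Nicaragua | DD | NT | DD | LC | LC | LC | DD | LC | LC | NT | LC | NT | LC | LC | NT | NT |
| 6 | Mangrove | Mexico | EN | NT | EN | LC | LC | LC | DD | LC | VU | VU | LC | EN | LC | LC | VU | EN |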
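
The loops above depend on hard-coded column numbers. As a hedged alternative sketch, the subcriterion columns can be selected by name pattern and reduced row by row with `apply()`. The helper `highest_category` and the small `demo` dataframe are hypothetical. The values are copied from the Criterion A columns of the first two rows of the example output.

```r
# Same ordering as in danger(), from lowest to highest risk
dangerzone <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR", "CO")

# Hypothetical helper: return the highest-ranked category in x
highest_category <- function(x) {
  x <- as.character(unlist(x))
  x[which.max(match(x, dangerzone))]
}

# Two rows copied from the example data (Criterion A subcriteria only)
demo <- data.frame(
  ecosystem     = c("Mangrove", "Mangrove"),
  country       = c("Peru", "Ecuador"),
  criterion_A1  = c("DD", "VU"),
  criterion_A2b = c("NT", "EN"),
  criterion_A3  = c("DD", "NT"),
  stringsAsFactors = FALSE
)

# Select the subcriterion columns by name instead of by position
a_cols <- grep("^criterion_A", names(demo), value = TRUE)

# Apply the helper to each row of the selected columns
demo$criterion_A <- apply(demo[, a_cols], 1, highest_category)
print(demo$criterion_A)
```

The result for Peru and Ecuador is "NT" and "EN", matching the criterion_A column in the output above.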


### 3) Assign ordinal values to risk categories 

The function *calcWeights* allocates each risk category an ordinal rank from 0 (Least Concern) to 5 (Collapsed). This step is included for informational purposes only and can be skipped, because the *calcRLIE* function described below includes it.

The ordinal ranks are:
- Not Evaluated = Excluded
- Data Deficient = Excluded
- Least Concern = 0  
- Near Threatened = 1  
- Vulnerable = 2  
- Endangered = 3  
- Critically Endangered = 4  
- Collapsed = 5  

Parameters are:  
- eco_data = dataframe  
- RLE_criteria = name of the column with the Red List of Ecosystems criterion of interest  

In [5]:
calcWeights <- function(eco_data, RLE_criteria) {
  
  # Remove rows where the category is the string "NA"
  # (dplyr::filter() also drops rows where the comparison is a true NA)
  eco_data <- dplyr::filter(eco_data, .data[[RLE_criteria]] != "NA")
  
  # Assign an ordinal rank to each ecosystem based on its risk category;
  # categories not listed below (e.g. NE, DD) receive NA
  weight_data <- dplyr::mutate(eco_data, 
                               category_weights = dplyr::case_when(.data[[RLE_criteria]] == "CO" ~ 5,
                                                                   .data[[RLE_criteria]] == "CR" ~ 4, 
                                                                   .data[[RLE_criteria]] == "EN" ~ 3, 
                                                                   .data[[RLE_criteria]] == "VU" ~ 2, 
                                                                   .data[[RLE_criteria]] == "NT" ~ 1,
                                                                   .data[[RLE_criteria]] == "LC" ~ 0))
  
  # Return the dataframe with the added category_weights column
  return(weight_data)
}
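
The mapping from categories to ordinal ranks can also be sketched in base R with a named lookup vector. This is an illustration of the same mapping, not the function above. The names `category_ranks` and `categories` are hypothetical.

```r
# Ordinal ranks as listed above; NE and DD are deliberately left out
category_ranks <- c(LC = 0, NT = 1, VU = 2, EN = 3, CR = 4, CO = 5)

# Example categories, including one (DD) that has no rank
categories <- c("NT", "EN", "VU", "DD")

# Index the lookup vector by name; categories without a rank give NA
weights <- unname(category_ranks[categories])
print(weights)
```

The first three values are 1, 3 and 2, and "DD" maps to `NA` because it is excluded from the index.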

#### Example

Calculate the weights for the risk categories for one criterion, such as Criterion A:

In [6]:
# Load packages
library(dplyr)
library(tidyr)

# Use function to calculate weights for criterion of interest
output_weights <- calcWeights(data, RLE_criteria = "criterion_A")

# View output
head(output_weights)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




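| | ecosystem | country | criterion_A1 | criterion_A2b | criterion_A3 | criterion_B1 | criterion_B2 | criterion_B3 | criterion_C2a | criterion_C2b | criterion_D1 | criterion_D2b | criterion_D3 | criterion_A | criterion_B | criterion_C | criterion_D | overall | category_weights |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 1 | Mangrove | Peru | DD | NT | DD | LC | NT | LC | DD | NT | DD | VU | DD | NT | NT | NT | VU | VU | 1 |
| 2 | Mangrove | Ecuador | VU | EN | NT | LC | LC | LC | DD | LC | VU | NT | LC | EN | LC | LC | VU | EN | 3 |
| 3 | Mangrove | Colombia | VU | LC | VU | LC | LC | LC | DD | VU | LC | LC | LC | VU | LC | VU | LC | VU | 2 |
| 4 | Mangrove | Panama | VU | VU | EN | LC | LC | LC | DD | DD | VU | NT | LC | EN | LC | DD | VU | EN | 3 |
| 5 | Mangrove | Nicaragua | DD | NT | DD | LC | LC | LC | DD | LC | LC | NT | LC | NT | LC | LC | NT | NT | 1 |
| 6 | Mangrove | Mexico | EN | NT | EN | LC | LC | LC | DD | LC | VU | VU | LC | EN | LC | LC | VU | EN | 3 |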


### 4) Calculate the index

The function *calcRLIE* selects the column in a dataframe listing the risk categories and allocates the ordinal ranks specified by the function *calcWeights* (see above). These ordinal ranks are used to calculate the Red List Index of Ecosystems (RLIE) and the 2.5th and 97.5th percentiles, which capture the middle 95% of the data. The RLIE ranges from zero (all ecosystems Collapsed) to one (all ecosystems Least Concern).  
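
Written as a formula, the calculation performed in the code is as follows. The symbols are notation introduced here for illustration: $W_i$ is the ordinal rank of ecosystem $i$, $n$ is the number of ecosystems with a risk category other than NE or DD, and $W_{\mathrm{CO}} = 5$ is the rank of Collapsed.

$$
\mathrm{RLIE} = 1 - \frac{\sum_{i=1}^{n} W_i}{W_{\mathrm{CO}} \cdot n}
$$

If every ecosystem is Least Concern the sum is zero and the index equals one. If every ecosystem is Collapsed the sum equals $W_{\mathrm{CO}} \cdot n$ and the index equals zero.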

Parameters are:
- eco_data = dataframe  
- RLE_criteria = column name of criterion of interest  
- group1 = the factor (optional) you want to group the index by. Where not specified, an RLIE will be calculated based on all ecosystems (output = single score)  
- group2 = the second factor (optional) you want to group the index by  

Parameters 'group1' and 'group2' are optional.

In [7]:
calcRLIE <- function(eco_data, RLE_criteria, group1, group2){
  
  # Filter out rows with NE and DD from selected column
  filter_data <- dplyr::filter(eco_data, .data[[RLE_criteria]] != "NE" & .data[[RLE_criteria]] != "DD")
  
  # Calculate ordinal ranks for each ecosystem based on risk category
  weight_data <- calcWeights(filter_data, RLE_criteria)
  weight_data <- drop_na(weight_data, .data[[RLE_criteria]])
  
  # Calculate index score for the (i) whole dataset, (ii) for one defined grouping, and (iii) for two nested groupings:
  
  ## (i) Calculate overall index score using all rows
  if (missing(group1)) {
    values <- dplyr::group_by(weight_data)
    
  ## (ii) Calculate index scores for individual groups
  } else {
    if (missing(group2)) {
      values <- dplyr::group_by(weight_data, 
                                group1 = .data[[group1]])
      
  ## (iii) Calculate scores for each level within two nested groupings
    }  else {
      values <- dplyr::group_by(weight_data, 
                                group1 = .data[[group1]],
                                group2 = .data[[group2]])
    }
  }
  
  summed_weights <- summarise(values, 
                              
                              # Sum ordinal ranks
                              total_weight = sum(category_weights), total_count = n(), 
                             
                              # Define the upper and lower quantiles 
                              upper = 1 - quantile(category_weights, probs = 0.025) / 5, 
                              lower = 1 - quantile(category_weights, probs = 0.975) / 5)
  
  # Calculate index scores
  index_scores <- mutate(summed_weights, 
                         RLIE = 1 - (total_weight/(total_count * 5)),
                         criteria = RLE_criteria)
  
  # Return dataframe with index scores
  return(index_scores)
}
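
The `drop_na()` call in the function passes `.data[[RLE_criteria]]` as a column selection, which tidyselect 1.2.0 and later report as deprecated. The following is a minimal sketch of the `all_of()` form suggested by that deprecation message, assuming the dplyr and tidyr packages are installed. The `demo` dataframe is hypothetical.

```r
# Hypothetical data with one missing risk category
demo <- data.frame(
  ecosystem   = c("a", "b", "c"),
  criterion_A = c("LC", NA, "EN"),
  stringsAsFactors = FALSE
)

# Column name held in a character variable, as in calcRLIE
RLE_criteria <- "criterion_A"

# all_of() selects the column named by the character variable
cleaned <- tidyr::drop_na(demo, dplyr::all_of(RLE_criteria))
print(cleaned)
```

The row with the missing category is removed, leaving two rows.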

#### Example

Calculate the index using no groupings. The output includes:  

- total_weight = sum of all weights included in the index.  
- total_count = the total number of ecosystems included in the index.   
- lower and upper = the bounds of the interval capturing the middle 95% of the data, calculated using the 2.5th and 97.5th percentiles of the ordinal ranks.  
- RLIE = the Red List Index of Ecosystems.  
- criteria = the criterion used to calculate the index.  

In [8]:
# Calculate the index values
output <- calcRLIE(data,
                   RLE_criteria = "criterion_A")

# View output
head(output)

“[1m[22mUse of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
[36mℹ[39m Please use `all_of(var)` (or `any_of(var)`) instead of `.data[[var]]`”


| total_weight | total_count | upper | lower | RLIE | criteria |
|---|---|---|---|---|---|
| `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| 955 | 507 | 1 | 0.2 | 0.6232742 | criterion_A |
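
The RLIE value in this output can be checked by hand from total_weight and total_count. A short sketch of that arithmetic, using the same expression as in *calcRLIE* (the variable names here are for illustration only):

```r
# Values taken from the output above
total_weight <- 955
total_count <- 507

# 5 is the ordinal rank of the highest category (Collapsed)
max_rank <- 5

# Same expression as used in calcRLIE
rlie <- 1 - total_weight / (total_count * max_rank)
print(round(rlie, 7))
```

This gives 1 - 955 / 2535, which rounds to 0.6232742 as reported.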


Calculate the index using one grouping, such as the ecosystem type:

In [9]:
# Calculate the index values
output_one_grouping <- calcRLIE(data,
                                RLE_criteria = "criterion_A",
                                group1 = "ecosystem")

# View output
head(output_one_grouping)

| group1 | total_weight | total_count | upper | lower | RLIE | criteria |
|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| Cool Temperate Forest | 62 | 30 | 1 | 0.2 | 0.5866667 | criterion_A |
| Mangrove | 94 | 47 | 1 | 0.2 | 0.6 | criterion_A |
| Temperate Flooded & Swamp Forest | 55 | 25 | 1 | 0.2 | 0.56 | criterion_A |
| Tropical Dry Forest & Woodland | 198 | 90 | 1 | 0.2 | 0.56 | criterion_A |
| Tropical Flooded & Swamp Forest | 221 | 133 | 1 | 0.2 | 0.6676692 | criterion_A |
| Tropical Lowland Humid Forest | 174 | 106 | 1 | 0.2 | 0.6716981 | criterion_A |


Calculate the index using two groupings where ecosystems are grouped by ecosystem type and country:

In [10]:
# Calculate the index values
output_two_groupings <- calcRLIE(data,
                                 RLE_criteria = "criterion_A",
                                 group1 = "ecosystem",
                                 group2 = "country")

# View output
head(output_two_groupings)

[1m[22m`summarise()` has grouped output by 'group1'. You can override using the
`.groups` argument.


| group1 | group2 | total_weight | total_count | upper | lower | RLIE | criteria |
|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| Cool Temperate Forest | Argentina | 1 | 2 | 0.995 | 0.805 | 0.9 | criterion_A |
| Cool Temperate Forest | Canada | 25 | 10 | 0.91 | 0.2 | 0.5 | criterion_A |
| Cool Temperate Forest | Chile | 0 | 2 | 1.0 | 1.0 | 1.0 | criterion_A |
| Cool Temperate Forest | Mexico | 2 | 1 | 0.6 | 0.6 | 0.6 | criterion_A |
| Cool Temperate Forest | United States | 34 | 15 | 1.0 | 0.2 | 0.5466667 | criterion_A |
| Mangrove | Antigua and Barbuda | 2 | 1 | 0.6 | 0.6 | 0.6 | criterion_A |


## Author information

jessica.rowland674@gmail.com  
http://jessrowlandresearch.wordpress.com

In [None]:
write.table(output_two_groupings, file = "outputs/collection/output_two_groupings.tsv", sep = "\t", dec = ".", quote = FALSE, row.names = FALSE )
write.table(output_one_grouping, file = "outputs/collection/output_one_grouping.tsv", sep = "\t", dec = ".", quote = FALSE, row.names = FALSE )
write.table(output, file = "outputs/collection/output_RLIE.tsv", sep = "\t", dec = ".", quote = FALSE, row.names = FALSE )
write.table(output_weights, file = "outputs/collection/output_weights.tsv", sep = "\t", dec = ".", quote = FALSE, row.names = FALSE )
write.table(data, file = "outputs/collection/output_data.tsv", sep = "\t", dec = ".", quote = FALSE, row.names = FALSE )