---
title: "Generating custom checklists with rWCVP"
author: "Matilda Brown"
date: "24/05/2022"
---
<!-- GENERATED BY vignettes/articles/precompile.R -->
As well as allowing users to create checklists from the [World Checklist of Vascular Plants (WCVP)](https://wcvp.science.kew.org), rWCVP provides functionality for modifying a checklist output to generate a customised report. I'll demonstrate this here by **generating a list of species that are endemic (or near-endemic) to Sierra Leone**.
In addition to `rWCVP`, we'll use the `tidyverse` packages for data manipulation and plotting, the `gt` package for rendering nice tables, and the `spdep` package to find bordering regions.
```r
library(rWCVP)
library(tidyverse)
library(gt)
library(spdep)
```
In this example we use the pipe operator (`%>%`), `dplyr` and `ggplot2`. If these are unfamiliar, we'd suggest checking out https://tidyverse.org/ and some of the help pages therein; otherwise this code might be difficult to interpret.
Now, let's get started!
## A checklist of endemic species
We start by generating a checklist of the species that occur in Sierra Leone. Remembering and/or finding the appropriate WGSRPD Level 3 codes is a pain, so we can use `get_wgsrpd3_codes("Sierra Leone")` to look the code up for us and then pass the result to `wcvp_checklist()`.
```r
sl_code <- get_wgsrpd3_codes("Sierra Leone")
#> i Matches to input geography found at Area (Level 3) and Country (Gallagher)
sl_species <- wcvp_checklist(area=sl_code, synonyms = FALSE)
#> i No taxon specified. Generating checklist for all species.
#> i Generating a checklist of accepted species names only. Use `synonyms = TRUE` to include all names
```
How many species occur in Sierra Leone, and how many are endemic? We can use the `endemic` column here, so it's really simple.
```r
#enclosed in parentheses so that the output is printed as well as assigned
(endemic_summary <- sl_species %>%
distinct(taxon_name, endemic) %>%
group_by(endemic) %>%
summarise(number.of.sp = n()))
#> # A tibble: 2 x 2
#> endemic number.of.sp
#> <lgl> <int>
#> 1 FALSE 3303
#> 2 TRUE 45
```
Easy! For a list of endemic species, we can simply filter our checklist using the `endemic` column, but what about **near-endemics**?
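As a minimal sketch of that filter, here is the same idea on a small made-up table standing in for `sl_species`. The species names and area codes are invented placeholders; only the column names (`taxon_name`, `area_code_l3`, `endemic`) are borrowed from the checklist.

```r
library(dplyr)

# toy stand-in for sl_species: one row per species-area occurrence
# (names and area codes are invented for illustration)
toy_checklist <- data.frame(
  taxon_name   = c("Species a", "Species b", "Species b", "Species c"),
  area_code_l3 = c("AAA", "AAA", "BBB", "AAA"),
  endemic      = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# keep only the rows flagged as endemic
toy_endemics <- toy_checklist %>%
  filter(endemic)

toy_endemics$taxon_name
```

Applied to the real checklist, the equivalent call would presumably be `sl_species %>% filter(endemic)`.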
## A checklist of near-endemic species
Depending on how we define near-endemics, there are two ways we can approach the filtering step.
1. We define near-endemics as those species that occur in Sierra Leone and one other WGSRPD Level 3 Area. From a data perspective, this means filtering out species that have more than 2 rows in `sl_species` (because each row is a species-area occurrence).
2. Alternatively, we can consider near-endemics as those species that might occur across a border, so are functionally endemic. To do this, we need to a) identify the neighbouring WGSRPD Level 3 Areas and b) filter our species list accordingly.
### 1. Species in Sierra Leone plus 1 other area
We can directly remove any species in our Sierra Leone checklist that occur in more than 2 areas.
```r
sl_near_endemics1 <- sl_species %>%
group_by(plant_name_id, taxon_name) %>%
filter(n() < 3) %>%
ungroup()
```
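To see what the `n() < 3` filter keeps, here is a small sketch on invented data (the IDs, names and area codes are placeholders, not WCVP records). Each row is one species-area occurrence, so the group size is the number of areas a species occurs in.

```r
library(dplyr)

# toy occurrences: species 1 is in one area, species 2 in two, species 3 in three
toy_checklist <- data.frame(
  plant_name_id = c(1, 2, 2, 3, 3, 3),
  taxon_name    = c("Species a", "Species b", "Species b",
                    "Species c", "Species c", "Species c"),
  area_code_l3  = c("AAA", "AAA", "BBB", "AAA", "BBB", "CCC"),
  stringsAsFactors = FALSE
)

# keep species with fewer than 3 occurrence rows, i.e. at most 2 areas
toy_near_endemics <- toy_checklist %>%
  group_by(plant_name_id, taxon_name) %>%
  filter(n() < 3) %>%
  ungroup()

unique(toy_near_endemics$taxon_name)
```

Note that the strict endemic (one row) passes the filter as well, so the result holds endemics and near-endemics together.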
### 2. Species in Sierra Leone and neighbouring areas
First, we need to identify which WGSRPD Areas share a border with Sierra Leone.
We could do this by looking at a map, but we'll do this programmatically using the shape files.
To do so, we take the WGSRPD level 3 polygons and find all the areas that border each other.
```r
sf_use_s2(FALSE)
area_polygons <- rWCVPdata::wgsrpd3
area_neighbours <- poly2nb(area_polygons)
```
**Note, we had to turn off spherical coordinates in `sf` for this using `sf_use_s2(FALSE)`.**
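If the structure returned by `poly2nb()` is unfamiliar, the sketch below runs it on three made-up unit squares instead of the WGSRPD polygons. The `unit_square()` helper and the polygon names are hypothetical and exist only for this illustration.

```r
library(sf)
library(spdep)

# hypothetical helper: a unit square with its lower-left corner at (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# three squares in a row: "left" touches "middle", "middle" touches "right"
toy_polygons <- st_sf(
  name = c("left", "middle", "right"),
  geometry = st_sfc(unit_square(0), unit_square(1), unit_square(2))
)

# one list element per polygon, holding the row indices of its neighbours
toy_neighbours <- poly2nb(toy_polygons)

# names of the polygons that border the middle square
toy_polygons$name[toy_neighbours[[2]]]
```

The result is indexed by row position rather than by name or code, which is why the next step looks up the row index of Sierra Leone first.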
Now that we have a list of neighbouring areas, we need to find the ones that border Sierra Leone.
```r
#which polygon/s is/are Sierra Leone?
sl_index <- which(area_polygons$LEVEL3_COD %in% sl_code)
#find neighbouring polygons
sl_neighbours_index <- area_neighbours[[sl_index]]
#get the polygons for Sierra Leone plus its neighbours
sl_plus_neighbours <- area_polygons[c(sl_index, sl_neighbours_index),]
```
We can map the region to sanity-check this automated neighbour detection before we generate our final checklist.
```r
#get a sensible bounding box for our plot
bounding_box <- st_bbox(sl_plus_neighbours)
xmin <- bounding_box["xmin"] - 2
xmax <- bounding_box["xmax"] + 2
ymin <- bounding_box["ymin"] - 2
ymax <- bounding_box["ymax"] + 2
#plot the map
ggplot(area_polygons) +
#world polygons first, for context
geom_sf(fill="white", colour="grey") +
#add polygons of interest
geom_sf(data=sl_plus_neighbours, fill="#a4dba2", colour="gray20")+
#bounding box we set up above
coord_sf(xlim=c(xmin, xmax), ylim=c(ymin, ymax))+
#add country name labels
geom_sf_label(data=sl_plus_neighbours, aes(label=LEVEL3_NAM))+
#add ocean background
theme(panel.background = element_rect(fill="#b6badb")) +
#remove x and y from axes
xlab(NULL) +
ylab(NULL)
```
![](figure/neighbours-plot-1.png)
Of course, we could have just identified Guinea and Liberia as neighbouring countries from a map, then found the codes using `get_wgsrpd3_codes("Liberia")` and `get_wgsrpd3_codes("Guinea")`, but that's not nearly as much fun!
Next, we can identify near-endemics as those species that *only* have occurrences in Sierra Leone, Guinea or Liberia.
```r
sl_near_endemics2 <- sl_species %>%
# filtering for each species
group_by(plant_name_id) %>%
# only keep those where all occurrences are in the three countries
filter(all(area_code_l3 %in% sl_plus_neighbours$LEVEL3_COD)) %>%
ungroup()
```
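The grouped `all(... %in% ...)` filter can be sketched on invented data as follows. The IDs and area codes are placeholders, and `allowed_areas` stands in for `sl_plus_neighbours$LEVEL3_COD`.

```r
library(dplyr)

# placeholder codes standing in for Sierra Leone and its two neighbours
allowed_areas <- c("AAA", "BBB", "CCC")

# toy occurrences: species 2 also occurs in an area outside the allowed set
toy_checklist <- data.frame(
  plant_name_id = c(1, 1, 2, 2, 3),
  area_code_l3  = c("AAA", "BBB", "AAA", "ZZZ", "AAA"),
  stringsAsFactors = FALSE
)

# keep a species only if every one of its occurrences is in an allowed area
toy_restricted <- toy_checklist %>%
  group_by(plant_name_id) %>%
  filter(all(area_code_l3 %in% allowed_areas)) %>%
  ungroup()

unique(toy_restricted$plant_name_id)
```

Because `all()` is evaluated once per group, a single occurrence outside the allowed set removes every row for that species, not just the offending row.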
And finally we filter our list to only species that occur in Sierra Leone + one neighbour, like we did in Option 1. Looking at the map it seems plausible that a species could occur right at the triple junction between the three countries, but for this example we will exclude those species.
```r
sl_near_endemics2 <- sl_near_endemics2 %>%
group_by(plant_name_id, taxon_name) %>%
filter(n() < 3) %>%
ungroup()
```
## Creating our formatted report
Now we can do something a bit fancy - turn our checklist data frame into a formatted report. To do this, we plug it into a template file called "custom_checklist.Rmd" that is stored in the `rWCVP` package folder (specifically, the "rmd" subfolder). We pass the data (as well as some other information) using `params`, and need to specify a file name using `output_file`.
```r
library(rmarkdown)
#informative text to include in our html
checklist_description <- "Checklist of species that are endemic to Sierra Leone (or near-endemic, based on neighbouring countries)"
#for file saving
wd <- getwd()
#do the rendering
render(system.file("rmd", "custom_checklist.Rmd", package = "rWCVP"),
quiet = TRUE,
params=list( version = "New Phytologist Special Issue",
mydata = sl_near_endemics2,
description = checklist_description),
output_file = paste0(wd,"/Sierra_Leone_endemics_and_near_endemics.html"))
```
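For readers who haven't used parameterised reports before, here is a minimal sketch of the mechanism with a tiny stand-in template. The template content, the file names and the single `description` parameter are made up for illustration; this is not the template shipped with `rWCVP`. Rendering needs pandoc, so the call is skipped when it is not available.

```r
library(rmarkdown)

# build the chunk fence programmatically so it can sit inside this code block
fence <- strrep("`", 3)

# a tiny stand-in template that declares one parameter with a default value
template_lines <- c(
  "---",
  'title: "Toy report"',
  "output: html_document",
  "params:",
  '  description: "placeholder description"',
  "---",
  "",
  paste0(fence, "{r, echo=FALSE, results='asis'}"),
  "cat(params$description)",
  fence
)
template_file <- tempfile(fileext = ".Rmd")
writeLines(template_lines, template_file)

out_file <- file.path(tempdir(), "toy_report.html")

# values passed via `params` override the defaults declared in the YAML header
if (pandoc_available()) {
  render(template_file,
         params = list(description = "Toy checklist description"),
         output_file = out_file,
         quiet = TRUE)
}
```

Inside the template, each value is available as `params$<name>`, which is presumably how the real template picks up `version`, `mydata` and `description`.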
And here is our file (screenshot here, because the report has left the building)!
____
![example report](custom-checklist-example-report.png)