---
title: "Getting background data from OBIS"
---
In this notebook, we will explore three approaches to creating background samples, also known as pseudo-absences. These are points where turtles were not recorded, and they are needed by our choice of species distribution model algorithm. An "absence" does not mean that turtles could not occur at a location, only that we have no records there, either because nobody looked or because observers looked and did not see them.
These three approaches are as follows:
- Get occurrences for marine species from OBIS using the `robis` package, and use their locations as the locations for absences.
- Select random locations using gridded environmental data as a base for our sampling.
- Select random locations using a polygon drawn around our presence locations.
In all three cases, points will be constrained to fit within our area of interest.
## Loading libraries
```{r libraries, message = FALSE}
#Deal with spatial data
library(sf)
#Base maps and plotting spatial data
library(rnaturalearth)
library(mapview)
library(raster)
#Data visualisation and manipulation
library(tidyverse)
#Find files easily
library(here)
#Access to OBIS
library(robis)
#SDM
library(sdmpredictors)
library(dismo)
library(ggspatial)
```
## Set-up
### Setting base directories
These directories contain the biological data (i.e., presence locations of loggerhead sea turtles) and environmental data.
```{r base_dir}
#Setting directories containing input data
dir_data <- file.path(here::here(), "data/raw-bio")
dir_env <- file.path(here::here(), "data/env")
```
### Loading bounding box for region of interest
In the [Region page](tutorial/Region.html), we created a bounding box for our region of interest. We will load this bounding box here to spatially constrain our data.
```{r}
#Loading bounding box for the area of interest
fil <- here::here("data", "region", "BoundingBox.shp")
extent_polygon <- read_sf(fil)
#Extract polygon geometry
pol_geometry <- st_as_text(extent_polygon$geometry)
```
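To make the `pol_geometry` string less opaque, here is a minimal sketch of what `st_as_text()` returns for a rectangular polygon. The coordinates are made up for illustration and are not the bounding box stored in `BoundingBox.shp`.

```r
library(sf)

#Hypothetical bounding box (illustrative coordinates only)
toy_bbox <- st_bbox(c(xmin = 41, ymin = -1, xmax = 80, ymax = 32),
                    crs = st_crs(4326))
#Convert the bounding box to a polygon geometry
toy_polygon <- st_as_sfc(toy_bbox)
#Convert the polygon to well-known text (WKT)
toy_wkt <- st_as_text(toy_polygon)
toy_wkt
```

The result is a single character string beginning with `POLYGON`, which is the kind of value passed to the `geometry` argument of `occurrence()` in the OBIS query.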
## Approach 1: Use other species
In this approach, we use the presence locations of other marine species as the locations of our background samples.
### Getting occurrence data from OBIS
We will use `robis` to get observations for marine species from OBIS within our bounding box. OBIS data includes about 100 different columns, but not all of these columns are relevant to us. We will define the columns that we need and then we will perform a search of the OBIS database.
### Defining relevant columns
```{r rel_cols}
cols.to.use <- c("scientificName", "dateIdentified", "eventDate", "decimalLatitude", "decimalLongitude", "coordinateUncertaintyInMeters", "bathymetry", "shoredistance", "sst", "sss")
```
### Querying OBIS
According to the `robis` documentation, setting the `wrims` parameter to `TRUE` restricts the query to observations of species registered in the [World Register of Introduced Marine Species (WRiMS)](https://www.marinespecies.org/introduced/index.php).
```{r obis_query}
#Applying bounding box and restricting results to WRiMS species
background <- occurrence(geometry = pol_geometry, wrims = TRUE,
                         #DNA data is not needed, subsetting columns of interest
                         dna = FALSE, fields = cols.to.use,
                         #Excluding records labelled as being on land
                         exclude = "ON_LAND")
```
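Records returned by a query like the one above may contain missing coordinates or repeated locations. The sketch below shows one possible clean-up step on a small made-up table. The `toy_records` object and its values are assumptions for illustration, and whether repeated locations should be dropped depends on your modelling choices.

```r
library(dplyr)

#Made-up records standing in for the OBIS query result
toy_records <- tibble(
  scientificName = c("Species A", "Species B", "Species B", "Species C"),
  decimalLongitude = c(60.5, 61.2, 61.2, NA),
  decimalLatitude = c(15.1, 16.8, 16.8, 12.0)
)

toy_clean <- toy_records %>%
  #Dropping records without coordinates
  filter(!is.na(decimalLongitude), !is.na(decimalLatitude)) %>%
  #Keeping one record per unique location
  distinct(decimalLongitude, decimalLatitude, .keep_all = TRUE)
```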
### Saving background data
```{r save_bg, eval=FALSE}
#Setting full file path to save background information
file_path_out <- file.path(dir_data, "io-background.csv")
#Saving background data as csv
write_csv(background, file_path_out)
```
### (Optional) Load background data
If you have previously downloaded the background data, you can simply load the data to the environment instead of downloading it again. To do this, you can use the code below.
```{r load_bg, eval=FALSE}
#Find background file in our biological data folder
file_path_bg <- list.files(dir_data, pattern = "background", full.names = TRUE)
#Load file
background <- read_csv(file_path_bg)
```
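The save and load steps can also be combined so that the download only happens when no local copy exists. This is a sketch of that pattern, not part of the original workflow: `load_or_download()` and `fake_download()` are hypothetical helpers, and a temporary file stands in for `io-background.csv`.

```r
library(readr)

#Hypothetical helper: read a cached file if present, otherwise download and save
load_or_download <- function(path, download_fun) {
  if (file.exists(path)) {
    read_csv(path, show_col_types = FALSE)
  } else {
    dat <- download_fun()
    write_csv(dat, path)
    dat
  }
}

#Stand-in for the OBIS query so the sketch runs offline
fake_download <- function() {
  data.frame(decimalLongitude = c(60.5, 61.2),
             decimalLatitude = c(15.1, 16.8))
}

#Temporary file used instead of the real output path
toy_path <- tempfile(fileext = ".csv")
#First call "downloads" and saves, second call reads the saved file
first_call <- load_or_download(toy_path, fake_download)
second_call <- load_or_download(toy_path, fake_download)
```

In the notebook, `download_fun` could be a function wrapping the `occurrence()` call, and `path` could be the `file_path_out` defined when saving.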
### Plotting background data
We will create a map with all the observations we obtained from OBIS within our region of interest.
First, we will load the region map that was saved in the [Region page](tutorial/00_Region.Rmd).
```{r}
fil <- here::here("data", "region", "region_map_label.rda")
load(fil)
```
Next, we add the background points to the base map.
```{r}
# load our base map and add points
region_map_label +
  geom_point(data = background,
             #Point to coordinates for background points
             aes(x = decimalLongitude, y = decimalLatitude),
             #Changing color and size of points for background data
             color = "red", size = 0.1)
```
As you can see from the map above, this approach is not truly random. This is why we are including a second method to create background samples.
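One rough way to quantify this clustering is to count records per grid cell. The sketch below bins made-up coordinates into 1-degree cells. The `toy_obs` table is an assumption for illustration; with real data you would use the `background` object returned by the OBIS query.

```r
library(dplyr)

#Made-up observation coordinates (illustrative only)
toy_obs <- tibble(
  decimalLongitude = c(60.1, 60.4, 60.9, 60.2, 72.5, 55.3),
  decimalLatitude = c(15.2, 15.7, 15.1, 15.9, 8.4, 20.6)
)

cell_counts <- toy_obs %>%
  #Assigning each record to a 1-degree grid cell
  mutate(lon_cell = floor(decimalLongitude),
         lat_cell = floor(decimalLatitude)) %>%
  #Counting records per cell, most sampled cells first
  count(lon_cell, lat_cell, sort = TRUE)
```

A few cells holding most of the records, as in this toy table, is the pattern that suggests sampling effort was uneven.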
## Approach 2: Random points in our region
This section has been adapted from [this online tutorial](https://bbest.github.io/eds232-ml/lab1a_sdm-explore.html).
### Loading raster layer for area of interest
We will be using `sdmpredictors` for our environmental data layers (variables). We need to load a raster layer from `sdmpredictors` so we can sample locations from it. It does not matter which variable we use, as long as it has values for the ocean cells in our region. The `randomPoints()` function from `dismo`, which we use below, excludes cells with `NA` values, so using a marine layer from `sdmpredictors` ensures that we will not sample from the land. See the [sdmpredictors section](tutorial/03_sdmpredictors-variables.html) for a discussion of accessing rasters with `sdmpredictors` and how to find out which layers are available.
We used a bathymetry layer: "MS_bathy_5m". We will load this data and crop it using our bounding box.
```{r load_env}
#Set default directory for environmental data
options(sdmpredictors_datadir = dir_env)
#Loading bathymetry
env_stack <- load_layers("MS_bathy_5m") %>%
  #Cropping to our area of interest
  crop(extent_polygon)
```
We can plot the cropped bathymetry to ensure it matches our study region and we did not make any mistakes.
```{r plot_bathy}
plot(env_stack)
```
We can see the outline of our area of interest. Now we can continue with creating our background points.
### Sample points from bathymetry layer
Using the `dismo` package, we will create random points over the bathymetry layer that we will use as background points. In this example, we have chosen to produce 1000 background points.
It is worth noting that the distribution and number of background points have a strong influence on SDM results. See the `README` file for resources discussing this issue.
```{r bg_samp}
#Setting seed for reproducibility
set.seed(42)
#Setting number of background points required
nsamp <- 1000
#Create background points
background <- randomPoints(env_stack, nsamp) %>%
  #Transform to tibble
  as_tibble() %>%
  #Transform to sf object
  st_as_sf(coords = c("x", "y"), crs = 4326)
```
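To see why a marine layer keeps points off the land, the sketch below runs `randomPoints()` on a small synthetic raster in which half of the cells are `NA`. The raster, its extent and the number of points are made up for illustration and have nothing to do with the bathymetry layer.

```r
library(raster)
library(dismo)

#Toy 10 x 10 raster in lon/lat (illustrative extent only)
toy_raster <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10,
                     ymn = 0, ymx = 10, crs = "+proj=longlat +datum=WGS84")
#Cells in the western half are set to NA to mimic land
cell_x <- xFromCell(toy_raster, 1:ncell(toy_raster))
values(toy_raster) <- ifelse(cell_x < 5, NA, 1)

#Setting seed for reproducibility
set.seed(42)
#Sampling random points, NA cells are excluded
toy_points <- randomPoints(toy_raster, 20)
#Range of sampled x coordinates
range(toy_points[, "x"])
```

All sampled x coordinates should fall in the eastern half of the toy raster, where cells have values.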
### Plotting results
We will plot the results to make sure our background points are in the ocean only. Here we use an alternative way to plot points on a map.
```{r plot_bg_2}
mapview(background, col.regions = "gray")
```
### Saving background samples
We can now save the background locations to our local machine.
```{r save_absence}
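#Path for an optional GeoJSON copy (not written in this chunk)
absence_geo <- file.path(dir_data, "absence.geojson")
#Path to the csv file that will store the background points
pts_absence_csv <- file.path(dir_data, "pts_absence.csv")
#Saving points as csv with coordinates in X and Y columns, overwriting any previous file
st_write(background, pts_absence_csv, layer_options = "GEOMETRY=AS_XY", append = FALSE)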
```
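A file written with `GEOMETRY=AS_XY` stores coordinates in plain `X` and `Y` columns, so it has to be converted back to an `sf` object after loading. The sketch below shows the round trip with made-up points and a temporary file. The `toy_pts` object and its `id` column are assumptions for illustration.

```r
library(sf)

#Made-up points standing in for the background samples
toy_pts <- st_as_sf(data.frame(id = 1:3,
                               x = c(60.5, 61.2, 62.0),
                               y = c(15.1, 16.8, 14.3)),
                    coords = c("x", "y"), crs = 4326)
#Writing to a temporary csv with coordinates as X and Y columns
toy_csv <- tempfile(fileext = ".csv")
st_write(toy_pts, toy_csv, layer_options = "GEOMETRY=AS_XY", quiet = TRUE)

#Reading the table back and rebuilding the sf object
toy_table <- read.csv(toy_csv)
toy_back <- st_as_sf(toy_table, coords = c("X", "Y"), crs = 4326)
```

The CRS is not stored in the csv file, so it needs to be supplied again when rebuilding the `sf` object.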
## Approach 3: Random points in a convex hull
This section has been adapted from [this online tutorial](https://bbest.github.io/eds232-ml/lab1a_sdm-explore.html) and [this online tutorial](https://github.com/BigelowLab/maxnet/wiki/stars).
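A minimal sketch of the idea, assuming presence locations are held in an `sf` object: draw a convex hull around the presence points and sample random locations inside it. The `toy_presence` coordinates and the sample size are made up for illustration, and this is not necessarily the method used in the tutorials linked above.

```r
library(sf)

#Made-up presence locations (illustrative only)
toy_presence <- st_as_sf(
  data.frame(lon = c(55, 62, 70, 66, 58),
             lat = c(12, 22, 18, 10, 16)),
  coords = c("lon", "lat"), crs = 4326
)
#Combine points and draw a convex hull around them
hull <- st_convex_hull(st_union(toy_presence))
#Setting seed for reproducibility
set.seed(42)
#Sample random points inside the hull
bg_hull <- st_sample(hull, size = 100)
```

A convex hull can overlap land, so points sampled this way would still need to be checked against a marine layer or the region of interest before being used as background samples.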
The next notebook will discuss how to get environmental data that we will use as inputs in our SDM from the `sdmpredictors` package.