-
Notifications
You must be signed in to change notification settings - Fork 0
/
final_project.Rmd
393 lines (269 loc) · 18.2 KB
/
final_project.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
---
title: "Invasive Animal Species Dashboard"
author: "Johann Wagner - u6958957"
date: "2023-10-20"
output:
html_document:
toc: true
toc_depth: 4
theme: cosmo
number_sections: false
toc_float: true
highlight: pygments
fig_width: 8
fig_height: 4
editor_options:
markdown:
wrap: 72
---
# Word/figure count
Words: 1999 words
Figures: 3
# Location on GitHub
The root directory on GitHub: [DS4B-final-project](https://github.com/johann-wagner/DS4B-final-project/tree/main).
# Dashboard Link
[Dashboard Link](https://0uexv8-johann-wagner.shinyapps.io/invasive_species_dashboard/)
# Data Description
The data used is from The Atlas of Living Australia (ALA), which is "a collaborative, digital and open infrastructure repository that aggregates biodiversity data from multiple sources" ([ALA, 2022](https://support.ala.org.au/support/solutions/articles/6000261410-introduction-to-the-ala)).
Specifically, this analysis uses species occurrence records for seven invasive animal species:
1. [European Rabbits (Oryctolagus cuniculus)](https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/692effa3-b719-495f-a86f-ce89e2981652)
2. [European Red Foxes (Vulpes vulpes)](https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/2869ce8a-8212-46c2-8327-dfb7fabb8296)
3. [Cane Toads (Rhinella marina)](https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/e79179f8-44bf-46eb-b139-a2e66f6045c1)
4. [Feral Cats (Felis catus)](https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/7de6b16a-f854-4b4b-88cf-81868ce74ad8)
5. [Feral Horses (Equus (Equus) caballus)](https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/86b4395d-97a7-470d-b35b-306e01dfbab6)
6. [Feral Pigs (Sus scrofa)](https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/948d8ca1-62fc-4d03-a79d-4be2de837264)
7. [Red Imported Fire Ants (Solenopsis invicta)](https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/5b3fe1c6-1c2c-4e20-819f-2e6d8fa3d332)
I was interested in the spatial and temporal variables for all 7 species, particularly the longitude and latitude values (`decimalLongitude` and `decimalLatitude`) and date of recording (`eventDate`) for each record.
# Questions/Aims
## Initial Aims:
1. Develop an RShiny app that showcases the spatial and temporal (monthly) occurrence of ~~the top 5-10~~ 7 invasive animal species in Australia by state/territory ~~and by regionality (urban vs. regional)~~
2. ~~Visualise and calculate the proportion of the selected invasive animal species observed in an urban area~~
3. Create a download option for users to download the cleaned dataset for the selected invasive animal species for a particular state/territory
## Justifications:
Initially, I was interested in the spatial distribution of magpies in Canberra as I'm an avid cyclist and wanted to find the hotspots of magpie records and linking/statistically model this data with a collaborative magpie swooping database called [magpiealert.com](https://www.magpiealert.com/). Unfortunately, the data was not available and the sampling bias of the occurrence data may not be truly representative, as observations are more likely to be near roads, airports, and major urban areas ([Kellie, 2022](https://labs.ala.org.au/posts/2022-07-22_sample-bias/post.html?panelset=mantids)).
This made me interested in simply visualising the data spatially, instead of using any statistical methodologies and focusing the project on whether there would be major differences/patterns of occurrence records in urban vs. non-urban/regional areas. Unfortunately, given the scope of the project, calculating the urban vs. non-urban analysis would have been too much scope-creep, so this aim was ignored (maybe post-project exploration).
Hughes et al. ([2021](https://onlinelibrary.wiley.com/doi/10.1111/ecog.05926)) and Guerin et al. ([2016](https://doi.org/10.1371/journal.pone.0144779)) highlight the spatial sampling bias associated with biodiversity data suggesting that these biases must first be understood before applying any statistical modelling methods. Interestingly, Australian research by Piccolo et al. ([2020](https://www.nature.com/articles/s41598-020-66719-x)) show that reptile research locations are highly positively correlated with proximity to universities suggesting biodiversity research locations are closely associated by accessibility. Similarly, temporal biases can also influence results, as species distributions can change across time ([Boakes et al, 2010](https://doi.org/10.1371/journal.pbio.1000385)).
The Atlas of Living Australia already have a browser-based spatial visualisation tool, where a specific species can be selected and plotted on a map. However, there seemed to not be a functionality to comparing both the spatial _and_ temporal (by month) variables.
This dashboard is the initial step in data exploration step and in the essence of open and citizen-science, I wanted to make my cleaned data easily available through the dashboard. Data sharing enables for further analysis and reusability ([Tedersoo et al., 2021](https://www.nature.com/articles/s41597-021-00981-0), [Ramachandran et al., 2021](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020EA001562)). I wanted my dashboard to follow the FAIR Data Principles ([Wilkinson et al., 2016](https://www.nature.com/articles/sdata201618)).
I chose invasive animal species, because of the rabbits I see when cycling in Canberra. The other 6 species were chosen arbitrarily based off this [article](https://www.nespthreatenedspecies.edu.au/news-and-media/media-releases/australia-s-10-worst-invasive-species-study), which summarises the results from Kearney et al.'s paper ([2018](https://www.nespthreatenedspecies.edu.au/media/qile4abf/kearney-et-al-2018_the-threats-to-aus-species_postprint.pdf)).
A deeper understanding of my thought process is documented in GitHub Issue [`#4`](https://github.com/johann-wagner/DS4B-final-project/issues/4).
The analysis was based on this [ALALabs article](https://labs.ala.org.au/posts/2023-05-16_dingoes/post.html) ([Smith et al., 2023](https://labs.ala.org.au/posts/2023-05-16_dingoes/post.html)).
# Raw data
The raw data is available using the `galah` package. A user must use an ALA-registered email at [ala.org.au](ala.org.au) and use the `galah_config()` function. Once completed, the custom-function `load_galah_occurrence_data()` was used, where the input argument `species_name` was the aforementioned scientific names of the seven species.
```{r, message = FALSE, warning = FALSE}
### Purpose:
# To load the occurrence data of any species within the Atlas
# of Living Australia.
suppressPackageStartupMessages({
library(galah)
})
galah_config(email = "johann.wagner@gmail.com")
load_galah_occurrence_data <- function(
# scientific name
species_name
) {
species_occurrence <- galah_call() |>
# To conduct a search for species scientific name
galah_identify(species_name) |>
# Pre-applied filter to ensure quality-assured data
# the "ALA" profile is designed to exclude lower quality records
galah_apply_profile(ALA) |>
atlas_occurrences(mint_doi = TRUE)
return(species_occurrence)
}
# References:
# - [1] https://bie.ala.org.au/species/https://biodiversity.org.au/afd/taxa/692effa3-b719-495f-a86f-ce89e2981652
# - [2] https://galah.ala.org.au/R/reference/galah_apply_profile.html
```
The DOIs for each species:
1. [European Rabbits (Oryctolagus cuniculus)](https://doi.org/10.26197/ala.d0da2524-838a-4540-a160-fba8088f4850)
2. [European Red Foxes (Vulpes vulpes)](https://doi.org/10.26197/ala.5f700174-e92e-4376-a243-be05ffa93f46)
3. [Cane Toads (Rhinella marina)](https://doi.org/10.26197/ala.d6d70c2c-00de-4f76-8ea6-598b9562eb64)
4. [Feral Cats (Felis catus)](https://doi.org/10.26197/ala.57a58cdc-88c0-45c1-9260-bd17c6fc4b62)
5. [Feral Horses (Equus (Equus) caballus)](https://doi.org/10.26197/ala.c3cebb21-c2e0-41d0-8b5d-f446c2914827)
6. [Feral Pigs (Sus scrofa)](https://doi.org/10.26197/ala.952aa727-1915-4c3f-8399-ed940c24bfef)
7. [Red Imported Fire Ants (Solenopsis invicta)](https://doi.org/10.26197/ala.74b5dd29-4287-473d-8391-9184c8c62554)
# Data wrangling
Because the data is already in tidy format, the seven datasets are binded into one `tidied_data.csv`, then filtered with `excluded_data.csv`. The excluded data includes any `NA` values in the `eventDate`, `decimalLongitude`, and `decimalLatitude` columns and also includes data not located within any of the [*State and Territories - 2021*](https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files) spatial boundaries provided by the Australian Bureau of Statistics (ABS). Further details of this process can be found in the [scripts folder](https://github.com/johann-wagner/DS4B-final-project/tree/main/scripts) and [data_cleaning.Rmd](https://github.com/johann-wagner/DS4B-final-project/blob/main/processed_data/data_cleaning.Rmd).
```{r, message = FALSE, warning = FALSE}
suppressPackageStartupMessages({
library(readr)
library(tidyverse)
})
excluded_data <- read_csv("processed_data/excluded_data.csv")
tidied_data <- read_csv("processed_data/tidied_data.csv")
cleaned_data <- tidied_data %>%
anti_join(
excluded_data,
join_by(recordID)
)
```
# Sanity checks
Let's use `skim()` to do some sanity checks.
```{r, message = FALSE, warning = FALSE}
suppressPackageStartupMessages({
library(skimr)
})
skim(cleaned_data)
```
There are 328,342 records with the same number of unique values in the `recordID` column indicating each record is distinct. There are 8 unique `state` values and 7 `simpleName` values, which makes sense. The latest `eventDate` value is 2023-10-09, which is not in the future.
Let's spatially showcase the data.
```{r, warning = FALSE, message = FALSE, fig.width = 7.5, fig.height = 10}
suppressPackageStartupMessages({
options(timeout = 1000)
devtools::install_github("wfmackey/absmapsdata")
# To easily access the Australian Bureau of Statistics (ABS) spatial structures
### https://github.com/wfmackey/absmapsdata
library(absmapsdata)
library(ggplot2)
})
state2021 %>%
# Only state/territories
filter(!state_code_2021 %in% c("9", "Z")) %>%
ggplot() +
geom_sf(
aes(geometry = geometry)
) +
geom_point(
data = cleaned_data,
aes(
x = decimalLongitude,
y = decimalLatitude
),
alpha = 0.1,
size = 0.7
) +
coord_sf() +
theme_minimal() +
labs(title = "Sanity Checked! All data points are in Australia!")
```
Interestingly, it seems that there are a few observations that are to the east of the NSW border, which is likely an island part of NSW.
# Addressing the questions/aims
## Aim 1: Dashboard and Visualisations
### Folder System
The RShiny app can be found in the GitHub folder [*DS4B-final-project/invasive_species_dashboard*](https://github.com/johann-wagner/DS4B-final-project/tree/main/invasive_species_dashboard). Similar to R Project environments, the RShiny app is all self-contained in this folder. The individual R scripts in the [*DS4B-final-project/scripts*](https://github.com/johann-wagner/DS4B-final-project/tree/main/scripts) folder must be run first to produce and save the dashboard data within the RShiny folder for most recent data.
### The RShiny App
The RShiny folder contains four files:
- *0-00_setup_and_configuration.R:* loads relevant packages and custom-made functions to create the analysis/visualisations/RShiny.
- *dashboard_data.csv:* is the cleaned occurrence data for the seven invasive species, including spatial and temporal variables.
- *server.R* contains the back-end logic/code that takes the two inputs to dynamically create/adjust the visualisations/titles/downloads.
- *ui.R* contains the front-end user-interface design and layout of the app.
### Spatial Visualisation: European Rabbits in ACT
Let's create the spatial visualisation using European Rabbits and ACT as the inputs.
```{r, warning = FALSE, message = FALSE}
library(scales)
species_simple_name <- "European Rabbits"
state_name <- "Australian Capital Territory"
dashboard_data <- read_csv("processed_data/dashboard_data.csv")
spatial_data <- dashboard_data %>%
filter(
simpleName == species_simple_name,
state == state_name
)
capital_cities_data <- tibble::tribble(
~state, ~city, ~lat, ~lon,
"Australian Capital Territory", "Canberra", -35.2809, 149.1300,
) %>%
filter(state == state_name)
state2021 %>%
filter(state_name_2021 == state_name) %>%
ggplot() +
geom_sf(
aes(geometry = geometry),
fill = "#E5E4E2"
) +
geom_point(
data = spatial_data,
aes(
x = decimalLongitude,
y = decimalLatitude,
size = simpleName
),
alpha = 0.6,
colour = "#1b9e77"
) +
geom_point(
data = capital_cities_data,
aes(
x = lon,
y = lat,
shape = city
),
colour = "#d95f02",
size = 4
) +
coord_sf() +
theme_minimal() +
# Ref [1]
labs(
title = bquote(
"There are" ~
bold(.({
spatial_data %>%
nrow() %>%
comma()
})) ~
"records of" ~
bold(.(species_simple_name)) ~
"in" ~
bold(.(state_name))
),
shape = "Capital City",
size = "Invasive Species"
) +
theme(
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(size = 13),
# Ref [2]: Improve readability
legend.text = element_text(size = 12),
legend.title = element_text(
size = 12,
face = "bold"
),
legend.direction = "vertical",
legend.background = element_rect(
fill = "#E5E4E2",
colour = "#708090"
),
legend.position = "right",
legend.justification = c("right", "bottom"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6),
panel.background = element_rect(fill = "transparent", colour = "transparent")
)
# References:
# 1: https://labs.ala.org.au/posts/2023-05-16_dingoes/post.html
# 2: https://r-graph-gallery.com/239-custom-layout-legend-ggplot2.html
```
Each green point represents one record. We can see that the majority of records are near Canberra with some patterns in Namadgi National Park, like Boboyan Road. Let's visualise some more maps.
### Temporal Visualisation
The aim of creating a circular monthly bar plot of record proportions was to explore potential temporal patterns. This plot suggests two things:
1. The time frequency when citizen scientists submit data and/or when professional/government teams measure/monitor invasive species. Maybe, annual monitoring occurs in May?
2. The actual temporal species distribution. For example, maybe there are more cats in May in SA, because of reproduction behaviour?
Let's create it for Feral Cats in SA (a screenshot was used, due to word count limitations: please see GitHub repo for details)
![Temporal Feral Cats in SA](screenshots/temporal_feral_cats_south_australia.png)
## Aim 2: Regionality
This aim was completely abandoned, because of the scope of the project was getting too big. However, adding the capital city point highlights how majority of the selected species cluster around the capital city. Additionally, a zoom function was added to the Shiny app for users to zoom into specific patterns and areas of interest.
## Aim 3: Data Download
Ensuring the data followed FAIR principles included making the data findable, such as referencing the DOIs on the dashboard. The user can select the two inputs they are interested in and can download the respective csv. The option to download the entire dashboard dataset is also available.
## GitHub Use / Final RShiny App
- Final UI and visualisation design improvements were made using GitHub Issues to manage these efforts.
- GitHub branching was used for each Issue.
- GitHub Releases were used to distinguish major developments of the RShiny App.
Please see the *README.md* file for more information.
![Invasive Animal Species Dashboard (Version 1.0.0)](screenshots/dashboard_photo.png)
# References
Atlas of Living Australia website at http://www.ala.org.au. Accessed 18 October 2023.
Hughes AC, Orr MC, Ma K, Costello MJ, Waller J, Provoost P, Yang Q, Zhu C, Qiao H (2021). Sampling biases shape our view of the natural world. *Ecography*. pp. 1259-1269. <https://doi.org/10.1111/ecog.05926>
Boakes EH, McGowan PJK, Fuller RA, Chang-qing D, Clark NE, O'Connor K, et al. (2010) Distorted Views of Biodiversity: Spatial and Temporal Bias in Species Occurrence Data. *PLoS Biol* 8(6): e1000385. https://doi.org/10.1371/journal.pbio.1000385
Guerin GR, Biffin E, Baruch Z, Lowe AJ (2016). Identifying Centres of Plant Biodiversity in South Australia. *PLoS ONE* 11(1): e0144779. https://doi.org/10.1371/journal.pone.0144779
Kearney SG, Carwardine J, Reside AE et al. (2018). The threats to Australia’s imperilled species and implications for a national conservation response. *CSIRO*. <https://www.nespthreatenedspecies.edu.au/media/qile4abf/kearney-et-al-2018_the-threats-to-aus-species_postprint.pdf>.
Kellie D (2022). Quantify geographic sampling bias with sampbias. *ALA Labs*. <https://labs.ala.org.au/posts/2022-07-22_sample-bias/post.html?panelset=mantids>
Ramachandran R, Bugbee K, & Murphy K (2021). From open data to open science. *Earth and Space Science*, 8, e2020EA001562. https://doi.org/10.1029/2020EA001562
Smith A, Torresan O, Kellie D (2023). An exploration of dingo observations in the ALA. *ALA Labs*. <https://labs.ala.org.au/posts/2023-05-16_dingoes/post.html>.
Tedersoo L, Küngas R, Oras E et al. Data sharing practices and data availability upon request differ across scientific disciplines. *Sci Data* 8, 192 (2021). https://doi.org/10.1038/s41597-021-00981-0
Piccolo RL, Warnken J, Chauvenet ALM. et al. Location biases in ecological research on Australian terrestrial reptiles. *Sci Rep* 10, 9691 (2020). https://doi.org/10.1038/s41598-020-66719-x
Wilkinson M, Dumontier M, Aalbersberg I et al. The FAIR Guiding Principles for scientific data management and stewardship. *Sci Data* 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18