/
bsims01-intro.Rmd
147 lines (116 loc) · 10.9 KB
/
bsims01-intro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
---
title: "Introduction to the bSims package"
author: "Peter Solymos"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction to the bSims package}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup,include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
par(mar = c(1, 1, 1, 1))
set.seed(429)
suppressPackageStartupMessages(library(bSims))
```
# Introduction
The **bSims** R package is a _highly scientific_ and _utterly addictive_ bird point count simulator. Highly scientific, because it implements a spatially
explicit mechanistic simulation that is based on statistical models
widely used in bird point count analysis (i.e. removal models, distance
sampling), and utterly addictive because the implementation
is designed to allow rapid interactive exploration (via **shiny** apps)
and efficient simulation (supporting various parallel backends),
thus elevating the user experience.
The goals of the package are to:
1. allow easy _testing of statistical assumptions_ and explore
effects of violating these assumptions,
2. _aid survey design_ by comparing different options,
3. and most importantly, to _have fun_ while doing it via an
intuitive and interactive user interface.
The simulation interface was designed with the following principles in mind:
1. _isolation_: the spatial scale is small (local point count scale)
so that we can treat individual landscapes as more or less homogeneous
units (but see below how certain stratified designs and edge effects
can be incorporated) and independent in space and time;
2. _realism_: the implementation of biological
mechanisms and observation processes are realistic,
defaults are chosen to reflect common practice and assumptions;
3. _efficiency_: implementation is computationally efficient
utilizing parallel computing backends when available;
4. _extensibility_: the package functionality is
well documented and easily extensible.
This documents outlines the major functionality of the package.
First we describe the motivation for the simulation and
the details of the layers. Then we outline an interactive workflow
to design simulation studies and describe how to run
efficient simulation experiments.
Finally we present some of the current limitations of the
framework and how to extend the existing functionality of the
package to incorporate more of the biological realism into
the simulations.
# Motivation
Point-count surveys are one of the most widely used survey techniques for birds. This method involves an observer standing at a location and recording all the birds that are detected during a set amount of time within a fixed or unlimited distance away from the observer. The data collected this way are often used in trend monitoring, assessing landscape and climate change effects on bird populations, and setting population goals for conservation.
Point counts provide an index of the true abundance at the survey location due to imperfect detection. A plethora of design and model based solutions exist to minimize the impacts and account for the biases due to imperfect detection. Contrary to the importance of point counts among other field methods for landbirds, there is no generally applicable simulation tool that would help better understand the possible biases and would help in aiding survey design.
Currently available simulation tools are either concerned by movement trajectories of bird flocks to mitigate mortality near airports and wind farms or are very specific to statistical models. Statistical techniques are expected to provide unbiased estimates when the data generation process follows the assumptions of the model. However, testing if the model assumptions are realistic is rarely evaluated with the same rigor.
The apparent lack of general purpose simulation tools for point counts can probably be attributed to the fact that reality is very complex in these situations and it is not immediately straightforward how to best tackle this complexity. To illustrate this claim, here is how ecological modellers approach bird density. Counts (Y) are described by the marginal distribution, e.g. Y ~ Poisson(DApq) where D is density (abundance per unit area), A is the survey area, p is the probability of individuals being available for sampling, whereas q is the detection probability given availability. Such a model is also viewed as a mixture distribution: N ~ Poisson(DA), Y ~ Binomial(N, pq). This gives a straightforward algorithm for generating counts under known D, A, p, q parameters, that can in turn be used to test statistical performance (unbiasedness, consistency) of N-mixture, time-removal, and distance sampling models among others.
Reality, however, is often more complicated. Take for example roadside counts, which served as the main motivations for developing the simulation approaches presented in here. Bird surveys along roads are widespread due to logistical, safety, and cost considerations. This is the case, even though differences between roadside and off-road counts are well documented.These differences indicate a roadside bias due to, e.g., density, behavior and detectability being different depending on the distance from the road. Understanding the nature and sources of roadside count bias might not lend itself to simple simulations based on the marginal distributions. Understanding roadside counts requires a spatially explicit and more mechanistic simulation approach.
The bSims R package presents a spatially explicit mechanistic simulation framework. Its design is informed by statistical models widely used in the analyses of bird point count data (i.e. removal models, distance sampling). The implementation allows real time interactive exploration via Shiny apps followed by efficient simulations supporting various parallel backends.
The goals of the package are to (1) allow easy testing of statistical assumptions and explore the effects of violating these assumptions; to (2) aid survey design by comparing different options; and (3) to have fun while doing this.
In this paper, we demonstrate the main functionality of the package, then outline the interactive workflow to design and efficiently run simulation experiments, finally we present future directions to extend the existing functionality of the package by incorporating more biological realism into the simulations.
# Design and implementation
The simulation interface was designed with the following principles in mind:
- Isolation: the spatial scale is small (local point count scale) so that we can treat individual landscapes as more or less homogeneous units (but see below how certain stratified designs and edge effects can be incorporated) and independent in space and time;
- Realism: the implementation of biological mechanisms and observation processes are realistic, defaults are chosen to reflect common practice and assumptions;
- Efficiency: implementation is computationally efficient utilizing parallel computing backends when available;
- Extensibility: the package functionality is well documented and easily extensible.
Going back to the Poisson–Binomial example, N would be a result of all the factors influencing bird abundance, such as geographic location, season, habitat suitability, number of conspecifics, competitors, or predators. Y, however, would largely depend on how the birds behave depending on the time of the day, or how the observer might detect or miss the different individuals, or count the same individual twice, etc.
This series of conditional filtering of events lends itself to be represented as layers. These simulation layers are conditionally independent of each other. This design can facilitate the comparison of certain settings while keeping all the underlying layers of the realizations identical. This can help pinpointing the effects without the extra variability introduced by all the other underlying layers.
The bSims package implements the following main 'verb' functions for simulating the conditionally independent layers:
- Initialize (`bsims_init`): the landscape is defined by the extent and possible habitat stratification;
- Populate (`bsims_populate`): the population of finite number of individuals within the extent of the landscape;
- Animate (`bsims_animate`): individual behaviors described by movement and vocalization events, i.e. the frequency of sending various types of signals;
- Detect (`bsims_detect`): the physical side of the observation process, i.e. transmitting and receiving the signal;
- Transcribe (`bsims_transcribe`): the "human" aspect of the observation process, i.e. the perception of the received signal.
The `bsims_` part of the function helps finding the functions via autocomplete functionality of the developer environment, such as RStudio VSCode. In the next sections we will review the main options available for each of the simulation layers (see Appendix for worked examples and reproducible code).
# Limitations
The package is not equipped with all the possible ways to estimate the
model parameters. It only has rudimentary removal modeling and
distance sampling functionality implemented for the interactive
visualization and testing purposes.
Estimating parameters for more complex situations (i.e.
finite mixture removal models, or via Hazard rate distance functions)
or estimating abundance via multiple-visit N-mixture models etc. is
outside of the scope of the package and it is the responsibility of the
user to make sure those work as expected.
Other intentional limitation of the package is the lack of
reverse interactions between the layers. For example
the presence of an observer could influence behavior
of the birds close to the observer. Such features can
be implemented as methods extending the current functionality.
Another limitation is that this implementation considers single
species. Observers rarely collect data on single species
but rather count multiple species as part of the same survey.
The commonness of the species, observer ability, etc. can influence
the observation process when the whole community is considered.
Such scenarios are also not considered at present. Although the same
landscape can be reused for multiple species, and building up
the simulation that way.
The package considers simulations as independent
in space and time. When larger landscapes need to be simulated,
there might be several options:
(1) simulate a larger extent and put multiple independent observers
into the landscape; or (2) simulate independent landscapes in isolation.
The latter approach can also address spatial and temporal
heterogeneity in density, behaviour, etc. E.g. if
singing rate is changing as a function of time of day,
one can define the `vocal_rate` values as a function of time,
and simulate independent animation layers.
When the density varies in space, one can
simulate independent population layers.
These limitations can be addressed as additional methods and
modules extending the capabilities of the package,
or as added functionality to the core layer functions
in future releases.