05-stan-timeseries.Rmd

---
title: "Fitting a timeseries model with Stan"
author: "Sean Anderson"
output:
  html_document:
    toc: true
    toc_float: true
---

# Goals

- Gain exposure to programming a (timeseries) model in Stan that isn't a simple
  linear regression.
- Practice interacting with Stan.
- Practice drawing inference from the posterior samples on a parameter.
- Practice comparing a parameter posterior to its prior distribution to assess
  the information in the data.

# Package loading and option setting

We'll start by loading packages and setting some options:

```{r, echo=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.asp = 0.618,
  fig.align = "center"
)
show_file = function(file) {
  cat(paste(readLines(file), collapse = "\n"))
}
```

```{r, warning=FALSE, message=FALSE}
library(tidyverse)
theme_set(theme_light())
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```

# The data

We are going to work with the time series of population counts for great herons in the UK. This is population ID 20579 from the Global Population Dynamics Database.

Data from Brooks, B., Traill, L. W., Bradshaw, C. J. A. 2006 Minimum viable
population sizes and global extinction risk are unrelated. Ecology Letters, 9
(4), 375-382.

Let's read in the data:

```{r}
d <- read.csv("data/grey-heron.csv") %>% as_tibble() %>% 
  select(sample_year, population_untransformed) %>% 
  rename(year = sample_year, abund_index = population_untransformed)
ggplot(d, aes(year, log(abund_index))) +
  geom_point() + geom_line()
```

The large dips in abundance you see are a result of cold snowy winters when the herons couldn't find food.

# The model

Let's fit a Gompertz model with heavy-tailed process error to a time series of grey heron counts from England. We previously fit this model to this data set in: 

Anderson, S.C., Branch, T.A., Cooper, A.B., and Dulvy, N.K. 2017. Black-swan events in animal populations. Proc. Natl. Acad. Sci. 114(12): 3252–3257. <https://doi.org/10.1073/pnas.1611525114>.

You can see the basic model we will work with in the file:

```{r}
show_file("stan/gompertz.stan")
```

Take a look at the model and make sure you understand it. This is fundamentally a timeseries model. How is it different from the other models we have worked with so far?

We can fit that model with Stan:

```{r, message=FALSE, warning=FALSE, results='hide'}
m <- stan("stan/gompertz.stan",
  data = list(N = nrow(d), y = log(d$abund_index), nu_rate = 0.01),
  iter = 2000)
```

Let's look at the parameters from our model:

```{r}
pars <- c("lambda", "b", "sigma", "nu")
print(m, pars = pars)
bayesplot::mcmc_trace(as.array(m), pars = pars)
bayesplot::mcmc_hist(as.array(m), pars = pars)
```

```{r}
post <- rstan::extract(m)
N <- 20
bayesplot::ppc_dens_overlay(y = log(d$abund_index), post$pred[1:N,])
bayesplot::ppc_ecdf_overlay(y = log(d$abund_index), post$pred[1:N,])
```

```{r}
bayesplot::ppc_intervals(y = log(d$abund_index), post$pred)
bayesplot::ppc_ribbon(y = log(d$abund_index), post$pred, prob = 0.5)
bayesplot::ppc_ribbon(y = log(d$abund_index), post$pred, prob = 0.9)
bayesplot::ppc_ribbon(y = d$abund_index, exp(post$pred), prob = 0.9)
```

Let's extract the posterior samples with `rstan::extract()`. Part of the Stan code calculated predictions from our model. Alternatively, we could calculate them in R ourselves.

As a comparison, we could fit the model with normally distributed process error. We will skip that here in the interest of time. 

# Fitting the model to a timeseries without extremes

Let's try fitting our model to a second data set where there aren't extremes. This time we will use a data set of population counts for house wrens in the mid 20th century in Illinois, USA. GPDD population ID 28.

Data reference: Kendeigh, S.C. 1982 Bird populations in east central Illinois:
fluctuations, variation and development over a half-century. Illinois
Biological Monographs, 52:1-136.

```{r}
d_hw <- read.csv("data/house-wren.csv") %>% as_tibble() %>% 
  select(sample_year, population_untransformed) %>% 
  rename(year = sample_year, abund_index = population_untransformed)
ggplot(d_hw, aes(year, log(abund_index))) +
  geom_point() + geom_line()
```

```{r, message=FALSE, warning=FALSE}
m_hw <- stan("stan/gompertz.stan", 
  data = list(N = nrow(d_hw), y = log(d_hw$abund_index), nu_rate = 0.01))
```

```{r}
print(m_hw, pars = pars)
```

So in this case our estimate of nu (the degrees of freedom parameter) is much higher. In fact, it approximately matches the prior because the data are not informative about heavy tails. In this case it reverts to be effectively normal:

```{r}
prior <- rexp(1e6, rate = 0.01)
prior <- prior[prior > 2]
hist(prior)
median(prior)
abline(v = median(prior), col = "red", lwd = 2)
```

This choice of prior is based on Fernandez C, Steel MFJ (1998) On Bayesian modeling of fat tails and skewness. J. Am. Stat. Assoc. 93(441):359–371.

Since our models are fit in a Bayesian framework, one useful value to calculate might be the probability density below some threshold value for the degrees of freedom parameter. For example:

```{r}
nu <- rstan::extract(m, pars = "nu")[[1]]
nu_hw <- rstan::extract(m_hw, pars = "nu")[[1]]

sum(nu < 10)/length(nu) # or mean(nu < 10)
sum(nu_hw < 10)/length(nu) 
```

We've often used a value of 10 as a threshold since nu values much above 10 render distributions that are almost indistinguishable from the normal. 

In the first case nu is less than 10 with `r mean(nu < 10)` probability. In the second case, there is only about a `r mean(nu_hw < 10)` probability of nu < 10. Remember that even with the prior, there is some probability that nu < 10.

```{r}
mean(prior < 10)
```

For many other possible heavy-tailed Stan population models, see <https://github.com/seananderson/heavy-tails/blob/master/analysis/1-compile-models.R>

```{r}
prior <- rexp(1e6, rate = 0.01)
prior <- prior[prior > 2]
g1 <- ggplot(tibble(prior), aes(prior)) + geom_density() + 
  ggtitle("original prior") +
  coord_cartesian(xlim = c(0, 100))

prior <- rexp(1e6, rate = 0.005)
prior <- prior[prior > 2]
g2 <- ggplot(tibble(prior), aes(prior)) + geom_density() + 
  ggtitle("weaker prior") +
  coord_cartesian(xlim = c(0, 100))

prior <- rexp(1e6, rate = 0.05)
prior <- prior[prior > 2]
g3 <- ggplot(tibble(prior), aes(prior)) + geom_density() + 
  ggtitle("stronger prior") +
  coord_cartesian(xlim = c(0, 100))

cowplot::plot_grid(g1, g2, g3)
```

```{r, message=FALSE, warning=FALSE}
m_weaker_prior <- stan("stan/gompertz.stan",
  data = list(N = nrow(d), y = log(d$abund_index), nu_rate = 0.005))
m_stronger_prior <- stan("stan/gompertz.stan",
  data = list(N = nrow(d), y = log(d$abund_index), nu_rate = 0.05))
```

```{r}
nu <- rstan::extract(m, pars = "nu")[[1]]
mean(nu < 10)
nu_weaker_prior <- rstan::extract(m_weaker_prior, pars = "nu")[[1]]
mean(nu_weaker_prior < 10)
nu_stronger_prior <- rstan::extract(m_stronger_prior, pars = "nu")[[1]]
mean(nu_stronger_prior < 10)
```

So in this case, it doesn't really matter among this range of priors. The heron data contain a lot of information about the `nu` parameter. If we repeated this with the wren data, we would see that the posterior of `nu` would closely resemble the prior each time. In this case, there isn't information in the data that supports low values of `nu`.