vignettes/ex2revdbayes.Rmd

---
title: "EVA 2021 revdbayes exercises 2"
subtitle: "EV inference using revdbayes"
output: html_notebook
---

## Getting started

* Install [RStudio Desktop](https://www.rstudio.com/products/rstudio/download/#download) (if you do not already have it) and open it
* Save [ex2revdbayes.Rmd](https://raw.githubusercontent.com/paulnorthrop/revdbayes/master/vignettes/ex2revdbayes.Rmd) to your computer (there is also a link to it above)
* Open `ex2revdbayes.Rmd` in Rstudio
* The R code is arranged in *chunks* highlighted in grey
* You can run the code in a chunk by clicking on the **green triangle** on the top right of the chunk
* There is information about Exercises 2 (and Exercises 1) in the [EVA2021 revdbayes slides](https://paulnorthrop.github.io/revdbayes/articles/EVA2021revdbayes.html)

## Aims

* Compare ROU approaches for sampling from a GP posterior
* Perform posterior predictive checking and inference
* Consider how to sample efficiently from a PP posterior 
* Compare `revdbayes` and `evdbayes` for sampling from a GEV posterior

There is further information in the [Introduction to revdbayes](https://paulnorthrop.github.io/revdbayes/articles/revdbayes-a-vignette.html) vignette.

```{r packages, message=FALSE}
# We need the evdbayes, revdbayes, ggplot2, bayesplot and coda packages
pkg <- c("evdbayes", "revdbayes", "ggplot2", "bayesplot", "coda")
pkg_list <- pkg[!pkg %in% installed.packages()[, "Package"]]
install.packages(pkg_list)
# Load all packages
invisible(lapply(pkg, library, character.only = TRUE))
```

```{r setup}
# Simulation sample size
n <- 10000
```

## GP model for threshold excesses

### GP priors

```{r setPriors}
# Information about the set_prior() function
# You can choose from a set of options, or create your own
?set_prior
# Set the threshold
u <- quantile(gom, probs = 0.65)
# Set an 'uninformative' prior for the GP parameters
fp <- set_prior(prior = "flat", model = "gp", min_xi = -1)
```

### GP posteriors

```{r posteriorSampling}
# Information about the rpost function
?rpost
# Information about the gom (significant wave heights) data
?gom
# Sample on (sigma_u, xi) scale
gp1 <- rpost(n = n, model = "gp", prior = fp, thresh = u, data = gom,
             rotate = FALSE)
# Rotation
gp2 <- rpost(n = n, model = "gp", prior = fp, thresh = u, data = gom)
# Box-Cox transformation (after transformation to positivity)
gp3 <- rpost(n = n, model = "gp", prior = fp, thresh = u, data = gom,
             rotate = FALSE, trans = "BC")
# Box-Cox transformation and then rotation
gp4 <- rpost(n = n, model = "gp", prior = fp, thresh = u, data = gom,
             trans = "BC")
```

### Comparing approaches

```{r GPplots}
# Samples and posterior density contours
plot(gp1, main = paste("mode relocation only, pa = ", round(gp1$pa, 3)))
plot(gp2, ru_scale = TRUE, main = paste("rotation, pa = ", round(gp2$pa, 3)))
plot(gp3, ru_scale = TRUE, main = paste("Box-Cox, pa = ", round(gp3$pa, 3)))
plot(gp4, ru_scale = TRUE, main = paste("Box-Cox and rotation, pa = ",
                                        round(gp4$pa, 3)))
```

``rpost_rcpp()`` is faster than ``rpost()``.  See the [rusting faster](https://cran.r-project.org/web/packages/revdbayes/vignettes/revdbayes-b-using-rcpp-vignette.html) vignette.

```{r faster}
gp2 <- rpost_rcpp(n = n, model = "gp", prior = fp, thresh = u, data = gom)
```

Suggestion: increase the threshold and see how the appearance of the posterior changes.

## Posterior predictive checking

There is further information in the [Posterior predictive EV inference](https://paulnorthrop.github.io/revdbayes/articles/revdbayes-c-predictive-vignette.html) vignette.

```{r GPcheck}
# nrep = 50 asks for 50 fake replicates of the original data
# For each of 50 posterior samples of the parameters a dataset is simulated 
gpg <- rpost(n = n, model = "gp", prior = fp, thresh = u, data = gom, 
             nrep = 50)
# Information about pp_check.evpost
?pp_check.evpost
# Compare real and fake datasets
pp_check(gpg, type = "multiple", subtype = "dens") + ggtitle("GP kernel density estimates")
# Compare real and fake summary statistics
pp_check(gpg, stat = "max")
```

* Would you be able to spot the real dataset or summary statistic if it was not highlighted?
* Option: you could play with the arguments 
    - `type` and/or `subtype` for the first plot
    - `stat` for the second plot

## Posterior predictive EV inference

### Binomial-GP model for threshold exceedances

Modelling threshold excesses is only part of the story.  We also need to model the proportion of observations that exceed the threshold.

```{r BinGP}
bp <- set_bin_prior(prior = "jeffreys")
# We need to provide the mean number of observations per year 
# The data cover a period of 105 years
npy_gom <- length(gom)/105
bgpg <- rpost(n = 1000, model = "bingp", prior = fp, thresh = u, data = gom,
              bin_prior = bp, npy = npy_gom, nrep = 50)
```              

We can make predictive inferences about the largest value $M_N$ to be observed over a time horizon of $N$ years.

```{r predBinGP}
# Information about pp_check.evpost
?predict.evpost
# Predictive density of the largest value in `n_years' years
plot(predict(bgpg, type = "d", n_years = 200))
# Predictive intervals (equi-tailed and shortest possible)
i_bgpp <- predict(bgpg, n_years = 200, level = c(95, 99), hpd = TRUE)
plot(i_bgpp, which_int = "both")
i_bgpp$short 
```

```{r}
# The predictive 100, 200 and 500 year return levels 
predict(bgpg, type = "q", n_years = 1, x = c(0.99, 0.995, 0.998))$y
```
See [Northrop and Attalides (2017)](https://doi.org/10.1111/rssc.12159) for an analysis of these data using an informative prior.

## PP model for threshold exceedances

We compare three ways to sample from a PP posterior distribution

1. Parameterise in terms of annual maxima
2. [Wadsworth et al. (2010)](https://doi.org/10.1214/10-AOAS333): parameterise to make $\mu$ orthogonal to $(\sigma, \xi)$ 
3. Empirical rotation to reduce association

```{r PPsetup}
# Information about the rainfall data
?revdbayes::rainfall
# Set a threshold and use the prior from the evdbayes user guide
rthresh <- 40
prrain <- evdbayes::prior.quant(shape = c(38.9,7.1,47), scale = c(1.5,6.3,2.6))
```

```{r PPsampling}
# 1. Number of blocks = number of years of data (54)
r1 <- rpost(n = n, model = "pp", prior = prrain, data = rainfall,
            thresh = rthresh, noy = 54, rotate = FALSE)
plot(r1)
# 2. Number of blocks = number of threshold excesses (use_noy = FALSE)
n_exc <- sum(rainfall > rthresh, na.rm = TRUE)
r2 <- rpost(n = n, model = "pp", prior = prrain, data = rainfall, 
            thresh = rthresh, noy = 54, use_noy = FALSE, rotate = FALSE)
plot(r2, ru_scale = TRUE)
# 3. # Rotation about maximum a posteriori estimate (MAP)
r3 <- rpost(n = n, model = "pp", prior = prrain, data = rainfall, 
            thresh = rthresh, noy = 54)
plot(r3, ru_scale = TRUE)
c(r1$pa, r2$pa, r3$pa)
```

* Based on the plots, which approach do you expect to have the largest probability of acceptance?
* Is this what you see in the values of `r1$pa`, `r2$pa` and `r3$pa`?

## GEV model for block maxima

```{r GEVprior}
# Information about the portpirie data
?revdbayes::portpirie
# Use the prior from the evdbayes user guide
mat <- diag(c(10000, 10000, 100))         
pn <- set_prior(prior = "norm", model = "gev", mean = c(0, 0, 0), cov = mat)
```

### revdbayes

```{r revdbayes}
mat <- diag(c(10000, 10000, 100))
gevp  <- rpost_rcpp(n = n, model = "gev", prior = pn, data = portpirie)
# Information about plot.evpost
?plot.evpost
# Can use the plots from the bayesplot package
plot(gevp, use_bayesplot = TRUE, fun_name = "dens")
plot(gevp, use_bayesplot = TRUE, pars = "xi", prob = 0.95)
```

### evdbayes

```{r evdbayes}
# evdbayes has a function ar.choice to help set the proposal SDs
# It does this by searching for values that achieve approximately
# target values for the acceptance rates (default 0.4 for all parameters)
?ar.choice
# Initialise evdbayes' Markov chain at the estimated posterior mean
init <- colMeans(gevp$sim_vals)
prop.sd.auto <- ar.choice(init = init, prior = pn, lh = "gev", 
                          data = portpirie, psd = rep(0.01, 3), 
                          tol = rep(0.02, 3))$psd
post <- posterior(n, init = init, prior = pn, lh = "gev", 
                  data = portpirie, psd = prop.sd.auto)
for (i in 1:3) plot(post[, i], ylab = c("mu","sigma","xi")[i]) 
```

```{r coda}
post_for_coda <- coda::mcmc(post)
# Assuming no burn-in period
burnin <- 0                 
post_for_coda <- window(post_for_coda, start = burnin + 1)
# Trace plots and KDEs
plot(post_for_coda)
# The sampled values are autocorrelated
coda::acfplot(post_for_coda)
# Effective sample sizes (10,000 for revdbayes)
coda::effectiveSize(post_for_coda)
```

* In practice one would run multiple chains and use the Gelman-Rubin convergence diagnostic, e.g. `gelman.diag` in the `coda` package. 
* The [Introduction to revdbayes](https://paulnorthrop.github.io/revdbayes/articles/revdbayes-a-vignette.html) shows that the posterior samples from `evdbayes` and `revdbayes` are in agreement.