/
lax-vignette.Rmd
254 lines (194 loc) · 10.7 KB
/
lax-vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
---
title: "An overview of lax"
author: "Paul Northrop"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 4
vignette: >
%\VignetteIndexEntry{An overview of lax}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
bibliography: lax.bib
csl: taylor-and-francis-chicago-author-date.csl
---
```{r, include = FALSE}
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
```
The [CRAN Task View on Extreme Value Analysis](https://CRAN.R-project.org/view=ExtremeValue) provides information about R packages that perform various extreme value analyses. The *lax* package supplements the functionality of nine of these packages, namely
[eva](https://cran.r-project.org/package=evd) [@eva],
[evd](https://cran.r-project.org/package=evd) [@evd],
[evir](https://cran.r-project.org/package=evir) [@evir],
[extRemes](https://cran.r-project.org/package=extRemes) [@extRemes],
[fExtremes](https://cran.r-project.org/package=fExtremes) [@fExtremes],
[ismev](https://cran.r-project.org/package=ismev) [@ismev],
[mev](https://cran.r-project.org/package=mev) [@mev],
[POT](https://cran.r-project.org/package=POT) [@POT] and
[texmex](https://cran.r-project.org/package=texmex) [@texmex].
Univariate extreme value models, including regression models, are supported.
The [chandwich](https://cran.r-project.org/package=chandwich) package is used to provide robust sandwich estimation of parameter covariance matrix and loglikelihood adjustment for models fitted by maximum likelihood estimation. This may be useful for cluster correlated data when interest lies in the parameters of the marginal distributions, or for performing inferences that are robust to certain types of model misspecification.
*lax* works in an object-oriented way, operating on R objects returned from functions in other packages that summarise the fit of an extreme value model. Loglikelihood adjustment and sandwich estimation is performed by an `alogLik` S3 method. We demonstrate this using the following running example, which involves fitting a Generalised Extreme Value (GEV) regression model. Firstly, we use the *ismev* package to produce the fitted GEV regression model object on which `alogLik` will operate, and illustrate what can be done with the objects returned from `alogLik`. We use the *ismev* package because it provides an example of a case where we need to refit the model in order to obtain the information that we need to perform adjusted inferences. Then we repeat the adjustment using seven of the other eight packages. The *POT* package specialises in Generalised Pareto (GP) modelling, for which we use a different example.
## GEV regression of annual maximum temperatures
This example is based on the analysis presented in Section 5.2 of @CB2007. The data, which are available in the data frame `ow`, are a bivariate time series of annual maximum temperatures, recorded in degrees Fahrenheit, at Oxford and Worthing in England, for the period 1901 to 1980. If interest is only in the marginal distributions of high temperatures in Oxford and Worthing, then we might fit a GEV regression model in which some or all of the parameters may vary between Oxford and Worthing. However, we should adjust for the cluster dependence between temperatures recorded during the same year.
## ismev
The `gev.fit()` function in *ismev* fits GEV regression models. It allows covariate effects in any of the (location, scale and shape) parameters of the GEV distribution. However, an object returned from `gev.fit()` does not provide all the information about a fitted regression model that `alogLik` needs, in order to calculate loglikelihood contributions from individual observations: the design matrices are missing. Therefore, *lax* provides the function `gev_refit`, which is a version of `gev.fit` that adds this information.
The following code fits a GEV regression model in which the location, scale and shape parameters of the GEV distribution vary between Oxford and Worthing. Then `alogLik` is used to provide adjusted standard errors and an adjusted loglikelihood.
```{r}
library(lax)
# Column 4 of ow contains 1 for Oxford and -1 for Worthing
large <- gev_refit(ow$temp, ow, mul = 4, sigl = 4, shl = 4, show = FALSE,
method = "BFGS")
# Adjust the loglikelihood and standard errors
adj_large <- alogLik(large, cluster = ow$year, cadjust = FALSE)
# MLEs, SEs and adjusted SEs
t(summary(adj_large))
```
This reproduces the values in rows 1, 3 and 4 in Table 2 of @CB2007. The estimation of the 'meat' of the sandwich adjustment is performed using the [sandwich](https://cran.r-project.org/package=sandwich) package. In this example, we need to pass `cadjust = FALSE` to ``sandwich::meatCL`` in order that the adjustment is the same as that used in @CB2007. Otherwise, `meatCL` makes a finite-cluster bias correction.
### Confidence intervals
A `confint` method calculates approximate (95\%) confidence intervals for the parameters, based on the adjusted loglikelihood. @CB2007 consider three types of loglikelihood adjustment: one vertical and two horizontal. The type of adjustment is selected by the argument `type`. The default is `type = "vertical"` and there is an option to perform no adjustment.
```{r}
confint(adj_large)
confint(adj_large, type = "none")
```
### Confidence regions
The `conf_region` function in the *chandwich* package can be used to produce confidence regions for pairs of parameters. Here, we consider the 'central' (midway between Oxford and Worthing) scale and shape intercept parameters: $\sigma_0$ and $\xi_0$ in @CB2007.
```{r, fig.align='center', fig.width=7, fig.height=7}
library(chandwich)
which_pars <- c("scale", "shape")
gev_none <- conf_region(adj_large, which_pars = which_pars, type = "none")
gev_vertical <- conf_region(adj_large, which_pars = which_pars)
plot(gev_none, gev_vertical, lwd = 2, xlim = c(3.1, 4.5), ylim = c(-0.35, -0.05),
xlab = expression(sigma[0]), ylab = expression(xi[0]))
```
### Comparing nested models
`alogLik` also has an `anova` method, which can be used to perform (adjusted) loglikelihood ratio tests of nested models. To illustrate this we fit, and adjust, a smaller model, in which Oxford and Worthing have a common GEV shape parameter, and then compare this model to the larger one.
```{r}
small <- gev_refit(ow$temp, ow, mul = 4, sigl = 4, show = FALSE,
method = "BFGS")
adj_small <- alogLik(small, cluster = ow$year, cadjust = FALSE)
summary(adj_small)
anova(adj_large, adj_small)
anova(adj_large, adj_small, type = "none")
```
We see that the adjustment of the loglikelihood for clustering makes enough of a difference to matter: if we perform a test at the 5\% significance level then we choose the larger model when we adjust but the smaller model if we do not.
```{r, echo = FALSE}
got_texmex <- requireNamespace("texmex", quietly = TRUE)
# 25/02/2024: avoid ERROR in texmex
got_texmex <- FALSE
```
## texmex
```{r, eval = got_texmex}
# This code is not run, to avoid an error from evm(), so there are no results
library(texmex, quietly = TRUE)
# Note: phi = log(scale)
evm_fit <- evm(temp, ow, gev, mu = ~ loc, phi = ~ loc, xi = ~loc)
adj_evm_fit <- alogLik(evm_fit, cluster = ow$year)
summary(adj_evm_fit)
```
```{r, echo = FALSE, eval = got_texmex}
detach("package:texmex")
```
## evd
```{r, echo = FALSE}
got_evd <- requireNamespace("evd", quietly = TRUE)
```
The `fgev()` function in *evd* fits GEV regression models, but it only allows covariate effects in the location parameter.
```{r, eval = got_evd}
library(evd, quietly = TRUE)
fgev_fit <- fgev(ow$temp, nsloc = ow[, "loc"])
adj_fgev_fit <- alogLik(fgev_fit, cluster = ow$year)
summary(adj_fgev_fit)
```
```{r, echo = FALSE, eval = got_evd}
detach("package:evd")
```
## extRemes
```{r, echo = FALSE, message = FALSE, warning = FALSE}
got_extRemes <- requireNamespace("extRemes", quietly = TRUE)
```
```{r, eval = got_extRemes, message = FALSE, warning = FALSE}
library(extRemes, quietly = TRUE)
fevd_fit <- fevd(temp, ow, location.fun = ~ ow$loc, scale.fun = ~ ow$loc,
shape.fun = ~ ow$loc)
adj_fevd_fit <- alogLik(fevd_fit, cluster = ow$year)
summary(adj_fevd_fit)
```
```{r, echo = FALSE, eval = got_extRemes}
detach("package:extRemes")
```
## eva
```{r, echo = FALSE, message = FALSE}
got_eva <- requireNamespace("eva", quietly = TRUE)
```
```{r, eval = got_extRemes, message = FALSE, warning = FALSE}
library(eva, quietly = TRUE)
gevr_fit <- gevrFit(ow$temp, information = "observed",
locvars = ow, locform = ~ ow$loc,
scalevars = ow, scaleform = ~ ow$loc,
shapevars = ow, shapeform = ~ ow$loc)
adj_gevr_fit <- alogLik(gevr_fit, cluster = ow$year)
summary(adj_gevr_fit)
```
```{r, echo = FALSE, eval = got_eva}
detach("package:eva")
```
## evir
```{r, echo = FALSE}
got_evir <- requireNamespace("evir", quietly = TRUE)
```
The `gev()` function in *evir* only fits stationary GEV models.
```{r, eval = got_evir, message = FALSE}
library(evir, quietly = TRUE)
gev_fit <- gev(ow$temp)
adj_gev_fit <- alogLik(gev_fit)
summary(adj_gev_fit)
```
```{r, echo = FALSE, eval = got_evir}
detach("package:evir")
```
## fExtremes
```{r, echo = FALSE}
got_fExtremes <- requireNamespace("fExtremes", quietly = TRUE)
```
The `gevFit()` function in *fExtremes* only fits stationary GEV models.
```{r, eval = got_fExtremes}
library(fExtremes, quietly = TRUE)
gevFit_fit <- gevFit(ow$temp)
adj_gevFit_fit <- alogLik(gevFit_fit)
summary(adj_gevFit_fit)
```
```{r, echo = FALSE, eval = got_fExtremes}
detach("package:fExtremes")
```
## mev
```{r, echo = FALSE, message = FALSE}
got_mev <- requireNamespace("mev", quietly = TRUE)
```
The `fit.gev()` function in *mev* only fits stationary GEV models.
```{r, eval = got_mev}
library(mev, quietly = TRUE)
gfit <- fit.gev(ow$temp)
adj_gfit <- alogLik(gfit)
summary(adj_gfit)
```
```{r, echo = FALSE, eval = got_mev}
detach("package:mev")
```
## POT
```{r, echo = FALSE, message = FALSE}
got_POT <- requireNamespace("POT", quietly = TRUE)
```
Among other things, the `fitgpd()` function in the *POT* package can fit a GP distribution to threshold excesses using maximum likelihood estimation. We illustrate `alogLik` using an example from the `fitgpd` documentation. There is no cluster dependence here. However, there may be interest in using a sandwich estimator of covariance if we are concerned about model misspecification. In this case, where we simulate from the correct model, we expect the adjustment to make little difference, and so it proves.
```{r, eval = got_POT}
library(POT, quietly = TRUE)
set.seed(24082019)
x <- POT::rgpd(200, 1, 2, 0.25)
fit <- fitgpd(x, 1, "mle")
adj_fit <- alogLik(fit)
summary(adj_fit)
```
## References
<script type="text/x-mathjax-config">
MathJax.Hub.Config({ "HTML-CSS": { minScaleAdjust: 125, availableFonts: [] } });
</script>