---
title: "Variable Length Markov Chains with Covariates (COVLMC)"
output:
  rmarkdown::html_vignette:
    df_print: kable
vignette: >
  %\VignetteIndexEntry{Variable Length Markov Chains with Covariates (COVLMC)}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 7,
fig.height = 7,
fig.align = "center"
)
```
```{r setup}
library(mixvlmc)
library(ggplot2)
```
## Limitations of Variable Length Markov Chains
Variable Length Markov Chains (VLMC) are very useful for capturing
complex structures in discrete time series, as they can mix short
memory with long memory in a contextual way, leading to sparse models.
However, they cannot capture the influence of exogenous variables on
the behaviour of a time series. As a consequence, a VLMC fitted to
sequences influenced by covariates could fail to capture interesting
patterns.
### A typical example
Let us focus again on the electrical usage data set used in the package
introduction, using the following two weeks of data:
```{r active_power_week_15_16, fig.height=5}
pc_week_15_16 <- powerconsumption[powerconsumption$week %in% c(15, 16), ]
elec <- pc_week_15_16$active_power
ggplot(pc_week_15_16, aes(x = date_time, y = active_power)) +
geom_line() +
xlab("Date") +
  ylab("Active power (kW)")
```
We build again a discrete time series using the following thresholds:
- low active power at night (typically below 0.4 kW);
- standard use between 0.4 and 2 kW;
- peak use above 2 kW.
```{r}
elec_dts <- cut(elec, breaks = c(0, 0.4, 2, 8), labels = c("low", "typical", "high"))
```
Fitting a VLMC to this sequence gives the following model (using BIC
for model selection):
```{r}
elec_vlmc_tune <- tune_vlmc(elec_dts)
best_elec_vlmc <- as_vlmc(elec_vlmc_tune)
draw(best_elec_vlmc)
```
Considering the apparent regularity of the time series, one may wonder
whether this model really captures the dynamics of the electricity
consumption. In particular, the longest memory of 3 time steps, i.e. 30
minutes, may seem a bit short compared to the somewhat long periods of
stability.
## Theoretical aspects
VLMC with covariates have been introduced in [Variable length Markov
chain with exogenous covariates](https://doi.org/10.1111/jtsa.12615).
The core idea is to enable the conditional probabilities of the next
state given the context to depend on exogenous covariates.
### Variable memory
A COVLMC models a pair of sequences. The main sequence is denoted
$\mathbf{X}=X_1, X_2, \ldots, X_n, \ldots$. The random variables take
values in a finite state space $S$ exactly as in a standard VLMC. We
have *in addition* another sequence of random variables
$\mathbf{Y}=Y_1, Y_2, \ldots, Y_n, \ldots$ which take values in
$\mathbb{R}^p$. This latter assumption $Y_l\in \mathbb{R}^p$ can be
relaxed to more general spaces.
A COVLMC puts restrictions on the conditional probabilities of $X_n$
given the past of *both* sequences. To simplify the presentation, we
denote by $X^k_m$ the sequence of random variables $$
X^k_m=(X_k, X_{k-1}, \ldots, X_m),
$$ and we use similar notations for the values taken by the variables,
$x^k_m$, as well as for $Y^k_m$ and $y^k_m$. The notation accommodates
both temporal order (when $k<m$, enumerating upwards from $X_k$ to
$X_m$) and reverse temporal order (when $k>m$, as written above).
A pair of sequences $\mathbf{X}$ and $\mathbf{Y}$ is a COVLMC if there
is a maximal order $l_{\max}$ and a function $l$ from $S^{l_{\max}}$ to
$\{0,\ldots,l_{\max}\}$ such that for all $n>l_{\max}$ $$
\begin{multline}
\mathbb{P}(X_n=x_n\mid X^{n-1}_1=x^{n-1}_1, Y^{n-1}_1=y^{n-1}_1)=\\
\mathbb{P}\left(X_n=x_n\mid X^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}=x^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}, Y^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}=y^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}\right).
\end{multline}
$$ As in VLMC, the $\mathbf{X}$ process has a finite and variable
memory, but this memory applies to both $\mathbf{X}$ and $\mathbf{Y}$.
Notice that the memory order depends only on $\mathbf{X}$ and that no
assumptions are made on the temporal behaviour of $\mathbf{Y}$. Thus a
COVLMC on $\mathbf{X}$ and $\mathbf{Y}$ is not a VLMC on the pair
$(\mathbf{X}, \mathbf{Y})$ (which can make sense if $\mathbf{Y}$ is
discrete).
As for VLMC, the memory length function generates a *context* function
$c$ which extracts from the past the part needed to obtain the
conditional distribution: $c$ is a function from $S^{l_{\max}}$ to
$\bigcup_{k=0}^{l_{\max}}S^k$ given by $$
c(x^{n-1}_{n-l_{\max}})=x^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}.
$$ The image of $S^{l_{\max}}$ under $c$ is the set of contexts of the
COVLMC, which is entirely specified by $l$ and one conditional
distribution per unique context.
As the notations are somewhat opaque at first, we can illustrate the
definition with a simple example. We consider a binary sequence (values
in $S=\{0, 1\}$) and a single numerical covariate $Y_t\in\mathbb{R}$. As
in the theoretical example in
`vignette("variable-length-markov-chains")` we assume that $l_{\max}=3$
and that $l$ is given by $$
\begin{align*}
l(a, b, 0)&=1&\forall a, \forall b,\\
l(c, 1, 1)&=2&\forall c,\\
l(0, 0, 1)&=3,&\\
l(1, 0, 1)&=3.&\\
\end{align*}
$$ In practice the COVLMC is therefore fully described by specifying the
following probabilities: $$
\begin{align*}
\mathbb{P}(X_t=1\mid& X_{t-1}=0, Y_{t-1}=y_{t-1})\\
\mathbb{P}(X_t=1\mid& X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})\\
\mathbb{P}(X_t=1\mid& X_{t-1}=1, X_{t-2}=0, X_{t-3}=0, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2}, Y_{t-3}=y_{t-3})\\
\mathbb{P}(X_t=1\mid& X_{t-1}=1, X_{t-2}=0, X_{t-3}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2}, Y_{t-3}=y_{t-3})
\end{align*}
$$
### Conditional distributions
The main difficulty induced by COVLMC compared to VLMC is the
specification of the conditional distributions. Indeed the conditional
distributions depend on $\mathbf{Y}$ and cannot simply be given by a
table. For instance, in the example above, we need to specify, among
others, $\mathbb{P}(X_t=1\mid X_{t-1}=0, Y_{t-1}=y_{t-1})$, for all
values of $y_{t-1}\in \mathbb{R}$ (in $\mathbb{R}^p$ in the general
case).
A natural choice in this particular example is to use a logistic model,
i.e. to assume $$
\mathbb{P}(X_t=1\mid X_{t-1}=0, Y_{t-1}=y_{t-1})=g(\alpha^0+y_{t-1}\beta_1^0),
$$ where $g(t)=\frac{1}{1+\exp(-t)}$ is the logistic function. The
superscript on $\alpha^0$ and $\beta^0_1$ refers to the context, here
$0$, while the subscript on $\beta^0_1$ refers to the time delay (here
$1$). By extension, we have for example $$
\begin{multline*}
\mathbb{P}(X_t=1\mid X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})=\\
g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1+y_{t-2}\beta^{1,1}_2\right).
\end{multline*}
$$
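As a purely illustrative check of these formulas, the logistic link can
be evaluated directly in R; the function `g` and the coefficient values
below are made up for the example and are not estimates produced by
`mixvlmc`:
```{r}
## logistic function g used above (plain R, not a mixvlmc function)
g <- function(t) 1 / (1 + exp(-t))
## probability of X_t = 1 in context "0" for hypothetical coefficients
## alpha^0 = -1 and beta^0_1 = 2, observed with y_{t-1} = 0.8
g(-1 + 0.8 * 2)
```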
More generally, the probability distribution associated with a context
could be given by any function that maps the values of the covariates to
a distribution on $S$, the state space. Following the original
[paper](https://doi.org/10.1111/jtsa.12615), `mixvlmc` uses multinomial
logistic regression as implemented by `VGAM::vglm()` or
`nnet::multinom()`, or a logistic regression provided by `stats::glm()`
for state spaces with only 2 states. This has several advantages over a
more general solution:
1) during the estimation phase, one probability distribution is
estimated for each relevant context: this could induce a large
computational burden for more complex models than multinomial
logistic ones;
2) having a simple model makes it possible to fit contexts with a
   limited number of occurrences, which is important to allow searching
   for long-term dependencies;
3) logistic models with different memory order are easy to compare
using a likelihood-ratio test.
The last point is used in `mixvlmc` (as in the original paper) to
simplify local models with respect to the covariates. For a context of
length $l$, the probability distribution is assumed to depend on the $l$
past values of $\mathbf{Y}$, but in practice we allow a dependency on
only $k<l$ past values. For instance, we could have $$
\mathbb{P}(X_t=1\mid X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})=g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1\right).
$$
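Such nested models are compared with a likelihood-ratio test. The
following minimal sketch (simulated data and base R only, not `mixvlmc`
internals) illustrates the kind of comparison involved:
```{r}
## likelihood-ratio test between two nested logistic models, mimicking
## the memory reduction described above (simulated data for illustration)
set.seed(1)
n <- 500
y1 <- runif(n) # covariate at lag 1
y2 <- runif(n) # covariate at lag 2
## the response depends on the lag 1 covariate only, so the lag 2 term
## should be dropped by the test
x_sim <- rbinom(n, 1, plogis(-1 + 2 * y1))
full <- glm(x_sim ~ y1 + y2, family = binomial())
reduced <- glm(x_sim ~ y1, family = binomial())
## a large p-value supports the reduced (shorter memory) model
anova(reduced, full, test = "LRT")
```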
### Beta-Context algorithm
Estimating a COVLMC model from two time series is more complex than
estimating a VLMC model. `mixvlmc` implements the Beta-Context algorithm
proposed in the original [paper](https://doi.org/10.1111/jtsa.12615). It
is inspired by the context algorithm used for VLMC (and proposed in
[Variable length Markov
chains](https://dx.doi.org/10.1214/aos/1018031204)). It can be
summarized as follows:
1) the first step consists in building a context tree (see
   `vignette("context-trees")`) from the discrete sequence $\mathbf{X}$,
   almost exactly as for a VLMC: the only difference is that to be kept
   in the tree a context must appear a number of times that depends on
   its length, on the dimension of the covariates and on the number of
   states. This guarantees a minimal number of observations for the
   (maximum likelihood) estimation of the context-dependent multinomial
   logistic regression.
2) the second step is an estimation one: a multinomial logistic
regression model is estimated for each context, using a number of
past values of $\mathbf{Y}$ equal to the length of the context.
3) the rest of the algorithm consists in pruning the context tree and
the logistic models:
1) leaf contexts are first assessed in terms of model
simplification. The likelihood-ratio test is used to decide
whether the oldest value of $\mathbf{Y}$ is relevant or not.
Essentially we compare e.g.
$g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1\right)$ to
$g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1+y_{t-2}\beta^{1,1}_2\right)$
as an estimator of
      $\mathbb{P}(X_t=1\mid X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})$;
   2) based on the results of those tests, it may be possible to
      completely remove a context from the tree; see the
      [paper](https://doi.org/10.1111/jtsa.12615) for details.
This last pruning phase is carried out repeatedly as context removals
turn internal contexts into leaves that can then be further simplified.
## COVLMC in practice
### Estimation
COVLMC estimation is provided by the `covlmc()` function. We first
build a simple example data set as follows:
```{r}
set.seed(0)
nb_obs <- 200
covariates <- data.frame(y = runif(nb_obs))
x <- 0
for (k in 2:nb_obs) {
## we induce a simple dependency to the covariate
## and an order 1 memory
if (covariates$y[k - 1] < 0.5) {
if (x[k - 1] == 0) {
x[k] <- sample(c(0, 1), 1, prob = c(0.7, 0.3))
} else {
x[k] <- sample(c(0, 1), 1, prob = c(0.3, 0.7))
}
} else {
if (x[k - 1] == 0) {
x[k] <- sample(c(0, 1), 1, prob = c(0.1, 0.9))
} else {
x[k] <- sample(c(0, 1), 1, prob = c(0.5, 0.5))
}
}
}
```
Then we estimate a COVLMC as follows:
```{r}
model <- covlmc(x, covariates)
model
```
The estimation process is controlled by three parameters:
- `max_depth`: the largest order/memory considered for the COVLMC
  (defaults to 100). This parameter essentially controls the
  computational burden; it does not play a major role in COVLMC
  estimation because of the constraints imposed by `min_size`, and its
  default value is very conservative;
- `min_size`: the minimal number of occurrences needed for a context to
  be included in the initial context tree. It gives the number of
  occurrences *per parameter* of the logistic model, which depends on
  the length of the context, the dimension of the covariates and the
  number of states;
- `alpha`: the level of the pruning tests of the Beta-Context
  algorithm. Pruning decisions are all based on likelihood-ratio tests
  and `alpha` is the common level of all those tests.
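These parameters can also be passed explicitly to `covlmc()`; the
values below are illustrative choices, not recommendations:
```{r}
## same estimation with the three control parameters spelled out
## (illustrative values, not tuned choices)
model_explicit <- covlmc(x, covariates, min_size = 5, max_depth = 20, alpha = 0.1)
model_explicit
```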
The default parameters work well on the previous example, as shown by the obtained
model:
```{r}
draw(model, model = "full")
```
The model has 2 contexts, 0 and 1, as expected. The logistic models are described
by their parameters. For context 0, the intercept is negative and the coefficient of
$y_{t-1}$ is positive: the probability of switching to 1 is small when $y_{t-1}$ is
small and increases with $y_{t-1}$. For context 1, the situation is reversed and
the effect of $y_{t-1}$ is smaller. This is consistent with the way the series
were constructed.
Notice however that obtaining an interesting model with the default
parameters should not be seen as a general property: proper model
choice must be carried out.
### Model choice
As COVLMC estimation fits a potentially large number of logistic models to the
data, the use of a penalized likelihood approach is recommended to set its parameters
and avoid overfitting.
However, model choice is more complex in the case of COVLMC than for VLMC. In particular,
the pair `cutoff()`/`prune()` does not work as well for COVLMC as for VLMC (see `vignette("variable-length-markov-chains")`). Indeed the pruning process of the
Beta-Context algorithm is such that its effects cannot be predicted as easily as
the ones of the Context algorithm. In practice, computing the largest `alpha`
(test level) that is guaranteed to trigger a minimal but actual pruning of a given
COVLMC is easy, but subsequent cutoff values could be misleading. To avoid any
problem it is therefore recommended to rely on the `tune_covlmc()` function, which
has been designed to explore the full "pruning space" associated with a given data set.
Used on the artificial example above, it gives:
```{r}
model_tune <- tune_covlmc(x, covariates)
model_tune
```
and
```{r}
draw(as_covlmc(model_tune), model = "full")
```
The resulting model is the same as the one obtained before, but it was
now properly selected by minimising the BIC.
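BIC is the default criterion; assuming `tune_covlmc()` accepts a
`criterion` argument as `tune_vlmc()` does, AIC based selection can be
requested explicitly:
```{r}
## AIC based selection (the `criterion` argument is assumed to mirror
## the one of tune_vlmc())
model_tune_aic <- tune_covlmc(x, covariates, criterion = "AIC")
model_tune_aic
```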
As for `tune_vlmc()`, the `max_depth` parameter is automatically increased to
avoid using it inadvertently as a regularisation parameter. However, while
`min_size` is generally considered as not having a major influence on model
selection for VLMC, this is not the case for COVLMC. There is currently no
support in `mixvlmc` for an automatic choice of `min_size` and one should therefore
test several values and compare the obtained BIC/AIC. In the artificial case, we
can try for instance `min_size=2` as follows:
```{r}
model_tune_2 <- tune_covlmc(x, covariates, min_size = 2)
model_tune_2
```
and `min_size=10` as follows:
```{r}
model_tune_10 <- tune_covlmc(x, covariates, min_size = 10)
model_tune_10
```
Here we see no particular effect of the parameter because of the simplicity of the
problem.
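A quick programmatic way to compare the three runs is to extract the
best model from each and compute its BIC, using only functions already
shown in this vignette:
```{r}
## BIC of the best model for each tested min_size value
sapply(
  list(default = model_tune, size_2 = model_tune_2, size_10 = model_tune_10),
  function(tuned) BIC(as_covlmc(tuned))
)
```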
Let us now come back to the electricity consumption example described above. We
introduce a very basic day/night covariate as follows:
```{r}
elec_cov <- data.frame(day = (pc_week_15_16$hour >= 7 & pc_week_15_16$hour <= 18))
```
and fit a COVLMC with
```{r}
elec_tune <- tune_covlmc(elec_dts, elec_cov)
elec_tune
```
The model selection process can be represented graphically as follows:
```{r fig.height=4}
ggplot(elec_tune$results, aes(x = alpha, y = BIC)) +
geom_line() +
geom_point()
```
or automatically using e.g. `autoplot()` as follows:
```{r}
print(autoplot(elec_tune))
```
The final model is:
```{r}
draw(as_covlmc(elec_tune), model = "full", with_state = TRUE)
```
It shows some interesting patterns:
- one of the logistic models is degenerate: in the `high` context, no
  transition to the `low` state can happen. This was already observed
  with the VLMC;
- the only dependency with respect to the covariate is in the context
  `typical, typical, typical`. In this case, the probability of staying
  in the `typical` state is increased during the day and decreased (but
  to a lesser extent) during the night. In practice, this means that
  the model is able to generate longer sequences that stay in the
  `typical` state during the day than at night.
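This behaviour can be explored by sampling from the fitted model. The
sketch below assumes that the `simulate()` method for COVLMC objects
takes the covariate values through a `covariate` argument and the
sequence length through `nsim`:
```{r}
## sample a series of the same length as the data from the fitted model,
## reusing the observed day/night covariate (the argument names are
## assumptions about the simulate() method for covlmc objects)
sim_elec <- simulate(as_covlmc(elec_tune),
  nsim = length(elec_dts),
  seed = 0, covariate = elec_cov
)
table(sim_elec)
```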
Notice finally that for this real-world example, the `min_size`
parameter has some influence on the results. Setting it to a smaller
value does not change the final model, as shown below:
```{r}
elec_tune_3 <- tune_covlmc(elec_dts, elec_cov, min_size = 3)
elec_tune_3
```
However, increasing the parameter to, e.g., 10 generates a simpler but
weaker model, as shown here:
```{r}
elec_tune_10 <- tune_covlmc(elec_dts, elec_cov, min_size = 10)
elec_tune_10
```
### Diagnostics
The package provides numerous ways to analyse a COVLMC. Basic functions include:
- `states()` returns the state space of the model;
- `depth()` returns the length of the longest context in the model;
- `covariate_depth()` returns the longest memory used by the model with respect to covariates;
- `context_number()` returns the number of contexts in the model.
For instance, the large model obtained above has the following
characteristics:
```{r}
elec_model <- as_covlmc(elec_tune)
states(elec_model)
depth(elec_model)
covariate_depth(elec_model)
context_number(elec_model)
```
COVLMC objects support classical statistical functions such as:
```{r}
logLik(elec_model)
AIC(elec_model)
BIC(elec_model)
```
### Contexts
The model can be explored in detail by drawing its context tree
(see `vignette("context-trees")` for details) as follows:
```{r}
draw(elec_model, p_value = TRUE)
```
The `draw.covlmc()` function is more advanced than its VLMC counterpart. It provides more detailed information, particularly regarding the p-values associated with pruning operations. The "merging" p-value corresponds to replacing some of the contexts with a single joint model, while the "collapsing" p-value is associated with pruning all the sub-contexts of a context. P-values associated with specific models correspond to reducing the memory of the corresponding model, that is, discarding the dependency
of the conditional probability on the oldest covariates.
To explore the contexts in a programmatic way, one should rely on the `contexts()` function. COVLMC contexts have additional characteristics compared to those of VLMC and context trees.
In particular, the `contexts()` function can report the model associated with each context, either
by its parameters only:
```{r}
contexts(elec_model, model = "coef")
```
or using the models themselves:
```{r}
contexts(elec_model, model = "full")
```
As for context trees, focused analysis of specific contexts can be done
by requesting a `ctx_node_covlmc` representation of the context of
interest, for instance via the `find_sequence.covlmc()` function, or
with the default result type of `contexts()`:
```{r}
ctxs <- contexts(elec_model)
ctxs
```
Individual contexts can be analysed using a collection of dedicated functions,
for instance
```{r}
model(ctxs[[2]])
model(ctxs[[3]], type = "full")
```
See `contexts.covlmc()` for details.
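For instance, one can walk over all contexts and inspect each one
together with its local model; the sketch below assumes that
`as_sequence()` converts a COVLMC context node back to a plain state
sequence, as it does for context tree nodes:
```{r}
## print each context as a plain sequence together with its local model
for (ctx in contexts(elec_model)) {
  print(as_sequence(ctx)) # as_sequence() is assumed to apply here
  print(model(ctx))
}
```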