-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
203 lines (155 loc) · 6.44 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# stcpR6: Sequential Test and Change-Point detection algorithms based on E-values / E-detectors
<!-- badges: start -->
[![R-CMD-check](https://github.com/shinjaehyeok/stcpR6/workflows/R-CMD-check/badge.svg)](https://github.com/shinjaehyeok/stcpR6/actions)
[![R-CMD-check](https://github.com/shinjaehyeok/stcpR6/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/shinjaehyeok/stcpR6/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
stcpR6 is an R package built to run nonparametric sequential tests and
online change point detection algorithms in [SRR 21'](https://arxiv.org/abs/2010.08082) and [SRR 23'](https://arxiv.org/abs/2203.03532).
This package supports APIs of nonparametric sequential test and online
change-point detection for streams of univariate sub-Gaussian, binary,
and bounded random variables.
Disclaimer: This R package is a personal project and is not affiliated with my company. It is provided "as is" without warranty of any kind, express or implied. The author does not assume any liability for any errors or omissions in this package.
<!-- You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this. -->
## Installation
You can install the CRAN version of stcpR6 with:
``` r
install.packages("stcpR6")
```
You can also install the development version of stcpR6 from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("shinjaehyeok/stcpR6")
```
## Example
### Setup: tracking prediction accuracy.
* Under pre-change, $Y_t = t + ϵ_t$ for $t = 1, 2, \dots, \nu$.
* Under post-change, $Y_t = t + 0.01 * (t-\nu)^2 + ϵ_t$ for $t = \nu +1, \nu+2, \dots$.
* Prediction: Simple linear regression based on $(t, Y_t)$ history.
### Build E-detector on residuals.
We will use E-detector for bounded random variables with the following setup
* Pre-change: Expected residual is equal to zero given sample history.
* Post-change : Expected residual is not equal to zero given sample history.
* Minimum practically interesting gap between pre- and post-means : $\Delta = 1$.
* Average run length (ARL) level : ARL $\geq 1000$. i.e., False alert once in 1000 times.
* Cap residuals into $[-20, 20]$.
```{r}
kCap <- 20
kARL <- 1000
normalize_obs <- function(y) {
obs <- (y + kCap) / (2 * kCap)
obs[obs > 1] <- 1
obs[obs < 0] <- 0
return(obs)
}
e_detector <- stcpR6::Stcp$new(
method = "SR",
family = "Bounded",
alternative = "two.sided",
threshold = log(kARL),
m_pre = normalize_obs(0),
delta_lower = 1 / kCap / 2
)
```
### Case 1. No change happened (Normal condition)
```{r, fig1}
# Set random seed for reproducibility
set.seed(100)
# No change
sig <- 10
v <- 200
y_history <- seq(1,v) + rnorm(v, 0, sig)
# Simple regression predictor
predict_y <- function(i) {
if (i <= 2) return (0)
j <- i-1
coeff <- lm.fit(x = cbind(rep(1, j),1:j), y = y_history[1:j])$coefficients
return(coeff[1] + coeff[2] * i)
}
y_hat <- sapply(seq_along(y_history), predict_y)
plot(seq_along(y_history), y_history, type = "l",
xlab = "t", ylab = "Y_t", main = "Sample history (Black) and Prediction (Red)")
lines(seq_along(y_hat), y_hat, col = 2)
plot(seq_along(y_history), y_history-y_hat, type = "l",
xlab = "t", ylab = "Residual", main = "Prediction error")
```
Apply E-detector on normalized residuals
```{r, fig2}
res <- normalize_obs(as.numeric(y_history-y_hat))
# In real applications,
# we should use e_detector$updateLogValues(res_i) to update e-deetector
# for each newly observed residual res_i.
# In this example, however, we use a vectorized update for simplicity.
# Initialize the model.
e_detector$reset()
# Compute the log values of e-detectors.
log_values <- e_detector$updateAndReturnHistories(res)
# Print a summary of updates
print(e_detector)
# Plot the log values and stopping threshold
plot(seq_along(log_values), log_values, type = "l",
xlab = "t", ylab = "log value", ylim = c(0, 3 * e_detector$getThreshold()))
abline(h = e_detector$getThreshold(), col = 2)
```
Note that the log values of e-detector have remained below the threshold. E-detector didn't trigger any alert.
### Case 2. Change happened at t = 100
```{r, fig3}
# Set random seed for reproducibility
set.seed(100)
# Change point: 100
v <- 100
# Pre-change history
y_pre <- seq(1,v) + rnorm(v, 0, sig)
# Post-change history
y_post <- seq(v+1, 2*v) + 0.01 * seq(1,v)^2 + rnorm(v, 0, sig)
y_history <- c(y_pre, y_post)
# Sample history
y_history <- c(y_pre, y_post)
y_hat <- sapply(seq_along(y_history), predict_y)
plot(seq_along(y_history), y_history, type = "l",
xlab = "t", ylab = "Y_t", main = "Sample history (Black) and Prediction (Red)")
lines(seq_along(y_hat), y_hat, col = 2)
abline(v = v, lty = 2)
plot(seq_along(y_history), y_history-y_hat, type = "l",
xlab = "t", ylab = "Residual", main = "Prediction error")
abline(v = v, lty = 2)
```
Apply E-detector on normalized residuals
```{r, fig4}
res <- normalize_obs(as.numeric(y_history-y_hat))
# In real applications,
# we should use e_detector$updateLogValues(res_i) to update e-deetector
# for each newly observed residual res_i.
# In this example, however, we use a vectorized update for simplicity.
# Initialize the model.
e_detector$reset()
# Compute the log values of e-detectors.
log_values <- e_detector$updateAndReturnHistories(res)
# Print a summary of updates
print(e_detector)
# Plot the log values and stopping threshold
plot(seq_along(log_values), log_values, type = "l",
xlab = "t", ylab = "log value", ylim = c(0, 3 * e_detector$getThreshold()))
abline(h = e_detector$getThreshold() , col = 2)
abline(v = v, lty = 2)
abline(v = e_detector$getStoppedTime(), col = 2, lty = 2)
```
In this case, the log values have crossed the threshold at time `r e_detector$getStoppedTime()` and triggered alert. Detection delay was `r e_detector$getStoppedTime()` - 100 = `r e_detector$getStoppedTime() - 100`.
Note that triggering alert at time `r e_detector$getStoppedTime()` would be non-trivial if we only check the residual trend as below.
```{r, fig5}
plot(seq_along(y_history), y_history-y_hat, type = "l",
xlab = "t", ylab = "Residual", main = "Prediction error")
abline(v = v, lty = 2)
abline(v = e_detector$getStoppedTime(), col = 2, lty = 2)
```