---
title: "Get started"
output:
  rmarkdown::html_document:
    toc_float: true
    df_print: paged
description: >
  This vignette describes how to install the package PINstimation, and provides several examples on how to use its main functionalities.
vignette: >
  %\VignetteIndexEntry{Get started with PINstimation}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
## Overview
----
This vignette describes how to install the package 'PINstimation', either in its stable version from CRAN or in its development version from GitHub. It also provides several examples of how to use the main functionalities of the package.
## Installation
----
The easiest way to get PINstimation is to install the stable version from CRAN:
```r
install.packages("PINstimation")
```
To get a bug fix or to use a feature from the development version, you
can install the development version of PINstimation from GitHub.
```r
# install.packages("devtools")
# library(devtools)
devtools::install_github("monty-se/PINstimation", build_vignettes = TRUE)
```
Load the package:
```{r installation, results = 'hide', message=FALSE, warning=FALSE}
library(PINstimation)
library(dplyr)
library(tidyr)
```
## Note to frequent users
----
If you are a frequent user of PINstimation, you might want to avoid repeatedly
loading the package whenever you open a new R session. You can do
that by adding PINstimation to your `.Rprofile`, either manually or using the function
`load_pinstimation_for_good()`.
To load PINstimation automatically, run `load_pinstimation_for_good()`,
and the following code will be added to your `.Rprofile`:
```r
if (interactive()) suppressMessages(require(PINstimation))
```
After you restart R, PINstimation will be loaded automatically whenever a new R
session is started. To remove the automatic loading of PINstimation, open your
`.Rprofile` for editing (for example with `usethis::edit_r_profile()`), find the code above, and delete it.
## Usage examples
----
Below are five usage examples for the main functions in the package.
- **Example 1**: [PIN] Use daily trade data to estimate the standard probability of informed trading.
- **Example 2**: [MPIN] Use daily trade data to estimate the number of layers in the data, as well as the multi-layer probability of informed trading.
- **Example 3**: [AdjPIN] Use daily trade data to estimate the adjusted probability of informed trading.
- **Example 4**: [VPIN] Use high-frequency data to estimate the volume-synchronized probability of informed trading.
- **Example 5**: Aggregate high-frequency trades into daily trade data, and use them to estimate the adjusted probability of informed trading with both the maximum-likelihood method and the Expectation-Conditional Maximization (ECM) algorithm.
### Example 1: Estimate the PIN model
---
We estimate the PIN model on the preloaded dataset `dailytrades` using the initial parameter sets of Ersan & Alici (2016).
```{r Example.1.1, results=F}
estimate <- pin_ea(dailytrades)
```
```
## [+] PIN Estimation started
## |[1] Likelihood function factorization: Ersan (2016)
## |[2] Loading initial parameter sets : 5 EA initial set(s) loaded
## |[3] Estimating PIN model (1996) : Using Maximum Likelihood Estimation
## |+++++++++++++++++++++++++++++++++++++| 100% of PIN estimation completed
## [+] PIN Estimation completed
```
```{r Example.1.2}
show(estimate)
```
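For reference, the PIN value follows from the model parameters through the standard formula of the PIN model, PIN = αμ / (αμ + ε_b + ε_s). The sketch below computes it in base R. The helper `pin_from_params()` and the parameter values are hypothetical: they are not part of the package API and are not estimates from `dailytrades`.

```r
# Hypothetical helper (not part of PINstimation): PIN from the model parameters.
# alpha : probability that an information event occurs on a given day
# mu    : arrival rate of informed trades
# eps_b : arrival rate of uninformed buys
# eps_s : arrival rate of uninformed sells
pin_from_params <- function(alpha, mu, eps_b, eps_s) {
  informed <- alpha * mu
  informed / (informed + eps_b + eps_s)
}

# Illustrative values only: 200 / (200 + 300 + 300)
pin_value <- pin_from_params(alpha = 0.4, mu = 500, eps_b = 300, eps_s = 300)
print(pin_value)
```

The numerator is the expected number of informed trades per day and the denominator is the expected total number of trades, so PIN always lies between 0 and 1.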
### Example 2: Estimate the Multilayer PIN model
---
We estimate the MPIN model on the preloaded dataset `dailytrades` using:
* the maximum-likelihood method.
```{r Example.2.1, results=F}
ml_estimate <- mpin_ml(dailytrades)
```
```
## [+] MPIN estimation started
## |[1] Detecting layers from data : using Ersan and Ghachem (2022a)
## |[=] Number of layers in the data : 3 information layer(s) detected
## |[2] Computing initial parameter sets : using algorithm of Ersan (2016)
## |[3] Estimating the MPIN model : Maximum-likelihood standard estimation
## |+++++++++++++++++++++++++++++++++++++| 100% of mpin estimation completed
## [+] MPIN estimation completed
```
* the ECM algorithm.
Note: For efficiency purposes, we cap the number of initial parameter sets to 20
for all layers. The default is 100 initial parameter sets.
```{r Example.2.2, results=F}
ecm_estimate <- mpin_ecm(dailytrades, hyperparams = list(maxinit = 20))
```
```
## [+] MPIN estimation started
## |[1] Computing the range of layers : information layers from 1 to 8
## |[2] Computing initial parameter sets : using algorithm of Ersan (2016)
## |[=] Selecting initial parameter sets : max 20 initial sets per estimation
## |[3] Estimating the MPIN model : Expectation-Conditional Maximization algorithm
## |+++++++++++++++++++++++++++++++++++++| 100% of estimation completed [8 layer(s)]
## |[3] Selecting the optimal model : using lowest Information Criterion (BIC)
## [+] MPIN estimation completed
```
Compare the aggregate parameters obtained from the ML and ECM estimations.
```{r Example.2.3}
mpin_comparison <- rbind(ml_estimate@aggregates, ecm_estimate@aggregates)
rownames(mpin_comparison) <- c("ML", "ECM")
```
```{r Example.2.4, echo=F, eval=T}
cat("Probabilities of ML, and ECM estimations of the MPIN model\n")
print(mpin_comparison)
```
Display the summary of the model estimates for each number of layers.
```{r Example.2.5, eval=FALSE}
# Store the summary under a name that does not shadow base::summary()
model_summary <- getSummary(ecm_estimate)
print(model_summary)
```
```
## layers em.layers MPIN Likelihood AIC BIC AWE
## Model[1] 1 1 0.566 -3226.469 6462.9 6473.4 6508.9
## Model[2] 2 2 0.577 -800.379 1616.8 1633.5 1690.3
## Model[3] 3 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[4] 4 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[5] 5 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[6] 6 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[7] 7 4 0.575 -642.631 1313.3 1342.6 1441.9
## Model[8] 8 4 0.575 -642.631 1313.3 1342.6 1441.9
```
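The log of the ECM estimation states that the optimal model is the one with the lowest BIC. The base R sketch below reproduces that selection step on the values printed in the table above. It illustrates the rule only and is not the package's internal code; the data frame is typed in by hand from the table.

```r
# Values copied from the summary table above
bic_table <- data.frame(
  layers    = 1:8,
  em.layers = c(1, 2, 3, 3, 3, 3, 4, 4),
  BIC       = c(6473.4, 1633.5, 1332.0, 1332.0, 1332.0, 1332.0, 1342.6, 1342.6)
)

# which.min() returns the first row attaining the minimum, so the tie
# between Model[3] and Model[6] resolves to the model with the fewest
# initial layers
best <- bic_table[which.min(bic_table$BIC), ]
print(best)
```

The selected row has three layers, which agrees with the three information layers detected in the ML estimation earlier in this example.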
### Example 3: Estimate the Adjusted PIN model
---
We estimate the adjusted PIN model on the preloaded dataset `dailytrades` using `20` initial parameter sets computed by the algorithm of Ersan and Ghachem (2022b).
```{r Example.3.1, results=F}
estimate_adjpin <- adjpin(dailytrades, initialsets = "GE")
```
```
## [+] AdjPIN estimation started
## |[1] Computing initial parameter sets : 20 GE initial sets generated
## |[2] Estimating the AdjPIN model : Expectation-Conditional Maximization algorithm
## |+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
## [+] AdjPIN estimation completed
```
```{r Example.3.2}
show(estimate_adjpin)
```
### Example 4: Estimate the Volume-synchronized PIN model
---
We run a VPIN estimation on the preloaded dataset `hfdata` (100 000 observations) with a `timebarsize` of `5` minutes (`300` seconds).
```{r Example.4.1, results=F}
estimate.vpin <- vpin(hfdata, timebarsize = 300)
```
```
## [+] VPIN Estimation started.
## |-[1] Checking and preparing the data...
## |-[2] Creating 300-second timebars...[~1 seconds]
## |-[3] Calculating Volume Bucket Size (VBS) and Sigma(DP)...
## |-[4] Breaking up large 300-second timebars' volume...
## |-[5] Assigning 300-second timebars into buckets...
## |-[6] Balancing timebars and adjusting bucket sizes to VBS...
## |-[7] Calculating aggregate bucket data...
## |-[8] Calculating VPIN vector...
## [+] VPIN estimation completed
```
```{r Example.4.2, results = T}
show(estimate.vpin)
```
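The log above reports that trades are first grouped into 300-second timebars. As a rough illustration of what such a grouping means, the base R sketch below sums trade volume per 300-second interval. The toy trades and all variable names are hypothetical, and the sketch makes no claim about how `vpin()` builds its timebars internally.

```r
# Hypothetical toy trades: timestamps (seconds after 09:30:00) and volumes
start  <- as.POSIXct("2024-01-02 09:30:00", tz = "UTC")
trades <- data.frame(
  timestamp = start + c(10, 120, 299, 300, 450, 900),
  volume    = c(100, 50, 25, 200, 75, 30)
)

timebarsize <- 300  # seconds, i.e. 5 minutes

# Index of the 300-second interval each trade falls into
bar_id <- floor(as.numeric(trades$timestamp) / timebarsize)

# Total volume per timebar; intervals without trades simply do not appear
bar_volume <- as.vector(tapply(trades$volume, bar_id, sum))
print(bar_volume)
```

The first three trades fall in the 09:30 bar, the next two in the 09:35 bar, and the last one in the 09:45 bar.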
Plot the unweighted daily VPIN, stored in the variable `dvpin` of the dataframe held in the slot `@dailyvpin` of the object `estimate.vpin`.
```{r Example.4.3, dev='png'}
plot(estimate.vpin@dailyvpin$dvpin ~ seq_len(nrow(estimate.vpin@dailyvpin)),
     lwd = 1, type = "l", bty = "n", xlab = "day", ylab = "daily vpin",
     col = rgb(0.2, 0.4, 0.6, 0.8))
```
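For intuition about the quantity being plotted, VPIN over a window of n equal-volume buckets is usually defined as the sum of absolute buy/sell volume imbalances divided by n times the volume bucket size (VBS). The sketch below applies that definition to hypothetical toy buckets. It skips the timebar, classification and balancing steps listed in the log above, and `vpin_window()` is an illustrative name, not a package function.

```r
# Hypothetical toy data: buy and sell volume in five equal-sized buckets
vbs      <- 1000                        # volume bucket size (VBS)
buy_vol  <- c(600, 450, 700, 500, 300)
sell_vol <- vbs - buy_vol               # each bucket holds exactly VBS units

# VPIN over a window of buckets: total absolute order imbalance
# divided by the total volume in the window (number of buckets * VBS)
vpin_window <- function(buy, sell, vbs) {
  sum(abs(buy - sell)) / (length(buy) * vbs)
}

vpin_toy <- vpin_window(buy_vol, sell_vol, vbs)
print(vpin_toy)
```

A value of 0 means buys and sells are perfectly balanced in every bucket, while a value of 1 means every bucket is entirely one-sided.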
We run an IVPIN estimation on a subset (the first 10 000 observations) of the preloaded
dataset `hfdata`, with a `timebarsize` of `5` minutes (`300` seconds).
```{r Example.4.4, eval=FALSE}
estimate.ivpin <- ivpin(hfdata[1:10000,], timebarsize = 300)
```
```
## [+] IVPIN Estimation started.
## |-[1] Checking and preparing the data...
## |-[2] Creating 300-second timebars...[~1 seconds]
## |-[3] Calculating Volume Bucket Size (VBS) and Sigma(DP)...
## |-[4] Breaking up large 300-second timebars' volume...
## |-[5] Assigning 300-second timebars into buckets...
## |-[6] Balancing timebars and adjusting bucket sizes to VBS...
## |-[7] Calculating aggregate bucket data...
## |-[8] Finding ML estimates for the ivpin model parameters...
## |+++++++++++++++++++++++++++++++++++++| 100% of buckets treated
## |-[9] Calculating IVPIN vector...
## [+] IVPIN estimation completed
```
### Example 5: Estimate the AdjPIN model using aggregated high-frequency data
---
We use the preloaded high-frequency dataset `hfdata`, and prepare it for aggregation by deleting the variable `volume`.
```{r Example.5.1}
data <- hfdata
data$volume <- NULL
```
We classify the trades with the function `aggregate_trades()`, using the LR algorithm and a time lag of `500000` microseconds (`0.5 s`).
```{r Example.5.2, results=F}
daytrades <- aggregate_trades(data, algorithm = "LR", timelag = 500000)
```
```
## [+] Trade classification started
## |[=] Classification algorithm : LR algorithm
## |[=] Number of trades in dataset : 100 000 trades
## |[=] Time lag of lagged variables : 5e+05 microseconds
## |[1] Computing lagged variables : using parallel processing
## |+++++++++++++++++++++++++++++++++++++| 100% of variables computed
## |[=] Computed lagged variables : in 5.432 seconds
## |[2] Computing aggregated trades : using lagged variables
## [+] Trade classification completed
```
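To make the classification step more concrete, the sketch below shows a simplified version of the rule, assuming that "LR" refers to the Lee-Ready algorithm: a trade above the quote midpoint is a buy, a trade below it is a sell, and a trade at the midpoint is signed by the last non-zero price change (tick test). The function `lee_ready()` and the toy data are hypothetical, prices are in integer cents to avoid floating-point ties, and the quotes are taken as already lagged, so the `timelag` matching done by `aggregate_trades()` is not reproduced.

```r
# Hypothetical toy trades (prices in cents) with already-lagged quotes
price <- c(1002, 1000, 1001, 1001,  998)
bid   <- c(1000,  999, 1000, 1000,  998)
ask   <- c(1002, 1001, 1002, 1002, 1000)

lee_ready <- function(price, bid, ask) {
  mid <- (bid + ask) / 2
  # Tick test: sign of the price change, NA for the first trade
  tick <- sign(c(NA, diff(price)))
  # A zero tick inherits the direction of the previous trade
  for (i in seq_along(tick)[-1]) {
    if (!is.na(tick[i]) && tick[i] == 0) tick[i] <- tick[i - 1]
  }
  # Quote rule first; fall back to the tick test at the midpoint
  side <- ifelse(price > mid, 1, ifelse(price < mid, -1, tick))
  factor(side, levels = c(1, -1), labels = c("buy", "sell"))
}

side <- lee_ready(price, bid, ask)

# Aggregating the signed trades gives the buy and sell counts
counts <- table(side)
print(counts)
```

Counting buys and sells per day in this way yields the kind of daily trade data that the PIN-family models take as input.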
We use the resulting dataset to estimate the adjusted probability of informed trading with the two available estimation methods, i.e., the standard maximum-likelihood method and the Expectation-Conditional Maximization (ECM) algorithm.
```{r Example.5.4, results=F}
adjpin_ml <- adjpin(daytrades, method = "ML", initialsets = "GE")
```
```
## [+] AdjPIN estimation started
## |[1] Computing initial parameter sets : 20 GE initial sets generated
## |[2] Estimating the AdjPIN model : Maximum-likelihood Standard Estimation
## |+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
## [+] AdjPIN estimation completed
```
```{r Example.5.5, results=F}
adjpin_ecm <- adjpin(daytrades, method = "ECM", initialsets = "GE")
```
```
## [+] AdjPIN estimation started
## |[1] Computing initial parameter sets : 20 GE initial sets generated
## |[2] Estimating the AdjPIN model : Expectation-Conditional Maximization algorithm
## |+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
## [+] AdjPIN estimation completed
```
Compare the parameter estimates obtained from the ML and ECM estimations.
```{r Example.5.6, results=F}
adj.prob <- rbind(adjpin_ml@parameters[1:4], adjpin_ecm@parameters[1:4])
rownames(adj.prob) <- c("ML", "ECM")
```
```{r Example.5.7, echo=F, eval=T}
cat("Probability terms in ML and ECM estimations of the AdjPIN model\n")
print(adj.prob)
```
```{r Example.5.8, results=F}
adj.params <- rbind(adjpin_ml@parameters[5:10], adjpin_ecm@parameters[5:10])
rownames(adj.params) <- c("ML", "ECM")
```
```{r Example.5.9, echo=F, eval=T}
cat("Rate parameters of ML and ECM estimations of the AdjPIN model\n")
print(adj.params)
```
## Getting help
---
If you encounter a clear bug, please file an issue with a minimal
reproducible example on
[GitHub](https://github.com/monty-se/PINstimation/issues).