-
-
Notifications
You must be signed in to change notification settings - Fork 6
/
sample_datasets.rmd
271 lines (217 loc) · 9.94 KB
/
sample_datasets.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
---
title: "Sample datasets"
output:
rmarkdown::html_document:
toc_float: true
df_print: paged
description: >
This vignette describes how to create sample datasets fitting the user's preferences, and needs.
vignette: >
%\VignetteIndexEntry{Sample datasets}
\usepackage[utf8]{inputenc}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: inline
---
## Overview
---
This vignette describes various scenarios of creating sample datasets that fit the preferences
and needs of the users; and this for the different models. The functions in the package PINstimation
use two types of datasets: (1) A sequence of daily buys and sells (2) A high-frequency trading data.
This is the reason why only two sample datasets are preloaded with the package, namely `dailytrades`,
and `hfdata`. The users can also generate simulation data using the function `generatedata_mpin()`
for PIN, and MPIN models; and the function `generatedata_adjpin()` for the ADJPIN model. Below we
provide some scenarios of creating sample datasets both for the PIN, MPIN, and ADJPIN models.
```{r loading, echo = FALSE, results = 'hide', message=FALSE, warning=FALSE}
library(PINstimation)
```
```{r echo=FALSE}
options(crayon.enabled = TRUE)
knitr::knit_hooks$set(output = function(x, options){
paste0(
'<pre class="r-output"><code>',
fansi::sgr_to_html(x = htmltools::htmlEscape(x), warn = FALSE),
'</code></pre>'
)
})
```
## Sample datasets for the PIN model
---
The PIN model is an multilayer PIN model with a single information layer. We can, therefore,
use the function `generatedata_mpin()`, in order to generate sample data for the PIN model.
Generically, this is done as follows:
```r
generatedata_mpin(..., layers=1)
```
If the user would like to create a sample dataset for infrequently traded stock, she can
specify low values or ranges for the trade intensity rates. For instance, let's assume
that the user suspects that an infrequently-traded stock has an average of uninformed
trading intensity for buys and sells between `300` and `500`. They generate a single
sample dataset for this scenario as follows:
```{r sample.pin.1.1}
pindata <- generatedata_mpin(layers=1, ranges = list(eps.b=c(300, 500), eps.s=c(300,500)), verbose = FALSE)
```
The details of the generated sample dataset can be displayed with the following code
```{r sample.pin.1.2, results='markup'}
show(pindata)
```
You access the sequences of buys, and sells through the slot `@data` of the object
`pindata`.
```{r sample.pin.1.3, results='markup'}
show(pindata@data[1:10, ])
```
You can, now use the dataset object `pindata` to check the accuracy of the different
estimation functions. You can do that by comparing the actual parameters of
the sample datasets to the estimated parameters of the estimation functions. Let us start
with displaying the actual parameters of the sample datasets. These can be accessed
through the slot `@empiricals` of the dataset object, which stores the empirical
parameters computed from the sequences of buys and sells generated. Please refer to
the documentation of `generatedata_mpin()` for more information.
```{r sample.pin.1.4, results='markup'}
actual <- unlist(pindata@empiricals)
show(actual)
```
Estimate the PIN model using the function `pin_ea()`, and display the estimated
parameters
```{r sample.pin.1.5, results='markup'}
model <- pin_ea(data=pindata@data, verbose = FALSE)
estimates <- model@parameters
show(estimates)
```
Now calculate the absolute errors of the estimation method.
```{r sample.pin.1.6, results='markup'}
errors <- abs(actual - estimates)
show(errors)
```
## Sample datasets for the MPIN model
---
In contrast to the PIN model, the number of information layers is free. We can, therefore,
use the function `generatedata_mpin()` with the desired number of information layers, in
order to generate sample data for the MPIN model. We can also skip specifying the number
of layers, and the default setting will be used: the number of layers will be randomly
selected from the integer set from `1` to `5`. Generically, this is done as follows:
```r
generatedata_mpin(...)
```
If the user would like to create a sample dataset for frequently traded stock with two
information layers, she can set the argument layers to 2, and specify high values or ranges
for the trade intensity rates. For instance, let's assume that the user suspects that a
frequently-traded stock has an average of uninformed trading intensity for buys and sells
between `12000` and `15000`. They generate a single sample dataset for this scenario as follows:
```{r sample.mpin.1.1}
mpindata <- generatedata_mpin(layers=2, ranges = list(eps.b=c(12000, 15000), eps.s=c(12000,15000)), verbose = FALSE)
```
The details of the generated sample dataset can be displayed with the following code
```{r sample.mpin.1.2, results='markup'}
show(mpindata)
```
You access the sequences of buys, and sells through the slot `@data` of the object
`pindata`.
```{r sample.mpin.1.3, results='markup'}
show(mpindata@data[1:10, ])
```
You can, now use the dataset object `mpindata` to check the accuracy of the different
estimation functions, namely `mpin_ml()`, and `mpin_ecm()`. You can do that by comparing
the empirical PIN value derived from the sample dataset to the estimated PIN value of
the estimation functions. Let us start with displaying the empirical PIN value obtained
from the sample dataset. This value can be accessed through the slot `@emp.pin` of the
dataset object, which stores the empirical PIN value computed from the sequences of buys
and sells generated. Please refer to the documentation of `generatedata_mpin()` for more information.
```{r sample.mpin.1.4, results='markup'}
actualmpin <- unlist(mpindata@emp.pin)
show(actualmpin)
```
Estimate the MPIN model using the functions `mpin_ml()`, and `mpin_ecm`, and display the
estimated MPIN values.
```{r sample.mpin.1.5, results='markup'}
model_ml <- mpin_ml(data=mpindata@data, verbose = FALSE)
model_ecm <- mpin_ecm(data=mpindata@data, verbose = FALSE)
mlmpin <- model_ml@mpin
ecmpin <- model_ecm@mpin
estimates <- setNames(c(mlmpin, ecmpin), c("ML", "ECM"))
show(estimates)
```
Now calculate the absolute errors of both estimation methods.
```{r sample.mpin.1.6, results='markup'}
errors <- abs(actualmpin - estimates)
show(errors)
```
The function `generatedata_mpin()` can generate a `data.series` object that
contains a collection of `dataset` objects. For instance, the user can generate
a collection of 10 datasets, whose data sequences span 60 days, and contain 3
layers, and use it to check the accuracy of the MPIN estimation.
```{r sample.mpin.1.7, results='markup'}
size <- 10
collection <- generatedata_mpin(series = size, layers = 3, verbose = FALSE)
show(collection)
```
```{r sample.mpin.1.8, results='markup'}
accuracy <- devmpin <- 0
for (i in 1:size) {
sdata <- collection@datasets[[i]]
model <- mpin_ml(sdata@data, xtraclusters = 3, verbose=FALSE)
accuracy <- accuracy + (sdata@layers == model@layers)
devmpin <- devmpin + abs(sdata@emp.pin - model@mpin)
}
cat('The accuracy of layer detection: ', paste0(accuracy*(100/size),"%.\n"), sep="")
cat('The average error in MPIN estimates: ', devmpin/size, ".\n", sep="")
```
## Sample datasets for the ADJPIN model
---
The AdjPIN model is an extension of the PIN model that includes the possibility of
liquidity shocks. To obtain a sample dataset distributed according to the assumptions
of the AdjPIN model, users can use the function `generatedata_adjpin()`. Generically,
this is done as follows:
```r
generatedata_adjpin(...)
```
If the user desires to create 10 sample datasets for frequently traded stock, they can
specify high values or ranges for the trade intensity rates. For instance, let's assume
that the user suspects that a frequently-traded stock has an average of uninformed trading
intensity for buys and sells between `10000` and `15000`.
```{r sample.adjpin.1.1}
adjpindatasets <- generatedata_adjpin(series = 10, ranges = list(eps.b=c(10000, 15000), eps.s=c(10000,15000)), verbose = FALSE)
```
The details of the generated sample data series can be displayed with the following code:
```{r sample.adjpin.1.2, results='markup'}
show(adjpindatasets)
```
You access the first dataset from `adjpindata` using this code:
```{r sample.adjpin.1.3, results='markup'}
adjpindata <- adjpindatasets@datasets[[1]]
show(adjpindata)
```
You can, now use the dataset object `adjpindata` to check the accuracy of the different
estimation functions, namely MLE, and ECM algorithms. You can do that by comparing
the empirical adjpin, and psos values derived from the sample dataset to the estimated
adjpin, and psos values obtained from the estimation functions. Let us start with
displaying the empirical adjpin, and psos values obtained from the sample dataset. These
values can be accessed through the slot `@emp.pin` of the dataset object, which stores
the empirical adjpin/psos value computed from the sequences of buys and sells generated.
Please refer to the documentation of `generatedata_adjpin()` for more information.
```{r sample.adjpin.1.4, results='markup'}
actualpins <- unlist(adjpindata@emp.pin)
show(actualpins)
```
Estimate the AdjPIN model using `adjpin(method="ML")`, and `adjpin(method="ECM"`, and
display the estimated adjpin/psos values.
```{r sample.adjpin.1.5, results='markup'}
model_ml <- adjpin(data=adjpindata@data, method = "ML", verbose = FALSE)
model_ecm <- adjpin(data=adjpindata@data, method = "ECM", verbose = FALSE)
mlpins <- c(model_ml@adjpin, model_ml@psos)
ecmpins <- c(model_ecm@adjpin, model_ecm@psos)
estimates <- rbind(mlpins, ecmpins)
colnames(estimates) <- c("adjpin", "psos")
rownames(estimates) <- c("ML", "ECM")
show(estimates)
```
Now calculate the absolute errors of both estimation methods.
```{r sample.adjpin.1.6, results='markup'}
errors <- abs(estimates - rbind(actualpins, actualpins))
show(errors)
```
## Getting help
---
If you encounter a clear bug, please file an issue with a minimal
reproducible example on
[GitHub](https://github.com/monty-se/PINstimation/issues).