-
Notifications
You must be signed in to change notification settings - Fork 11
/
data-combined.Rmd
314 lines (237 loc) · 11.7 KB
/
data-combined.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
---
title: "Working with `DataCombined` class"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Working with `DataCombined` class}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = FALSE,
message = FALSE,
comment = "#>",
out.width = "100%",
fig.height = 6,
fig.width = 8,
fig.showtext = TRUE
)
```
## Introduction
In Modeling and Simulation (M&S) workflows, we often need to work with multiple
observed and/or simulated datasets. Some of these datasets can be linked
together in a group (e.g., the simulated and the corresponding observed data)
for analysis and visualization.
The `DataCombined` class in `{ospsuite}` provides a container to store
data, to group them, and to transform them (e.g. scale or offset). The
class accepts data coming from `SimulationResults` or as a `DataSet` object
(see [Observed data](observed-data.html)).
Let's first generate some simulation results and load observed data.
```{r}
library(ospsuite)
# Simulation results
simFilePath <- system.file("extdata", "Aciclovir.pkml", package = "ospsuite")
sim <- loadSimulation(simFilePath)
simResults <- runSimulations(sim)[[1]]
outputPath <- "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)"
# observed data
obsData <- lapply(
c("ObsDataAciclovir_1.pkml", "ObsDataAciclovir_2.pkml", "ObsDataAciclovir_3.pkml"),
function(x) loadDataSetFromPKML(system.file("extdata", x, package = "ospsuite"))
)
# Name the elements of the list according to the names of the loaded `DataSet`.
names(obsData) <- lapply(obsData, function(x) x$name)
```
Typically, `DataCombined` is used to create figures using the `plotXXX()` functions.
The different plot types are described in [Visualizations with `DataCombined`](data-combined-plotting.html).
To visualize the different functionalities of `DataCombined` in this document,
we will use the `plotIndividualTimeProfile()` function.
## Creating `DataCombined` object
First, we create a new instance of `DataCombined` class.
```{r}
myDataCombined <- DataCombined$new()
```
Then we add simulated results to this object using `$addSimulationResults()`:
```{r}
myDataCombined$addSimulationResults(
simulationResults = simResults,
quantitiesOrPaths = outputPath,
names = "Aciclovir Plasma",
groups = "Aciclovir PVB"
)
```
Next we add observed data to this object using `$addDataSets()`:
```{r}
myDataCombined$addDataSets(
obsData$`Vergin 1995.Iv`,
groups = "Aciclovir PVB"
)
```
Every data, be it simulated results or from `DataSet`, must have a unique name
within `DataCombined`. If not specified by user, the path of simulated results
or the `$name` property of the `DataSet` are used as the name. Alternatively, we
can define the name when adding the data, as in the above example adding simulated
results.
There are a few things to keep in mind here:
- It doesn't matter in which order observed or simulated datasets are added.
- You can add multiple `DataSet` objects in `$addDataSets()` method call.
- You can add only a single instance of `SimulationResults` in `$addSimulationResults()` method call.
- If you add a dataset with the same name as existing (e.g., adding simulation results
from multiple simulations and not specifying the name, so the path of the results
is used as the name), the new data will replace the existing.
## Grouping
Since the `DataCombined` object can store many datasets, some of them may naturally form a grouping and you would wish to track this in the object. Both `$addDataSets()` and `$addSimulationResults()` allow group specification via `groups` argument.
When being plotted, data sets without a grouping will appear with distinct color,
line style (for simulated results), or symbol style (for observed data) with a
separate entry in the legend, with the name of the data set being the legend entry.
Though it is possible to group data with different dimensions, it makes sense to
only group data sets that have the same dimension (or dimensions that can be
transformed into each other, like `Amount` and `Mass`) in order to be able to compare the
data. Grouping data sets with different dimensions most likely will result in an
error when trying to create plots or calculating residuals with such a `DataCombined`.
Let's create a `DataCombined` and add simulation results and observed data *without*
specifying their grouping and create a time profile:
```{r}
myDataCombined <- DataCombined$new()
myDataCombined$addSimulationResults(
simulationResults = simResults,
quantitiesOrPaths = outputPath,
names = "Aciclovir Plasma"
)
myDataCombined$addDataSets(
obsData$`Vergin 1995.Iv`
)
plotIndividualTimeProfile(dataCombined = myDataCombined)
```
If you do not specify `groups` when you add datasets, and wish to update
groupings later, you can use the `$setGroups()` method. All data within one
group will get one entry in the legend with the name of the group, and will be
plotted with the same color.
```{r}
myDataCombined$setGroups(
names = c("Aciclovir Plasma", obsData$`Vergin 1995.Iv`$name),
groups = c("Aciclovir PVB", "Aciclovir PVB")
)
plotIndividualTimeProfile(dataCombined = myDataCombined)
```
At any point, you can check the current names and groupings with the following active field:
```{r}
myDataCombined$groupMap
```
## Transformations
Sometimes the raw data included in `DataCombined` needs to be transformed using specified
offset and scale factor values.
This is supported via `$setDataTransformations()` method, where you can specify the names of datasets, and offsets and scale factors. If these arguments are scalar (i.e., of length 1), then the same value will be applied to all specified names. Scale factors are unitless values by which raw data values will
be multiplied, offsets are added to the raw values without any conversion and must
therefore be in the same units as the raw data.
The internal data frame in `DataCombined` will be transformed with the specified parameters, with the new columns computed as:
- For x values: `newXValue = (rawXValue + xOffset) * xScaleFactor`
- For y values: `newYValue = (rawYValue + yOffset) * yScaleFactor`
- For arithmetic error: `newErrorValue = rawErrorValue * abs(yScaleFactor)`
- For geometric error: no transformation.
At any point, you can check the applied offsets and scale factors with the following active field:
```{r}
myDataCombined$dataTransformations
```
Now lets take a closer look on possible situations where you would apply any
transformations. Consider the observed data from the examples above. It reports
concentrations of aciclovir after single intravenous administration.
```{r}
myDataCombinedTranformations <- DataCombined$new()
myDataCombinedTranformations$addDataSets(
obsData$`Vergin 1995.Iv`
)
plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
```
However, we might want to use this data set with a simulation where aciclovir is
administered 24 hours after simulation begin. To be able to compare simulation
results with the data set, we can *offset* the observed data time by 24 hours.
Keep in mind that the offset must be given in the same unit as the data set values are.
```{r}
# Check the units of the observed time values
obsData$`Vergin 1995.Iv`$xUnit
myDataCombinedTranformations$setDataTransformations(
forNames = obsData$`Vergin 1995.Iv`$name,
xOffsets = 24
)
plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
```
In the next step, we want to *normalize* observed concentrations to a dose. We
can easily achieve this with the scale factor. In the next example, we normalize
observed values to 250 mg dose by setting the `yScaleFactor` to 1/250:
```{r}
myDataCombinedTranformations$setDataTransformations(
forNames = obsData$`Vergin 1995.Iv`$name,
yScaleFactors = 1 / 250
)
plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
```
Finally, offsetting the observation values might be useful when working with measurements
of endogenous substrates, such as the hormone glucagon, and want to correct for the
individual specific baseline levels of the hormone.
## Extracting a combined data frame
The data frame (also sometimes called as a table) data structure is central to R-based workflows, and, thus, we may wish to extract a data frame for datasets present in the object.
Internally, `DataCombined` extracts data frames for observed and simulated datasets and combines them.
```{r}
myDataCombined$toDataFrame()
```
This function returns a [tibble data frame](https://r4ds.had.co.nz/tibbles.html#tibbles-vs.-data.frame). If you wish to modify how it is printed, you can have a look at the available options [here](https://pillar.r-lib.org/reference/pillar_options.html). In fact, let's change a few options and print the data frame again.
```{r, eval=FALSE}
options(
pillar.width = Inf, # show all columns
pillar.min_chars = Inf # to turn off truncation of column titles
)
myDataCombined$toDataFrame()
```
```{r, echo=FALSE}
# change these settings only temporarily
withr::with_options(
list(
pillar.width = Inf, # show all columns
pillar.min_chars = Inf # to turn off truncation of column titles
),
code = {
print(myDataCombined$toDataFrame())
}
)
```
`{ospsuite}` also provides a few helper functions to modify the data frame further.
When multiple (observed and/or simulated) datasets are present in a data frame, they are likely to have different units. `convertUnits()` function helps to convert them to a common unit. The function
will not modify the `DataCombined` object but return a new data frame with unified
units.^[Note that if you are using `DataCombined` object for plotting functions, you don't need to this conversion; the plotting functions will take care of this internally.].
```{r}
convertUnits(
myDataCombined,
xUnit = ospUnits$Time$s,
yUnit = ospUnits$`Concentration [mass]`$`µg/l`
)
```
## Further functionalities
Grouping of simulated results with observed data inside `DataCombined` also allows
to calculate the error between simulated results and observations using the
`calculateResiduals()` function. The function calculates the distance between
each observed data point within the group and the simulated results. Simulation
results interpolated if no simulated value is available for a certain observed
time point. Observed data beyond simulated time is ignored, i.e., no *extrapolation*
is performed.
There are two ways to calculate the distance (residuals) - linear and logarithmic -
specified by the `scaling` argument. For linear scaling (`scaling = tlf::Scaling$lin`):
> Residuals are calculated as: Simulation value - Observed value. This means that the residuals are defined by absolute differences.
For logarithmic scaling (`scaling = tlf::Scaling$log`):
> Residuals are calculated as: log(Simulation value) - log(Observed value) = log (Simulation Value / Observed Value). This means that the ratio of values is considered which is independent of the magnitude of the value. But for very small observed values, in particular close to 0 values, this can lead to problems, because log(10E-N) = -N can becomes large.
```{r}
# Linear residuals
calculateResiduals(myDataCombined, scaling = tlf::Scaling$lin)$residualValues
# Logarithmic residuals
calculateResiduals(myDataCombined, scaling = tlf::Scaling$log)$residualValues
```
To quickly calculate the total error of the `DataCombined`, one can sum up the
absolute values of the residuals:
```{r}
# Linear residuals
totalError <- sum(abs(calculateResiduals(myDataCombined, scaling = tlf::Scaling$lin)$residualValues))
print(totalError)
```
## Visualizations with `DataCombined`
See [Visualizations with `DataCombined`](data-combined-plotting.html) describing functions that can visualize data stored in `DataCombined` object.