# hvdvm and vvm Class Usage
The `hvdvm` and `vvm` classes process the raw image samples in different ways, and correspondingly their output must also be interpreted differently. From the implementation point of view, this means they have very different `digest()` functions, and the data in their `var.df` and `cov.df` variables have different meanings. However, from an operational point of view, everything else in these classes works in the same way and with the same semantics. Of course, everything derived from their data, such as a plot or a fitted model, must be interpreted considering its source class.
This vignette describes the functionality common to the `hvdvm` and `vvm` classes. We encourage you to first read the "hvdvm class usage" or the "vvm class usage" vignette, because this one is a follow-up to both of them.
In the examples, instead of the `my.hvdvm` or the `my.vvm` object, when possible, we will use `my.obj` to represent an object that can be an instance of either the `hvdvm` or the `vvm` class.
In the code examples, the function output is presented prefixed with `#>`, while a single prefix `#` represents real (user) comments.
- Getting Help About the Class Components
- Introduction to Model Fitting
- Fitting a Model
- Plotting
- Persistence
## Getting Help About the Class Components

In the following sections we will focus on the concepts and general workflow using the `hvdvm` or the `vvm` class. If you want more complete information about technical details, such as which parameters are available for a given function, please use the R help system. In this section we will see how to get help about these classes' components.
To get the documentation of the `new()` function in the `hvdvm` class, one would expect either of the following equivalent R commands to work:
?hvdvm$new
help('hvdvm$new')
However, the first one won't work and will throw an error; the second one works just fine. The first form `?hvdvm$new` fails because the use of R6 classes in the style followed by the `imgnoiser` package is rather new, and the R help system is not ready for it yet. To get the help documentation, use the second form, as in `help('hvdvm$new')`.
If you don't recall which functions are available in a class instance, just "print" it and you will get a list of the class members.
my.vvm <- vvm$new()
# This command will 'print' the instance
my.vvm
#> <vvm>
#> function:
#> digest()
#> digest.rgb()
#> exists.model()
#> fit.model()
#> get.model()
#> get.model.predictions()
#> initialize()
#> plot()
#> print()
#> print.model.summary()
#> remove.model()
#> variable:
#> channel.labels
#> cov.df
#> green.channels
#> merged.var.cov.df
#> model.list
#> RGGB.indices
#> var.df
#> wide.var.df
Now if, for example, you want to know more details about the `fit.model()` function in the above list, you can use:
help('vvm$fit.model')
If you don't have a class instance, you can easily get an overview of the `hvdvm` class components with the following command.
?hvdvm.class
You will get a list of the class functions with links to more detailed documentation of the `hvdvm` class components. The same applies to the `vvm` class.
## Introduction to Model Fitting

The notion of a model in the `imgnoiser` package involves two concepts:
- Model Data: The specific data which will be used as the observations over which we want to fit a model.
- Model Family: The mathematical model we want to fit over the data mentioned in the previous point.
By default, only one model data source exists, and it is named `std.var`. This data model comes from the noise variance result we get after calling the `digest()` function of either the `vvm` or the `hvdvm` class. In particular, the `std.var` data model contains data from the `var.df` variable, where `mean` is the predictor variable (x) and `var` is the predicted one (y). These observations are grouped by `channel`.
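For instance, a quick look at the columns behind the `std.var` data model might look like this (a sketch; the exact rows depend on your image samples, and the column selection is just illustrative):

```r
# Peek at the var.df columns feeding the 'std.var' data model:
# 'mean' is the predictor (x), 'var' the predicted variable (y),
# and 'channel' the grouping factor.
head(my.obj$var.df[c('channel', 'mean', 'var')])
```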
You can add more model data sources at your will through the corresponding package options.
When a function requires the name of a model data source as an argument, it will by default use the `fit.model.data` option value which, as you might guess, is `std.var` by default. Unless you provide more models, leave the default value for those `model.data.name` arguments, as we will do in the following examples.
We can choose, among different statistical models, the one we want to fit to the data source. These models are grouped by the R function that actually performs the fitting.
By default, the following model families are available:

- `lm` (from the `stats` package)
- `lmrob` (from the `robustbase` package)
- `smooth.spline` (from the `stats` package)
As these fitting functions allow a wide variety of options, in fact directing them toward different mathematical models or fitting procedures, we have a broader range of options than it may seem at first glance. For example, you can use `lm()` with weights.
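To make concrete what such a family amounts to, here is a minimal, self-contained sketch (with made-up numbers) of the kind of fit the `lm` family performs with a second degree polynomial:

```r
# Hypothetical data shaped like the 'std.var' data model (mean, var)
df <- data.frame(mean = c(10, 50, 200, 800, 3200),
                 var  = c(5.1, 24.3, 95.0, 390.2, 1620.7))

# The 'lm' family with degree = 2 boils down to a direct call like:
fit <- lm(var ~ mean + I(mean^2), data = df)
coef(fit)  # intercept, linear and quadratic coefficients
```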
When a function requires a model family as an argument, it will by default use the `fit.model.family` option value, which by default is `lm`.
You can add more fitting functions to your taste by changing the `fit.model.family` option value.
## Fitting a Model

We can start by fitting a simple OLS (ordinary least squares) line. To this effect we will leave the default values for all the `my.obj$fit.model()` function arguments:
# Fit the default 'lm' model and save it with the default 'standard' name.
my.obj$fit.model()
#> The model "standard" was succesfully fitted using:
#> lm(formula = var ~ mean + I(mean^2), data = model.src("std.var"))
The output after the model fitting mentions the model family (`lm`), the formula (`formula`) and the source data (`data = model.src("std.var")`). If there were other parameters in the call to the `lm()` function, they would also appear in this output. Everything sent to the fitting function appears in this output; nothing is sent "under the hood".
Notice we did not supply a formula; it has been inferred from the source data model variables and from the default value of the `degree` argument of that function, which is `1` for the `hvdvm` class and `2` for the `vvm` one. This argument indicates that the predicted variable `var` is modeled by a first or second degree polynomial on `mean`. If you set `degree = 3`, a cubic polynomial on `mean` will be fitted to predict the `var` variable; with `4` you will fit a bi-quadratic one, and so on.
Instead of `degree`, you can directly set the `formula` argument with the specific formula you want to fit. When you specify the formula, the `degree` argument is silently ignored.
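As a sketch (the `cubic` model name here is just an example), passing an explicit formula could look like this:

```r
# Fit a cubic polynomial by giving the formula directly;
# any 'degree' argument would be silently ignored.
my.obj$fit.model(model.name = 'cubic',
                 formula = var ~ mean + I(mean^2) + I(mean^3))
```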
Another default argument in the call to the `my.obj$fit.model()` function we just made is the name for the model we are fitting (the `model.name` parameter). By default the function uses the `fit.model.name` option value, which by default is `standard`. We can call the `fit.model()` function again with a `model.name` argument different from `standard`, so as not to overwrite the model we just fitted, or we can omit this argument again to overwrite that model.
Let's fit a robust linear model using as weights the reciprocal of the squared mean values:
# This time we will set the model name and model family parameters
my.obj$fit.model(model.name = 'weighted', model.family = 'lmrob', weights = 1/mean^2)
#> The model "weighted" was succesfully fitted using:
#> lmrob(formula = var ~ mean + I(mean^2), data = model.src("std.var"), weights = 1/mean^2)
Notice how we specify the `weights` argument for the `lmrob` function using the variable names in our model. All the parameters that don't correspond to the `my.obj$fit.model()` function are passed directly to the fitting function. This allows the use of any and every parameter available for them.
We can get the predictions from the `standard` model we just fit by calling the `my.obj$get.model.predictions()` function with the default arguments:
# Get a data frame with the model predictions for the 'standard' model
preds <- my.obj$get.model.predictions()
head(preds)
mean | var | channel | sderr | lcl | ucl |
---|---|---|---|---|---|
5.317 | 2.937 | Red | 10.756 | -163.03 | 168.9 |
10.306 | 5.246 | Red | 10.716 | -160.71 | 171.2 |
19.976 | 9.723 | Red | 10.639 | -156.20 | 175.6 |
38.717 | 18.406 | Red | 10.494 | -147.47 | 184.3 |
The columns in the resulting data frame are:

- `mean`: The predictor variable.
- `var`: The predicted variable.
- `channel`: The grouping factor.
- `sderr`: The prediction standard error.
- `lcl`: The prediction interval lower limit.
- `ucl`: The prediction interval upper limit.
The confidence limits are computed, at the moment we fit the model with the `fit.model()` function, using the `confid.factor` option value as the "confidence factor". By default this factor is `1.96`. You should set this option to the value that suits you best before calling the `fit.model()` function. However, given the predicted value and its standard error (`sderr`) in this function's output, you can compute the limits again by yourself, as in the following example.
# Confidence limits using 2.5 as confidence factor
preds$lcl <- preds$var - 2.5*preds$sderr
preds$ucl <- preds$var + 2.5*preds$sderr
head(preds)
mean | var | channel | sderr | lcl | ucl |
---|---|---|---|---|---|
15.58 | 12.03 | Red | 3.40 | 3.52 | 20.54 |
27.05 | 17.13 | Red | 3.37 | 8.70 | 25.55 |
46.97 | 25.98 | Red | 3.31 | 17.69 | 34.26 |
81.55 | 41.37 | Red | 3.22 | 33.31 | 49.43 |
141.59 | 68.17 | Red | 3.09 | 60.45 | 75.88 |
245.82 | 114.92 | Red | 2.92 | 107.63 | 122.21 |
These model predictions span the complete range of predictor values in the original data source:
# Get the predictor range from predictions
pred.ranges <- t(sapply(split(preds$mean, preds$channel), range))
# Get the predictor range from source data
src.ranges <- t(sapply(split(my.obj$var.df$mean, my.obj$var.df$channel), range))
# Compare source and prediction ranges
data.frame('pred.bot' = pred.ranges[,1], 'pred.top' = pred.ranges[,2],
'src.bot' = src.ranges[,1], 'src.top' = src.ranges[,2])
pred.bot | pred.top | src.bot | src.top | |
---|---|---|---|---|
Red | 15.58 | 6733 | 15.58 | 6733 |
Green R | 36.35 | 14967 | 36.35 | 14967 |
Green B | 35.49 | 15018 | 35.49 | 15018 |
Blue | 28.12 | 12258 | 28.12 | 12258 |
Green Avg | 35.92 | 14992 | 35.92 | 14992 |
There are 30 predictions per channel in this function's results. When possible, 20 of them are equally spaced in the range of the predictor source data, plus 10 of them logarithmically spaced in that range. If that is not possible, because there are zero or negative values in the predictor source range, there will be 30 equally spaced predictor values per channel. This makes the predicted data suitable for nice plotting, which we will do later.
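The package's exact implementation is not shown here, but such a grid of predictor values could be built along these lines (a sketch with an assumed range):

```r
# Build 30 predictor values over an assumed channel range:
# 20 equally spaced plus 10 logarithmically spaced points.
rng <- c(15.58, 6733)  # e.g. the 'Red' channel source range
lin.pts <- seq(rng[1], rng[2], length.out = 20)
log.pts <- exp(seq(log(rng[1]), log(rng[2]), length.out = 10))
pred.grid <- sort(c(lin.pts, log.pts))
length(pred.grid)  # 30 values spanning the source range
```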
To get the predictions from a fitted model not named `standard`, such as the `weighted` model we fit before, we just have to mention its name in the call to the `my.obj$get.model.predictions()` function.
# Get a data frame with the model predictions for the 'weighted' model
my.obj$get.model.predictions('weighted')
mean | var | channel | sderr | lcl | ucl |
---|---|---|---|---|---|
15.58 | 9.03 | Red | 0.14 | 8.75 | 9.30 |
27.05 | 14.22 | Red | 0.12 | 13.97 | 14.46 |
46.97 | 23.24 | Red | 0.11 | 23.01 | 23.46 |
81.55 | 38.91 | Red | 0.15 | 38.62 | 39.21 |
141.59 | 66.20 | Red | 0.28 | 65.65 | 66.76 |
245.82 | 113.77 | Red | 0.54 | 112.72 | 114.83 |
To get the model summary, we call the `my.obj$print.model.summary()` function. As the source data is grouped by `channel`, there is a model fitted for each one of them. This is reflected in the summary, which correspondingly shows one sub-summary per channel.
# Print the summary of the 'standard' model
my.obj$print.model.summary()
#> #-------
#> # Model Name: "standard"
#> #
#> # Call:
#> # lm(formula = var ~ mean + I(mean^2), data = model.src("std.var"))
#> #-------
#>
#> ##-- channel : "Red" --##
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -197.46 -4.01 -2.54 2.35 260.82
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 5.12e+00 3.45e+00 1.48 0.14
#> mean 4.43e-01 5.49e-03 80.73 <2e-16
#> I(mean^2) 1.29e-05 9.85e-07 13.12 <2e-16
#>
#> Residual standard error: 41.4 on 237 degrees of freedom
#> Multiple R-squared: 0.997, Adjusted R-squared: 0.997
#> F-statistic: 4.28e+04 on 2 and 237 DF, p-value: <2e-16
#>
#> ##-- channel : "Green R" --##
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -450.7 -6.7 -0.1 3.2 441.7
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.91e-01 7.05e+00 -0.03 0.98
#> mean 4.13e-01 5.05e-03 81.76 <2e-16
#> I(mean^2) 6.16e-06 4.07e-07 15.15 <2e-16
#>
#> Residual standard error: 84.6 on 237 degrees of freedom
#> Multiple R-squared: 0.997, Adjusted R-squared: 0.997
#> F-statistic: 4.61e+04 on 2 and 237 DF, p-value: <2e-16
#>
#> ##-- channel : "Green B" --##
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -416.2 -4.1 -0.9 2.2 381.1
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2.30e+00 5.84e+00 0.39 0.69
#> mean 4.06e-01 4.16e-03 97.69 <2e-16
#> I(mean^2) 6.94e-06 3.34e-07 20.79 <2e-16
#>
#> Residual standard error: 70 on 237 degrees of freedom
#> Multiple R-squared: 0.998, Adjusted R-squared: 0.998
#> F-statistic: 6.88e+04 on 2 and 237 DF, p-value: <2e-16
#>
#> ##-- channel : "Blue" --##
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -459.9 -7.0 6.3 10.5 429.2
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.02e+01 6.20e+00 -1.64 0.1
#> mean 5.05e-01 5.45e-03 92.58 < 2e-16
#> I(mean^2) 2.81e-06 5.37e-07 5.24 3.6e-07
#>
#> Residual standard error: 74.6 on 237 degrees of freedom
#> Multiple R-squared: 0.997, Adjusted R-squared: 0.997
#> F-statistic: 4.7e+04 on 2 and 237 DF, p-value: <2e-16
#>
#> ##-- channel : "Green Avg" --##
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -290.9 -4.7 0.8 4.9 363.5
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -7.53e-01 6.07e+00 -0.12 0.9
#> mean 4.14e-01 4.33e-03 95.46 <2e-16
#> I(mean^2) 6.83e-06 3.49e-07 19.61 <2e-16
#>
#> Residual standard error: 72.8 on 237 degrees of freedom
#> Multiple R-squared: 0.998, Adjusted R-squared: 0.998
#> F-statistic: 6.49e+04 on 2 and 237 DF, p-value: <2e-16
Because we did not specify a model name in the call to the `my.obj$print.model.summary()` function, it has returned the `standard` model summary. To get the summary for the `weighted` model we fitted before, we must call the `my.obj$print.model.summary()` function specifying that model name.
By default this function prints the summary for each fitted `channel`. You can use the `select` parameter to choose the specific model summaries you want, picking them by index value (between 1 and 5) or by `channel` label. For example, let's get the summary for the `Blue` channel in the `weighted` model:
my.obj$print.model.summary('weighted', select='Blue')
#> #-------
#> # Model Name: "weighted"
#> #
#> # Call:
#> # lmrob(formula = var ~ mean + I(mean^2), data = model.src("std.var"), weights = 1/mean^2)
#> #-------
#>
#> ##-- channel : "Blue" --##
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -561.250 -4.010 0.127 5.529 369.654
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.71e+00 2.42e-01 7.07 1.8e-11
#> mean 4.73e-01 2.54e-03 186.69 < 2e-16
#> I(mean^2) 6.58e-06 7.37e-07 8.93 < 2e-16
#>
#> Robust residual standard error: 0.0234
#> Multiple R-squared: 0.997, Adjusted R-squared: 0.997
#> Convergence in 11 IRWLS iterations
#>
#> Robustness weights:
#> 15 weights are ~= 1. The remaining 225 ones are summarized as
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.323 0.889 0.951 0.912 0.985 0.999
#> Algorithmic parameters:
#> tuning.chi bb tuning.psi refine.tol rel.tol solve.tol
#> 1.55e+00 5.00e-01 4.69e+00 1.00e-07 1.00e-07 1.00e-07
#> nResample max.it best.r.s k.fast.s k.max
#> 500 50 2 1 200
#> maxit.scale trace.lev mts compute.rd fast.s.large.n
#> 200 0 1000 0 2000
#> psi subsampling cov
#> "bisquare" "nonsingular" ".vcov.avar1"
#> seed : int(0)
This function calls the `base::summary()` method corresponding to the model family used for the fitting. Those functions accept other parameters, which you can also specify when calling the `my.obj$print.model.summary()` function.
The R functions that actually fit the models, like `lm()` or `lmrob()`, return an object which can be used in other R functions supporting them. For example, the `car::avPlots()` function constructs added-variable (also called partial-regression) plots, an analysis tool to check the fitness of each component in a linear model. We can use that function to find out if the linear and quadratic `mean` components in the `vvm` models fit well. However, to be able to do that, we need the object returned by the fitting functions mentioned above.
The `get.model()` function returns a list with the model objects for every fitted channel in a given model. As with the `print.model.summary()` function, we can select (`select` parameter) the channel(s) whose model we want and, as usual, if we don't provide a model name, the one in the `fit.model.name` option value will be used.
# Get a list with the 'standard' model fitting objects
std.model <- my.obj$get.model()
class(std.model)
#> [1] "list"
names(std.model)
#> [1] "Red" "Green R" "Green B" "Blue" "Green Avg"
Now we can use the `Red` channel model object to check it with the `avPlots()` function.
car::avPlots(std.model[['Red']])
The `my.obj$get.model()` function has a `select` parameter to get a list with only selected models, i.e. for specific channels. We can select the channels by label or by index. For example, we will use the `confint()` function to get the confidence intervals for the coefficients in the regression for the `Red` channel.
# Confidence intervals for the 'Red' model parameters (from the 'standard' fitting)
models <- my.obj$get.model(select='Red')
confint(models[[1]])
2.5 % | 97.5 % | |
---|---|---|
(Intercept) | 6.56 | 12.59 |
mean | 0.43 | 0.43 |
The `my.obj$model.list` variable provides a list of the models already fitted.
# List the models fitted so far.
my.obj$model.list
#> $standard
#> [1] "lm(formula = var ~ mean, data = model.src(\"std.var\"))"
#>
#> $weighted
#> [1] "lmrob(formula = var ~ mean, data = model.src(\"std.var\"), weights = 1/mean^2)"
The `my.obj$exists.model()` function tests whether a model with a given name already exists.
# We have a model named 'weighted'
my.obj$exists.model('weighted')
#> lmrob(formula = var ~ mean, data = model.src("std.var"), weights = 1/mean^2)
#> [1] TRUE
# But we don't have another one called 'log-log'
my.obj$exists.model('log-log')
#> The model "log-log" does not exists.
#> [1] FALSE
To remove a model, we call the `my.obj$remove.model()` function. In this case, for safety reasons, there is no default model name; you must specify it.
# We have a model named 'standard' ...
my.obj$exists.model('standard')
#> lm(formula = var ~ mean, data = model.src("std.var"))
#> [1] TRUE
# .. but we don't want it any more
my.obj$remove.model('standard')
#> The model with name "standard" has been removed.
# Now, it doesn't exist
my.obj$exists.model('standard')
#> The model "standard" does not exists.
#> [1] FALSE
## Plotting

There is nothing like a good plot to let the data speak for itself, showing us its features, validating our assumptions, or raising new questions.
With the data we get through the `my.obj$var.df`, `my.obj$cov.df` and `hvdvm$photo.conditions.df` variables, and the `my.obj$get.model.predictions()` function, we can build any kind of plot we want. However, the `my.obj$plot()` function provides an easy way to plot the data and the predictions of an already fitted model, either together or individually.
By default, without any argument, the `my.obj$plot()` function just plots the standard data model observations. Using the `model.name` parameter, we can include a fitted model in the plot. As a convenience, if we set this parameter to `TRUE`, the default model is plotted. As we already saw, this is the model named after the `fit.model.name` option value, which by default is `standard`.
# Rebuild the previously deleted 'standard' model
my.obj$fit.model()
#> The model "standard" was succesfully fitted using:
#> lm(formula = var ~ mean, data = model.src("std.var"))
# Plot the 'standard' model
my.obj$plot(model.name = TRUE)
For the `model.name` parameter, instead of `TRUE`, you can use the name of the fitted model you want to plot.
The `fit` and `confid` parameters are relevant only when a fitted model is included in the plot. Setting `fit` to `TRUE` draws a line over the values predicted by the model; setting `confid` to `TRUE` draws a semi-transparent area between the confidence limits of the predicted values. By default they are set to `TRUE` and `FALSE`, respectively.
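For example (a sketch, assuming the `weighted` model from the previous examples has been fitted), the confidence band can be turned on like this:

```r
# Plot the 'weighted' model predictions with the fitted line
# and the semi-transparent confidence band.
my.obj$plot(model.name = 'weighted', fit = TRUE, confid = TRUE)
```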
Using the `xlim` and `ylim` parameters, you can force the `my.obj$plot()` function to draw only the particular area of the plot you are interested in. Each must be a vector of two numeric values: the lower and upper limits of the corresponding axis.
You can also specify the axis labels with the `xlab` and `ylab` parameters. Furthermore, you can set the main and secondary titles with the `main` and `subt` parameters. For further customization you can set the `plot.point.size` and `plot.line.width` option values, which are used for the size of the points and the width of the lines in the plot. The `color.pallette` option is a vector with the color palette used in the plots. You can set it to `NULL` to use the ggplot2 default palette, or to whichever colors you want in the plots.
The `my.obj$plot()` function returns a `ggplot2::ggplot` object which you can use to customize the plot to your taste. To this effect you can set the `print` parameter to `FALSE` to avoid the plot rendering, customize the returned ggplot object, and print it whenever you are ready.
For example, the `standard` and the `weighted` plots are very much alike, except in their intercept on the `y` axis, close to the axes origin. To notice this, we will "plot" both models without printing them, limiting the `x` axis to the `0, 300` range and the `y` axis to `0, 150`. We will then call the `shared.legend()` function, using those plot objects as arguments, to draw both plots sharing the same title, `y` axis label and legend.
# Get the OLS linear regression plot
std.plot <- my.obj$plot(model.name=TRUE, print=FALSE,
xlab='Mean', ylab=NULL, main=NULL, xlim=c(0,300), ylim=c(0, 150))
# Get the weighted, robust, linear regression plot
weighted.plot <- my.obj$plot(model.name='weighted', print=FALSE,
xlab='Mean', ylab=NULL, main=NULL, xlim=c(0,300), ylim=c(0, 150))
# Use both plot objects for detailed customization.
shared.legend(std.plot
             ,weighted.plot
             ,'Two plots sharing title, y label and legend'
             ,'Variance (ADU^2)'
             )
Now we can clearly see how the OLS linear regression misses the data trend close to the origin, while the weighted one correctly follows it.
## Persistence

Using the `imgnoiser.save()` function we can save an instance object to a disk file and later, in another R session, retrieve it with the `imgnoiser.load()` function. The files saved to disk have `.imn` as their name extension.
By default, when we save an object, we also save all the `imgnoiser` package options in the same file. Correspondingly, when we load an object, by default we also load all the `imgnoiser` package options. The `save.options` and `load.options` parameters of those functions allow us to change that behavior and save or load the package options only when desired.
# Save the my.hvdvm object, and the package options, in a file named 'my.hvdvm'
imgnoiser.save(my.hvdvm)
# Save the my.hvdvm object in a particular path and file name without
# saving also the package options
imgnoiser.save(my.hvdvm, 'path/to/my.object', save.options = FALSE)
# Some days later, load the saved object, in another R session
hvdvm.obj <- imgnoiser.load('path/to/my.object')
# Load the object but not the package options
hvdvm.obj <- imgnoiser.load('path/to/my.object', load.options = FALSE)
Additionally, we can save or load just the package options (all of them) with the `imgnoiser.options.save()` and `imgnoiser.options.load()` functions.
# Save the current package options
imgnoiser.options.save('this.options')
# In another R session, load the saved package options
imgnoiser.options.load('this.options')
It is not required to specify the extension in the file names; if it is not `.imn`, that extension will be appended to the given file name.