
hvdvm and vvm Class Usage

Oscar de Lama edited this page Feb 21, 2015 · 34 revisions

The hvdvm and the vvm classes process the raw image samples in different ways, and correspondingly their output must also be interpreted differently. From the point of view of these classes' implementation, this means they have very different digest() functions, and the data in their var.df and cov.df variables have different meanings. However, from an operational point of view, everything else in these classes works in the same way and with the same semantics. Of course, everything derived from their data, such as plots or fitted models, must be interpreted considering its source class.

This vignette describes the functionality that the hvdvm and the vvm classes have in common. Before reading this vignette, we encourage you to first read the "hvdvm class usage" or the "vvm class usage" vignette, because this one is a follow-up to both of them.

In the examples, instead of the my.hvdvm or the my.vvm object (when possible) we will use my.obj to represent an object that can be an instance of either the hvdvm or the vvm class.

In the code examples, the function output is prefixed with #>, while a single # prefix denotes user comments.


Getting Help about Class Components

In the following sections we will focus on the concepts and general workflow using the hvdvm or the vvm class. If you want more complete information about technical details, such as which parameters are available for a given function, please use the R help system. In this section we will see how to get help about those classes' components.

To get the documentation of the new() function in the class hvdvm, one would expect the use of either of the following equivalent R commands:

    ?hvdvm$new
    help('hvdvm$new')

However, the first one won't work and will throw an error. The second one will work just fine.

The first case, ?hvdvm$new, fails because the style of R6 classes used in the development of the imgnoiser package is rather new, and the R help system is not ready for it yet. To get the help documentation, use the second form, as in help('hvdvm$new').

If you don't recall which functions are available in a class instance, just "print" it and you will get a list of the class members.

  my.vvm <- vvm$new()
  # This command will 'print' the instance
  my.vvm
  #> <vvm>
  #>   function:
  #>     digest()
  #>     digest.rgb()
  #>     exists.model()
  #>     fit.model()
  #>     get.model()
  #>     get.model.predictions()
  #>     initialize()
  #>     plot()
  #>     print()
  #>     print.model.summary()
  #>     remove.model()
  #>   variable:
  #>     channel.labels
  #>     cov.df
  #>     green.channels
  #>     merged.var.cov.df
  #>     model.list
  #>     RGGB.indices
  #>     var.df
  #>     wide.var.df

Now, if, for example, you want more details about the fit.model() function in the above list, you can use:

    help('vvm$fit.model')

If you don't have a class instance, you can easily get an overview of the hvdvm class components, with the following command.

    ?hvdvm.class

You will get a list of the class functions with links to more detailed documentation of the hvdvm class components. The same applies to the vvm class.

Introduction to Model Fitting

The notion of model in the imgnoiser package involves two concepts:

  1. Model Data: The specific data which will be used as the observations over which we want to fit a model.
  2. Model Family: The mathematical model we want to fit over the data mentioned in the previous point.

Model Data Source

Actually, by default, only one model data source exists and it is named std.var. This data source comes from the noise variance result we get after calling the digest() function of either the vvm or the hvdvm class. In particular, the std.var data model contains data from the var.df variable, where mean is the predictor variable (x) and var is the predicted one (y). These observations are grouped by channel.

You can add more model data sources at will by changing the get.model.predictions option value.

When a function requires the name of a model data source as an argument, it will by default use the fit.model.data option value, which, as you might guess, defaults to std.var.

Unless you have added more data sources, leave the default value for those model.data.name arguments, as we will do in the following examples.

Model Family

We can choose, among different statistical models, the one that we want to fit to the data source. We group those models by the R function that actually makes the fitting.

By default, the following model families are available (lm and smooth.spline come from the stats package; lmrob comes from the robustbase package):

  • lm
  • lmrob
  • smooth.spline

As these fitting functions allow a wide variety of options, in fact directing them to different mathematical models or fitting procedures, we have a broader set of options than it may seem at first glance. For example, you can use lm() with weights.

When a function requires a model family as an argument, it will by default use the fit.model.family option value, which by default is lm.

You can add more fitting functions at your taste by changing the fit.model.family option value.
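To make the difference between these families concrete, here is a small base R sketch comparing a plain lm() fit with a weighted one. It uses synthetic data mimicking a variance-vs-mean relationship, not imgnoiser output:

```r
# Synthetic data where the scatter grows with the mean,
# roughly as photon shot noise does
set.seed(42)
mean.val <- seq(10, 1000, length.out = 50)
var.val  <- 0.45 * mean.val + rnorm(50, sd = 0.05 * mean.val)

# Ordinary least squares: every observation weighs the same
ols.fit <- lm(var.val ~ mean.val)

# Weighted least squares: down-weight the noisier, high-mean observations
wls.fit <- lm(var.val ~ mean.val, weights = 1/mean.val^2)

coef(ols.fit)
coef(wls.fit)
```

The same weights argument, passed through fit.model(), reaches the fitting function in exactly this way.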

Fitting a Model

We can start with a simple OLS (ordinary least squares) line fitting. To this effect, we will leave the default values for all the my.obj$fit.model() function arguments:

    # Fit the default 'lm' model and save it with the default 'standard' name.
    my.obj$fit.model()

    #> The model "standard" was succesfully fitted using:
    #> lm(formula = var ~ mean + I(mean^2), data = model.src("std.var"))

The output after the model fitting mentions the model family (lm), the formula (formula) and the source data (data = model.src("std.var")). If there were other parameters in the call to the lm() function, they would also appear in this output. Everything sent to the fitting function appears in this output; nothing is sent "under the hood".

Notice we did not supply a formula; it has been inferred from the source data model variables and from the default value of the degree parameter of that function, which is 1 for the hvdvm class and 2 for the vvm one. This argument indicates that the predicted variable var is modeled by a first or second degree polynomial on mean. If you set degree = 3, a cubic polynomial on mean will be fitted to predict the var variable; with 4 you will fit a quartic one, and so on.

Instead of the degree, you can directly set the formula argument with the specific formula you want to fit. In that case, the degree argument is silently ignored.
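As an illustration of how a degree argument can translate into such formulas, here is a small base R sketch (make.poly.formula is a hypothetical helper, not imgnoiser's actual implementation):

```r
# Build 'var ~ mean + I(mean^2) + ...' up to the requested degree
make.poly.formula <- function(degree) {
  terms <- c("mean", sprintf("I(mean^%d)", seq_len(degree)[-1]))
  as.formula(paste("var ~", paste(terms, collapse = " + ")))
}

make.poly.formula(1)  # var ~ mean
make.poly.formula(2)  # var ~ mean + I(mean^2)
make.poly.formula(3)  # var ~ mean + I(mean^2) + I(mean^3)
```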

Another default argument in the call to the my.obj$fit.model() function we just made is the name of the model being fitted (the model.name parameter). By default the function uses the fit.model.name option value, which by default is standard.

We can call the fit.model() function again with an argument for the model.name parameter different from standard, so as not to overwrite the model we just fitted, or we can omit this argument again to overwrite that model.

Let's fit a robust linear model using the reciprocal of the squared mean as weights:

    # This time we will set the model name and model family parameters
    my.obj$fit.model(model.name = 'weighted', model.family = 'lmrob', weights = 1/mean^2)

    #> The model "weighted" was succesfully fitted using:
    #> lmrob(formula = var ~ mean + I(mean^2), data = model.src("std.var"), weights = 1/mean^2)

Notice how we specify the weights argument for the lmrob function using the variable names in our model. All the parameters that don't correspond to the my.obj$fit.model() function are passed directly to the fitting function. This allows the use of any parameter available for them.

Getting Model Predictions

We can get the predictions from the standard model we just fitted by calling the my.obj$get.model.predictions() function with the default arguments:

    # Get a data frame with the model predictions for the 'standard' model
    preds <- my.obj$get.model.predictions()
    head(preds)
Model predictions

        mean    var channel  sderr     lcl   ucl
       5.317  2.937     Red 10.756 -163.03 168.9
      10.306  5.246     Red 10.716 -160.71 171.2
      19.976  9.723     Red 10.639 -156.20 175.6
      38.717 18.406     Red 10.494 -147.47 184.3

The columns in the resulting data frame are:

  1. mean: The predictor variable.
  2. var: The predicted variable.
  3. channel: The grouping factor.
  4. sderr: The prediction standard error.
  5. lcl: The prediction interval lower limit.
  6. ucl: The prediction interval upper limit.

The confidence limits are computed, using the confid.factor option value as the "confidence factor", at the time the model is fitted with the fit.model() function. By default this factor is 1.96. You should set this option to the value that suits you best before calling the fit.model() function. However, given the predicted value and its standard error (sderr) in this function's output, you can recompute the limits yourself, as in the following example.

    # Confidence limits using 2.5 as confidence factor
    preds$lcl <- preds$var - 2.5*preds$sderr
    preds$ucl <- preds$var + 2.5*preds$sderr
    head(preds)
Customized Confidence Limits

        mean    var channel sderr    lcl    ucl
       15.58  12.03     Red  3.40   3.52  20.54
       27.05  17.13     Red  3.37   8.70  25.55
       46.97  25.98     Red  3.31  17.69  34.26
       81.55  41.37     Red  3.22  33.31  49.43
      141.59  68.17     Red  3.09  60.45  75.88
      245.82 114.92     Red  2.92 107.63 122.21

These model predictions span the complete range of predictor values in the original data source:

    # Get the predictor range from predictions
    pred.ranges <- t(sapply(split(preds$mean, preds$channel), range))
    # Get the predictor range from source data
    src.ranges <- t(sapply(split(my.obj$var.df$mean, my.obj$var.df$channel), range))
    # Compare source and prediction ranges
    data.frame('pred.bot' = pred.ranges[,1], 'pred.top' = pred.ranges[,2],
               'src.bot'  = src.ranges[,1],  'src.top'  = src.ranges[,2])
Predictor vs Source data ranges

                pred.bot pred.top src.bot src.top
    Red            15.58     6733   15.58    6733
    Green R        36.35    14967   36.35   14967
    Green B        35.49    15018   35.49   15018
    Blue           28.12    12258   28.12   12258
    Green Avg      35.92    14992   35.92   14992

There are 30 predictions per channel in this function's results. When possible, 20 of them are equally spaced over the range of the predictor source data, plus 10 logarithmically equally spaced over that same range. If that is not possible, because there are zero or negative values in the predictor source range, there will be 30 equally spaced predictor values per channel. This makes the predicted data suitable for nice plotting, which we will do later.
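A sketch of how such a predictor grid can be built for one channel, assuming a strictly positive predictor range (this mimics the behavior described above; it is not imgnoiser's actual code):

```r
# 20 linearly spaced plus 10 logarithmically spaced predictor values
pred.grid <- function(lo, hi) {
  linear <- seq(lo, hi, length.out = 20)
  logged <- exp(seq(log(lo), log(hi), length.out = 10))
  sort(unique(c(linear, logged)))
}

x <- pred.grid(15.58, 6733)
length(x)  # at most 30 values (duplicates at the endpoints may merge)
range(x)   # spans the full predictor source range
```

The logarithmically spaced points give extra resolution near the origin, which is where the plot curvature matters most.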

To get the predictions from a fitted model not named standard, such as the weighted model we fitted before, we just have to mention its name in the call to the my.obj$get.model.predictions() function.

    # Get a data frame with the model predictions for the 'weighted' model
    my.obj$get.model.predictions('weighted')
Model predictions: Weighted

        mean    var channel sderr    lcl    ucl
       15.58   9.03     Red  0.14   8.75   9.30
       27.05  14.22     Red  0.12  13.97  14.46
       46.97  23.24     Red  0.11  23.01  23.46
       81.55  38.91     Red  0.15  38.62  39.21
      141.59  66.20     Red  0.28  65.65  66.76
      245.82 113.77     Red  0.54 112.72 114.83

Model Summary

To get the model summary, we call the my.obj$print.model.summary() function.

As the source data is grouped by channel, there is a model fitted for each one of them. This fact is reflected in the summary, which correspondingly shows one sub-summary per channel.

    # Print the summary of the 'standard' model
    my.obj$print.model.summary()
    #> #-------
    #> # Model Name: "standard"
    #> #
    #> # Call:
    #> # lm(formula = var ~ mean + I(mean^2), data = model.src("std.var"))
    #> #-------
    #> 
    #> ##--  channel  :  "Red" --##
    #> 
    #> Residuals:
    #>     Min      1Q  Median      3Q     Max 
    #> -197.46   -4.01   -2.54    2.35  260.82 
    #> 
    #> Coefficients:
    #>             Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) 5.12e+00   3.45e+00    1.48     0.14
    #> mean        4.43e-01   5.49e-03   80.73   <2e-16
    #> I(mean^2)   1.29e-05   9.85e-07   13.12   <2e-16
    #> 
    #> Residual standard error: 41.4 on 237 degrees of freedom
    #> Multiple R-squared:  0.997,  Adjusted R-squared:  0.997 
    #> F-statistic: 4.28e+04 on 2 and 237 DF,  p-value: <2e-16
    #> 
    #> ##--  channel  :  "Green R" --##
    #> 
    #> Residuals:
    #>    Min     1Q Median     3Q    Max 
    #> -450.7   -6.7   -0.1    3.2  441.7 
    #> 
    #> Coefficients:
    #>              Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) -1.91e-01   7.05e+00   -0.03     0.98
    #> mean         4.13e-01   5.05e-03   81.76   <2e-16
    #> I(mean^2)    6.16e-06   4.07e-07   15.15   <2e-16
    #> 
    #> Residual standard error: 84.6 on 237 degrees of freedom
    #> Multiple R-squared:  0.997,  Adjusted R-squared:  0.997 
    #> F-statistic: 4.61e+04 on 2 and 237 DF,  p-value: <2e-16
    #> 
    #> ##--  channel  :  "Green B" --##
    #> 
    #> Residuals:
    #>    Min     1Q Median     3Q    Max 
    #> -416.2   -4.1   -0.9    2.2  381.1 
    #> 
    #> Coefficients:
    #>             Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) 2.30e+00   5.84e+00    0.39     0.69
    #> mean        4.06e-01   4.16e-03   97.69   <2e-16
    #> I(mean^2)   6.94e-06   3.34e-07   20.79   <2e-16
    #> 
    #> Residual standard error: 70 on 237 degrees of freedom
    #> Multiple R-squared:  0.998,  Adjusted R-squared:  0.998 
    #> F-statistic: 6.88e+04 on 2 and 237 DF,  p-value: <2e-16
    #> 
    #> ##--  channel  :  "Blue" --##
    #> 
    #> Residuals:
    #>    Min     1Q Median     3Q    Max 
    #> -459.9   -7.0    6.3   10.5  429.2 
    #> 
    #> Coefficients:
    #>              Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) -1.02e+01   6.20e+00   -1.64      0.1
    #> mean         5.05e-01   5.45e-03   92.58  < 2e-16
    #> I(mean^2)    2.81e-06   5.37e-07    5.24  3.6e-07
    #> 
    #> Residual standard error: 74.6 on 237 degrees of freedom
    #> Multiple R-squared:  0.997,  Adjusted R-squared:  0.997 
    #> F-statistic: 4.7e+04 on 2 and 237 DF,  p-value: <2e-16
    #> 
    #> ##--  channel  :  "Green Avg" --##
    #> 
    #> Residuals:
    #>    Min     1Q Median     3Q    Max 
    #> -290.9   -4.7    0.8    4.9  363.5 
    #> 
    #> Coefficients:
    #>              Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) -7.53e-01   6.07e+00   -0.12      0.9
    #> mean         4.14e-01   4.33e-03   95.46   <2e-16
    #> I(mean^2)    6.83e-06   3.49e-07   19.61   <2e-16
    #> 
    #> Residual standard error: 72.8 on 237 degrees of freedom
    #> Multiple R-squared:  0.998,  Adjusted R-squared:  0.998 
    #> F-statistic: 6.49e+04 on 2 and 237 DF,  p-value: <2e-16

Because we did not specify a model name in the call to the my.obj$print.model.summary() function, it printed the standard model summary. To get the summary for the weighted model we fitted before, we must call the my.obj$print.model.summary() function specifying that model name.

By default this function prints the summary for each fitted channel. You can use the select parameter to choose which specific model summaries you want, picking them by index value or by channel label. For example, let's get the summary for the Blue channel in the weighted model:

    my.obj$print.model.summary('weighted', select='Blue')
    #> #-------
    #> # Model Name: "weighted"
    #> #
    #> # Call:
    #> # lmrob(formula = var ~ mean + I(mean^2), data = model.src("std.var"), weights = 1/mean^2)
    #> #-------
    #> 
    #> ##--  channel  :  "Blue" --##
    #> 
    #> Residuals:
    #>      Min       1Q   Median       3Q      Max 
    #> -561.250   -4.010    0.127    5.529  369.654 
    #> 
    #> Coefficients:
    #>             Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) 1.71e+00   2.42e-01    7.07  1.8e-11
    #> mean        4.73e-01   2.54e-03  186.69  < 2e-16
    #> I(mean^2)   6.58e-06   7.37e-07    8.93  < 2e-16
    #> 
    #> Robust residual standard error: 0.0234 
    #> Multiple R-squared:  0.997,  Adjusted R-squared:  0.997 
    #> Convergence in 11 IRWLS iterations
    #> 
    #> Robustness weights: 
    #>  15 weights are ~= 1. The remaining 225 ones are summarized as
    #>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    #>   0.323   0.889   0.951   0.912   0.985   0.999 
    #> Algorithmic parameters: 
    #> tuning.chi         bb tuning.psi refine.tol    rel.tol  solve.tol 
    #>   1.55e+00   5.00e-01   4.69e+00   1.00e-07   1.00e-07   1.00e-07 
    #>      nResample         max.it       best.r.s       k.fast.s          k.max 
    #>            500             50              2              1            200 
    #>    maxit.scale      trace.lev            mts     compute.rd fast.s.large.n 
    #>            200              0           1000              0           2000 
    #>           psi   subsampling           cov 
    #>    "bisquare" "nonsingular" ".vcov.avar1" 
    #> seed : int(0)

This function calls the summary() method corresponding to the model family used for the fitting. Those methods accept other parameters, which you can also specify when calling the my.obj$print.model.summary() function.
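For instance, with the lm family, the underlying stats::summary.lm() method accepts a correlation argument. A base R sketch on a built-in data set shows the dispatch mechanism these calls rely on:

```r
# summary() dispatches on the fitted object's class; extra arguments are
# forwarded to the class-specific method (here stats::summary.lm)
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit, correlation = TRUE)

# The coefficient correlation matrix is only computed on request
s$correlation
```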

Model Objects

The R functions that actually fit the models, like lm() or lmrob(), return an object which can be used in other R functions supporting them.

For example, the car::avPlots() (Added-Variable Plots) function constructs and plots added-variable (also called partial-regression) plots as an analysis tool to check the fit of each term in a linear model. We can use that function to find out whether the linear and quadratic mean terms in the vvm models fit well. However, to do that, we need the object returned by the fitting functions we mentioned above.

The get.model() function returns a list with the model objects for every fitted channel in a given model. As with the print.model.summary() function, we can select (select parameter) the channel(s) whose model we want, and as usual, if we don't provide a model name, the one in the fit.model.name option value will be used.

    # Get a list with the 'standard' model fitting objects
    std.model <- my.obj$get.model()
    class(std.model)
    #> [1] "list"
    names(std.model)
    #> [1] "Red"       "Green R"   "Green B"   "Blue"      "Green Avg"

Now we can use the Red channel model object to check it with the avPlots() function.

    car::avPlots(std.model[['Red']])

avPlots

The my.obj$get.model() function's select parameter lets us get a list with only the selected models, i.e. those for specific channels, chosen by label or by index. For example, we will use the confint() function to get the confidence intervals for the coefficients of the Red channel regression.

    # Confidence intervals for the 'Red' model parameters (from the 'standard' fitting)
    models <- my.obj$get.model(select='Red')
    confint(models[[1]])
                 2.5 % 97.5 %
    (Intercept)  6.56  12.59
    mean         0.43   0.43

List, Query and Remove Fitted Models

The my.obj$model.list variable provides a list of the models fitted so far.

    # List the models fitted so far.
    my.obj$model.list
    #> $standard
    #> [1] "lm(formula = var ~ mean, data = model.src(\"std.var\"))"
    #> 
    #> $weighted
    #> [1] "lmrob(formula = var ~ mean, data = model.src(\"std.var\"), weights = 1/mean^2)"

The my.obj$exists.model() function tests whether a model with a given name already exists.

    # We have a model named 'weighted'
    my.obj$exists.model('weighted')
    #> lmrob(formula = var ~ mean, data = model.src("std.var"), weights = 1/mean^2)
    #> [1] TRUE
    # But we don't have another one called 'log-log'
    my.obj$exists.model('log-log')
    #> The model"log-log" does not exists.
    #> [1] FALSE

To remove a model, we call the my.obj$remove.model() function. In this case, for safety reasons, there is no default model name; you must specify it.

    # We have a model named 'standard' ...
    my.obj$exists.model('standard')
    #> lm(formula = var ~ mean, data = model.src("std.var"))
    #> [1] TRUE
    # .. but we don't want it any more
    my.obj$remove.model('standard')
    #> The model with name"standard"has been removed.
    # Now, it doesn't exist
    my.obj$exists.model('standard')
    #> The model"standard" does not exists.
    #> [1] FALSE

Plotting

There is nothing like a good plot to let the data speak by itself, showing us its features, validating our assumptions or raising new questions.

With the data we get through the my.obj$var.df, my.obj$cov.df and hvdvm$photo.conditions.df variables, and the my.obj$get.model.predictions() function, we can build any kind of plot we want. However, the my.obj$plot() function provides an easy way to plot the data and the predictions of an already fitted model, either together or individually.

By default, without any argument, the my.obj$plot() function just plots the standard data model observations. Using the model.name parameter, we can include a fitted model in the plot. As a convenience, if we set this parameter to TRUE, the default model is plotted. As we already saw, this is the one with the same name as the fit.model.name option value, which by default is standard.

    # Rebuild the previously deleted 'standard' model
    my.obj$fit.model()
    #> The model "standard" was succesfully fitted using:
    #> lm(formula = var ~ mean, data = model.src("std.var"))
    # Plot the 'standard' model
    my.obj$plot(model.name = TRUE)

the 'standard' model

For the model.name parameter, instead of TRUE, you can use the name of the fitted model you want to plot.

The fit and confid parameters are relevant only when a fitted model is included in the plot. The fit parameter, set to TRUE, draws a line over the values predicted by the model. The confid parameter, set to TRUE, draws a semi-transparent area between the confidence limits of the predicted values. By default they are set to TRUE and FALSE, respectively.

Using the xlim and ylim parameters, you can force the my.obj$plot() function to draw only the particular area of the plot you are interested in. Each must be a vector of two numeric values: the lower and upper limits of the corresponding axis.

You can also specify the axis labels with the xlab and ylab parameters. Furthermore, you can set the main and secondary titles with the main and subt parameters. For further customization, you can set the plot.point.size and plot.line.width option values, which are used for the size of the points and the width of the lines in the plot. The color.pallette option is a vector with the color palette used in the plots. You can set it to NULL to use the ggplot2 default palette, or to whichever colors you want in the plots.

The my.obj$plot() function returns a ggplot2::ggplot object which you can use to customize the plot to your taste. To this effect, you can set the print parameter to FALSE to avoid the plot rendering, customize the returned ggplot object, and print it whenever you are ready.
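A minimal sketch of this workflow with ggplot2, using a stand-in data frame where my.obj$plot(print = FALSE) would supply the object:

```r
library(ggplot2)

# Stand-in for the object my.obj$plot(print = FALSE) would return
df <- data.frame(mean = 1:100, var = 0.45 * (1:100) + rnorm(100))
p  <- ggplot(df, aes(mean, var)) + geom_point()

# Customize the received ggplot object...
p <- p + labs(title = 'Noise variance vs. mean signal') + theme_minimal()

# ...and render it when ready
print(p)
```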

For example, the standard and the weighted plots are very much alike, except near their intercepts with the y axis, close to the axes origin. To notice this, we will "plot" both models without printing them, limiting the x axis to the [0, 300] range and the y axis to [0, 150]. Then we will call the shared.legend() function, using those plot objects as arguments, to draw both plots sharing the same title, y axis label and legend.

    # Get the OLS linear regression plot
    std.plot <- my.obj$plot(model.name=TRUE, print=FALSE, 
                      xlab='Mean', ylab=NULL, main=NULL, xlim=c(0,300), ylim=c(0, 150))

    # Get the weighted, robust, linear regression plot
    weighted.plot <- my.obj$plot(model.name='weighted', print=FALSE, 
                      xlab='Mean', ylab=NULL, main=NULL, xlim=c(0,300), ylim=c(0, 150))

    # Use both plot objects for detailed customization.
    shared.legend(std.plot
                 ,weighted.plot
                 ,'Two plots sharing title, y label and legend'
                 ,'Variance (ADU^2)'
                 )

Two plots sharing title, y label and legend

Now we can clearly notice how the OLS linear regression misses the data trend close to the origin, while the weighted one correctly follows it.

Persistence

Using the imgnoiser.save() function we can save an instance object to a disk file and later, in another R session, retrieve it with the imgnoiser.load() function. The files saved to disk have .imn as their name extension.

By default when we save an object, we also save in the same file all the imgnoiser package options. Correspondingly, when we load an object, by default, we also load all the imgnoiser package options. The save.options and load.options parameters in those functions allow us to change that behavior and save or load the package options only when desired.

    # Save the my.hvdvm object, and the package options, in a file named 'my.hvdvm'
    imgnoiser.save(my.hvdvm)
    # Save the my.hvdvm object in a particular path and file name without
    # saving also the package options
    imgnoiser.save(my.hvdvm, 'path/to/my.object', save.options = FALSE)

    # Some days later, load the saved object, in another R session 
    hvdvm.obj <- imgnoiser.load('path/to/my.object')
    # Load the object but not the package options
    hvdvm.obj <- imgnoiser.load('path/to/my.object', load.options = FALSE)

Additionally we can save or load just the package options —all of them— with the imgnoiser.options.save() and imgnoiser.options.load() functions.

    # Save the current package options
    imgnoiser.options.save('this.options')

    # In another R session, load the saved package options
    imgnoiser.options.load('this.options')

It is not required to specify the extension in the file names; if it is not .imn, that extension will be appended to the given file name.
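The extension handling described above could look like the following helper (ensure.imn.ext is purely illustrative; it is not part of the imgnoiser API):

```r
# Append the '.imn' extension only when the file name doesn't already end in it
ensure.imn.ext <- function(file.name) {
  if (grepl('\\.imn$', file.name)) file.name else paste0(file.name, '.imn')
}

ensure.imn.ext('path/to/my.object')      # "path/to/my.object.imn"
ensure.imn.ext('path/to/my.object.imn')  # unchanged
```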