#' @title Create a Custom CPO Constructor #' #' @description #' \code{makeCPO} creates a \emph{Feature Operation} \code{\link{CPOConstructor}}, i.e. a constructor for a \code{\link{CPO}} that will #' operate on feature columns. \code{makeCPOTargetOp} creates a \emph{Target Operation} \code{\link{CPOConstructor}}, which #' creates \code{\link{CPO}}s that operate on the target column. \code{makeCPORetrafoless} creates a \emph{Retrafoless} \code{\link{CPOConstructor}}, #' which creates \code{\link{CPO}}s that may operate on both feature and target columns, but have no retrafo operation. See \link{OperatingType} for further #' details on the distinction of these. \code{makeCPOExtendedTrafo} creates a \emph{Feature Operation} \code{\link{CPOConstructor}} that #' has slightly more flexibility in its data transformation behaviour than \code{makeCPO} (but is otherwise identical). #' \code{makeCPOExtendedTargetOp} creates a \emph{Target Operation} \code{\link{CPOConstructor}} that has slightly more flexibility in its #' data transformation behaviour than \code{makeCPOTargetOp} but is otherwise identical. #' #' See example section for some simple custom CPO. #' #' @section CPO Internals: #' The mlrCPO package offers a powerful framework for handling the tasks necessary for preprocessing, so that the user, when creating custom CPOs, #' can focus on the actual data transformations to perform. It is, however, useful to understand \emph{what} it is that the framework does, and how #' the process can be influenced by the user during CPO definition or application. Aspects of preprocessing that the user needs to influence are: #' \describe{ #' \item{\strong{Operating Type}}{ #' The core of preprocessing is the actual transformation being performed. In the most general sense, there are three points in a machine #' learning pipeline that preprocessing can influence. 
#' \enumerate{ #' \item Transformation of training data \emph{before model fitting}, done in mlr using \code{\link[mlr]{train}}. In the CPO framework #' (\emph{when not using a \code{\link{CPOLearner}} which makes all of these steps transparent to the user}), this is #' done by a \code{\link{CPO}}. #' \item Transformation of new validation or prediction data that is given to the fitted model for \emph{prediction}, done using #' \code{\link[stats]{predict}}. This is done by a \code{\link{CPORetrafo}} retrieved using \code{\link{retrafo}} from the result of step 1. #' \item Transformation of the predictions made, to invert the transformation of the target values done in step 1. This is done using #' the \code{\link{CPOInverter}} retrieved using \code{\link{inverter}} from the result of step 2. #' } #' The framework poses restrictions on primitive (i.e. not compound using \code{\link{composeCPO}}) \code{\link{CPO}}s to simplify internal #' operation: A \code{\link{CPO}} may be one of three \link{OperatingType}s (see there). The \emph{Feature Operation} \code{\link{CPO}} does not #' transform target columns and hence only needs to be involved in steps 1 and 2. The \emph{Target Operation} \code{\link{CPO}} only transforms #' target columns, and therefore mostly concerns itself with steps 1 and 3. A \emph{Retrafoless} \code{\link{CPO}} may change both feature and #' target columns, but may not perform a retrafo \emph{or} inverter operation (and is therefore only concerned with step 1). Note that this #' is effectively a restriction on what kind of transformation a Retrafoless CPO may perform: it must not be a transformation of the data #' or target \emph{space}, it may only add or subtract points within this space. 
#' #' The Operating Type of a \code{\link{CPO}} is ultimately dependent on the function that was used to create the \code{\link{CPOConstructor}}: #' \code{makeCPO} / \code{makeCPOExtendedTrafo}, \code{makeCPOTargetOp} / \code{makeCPOExtendedTargetOp}, or \code{makeCPORetrafoless}.} #' \item{\strong{Data Transformation}}{ #' At the core of a CPO is the modification of data it performs. For Feature Operation CPOs, the transformation of each row, #' during training \emph{and} prediction, should #' happen in the same way, and it may only depend on the entirety of the \emph{training} data--i.e. the value of a data row in a prediction #' data set may not influence the transformation of a different prediction data row. Furthermore, if a data row occurs in both training and prediction #' data, its transformation result should ideally be the same. #' #' This property is ensured by \code{makeCPO} by splitting the transformation #' into two functions: One function that collects all relevant information from the training data (called \code{cpo.train}), and one that transforms #' given data, using this collected information and (\emph{potentially new, unseen}) data to be transformed (called \code{cpo.retrafo}). The \code{cpo.retrafo} #' function should handle all data as if it were prediction data and unrelated to the data given to \code{cpo.train}. #' #' Internally, when a \code{\link{CPO}} gets applied to a data set using \code{\link{applyCPO}}, the \code{cpo.train} function is called, and the #' resulting control object is used for a subsequent \code{cpo.retrafo} call which transforms the data. Before the result is given back from the #' \code{\link{applyCPO}} call, the control object is used to create a \code{\link{CPORetrafo}} object, #' which is attached to the result as attribute. Target Operating CPOs additionally create and add a \code{\link{CPOInverter}} object. 
#' #' When a \code{\link{CPORetrafo}} is then applied to new prediction data, the control object previously returned by \code{cpo.train} is given, #' combined with this \emph{new} data, to another \code{cpo.retrafo} call that performs the new transformation. #' #' \code{makeCPOExtendedTrafo} gives more flexibility by calling only the \code{cpo.trafo} function in the training step, which both creates a control #' object \emph{and} modifies the data. This can increase performance if the underlying operation creates a control object and the transformed data in one step, #' as for example \emph{PCA} does. Note that the requirement that the same row in training and prediction data should result in the same transformation #' result still stands. The \code{cpo.trafo} function returns the transformed data and creates a local variable with the control information, which the #' CPO framework will access.} #' \item{\strong{Inversion}}{ #' If a \code{\link{CPO}} performs transformations of the \emph{target} column, the predictions made by a following machine learning process should #' ideally have this transformation undone, so that if the process makes a prediction that coincides with a target value \emph{after} the #' transformation, the whole pipeline should return a prediction that equals the target value \emph{before} this transformation. #' #' This is done by the \code{cpo.invert} function given to \code{makeCPOTargetOp}. It has access to information from both the preceding training and prediction #' steps. During the training step, \code{cpo.train} creates a \code{control} object that is not only given to \code{cpo.retrafo}, but also #' to \code{cpo.train.invert}. This latter function is called before the prediction step, whenever new data is fed to the machine learning process. #' It takes the new data and the old \code{control} object and transforms it to a new \code{control.invert} object to include information about the prediction #' data. 
This object is then given to \code{cpo.invert}. #' #' It is possible to have Target Operation CPOs that do not require information from the retrafo step. This is specified by setting #' \code{constant.invert} to \code{TRUE}. It has the advantage that the same \code{\link{CPOInverter}} #' can be used for inversion of predictions made with any new data. Otherwise, a new \code{\link{CPOInverter}} object must be obtained for each #' new data set after the retrafo step (using the \code{\link{inverter}} function on the retrafo result). Having \code{constant.invert} set to \code{TRUE} #' results in \emph{hybrid} retrafo / inverter objects: The \code{\link{CPORetrafo}} object can then also be used for inversions. #' When defining a \code{constant.invert} Target Operating CPO, no \code{cpo.train.invert} function is given, and the same \code{control} #' object is given to both \code{cpo.retrafo} and \code{cpo.invert}. #' #' \code{makeCPOExtendedTargetOp} gives more flexibility and allows more efficient implementation of Target Operating CPOs at the cost of more complexity. #' With this method, a \code{cpo.trafo} function is given that is executed during the first training step; it must return the transformed target column, #' as well as a \code{control} and \code{control.invert} object. The \code{cpo.retrafo} function not only transforms the target, but must also #' create a new \code{control.invert} object (unless \code{constant.invert} is \code{TRUE}). The semantics of \code{cpo.invert} are identical to those of the #' basic \code{makeCPOTargetOp}.} #' \item{\strong{\code{cpo.train}-\code{cpo.retrafo} information transfer}}{ #' One possibility to transfer information from \code{cpo.train} to \code{cpo.retrafo} is to have \code{cpo.train} return a #' control object (a \code{\link[base]{list}}) #' that is then given to \code{cpo.retrafo}. The CPO is then called an \emph{object based} CPO. 
#' #' Another possibility is to not give the \code{cpo.retrafo} #' argument (set it to \code{NULL} in the \code{makeCPO} call) and have \code{cpo.train} return a \emph{function} instead. This function is then #' used as the \code{cpo.retrafo} function, and should have access to all relevant information about the training data as a closure. This is called a #' \emph{functional} CPO. To save memory, the actual data (including target) given to \code{cpo.train} is removed from the environment of its #' return value in this case #' (i.e. the environment of the \code{cpo.retrafo} function). This means the \code{cpo.retrafo} function may not reference a \dQuote{\code{data}} variable. #' #' There are similar possibilities of functional information transfer for other types of CPOs: \code{cpo.trafo} in \code{makeCPOExtendedTrafo} may #' create a \code{cpo.retrafo} function instead of a \code{control} object. \code{cpo.train} in \code{makeCPOTargetOp} has the option of creating #' a \code{cpo.retrafo} and \code{cpo.train.invert} (\code{cpo.invert} if \code{constant.invert} is \code{TRUE}) function (and returning \code{NULL}) #' instead of returning a \code{control} object. Similarly, \code{cpo.train.invert} may return a \code{cpo.invert} function instead of a \code{control.invert} #' object. In \code{makeCPOExtendedTargetOp}, \code{cpo.trafo} may create a \code{cpo.retrafo} or a \code{cpo.invert} function, each optionally instead #' of a \code{control} or \code{control.invert} object (one \emph{or} both may be functional). \code{cpo.retrafo} similarly may create a \code{cpo.invert} #' function instead of giving a \code{control.invert} object. Functional information transfer may be more parsimonious and elegant than control #' object information transfer.} #' \item{\strong{Hyperparameters}}{ #' The action performed by a CPO may be influenced using \emph{hyperparameters}, during its construction as well as afterwards (then using #' \code{\link[mlr]{setHyperPars}}). 
Hyperparameters must be specified as a \code{\link[ParamHelpers:makeParamSet]{ParamSet}} and given as argument \code{par.set}. #' Default values for each parameter may be specified in this \code{\link[ParamHelpers:makeParamSet]{ParamSet}} or optionally as another argument \code{par.vals}. #' #' Hyperparameters given are made part of the \code{\link{CPOConstructor}} function and can thus be given during construction. #' Parameter default values function as the default values for the \code{\link{CPOConstructor}} function parameters (which are thus made optional function #' parameters of the \code{\link{CPOConstructor}} function). The CPO framework handles storage and changing of hyperparameter values. #' When the \code{cpo.train} and \code{cpo.retrafo} functions are called to transform data, the hyperparameter values are given to them as arguments, so #' \code{cpo.train} and \code{cpo.retrafo} functions must be able to accept these parameters, either directly, or with a \code{...} argument. #' #' Note that with \emph{functional} \code{\link{CPO}}s, the \code{cpo.retrafo} function does not take hyperparameter arguments (and instead can usually #' refer to them by its environment). #' #' Hyperparameters may be \emph{exported} (or not), thus making them available for \code{\link[mlr]{setHyperPars}}. Not exporting a parameter #' has the advantage that it does not clutter the \code{\link[ParamHelpers:makeParamSet]{ParamSet}} of a big \code{\link{CPO}} or \code{\link{CPOLearner}} pipeline with #' many hyperparameters. Which hyperparameters are exported is chosen during the constructing call of a \code{\link{CPOConstructor}}, but the default #' exported hyperparameters can be chosen with the \code{export.params} parameter.} #' \item{\strong{Properties}}{ #' Similarly to \code{\link[mlr:makeLearner]{Learner}}s, \code{\link{CPO}}s may specify what kind of data they are and are not able to handle. This is done by #' specifying \code{.properties.*} arguments. 
The names of possible properties are the same as possible \code{\link[mlr]{LearnerProperties}}, but since #' \code{\link{CPO}}s mostly concern themselves with data, only the properties indicating column and task types are relevant. #' #' For each \code{\link{CPO}} one must specify #' \enumerate{ #' \item which kind of data does the \code{\link{CPO}} handle, #' \item which kind of data must the \code{\link{CPO}} or \code{\link[mlr:makeLearner]{Learner}} be able to handle that comes \emph{after} #' the given \code{\link{CPO}}, and #' \item which kind of data handling capability does the given \code{\link{CPO}} \emph{add} to a following #' \code{\link{CPO}} or \code{\link[mlr:makeLearner]{Learner}} if coming before it in a pipeline. #' } #' The specification of (1) is done with \code{properties.data} and \code{properties.target}, (2) is specified using \code{properties.needed}, and #' (3) is specified using \code{properties.adding}. Internally, \code{properties.data} and \code{properties.target} are concatenated and treated as #' one vector, they are specified separately in \code{makeCPO} etc. for convenience reasons. See \code{\link{CPOProperties}} for details. #' #' The CPO framework checks the \code{cpo.retrafo} etc. functions for adherence to these properties, so it e.g. throws an error if a \code{cpo.retrafo} #' function adds missing values to some data but didn't declare \dQuote{missings} in \code{properties.needed}. It may be desirable to have this #' internal checking happen to a laxer standard than the property checking when composing CPOs (e.g. when a CPO adds missings only with certain #' hyperparameters, one may still want to compose this CPO to another one that can't handle missings). Therefore it is possible to postfix #' listed properties with \dQuote{.sometimes}. 
The internal CPO checking will ignore these when listed in \code{properties.adding} #' (it uses the \sQuote{minimal} set of adding properties, \code{adding.min}), and it will not declare them externally when listed in #' \code{properties.needed} (but keeps them internally in the \sQuote{maximal} set of needed properties, \code{needed.max}). The \code{adding.min} #' and \code{needed.max} can be retrieved using \code{\link{getCPOProperties}} with \code{get.internal = TRUE}.} #' \item{\strong{Data Format}}{ #' Different CPOs may want to change different aspects of the data, e.g. they may only care about numeric columns, they may or may not care about #' the target column values, sometimes they might need the actual task used as input. The CPO framework offers to present the data in a specified #' format to the \code{cpo.train}, \code{cpo.retrafo} and other functions, to reduce the need for boilerplate data subsetting on the user's part. The format is #' requested using the \code{dataformat} and \code{dataformat.factor.with.ordered} parameters. A \code{cpo.retrafo} function is expected to return #' data in the same format as it requested, so if it requested a \code{\link[mlr]{Task}}, it must return one, while if it only #' requested the feature \code{data.frame}, a \code{data.frame} must be returned.} #' \item{\strong{Task Conversion}}{ #' Target Operation CPOs can be used for conversion between \code{\link[mlr]{Task}}s. For this, the \code{task.type.out} value must be given. Task conversion #' works with all values of \code{dataformat} and is handled by the CPO framework. The \code{cpo.trafo} function must take care to return the target data #' in a proper format (see above). 
Note that for conversion, not only does the \code{\link[mlr]{Task}} type need to be changed during \code{cpo.trafo}, but #' also the \emph{prediction} format (see above) needs to change.} #' \item{\strong{Fix Factors}}{ #' Some preprocessing for factorial columns needs the factor levels to be the same during training and prediction. This is usually not guaranteed #' by mlr, so the framework offers to do this if the \code{fix.factors} flag is set.} #' \item{\strong{ID}}{ #' To prevent parameter name clashes when \code{\link{CPO}}s are concatenated, the parameters are prefixed with the \code{\link{CPO}}'s #' \emph{\link[=getCPOId]{id}}. #' The ID can be set during \code{\link{CPO}} construction, but will default to the \code{\link{CPO}}'s \emph{name} if not given. The name is set #' using the \code{cpo.name} parameter.} #' \item{\strong{Packages}}{ #' Whenever a \code{\link{CPO}} needs certain packages to be installed to work, it can specify these in the \code{packages} parameter. The framework #' will check for the availability of the packages and throw an error if not found \emph{during construction}. This means that loading a \code{\link{CPO}} #' from a savefile will omit this check, but in most cases it is a sufficient measure to make the user aware of missing packages in time.} #' \item{\strong{Target Column Format}}{ #' Different \code{\link[mlr]{Task}} types have the target in different formats. They are listed here for reference. Target data is in this format #' when given to the \code{target} argument of some functions, and must be returned in this format by \code{cpo.trafo} #' in Target Operation CPOs. Target values are always in the format of a \code{\link[base]{data.frame}}, even when only one column. #' \tabular{ll}{ #' \bold{Task type} \tab \bold{target format} \cr #' \dQuote{classif} \tab one column of \code{\link[base]{factor}} \cr #' \dQuote{cluster} \tab \code{data.frame} with zero columns. 
\cr #' \dQuote{multilabel} \tab several columns of \code{\link[base]{logical}}\cr #' \dQuote{regr} \tab one column of \code{\link[base]{numeric}} \cr #' \dQuote{surv} \tab two columns of \code{\link[base]{numeric}} #' } #' #' When inverting, the format of the \code{target} argument of the \code{cpo.invert} function, as well as the format of its return value, depends on the #' \code{\link[mlr]{Task}} type as well as the \code{predict.type}. The requested return value \code{predict.type} is given to the \code{cpo.invert} function #' as a parameter, the \code{predict.type} of the \code{target} parameter depends on this and the \code{predict.type.map} (see \link{PredictType}). #' The format of the prediction, depending on the task type and \code{predict.type}, is: #' \tabular{lll}{ #' \bold{Task type} \tab \bold{\code{predict.type}} \tab \bold{target format} \cr #' \dQuote{classif} \tab \dQuote{response} \tab \code{\link[base]{factor}} \cr #' \dQuote{classif} \tab \dQuote{prob} \tab \code{\link[base]{matrix}} with nclass cols \cr #' \dQuote{cluster} \tab \dQuote{response} \tab \code{\link[base]{integer}} cluster index \cr #' \dQuote{cluster} \tab \dQuote{prob} \tab \code{\link[base]{matrix}} with ncluster cols \cr #' \dQuote{multilabel} \tab \dQuote{response} \tab \code{\link[base]{logical}} \code{\link[base]{matrix}} \cr #' \dQuote{multilabel} \tab \dQuote{prob} \tab \code{\link[base]{matrix}} with nclass cols \cr #' \dQuote{regr} \tab \dQuote{response} \tab \code{\link[base]{numeric}} \cr #' \dQuote{regr} \tab \dQuote{se} \tab 2-col \code{\link[base]{matrix}} \cr #' \dQuote{surv} \tab \dQuote{response} \tab \code{\link[base]{numeric}} \cr #' \dQuote{surv} \tab \dQuote{prob} \tab [NOT YET SUPPORTED] #' } #' All \code{\link[base]{matrix}} formats are \code{\link[base]{numeric}}, unless otherwise stated.} #' } #' #' @section Headless function definitions: #' In the place of all \code{cpo.*} arguments, it is possible to make a \emph{headless} function definition, consisting only of the 
function body. #' This function body must always begin with a \sQuote{\code{\{}}. For example, instead of #' \code{cpo.retrafo = function(data, control) data[-1]}, it is possible to use #' \code{cpo.retrafo = \{ data[-1] \}}. The necessary function head is then added automatically by the CPO framework. #' This will always contain the necessary parameters (e.g. \dQuote{\code{data}}, \dQuote{\code{target}}, hyperparameters as defined in \code{par.set}) #' with the names as required. This can declutter the definition of a \code{\link{CPOConstructor}} and is recommended if the CPO consists of #' few lines. #' #' Note that if this is used when writing an R package, inside a function, this may cause the automatic R correctness checker to print warnings. #' #' #' @param cpo.name [\code{character(1)}]\cr #' The name of the resulting \code{\link{CPOConstructor}} / \code{\link{CPO}}. This is used for identification in output, #' and as the default \code{\link[=getCPOId]{id}}. #' @param par.set [\code{\link[ParamHelpers:makeParamSet]{ParamSet}}]\cr #' Optional parameter set, for configuration of CPOs during construction or by hyperparameters. #' Default is an empty \code{\link[ParamHelpers:makeParamSet]{ParamSet}}. #' It is recommended to use \code{\link{pSS}} to construct this, as it greatly reduces the verbosity of #' creating a \code{\link[ParamHelpers:makeParamSet]{ParamSet}} and makes it more readable. #' @param par.vals [\code{list} | \code{NULL}]\cr #' Named list of default parameter values for the CPO. These are used \emph{instead of} the #' parameter default values in \code{par.set}, if not \code{NULL}. It is preferred to use #' \code{\link[ParamHelpers:makeParamSet]{ParamSet}} default values, #' and not \code{par.vals}. Default is \code{NULL}. #' @param dataformat [\code{character(1)}]\cr #' Indicate what format the data should be as seen by the \code{cpo.train} and \code{cpo.retrafo} function. 
#' The following table shows what values of \code{dataformat} lead to what is given to \code{cpo.train} and \code{cpo.retrafo} #' as \code{data} and \code{target} parameter value. (Note that for Feature Operating CPOs, \code{cpo.retrafo} has no \code{target} argument.) Possibilities are: #' \tabular{lll}{ #' \bold{dataformat} \tab \bold{data} \tab \bold{target} \cr #' \dQuote{df.all} \tab \code{data.frame} with target cols \tab target colnames \cr #' \dQuote{df.features} \tab \code{data.frame} without target \tab \code{data.frame} of target \cr #' \dQuote{task} \tab full \code{\link[mlr]{Task}} \tab target colnames \cr #' \dQuote{split} \tab list of \code{data.frames} by type \tab \code{data.frame} of target \cr #' [type] \tab \code{data.frame} of [type] feats only \tab \code{data.frame} of target #' } #' [type] can be any one of \dQuote{factor}, \dQuote{numeric}, \dQuote{ordered}; if these are given, only a subset of the total #' data present is seen by the \code{\link{CPO}}. #' #' Note that \code{makeCPORetrafoless} accepts only \dQuote{task} and \dQuote{df.all}. #' #' For \code{dataformat == "split"}, \code{cpo.train} and \code{cpo.retrafo} get a list with entries \dQuote{factor}, \dQuote{numeric}, #' \dQuote{other}, and, if \code{dataformat.factor.with.ordered} is \code{FALSE}, \dQuote{ordered}. #' #' If the CPO is a Feature Operation CPO, then the return value of the \code{cpo.retrafo} function must be in the same format as the one requested. #' E.g. if \code{dataformat} is \dQuote{split}, the return value must be a named list with entries \code{$numeric}, #' \code{$factor}, and \code{$other}. The types of the returned data may be arbitrary: In the given example, #' the \code{$factor} slot of the returned list may contain numeric data. (Note however that if data is returned #' that has a type not already present in the data, \code{properties.needed} must specify this.) 
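#'
#' As an illustration of the \dQuote{split} format, the following is a minimal sketch of a Feature Operation CPO that centers the numeric
#' features and hands the list back in the format it received. The name \code{centerNum} is hypothetical, and the sketch assumes the default
#' \code{dataformat.factor.with.ordered = TRUE}, so no \code{$ordered} entry is present.
#'
#' ```r
#' library("mlrCPO")
#'
#' # Hypothetical CPO: center numeric features, leave all other columns untouched.
#' centerNum = makeCPO("centerNum",
#'   dataformat = "split",
#'   cpo.train = function(data, target) {
#'     # `data` is a named list of data.frames: $numeric, $factor, $other
#'     list(means = colMeans(data$numeric))
#'   },
#'   cpo.retrafo = function(data, control) {
#'     # subtract the training means from each numeric column
#'     data$numeric = sweep(data$numeric, 2, control$means)
#'     data  # same named-list format that was requested
#'   })
#'
#' # A data.frame without target is treated as a "cluster" type Task.
#' centered = applyCPO(centerNum(), iris)
#' ```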
#' #' For Feature Operating CPOs, if \code{dataformat} is either \dQuote{df.all} or \dQuote{task}, the #' target column(s) in the returned value of the retrafo function must be identical with the target column(s) given as input. #' #' If \code{dataformat} is \dQuote{split}, the \code{$numeric} slot of the value returned by the \code{cpo.retrafo} function #' may also be a \code{\link[base]{matrix}}. If \code{dataformat} is \dQuote{numeric}, the returned object may also be a #' matrix. #' #' Default is \dQuote{df.features} for all functions except \code{makeCPORetrafoless}, for which it is \dQuote{df.all}. #' @param dataformat.factor.with.ordered [\code{logical(1)}]\cr #' Whether to treat \code{ordered} typed features as \code{factor} typed features. This affects how \code{dataformat} is handled, for which it only #' has an effect if \code{dataformat} is \dQuote{split} or \dQuote{factor}. If \code{dataformat} is \dQuote{ordered}, this must be \code{FALSE}. #' It also affects how strictly data fed to a \code{\link{CPORetrafo}} object #' is checked for adherence to the data format of data given to the generating \code{\link{CPO}}. Default is \code{TRUE}. #' @param export.params [\code{logical(1)} | \code{character}]\cr #' Indicates which CPO parameters are exported by default. Exported parameters can be changed after construction using \code{\link[mlr]{setHyperPars}}, #' but exporting too many parameters may lead to messy parameter sets if many CPOs are combined using \code{\link{composeCPO}} or \code{\link{\%>>\%}}. #' The exported parameters can be set during construction, but \code{export.params} determines the \emph{default} exported parameters. #' If this is a \code{logical(1)}, \code{TRUE} exports all parameters, \code{FALSE} exports no parameters. It may also be a \code{character}, #' indicating the names of parameters to be exported. Default is \code{TRUE}. 
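#'
#' A minimal sketch of how \code{export.params} might be used, assuming a hypothetical CPO \code{affine} with two hypothetical
#' hyperparameters \code{multiplier} and \code{offset}, of which only the first is exported by default. The parameter set is built with
#' plain ParamHelpers calls here; \code{\link{pSS}} could be used instead.
#'
#' ```r
#' library("mlrCPO")
#'
#' # Hypothetical CPO: multiply numeric features by `multiplier`, then add `offset`.
#' affine = makeCPO("affine",
#'   par.set = ParamHelpers::makeParamSet(
#'     ParamHelpers::makeNumericParam("multiplier", default = 1),
#'     ParamHelpers::makeNumericParam("offset", default = 0)),
#'   dataformat = "numeric",
#'   export.params = "multiplier",  # only this parameter is exported by default
#'   cpo.train = function(data, target, multiplier, offset) {
#'     list()  # nothing is learned from the training data
#'   },
#'   cpo.retrafo = function(data, control, multiplier, offset) {
#'     data * multiplier + offset
#'   })
#'
#' # Both hyperparameters can be given during construction ...
#' cpo = affine(multiplier = 2, offset = 1)
#' # ... but only the exported one is expected to show up in the parameter set,
#' # prefixed with the CPO's ID (which defaults to the CPO name).
#' getParamSet(cpo)
#' cpo = setHyperPars(cpo, affine.multiplier = 3)
#' ```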
#' @param fix.factors [\code{logical(1)}]\cr #' Whether to constrain factor levels of new data to the levels of training data, for each factorial or ordered column. If new data contains #' factor levels that were not present in training data, the values are set to \code{NA}. Default is \code{FALSE}. #' @param properties.data [\code{character}]\cr #' The kind of data that the CPO will be able to handle. This can be one or more of: \dQuote{numerics}, #' \dQuote{factors}, \dQuote{ordered}, \dQuote{missings}. #' There should be a bias towards including properties. If a property is absent, the preproc #' operator will reject the data. If an operation e.g. only works on numeric columns that have no #' missings (like PCA), it is recommended to give all properties, ignore the columns that #' are not numeric (using \code{dataformat = "numeric"}), and give an error when #' there are missings in the numeric columns (since missings in factorial features are not a problem). #' Defaults to the maximal set. #' @param properties.target [\code{character}]\cr #' For Feature Operation CPOs, this can be one or more of \dQuote{cluster}, \dQuote{classif}, \dQuote{multilabel}, \dQuote{regr}, \dQuote{surv}, #' \dQuote{oneclass}, \dQuote{twoclass}, \dQuote{multiclass}. Just as \code{properties.data}, it #' indicates what kind of data a CPO can work with. To handle data given as \code{data.frame}, the \dQuote{cluster} property is needed. Default is the maximal set. #' #' For Target Operation CPOs, this \emph{must} contain exactly one of \dQuote{cluster}, \dQuote{classif}, \dQuote{multilabel}, \dQuote{regr}, \dQuote{surv}. #' This indicates the type of \code{\link[mlr]{Task}} the #' \code{\link{CPO}} can work on. If the input is a \code{data.frame}, it is treated as a \dQuote{cluster} type \code{\link[mlr]{Task}}. #' If the \code{properties.target} contains \dQuote{classif}, the value must then also contain one or more of \dQuote{oneclass}, #' \dQuote{twoclass}, or \dQuote{multiclass}. 
Default is \dQuote{cluster}. #' @param properties.adding [\code{character}]\cr #' Can be one or many of the same values as \code{properties.data} for Feature Operation CPOs, and one or many of the same values as \code{properties.target} #' for Target Operation CPOs. These properties \emph{get added} to a \code{\link[mlr:makeLearner]{Learner}} (or \code{\link{CPO}}) coming after / behind this CPO. #' When a CPO imputes missing values, for example, this should be \dQuote{missings}. This must be a subset of \dQuote{properties.data} or #' \dQuote{properties.target}. #' #' Note that this may \emph{not} contain a \code{\link[mlr]{Task}}-type property, even if the \code{\link{CPO}} is a Target Operation CPO that performs #' conversion. #' #' Property names may be postfixed with \dQuote{.sometimes}, to indicate that adherence should not be checked internally. This distinction is made by #' not putting them in the \code{$adding.min} slot of the \code{\link{getCPOProperties}} return value when \code{get.internal = TRUE}. #' #' Default is \code{character(0)}. #' @param properties.needed [\code{character}]\cr #' Can be one or many of the same values as \code{properties.data} for Feature Operation CPOs, #' and one or many of the same values as \code{properties.target}. These properties are \emph{required} #' from a \code{\link[mlr:makeLearner]{Learner}} (or \code{\link{CPO}}) coming after / behind this CPO. E.g., when a CPO converts factors to #' numerics, this should be \dQuote{numerics} (and \code{properties.adding} should be \dQuote{factors}). #' #' Note that this may \emph{not} contain a \code{\link[mlr]{Task}}-type property, even if the \code{\link{CPO}} is a Target Operation CPO that performs #' conversion. #' #' Property names may be postfixed with \dQuote{.sometimes}, to indicate that adherence should not be checked internally. This distinction is made by #' not putting them in the \code{$needed} slot of properties. 
They can still be found in the \code{$needed.max} slot of the #' \code{\link{getCPOProperties}} return value when \code{get.internal = TRUE}. #' #' Default is \code{character(0)}. #' @param packages [\code{character}]\cr #' Package(s) that should be loaded when the CPO is constructed. This gives the user an error if #' a package required for the CPO is not available on their system, or cannot be loaded. Default is \code{character(0)}. #' @param constant.invert [\code{logical(1)}]\cr #' Whether the \code{cpo.invert} step should not have information from the previous \code{cpo.retrafo} or \code{cpo.train.invert} step in #' Target Operation CPOs (\code{makeCPOTargetOp} or \code{makeCPOExtendedTargetOp}). #' #' For \code{makeCPOTargetOp}, if this is \code{TRUE}, the #' \code{cpo.train.invert} argument must be \code{NULL}. If \code{cpo.retrafo} and \code{cpo.invert} are given, the same \code{control} #' object is given to both of them. Otherwise, if \code{cpo.retrafo} and \code{cpo.invert} are \code{NULL}, the \code{cpo.train} function #' must return \code{NULL} and define a \code{cpo.retrafo} and \code{cpo.invert} function in its namespace (see \code{cpo.train} documentation #' for more details). If \code{constant.invert} is \code{FALSE}, \code{cpo.train} may either return a \code{control} object that will then be #' given to \code{cpo.train.invert}, or define a \code{cpo.retrafo} and \code{cpo.train.invert} function in its namespace. #' #' For \code{makeCPOExtendedTargetOp}, if this is \code{TRUE}, \code{cpo.retrafo} does not need to generate a \code{control.invert} object. #' The \code{control.invert} object created in \code{cpo.trafo} will then always be given to \code{cpo.invert} for all data sets. #' #' Default is \code{FALSE}. #' @param predict.type.map [\code{character} | \code{list}]\cr #' This becomes the \code{\link{CPO}}'s \code{predict.type}, explained in detail in \link{PredictType}. 
#'
#'   In short, the \code{predict.type.map} is a character vector, or a \code{list} of \code{character(1)},
#'   with \emph{names} according to the predict types \code{predict} can request
#'   in its \code{predict.type} argument when the created \code{\link{CPO}} was used as part of a \code{\link{CPOLearner}} to create the
#'   model under consideration. The \emph{values} of \code{predict.type.map} are the \code{predict.type} that will be requested from the
#'   underlying \code{\link[mlr:makeLearner]{Learner}} for prediction.
#'
#'   \code{predict.type.map} thus determines the format that the \code{target} parameter of \code{cpo.invert} can take: It is
#'   the format according to \code{predict.type.map[predict.type]}, where \code{predict.type} is the respective \code{cpo.invert} parameter.
#' @param task.type.out [\code{character(1)} | \code{NULL}]\cr
#'   If \code{\link[mlr]{Task}} conversion is to take place, this is the output task type that the data should be converted to. Note that the
#'   CPO framework takes care of the conversion if \code{dataformat} is not \dQuote{task}, but the target column needs to have the
#'   proper format for that.
#'
#'   If this is \code{NULL}, \code{\link[mlr]{Task}}s will not be converted. Default is \code{NULL}.
#' @param cpo.train [\code{function} | \code{NULL}]\cr
#'   This is a function which must have the parameters \code{data} and \code{target},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'   It is called whenever a \code{\link{CPO}} is applied to
#'   a data set to prepare for transformation of the training \emph{and} prediction data.
#'   Note that this function is only used in Feature Operation CPOs created with \code{makeCPO}, and in Target Operation CPOs
#'   created with \code{makeCPOTargetOp}.
#'
#'   The behaviour of this function differs slightly in Feature Operation and Target Operation CPOs.
#'
#'   For \bold{Feature Operation CPOs}, if \code{cpo.retrafo} is \code{NULL}, this is a constructor function which must return a \dQuote{retrafo} function which
#'   will then modify (possibly new unseen) data. This retrafo function must have exactly one argument, the (new) data, and return the modified data. The format
#'   of the argument, and of the return value of the retrafo function, depends on the value of the \code{dataformat} parameter; see the documentation there.
#'
#'   If \code{cpo.retrafo} is not \code{NULL}, this is a function which must return a control object.
#'   This control object returned by \code{cpo.train} will then be given as the \code{control} argument of the \code{cpo.retrafo} function, along with
#'   (possibly new unseen) data to manipulate.
#'
#'   For \bold{Target Operation CPOs}, if \code{cpo.retrafo} is \code{NULL}, \code{cpo.train.invert}
#'   (or \code{cpo.invert} if \code{constant.invert} is \code{TRUE}) must likewise be \code{NULL}.
#'   In that case \code{cpo.train}'s return value is ignored and it must define, within its namespace, two
#'   functions \code{cpo.retrafo} and \code{cpo.train.invert} (or \code{cpo.invert} if \code{constant.invert}
#'   is \code{TRUE}) which will take the place of the respective functions. \code{cpo.retrafo} must take the
#'   parameters \code{data} and \code{target}, and return the modified \code{target} (or \code{data},
#'   depending on \code{dataformat}). \code{cpo.train.invert} must take a \code{data} and \code{control}
#'   argument and return either a modified control object, or a \code{cpo.invert} function.
#'   \code{cpo.invert} must have a \code{target} and \code{predict.type} argument and return the modified
#'   target data.
#'
#'   If \code{cpo.retrafo} is not \code{NULL}, \code{cpo.train.invert}
#'   (or \code{cpo.invert} if \code{constant.invert} is \code{TRUE}) must likewise be non-\code{NULL}.
#'   In that case, \code{cpo.train} must return a control object.
#'   This control object will then be
#'   given as the \code{control} argument of both \code{cpo.retrafo} and \code{cpo.train.invert}
#'   (or the \code{control.invert} argument of \code{cpo.invert} if \code{constant.invert} is \code{TRUE}).
#'
#'   This parameter may be \code{NULL}, resulting in a so-called \emph{stateless} CPO. For Target Operation CPOs created with \code{makeCPOTargetOp},
#'   \code{constant.invert} must be \code{TRUE} in this case.
#'   A stateless CPO does the same transformation for initial CPO
#'   application and subsequent prediction data transformation (e.g. taking the logarithm of numerical columns). Note that \code{cpo.retrafo}
#'   and \code{cpo.invert} should not
#'   have a \code{control} argument in a stateless CPO.
#' @param cpo.trafo [\code{function}]\cr
#'   This is a function which must have the parameters \code{data} and \code{target},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'   It is called whenever a \code{\link{CPO}} is applied to
#'   a data set to transform the training data, and (except for Retrafoless CPOs) to collect a control object used by other transformation functions.
#'   Note that this function is not used in \code{makeCPO}.
#'
#'   This function's primary task is to transform the given data when the \code{\link{CPO}} gets applied to training data. For Target Operation CPOs
#'   (created with \code{makeCPOExtendedTargetOp}(!)),
#'   it must return the complete transformed target column(s), unless \code{dataformat} is \dQuote{df.all} (in which case the complete, modified,
#'   \code{data.frame} must be returned) or \dQuote{task} (in which case the complete, modified, \code{Task} must be returned). It must furthermore
#'   create the control objects for \code{cpo.retrafo} and \code{cpo.invert}, or create these functions themselves, and save them in its function
#'   environment (see below).
#'   For Retrafoless CPOs
#'   (created with \code{makeCPORetrafoless}) and Feature Operation CPOs (created with \code{makeCPOExtendedTrafo}(!)), it must return the
#'   data in the same format as it received in its \code{data} argument (depending on \code{dataformat}). If \code{dataformat} is
#'   \dQuote{df.all} or \dQuote{task}, this means the target column(s) contained in the \code{data.frame} or \code{Task} returned must not be modified.
#'
#'   For CPOs that are not Retrafoless, a unit of information to be carried over to the retrafo step needs to be created inside the \code{cpo.trafo}
#'   function. This unit of information is a variable that must be defined inside the environment of the \code{cpo.trafo} function and will be
#'   retrieved by the CPO framework.
#'
#'   If \code{cpo.retrafo} is not \code{NULL},
#'   the unit is an object named \dQuote{\code{control}} that will be passed on as the \code{control} argument to the
#'   \code{cpo.retrafo} function. If \code{cpo.retrafo} is \code{NULL}, the unit is a \emph{function}, called \dQuote{\code{cpo.retrafo}},
#'   that will be used
#'   \emph{instead} of the \code{cpo.retrafo}
#'   function passed to \code{makeCPOExtendedTargetOp} / \code{makeCPOExtendedTrafo}. It must behave
#'   the same as the function it replaces, but has only the \code{data} (and \code{target}, for Target Operation CPOs) argument.
#'
#'   For Target Operation CPOs created with \code{makeCPOExtendedTargetOp}, another unit of information to be used by \code{cpo.invert}
#'   must be created. The options here are similar to \code{cpo.retrafo}: Either a control object, named \code{control.invert}, is created,
#'   or the \code{cpo.invert} function itself is given (and \code{cpo.invert} in the \code{makeCPOExtendedTargetOp} call is set to \code{NULL}),
#'   with the \code{target} and \code{predict.type} arguments.
#' @param cpo.retrafo [\code{function} | \code{NULL}]\cr
#'   This is a function which must have the parameters \code{data}, \code{target} (Target Operation CPOs only) and \code{control},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'   In Feature Operation CPOs created with \code{makeCPO}, if \code{cpo.train} is \code{NULL}, the \code{control} argument must be absent.
#'
#'   This function gets called during the \dQuote{retransformation} step where prediction data is given to the \code{\link{CPORetrafo}} object before it
#'   is given to a fitted machine learning model for prediction. In \code{makeCPO} Feature Operation CPOs and \code{makeCPOTargetOp} Target Operation CPOs,
#'   this is \emph{also} called during the
#'   first trafo step, where the \code{\link{CPO}} object is applied to training data.
#'
#'   In Feature Operation CPOs, this function receives the data to be
#'   transformed and must return the transformed data in the same format as it received it.
#'   The format of \code{data} is the same as the format in \code{cpo.train} and \code{cpo.trafo}, with the exception that if \code{dataformat} is
#'   \dQuote{task} or \dQuote{df.all}, the behaviour here is as if \dQuote{df.split} had been given.
#'
#'   In Target Operation CPOs created with \code{makeCPOTargetOp}, this function receives the data and target to be transformed
#'   and must return the transformed target. The input format of these parameters depends on \code{dataformat}.
#'   If \code{dataformat} is \dQuote{task} or \dQuote{df.all}, the returned value must be the modified \code{\link[mlr]{Task}} / \code{data.frame}
#'   with the feature columns not modified. Otherwise, the target values to be modified are in the \code{target} parameter, and the return
#'   value must be a \code{data.frame} of the modified target values only.
#'
#'   In Target Operation CPOs created with \code{makeCPOExtendedTargetOp}, this function is called during the retrafo step, and it must
#'   create a \code{control.invert} object in its environment to be used in the inversion step, as well as return the modified target
#'   data. The format of the data given to \code{cpo.retrafo} in Target Operation CPOs created with \code{makeCPOExtendedTargetOp} is the same
#'   as in other functions, with the exception that, if \code{dataformat} is \dQuote{df.all} or \dQuote{task}, the full \code{data.frame}
#'   or \code{\link[mlr]{Task}} will be given as the \code{target} parameter, while the \code{data} parameter will behave as if
#'   \code{dataformat} were \dQuote{df.split}. Depending on what object the \code{\link{CPORetrafo}} object was applied to,
#'   the \code{target} argument \emph{may be \code{NULL}}; in that case \code{NULL} must also be returned by the function.
#'
#'   If \code{cpo.invert} is \code{NULL}, \code{cpo.retrafo} should create a \code{cpo.invert} function in its environment instead of
#'   creating the control object; this function should then take the \code{target} and \code{predict.type} arguments. If \code{constant.invert}
#'   is \code{TRUE}, this function does not need to define the \code{control.invert} or \code{cpo.invert} variables; they are instead
#'   taken from \code{cpo.trafo}.
#' @param cpo.train.invert [\code{function} | \code{NULL}]\cr
#'   This is a function which must have the parameters \code{data} and \code{control},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'
#'   This function receives the feature columns given for prediction, and must return a
#'   control object that will be passed on to the \code{cpo.invert} function, \emph{or} it must return a \emph{function} that will be treated
#'   as the \code{cpo.invert} function if the \code{cpo.invert} argument is \code{NULL}.
#'   In the latter case, the returned function takes
#'   exactly two arguments (the prediction column to be inverted, and \code{predict.type}), and otherwise behaves identically to \code{cpo.invert}.
#'
#'   If \code{constant.invert} is \code{TRUE}, this must be \code{NULL}.
#'
#' @param cpo.invert [\code{function} | \code{NULL}]\cr
#'   This is a function which must have the parameters \code{target} (a \code{data.frame} containing the columns of a prediction made), \code{control.invert},
#'   and \code{predict.type}, as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'
#'   The \code{predict.type} \emph{requested} by the \code{\link[stats]{predict}} or \code{\link{invert}} call is given as a \code{character(1)} in
#'   the \code{predict.type} argument. Note that this is not necessarily the \code{predict.type} of the prediction made and given as the \code{target} argument,
#'   depending on the value of \code{predict.type.map} (see there).
#'
#'   This function performs the inversion for a Target Operation CPO. It takes a control object, which summarizes information from the training and
#'   retrafo step, and the prediction as returned by a machine learning model, and undoes the operation done to the target column in the \code{cpo.trafo}
#'   function.
#'
#'   For example, if the trafo step consisted of taking the logarithm of a regression target, the \code{cpo.invert} function could return the exponentiated
#'   prediction values by taking the \code{exp} of the only column in the \code{target} \code{data.frame} and returning the result of that. This kind of
#'   operation does not need the \code{cpo.retrafo} step and should have \code{skip.retrafo} set to \code{TRUE}.
#'
#'   As a more elaborate example, a CPO could train a model on the training data and set the target values to the \emph{residuals} of that trained model.
#'   The \code{cpo.retrafo} function would then make predictions with that model on the new prediction data and save the result to the \code{control} object.
#'   The \code{cpo.invert} function would then add these predictions to the predictions given to it in the \code{target} argument to \dQuote{invert} the
#'   earlier subtraction of model predictions from target values when taking the residuals.
#' @return [\code{\link{CPOConstructor}}]. A Constructor for \code{\link{CPO}}s.
#' @family CPOConstructor related
#' @family CPO lifecycle related
#' @family advanced topics
#'
#' @examples
#' # an example constant feature remover CPO
#' constFeatRem = makeCPO("constFeatRem",
#'   dataformat = "df.features",
#'   cpo.train = function(data, target) {
#'     names(Filter(function(x) {  # names of columns to keep
#'       length(unique(x)) > 1
#'     }, data))
#'   }, cpo.retrafo = function(data, control) {
#'     data[control]
#'   })
#' # alternatively:
#' constFeatRem = makeCPO("constFeatRem",
#'   dataformat = "df.features",
#'   cpo.train = function(data, target) {
#'     cols.keep = names(Filter(function(x) {
#'       length(unique(x)) > 1
#'     }, data))
#'     # the following function will do both the trafo and retrafo
#'     result = function(data) {
#'       data[cols.keep]
#'     }
#'     result
#'   }, cpo.retrafo = NULL)
#' @name makeCPO
NULL

###################
# The following is a rudiment, possibly some of this needs to be used.
# TODO: Delete if documentation is done and it turns out this is not needed.
###################
# @title Create a custom CPO constructor
#
# @description
# \code{makeCPOExtended} creates a Feature Operation CPO constructor, i.e. a constructor for a CPO that will
# operate on feature columns. \code{makeCPOTargetOp} creates a Target Operation CPO constructor, which
# creates CPOs that operate on the target column.
#
# \code{makeCPOExtended} is for advanced users and internal use; for a much simpler user-interface, use
# \code{\link{makeCPO}}.
#
# @inheritparams makeCPO
# @param ...
#   Parameters of the CPO, in the format of \code{\link[ParamHelpers]{pSS}}. These parameters are used in addition
#   to the \code{par.set} parameters.
# @param trafo.type [\code{character(1)}]\cr
#   Indicates what API is used for \code{cpo.trafo} and \code{cpo.retrafo}, and how state information is transferred
#   between them. Possibilities are:
#   \itemize{
#     \item{trafo.returns.data} \code{cpo.trafo} must be specified and is called with the training data and the CPO parameters.
#       It must return the modified data, and within its namespace must either specify a \dQuote{control} variable ("Object-Based CPO"),
#       if \code{cpo.retrafo} is given, or a \dQuote{cpo.retrafo} variable, if (the makeCPOExtended parameter) \code{cpo.retrafo}
#       is \code{NULL} ("Functional CPO"). For Object-Based CPO, \code{cpo.retrafo} is called with the \code{control} object
#       created in \code{cpo.trafo}, additionally with the new data, and the CPO parameters. For Functional CPO, \code{cpo.retrafo} is
#       constructed inside the \code{cpo.trafo} call and is used for transformation of new data. It must take a single argument and
#       return the transformed data.
#     \item{trafo.returns.control} \code{cpo.trafo} must be specified and is called with the training data and the CPO parameters. It must return
#       a \code{cpo.retrafo} function that takes the data to be transformed as a single argument, and returns the transformed data.
#       If \code{trafo.type} is \dQuote{trafo.returns.control}, \code{cpo.retrafo} must be \code{NULL}.
#     \item{stateless} Specification of \code{cpo.trafo} is optional and may be \code{NULL}. If it is not given, \code{cpo.retrafo} is used on both
#       training and new data; otherwise, \code{cpo.trafo} is applied to training data, \code{cpo.retrafo} is used on predict data. There
#       is no transfer of information from trafo to retrafo. If \code{cpo.trafo} is not given, \code{dataformat} must not be \dQuote{task} or \dQuote{df.all}.
#   }
# @param .type [\code{character(1)}]\cr
#   For Target Operation CPOs, the type of task that it operates on. Must be one of \dQuote{cluster}, \dQuote{classif}, \dQuote{multilabel}, \dQuote{regr},
#   or \dQuote{surv}. If input data is a data.frame, it will be treated as a cluster task. Default is \dQuote{cluster}.
# @param .type.out [\code{character(1)}]\cr
#   For Target Operation CPOs, the type of task that will be generated by this CPO. If this is the same as \code{.type}, no conversion takes place.
#   Possible values are the same as for \code{.type}. Default is \code{.type}.
# @param predict.type [\code{character} | \code{list}]\cr
#   Must be a named \code{character}, or named \code{list} of \code{character(1)}, indicating
#   what \code{predict.type} (see \link{Prediction}) a prediction must have if the output prediction
#   is to be of some type. E.g. if a CPO converts a \dQuote{regr} \code{Task} into a
#   \dQuote{classif} \code{Task}, and if for \dQuote{se} prediction it needs a classification
#   learner to give \dQuote{prob} type predictions, while for \dQuote{response} prediction it
#   also needs \dQuote{response} predictions, this would be \code{c(response = "response",
#   se = "prob")}. The names are the prediction types that are requested from this CPO, the
#   values are types that this CPO will request from an underlying learner. If a name is not
#   present, the \code{predict.type} is assumed not supported. Default is \code{c(response = "response")}.
# @param data.dependent [\code{logical(1)}]\cr
#   Whether to make a data-dependent inverter CPO. If this is \code{FALSE}, the \code{cpo.trafo} function does not have
#   a \code{data} parameter.
# @param cpo.trafo [\code{language} | \code{function} | \code{NULL}]\cr
#   This can either be a function, or just the function body wrapped in curly braces.
#   If this is a function, it must have the parameters \dQuote{data} and \dQuote{target},
#   as well as the parameters specified in \dQuote{...} or \dQuote{par.set}.
#   (Alternatively,
#   the function may have a dotdotdot argument). Depending on the values of \code{trafo.type} and
#   \code{dataformat} (see there), it must return a \dQuote{data.frame}, a \dQuote{task},
#   a \dQuote{matrix}, a \dQuote{list} of \dQuote{data.frame} and \dQuote{matrix} objects, or a retrafo function.
#
#   If \dQuote{cpo.retrafo} is given and \code{trafo.type} is \dQuote{trafo.returns.data}, it must create a \dQuote{control}
#   variable in its namespace, which will be passed on to \dQuote{cpo.retrafo}. If \dQuote{cpo.retrafo} is
#   not given and \code{trafo.type} is \dQuote{trafo.returns.data}, it must create a \dQuote{cpo.retrafo} function within its namespace, which will be called
#   for re-transformation.
#
#   If \code{trafo.type} is \dQuote{trafo.returns.control}, this function must return a \dQuote{cpo.retrafo} function.
#
#   If \code{trafo.type} is
#   \dQuote{stateless}, this argument may be \code{NULL}, or a function which just returns the transformed data.
#
#   If \dQuote{cpo.trafo} is a list of expressions (preferred), it is turned into a function by mlr, with the correct function arguments.
# @param cpo.retrafo [\code{language} | \code{function}]\cr
#   Similarly to \dQuote{cpo.trafo}, this is either a function, the function body in curly braces (preferred), or \code{NULL}.
#   If this is not \code{NULL}, this function must have the same arguments as \code{cpo.trafo}, with the exception that
#   the \dQuote{target} argument is replaced by a \dQuote{control} argument, which will be
#   the value created in the \dQuote{cpo.trafo} run. It gets its input data in the same format as
#   \dQuote{cpo.trafo}, with the exception that if \dQuote{dataformat} is \dQuote{task}, it gets a
#   \dQuote{data.frame} as if \dQuote{dataformat} were \dQuote{df.all}. This function must similarly return an
#   object in the same format as it received as input.
#
# @family CPO
# @export
#
# @examples
# # an example 'pca' CPO
# # demonstrates the (object based) "trafo.returns.data" CPO API
# pca = makeCPOExtended("pca",  # name
#   center = TRUE: logical,  # one logical parameter 'center'
#   dataformat = "numeric",  # only handle numeric columns
#   trafo.type = "trafo.returns.data",  # default, can be omitted
#   # cpo.trafo is given as a function body. The function head is added
#   # automatically, containing 'data', 'target', and 'center'
#   # (since a 'center' parameter was defined)
#   cpo.trafo = {
#     pcr = prcomp(as.matrix(data), center = center)
#     # The following line creates a 'control' object, which will be given
#     # to retrafo.
#     control = list(rotation = pcr$rotation, center = pcr$center)
#     pcr$x  # returning a matrix is ok
#   # Just like cpo.trafo, cpo.retrafo is a function body, with implicit
#   # arguments 'data', 'control', and 'center'.
#   }, cpo.retrafo = {
#     scale(as.matrix(data), center = control$center, scale = FALSE) %*%
#       control$rotation
#   })
#
# # an example 'scale' CPO
# # demonstrates the (functional) "trafo.returns.data" CPO API
# scaleCPO = makeCPOExtended("scale",
#   dataformat = "numeric",
#   # trafo.type = "trafo.returns.data" is implicit
#   cpo.trafo = function(data, target) {
#     # no 'center' / 'scale' parameters are defined for this CPO, so both are fixed to TRUE
#     result = scale(as.matrix(data), center = TRUE, scale = TRUE)
#     cpo.retrafo = function(data) {
#       # here we can use the 'result' object generated in cpo.trafo
#       scale(as.matrix(data), attr(result, "scaled:center"),
#         attr(result, "scaled:scale"))
#     }
#     result
#   }, cpo.retrafo = NULL)  # don't forget to set cpo.retrafo to NULL
#
# # an example constant feature remover CPO
# # demonstrates the "trafo.returns.control" CPO API
# constFeatRem = makeCPOExtended("constFeatRem",
#   dataformat = "df.features",
#   trafo.type = "trafo.returns.control",
#   cpo.trafo = function(data, target) {
#     cols.keep = names(Filter(function(x) {
#       length(unique(x)) > 1
#     }, data))
#     # the following function will do both the trafo and retrafo
#     result =
#       function(data) {
#         data[cols.keep]
#       }
#     result
#   }, cpo.retrafo = NULL)
#
# # an example 'square' CPO
# # demonstrates the "stateless" CPO API
# square = makeCPOExtended("square",
#   dataformat = "numeric",
#   trafo.type = "stateless",
#   cpo.trafo = function(data) {
#     as.matrix(data)^2  # element-wise square
#   }, cpo.retrafo = NULL)  # optional, we don't need it since trafo & retrafo same
#