PipeOpFeatureUnion could under some circumstances want to unite tasks that have differing row IDs, e.g. after PipeOpSubsample on two different paths sampled (and $filter()ed) different sets of rows.
graph = greplicate(
  PipeOpSubsample$new() %>>% PipeOpLearnerCV$new("classif.rpart"),
  2
) %>>% PipeOpFeatureUnion$new()
graph$plot()  # this is what it looks like
graph$train("iris")  # assertion error
mlr-org/mlr3#309 could solve part of this, but the problem goes deeper:
- What if we do sampling with replacement?
- What if PipeOpLearnerCV has a resampling that predicts some entries multiple times, e.g. RepCV or bootstrapping?
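To make the mismatch concrete without running the pipeline, here is a minimal base R sketch. The two ID vectors are made up for illustration (they are not output of PipeOpSubsample); they just stand for the rows each branch might keep.

```r
# Hypothetical row IDs kept by two independent subsampling branches.
ids_branch1 <- c(1L, 2L, 4L, 5L)  # sampled without replacement
ids_branch2 <- c(2L, 2L, 3L, 5L)  # sampled with replacement: row 2 drawn twice

# A plain column-wise union needs the same row IDs in the same order.
same_rows <- identical(ids_branch1, ids_branch2)

# Rows that exist in only one branch have no partner row to be bound to.
only_in_1 <- setdiff(ids_branch1, ids_branch2)
only_in_2 <- setdiff(ids_branch2, ids_branch1)

# Duplicated IDs make even a join keyed on row ID ambiguous.
dup_ids <- unique(ids_branch2[duplicated(ids_branch2)])
```

Here `same_rows` is `FALSE`, rows 1 and 4 exist only in the first branch, row 3 only in the second, and row 2 is duplicated in the second.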
If we subsample with replacement, the same row_id simply appears twice in the data used to train the learner. The cross-validated predictions will be identical for duplicated instances, so this should not be a problem. We might run into trouble if something stochastic happens after PipeOpSubsample, but I currently cannot think of an example.
As a solution, I would suggest simply choosing the first element if multiple are provided, until we encounter a situation where this yields an invalid result. Note also that the first problem we would encounter here is probably an invalid Resampling object, since rows could end up in both the train and the test data.
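A rough sketch of the "choose the first element" idea in base R, not how PipeOpFeatureUnion is implemented. The data frames, the `row_id` column name and the helper `keep_first()` are hypothetical, and joining only on rows that both branches produced is an extra assumption of mine.

```r
# Hypothetical outputs of two branches: a row_id column plus one feature each.
branch1 <- data.frame(row_id = c(1L, 2L, 2L, 4L), prob_a = c(0.9, 0.2, 0.2, 0.7))
branch2 <- data.frame(row_id = c(2L, 3L, 4L, 4L), prob_b = c(0.1, 0.5, 0.6, 0.6))

# Hypothetical helper: keep only the first occurrence of every row ID.
keep_first <- function(df) df[!duplicated(df$row_id), , drop = FALSE]

# Join on row_id. The default inner join keeps only rows present in both
# branches (an assumption; the thread does not settle this).
united <- merge(keep_first(branch1), keep_first(branch2), by = "row_id")
```

With these inputs `united` has two rows, for row IDs 2 and 4. Passing `all = TRUE` to `merge()` would instead keep every row ID and fill the missing features with `NA`.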
> what if PipeOpLearnerCV has a resampling that predicts some entries multiple times, e.g. RepCV or bootstrapping?
We currently only allow CV as the resampling in PipeOpLearnerCV, so this should not be a problem.
A more future-proof version would aggregate the predictions, but what a correct aggregation would look like is unclear and depends on the situation. I would therefore argue against tackling this now.
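For illustration only, one conceivable aggregation is to average the predicted probabilities per row ID. The data below are made up, and the mean is an arbitrary choice; as said above, the right aggregation depends on the situation.

```r
# Hypothetical predictions from a repeated resampling:
# rows 1 and 2 are predicted twice, row 3 once.
preds <- data.frame(
  row_id = c(1L, 2L, 1L, 2L, 3L),
  prob   = c(0.8, 0.3, 0.6, 0.5, 0.9)
)

# One possible aggregation (an assumption): mean probability per row ID.
aggregated <- aggregate(prob ~ row_id, data = preds, FUN = mean)
```

This leaves one row per row ID, with probabilities 0.7, 0.4 and 0.9. For response (class label) predictions a mean is not defined, so something like a majority vote would be needed instead.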