With grouped data, it is important that if one row of a group is in the training set, the other rows in that group cannot be in the test set. That is, instead of sampling individual rows, sample groups. The link below shows an example, and there is another example further down here.
Perhaps allow the `holdout=` argument to be a vector of indexes, or provide a `group=` argument. The first possibility would allow other schemes as well, whereas the second is easier for the user in this situation but does not allow for unanticipated sampling schemes. It would be possible to have both, of course.
I am currently kludging it with the following code, where the example is iris, assuming each successive 10 rows form a group.
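To make the `group=` idea concrete, here is a rough sketch of the index computation such an argument might perform. `groupHoldout()` and its parameters are hypothetical names used only for illustration; nothing like this is claimed to exist in qeML.

```r
# Hypothetical helper, NOT part of qeML: hold out whole groups.
# grp:     vector with one group label per row of the data
# nGroups: number of groups to place in the holdout set
groupHoldout <- function(grp, nGroups) {
  groups <- unique(grp)
  stopifnot(nGroups >= 1, nGroups < length(groups))
  # sample positions in `groups` rather than `groups` itself, which avoids
  # sample()'s special handling of a single numeric value
  heldOut <- groups[sample.int(length(groups), nGroups)]
  # return the row indexes of every row belonging to a held-out group
  which(grp %in% heldOut)
}

set.seed(123)
grp <- rep(1:15, each = 10)
idx <- groupHoldout(grp, 3)
length(idx)  # 30 rows: 3 groups of 10 rows each
```

If `holdout=` accepted a vector of indexes, the result of a helper like this could be passed in directly, which is why the first option also covers the second.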
```r
# iris, where each successive 10 rows form a group
library(qeML)
set.seed(123)
# create grouping variable: 15 groups of 10 rows each
grp <- rep(1:15, each = 10)
# pick 3 whole groups for the holdout set, so that all rows of a group
# end up on the same side of the train/test split
holdout <- which(grp %in% sample(15, 3))
# kludge: redefine sample inside qeKNN so it returns the indexes we want
trace(qeKNN, quote(sample <- function(x, holdout) holdout))
qeKNN(iris, "Species", holdout = holdout)
# restore the original qeKNN
untrace(qeKNN)
```
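As a sanity check on the indexes themselves (independent of qeML), one can confirm that no group appears on both sides of the split. This is only a sketch in base R; `testGroups` and `trainGroups` are names introduced here for illustration.

```r
set.seed(123)
grp <- rep(1:15, each = 10)
holdout <- which(grp %in% sample(15, 3))

# groups represented in the test (holdout) rows and in the training rows
testGroups  <- unique(grp[holdout])
trainGroups <- unique(grp[-holdout])

# stops with an error if any group straddles the split
stopifnot(length(intersect(testGroups, trainGroups)) == 0)
```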
https://stackoverflow.com/questions/71087864/how-to-keep-grouped-variables-together-in-training-and-test-data