How are predictions calculated from the individual trees? #42
For what it's worth, the shrinkage parameter appears to be a no-op from the perspective of
What I'm having a bit of trouble discerning just yet is why the following gives such different results when, from my reading of the prediction code in `gbmentry.cpp`, it should be identical as well:
I'll keep looking into this.
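A minimal sketch of why shrinkage can look like a no-op when you only add up `single.tree = TRUE` outputs. It assumes that gbm stores each tree's terminal-node values already multiplied by `shrinkage` (an assumption about gbm's internals, not something stated in this thread). If so, refitting the first tree with twice the shrinkage should exactly double that tree's contribution. `bag.fraction = 1` puts every row into every tree, so the first tree's splits should be the same in both fits. The names `dat`, `fit_tree1`, `t1_a` and `t1_b` are illustrative only.

```r
library(gbm)

# Same recoding as in this thread: 1 = setosa, 0 = otherwise
dat <- iris
dat$Species <- ifelse(dat$Species == "setosa", 1, 0)

# Fit a one-tree model with a given shrinkage. With bag.fraction = 1 every row
# is used, so the first tree is fitted to identical residuals in each call.
fit_tree1 <- function(shrinkage) {
  set.seed(1)
  gbm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
      data = dat, distribution = "bernoulli",
      shrinkage = shrinkage, bag.fraction = 1, n.trees = 1)
}
fit_a <- fit_tree1(0.1)
fit_b <- fit_tree1(0.2)

# Contribution of tree 1 alone (the initial value is not included)
t1_a <- predict(fit_a, newdata = dat, n.trees = 1, single.tree = TRUE)
t1_b <- predict(fit_b, newdata = dat, n.trees = 1, single.tree = TRUE)

# If shrinkage is baked into the stored node values, doubling it doubles tree 1;
# this should print TRUE under that assumption
print(all.equal(t1_b, 2 * t1_a))
```

If this holds, multiplying the `single.tree = TRUE` values by `shrinkage` again when combining trees applies the shrinkage twice.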
Thanks for your reply, @cunningjames. I'm not sure I've understood you correctly, but I get different results if I change the value of
@markseeto and @cunningjames, sorry I'm super late to the party. A couple of things to note. The prediction obtained from

```r
library(gbm)
set.seed(1)
shr <- 0.1  # shrinkage value
iris$Species <- ifelse(iris$Species == "setosa", 1, 0)
gbm.iris <- gbm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = iris, distribution = "bernoulli",
                shrinkage = shr, bag.fraction = 1, n.trees = 10)
new.data <- data.frame(Sepal.Length = c(4.7, 4.9, 6.2, 7.1),
                       Sepal.Width  = c(2.7, 2.3, 3.1, 3.2),
                       Petal.Length = c(2.5, 2.5, 3.5, 4.9),
                       Petal.Width  = c(0.5, 0.8, 1.8, 1.6))

# How is predict(gbm.iris, newdata = new.data, n.trees = 2) calculated?
# (predictions are on the link, i.e. log-odds, scale by default)
predict(gbm.iris, newdata = new.data, n.trees = 2)
# [1] -0.9861826 -0.9861826 -0.9861826 -0.9861826

# Should be the same: the initial value plus the contributions of trees 1 and 2
p.setosa <- mean(iris$Species)
init <- log(p.setosa / (1 - p.setosa))  # boosting starts from the logit of P(Y = 1)
predict(gbm.iris, newdata = new.data, n.trees = 1, single.tree = TRUE) +
  predict(gbm.iris, newdata = new.data, n.trees = 2, single.tree = TRUE) +
  init
# [1] -0.9861826 -0.9861826 -0.9861826 -0.9861826
```

Hope this helps clear up some confusion.
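The two-tree check above can be extended to every number of trees at once. This is a minimal sketch, assuming gbm 2.1.x. It collects each tree's `single.tree = TRUE` contribution and rebuilds the link-scale prediction as `initF` plus a running sum, with no extra shrinkage factor. It then compares the result with gbm's own output for `n.trees = 1, ..., 10`. It also checks that `initF` equals the logit of the sample proportion and that `type = "response"` is the logistic transform of the link value. The names `dat`, `fit`, `per_tree`, `rebuilt`, `direct` and `prob` are illustrative, not part of gbm.

```r
library(gbm)

dat <- iris
dat$Species <- ifelse(dat$Species == "setosa", 1, 0)

set.seed(1)
fit <- gbm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = dat, distribution = "bernoulli",
           shrinkage = 0.1, bag.fraction = 1, n.trees = 10)

new.data <- data.frame(Sepal.Length = c(4.7, 4.9, 6.2, 7.1),
                       Sepal.Width  = c(2.7, 2.3, 3.1, 3.2),
                       Petal.Length = c(2.5, 2.5, 3.5, 4.9),
                       Petal.Width  = c(0.5, 0.8, 1.8, 1.6))

# One column per tree: that tree's contribution alone (4 rows x 10 trees)
per_tree <- sapply(seq_len(fit$n.trees), function(k)
  predict(fit, newdata = new.data, n.trees = k, single.tree = TRUE))

# Link-scale prediction after k trees: initF + sum of the first k contributions
rebuilt <- fit$initF + t(apply(per_tree, 1, cumsum))

# gbm's own predictions; a vector n.trees gives one column per value
direct <- predict(fit, newdata = new.data, n.trees = seq_len(fit$n.trees))
print(all.equal(rebuilt, direct, check.attributes = FALSE))

# Probability scale after all 10 trees
prob <- predict(fit, newdata = new.data, n.trees = 10, type = "response")
print(all.equal(prob, plogis(rebuilt[, 10]), check.attributes = FALSE))
```

`check.attributes = FALSE` is used so that any dimnames gbm attaches to the matrix returned for a vector `n.trees` do not affect the comparison; only the values are compared.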
How are the predictions from `predict()` calculated from the individual trees?

If I understand the documentation correctly, using `single.tree=TRUE` in `predict()` gives the prediction from an individual tree or trees. But I can't see how to combine the individual predictions. I thought they would be added together with shrinkage applied to each subsequent tree, but that doesn't appear to be correct.

Example:
I'm using gbm version 2.1.5.
Thanks.
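Written out, the combination rule that the reply's code illustrates for `distribution = "bernoulli"` looks as follows. This assumes, as the check in the reply suggests, that the value returned by `single.tree = TRUE` for tree $k$ already includes the shrinkage $\nu$ (`shrinkage`), so it is not multiplied by $\nu$ a second time. The symbol $h_k(x)$ for the unshrunk terminal-node value of tree $k$ is notation introduced here, not taken from the gbm documentation.

$$
\hat F_K(x) = F_0 + \sum_{k=1}^{K} \nu\, h_k(x), \qquad
F_0 = \log\frac{\bar y}{1 - \bar y}, \qquad
\hat p_K(x) = \frac{1}{1 + e^{-\hat F_K(x)}}
$$

Here $\nu\, h_k(x)$ is what `predict(..., n.trees = k, single.tree = TRUE)` returns, $F_0$ is the `init` value in the reply (stored as `gbm.iris$initF` when all weights are 1 and there is no offset), $\hat F_K(x)$ is the default link-scale output for `n.trees = K`, and $\hat p_K(x)$ is the `type = "response"` output.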