
How are predictions calculated from the individual trees? #42

Closed
markseeto opened this issue Jun 29, 2019 · 3 comments

Comments

@markseeto

markseeto commented Jun 29, 2019

How are the predictions from predict() calculated from the individual trees?

If I understand the documentation correctly, using single.tree=TRUE in predict() gives the prediction from an individual tree or trees. But I can't see how to combine the individual predictions. I thought they would be added together with shrinkage applied to each subsequent tree, but that doesn't appear to be correct.

Example:

library(gbm)

set.seed(1)

shr <- 0.1  # shrinkage value

gbm.iris <- gbm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = iris, distribution = "multinomial",
                shrinkage = shr, bag.fraction = 1, n.trees = 10)

new.data <- data.frame(Sepal.Length = c(4.7, 4.9, 6.2, 7.1),
                       Sepal.Width = c(2.7, 2.3, 3.1, 3.2),
                       Petal.Length = c(2.5, 2.5, 3.5, 4.9),
                       Petal.Width = c(0.5, 0.8, 1.8, 1.6))

# How is predict(gbm.iris, newdata = new.data, n.trees = 2) calculated?

predict(gbm.iris, newdata = new.data, n.trees = 2)

## , , 2
##          setosa  versicolor  virginica
## [1,] -0.2903287  0.13823604 -0.2523327
## [2,] -0.2903287  0.13823604 -0.2523327
## [3,] -0.2903287 -0.06420611  0.5207189
## [4,] -0.2903287  0.13823604 -0.2523327

# Not the same:
predict(gbm.iris, newdata = new.data, n.trees = 1) +
  shr*predict(gbm.iris, newdata = new.data, n.trees = 2, single.tree=TRUE)

# Not the same:
predict(gbm.iris, newdata = new.data, n.trees = 1) +
  predict(gbm.iris, newdata = new.data, n.trees = 2, single.tree=TRUE)

I'm using gbm version 2.1.5.

Thanks.

@cunningjames
Collaborator

For what it's worth, the shrinkage parameter appears to be a no-op from the perspective of gbm.fit. For this example:

library(gbm)

set.seed(1)

gbm.iris <- gbm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = iris, distribution = "multinomial",
                # shrinkage = shr,
                bag.fraction = 1, n.trees = 10)

new.data <- data.frame(Sepal.Length = c(4.7, 4.9, 6.2, 7.1),
                       Sepal.Width = c(2.7, 2.3, 3.1, 3.2),
                       Petal.Length = c(2.5, 2.5, 3.5, 4.9),
                       Petal.Width = c(0.5, 0.8, 1.8, 1.6))

predict(gbm.iris, newdata = new.data, n.trees = 2)

# Same results:

## , , 2
##          setosa  versicolor  virginica
## [1,] -0.2903287  0.13823604 -0.2523327
## [2,] -0.2903287  0.13823604 -0.2523327
## [3,] -0.2903287 -0.06420611  0.5207189
## [4,] -0.2903287  0.13823604 -0.2523327

What I'm having a bit of trouble discerning just yet is why the following gives such different results when, from my reading of the prediction code in gbmentry.cpp, it should be identical as well:

predict(gbm.iris, newdata = new.data, n.trees = 1) +
  predict(gbm.iris, newdata = new.data, n.trees = 2, single.tree = TRUE)

## , , 1
##           setosa versicolor   virginica
## [1,] -0.01176396 -0.1773327 -0.40307460
## [2,] -0.01176396 -0.1773327 -0.40307460
## [3,] -0.21420611  0.5957189  0.01550818
## [4,] -0.01176396 -0.1773327 -0.40307460

I'll keep looking into this.

@markseeto
Author

Thanks for your reply @cunningjames. Not sure if I've understood you correctly, but I get different results if I change the value of shr.

@bgreenwell
Contributor

bgreenwell commented Jun 2, 2021

@markseeto and @cunningjames, sorry I'm super late to the party. A couple of things to note. First, the predictions obtained from predict(..., single.tree = TRUE) are already shrunk by the factor shr. Second, boosting starts from an initial value (e.g., the mean response for LS loss in regression, and something close to the logit for binary outcomes), and this initial value also needs to be added to get the final prediction from the ensemble. It's more complicated in the multinomial case, and it's possible it's bugged in gbm (hence the new warning), but it's easy to see in the binary case:

library(gbm)

set.seed(1)

shr <- 0.1  # shrinkage value
iris$Species <- ifelse(iris$Species == "setosa", 1, 0)
gbm.iris <- gbm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = iris, distribution = "bernoulli",
                shrinkage = shr, bag.fraction = 1, n.trees = 10)

new.data <- data.frame(Sepal.Length = c(4.7, 4.9, 6.2, 7.1),
                       Sepal.Width = c(2.7, 2.3, 3.1, 3.2),
                       Petal.Length = c(2.5, 2.5, 3.5, 4.9),
                       Petal.Width = c(0.5, 0.8, 1.8, 1.6))

# How is predict(gbm.iris, newdata = new.data, n.trees = 2) calculated?

predict(gbm.iris, newdata = new.data, n.trees = 2)
# [1] -0.9861826 -0.9861826 -0.9861826 -0.9861826

# Should be the same
p.setosa <- mean(iris$Species)
init <- log(p.setosa / (1 - p.setosa))  # boosting starts from the logit of P(Y = 1)

predict(gbm.iris, newdata = new.data, n.trees = 1, single.tree = TRUE) +
  predict(gbm.iris, newdata = new.data, n.trees = 2, single.tree = TRUE) +
  init
# [1] -0.9861826 -0.9861826 -0.9861826 -0.9861826
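Written out, the reconstruction in this binary example has the additive form sketched below. Here ν stands for shrinkage (shr), f_m for the m-th fitted tree, and p̄ for p.setosa; these symbols are just labels for the quantities in the code above.

```latex
\hat{F}_M(x) = F_0 + \sum_{m=1}^{M} \nu\, f_m(x),
\qquad
F_0 = \log\frac{\bar{p}}{1 - \bar{p}}
```

The term ν f_m(x) is what predict(..., n.trees = m, single.tree = TRUE) returns, so with M = 2 the sum matches predict(..., n.trees = 2). Equivalently, F̂_2(x) = F̂_1(x) + ν f_2(x), which is the identity the original question tried (per class) for the multinomial model.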

Hope this helps clear up some confusion.
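To see the same bookkeeping without gbm itself, here is a minimal base-R sketch. It is a toy gradient-boosting loop for squared-error loss on mtcars (mpg ~ wt) using one-split stumps, not gbm's actual implementation, and the helpers fit_stump, toy_boost, predict_single and predict_cumulative are hypothetical names. As described above, it starts from an initial value (the mean response for LS loss) and stores each tree's leaf values already multiplied by the shrinkage, so the prediction from the first m trees is the initial value plus the sum of the single-tree outputs.

```r
# Toy gradient boosting for squared-error loss with one-split "stumps".
# Hypothetical helpers, NOT gbm's implementation: this only mimics the
# bookkeeping described above (initial value + already-shrunk tree outputs).

fit_stump <- function(x, r) {
  # Try each midpoint between sorted unique x values and keep the split
  # with the smallest residual sum of squares; each leaf predicts mean(r).
  ux <- sort(unique(x))
  cuts <- (head(ux, -1) + tail(ux, -1)) / 2
  best <- NULL
  best_sse <- Inf
  for (cut in cuts) {
    left <- x <= cut
    sse <- sum((r[left] - mean(r[left]))^2) + sum((r[!left] - mean(r[!left]))^2)
    if (sse < best_sse) {
      best_sse <- sse
      best <- list(cut = cut, left = mean(r[left]), right = mean(r[!left]))
    }
  }
  best
}

toy_boost <- function(x, y, n_trees = 10, shrinkage = 0.1) {
  init <- mean(y)                          # initial value for LS loss
  f <- rep(init, length(y))                # running fit on the training data
  trees <- vector("list", n_trees)
  for (m in seq_len(n_trees)) {
    stump <- fit_stump(x, y - f)           # fit the current residuals
    stump$left <- shrinkage * stump$left   # store leaf values already shrunk
    stump$right <- shrinkage * stump$right
    trees[[m]] <- stump
    f <- f + ifelse(x <= stump$cut, stump$left, stump$right)
  }
  list(init = init, trees = trees, fitted = f)
}

# Output of tree m alone (the analogue of single.tree = TRUE).
predict_single <- function(model, x, m) {
  tree <- model$trees[[m]]
  ifelse(x <= tree$cut, tree$left, tree$right)
}

# Prediction from the first n_trees trees (the analogue of n.trees = n_trees).
predict_cumulative <- function(model, x, n_trees) {
  out <- rep(model$init, length(x))
  for (m in seq_len(n_trees)) out <- out + predict_single(model, x, m)
  out
}

model <- toy_boost(mtcars$wt, mtcars$mpg, n_trees = 10, shrinkage = 0.1)
new_wt <- c(2.0, 3.0, 4.5)

# Initial value + sum of the (already shrunk) single-tree outputs.
p2 <- predict_cumulative(model, new_wt, n_trees = 2)
p2_by_hand <- model$init +
  predict_single(model, new_wt, 1) +
  predict_single(model, new_wt, 2)
all.equal(p2, p2_by_hand)                                          # TRUE
all.equal(model$fitted, predict_cumulative(model, mtcars$wt, 10))  # TRUE
```

Dropping model$init from the sum gives the kind of mismatch seen in the original question, and multiplying the stored outputs by the shrinkage again would shrink them twice.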


4 participants