Inconsistent predictions in 2.1.4 vs 2.1.3 #28

Closed
WRobertLong opened this issue Oct 8, 2018 · 11 comments
Comments


@WRobertLong WRobertLong commented Oct 8, 2018

Here is a minimal working example that reproduces the problem, which did not occur in 2.1.3:

mydata <- structure(list(Count = c(1L, 3L, 1L, 4L, 1L, 0L, 1L, 2L, 0L, 0L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 0L, 2L, 3L, 1L, 4L, 3L, 0L, 4L, 1L, 2L, 1L, 1L, 0L, 2L, 1L, 4L, 1L, 5L, 3L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 1L, 1L, 1L, 0L, 3L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 0L, 0L, 3L, 5L, 1L, 2L, 1L, 1L, 0L, 0L, 1L, 2L, 1L, 3L, 1L, 1L, 0L, 2L, 2L, 1L, 3L, 3L, 2L, 0L, 0L, 1L, 2L, 1L, 0L, 2L, 0L, 0L, 4L, 4L, 2L), Treat1 = structure(c(10L, 14L, 8L, 2L, 3L, 12L, 1L, 10L, 6L, 2L, 11L, 11L, 15L, 1L, 8L, 3L, 13L, 9L, 9L, 11L, 1L, 8L, 14L, 5L, 10L, 8L, 15L, 11L, 7L, 6L, 13L, 11L, 7L, 1L, 1L, 2L, 7L, 12L, 5L, 1L, 8L, 1L, 9L, 8L,12L, 14L, 12L, 7L, 8L, 14L, 3L, 3L, 5L, 1L, 1L, 11L, 6L, 5L, 5L, 13L, 9L, 3L, 8L, 9L, 13L, 9L, 7L, 9L, 2L, 6L, 10L, 3L, 11L, 4L, 3L, 15L, 12L, 6L, 4L, 3L, 8L, 8L, 11L, 1L, 11L, 2L, 11L, 5L, 12L, 6L, 8L, 14L, 1L, 9L, 9L, 10L, 10L, 5L, 14L, 3L), .Label = c("D", "U", "R", "E", "C", "Y", "L", "O", "G", "T", "N", "J", "V", "X", "A"), class = "factor"), Treat2 = structure(c(15L, 13L, 7L, 8L, 2L, 5L, 15L, 4L, 2L, 7L, 6L, 2L, 3L, 14L, 10L, 7L, 7L, 14L, 11L, 7L, 6L, 1L, 5L, 13L, 11L, 6L, 10L, 5L, 3L, 1L, 7L, 9L, 6L, 10L, 5L, 11L, 15L, 9L, 7L, 11L, 10L, 2L, 3L, 3L, 5L, 11L, 8L, 6L,4L, 5L, 15L, 8L, 8L, 2L, 2L, 10L, 4L, 1L, 10L, 11L, 10L, 8L, 7L, 7L, 8L, 14L, 16L, 11L, 10L, 9L, 3L, 15L, 13L, 1L, 11L, 11L, 9L, 7L, 10L, 9L, 3L, 7L, 5L, 13L, 3L, 14L, 10L, 10L, 15L, 13L, 15L, 12L, 14L, 11L, 5L, 4L, 2L, 3L, 11L, 10L), .Label = c("B", "X", "R", "H", "L", "D", "U", "Q", "K", "C", "T", "V", "J", "E", "F", "A"), class = "factor"), Near = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0), Co1 = c(2, 5, 1, 1, 0, 1, 1, 2, 1, 2, 5, 2, 1, 0, 1, 2, 6, 3, 3, 1, 2, 2, 3, 0, 1, 0, 1, 0, 2, 1, 0, 1, 2, 3, 1, 2, 
2, 0, 0, 2, 3, 3, 1, 1, NA, 2, 0, 2, 1, NA, 1, 1, 0, 1, 2, 0, 2, 1, 1, 1, 2, 3, 1, 0, 4, 0, 0, 0, 2, 2, 1, 1,2, 0, 1, 2, 1, 0, 0, 0, 0, 2, 1, 2, 2, 2, 2, 1, 0, 1, 1, 1, 1, 1, 0, 2, 0, 0, 5, 1), Co2 = c(1, 1, 2, 2, 4, 1, 3, 0, 5, 2, 2, 4, 1, 1, 2, 1, 2, 3, 0, 2, 3, 3, 0, 3, 1, 0, 1, 1, 1, 2, 0, 1, 1, 1, 2, 3, 2, 2, 3, 0, 0, 0, 1, 2, NA, 1, 1, 1, 0, 2, 1, 1, 2, 5, 0, 2, 1, 4, 1, 1, 3, 0, 1, 1, 1, 1, NA, 0, 2, 1, 1, 3, 2, 1, 2, 1, 3, 1, 2, 0, 1, 5, 2, 2, 1, 2, 3, 4, 3, 1, 1, 0, 5, 1, 1, 0, 1, 1, 2, 0)), .Names = c("Count", "Treat1", "Treat2", "Near", "Co1", "Co2"), row.names = c(1759L, 959L, 1265L, 1504L, 630L, 1905L, 1885L, 1140L, 1187L, 1792L, 1258L, 1125L, 756L, 778L, 1718L, 1797L, 388L, 715L, 63L, 311L, 1492L, 1128L, 629L, 536L, 503L, 651L, 1684L, 1893L, 721L, 1440L, 1872L, 1444L, 1593L, 143L, 1278L, 1558L, 1851L, 1168L, 1829L, 386L, 365L, 849L, 429L, 155L, 11L, 1644L, 101L, 985L, 72L, 459L, 1716L, 844L, 1313L, 77L, 1870L, 744L, 219L, 513L, 644L, 831L, 338L, 284L, 211L, 1096L,243L, 1717L, 1881L, 1784L, 1017L, 992L, 45L, 707L, 489L, 1267L, 1152L, 1819L, 995L, 510L, 1350L, 1700L, 56L, 1754L, 725L, 1625L, 319L, 1818L, 1287L, 1634L, 953L, 1351L, 1787L, 923L, 917L, 484L, 886L, 390L, 1531L, 679L, 1811L, 1736L), class = "data.frame")

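# Fix the seed so the fitted model is reproducible
set.seed(12345)
library(gbm)

n.trees <- 10000

# Poisson GBM on the count outcome; Treat1 and Treat2 are factors
m1.gbm <- gbm(Count ~ Treat1 + Treat2 + Near + Co1 + Co2, data = mydata, distribution = "poisson", n.trees = n.trees)

# Predict on all rows, then keep the first six predictions ...
head(predict(m1.gbm, newdata = mydata, n.trees = n.trees, type = "response"))
# ... versus predicting on the first six rows only
predict(m1.gbm, newdata = head(mydata), n.trees = n.trees, type = "response")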

The last two lines should produce the same output. They do under 2.1.3, but not under 2.1.4.
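The comparison above can be turned into an automated check. The following is only a sketch: `check_subset_consistency` is a hypothetical helper name, not part of gbm, and the demonstration uses a Poisson `glm` from base R on synthetic data so that it runs without gbm installed.

```r
# Hypothetical helper (not part of gbm): compare predictions for the first n
# rows computed two ways, (a) predict on all rows then subset, and
# (b) subset the rows then predict. The two should agree for any model
# whose predictions depend only on the row being predicted.
check_subset_consistency <- function(model, data, n = 6, ...) {
  full_then_head <- head(predict(model, newdata = data, ...), n)
  head_then_pred <- predict(model, newdata = head(data, n), ...)
  isTRUE(all.equal(unname(full_then_head), unname(head_then_pred)))
}

# Dependency-free demonstration: Poisson glm on a small synthetic data set
# (the data and the glm are stand-ins, not the model from this issue).
set.seed(1)
toy <- data.frame(
  y = rpois(50, lambda = 2),
  f = factor(sample(c("a", "b", "c"), 50, replace = TRUE)),
  x = rnorm(50)
)
fit <- glm(y ~ f + x, data = toy, family = poisson)
check_subset_consistency(fit, toy, type = "response")

# With the model from this issue, the call would be:
# check_subset_consistency(m1.gbm, mydata, n.trees = n.trees, type = "response")
```

Extra arguments such as `n.trees` and `type` are passed through `...` to `predict`, so the same helper should work for the gbm model in the example above.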

Contributor

@bgreenwell bgreenwell commented Oct 8, 2018

@WRobertLong Thanks for reporting the issue! The predictions are the same, but out of order. Should be an easy fix; however, it'll probably have to wait until this weekend or early next week.

Author

@WRobertLong WRobertLong commented Oct 10, 2018

No worries, thank you!


@patel643 patel643 commented Oct 17, 2018

The predictions are no longer accurate for the project I am working on. They vary with the number of rows in the data frame passed to predict.

Example:
Prediction for one entry:
0.2287335

Predictions for 22 entries (1st entry is same as for one entry):
0.4799269 0.4846480 0.4911518 0.4800993 0.3232137 0.3232137 0.5234675 0.3244430 0.3461877 0.4864271 0.3301743 0.3301743 0.4526724 0.3244430 0.3244430 0.3244430 0.5234675 0.2429707 0.3301743 0.3232137 0.5725072 0.5739171

Predictions for 84 entries (the first 22 entries are the same rows as before, but the results are different; these values match what I got with the previous version):
0.5840900 0.5886760 0.6351799 0.6248663 0.4937054 0.4937054 0.5701077 0.4485763 0.4728177 0.6307872 0.4973961 0.4973961 0.3725723 0.4485763 0.4485763 0.4485763 0.5701077 0.2429707 0.4973961 0.4937054 0.7575828 0.7261662 0.7148046 0.7362220 0.7205555 0.7148046 0.7668554 0.7398776 0.7668554 0.7575828 0.7668554 0.7362220 0.7702189 0.7668554 0.7362220 0.7575828 0.7575828 0.7575828 0.7261662 0.7575828 0.7398776 0.7702189 0.7702189 0.7261662 0.7523963 0.7702189 0.7702189 0.7398776 0.7668554 0.7575828 0.7575828 0.7668554 0.2478886 0.4935844 0.4346638 0.5170595 0.5178051 0.4485763 0.4983111 0.4973961 0.4485763 0.4484464 0.5217794 0.4484464 0.4728177 0.2478886 0.5217794 0.4973961 0.4377021 0.4579584 0.3461877 0.4758983 0.4580888 0.4776751 0.4776751 0.4516342 0.3048697 0.4758983 0.4441189 0.4711845 0.2429707 0.4758983 0.4516342 0.2237648

So instead of getting 0.5840900 for that same single entry, I get 0.2287335: a drastic difference of about 0.36, given that I am assigning a score between 0 and 1.

Contributor

@bgreenwell bgreenwell commented Oct 17, 2018

@patel643 We think we found the issue; it seems to be related to this patch. Does the training data in your project include factors? If so, then this patch is likely the culprit and we can push a fix quickly.
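As a general illustration of why factors can make predictions depend on which rows are supplied (an assumption for illustration only, not a confirmed description of the patch mentioned here): in R, re-creating a factor from a subset drops unused levels, so the same value can end up with a different integer code.

```r
# Illustration only: integer codes of a factor can change when unused
# levels are dropped from a subset.
f <- factor(c("a", "b", "c", "b", "c"), levels = c("a", "b", "c"))
sub <- f[4:5]                              # values "b" and "c"; levels are still a, b, c

codes_kept    <- as.integer(sub)           # codes under the original levels
codes_dropped <- as.integer(factor(sub))   # codes after unused level "a" is dropped

codes_kept
codes_dropped

# Re-applying the full set of levels restores the original coding
codes_restored <- as.integer(factor(as.character(sub), levels = levels(f)))
```

Any code that maps factor values to integer codes after levels have been dropped would therefore treat "b" in the subset as if it were "a" in the full data, which is one way a subset could receive different predictions.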


@patel643 patel643 commented Oct 17, 2018

@bgreenwell Yes, it does.

Contributor

@bgreenwell bgreenwell commented Oct 17, 2018

@patel643 thanks for reporting the issue, we'll try to have a patch in the next week or so!

Contributor

@bgreenwell bgreenwell commented Oct 25, 2018

@WRobertLong and @patel643 I just pushed a fix for this issue. Can both of you confirm that it has been resolved?


@patel643 patel643 commented Oct 30, 2018

@bgreenwell I am still facing the issue.

Contributor

@bgreenwell bgreenwell commented Oct 30, 2018

@patel643 It's difficult to diagnose the cause of your issue without a reproducible example. Could you post one so that we can investigate further?

@bgreenwell bgreenwell closed this Oct 30, 2018
@bgreenwell bgreenwell reopened this Oct 30, 2018
Contributor

@bgreenwell bgreenwell commented Oct 31, 2018

Also, I'm referring to the current development version of gbm:

# install.packages("devtools")
devtools::install_github("gbm-developers/gbm")
library(gbm)
# <your-code>

@patel643 patel643 commented Nov 13, 2018

@bgreenwell The inconsistency issue has been resolved. Thank you!

@bgreenwell bgreenwell closed this Nov 13, 2018