Incorrect equation for mean residual deviance for Gradient Boosting Machines using Poisson distributions #6995

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 4 comments

@exalate-issue-sync

I have noticed that when using an H2OGradientBoostingEstimator with a Poisson distribution, the reported mean residual deviance differs from the result I get by applying scikit-learn's mean_poisson_deviance function to the true and predicted values. This does not happen when I use H2OGeneralizedLinearEstimator instead. I found the following definition of the deviance for Poisson distributions, but it is not the same as the one used for GLMs:

public double deviance(double w, double y, double f) {
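For reference, the "by hand" value can be reproduced without any libraries. This is a minimal sketch that assumes the comparison uses the full Poisson unit deviance that scikit-learn's mean_poisson_deviance averages, 2(y log(y/mu) - y + mu) with 0 log 0 taken as 0, plus an optional weight per row. The function names are hypothetical and not part of H2O or scikit-learn.

```python
import math
from typing import Optional, Sequence


def poisson_unit_deviance(y: float, mu: float) -> float:
    """Full Poisson unit deviance 2*(y*log(y/mu) - y + mu), using 0*log(0) = 0."""
    if mu <= 0:
        raise ValueError("predictions must be strictly positive")
    if y < 0:
        raise ValueError("actuals must be non-negative")
    y_log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (y_log_term - y + mu)


def mean_poisson_deviance_by_hand(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of the full unit deviance (plain mean when weights is None)."""
    if weights is None:
        weights = [1.0] * len(y_true)
    total = sum(w * poisson_unit_deviance(y, mu)
                for y, mu, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


# Perfect predictions give zero deviance; a zero count predicted as 1 costs 2.
print(mean_poisson_deviance_by_hand([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(mean_poisson_deviance_by_hand([0.0, 1.0], [1.0, 1.0], [1.0, 3.0]))  # 0.5
```

Passing the same actuals and predictions to scikit-learn's mean_poisson_deviance should give the same number, since it averages this same unit deviance.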

As someone pointed out in the following post, a similar discrepancy seems to exist for the Gamma deviance as well:

https://stackoverflow.com/questions/53485434/mean-residual-deviance-formula-in-h2o

@exalate-issue-sync
Author

Paul Donnelly commented: I opened an issue a while back about a similar situation for Gamma GBMs: https://h2oai.atlassian.net/browse/PUBDEV-7999
I eventually found the DistributionFactory script you linked and used it to work out that H2O's deviance calculation drops a constant term (I suspect to save computation). For Gamma, I found that term to be: `2 * sum(2*w + log(y))/sum(w)`
(where `w` is the weight column and `y` is the actual)
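The effect described here can be illustrated with the standard Gamma unit deviance 2(log(mu/y) + y/mu - 1), which is what scikit-learn's mean_gamma_deviance averages. Purely for illustration, the sketch below assumes a truncated metric that keeps only the terms depending on the prediction mu, i.e. 2(log mu + y/mu). That split is hypothetical and is not H2O's code, so the exact constant H2O drops may differ from it. What the sketch shows is that the gap between the two means depends only on the actuals and weights, not on the predictions.

```python
import math
from typing import Sequence


def gamma_full_deviance(y: float, mu: float) -> float:
    """Standard Gamma unit deviance 2*(log(mu/y) + y/mu - 1); needs y > 0 and mu > 0."""
    return 2.0 * (math.log(mu / y) + y / mu - 1.0)


def gamma_truncated_deviance(y: float, mu: float) -> float:
    """Hypothetical truncated form: keeps only the terms that depend on mu."""
    return 2.0 * (math.log(mu) + y / mu)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def deviance_gap(y_true, y_pred, weights) -> float:
    """Mean full deviance minus mean truncated deviance for one set of predictions."""
    full = [gamma_full_deviance(y, mu) for y, mu in zip(y_true, y_pred)]
    trunc = [gamma_truncated_deviance(y, mu) for y, mu in zip(y_true, y_pred)]
    return weighted_mean(full, weights) - weighted_mean(trunc, weights)


y_true = [1.0, 2.0, 4.0]
weights = [1.0, 1.0, 2.0]
# Two different sets of predictions give the same gap: -2 * weighted mean of (1 + log(y)).
gap_a = deviance_gap(y_true, [1.5, 2.0, 3.0], weights)
gap_b = deviance_gap(y_true, [0.8, 2.5, 5.0], weights)
expected = weighted_mean([-2.0 * (1.0 + math.log(y)) for y in y_true], weights)
print(math.isclose(gap_a, gap_b), math.isclose(gap_a, expected))  # True True
```

Because the gap is identical for any predictions on a fixed dataset, dropping such a term does not change which model ranks best by deviance on that data, but it does change the reported number, which is why it disagrees with a by-hand calculation.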

Looking into Poisson, I think the dropped constant term is: `2 * w * y * (log(y) - 1)`
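This Poisson term can be checked by expanding the full unit deviance: 2w(y log(y/mu) - y + mu) = 2w(mu - y log mu) + 2w y (log y - 1). So if the reported metric keeps only the prediction-dependent part 2w(mu - y log mu) (an assumption inferred from this comment, not taken from H2O's source), then adding back the sum of 2w y (log y - 1) divided by the total weight should recover the full mean deviance. The division by the total weight is also an assumption about how the mean is formed. A sketch with hypothetical function names:

```python
import math
from typing import Sequence


def xlogy(x: float, y: float) -> float:
    """x*log(y), with the convention 0*log(y) = 0 so zero counts are allowed."""
    return 0.0 if x == 0 else x * math.log(y)


def poisson_full(w: float, y: float, mu: float) -> float:
    """Weighted full Poisson unit deviance 2*w*(y*log(y/mu) - y + mu)."""
    return 2.0 * w * (xlogy(y, y / mu) - y + mu)


def poisson_truncated(w: float, y: float, mu: float) -> float:
    """Assumed truncated form keeping only mu-dependent terms: 2*w*(mu - y*log(mu))."""
    return 2.0 * w * (mu - xlogy(y, mu))


def poisson_dropped_term(w: float, y: float) -> float:
    """The term from the comment, 2*w*y*(log(y) - 1); equals 0 when y == 0."""
    return 2.0 * w * (xlogy(y, y) - y)


def full_from_truncated_mean(truncated_mean: float,
                             y_true: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Add the weighted mean of the dropped term back to a truncated mean deviance."""
    dropped = sum(poisson_dropped_term(w, y) for y, w in zip(y_true, weights))
    return truncated_mean + dropped / sum(weights)


y_true = [0.0, 1.0, 3.0]
y_pred = [0.5, 1.2, 2.5]
weights = [1.0, 2.0, 1.0]
total_w = sum(weights)
truncated_mean = sum(poisson_truncated(w, y, mu)
                     for y, mu, w in zip(y_true, y_pred, weights)) / total_w
full_mean = sum(poisson_full(w, y, mu)
                for y, mu, w in zip(y_true, y_pred, weights)) / total_w
print(math.isclose(full_from_truncated_mean(truncated_mean, y_true, weights), full_mean))  # True
```

Note that the truncated value can be negative for individual rows (the third row above gives 2(2.5 - 3 log 2.5) < 0), another sign that it is not the usual deviance.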

I would personally prefer the reported model metrics to include the constant term, for consistency with other calculations and for accurate reporting.

@exalate-issue-sync
Copy link
Author

Eduardo Aguilar Moreno commented: Thanks for the info, Paul. I looked for existing tickets on this issue first but could not find yours. In any case, I just found it odd that I got different results when calculating the deviance "by hand". I would also prefer this metric to be calculated consistently across different models so that their performance can be compared.

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8708
Assignee: Yuliia Syzon
Reporter: N/A
State: Closed
Fix Version: 3.40.0.2
Attachments: N/A
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#6430
#6495
#6653
#6655

@h2o-ops h2o-ops closed this as completed May 14, 2023