Paul Donnelly commented: I made an issue a while back about a similar situation with Gamma GBMs: https://h2oai.atlassian.net/browse/PUBDEV-7999
I eventually found the DistributionFactory script you linked and used it to work out that H2O's deviance calculation drops a constant term, I suspect to save computation. In the case of Gamma, I found it to be: `2 * sum(2*w + log(y))/sum(w)`
(where `w` is the weight column and `y` is the actual value)
Looking into Poisson, I think the dropped constant term is: `2 * w * y * (log(y) - 1)`
I personally wish the returned model metrics included the constant term, for consistency with other calculations and reporting accuracy.
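The Poisson term suggested above, 2 * w * y * (log(y) - 1), can be checked numerically. The following is a minimal sketch, not H2O's code: it assumes the truncated per-row deviance has the form -2 * w * (y * log(mu) - mu), which is what the suggested constant implies when subtracted from the full Poisson deviance 2 * w * (y * log(y / mu) - y + mu). The function names and the sample data are made up for illustration.

```python
import math


def full_poisson_deviance(y, mu, w=None):
    """Weighted mean of the full unit deviance 2*(y*log(y/mu) - y + mu)."""
    w = [1.0] * len(y) if w is None else w
    total = 0.0
    for yi, mi, wi in zip(y, mu, w):
        # y*log(y/mu) is taken as 0 when y == 0 (its limit as y -> 0)
        ylog = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * wi * (ylog - yi + mi)
    return total / sum(w)


def truncated_poisson_deviance(y, mu, w=None):
    """ASSUMED truncated form: weighted mean of -2*(y*log(mu) - mu)."""
    w = [1.0] * len(y) if w is None else w
    total = sum(-2.0 * wi * (yi * math.log(mi) - mi)
                for yi, mi, wi in zip(y, mu, w))
    return total / sum(w)


def dropped_constant(y, w=None):
    """Weighted mean of the suggested constant 2*y*(log(y) - 1)."""
    w = [1.0] * len(y) if w is None else w
    total = 0.0
    for yi, wi in zip(y, w):
        # the term vanishes for y == 0, since y*log(y) -> 0
        if yi > 0:
            total += 2.0 * wi * yi * (math.log(yi) - 1.0)
    return total / sum(w)


# Hypothetical actuals, predicted means, and weights
y = [0.0, 1.0, 3.0, 2.0]
mu = [0.5, 1.2, 2.5, 2.0]
w = [1.0, 2.0, 1.0, 1.0]

full = full_poisson_deviance(y, mu, w)
truncated = truncated_poisson_deviance(y, mu, w)
constant = dropped_constant(y, w)

# Under the assumed truncated form, adding the constant recovers the full deviance
print(full, truncated + constant)
```

With unit weights, `full_poisson_deviance(y, mu)` should agree with scikit-learn's `mean_poisson_deviance(y, mu)`, so adding the constant to a truncated value would put the two metrics on the same scale. The constant depends only on `y` and `w`, so it does not change which model ranks better on a given dataset.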
Eduardo Aguilar Moreno commented: Thanks for the info, Paul. I looked for existing tickets about this issue first but could not find yours. In any case, I just found it weird that I got different results when calculating the deviance "by hand". I would also prefer this metric to be calculated consistently across different models, so that their performance can be compared.
I have noticed that when using an H2OGradientBoostingEstimator with a Poisson distribution, the reported mean residual deviance differs from the result I get by applying scikit-learn's mean_poisson_deviance function to the true and predicted values. This is not the case when I use H2OGeneralizedLinearEstimator instead. I found the following definition of deviance for Poisson distributions in the H2O source, but it is not the same as the one used for GLMs:
h2o-3/h2o-core/src/main/java/hex/DistributionFactory.java
Line 281 in 9c4d343
As someone pointed out in the following post, similar discrepancies seem to be present for the Gamma deviance as well:
https://stackoverflow.com/questions/53485434/mean-residual-deviance-formula-in-h2o