Feature: Handling of Count Data, e.g. Poisson Regression #515
Comments
Hello, two more boosting algorithms which can be used with count data:
Best
I am a bit confused about why this is necessary. Most (almost all?) regression methods for counts produce positive real-valued predictions, so I don't see why this wouldn't work as a normal regression task. Any count-specific measures could be added without adding any other new structure. Is there some other reason we should have count tasks that I've missed? If there is, could it not be solved by adding additional structure to a regression task and maybe an additional property to regression learners (e.g., check/enforce non-negativity)?
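To make the suggestion above concrete, here is a minimal, language-agnostic sketch in plain Python (not mlr code) of a count-specific measure plus a non-negativity check that can sit on top of an ordinary regression prediction vector. The function names `check_non_negative` and `mean_poisson_deviance` are hypothetical and not part of mlr; the deviance formula is the standard Poisson one.

```python
import math


def check_non_negative(predictions):
    """Return the predictions unchanged, or raise if any of them is negative."""
    n_bad = sum(1 for p in predictions if p < 0)
    if n_bad:
        raise ValueError(f"{n_bad} negative prediction(s) for a count target")
    return predictions


def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: 2 * mean(y * log(y / mu) - (y - mu)).

    The term y * log(y / mu) is taken to be 0 when y == 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if mu <= 0:
            raise ValueError("predicted means must be strictly positive")
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total / len(y_true)


# Usage: observed counts against predicted means from any regression learner.
observed = [0, 1, 3, 2]
predicted = check_non_negative([0.5, 1.2, 2.5, 2.0])
print(mean_poisson_deviance(observed, predicted))
```

Nothing here needs a separate task type: the measure only requires non-negative integer targets and strictly positive predictions, which is the kind of property a regression learner could declare.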
Hi,
As per our conversation from the weekend, this is to ping @berndbischl ...
Hi, I was wondering if there is any way to do Poisson regression using xgboost in mlr? I also need to be able to offset for exposure using xgboost's setinfo with "base_margin". At a guess, I would need to edit the trainLearner.regr.xgboost function in RLearner_regr_xgboost.R.
Below is what some non-mlr code looks like, followed by an mlrMBO implementation that I believe works but doesn't use the makeLearner function. Appendix 1: Here is some sample non-mlr code. Since poisson-nloglik is a negative log-likelihood, the goal would be to minimise cv$evaluation_log[, min(test_poisson_nloglik_mean)]
Appendix 2: Here is some code based on mlrMBO that I believe works but doesn't use the makeLearner function:
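For readers unfamiliar with the exposure offset mentioned in the comment above, the idea can be sketched in plain Python (this is not xgboost or mlr code). With a log link, an offset of log(exposure) added to the raw model score gives a predicted mean of exposure * exp(score); the assumption here is that this is the role "base_margin" plays for a Poisson objective. The helper names are hypothetical, and the negative log-likelihood is written in the usual form mu - y * log(mu) + log(y!).

```python
import math


def base_margin_from_exposure(exposure):
    """Log-exposure offsets, one per row (assumed role of base_margin)."""
    return [math.log(e) for e in exposure]


def predict_mean(raw_scores, base_margin):
    """Predicted count: exp(score + offset) = exposure * exp(score)."""
    return [math.exp(s + b) for s, b in zip(raw_scores, base_margin)]


def mean_poisson_nloglik(y, mu):
    """Mean Poisson negative log-likelihood; lower is better."""
    terms = [m - yi * math.log(m) + math.lgamma(yi + 1) for yi, m in zip(y, mu)]
    return sum(terms) / len(terms)


# Usage: two rows with the same raw score but different exposure.
exposure = [1.0, 2.0]
scores = [0.0, 0.0]  # hypothetical raw model outputs on the log scale
mu = predict_mean(scores, base_margin_from_exposure(exposure))
print(mu)  # the second row's mean is twice the first's
print(mean_poisson_nloglik([1, 2], mu))
```

Because the metric is a negative log-likelihood, a tuning loop would look for its minimum across boosting rounds.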
This would be a very useful addition, and I think it could be done with an addition of an Does anyone know why this might not be a good idea?
If someone comes up with a PR for this, we are happy to review it. For now, I am closing this issue and advise that future enhancements be added to mlr3.
I intended to use mlr for a benchmark of Poisson regression models but realized that support for count data models hasn't actually been implemented yet.
I managed to work around these limitations in my specific use case, but I talked to @berndbischl, and this issue/feature request is meant to track and coordinate any efforts towards a proper implementation.
As to specific performance measures, these StackExchange threads might be helpful:
I also compiled a (likely incomplete) list of learners that might be considered (I'm personally not familiar with most of the more complex models, so these might not be appropriate or a priority):
Models for Count Data
General Implementations (including models for count data)
Further Extensions (to the classical glm, including count data models)
Apparently Outdated
see also https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
Cheers,
Johannes