ENH Add Poisson datafit #78

Merged
merged 15 commits into from
Oct 11, 2022

Conversation

PABannier
Collaborator

Closes #62

@PABannier
Collaborator Author

Note to self: Poisson datafit is not Lipschitz, need for a line search
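
To illustrate the note above: with a log link, the Poisson datafit contains an exp(Xw) term whose second derivative grows without bound, so no fixed step size is safe everywhere. Below is a minimal sketch of a backtracking (Armijo) line search on a one-sample version of the loss, assuming the form exp(z) - y * z. The function names are hypothetical and are not taken from this PR.

```python
import math


def poisson_loss(z, y):
    """One-sample Poisson loss with log link (assumed form): exp(z) - y * z."""
    return math.exp(z) - y * z


def poisson_grad(z, y):
    """Derivative of the loss above with respect to z."""
    return math.exp(z) - y


def backtracking_step(z, y, step=1.0, shrink=0.5, c=1e-4, max_iter=50):
    """Take one gradient step, shrinking the step until Armijo's condition holds."""
    g = poisson_grad(z, y)
    f0 = poisson_loss(z, y)
    for _ in range(max_iter):
        z_new = z - step * g
        # Sufficient decrease: f(z_new) <= f(z) - c * step * g**2
        if poisson_loss(z_new, y) <= f0 - c * step * g * g:
            return z_new, step
        step *= shrink
    return z, 0.0  # no acceptable step found, stay in place


z, y = 5.0, 2.0
for _ in range(100):
    z, used_step = backtracking_step(z, y)
# The minimizer of exp(z) - y * z is z = log(y).
```

Starting far from the optimum, the first full step is rejected because the exponential term makes the gradient very large there; the search shrinks the step until the loss actually decreases.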

Collaborator

@Badr-MOUFAD Badr-MOUFAD left a comment


You can instead use the PN solver, as it doesn't require datafits to be Lipschitz and performs a backtracking line search to find a suitable step.
We should implement two functions, raw_grad and raw_hessian, as in the Logistic datafit:

def raw_grad(self, y, Xw):
    """Compute gradient of datafit w.r.t ``Xw``."""
    return -y / (1 + np.exp(y * Xw)) / len(y)

def raw_hessian(self, y, Xw):
    """Compute Hessian of datafit w.r.t ``Xw``."""
    exp_minus_yXw = np.exp(-y * Xw)
    return exp_minus_yXw / (1 + exp_minus_yXw) ** 2 / len(y)

Did I get what you meant @PABannier?
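
By analogy with the Logistic methods quoted above, a Poisson counterpart might look like the sketch below. It assumes the datafit is the mean of exp(Xw) - y * Xw (log link, constant terms dropped). The class name PoissonSketch is hypothetical, and this is not necessarily the code merged in this PR.

```python
import numpy as np


class PoissonSketch:
    """Hypothetical Poisson datafit: mean(exp(Xw) - y * Xw)."""

    def value(self, y, Xw):
        """Value of the datafit for predictions ``Xw``."""
        return np.mean(np.exp(Xw) - y * Xw)

    def raw_grad(self, y, Xw):
        """Gradient of the datafit w.r.t. ``Xw``."""
        return (np.exp(Xw) - y) / len(y)

    def raw_hessian(self, y, Xw):
        """Diagonal of the Hessian of the datafit w.r.t. ``Xw``."""
        return np.exp(Xw) / len(y)


# Check the gradient against central finite differences on random data.
rng = np.random.default_rng(0)
y = rng.poisson(lam=3.0, size=5).astype(float)
Xw = rng.normal(size=5)
datafit = PoissonSketch()

eps = 1e-6
fd_grad = np.array([
    (datafit.value(y, Xw + eps * e) - datafit.value(y, Xw - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
```

Under this assumed form the Hessian entries are exp(Xw) / n, which are positive but unbounded, consistent with the suggestion above to rely on a solver that uses a line search rather than a fixed step size.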

@PABannier PABannier marked this pull request as ready for review October 9, 2022 13:57
@mathurinm
Collaborator

Casting y to positive values, I get convergence for all designs I tried (even 5000 x 3000). I think @PABannier you encountered issues because for negative ys, the loss is no longer convex.

It remains to add an example and test against sklearn.linear_model.PoissonRegressor, and then we are good to go on my side.

@mathurinm
Collaborator

Good idea! But we don't want statsmodels as a dependency for the main code, so it will need to be a test dependency only.

I thought sklearn supported L1 regularization here, but apparently it only supports L2.

To discuss: if we match sklearn on Poisson + L2, is it enough, or do we need to add a statsmodels test requirement to test Poisson + L1? @QB3 @Klopfe @PABannier thoughts on this tradeoff?

@Badr-MOUFAD
Collaborator

Yes absolutely, it will be a test dependency as we did in square root Lasso

@PABannier
Collaborator Author

I'd opt for comparing l1-regularized models. I've added statsmodels as a test dependency.

@PABannier
Collaborator Author

For the example, do you think about something in particular @mathurinm ?

@PABannier PABannier changed the title from "WIP Add Poisson datafit" to "ENH Add Poisson datafit" on Oct 9, 2022
Collaborator

@mathurinm mathurinm left a comment


PR looks great, once again a nice show of force from our design and solvers: it costs 20 lines to add a whole new model !!!

We only need to be a bit more careful with the statsmodels dependency, and that will be a green light for me.

@@ -3,12 +3,13 @@

from sklearn.linear_model import HuberRegressor
from numpy.testing import assert_allclose, assert_array_less
from statsmodels.discrete.discrete_model import Poisson as PoissonRegressor
Collaborator


See discussion here: https://github.com/scikit-learn-contrib/skglm/pull/57/files#r990803934

We need a more advanced scheme that avoids imposing statsmodels on every user while making it clear that it is a test dependency.

Collaborator Author


I've changed it.
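
One possible way to keep statsmodels a test-only dependency, as discussed in this thread, is to detect it at test time and import it lazily. The sketch below uses only the standard library; the helper names (has_module, run_poisson_l1_check) are hypothetical and do not reflect what was actually changed in this PR.

```python
import importlib.util


def has_module(name):
    """Return True if top-level module ``name`` is importable here."""
    return importlib.util.find_spec(name) is not None


HAS_STATSMODELS = has_module("statsmodels")


def run_poisson_l1_check():
    """Hypothetical test body; reports "skipped" when statsmodels is absent."""
    if not HAS_STATSMODELS:
        return "skipped"
    # Import lazily so that importing this file never requires statsmodels.
    from statsmodels.discrete.discrete_model import Poisson  # noqa: F401
    return "ran"


status = run_poisson_l1_check()
```

In a pytest-based suite, calling pytest.importorskip("statsmodels") at the top of the test achieves the same effect and reports the test as skipped.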

@mathurinm
Collaborator

For the example, do you think about something in particular @mathurinm ?

It can be anything really, but to quote the great man "if it's not in the doc, it does not exist". Maybe check out what sklearn has ?

@mathurinm mathurinm merged commit 182cae8 into scikit-learn-contrib:main Oct 11, 2022
@mathurinm
Collaborator

Thanks @PABannier
