ENH Add Poisson datafit #78

PABannier · 2022-10-03T22:36:00Z

Closes #62

PABannier · 2022-10-03T22:36:52Z

Note to self: the Poisson datafit does not have a Lipschitz gradient, so a line search is needed.
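A small numerical sketch of why this matters, assuming the datafit is the mean of `exp(Xw) - y * Xw` (the thread does not spell out the formula, so that form is an assumption). Its second derivative w.r.t. `Xw` is `exp(Xw) / n`, which is unbounded, so a local Lipschitz estimate of the gradient keeps growing as `Xw` grows. The helper name `poisson_raw_grad` is made up for illustration.

```python
import numpy as np


def poisson_raw_grad(y, Xw):
    """Gradient of mean(exp(Xw) - y * Xw) w.r.t. Xw (assumed datafit form)."""
    return (np.exp(Xw) - y) / len(y)


y = np.array([1.0, 2.0, 3.0])
eps = 1e-3
ratios = []
for z in [0.0, 5.0, 10.0]:
    Xw = np.full_like(y, z)
    diff = poisson_raw_grad(y, Xw + eps) - poisson_raw_grad(y, Xw)
    # Local Lipschitz estimate: ||grad(Xw + eps) - grad(Xw)|| / ||eps||
    ratios.append(np.linalg.norm(diff) / np.linalg.norm(np.full_like(y, eps)))

# The estimates grow roughly like exp(z) / n: no global constant bounds them.
print(ratios)
```

With no finite Lipschitz constant there is no fixed step size that is safe everywhere, hence the need for a line search.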

Badr-MOUFAD

You can instead use the PN (prox-Newton) solver, as it doesn't require datafits to have a Lipschitz gradient and performs a backtracking line search to find a suitable step.
We should implement two methods, raw_grad and raw_hessian, as in the Logistic datafit:

skglm/skglm/datafits/single_task.py

Lines 132 to 139 in 19a5a6f

    
    def raw_grad(self, y, Xw):
        """Compute gradient of datafit w.r.t ``Xw``."""
        return -y / (1 + np.exp(y * Xw)) / len(y)

    def raw_hessian(self, y, Xw):
        """Compute Hessian of datafit w.r.t ``Xw``."""
        exp_minus_yXw = np.exp(-y * Xw)
        return exp_minus_yXw / (1 + exp_minus_yXw) ** 2 / len(y)

Did I get what you meant @PABannier?
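For comparison with the Logistic snippet above, here is a sketch of what the two methods might look like for a Poisson datafit, assuming the loss is the mean of `exp(Xw) - y * Xw`. The class name `PoissonSketch` and the `value_from_Xw` helper are hypothetical and not taken from skglm; the finite-difference check is only there to validate the formulas.

```python
import numpy as np


class PoissonSketch:
    """Hypothetical Poisson datafit: F(Xw) = mean(exp(Xw) - y * Xw)."""

    def value_from_Xw(self, y, Xw):
        return np.mean(np.exp(Xw) - y * Xw)

    def raw_grad(self, y, Xw):
        """Compute gradient of datafit w.r.t ``Xw``."""
        return (np.exp(Xw) - y) / len(y)

    def raw_hessian(self, y, Xw):
        """Compute the diagonal of the Hessian of datafit w.r.t ``Xw``."""
        return np.exp(Xw) / len(y)


def finite_diff_grad(f, Xw, eps=1e-6):
    """Central finite differences of a scalar function f at Xw."""
    grad = np.zeros_like(Xw)
    for i in range(len(Xw)):
        step = np.zeros_like(Xw)
        step[i] = eps
        grad[i] = (f(Xw + step) - f(Xw - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
y = rng.poisson(2.0, size=6).astype(float)
Xw = rng.normal(size=6)
datafit = PoissonSketch()

approx_grad = finite_diff_grad(lambda z: datafit.value_from_Xw(y, z), Xw)
grad_error = np.max(np.abs(approx_grad - datafit.raw_grad(y, Xw)))

# The gradient is separable (entry i depends on Xw[i] only), so shifting every
# coordinate by eps differentiates each entry w.r.t. its own coordinate.
eps = 1e-6
approx_hess = (datafit.raw_grad(y, Xw + eps) - datafit.raw_grad(y, Xw - eps)) / (2 * eps)
hess_error = np.max(np.abs(approx_hess - datafit.raw_hessian(y, Xw)))
print(grad_error, hess_error)
```

Unlike the Logistic case, the Hessian here is not bounded in `Xw`, which is consistent with relying on a backtracking line search.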

mathurinm · 2022-10-09T14:57:36Z

Casting y to positive values, I get convergence for all designs I tried (even 5000 x 3000). I think @PABannier you encountered issues because for negative ys, the loss is no longer convex.
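A guard along these lines could surface the problem early instead of letting the solver diverge. This is only a sketch: the function name is hypothetical, and the choice to allow zeros while rejecting negative values is an assumption, not what this PR implements.

```python
import numpy as np


def check_poisson_targets(y):
    """Hypothetical guard: reject targets a Poisson model cannot handle."""
    y = np.asarray(y, dtype=float)
    if np.any(y < 0):
        raise ValueError("Poisson datafit expects non-negative targets y.")
    return y


valid = check_poisson_targets([0, 1, 4])  # count data passes through

message = None
try:
    check_poisson_targets([1, -2, 3])  # a negative target is rejected
except ValueError as err:
    message = str(err)
print(message)
```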

It remains to add an example and to test against sklearn.linear_model.PoissonRegressor; then we are good to go on my side.

test_poisson.py

mathurinm · 2022-10-09T15:08:01Z

Good idea! But we don't want statsmodels as a dependency for the main code, so it will need to be a test dependency only.

I thought sklearn supported L1 regularization, but apparently it only supports L2.

To discuss: if we match sklearn on Poisson + L2, is that enough, or do we need to add statsmodels as a test requirement to test Poisson + L1? @QB3 @Klopfe @PABannier, thoughts on this tradeoff?
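For the Poisson + L2 option, a sketch of how a reference solution could be obtained from sklearn. It assumes sklearn's documented objective for `PoissonRegressor`, which up to a constant equals `mean(exp(Xw) - y * Xw) + alpha / 2 * ||w||^2` when `fit_intercept=False`. The data sizes and `alpha` are arbitrary, and the gradient check is only a sanity test that this objective is the one being minimized.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = 0.3 * rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = rng.poisson(np.exp(X @ w_true)).astype(float)

alpha = 0.1
reg = PoissonRegressor(alpha=alpha, fit_intercept=False, tol=1e-10, max_iter=1000)
reg.fit(X, y)
w = reg.coef_

# Gradient of mean(exp(Xw) - y * Xw) + alpha / 2 * ||w||^2 at the fitted w.
# It should be close to zero if the assumed objective matches sklearn's.
grad = X.T @ (np.exp(X @ w) - y) / n_samples + alpha * w
grad_norm = np.max(np.abs(grad))
print(grad_norm)
```

Any Poisson + L2 solver sharing that objective could then be compared to `reg.coef_` with `assert_allclose`.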

Badr-MOUFAD · 2022-10-09T15:12:43Z

Yes absolutely, it will be a test dependency, as we did for the square root Lasso.

PABannier · 2022-10-09T20:19:55Z

I'd opt for comparing l1-regularized models. I've added statsmodels as a test dependency.
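Whichever library serves as the reference, an l1-regularized Poisson solution can also be sanity-checked against its optimality conditions. The sketch below is neither skglm's solver nor the test added in this PR: it is a plain proximal gradient loop with backtracking, assuming the objective `mean(exp(Xw) - y * Xw) + alpha * ||w||_1`, and every function name is made up for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def datafit(X, y, w):
    Xw = X @ w
    return np.mean(np.exp(Xw) - y * Xw)


def grad(X, y, w):
    return X.T @ (np.exp(X @ w) - y) / len(y)


def kkt_violation(X, y, w, alpha):
    """Largest violation of the l1 optimality conditions at w."""
    g = grad(X, y, w)
    on_support = np.abs(g + alpha * np.sign(w))       # must vanish where w != 0
    off_support = np.maximum(np.abs(g) - alpha, 0.0)  # |g_j| <= alpha where w == 0
    return np.max(np.where(w != 0, on_support, off_support))


def prox_grad_l1(X, y, alpha, tol=1e-6, max_iter=5000):
    w = np.zeros(X.shape[1])
    step = 1.0
    for _ in range(max_iter):
        if kkt_violation(X, y, w, alpha) <= tol:
            break
        g, f_w = grad(X, y, w), datafit(X, y, w)
        for _halving in range(50):
            w_new = soft_threshold(w - step * g, step * alpha)
            diff = w_new - w
            # Accept the step once the quadratic upper bound holds.
            if datafit(X, y, w_new) <= f_w + g @ diff + diff @ diff / (2 * step):
                break
            step /= 2
        w = w_new
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [0.5, -0.5, 0.3]
y = rng.poisson(np.exp(X @ w_true)).astype(float)

alpha = 0.1
w_hat = prox_grad_l1(X, y, alpha)
violation = kkt_violation(X, y, w_hat, alpha)
print(np.count_nonzero(w_hat), violation)
```

The same `kkt_violation` check could be applied to the coefficients returned by any other solver for the same objective.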

PABannier · 2022-10-09T20:55:35Z

For the example, do you have something in particular in mind, @mathurinm?

mathurinm

PR looks great, once again a nice show of force from our design and solvers: it costs 20 lines to add a whole new model!!!

We only need to be a bit more careful with the statsmodels dependency, and that will be a green light from me.

.github/workflows/main.yml

mathurinm · 2022-10-10T06:38:55Z

skglm/tests/test_datafits.py

@@ -3,12 +3,13 @@

 from sklearn.linear_model import HuberRegressor
 from numpy.testing import assert_allclose, assert_array_less
+from statsmodels.discrete.discrete_model import Poisson as PoissonRegressor


See discussion here: https://github.com/scikit-learn-contrib/skglm/pull/57/files#r990803934

We need a more advanced scheme to avoid imposing statsmodels on every user, yet make it clear that it's a test dependency.

I've changed it.
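One possible shape for such a scheme, sketched with the standard library only: import the reference lazily so that the main package never needs statsmodels. The helper names are hypothetical and this is not necessarily how the change was made in this PR.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module ``name`` can be imported."""
    return importlib.util.find_spec(name) is not None


HAS_STATSMODELS = has_module("statsmodels")


def reference_poisson_class():
    """Import the statsmodels reference lazily, only when a test needs it."""
    if not HAS_STATSMODELS:
        return None  # a test would skip here instead of failing
    from statsmodels.discrete.discrete_model import Poisson
    return Poisson


print(HAS_STATSMODELS)
```

Inside a pytest test, `pytest.importorskip("statsmodels")` gives the same behavior in one line: the test is skipped when the package is missing.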

mathurinm · 2022-10-10T06:42:21Z

> For the example, do you think about something in particular @mathurinm ?

It can be anything really, but to quote the great man "if it's not in the doc, it does not exist". Maybe check out what sklearn has ?

skglm/tests/test_datafits.py

mathurinm · 2022-10-11T11:58:13Z

Thanks @PABannier

initial commit

64d0782

PABannier added the Work In Progress label Oct 3, 2022

Badr-MOUFAD reviewed Oct 4, 2022

PABannier added 2 commits October 9, 2022 14:27

computation of raw grad and hessian

014e542

lint

fcdee5a

PABannier marked this pull request as ready for review October 9, 2022 13:57

PABannier requested a review from mathurinm October 9, 2022 13:58

add debug script, using positive y works

d91a6f2

Badr-MOUFAD reviewed Oct 9, 2022

PABannier added 6 commits October 9, 2022 21:18

add docstrings

9d2c043

added test with statsmodels

ebe95c0

rm MM's toy file

a507213

linter

638b7a0

added statsmodels as dependency

b73c557

pydocstyle

fe0eacf

PABannier changed the title from "WIP Add Poisson datafit" to "ENH Add Poisson datafit" Oct 9, 2022

PABannier added Ready for review and removed Work In Progress labels Oct 9, 2022

mathurinm requested changes Oct 10, 2022

mathurinm mentioned this pull request Oct 10, 2022

ENH add support for GammaRegressor #81

Closed

PABannier added 2 commits October 10, 2022 09:19

revert dependencies

ba583a1

enh tests poisson

0ebbff2

mathurinm reviewed Oct 10, 2022

PABannier and others added 3 commits October 10, 2022 13:11

add message

7295944

cosmit use GLE in test

79c9c35

flake

42e4ffe

mathurinm approved these changes Oct 11, 2022

mathurinm merged commit 182cae8 into scikit-learn-contrib:main Oct 11, 2022
