ENH add Huber datafit #14
Conversation
Hello @EnLAI111, you need to decorate your […]
Thanks a lot for the PR @EnLAI111 ! I did a first pass of comments, don't hesitate to ping me if something is not explicit enough or if you need feedback.
skglm/datafits/Huber.py
```python
n_features = X.shape[1]
self.lipschitz = np.zeros(n_features, dtype=X.dtype)
for j in range(n_features):
    self.lipschitz[j] = (np.where(np.abs(y) < self.delta, ...
```
I think the Lipschitz constant should just be `norm(X_j) ** 2`?
I would have said `norm(X_j) ** 2 / delta`.
This makes me think about how we want to solve this optimization problem. If we want to use algorithms like those in http://proceedings.mlr.press/v84/massias18a/massias18a.pdf and https://arxiv.org/pdf/1902.02509.pdf, I think we will need a custom `_cd_epoch` function.
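For reference, here is a minimal pure-numpy sketch of the Huber loss and its curvature bound (function names are illustrative, not skglm's actual datafit API). Since the second derivative of the Huber function is 0 or 1, the per-coordinate Lipschitz constant under this unscaled parametrization is bounded by `norm(X[:, j]) ** 2`; the `/ delta` factor would only appear under a rescaled parametrization.

```python
import numpy as np

def huber_value(r, delta):
    # Huber loss of residuals r: quadratic for |r| <= delta, linear beyond
    small = np.abs(r) <= delta
    return np.sum(np.where(small, 0.5 * r ** 2,
                           delta * np.abs(r) - 0.5 * delta ** 2))

def huber_grad(r, delta):
    # derivative w.r.t. r: equals r in the quadratic zone, +/- delta outside,
    # i.e. a clip; the second derivative is 0 or 1, hence bounded by 1
    return np.clip(r, -delta, delta)

# for f(w) = huber_value(y - X @ w, delta), the coordinate-wise Lipschitz
# constant satisfies lipschitz[j] <= max|f''| * norm(X[:, j]) ** 2
#                                 =  norm(X[:, j]) ** 2
```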
Sure, but for delta? https://github.com/scikit-learn-contrib/skglm/pull/14/files#diff-518236c0559dd6839714e8c437731d42952421ed7fed1319945a7d9bbe9f315eR20
There is no optimization over delta so far, so no need for this solver; but if we need to optimize over this variable, a dedicated solver will be needed.
ok I see, there is no delta for this parametrization!
Thanks for the comments and suggestions! I have fixed the problems. Now with WeightedL1 there are no more errors of […]
Note that this is not explicit in the doc. When reading it, I thought it would be automated by some […] Adding the […]
To finalize this PR, unit tests should be added. Initially I wanted to compare it to sklearn's […]
If we just want to test the convergence, we could use cvxpy to solve the optimization problem and check that the betas are the same. What do you think?
@Klopfe using cvxpy would introduce a heavy dependency; instead we can fit a sklearn Huber regressor and find its […]
I have pushed a script that does it. Beware that sklearn uses squared L2 regularization, and that the penalty is not scaled by 1/2 (their docstring is not very clear). I had to use more samples than features, otherwise it fits perfectly and […]
We don't have a way to handle unpenalized problems, so I have used WeightedL1 with 0 weights. For future work…
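A rough sketch of the sklearn side of that comparison (the data, `epsilon`, and `alpha` values here are illustrative, and the skglm side of the test is omitted; sklearn's parametrization assumptions are noted in comments):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
n_samples, n_features = 100, 10  # more samples than features, per the note above
X = rng.standard_normal((n_samples, n_features))
y = X @ rng.standard_normal(n_features) + 0.1 * rng.standard_normal(n_samples)

# in sklearn's objective, alpha multiplies the *squared* L2 norm of coef_
# with no 1/2 factor, and epsilon thresholds the residuals scaled by the
# jointly estimated scale_ (sklearn optimizes the scale too)
reg = HuberRegressor(epsilon=1.35, alpha=1e-4, fit_intercept=False).fit(X, y)

# reg.coef_ (and reg.scale_, used to set delta) would then be compared with
# np.testing.assert_allclose against the skglm solver on a matching objective
```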
Another solution could be to use […]
It's what `HuberRegressor` does already, so why rewrite this code?
Did my pass, merge when green if LGTY @QB3 @PABannier
Thanks @EnLAI111!
When calling the function `construct_grad(X, y, w, Xw, datafit, ws)`, I get the following error. So maybe there are bugs in the `Huber` class somewhere, but I can't figure out where.
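For debugging, a self-contained, hypothetical version of what such a helper might compute can help localize the bug by comparison (`ToyHuber` and this `construct_grad` are illustrative sketches, not skglm's actual code or signatures):

```python
import numpy as np

class ToyHuber:
    # hypothetical stand-in for the Huber datafit, not skglm's class
    def __init__(self, delta):
        self.delta = delta

    def raw_grad(self, y, Xw):
        # gradient of the mean Huber loss w.r.t. Xw: minus the clipped residuals
        return -np.clip(y - Xw, -self.delta, self.delta) / len(y)

def construct_grad(X, y, w, Xw, datafit, ws):
    # datafit gradient restricted to the features in the working set ws
    # (w is kept in the signature to mirror the call in the comment above)
    raw = datafit.raw_grad(y, Xw)
    return np.array([X[:, j] @ raw for j in ws])

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
w = np.zeros(5)
grad = construct_grad(X, y, w, X @ w, ToyHuber(delta=1.0), ws=np.array([0, 2]))
```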