
Minimal Generalized linear models implementation (L2 + lbfgs) #14300

Merged
merged 292 commits on Mar 4, 2020
Changes from 212 commits
919912c
Smarter intercept initialization and docstring improvements
Feb 17, 2019
01033e3
Fix false formula in starting_mu and improve start_params
Feb 20, 2019
4071a8a
Improve argument handling of P1 and P2
Feb 20, 2019
757bc3c
Fix doctest, test_poisson_enet, change IRLS to use lstsq, fix input c…
Feb 20, 2019
ed8e74f
Use pytest decorators and pytest.raises
Feb 23, 2019
fe876da
Add Logistic regression=Binomial + Logit
Feb 24, 2019
2993e03
More efficient sparse matrices and refactor of irls and cd solver
Apr 7, 2019
a6f9f13
Treat the intercept separately, i.e. X, P1, P2 never include intercept
Apr 20, 2019
c9a7a95
Revised option start_params
Apr 21, 2019
a7755de
Fix a few typos
rth Jun 4, 2019
9aa1fc4
Make module private
rth Jun 4, 2019
ca3eae2
Working on tests
rth Jun 4, 2019
61bc6b8
Improve tests
rth Jun 5, 2019
b24a7ca
Remove unused dec parameter in tests
rth Jun 5, 2019
f95b390
ENH: add Generalized Linear Models, issue #5975
Jul 18, 2017
09176b4
MAINT: merge branch 'GLM-impr' of https://github.com/rth/scikit-learn
Jun 9, 2019
def12ae
[MAINT] make glm private, fix typos, improve tests
Jun 9, 2019
9b574bd
Fix docstrings for the new print_changed_only=True by default
rth Jun 11, 2019
90299fd
Increase coverage
rth Jun 12, 2019
e3a5a9a
More tests and addressing some review comments
rth Jun 12, 2019
54b80b8
TST More specific checks of error messages in tests
rth Jun 13, 2019
e962859
Merge branch 'master' into GLM
rth Jun 27, 2019
7db0320
Add PoissonRegressor alias
rth Jun 14, 2019
dcfe9ed
TST Simplify comparison with ridge
rth Jun 27, 2019
4879bb6
EXA Add plot_tweedie_regression_insurance_claims.py
rth Jun 28, 2019
56069e5
EXA Fix issues with older pandas versions in example
rth Jun 28, 2019
53f3c5f
DOC Add second poisson regression example
rth Jul 9, 2019
ac1fef3
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Jul 9, 2019
be5a3c4
Add GeneralizedHyperbolicSecant and BinomialDistributions
rth Jul 9, 2019
e67fecb
Remove start params option
rth Jul 9, 2019
62f4448
Remove L1 penalty and CD solver
rth Jul 9, 2019
d25042e
Remove newton CG algorithm
rth Jul 9, 2019
07ee495
Remove fisher_matrix, _observed_information and _eta_mu_score_fisher
rth Jul 9, 2019
d0eb285
Remove matrix L2 penalty and IRLS solver
rth Jul 9, 2019
1e4b538
Remove plot_poisson_spline_regression.py example
rth Jul 9, 2019
3265148
Remove random_state parameter
rth Jul 9, 2019
1862ab6
Lint
rth Jul 9, 2019
4154074
Fix docstring
rth Jul 10, 2019
c5d77d7
Remove unused core
rth Jul 10, 2019
9ab5ac2
Update examples/linear_model/plot_poisson_regression_non_normal_loss.py
rth Jul 13, 2019
e4d0be1
Update examples/linear_model/plot_poisson_regression_non_normal_loss.py
rth Jul 13, 2019
6ff4d58
Update doc/modules/linear_model.rst
rth Jul 13, 2019
13102d5
Update doc/modules/linear_model.rst
rth Jul 13, 2019
af89e52
Update doc/modules/linear_model.rst
rth Jul 13, 2019
3802420
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Jul 13, 2019
ddc4b71
Use scipy.optimize.minimize interface for LBFGS optimizer
rth Jul 13, 2019
426ae1d
EXA wording and score in plot_tweedie_regression_insurance_claims.html
Jul 14, 2019
a404384
Address review comments
rth Jul 15, 2019
65796a3
Review comments on the documentation
rth Jul 16, 2019
e44afe7
Split the implementation into several files
rth Jul 16, 2019
5927379
Fix CI
rth Jul 16, 2019
a6df2a7
Add test_deviance_derivative
rth Jul 16, 2019
5af89a7
Fix sklearn/linear_model/setup.py
rth Jul 16, 2019
cd347d4
Remove variance and variance_derivative methods from distributions
rth Jul 17, 2019
0d7f9cd
Improve coverage
rth Jul 17, 2019
dbffad8
Remove mentions of the binomial distribution
rth Jul 17, 2019
d914ab2
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Jul 19, 2019
3187204
Use common simple weight validation
rth Jul 19, 2019
cc03c1a
Simplify comments formatting
rth Jul 19, 2019
4d433d1
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Jul 19, 2019
aa52b4a
Refactor to use TweedieDistribition in metrics
rth Jul 22, 2019
816aa8f
WIP
rth Jul 25, 2019
6500c81
Use Poisson deviance in examples
rth Jul 25, 2019
59a6d9d
Use PoissonRegressor and GammaRegressor in examples
rth Jul 25, 2019
03a8a2d
Improve documentation wording
rth Jul 26, 2019
bbf7f38
Use dataframe OpenML fetcher
rth Jul 26, 2019
49a3a8e
Refactor distibution bounds
rth Jul 26, 2019
228e8c8
Move deviance checks under destribution
rth Jul 26, 2019
09a57c9
Expose TweedieRegressor
rth Jul 26, 2019
4b485ca
Improve documentation
rth Jul 26, 2019
aa0adf1
Lint
rth Jul 26, 2019
abd47d7
Fix __init__
rth Jul 30, 2019
c65ac12
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Jul 31, 2019
7a9d067
Update doc/modules/linear_model.rst
rth Aug 2, 2019
18b4503
Update doc/modules/linear_model.rst
rth Aug 2, 2019
29658d6
Update doc/modules/linear_model.rst
rth Aug 2, 2019
1ea70d3
Fix typos in documentation
Aug 7, 2019
efdcb5b
Update doc/modules/linear_model.rst
rth Aug 9, 2019
ef0d063
Update doc/modules/linear_model.rst
rth Aug 9, 2019
0125e1c
Update doc/modules/linear_model.rst
rth Aug 9, 2019
6a8a600
Update examples/linear_model/plot_poisson_regression_non_normal_loss.py
rth Aug 9, 2019
73f3bd1
Rename inverse.gaussian to inverse-gaussian
rth Aug 9, 2019
11b178f
Remove sample_weight parameter from predict
rth Aug 9, 2019
3806fbe
Remove redundant check_array in predict
rth Aug 9, 2019
ae1c672
Update doc/modules/linear_model.rst
Aug 11, 2019
f07c831
Remove dispersion
Aug 11, 2019
e34fb57
Merge branch 'upstream/master' into GLM-minimal
Aug 13, 2019
ebbbe9c
Update doc/modules/linear_model.rst
Aug 13, 2019
918e257
Update doc/modules/linear_model.rst
Aug 13, 2019
5236cd8
Merge branch 'GLM-minimal' of https://github.com/rth/scikit-learn int…
Aug 13, 2019
37d0f47
Use double `` when necessary
rth Aug 16, 2019
9c337f2
ax -> axes in plot_poisson_regression_non_normal_loss.py
rth Aug 16, 2019
5e05935
Update sklearn/linear_model/_glm/distribution.py
rth Aug 16, 2019
4a68213
Remove solver=auto
rth Aug 16, 2019
8ee5c85
Update sklearn/linear_model/_glm/glm.py
rth Aug 16, 2019
b106c25
Merge branch 'GLM-minimal' of github.com:rth/scikit-learn into GLM-mi…
rth Aug 16, 2019
a1f8aab
More review comments
rth Aug 16, 2019
c0999ea
Addressing reviews in tests
rth Aug 16, 2019
e09e336
More comments in tests
rth Aug 16, 2019
6601d30
Update linear_model.rst
Aug 17, 2019
81eabe3
Merge upstream/master into GLM-minimal
Aug 17, 2019
5174dae
Address check_is_fitted deprication of attributes
Aug 17, 2019
61dc13f
No LaTeX in docstrings
Aug 17, 2019
44524ca
Replace Tweedie p->power
Aug 17, 2019
58d2409
Replace Tweedie p->power
Aug 17, 2019
ee351e1
Fix tests due to Tweedie p->power
Aug 17, 2019
33fe9be
Simplify super(...)
Aug 18, 2019
94272e7
Replace Link.link(..) by __call__(..)
Aug 18, 2019
2457039
Replace 1. -> 1
Aug 18, 2019
6396d2c
Fix table in TweedieRegressor
Aug 18, 2019
8be0387
Improve docstring in plot_tweedie_regression_insurance_claims.py
rth Aug 22, 2019
da66fd5
Use train_test_split in tests
rth Aug 22, 2019
b9bc170
Fix TODO in test_warm_start
rth Aug 22, 2019
ab6c5d8
Revert "No LaTeX in docstrings"
rth Aug 22, 2019
b424a07
Remove n_iter_ check when warm start.
rth Aug 22, 2019
95a9058
Rename variable L2 -> coef_scaled
rth Aug 22, 2019
59eceb4
Minor fixes
rth Aug 22, 2019
12a5067
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Aug 28, 2019
04f30f4
Better wording in example
rth Aug 28, 2019
3630b52
Improvements in plot_poisson_regression_non_normal_loss.py
rth Aug 28, 2019
516eadb
Improvements in plot_tweedie_regression_insurance_claims.py
rth Aug 28, 2019
5e14928
Drop unused ExponentialDispersionModel._upper_bound
rth Aug 28, 2019
6cc1df5
Move notes and references from docstrings to user manual
rth Aug 28, 2019
752d6aa
More explanatory comments in the code
rth Aug 28, 2019
38a4ad4
Fix requires_positive_y tag
rth Aug 28, 2019
c15a1cc
Remove Link.inverse_derivative2
rth Aug 28, 2019
37de07b
Rename p to power parameter in mean_tweedie_deviance
rth Aug 30, 2019
adbf997
Rename predicted mean mu to y_pred
rth Aug 30, 2019
47dbc84
Fix link parameter documentation in TweedieRegression
rth Aug 30, 2019
3b526e9
EXA Use a simpler pipeline for GBDT in poisson regression example
rth Aug 30, 2019
b1eb611
Minor fixes for user guide
Sep 1, 2019
d964c01
EXA Poisson: minor changes
Sep 1, 2019
a1844b8
Fix mu->y_pred and p->power
Sep 2, 2019
f513392
EXA Tweedie: some improvements
Sep 3, 2019
84229a6
Fix doc test
Sep 3, 2019
dd22699
Merge branch 'master' into GLM-minimal
rth Sep 11, 2019
059aeb7
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Sep 11, 2019
8c6c255
Fix test
rth Sep 11, 2019
0a23313
EXA Use Ridge and remove eps
rth Sep 12, 2019
29964af
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Sep 16, 2019
976b436
Address comments in plot_poisson_regression_non_normal_loss.py
rth Sep 16, 2019
7c850d1
Lint
rth Sep 16, 2019
f64dc4a
Simplify plot_tweedie_regression_insurance_claims.py example
rth Sep 16, 2019
b1f5bde
Add "lift curve" for model validation in Poisson example
rth Sep 18, 2019
a9ab4e4
Various improvements to the model comparison example
ogrisel Sep 25, 2019
be7bb67
Add cumulated claims plot
ogrisel Sep 25, 2019
4125c20
Improve the cumulated nb claims plot
ogrisel Sep 26, 2019
0070d52
Fix wrong xlabel in histogram plot
ogrisel Sep 26, 2019
9d6bb52
More example improvements (preprocessors + plots)
ogrisel Sep 26, 2019
b353b2d
Simplify dataset + use more data
ogrisel Sep 26, 2019
88757fd
Remove solver parameter from {Poisson,Gamma,Tweedie}Regression
rth Sep 26, 2019
6d119d4
Revert some accidental changes from 88757fdb99cc516be230fe08ec1ebfb7b…
rth Sep 26, 2019
b735eb7
Additional comment about the use of properties with setters
rth Sep 26, 2019
2d91114
Add additional tests for link derivatives
rth Sep 26, 2019
89103bc
cosmits + typos
agramfort Sep 29, 2019
4f28a44
Address some of Alex's comments
rth Sep 30, 2019
d4dfd0b
Removing unnecessary comments / asarray call
rth Sep 30, 2019
64d6fbd
Update doc/modules/linear_model.rst
rth Oct 3, 2019
82ace9f
Remove unused solver parameter in tests
rth Oct 3, 2019
5288a0f
Add test for sample_weight consistency
rth Oct 3, 2019
499e8d2
Move GLM losses under sklearn._loss.glm_distribution
rth Oct 3, 2019
f4aa839
Update sklearn/linear_model/_glm/glm.py
rth Oct 3, 2019
48fcbe6
Add missing config.add_subpackage in setup.py
rth Oct 3, 2019
d71fb9f
Address Nicolas comments in the documentation (partial)
rth Oct 3, 2019
fa90272
More cleanups in the plot_tweedie_regression_insurance_claims.py example
rth Oct 3, 2019
4d16f31
Typos and text improvement in poisson example
Oct 6, 2019
15eb1d3
EXA sharey for histograms
Oct 6, 2019
3d097c6
Plot y_pred histograms on the test set
ogrisel Oct 8, 2019
6372287
Merge remote-tracking branch 'origin/master' into GLM-minimal
ogrisel Oct 8, 2019
b117856
Merge remote-tracking branch 'upstream/master' into GLM-minimal
rth Oct 8, 2019
31f5b3d
Compound Poisson => Compound Poisson Gamma
ogrisel Oct 9, 2019
a498ff5
Compound Poisson => Compound Poisson Gamma
ogrisel Oct 9, 2019
3fae28a
Various improvement in Tweedie regression example
ogrisel Oct 9, 2019
a2b6841
Merge remote-tracking branch 'origin/master' into GLM-minimal
ogrisel Oct 9, 2019
a47798a
Update doc/modules/linear_model.rst
rth Oct 10, 2019
83391dd
Use latest docstring conventions everywhere
rth Oct 10, 2019
3bfb54e
Drop check_input parameter
rth Oct 10, 2019
d325fe2
Use keyword only arguments SLEP009
rth Oct 10, 2019
661cf56
Move _y_pred_deviance_derivative from losses as a private function
rth Oct 10, 2019
560c180
Fix cumulated claim amount curve in Tweedie regression example
ogrisel Oct 10, 2019
0ea2dce
PEP8
ogrisel Oct 10, 2019
4ca2e95
MNT remove function body in abstract methods
rth Oct 14, 2019
89b429d
Improvements to Pure Premium example
ogrisel Oct 14, 2019
2d0b195
s/ordered Lorenz/Lorenz/
ogrisel Oct 14, 2019
ea6a3e8
More doc improvements to the pure premium example
ogrisel Oct 14, 2019
ddae396
doc update, simplification and fixes
NicolasHug Oct 15, 2019
21572d9
put back doc for BaseLink removed by mistake
NicolasHug Oct 15, 2019
d7ff6f4
TST GLM/Ridge comparison with sample_weight (xfail for now)
rth Oct 17, 2019
939d240
TST More invariance checks for sample_weight
rth Oct 17, 2019
da0d2a6
Remove copy_X parameter
rth Oct 17, 2019
162fb3b
Minor doc improvements
Oct 17, 2019
cafc92f
DOC consistent coef_=w for OMP in user guide
Oct 17, 2019
d3235db
EXA typos
Oct 17, 2019
9401230
EXA set less extreme Tweedie power=1.9
Oct 17, 2019
5bc48e5
TST fix normal_ridge_comparison with sample_weight
Oct 17, 2019
e572c31
DOC advice against cv for Tweedie power in UG
Oct 20, 2019
8d70042
DOC improve advice for power cv
Oct 20, 2019
6dc5e34
Merger master into GLM-minimal
Nov 10, 2019
c479889
EXA rely on default of FunctionTransformer
Nov 10, 2019
3628cff
Merge remote-tracking branch 'rth/GLM-minimal' into GLM-minimal
Nov 10, 2019
88d150e
EXA print all columns of DataFrame with max_columns option
Nov 10, 2019
0d31f47
EXA improve wording Poisson target
Nov 10, 2019
3ab2877
EXA increase digits in scores
Nov 10, 2019
87de01b
EXA convergence issue solve with max_iter
Nov 10, 2019
072f4fb
Merge branch 'master' into GLM-minimal
Jan 29, 2020
fbc22d8
Update sklearn/metrics/_regression.py
rth Jan 29, 2020
27ae4a2
DOC Add what's new entry
rth Jan 29, 2020
51be7e1
Removed myself from what's new
NicolasHug Jan 30, 2020
25f0e53
CLN Minor link clean up
thomasjpfan Feb 4, 2020
5eddf9c
CLN one word too much
Feb 5, 2020
81068ce
Merge branch 'master' of github.com:scikit-learn/scikit-learn into GL…
NicolasHug Feb 28, 2020
c700acf
minor typo
NicolasHug Feb 28, 2020
a126f4a
remove unused symbol
NicolasHug Feb 28, 2020
82c4483
use j instead of i to index features
NicolasHug Feb 28, 2020
d3083db
removed redundant variable
NicolasHug Feb 28, 2020
f63f795
removed unused var
NicolasHug Feb 28, 2020
45de110
Added comment and basic test to family attribute
NicolasHug Feb 28, 2020
b8459b0
Added test for link='auto'
NicolasHug Feb 28, 2020
581b4d7
more comment about OneHotEncoder vs OrdinalEncoder for forests
NicolasHug Feb 28, 2020
bb75435
minor typos or formulations
NicolasHug Feb 28, 2020
94dfc00
Remove unused ExponentialDispersionModel.unit_variance_derivative
rth Feb 29, 2020
0810bf3
DOC more detailed description of the dataset used in examples
rth Feb 29, 2020
a90a0aa
EXA fix minor typo
Feb 29, 2020
f74ab96
EXA compare metric by metric
Feb 29, 2020
a349be7
Add examples of use-cases
rth Feb 29, 2020
497a76c
Add figure with Poisson, Gamma and Tweedie distributions
rth Feb 29, 2020
c8c902f
DOC fix minor typo
Feb 29, 2020
bda7ad6
DOC add point mass to plot
Feb 29, 2020
83c2ba6
TST remove futile arguments
Feb 29, 2020
7498f3e
TST increase rtol
Feb 29, 2020
90b1213
TST add fit_intercept to test_glm_identity_regression
Feb 29, 2020
697bda2
TST ignore one specific ConvergenceWarning
Feb 29, 2020
578408c
TST add fit_intercept to test_glm_log_regression
Feb 29, 2020
2668910
EXA comment on penalty strenght GLM vs Ridge
Feb 29, 2020
d1c3dc9
EXA fix nitpicks
Feb 29, 2020
a9686f6
EXA remove empty line
Feb 29, 2020
04e7aca
EXA add blank line after function definition E305
Feb 29, 2020
21a739c
Gamma -> Poisson
NicolasHug Feb 29, 2020
79ada1e
Used X @ coef as suggested by Roman
NicolasHug Feb 29, 2020
39eeb44
minimal addition to clearly separate links in example
NicolasHug Feb 29, 2020
e3cf69d
Added a few definitions from the insurance jargon
NicolasHug Feb 29, 2020
56aa0d7
minor comment
NicolasHug Feb 29, 2020
27a344c
maybe fixed doc
NicolasHug Mar 1, 2020
e817b2c
forgot these
NicolasHug Mar 1, 2020
0fdc518
Update comment about read-only family attribute
rth Mar 1, 2020
6d4ecb2
Update figures to illustrate that they are not defined for Y<0
rth Mar 1, 2020
b96e021
updated remaining comment about property
NicolasHug Mar 1, 2020
293214c
Compare perfs of models side by side in example
NicolasHug Mar 1, 2020
987239a
Shorten text for df to fit fully in width
NicolasHug Mar 1, 2020
edba3b8
Use context manager instead?
NicolasHug Mar 1, 2020
16 changes: 16 additions & 0 deletions doc/modules/classes.rst
@@ -854,6 +854,22 @@ Any estimator using the Huber loss would also be robust to outliers, e.g.
linear_model.RANSACRegressor
linear_model.TheilSenRegressor

Generalized linear models (GLM) for regression
----------------------------------------------

A generalization of linear models that allows the response variable to
have an error distribution other than a normal distribution is implemented
in the following models:

.. autosummary::
:toctree: generated/
:template: class.rst

linear_model.PoissonRegressor
linear_model.TweedieRegressor
linear_model.GammaRegressor


Miscellaneous
-------------

164 changes: 163 additions & 1 deletion doc/modules/linear_model.rst
@@ -875,7 +875,7 @@ with 'log' loss, which might be even faster but requires more tuning.
It is possible to obtain the p-values and confidence intervals for
coefficients in cases of regression without penalization. The `statsmodels
package <https://pypi.org/project/statsmodels/>`_ natively supports this.
Within sklearn, one could use bootstrapping instead as well.


:class:`LogisticRegressionCV` implements Logistic Regression with built-in
@@ -897,6 +897,168 @@ to warm-starting (see :term:`Glossary <warm_start>`).
.. [9] `"Performance Evaluation of Lbfgs vs other solvers"
<http://www.fuzihao.org/blog/2016/01/16/Comparison-of-Gradient-Descent-Stochastic-Gradient-Descent-and-L-BFGS/>`_

.. _Generalized_linear_regression:

Generalized Linear Regression
=============================

Generalized Linear Models (GLM) extend linear models in two ways
[10]_. First, the predicted values :math:`\hat{y}` are linked to a linear
combination of the input variables :math:`X` via an inverse link function
:math:`h` as

Since this is an inverse function, would it make sense to denote it

$$h^{-1}$$

It would also be nice to suggest what this link function is doing. From what I saw this is a link between the linear predictor and the mean of the distribution?


First, this is standard literature nomenclature: link function g: g(E[Y]) = Xw and inverse link function h: h(Xw) = g^{-1}(Xw) = E[Y].

Second, we had this info/equation in the user guide, but some reviewer wanted it gone 😢


I agree the link function g could be introduced e.g. below in the usage part.

I don't like using E[Y] because:

  • we don't talk about E[Y] for the rest of the linear models, so why here
  • it's actually E[Y/X], right? It seems to me that calling that E[Y] is another implicit nomenclature of GLMs, which we want to avoid in the UG precisely because it's implicit.


this is standard literature nomenclature

We should go with it then :) (my remarks are from somebody who is not familiar with GLMs)

Second, we had this info/equation in the user guide, but some reviewer wanted it gone

I was not thinking of going into this level of detail, but more about defining in a couple of words what the link function is. Regarding the mathematical section, I'm fine with it if the majority of reviewers thought it was overkill. I will not be pushy on this :)


@NicolasHug First excuse - it was late 😊 Second excuse, GLM literature is lazy and skips the conditional most of the time, e.g. E[Y] really means E[Y|X].

More seriously: Maybe, I missed that piece in the actual user guide, but I can't find a reference as to what regressors do actually estimate/predict (most of the time E[Y|X]). That's why I bring this up again (last time, promise). Should we add that info somewhere in the UG?


Should we add that info somewhere in the UG?

That's a possibility, yes. Though I would prefer doing that in another PR to make sure the UG is overall consistent, since this would be a non-trivial update.

In the meantime, I still think we can have a reasonable UG for the GLMs that doesn't mention E[Y/X], we can just talk about predictions.


.. math:: \hat{y}(w, X) = h(x^\top w) = h(w_0 + w_1 X_1 + ... + w_p X_p).

Secondly, the squared loss function is replaced by the unit deviance :math:`d`
of a reproductive exponential dispersion model (EDM) [11]_. The minimization
problem becomes

.. math:: \min_{w} \frac{1}{2 \sum_i s_i} \sum_i s_i \cdot d(y_i, \hat{y}(w, X_i)) + \frac{\alpha}{2} ||w||_2^2

with sample weights :math:`s`, and L2 regularization penalty :math:`\alpha`.
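As a rough numeric illustration of this objective (a sketch, not code from the PR): for the Normal distribution with the identity link, the unit deviance is the squared error, so the objective reduces to a weighted ridge loss. The data and coefficients below are made up.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
coef_true = np.array([1.0, -2.0, 0.5])       # hypothetical coefficients
y = X @ coef_true + rng.normal(size=50)
s = rng.uniform(0.5, 2.0, size=50)           # sample weights s_i
alpha = 0.5

def objective(w):
    # Normal unit deviance d(y, yhat) = (y - yhat)**2 with identity link,
    # so this is exactly a weighted ridge objective.
    dev = (y - X @ w) ** 2
    return (s * dev).sum() / (2 * s.sum()) + 0.5 * alpha * (w @ w)

# The true coefficients should give a lower objective than w = 0.
print(objective(coef_true) < objective(np.zeros(3)))
```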
The unit deviance is defined by the log of the :math:`\mathrm{EDM}(\mu, \phi)`

I read this sentence as "the unit deviance == log likelihood". That can't be right?

likelihood as

.. math:: d(y, \mu) = -2\phi\cdot
\left( \log p(y|\mu,\phi)
- \log p(y|y,\phi)\right).

I'm confused, what is p(y|y,phi)?


The pdf of the EDM where you plug in y for the expectation mu. This way, the best possible value of the deviance is 0.
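This definition can be checked numerically for the Normal case, where EDM(mu, phi) is just N(mu, phi) and the table below gives d(y, mu) = (y - mu)^2. A minimal sketch for illustration only, not part of the PR:

```python
import numpy as np
from scipy.stats import norm

phi = 1.7                                   # arbitrary dispersion; d must not depend on it
y = np.array([0.5, 1.0, 2.5])
mu = np.array([1.0, 2.0, 2.5])

log_lik = norm.logpdf(y, loc=mu, scale=np.sqrt(phi))
# p(y|y, phi): the "saturated" model where the mean is set to y itself
log_lik_sat = norm.logpdf(y, loc=y, scale=np.sqrt(phi))

d = -2 * phi * (log_lik - log_lik_sat)
print(np.allclose(d, (y - mu) ** 2))        # recovers the Normal unit deviance
```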


The following table lists some specific EDM distributions—all are instances of Tweedie
distributions—and some of their properties.

================= =============================== ====================================== ============================================
Distribution Target Domain Unit Variance Function :math:`v(\mu)` Unit Deviance :math:`d(y, \mu)`
================= =============================== ====================================== ============================================
Normal :math:`y \in (-\infty, \infty)` :math:`1` :math:`(y-\mu)^2`
Poisson :math:`y \in [0, \infty)` :math:`\mu` :math:`2(y\log\frac{y}{\mu}-y+\mu)`
Gamma :math:`y \in (0, \infty)` :math:`\mu^2` :math:`2(\log\frac{\mu}{y}+\frac{y}{\mu}-1)`
Inverse Gaussian :math:`y \in (0, \infty)` :math:`\mu^3` :math:`\frac{(y-\mu)^2}{y\mu^2}`
================= =============================== ====================================== ============================================
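As a quick numeric sanity check (not from the PR), each unit deviance in the table vanishes at y = mu and is positive for any other prediction:

```python
import numpy as np

def d_normal(y, mu):
    return (y - mu) ** 2

def d_poisson(y, mu):
    return 2 * (y * np.log(y / mu) - y + mu)

def d_gamma(y, mu):
    return 2 * (np.log(mu / y) + y / mu - 1)

def d_inv_gaussian(y, mu):
    return (y - mu) ** 2 / (y * mu ** 2)

y = np.linspace(0.1, 5.0, 50)
for d in (d_normal, d_poisson, d_gamma, d_inv_gaussian):
    assert np.allclose(d(y, y), 0.0)   # zero when the prediction is exact
    assert np.all(d(y, y + 0.5) > 0)   # positive otherwise
print("all unit deviances behave as expected")
```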


Usage
-----

A GLM loss different from the classical squared loss might be appropriate in
the following cases:

* If the target values :math:`y` are counts (non-negative integer valued) or
frequencies (non-negative), you might use a Poisson deviance with log-link.

relative frequencies?


I don't understand the term "relative frequencies"; i.e. relative to what?


I don't understand the term "relative frequencies"

Me neither ^^

But I think in English a frequency is usually a count, whereas in French it's... A (relative) frequency, aka a ratio.

@jnothman maybe?


But I think in English a frequency is usually a count, whereas in French it's... A (relative) frequency, aka a ratio

Are you sure? At least in physics frequency has the same meaning in English and in French as far as I know https://en.wikipedia.org/wiki/Frequency i.e. number of occurrences per unit of time, not just a count.


Hah, I didn't know about that one. So according to wikipedia for statistics, in English frequency = absolute frequency, while in French frequency = relative frequency. I think the English version makes more sense with respect to the common definition outside of statistics.

Absolute frequency: number of occurrences during the experiment (presumably of some duration), so the units are still [occurrences / time]. It just happens that in our case users have a different duration of observation (or exposure), so you normalize by that to compare users. That's still an absolute frequency, as far as I understand.

A relative frequency would be to normalize all those by the sum of frequencies, i.e. scale y by a constant factor, the norm, it shouldn't matter much but I don't see the point in this use case.

So I think this applies to both absolute and relative frequencies, but saying just "frequency" is probably more general.


* If the target values are positive valued and skewed, you might try a
Gamma deviance with log-link.

* If the target values seem to be heavier tailed than a Gamma distribution,
you might try an Inverse Gaussian deviance (or even higher variance powers
of the Tweedie family).

Since the linear predictor :math:`x^\top w` can be negative and
Poisson, Gamma and Inverse Gaussian distributions don't support negative values,
it is convenient to apply a link function different from the identity link
:math:`h(x^\top w)=x^\top w`, one that guarantees positive predictions, e.g. the
log-link `link='log'` with :math:`h(x^\top w)=\exp(x^\top w)`.
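A small numeric sketch of why the link choice matters (hypothetical coefficients, not fitted values):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
w0, w = -0.8, np.array([0.3, 0.2])   # hypothetical intercept and weights

eta = w0 + X @ w                     # linear predictor x^T w
print(eta)                           # identity link: predictions can be negative
print(np.exp(eta))                   # log link h = exp: always strictly positive
```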

:class:`TweedieRegressor` implements a generalized linear model

That part should be at the very beginning of the Usage section, directly followed by the small code example below

for the Tweedie distribution, which makes it possible to model any of the
above-mentioned distributions using the appropriate ``power`` parameter,
i.e. the exponent of the unit variance function:

- ``power = 0``: Normal distribution. Specialized solvers such as
:class:`Ridge`, :class:`ElasticNet` are generally

Should we really mention ElasticNet considering that GLMs have L2 penalty only?


There is an aim to add an L1 penalty to this implementation in the future, and I don't see the harm in mentioning ElasticNet there in any case.

more appropriate in this case.

- ``power = 1``: Poisson distribution. :class:`PoissonRegressor` is exposed for
convenience. However, it is strictly equivalent to
`TweedieRegressor(power=1)`.

- ``power = 2``: Gamma distribution. :class:`GammaRegressor` is exposed for
convenience. However, it is strictly equivalent to
`TweedieRegressor(power=2)`.

- ``power = 3``: Inverse Gaussian distribution.
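The ``power`` parameter enters through the unit deviance. As a sketch (using the general Tweedie deviance formula from the standard literature, not code from the PR), the general expression approaches the Poisson unit deviance from the table above as power approaches 1:

```python
import numpy as np

def tweedie_unit_deviance(y, mu, power):
    # Valid for power not in {0, 1, 2}; those special cases are limits.
    p = power
    return 2 * (y ** (2 - p) / ((1 - p) * (2 - p))
                - y * mu ** (1 - p) / (1 - p)
                + mu ** (2 - p) / (2 - p))

y = np.array([1.0, 2.0, 3.0])
mu = np.array([1.5, 1.5, 2.5])

d_poisson = 2 * (y * np.log(y / mu) - y + mu)     # power = 1 entry above
d_near_one = tweedie_unit_deviance(y, mu, power=1 + 1e-6)
print(np.allclose(d_near_one, d_poisson, rtol=1e-4))
```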


.. note::

* The feature matrix `X` should be standardized before fitting. This
ensures that the penalty treats features equally.

Standardized might be problematic if:

  • one expects non-linear threshold effects in which case binning might be interesting;
  • one expects heavy tailed marginal distribution in which case PowerTransformer or QuantileTransformer might be more suitable.

Maybe one could just say "scaled before fitting to ensure that the penalty treats features approximately equally"

* If you want to model a relative frequency, i.e. counts per exposure (time,
volume, ...) you can do so by a Poisson distribution and passing
:math:`y=\frac{\mathrm{counts}}{\mathrm{exposure}}` as target values
together with :math:`s=\mathrm{exposure}` as sample weights.

As an example, consider Poisson distributed counts z (integers) and
weights s=exposure (time, money, person-years, ...). Then you fit
y = z/s, i.e. ``PoissonRegressor.fit(X, y, sample_weight=s)``.
The weights are necessary for the right (finite sample) mean.
Considering :math:`\bar{y} = \frac{\sum_i s_i y_i}{\sum_i s_i}`,
in this case one might say that y has a 'scaled' Poisson distribution.
The same holds for other distributions.

* The fit itself does not need Y to be from an EDM, but only assumes
the first two moments to be :math:`E[Y_i]=\mu_i=h((Xw)_i)` and
:math:`Var[Y_i]=\frac{\phi}{s_i} v(\mu_i)`.
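A small simulation (hypothetical numbers, not from the PR) illustrating the exposure point above: with y = z/s as target and s as weights, the weighted mean recovers the underlying event rate:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 1.3                                  # hypothetical event rate
s = rng.uniform(0.5, 3.0, size=10_000)           # exposure, e.g. person-years
z = rng.poisson(true_rate * s)                   # observed counts

y = z / s                                        # frequency target
y_bar = np.sum(s * y) / np.sum(s)                # weighted mean of y

print(np.isclose(y_bar, z.sum() / s.sum()))      # identical by construction
print(abs(y_bar - true_rate) < 0.05)             # close to the true rate
```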

The estimator can be used as follows::

>>> from sklearn.linear_model import TweedieRegressor
>>> reg = TweedieRegressor(power=1, alpha=0.5, link='log')
>>> reg.fit([[0, 0], [0, 1], [2, 2]], [0, 1, 2])
TweedieRegressor(alpha=0.5, link='log', power=1)
>>> reg.coef_
array([0.2463..., 0.4337...])
>>> reg.intercept_
-0.7638...


.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_linear_model_plot_tweedie_regression_insurance_claims.py`
* :ref:`sphx_glr_auto_examples_linear_model_plot_poisson_regression_non_normal_loss.py`

Mathematical formulation
------------------------

In the unpenalized case, the assumptions are the following:

* The target values :math:`y_i` are realizations of random variables
:math:`Y_i \overset{i.i.d}{\sim} \mathrm{EDM}(\mu_i, \frac{\phi}{s_i})`
with expectation :math:`\mu_i=\mathrm{E}[Y_i]`, dispersion parameter
:math:`\phi` and sample weights :math:`s_i`.
* The aim is to predict the expectation :math:`\mu_i` with

I find this to be redundant with the beginning?


It is indeed partially redundant. I tried to find a balance of mathematical rigour when introducing GLMs and ended up writing this subsection after the introduction. There is, however, some more detailed information here.
Would you prefer to remove this subsection?

  :math:`\hat{y}_i = h(\eta_i)`, where :math:`\eta_i=(Xw)_i` is the linear
  predictor and :math:`h` is the inverse link function.
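The prediction path :math:`\hat{y}_i = h((Xw)_i)` can be checked directly:
with the log link, ``predict`` applies ``exp`` to the linear predictor. A
small sketch (arbitrary data, assuming the public
``coef_``/``intercept_``/``predict`` API):

```python
# Sketch: with link='log', predictions are h(eta) = exp(Xw + intercept).
import numpy as np
from sklearn.linear_model import TweedieRegressor

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])

reg = TweedieRegressor(power=1, alpha=0.5, link='log').fit(X, y)

eta = X @ reg.coef_ + reg.intercept_  # linear predictor eta_i = (Xw)_i
mu = np.exp(eta)                      # inverse (log) link: h(eta) = exp(eta)

print(np.allclose(mu, reg.predict(X)))
```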

Note that the first assumption implies
:math:`\mathrm{Var}[Y_i]=\frac{\phi}{s_i} v(\mu_i)`, where :math:`v(\mu)` is
the unit variance function. Specifying a particular distribution of an EDM is the
same as specifying a unit variance function (they are one-to-one).
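For the Tweedie family this one-to-one correspondence is explicit: the unit
variance function is :math:`v(\mu) = \mu^{\mathrm{power}}`, and the choice of
``power`` pins down the member of the family. A small illustrative sketch
(the helper ``unit_variance`` is hypothetical, not scikit-learn API):

```python
# Sketch: Tweedie unit variance function v(mu) = mu**power; the choice of
# power determines the distribution (one-to-one correspondence).
def unit_variance(mu, power):
    """Hypothetical helper: Tweedie unit variance function v(mu) = mu**power."""
    return mu ** power

# power=0: Normal (constant variance), power=1: Poisson (v(mu)=mu),
# power=2: Gamma (v(mu)=mu**2), power=3: Inverse Gaussian (v(mu)=mu**3).
for power, name in [(0, "Normal"), (1, "Poisson"),
                    (2, "Gamma"), (3, "Inverse Gaussian")]:
    print(f"{name}: v(2.0) = {unit_variance(2.0, power)}")
```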

A few remarks:

* The deviance is independent of :math:`\phi`. Therefore, the estimation of
  the coefficients :math:`w` is also independent of the dispersion parameter
  of the EDM.
* The minimization is equivalent to (penalized) maximum likelihood estimation.
* The deviances for at least Normal, Poisson and Gamma distributions are
  strictly consistent scoring functions for the mean :math:`\mu`, see Eq.
  (19)-(20) in [12]_. This means that, given an appropriate feature matrix
  `X`, minimizing these deviances yields good (asymptotic) estimators of the
  expectation.
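This consistency can be illustrated empirically: on Poisson-distributed data,
the Poisson deviance is minimized in expectation by predicting the true mean,
so a biased prediction scores worse. A sketch with simulated data (the sample
size and bias factor are chosen purely for illustration):

```python
# Sketch (simulated data): the Poisson deviance is a strictly consistent
# scoring function for the mean, so predicting the true expectation scores
# better than a biased prediction.
import numpy as np
from sklearn.metrics import mean_poisson_deviance

rng = np.random.RandomState(0)
mu_true = 3.0
y = rng.poisson(mu_true, size=10_000)

d_true = mean_poisson_deviance(y, np.full_like(y, mu_true, dtype=float))
d_biased = mean_poisson_deviance(y, np.full_like(y, 1.5 * mu_true, dtype=float))
print(d_true < d_biased)  # the true mean achieves the lower deviance
```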


.. topic:: References:

.. [10] McCullagh, Peter; Nelder, John (1989). Generalized Linear Models,
Second Edition. Boca Raton: Chapman and Hall/CRC. ISBN 0-412-31760-5.

.. [11] Jørgensen, B. (1992). The theory of exponential dispersion models
and analysis of deviance. Monografias de matemática, no. 51. See also
`Exponential dispersion model.
<https://en.wikipedia.org/wiki/Exponential_dispersion_model>`_

.. [12] Gneiting, T. (2010). `Making and Evaluating Point Forecasts.
<https://arxiv.org/pdf/0912.0902.pdf>`_

Stochastic Gradient Descent - SGD
=================================
doc/whats_new/v0.22.rst
:user:`Mohamed Maskani <maskani-moh>`, and :user:`Thomas Fan <thomasjpfan>`.

- |Feature| Add :class:`metrics.mean_tweedie_deviance` measuring the
  Tweedie deviance for a power parameter ``power``. Also add mean Poisson
  deviance :class:`metrics.mean_poisson_deviance` and mean Gamma deviance
  :class:`metrics.mean_gamma_deviance` that are special cases of the Tweedie
  deviance for `power=1` and `power=2` respectively.
:pr:`13938` by :user:`Christian Lorentzen <lorentzenchr>` and
`Roman Yurchak`_.
