Minimal generalized linear models implementation (L2 + lbfgs) #14300
This only includes the L2 penalty with the lbfgs solver. In other words, it excludes L1 penalties (and the CD solver), matrix penalties, warm start, some distributions (e.g.
The goal is to get an easier-to-review initial implementation. Benchmarks were done in #9405 (comment)
* Fixed pep8
* Fixed flake8
* Rename GeneralizedLinearModel as GeneralizedLinearRegressor
* Use of six.with_metaclass
* PEP257: summary should be on same line as quotes
* Docstring of class GeneralizedLinearRegressor: \ before mu
* Arguments family and link accept strings
* Use of ConvergenceWarning
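The string-to-instance dispatch for `family` can be sketched as below. The class and helper names here are hypothetical stand-ins, not the PR's actual private API:

```python
# Minimal sketch (hypothetical names) of accepting either a string alias
# or a distribution instance for the ``family`` argument.
class NormalDistribution:
    pass

class PoissonDistribution:
    pass

_FAMILIES = {"normal": NormalDistribution, "poisson": PoissonDistribution}

def resolve_family(family):
    """Map a string alias to a distribution instance, or pass one through."""
    if isinstance(family, str):
        try:
            return _FAMILIES[family]()
        except KeyError:
            raise ValueError(
                f"family must be one of {sorted(_FAMILIES)}; got {family!r}"
            )
    return family
```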
* Fixed bug: init parameter max_iter
* Fix API for family and link: default parameter changed to string; non-public variables self._family_instance and self._link_instance
* Fixed bug in score: minus sign forgotten
* Added check_is_fitted to estimate_phi and score
* Added check_array(X) in predict
* Replaced lambda functions in TweedieDistribution
* Some documentation
* Make raw docstrings where appropriate
* Make ExponentialDispersionModel (i.e. TweedieDistribution) picklable: ExponentialDispersionModel has the new property include_lower_bound; method in_y_range is not abstract anymore
* Set self.intercept_ = 0 if fit_intercept=False, so that it is always defined
* Set score to D2, a generalized R2 with deviance instead of squared error, as glmnet does. This also solves issues with check_regressors_train(GeneralizedLinearRegressor), which assumes an R2 score
* Change of names: weight to weights in ExponentialDispersionModel and to sample_weight in GeneralizedLinearRegressor
* Add class method linear_predictor
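The D2 score mentioned above is the deviance analogue of R2: `D2 = 1 - dev(y, y_pred) / dev(y, mean(y))`. A minimal sketch for the Poisson case (helper names are illustrative, not the PR's API):

```python
import numpy as np
from scipy.special import xlogy

def poisson_deviance(y, mu):
    # Summed Poisson unit deviance: 2 * sum(y*log(y/mu) - y + mu);
    # xlogy handles the y == 0 limit without warnings.
    return 2.0 * np.sum(xlogy(y, y / mu) - y + mu)

def d2_score(y, mu):
    """D2 = 1 - dev(y, mu) / dev(y, mean(y)), a generalized R2."""
    null_dev = poisson_deviance(y, np.full_like(y, np.mean(y), dtype=float))
    return 1.0 - poisson_deviance(y, mu) / null_dev
```

A perfect fit gives D2 = 1, while predicting the intercept-only mean gives D2 = 0, mirroring R2 for the squared-error deviance.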
* Added L2 penalty
* API change: alpha, l1_ratio, P1, P2, warm_start, check_input, copy_X
* Added entry in user guide
* Improved docstrings
* Helper function _irls_step
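For reference, one IRLS step for a Poisson GLM with log link amounts to a weighted least-squares solve. The sketch below is a hypothetical stand-in for the private `_irls_step` helper; the PR's actual signature and internals may differ:

```python
import numpy as np

def irls_step(X, y, coef, alpha=0.0):
    """One IRLS step for a Poisson GLM with log link and optional L2 penalty."""
    eta = X @ coef                  # linear predictor
    mu = np.exp(eta)                # inverse log link
    W = mu                          # working weights (= mu for Poisson/log)
    z = eta + (y - mu) / mu         # working response
    # Solve the penalized weighted least-squares normal equations
    A = X.T @ (W[:, None] * X) + alpha * np.eye(X.shape[1])
    b = X.T @ (W * z)
    return np.linalg.solve(A, b)
```

Iterating this to convergence is Fisher scoring, which coincides with Newton's method for the canonical log link.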
* Added test: ridge Poisson with log link compared to glmnet
* Fix ValueError message for l1_ratio
* Fix ValueError message for P2
* String comparison: use '==' and '!=' instead of 'is' and 'is not'
* Fix RuntimeWarnings in unit_deviance of Poisson: compute x*log(x) as xlogy
* Added test for Fisher matrix
* Added test for family argument
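The xlogy fix works because `scipy.special.xlogy` implements the limit `0 * log(0) == 0`, whereas a naive `y * np.log(y)` emits a RuntimeWarning and yields nan at `y == 0`:

```python
import numpy as np
from scipy.special import xlogy

y = np.array([0.0, 1.0, 2.0])
mu = np.array([0.5, 1.0, 2.0])

# xlogy(0, anything) == 0, matching the deviance limit for zero counts,
# so the whole term stays finite and warning-free.
term = xlogy(y, y / mu)
```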
* Put arguments P1, P2 and check_input from fit to __init__
* Added check_input test: is P2 positive definite?
* Added solver option 'auto'
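A positive-(semi)definiteness check on P2 can be done via the eigenvalues of the symmetric matrix. This is only a sketch of the kind of validation `check_input` performs, not the PR's exact code:

```python
import numpy as np

def check_p2(P2):
    """Validate that the penalty matrix P2 is symmetric positive semi-definite."""
    P2 = np.asarray(P2, dtype=float)
    if not np.allclose(P2, P2.T):
        raise ValueError("P2 must be symmetric")
    # eigvalsh is the right routine for symmetric matrices; tolerate tiny
    # negative eigenvalues caused by floating-point round-off.
    if P2.size and np.linalg.eigvalsh(P2).min() < -1e-10 * np.abs(P2).max():
        raise ValueError("P2 must be positive semi-definite")
    return P2
```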
* Added coordinate descent solver
* Skip doctest for GeneralizedLinearRegressor example
* Symmetrize P2 => use P2 = 1/2 (P2 + P2')
* Better validation of parameter start_params
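Symmetrizing P2 is harmless because the penalty term `beta' P2 beta` only depends on the symmetric part of P2, while gradients simplify to `2 * P2_sym @ beta`:

```python
import numpy as np

rng = np.random.default_rng(42)
P2 = rng.normal(size=(3, 3))     # a possibly non-symmetric penalty matrix
beta = rng.normal(size=3)

P2_sym = 0.5 * (P2 + P2.T)       # symmetric part

# The quadratic form is unchanged by symmetrization:
# beta' P2 beta == beta' P2_sym beta for any beta.
```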
* Fix bug for sparse matrices in newton-cg solver, function grad_hess
* Reduce precision for solver newton-cg in test_poisson_ridge
* Remedy doctest issues in linear_model.rst for the example of GeneralizedLinearRegressor
* Remove unused import of xrange from six
* Fix bug in cd solver for sparse matrices
* Higher precision (smaller tol) in test_normal_ridge for sparse matrices
* A separate precision (tol) for each solver in test_poisson_ridge
* Improved documentation
* Additional option 'zero' for argument start_params
* Validation of sample_weight in function predict
* Input validation of estimate_phi
* Set default fit_dispersion=None
* Fixed bug in estimate_phi caused by weight rescaling
* Test for estimate_phi in normal ridge regression
* Extended tests for elastic net Poisson
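One common dispersion estimator is the Pearson chi-squared statistic divided by the residual degrees of freedom. This is only a sketch of that estimator for the Poisson case (unit variance `v(mu) = mu`); the PR's `estimate_phi` may use the deviance or differ in details:

```python
import numpy as np

def estimate_phi_pearson(y, mu, n_features):
    """Pearson chi-squared estimate of the dispersion phi for a Poisson GLM:
    phi_hat = chi2 / (n_samples - n_features)."""
    n_samples = y.shape[0]
    chi2 = np.sum((y - mu) ** 2 / mu)
    return chi2 / (n_samples - n_features)
```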
* New helper function _check_weights for validation of sample_weight
* Fix whitespace issue in doctest of linear_model.rst
* fit_dispersion default=None also in docs
* Improved docs
* Fixed input validation of predict
* Fixed bug for sample_weight in estimate_phi
* Improved input validation and testing of P1
* Test case for validation of argument P2
* Test case for validation of argument copy_X
* Fix doctest failure in example of linear_model.rst
* Fix dtype issue in test_glm_P2_argument
@ogrisel @rth Very interesting/hard questions about the Poisson example!
Second, the histograms are on the training set and have different scales on the y-axis. I changed that with
Third, evaluating the Poisson deviance on observations with
This indicates that most of the difference in Poisson deviance comes from
I draw the following conclusions/hypotheses:
That's a good point. By increasing the number of bins (e.g
However, the plots tend to be too noisy and less readable, so I prefer to keep
The test samples actually have approximately the same distribution (random shuffle split with hundreds of thousands of samples). The test set is smaller and the ylim threshold at 1e2 would hide the extreme values:
I will push a fix for this.
I tried to change the
It would be interesting to explore the "learned representation" of the RF model by projecting the data into a high-dimensional one-hot-encoded binary code, with one dimension per leaf in the RF, and then using PCA or PCA + t-SNE/UMAP to visualize the distribution of the test set and introspect the nature of the clusters identified by the RF model and how they relate to the riskiness of the policyholders. But this example is already complex enough as it is, I think.
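For what it's worth, that projection can be sketched with stock scikit-learn pieces (toy data below; t-SNE/UMAP could be chained after the PCA step):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.exp(0.3 * X[:, 0])            # toy frequency-like target

rf = RandomForestRegressor(n_estimators=20, max_depth=4, random_state=0)
rf.fit(X, y)

# apply() returns, per sample, the index of the leaf it lands in for each
# tree; one-hot encoding turns that into a binary code with one dimension
# per leaf across the forest.
leaves = rf.apply(X)                 # shape (n_samples, n_estimators)
codes = OneHotEncoder().fit_transform(leaves).toarray()

# Project the high-dimensional leaf code to 2D for visualization.
emb = PCA(n_components=2).fit_transform(codes)
```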
Interesting hypothesis. I have tried binning those features, but it did not change the results much (slightly worse deviance, though probably not significant) and produced a similar histogram for