
FAQ/ENH: cross validation performance measures (GLM) #2572

Open
josef-pkt opened this issue Aug 4, 2015 · 2 comments
@josef-pkt
Member

So far we have very little support for cross-validation built in. The GAM PR #2435 adds cross-validation for penalized GLM.

related issues #1577 #2027 #1282
#2435 (comment)

General question: What are the performance criteria for general models, especially generalized linear models?

We have tools.eval_measures, which only includes measures targeted at continuous variables.
sklearn http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics has a large number of metrics for classification and other tasks, but not for count variables.

a few links to related discussions
https://stat.ethz.ch/R-manual/R-devel/library/boot/html/cv.glm.html doesn't say much about cost; uses thresholded 0-1 prediction for Binomial/binary
http://stats.stackexchange.com/questions/48811/cost-function-for-validating-poisson-regression-models related question for Poisson; a comment mentions that cv.glmnet uses deviance
http://stackoverflow.com/questions/23904179/calculate-cross-validation-for-generalized-linear-model-in-matlab Matlab question, no discussion of the cost function (example for an interface)
http://www.uni-kiel.de/psychologie/rexrepos/posts/crossvalidation.html binary only, uses the Brier score
(looks like a nice collection of examples)

general
(my intuition, no references, I don't remember reading anything specific)

objective: estimation of model versus prediction

In prediction settings, as in scikit-learn or a Kaggle competition, the objective for cross-validation is based on a subject- or application-specific cost/utility function. This does not need to be consistent with the objective function of the estimator. (But then maybe we should also change the objective of the estimator for the parameters, not just for the hyperparameters.)

For model estimation, I think we should use the same criteria for the hyperparameters as for the standard parameters. In the GLM case this would be deviance or loglike. In the linear model case it is mean squared error, which is equivalent to the loglike (for fixed standard deviation?).
Another possibility (I'm not sure) would be to look at the score_obs/estimating equations.
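
As a one-line check of the Gaussian equivalence (my addition, not part of the original comment): with the standard deviation held fixed,

$$\log L(\mu) = -\tfrac{n}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}\sum_i (y_i - \mu_i)^2,$$

so maximizing the loglike over $\mu$ is the same as minimizing the mean squared error.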

Implementation for GLM

In general, we don't have access to loglike_obs to evaluate at observations that are not in the estimation sample. In the case of GLM, we have it defined in the family, which, I think, could easily be reused.
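
A minimal sketch of that idea (my addition, not from the issue): fit on a training split and reuse the family attached to the fitted model for the held-out observations. It assumes a statsmodels version where Family.loglike_obs and Family.deviance are available, and the Poisson data are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

# simulated Poisson data, only for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
mu_true = np.exp(0.5 + x @ np.array([0.3, -0.2]))
y = rng.poisson(mu_true)

exog = sm.add_constant(x)
train, test = slice(0, 150), slice(150, None)

res = sm.GLM(y[train], exog[train], family=sm.families.Poisson()).fit()

# predicted mean for the held-out observations
mu_test = res.predict(exog[test])

# per-observation loglike and total deviance on the test sample,
# reusing the family attached to the fitted model
llf_obs = res.model.family.loglike_obs(y[test], mu_test)
dev = res.model.family.deviance(y[test], mu_test)
print(llf_obs.sum(), dev)
```

The same pattern should apply to other families; only the interpretation of endog changes (e.g. proportions or counts for Binomial).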

@josef-pkt
Member Author

If we have get_distribution, then we can evaluate any property of the fully specified distribution, e.g. the logpdf for out-of-sample observations.
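
A short sketch of this (my addition; assumes a newer statsmodels where GLMResults.get_distribution accepts out-of-sample exog, and continues the Poisson example above):

```python
# Frozen scipy distribution for the held-out observations; any property of the
# fully specified predictive distribution is then available out of sample.
dist = res.get_distribution(exog=exog[test])
logp = dist.logpmf(y[test])   # logpdf for continuous families
print(logp.sum())
```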

josef-pkt added this to the 0.14 milestone Sep 6, 2021
@josef-pkt
Member Author

temporary bump for 0.14, even though it's labelled as FAQ

We should get some examples for out-of-sample eval_measures with GLM (using families) and others using get_distribution.
For newer count models like ZI, I used in-sample predictive distribution fit measures as model selection criteria.
Something similar could be applied to out-of-sample predictive fit measures.
My notebook examples have model/distribution selection, not variable selection or hyperparameter selection.
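
A hedged example for the eval_measures part (my addition, reusing the Poisson fit from the sketch above): the continuous-variable measures in tools.eval_measures can be applied directly to out-of-sample predicted means, even though they ignore the count nature of the data.

```python
from statsmodels.tools import eval_measures

# out-of-sample predicted means from the GLM fit above
mu_test = res.predict(exog[test])
print(eval_measures.rmse(y[test], mu_test))
print(eval_measures.meanabs(y[test], mu_test))
```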
