general
(my intuition, no references, I don't remember reading anything specific)
objective: estimation of model versus prediction
In prediction settings like scikit-learn or Kaggle competitions, the objective for cross-validation is based on a subject/application-specific cost/utility function. This does not need to be consistent with the objective function of the estimator. (But then maybe we should change the objective of the estimator also for the parameters and not just for the hyperparameters.)
For model estimation, I think we should use the same criteria for the hyperparameters as for the standard parameters. In the GLM case this would be deviance or loglike. In the linear model case it is mean squared error, which is equivalent to the loglike (for fixed standard deviation?).
Another possibility (I'm not sure) would be to look at the score_obs/estimating equations.
Implementation for GLM
In general, we don't have access to a loglike_obs that can be evaluated at observations that are not in the estimation sample. In the case of GLM, we have it defined in the family which, I think, could be easily reused.
temporary bump for 0.14, even though it's labelled as FAQ
We should get some examples for out-of-sample eval_measures with GLM (using families) and others using get_distribution.
For newer count models like zero-inflated (ZI) models, I used in-sample predictive distribution fit measures as model selection criteria.
Something similar could be applied to out-of-sample predictive fit measures.
My notebook examples have model/distribution selection and not variable selection or hyperparameter selection.
So far we have very little support for cross-validation built in. GAM PR #2435 adds cross-validation for penalized GLM.
related issues #1577 #2027 #1282
#2435 (comment)
General question: What are the performance criteria for general models, especially generalized linear models?
We have `tools.eval_measures`, which only includes measures targeted toward continuous variables. sklearn http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics has a large number for classification and others, but not count variables.

a few links to related discussions:

- https://stat.ethz.ch/R-manual/R-devel/library/boot/html/cv.glm.html doesn't say much about `cost`, uses thresholded 0-1 prediction for Binomial/Binary
- http://stats.stackexchange.com/questions/48811/cost-function-for-validating-poisson-regression-models related question for Poisson; a comment mentions that `cv.glmnet` uses deviance
- http://stackoverflow.com/questions/23904179/calculate-cross-validation-for-generalized-linear-model-in-matlab matlab question, no discussion about the `cost` function (example for the interface)
- http://www.uni-kiel.de/psychologie/rexrepos/posts/crossvalidation.html binary only, uses the Brier score (looks like a nice collection of examples)