I didn't realize that my original data is ill conditioned: it's badly scaled and has high multicollinearity. What can I do?
current context: multicollinearity diagnostics for the model.exog, which might be badly parameterized.
i.e. we don't want "theoretical" diagnostics that use some transformed data; we want to know what the problems with our actual exog are. #2380
example: VIF based on the correlation matrix ignores multicollinearity with, or other problems involving, the constant. The OLS condition number is based on exog and is sensitive to scaling.
Belsley (1980) has an example where the constant is almost collinear with an exog column that has a large mean and small variance.
The NIST test cases have a similar example with a small coefficient of variation, and another example with badly scaled polynomials.
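The Belsley-style case is easy to reproduce with a small numpy sketch (the numbers here are made up for illustration): VIFs computed from the correlation matrix of the slope variables stay near 1 and see nothing, while the condition number of the raw exog is huge and collapses after standardizing.

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 100
# Belsley-style variable: large mean, small variance, so it is
# nearly collinear with the constant column.
x1 = 1000 + 0.1 * rng.standard_normal(n)
x2 = rng.standard_normal(n)
exog = np.column_stack([np.ones(n), x1, x2])

# VIFs from the correlation matrix of the slope variables only:
# diag(inv(R)). x1 and x2 are uncorrelated, so both VIFs are ~1,
# and the collinearity with the constant is invisible.
corr = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)
vifs = np.diag(np.linalg.inv(corr))

# Condition number of the raw exog: huge, driven by x1 vs. the constant.
cond_raw = np.linalg.cond(exog)

# After standardizing x1 the condition number collapses,
# showing its sensitivity to scaling and centering.
z1 = (x1 - x1.mean()) / x1.std()
cond_std = np.linalg.cond(np.column_stack([np.ones(n), z1, x2]))
```

The contrast between `vifs` (all near 1) and `cond_raw` versus `cond_std` is exactly the gap between correlation-based and exog-based diagnostics described above.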
We want to alert users to those problems or make it easy for them to check, e.g. #1908 for adding sanity checks.
However, Belsley argues that fundamental ill conditioning cannot be removed by transformation: if a transformation removes the ill conditioning, then the transformation itself is ill conditioned, i.e. we just shift the ill conditioning from the model to the transformation.
second: numerical problems in optimization, example: Poisson with large values of exog #1131 #1715 #3925 #1699 #4577. StandardizeTransform, fit_transformed? #2062
Note: in nonlinear models like GLM and discrete, X could be reasonably well behaved, but the nonlinear transform creates numerical problems, e.g. underflow or overflow in exp.
In this case, transforming exog improves convergence of the optimizer and doesn't just shift the ill conditioning.
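A minimal numpy sketch of the overflow problem (hypothetical magnitudes and start values): with raw exog values in the tens of thousands, even a modest slope drives exp(X @ beta) to inf, while the same slope on standardized exog keeps the linear predictor in a safe range.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Raw, unscaled regressor with very large values.
x = rng.uniform(1e4, 2e4, size=n)
exog = np.column_stack([np.ones(n), x])

# The Poisson mean is exp(X @ beta): at a generic optimizer start
# the linear predictor is ~1000-2000, far past exp's overflow point.
beta_start = np.array([0.0, 0.1])
with np.errstate(over="ignore"):
    mu_raw = np.exp(exog @ beta_start)
overflowed = np.isinf(mu_raw).any()  # exp overflowed to inf

# Standardizing x keeps the linear predictor small; the optimizer
# works on the transformed scale and the estimated parameters can
# be mapped back to the original scale afterwards.
xs = (x - x.mean()) / x.std()
mu_std = np.exp(np.column_stack([np.ones(n), xs]) @ beta_start)
finite = np.isfinite(mu_std).all()
```

This is the sense in which standardizing genuinely helps here: the overflow is an artifact of evaluating exp at unscaled values, not a property of the underlying estimation problem.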
current plans
functions to identify multicollinearity: condition index, VIF and similar
quick summary including bad scaling, e.g. min/max and coefficient of variation of the data
multicollinearity measures both on the data/exog and on the model and results: score_obs, hessian and cov_params
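The quick scaling summary could be as simple as the following sketch (the helper name and output format are hypothetical, not an existing statsmodels API):

```python
import numpy as np

def scaling_summary(exog):
    """Hypothetical helper: per-column min, max, and coefficient
    of variation (std / |mean|) to flag badly scaled columns."""
    exog = np.asarray(exog, dtype=float)
    means = exog.mean(axis=0)
    stds = exog.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        # A tiny CV flags Belsley-style columns that are nearly
        # collinear with the constant (large mean, small variance).
        cv = np.where(means != 0, stds / np.abs(means), np.inf)
    return {"min": exog.min(axis=0), "max": exog.max(axis=0), "cv": cv}

# Constant column plus a large-mean, small-variance column:
# the second column's tiny CV stands out immediately.
x = np.column_stack([np.ones(50), 1000 + 0.1 * np.arange(50)])
summary = scaling_summary(x)
```

Thresholds for "too small a CV" or "too wide a min/max range" would still need to be chosen; the point is only that the check is cheap and runs on the actual exog.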
later:
look into fit_transformed for GLM and discrete again.
alternative estimators, e.g. ridge, penalized, Firth
note: Firth involves properties of the data, including endog
As with perfect separation and (almost) empty cells, there is the additional, separate issue of whether there is enough (or too much) information about the relationship between exog and endog.
related: perfectly collinear variables or unidentified parameters. What can we still infer?
e.g. is_estimable #6271
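For the estimability question, a generic linear-algebra sketch (not necessarily the statsmodels is_estimable API): a linear combination c @ beta is estimable iff c lies in the row space of exog, which can be tested by projecting c onto that row space via the pseudoinverse.

```python
import numpy as np

def is_estimable(c, exog, tol=1e-8):
    """Sketch: c @ beta is estimable iff c is in the row space of
    exog, i.e. projecting c onto that row space returns c."""
    c = np.atleast_2d(c).astype(float)
    # pinv(exog) @ exog is the orthogonal projector onto the row space.
    proj = c @ np.linalg.pinv(exog) @ exog
    return np.allclose(proj, c, atol=tol)

# Perfectly collinear design: x2 = 2 * x1, so only the combination
# b1 + 2*b2 is identified, not b1 or b2 separately.
x1 = np.arange(1.0, 6.0)
X = np.column_stack([x1, 2 * x1])
est_combo = is_estimable([1.0, 2.0], X)   # identified combination
est_single = is_estimable([1.0, 0.0], X)  # b1 alone, not identified
```

This is the standard rank/row-space criterion; it answers "what can we still infer" even when individual parameters are unidentified.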