
SUMM/ENH: get_prediction across models #7833

Open · josef-pkt opened this issue Nov 1, 2021 · 14 comments

josef-pkt commented Nov 1, 2021

get_prediction outside tsa

current:

  • OLS, linear_model: linear prediction with confidence and prediction interval
    • code in regression._prediction
  • GLM linear prediction and link.inverse transformed
    • code in genmod._prediction
  • discrete.Poisson: follows the GLM pattern, reuses genmod code
  • GAM: delegates to GLM, except for spline handling

no which yet: get_prediction so far is only for the mean, the expectation of endog E(y | x)

algorithm or method cases, all based on cov_params

  • linear prediction: simple linear transformation of params
    • prediction interval includes var_resid
  • link.inverse function: delta method for se, endpoint transformation for conf_int
    • current code in genmod._prediction
    • this can be extended to multi-link models where mean is orthogonal to other params
    • can be extended to other univariate monotonic functions besides link_mean, e.g. prob in count models;
      no generic function yet
  • arbitrary function, non-separable multi-link models: need to use delta method for asymptotic results
  • prediction interval for models with non-additive error term (not sure yet)
    • distribution interval without parameter uncertainty, i.e. get_distribution(...).ppf(alpha)
    • distribution interval with parameter uncertainty: ??? messy
  • bootstrap, simulated (no specific plans yet)
    • standard bootstrap, e.g. resample obs or parametric residual bootstrap
    • simulate from asymptotic distribution of params (I think I have old experiments for this in Poisson case)
  • bias correction: (nothing specific yet)
    • correct the mean prediction of a nonlinear function for curvature bias (Jensen's inequality), e.g. use a second order correction based on the variance. Related: lognormal mean prediction and the "smearing" adjustment in nonlinear/log predictions. (Aside: in TSA the results for the correction of mean predictions for log-models are not overwhelmingly in favor.)

For monotonic functions and links, we need reusable get_prediction for linear prediction.
Reusable means both no code duplication and no recomputation. The latter would mean that we can have different monotonic predictions using the same linear get_prediction. However, get_prediction should be relatively cheap.
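The no-recomputation idea can be sketched as a small container that caches the linear prediction once and applies any monotone increasing transform via endpoint transformation (the LinPred class and its names are hypothetical, not statsmodels API):

```python
import numpy as np
from scipy import stats

class LinPred:
    """Hypothetical cache of a linear prediction and its standard errors."""

    def __init__(self, linpred, se):
        self.linpred = np.asarray(linpred, dtype=float)
        self.se = np.asarray(se, dtype=float)

    def transform(self, g, alpha=0.05):
        # endpoint transformation: apply a monotone increasing g to the
        # endpoints of the linear-prediction interval, no recomputation
        q = stats.norm.ppf(1 - alpha / 2)
        lower = g(self.linpred - q * self.se)
        upper = g(self.linpred + q * self.se)
        return g(self.linpred), lower, upper

# several monotonic predictions from the same cached linear prediction
lp = LinPred([0.2, 1.1], [0.1, 0.3])
mean, lo, hi = lp.transform(np.exp)                            # log link, e.g. Poisson mean
prob, plo, phi = lp.transform(lambda z: 1 / (1 + np.exp(-z)))  # logit link
```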

To check: there might be some overlap with margins (after the at part, margins and predict_at add construction of exog_predict).

related issues and comments:
TBC

cases to start with

  • which in Poisson for monotonic, especially which="prob"
  • mean and which in NBP (and GPP) for non-monotonic, multi-parameter models (using NonlinearDeltaCov)
  • multi-link models, might be also mostly the same as using NonlinearDeltaCov
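The NonlinearDeltaCov pattern essentially reduces to a numerical Jacobian and a sandwich product with cov_params; a minimal generic sketch (the delta_cov helper and the toy numbers are hypothetical):

```python
import numpy as np
from statsmodels.tools.numdiff import approx_fprime

def delta_cov(params, cov_params, func):
    """Delta-method prediction and standard errors for func(params):
    cov(func) ~= J @ cov_params @ J.T with J the Jacobian at params."""
    pred = np.atleast_1d(func(params))
    jac = approx_fprime(params, func)      # shape (k_pred, k_params)
    cov_pred = jac @ cov_params @ jac.T
    return pred, np.sqrt(np.diag(cov_pred))

# toy example: exp(x @ params) as a vector-valued "mean" prediction
params = np.array([0.5, -0.2])
cov = np.array([[0.04, 0.01], [0.01, 0.09]])
x = np.array([[1.0, 2.0], [1.0, 0.5]])
pred, se = delta_cov(params, cov, lambda p: np.exp(x @ p))
```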
@josef-pkt

related use case: test for prediction with a non-zero null value

e.g. test/p-value, diff and confint that the average prob(y=0) is equal to the observed relative frequency at zero.
This would be a Wald test version for zero inflation or deflation, with Wald confidence intervals.
We can do score_test (at the no-inflation hypothesis: diff=0), but we don't have score_test confidence intervals for Prob(y=0) or the diff

@josef-pkt

current get_prediction does not have a good repr.
maybe just add some text and point to summary_frame()

pred = res_poi.get_prediction(res_poi.model.exog[:5]).summary_frame()

recently added PoissonResults.get_prediction has no docstring;
help(res_poi.get_prediction) only shows the signature

@josef-pkt

example: Poisson prediction confint with endpoint transformation versus delta-method

pred = res_poi.get_prediction(res_poi.model.exog[:5])

print(pred.summary_frame())
       mean   mean_se  mean_ci_lower  mean_ci_upper
0  2.479438  0.045983       2.390931        2.57122
1  2.479438  0.045983       2.390931        2.57122
2  2.479438  0.045983       2.390931        2.57122
3  2.479438  0.045983       2.390931        2.57122
4  2.479438  0.045983       2.390931        2.57122

def f_pred(p):
    ex = res_poi.model.exog[:5]
    pred1 = res_poi.model.predict(p, ex)#, which="mean")
    return pred1

nlpm = res_poi._get_wald_nonlinear(f_pred)
print(nlpm.summary())
                             Test for Constraints                             
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
c0             2.4794      0.046     53.921      0.000       2.389       2.570
c1             2.4794      0.046     53.921      0.000       2.389       2.570
c2             2.4794      0.046     53.921      0.000       2.389       2.570
c3             2.4794      0.046     53.921      0.000       2.389       2.570
c4             2.4794      0.046     53.921      0.000       2.389       2.570
==============================================================================

@josef-pkt

great,
complete code reuse, generic get_prediction that delegates to model.predict and numdiff
(I didn't verify the numbers yet)

The model in this example is ZeroInflatedPoisson with constant inflation
(we wouldn't need the ex for in-sample prediction)

res_ = res_zi
use_mean = True
for which in ["mean", 'mean-main', 'mean-nonzero']:

    def f_pred(p):
        ex = res_.model.exog
        pred = res_.model.predict(p, ex, which=which)
        pred = pred.mean(0)
        return pred

    nlpm = res_zi._get_wald_nonlinear(f_pred)
    print("\n", which)
    print(nlpm.summary())

 mean
                             Test for Constraints                             
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
c0             2.7839      0.018    154.147      0.000       2.748       2.819
==============================================================================

 mean-main
                             Test for Constraints                             
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
c0             3.9169      0.017    225.684      0.000       3.883       3.951
==============================================================================

 mean-nonzero
                             Test for Constraints                             
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
c0             4.0143      0.016    248.421      0.000       3.983       4.046
==============================================================================

@josef-pkt

expected (relative) frequency "prob" also works, but we currently have no option to choose which y=k to predict, nor to limit them with an upper bound

but the 2-d case, e.g. predicting prob without the mean(0), only works partially

res_ = res_zi
def f_pred(p):
    ex = res_.model.exog.copy()
    pred1 = res_.model.predict(p, ex, which="prob")
    diff = pred1.mean(0)[:15]
    return diff

nlpm = res_zi._get_wald_nonlinear(f_pred)
print(nlpm.summary())
                             Test for Constraints                             
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
c0             0.3091      0.003     93.799      0.000       0.303       0.316
c1             0.0667      0.001     73.540      0.000       0.065       0.068
c2             0.1155      0.001    113.861      0.000       0.113       0.117
c3             0.1371      0.001    178.182      0.000       0.136       0.139
c4             0.1260      0.001    180.489      0.000       0.125       0.127
c5             0.0962      0.001    129.210      0.000       0.095       0.098
c6             0.0639      0.001     96.940      0.000       0.063       0.065
c7             0.0383      0.000     79.055      0.000       0.037       0.039
c8             0.0215      0.000     68.047      0.000       0.021       0.022
c9             0.0116      0.000     59.423      0.000       0.011       0.012
c10            0.0062      0.000     50.330      0.000       0.006       0.006
c11            0.0034   8.31e-05     40.523      0.000       0.003       0.004
c12            0.0019   5.88e-05     31.768      0.000       0.002       0.002
c13            0.0011   4.24e-05     25.111      0.000       0.001       0.001
c14            0.0006   3.04e-05     20.410      0.000       0.001       0.001
==============================================================================

@josef-pkt

more design choices

We can use the endpoint transformation conf_int in models whose mean function depends only on the mean parameters, even if there are extra params. That follows from the gradient, which is zero at the extra params.
(Although I didn't check whether that is also true for higher order effects.)

So it would be convenient to have the option to compute both endpoint transformation and delta method confints in one function or method.

The main problem is that we need the nonlinear predict function in terms of linpred, with the appropriate linpred in multi-link cases (e.g. the BetaBinomial precision function only depends directly on linpred_precision).
For the delta method we can just reuse model.predict as in the earlier comment.

One way to implement this is to go through dfamilies, i.e. we have a predict method that only depends on the distribution/family dargs. In GLM that's mu, even though we need to take scale into account as an auxiliary parameter in some cases.

We could delegate to a version of statsmodels.genmod._prediction.PredictionResults. We need to generalize it so it doesn't use a link instance. The usual problem: links are back-to-front.

The main practical difference between PredictionResults and NonlinearDeltaCov is that the former is for a scalar parameter, the latter for multivariate/vector parameters.
I guess it's better to keep them separate.

Also, we have to decide whether only predict with which="mean" gets the endpoint transformation, or all monotonic functions of a single linpred. Extending it to all monotonic one-parameter functions will be a lot more work.
One disadvantage of the delta method is that the confint might not stay in the valid range of the nonlinear function, e.g. negative probabilities or counts.
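A tiny numeric illustration of that range problem for a logit probability (the numbers are arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.special import expit

q = stats.norm.ppf(0.975)
linpred, se_lin = -3.0, 1.0              # small probability, noisy estimate
p = expit(linpred)

# delta method: se of expit(linpred) is p * (1 - p) * se_lin
se_p = p * (1 - p) * se_lin
ci_delta = (p - q * se_p, p + q * se_p)  # lower bound can be negative

# endpoint transformation stays inside (0, 1) by construction
ci_ep = (expit(linpred - q * se_lin), expit(linpred + q * se_lin))
```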

Decision for now:

  • for which="mean", we return the endpoint transformation confint; we can use the link and the existing genmod PredictionResults (as I added to Poisson)
  • for all other which we use NonlinearDeltaCov
  • models that don't have a simple, single-index mean function, like ZI models, only get NonlinearDeltaCov
    (for now, even though some predict functions are still monotonic one-parameter functions)

Aside: ZI has submodels, and we could delegate predict/confint for, e.g. prob-main and mean-main to the submodels

@josef-pkt

josef-pkt commented Nov 6, 2021

And now: where do we add the methods?

  • the which predict option is currently only available in CountModels (except NB), in BetaModel and in OrderedModel
  • theoretically get_prediction should go into LikelihoodModelResults, but that will not work for many models for now
  • adding it to DiscreteModelResults requires that we also support BinaryModel (MNLogit ?)
    • what other which do we have for BinaryModel.predict? (*)
    • mean and linpred don't need NonlinearDeltaCov
  • GLM currently has only mean and linear prediction
  • OrderedModel's predict default is prob, which is multivariate. I could override an inherited get_prediction and raise NotImplementedError as long as this is not adjusted and verified for it.

current plans
make standalone functions (one for linpred/mean-link and one for NonlinearDeltaCov) and start by adding them to

  • GenericLikelihoodModelResults for BetaReg and maybe OrderedModel
  • add to CountModelResults (and later move to DiscreteModel), ZI needs to override for which="mean"

standalone function and explicit methods with delegation look less fragile than a specific Mixin.
update: I think I will collect those functions in base._prediction or base._predict_inference, consolidating and generalizing the code that is in genmod and regression _prediction.
NonlinearDeltaCov can stay in stats. It might be useful even outside of models.
Maybe we keep the linear model separate. That's the only one with conf_int_obs prediction intervals.

(*) just an idea:
diagnostic plot for dispersion: plot squared pearson residual against fittedvalues with lowess curve, and model implied confidence interval. (for families with fixed scale)

residuals are a nonlinear function of params, so we should be able to compute confints for statistics of them.
I guess the delta method on an "overall" or mean-over-observations prediction takes the correlation coming from params into account, i.e. resid_pearson are correlated across observations, but the delta method on resid_pearson2.mean() or resid_pearson2.sum()/df_resid should be appropriate.

@josef-pkt

josef-pkt commented Nov 6, 2021

Also we should remove duplication in handling predict keywords, offset, exposure, ... That's still messy with several predict, get_distribution, ... methods.

edit: both get_prediction (regression) and get_prediction_glm (in genmod) have duplicate or similar code to handle exog, transform, weights, ...
The part except the formula transform is similar to model.predict (e.g. offset, ... in GLM).
The formula/transform part overlaps with base.model.Results.predict.

Also, I haven't looked into pandas return and getting row index for get_prediction.
Even once we have the correct numbers, there is still some api work to do.

@josef-pkt

For GenericLikelihood we need a flag to decide whether we can use endpoint transformation for predict if which="mean".
Also, "mean" might not be a supported and allowed which value, in which case model.predict will raise.

@josef-pkt

unit testing for delta method based results

Stata:
predict does not add a params_table, se, or confint
margins adds a params_table with conf_int, and includes the predicted mean as an option
predictnl also uses the delta method for a params_table, but requires a user-specified function

Currently, I'm using unit tests mainly against other cases or methods that already exist in statsmodels and have their own unit tests.

@josef-pkt

josef-pkt commented Nov 7, 2021

exog handling in _prediction.get_prediction_glm looks good. That part had several bug fixes for pandas support.

genmod get_prediction has a few problems #7864 , but I think it can be used for a more general version.
We can move genmod._prediction to base and adjust.

The only backwards-incompatible refactoring is that I want to drop the mean postfix in PredictionResults methods and attributes. It should be for any type of prediction, not just the mean.
Although, the endpoint transformation is for now only for the mean. ???

I guess I'll keep it for mean.
And check it for consistency with the results class for NonlinearDeltaCov, which doesn't have pandas handling for row_names. Or write a PredictionResults wrapper that delegates to NonlinearDeltaCov.

@josef-pkt

use case, example: https://stackoverflow.com/questions/70038929/confidence-interval-of-probability-prediction-from-logistic-regression-with-inte

get_prediction as a function of a continuous exog, for each level of a binary exog

@josef-pkt

josef-pkt commented Nov 23, 2021

more design decisions: get_prediction Mixin class or separate functions to delegate to:

next round is get_prediction for BetaReg.
The get_prediction method of DiscreteResults needs only minor changes to make it work for BetaReg, so I need to add get_prediction to GenericLikelihoodModelResults.

If we add more get_prediction_xxx #7899, then a Mixin would be better.
To use it with a model that does not support it yet, a standalone function would be more convenient (e.g. I often check get_xxx(res) with an arbitrary res, or a class like MLEInfluence(res) or PoissonDiagnostic(res) where res is not from a supported model)

???
no PredictMixin for now. I just move DiscreteResults.get_prediction to a private method in base._prediction_inference

Also, to avoid code duplication, there is now a lot of indirection and delegation, especially for prediction with NonlinearDeltaCov.
Once everything works, we might want to check whether we can streamline this and avoid some detours.

@josef-pkt

josef-pkt commented Jan 18, 2022

we also need predict_nonlinear, or better get_prediction_nonlinear, mainly to have a results class that is consistent with get_prediction

example right now: diagnostic statistic based on two averages
in truncated model we want to compare P(y=0 | zero_model) with P(y=0 | count model)
but I now think the best statistic would be the ratio of averages and not the average ratio, i.e.
mean(P(y_i=0 | zero_model)) / mean(P(y_i=0 | count model))

the current average option in get_prediction only allows averaging a single statistic over observations, e.g.
mean(P(y_i=0 | zero_model) / P(y_i=0 | count model))

That's similar to the average risk ratio in the literature, which is defined as a ratio of averages, not as an average of ratios

If we have a generic predict_nonlinear that takes user-specified functions, then we could delegate to it in get_prediction for hardcoded options of which, e.g. which="prob-zero-ratio"

We could delegate to the generic _get_wald_nonlinear, but that returns a different results class, not a PredictionResults class,
unless we do something different from PredictionResults when we only have a single statistic to predict.
PredictionResults works for vector and for scalar predictions in the same way.


Alternative option:
Should model.predict also optionally return aggregated variables similar to get_prediction average=True?
I think, then we don't need to change get_prediction at all (except adding keywords or which options), and model predict would handle how a statistic is aggregated/averaged.
I guess pandas handling in results.predict assumes simple vectorization, i.e. index of predict return corresponds to index of predict exog.
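The ratio-of-averages versus average-of-ratios distinction in a toy numeric check (the probabilities are arbitrary made-up numbers, not model output):

```python
import numpy as np

rng = np.random.default_rng(3)
# stand-ins for per-observation P(y_i = 0) under the two submodels
p_zero_model = rng.uniform(0.1, 0.5, size=100)
p_count_model = rng.uniform(0.05, 0.4, size=100)

ratio_of_means = p_zero_model.mean() / p_count_model.mean()   # proposed statistic
mean_of_ratios = (p_zero_model / p_count_model).mean()        # current average option
# the two aggregates generally differ, so averaging a per-observation
# ratio cannot recover the ratio of averages
```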
