Prevent zero-variance instability in BaseProbaRegressor.predict_proba #956
kindler-king wants to merge 1 commit into sktime:main
Conversation
I would say this is a hack. Instead of clipping it, I would instead return a Delta distribution if the variance is below machine epsilon (possibly times a factor).
Also, the code formatting tests are failing. Please look at the dev guide and at pre-commit.
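A minimal sketch of the suggested check, under assumptions: the `degenerate_rows` helper and its `factor` parameter are illustrative, not skpro's API, and the actual Delta/Normal construction is left to skpro:

```python
import numpy as np

def degenerate_rows(pred_var, factor=1.0):
    """Boolean mask of rows whose predicted variance is numerically zero.

    A row counts as degenerate when its variance falls below machine
    epsilon times a tolerance factor; such rows would receive a Delta
    distribution instead of a Normal with sigma close to 0.
    """
    pred_var = np.asarray(pred_var, dtype=float)
    return pred_var < np.finfo(float).eps * factor
```

The mask could then be used to dispatch rows to Delta versus Normal distributions, which is where the mixed-row case discussed below arises.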
Force-pushed from 8fc8178 to 8c79ee4
Hello @fkiraly, thanks for the suggestion, I've updated the implementation.

For the mixed case, I explored returning per-row heterogeneous distributions, but to my understanding skpro doesn't support that (no concat in …). So for now, I fall back to …. Also fixed the formatting issues using pre-commit. Would love your feedback on this approach.
Reference Issues/PRs
Fixes #955
What does this implement/fix?
This PR fixes a numerical instability in `BaseProbaRegressor.predict_proba`.

When `predict_var` returns 0, the fallback Normal distribution is constructed with `sigma=0`, which leads to divide-by-zero warnings and NaN values when evaluating `pdf` or `log_pdf`. To prevent this, the predicted variance is clipped to machine epsilon before computing the standard deviation:

```python
pred_var = np.clip(pred_var, np.finfo(float).eps, None)
```

This ensures the resulting Normal distribution always has a strictly positive scale while leaving normal model outputs effectively unchanged.
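To illustrate the failure mode outside skpro, here is a numpy-only sketch; the `normal_logpdf` helper is illustrative, not skpro code:

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    # textbook Normal log-density; sigma == 0 divides by zero
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

pred_var = np.array([0.0, 0.25])

with np.errstate(divide="ignore", invalid="ignore"):
    bad = normal_logpdf(0.0, 0.0, np.sqrt(pred_var))  # first entry is nan

# the PR's fix: clip variance to machine epsilon before taking sqrt
safe_sigma = np.sqrt(np.clip(pred_var, np.finfo(float).eps, None))
good = normal_logpdf(0.0, 0.0, safe_sigma)            # all entries finite
```

The clipped sigma is on the order of 1e-8, so the resulting density is extremely peaked but every evaluation stays finite.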
Does your contribution introduce a new dependency?
No.
What should a reviewer concentrate their feedback on?
Did you add any tests for the change?
Yes.
A regression test was added that uses a mock regressor returning zero variance and verifies that:

- `predict_proba().pdf()` and `log_pdf()` remain finite
- no numerical warnings are raised
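The test described above could look roughly like this; a sketch only, where `predict_proba_sigma` stands in for the mock-regressor path and is not the actual skpro test:

```python
import warnings
import numpy as np

def predict_proba_sigma(pred_var):
    # stand-in for the fixed fallback: clip variance, then take sqrt
    pred_var = np.clip(pred_var, np.finfo(float).eps, None)
    return np.sqrt(pred_var)

def test_zero_variance_stays_finite_and_warning_free():
    with warnings.catch_warnings():
        warnings.simplefilter("error")            # promote numerical warnings
        sigma = predict_proba_sigma(np.zeros(5))  # mock regressor: var == 0
        log_pdf = -0.5 * np.log(2 * np.pi * sigma**2)
        assert np.isfinite(log_pdf).all()
```

Promoting warnings to errors inside the test is what catches the original divide-by-zero symptom, not just the NaN output.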