documentation: np.mean(y_pred) does not work #3

murphyk · 2019-02-16T18:45:32Z

I'm following along with this intro example.. However this line fails

(numpy.mean(y_pred) * 2).shape

Error below (seems to be because Distribution objects don't support the mean() function but instead insist on obscurely calling it point!)

np.mean(y_pred)
Traceback (most recent call last):

  File "<ipython-input-38-19819be87ab5>", line 1, in <module>
    np.mean(y_pred)

  File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2920, in mean
    out=out, **kwargs)

  File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 75, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)

TypeError: unsupported operand type(s) for +: 'Distribution' and 'Distribution'

The text was updated successfully, but these errors were encountered:

frthjf · 2019-02-16T22:12:36Z

Thanks for flagging this up. You are right, the problem is that np.mean(y_pred) forwards to y_pred.mean which is not implemented in the baseline distribution object. np.std(y_pred) works as expected, I'll update that.

I guess the other question is why the abstract Distribution object implements an slightly obscure point prediction method rather than calling it mean in the first place to make it automatically compatible with numpy. I think the reasoning was to suggest that it corresponds to the non-probabilistic point prediction of a standard sklearn estimator and to avoid presuming the existence of a mean (if it happens to be a Cauchy etc). On the other hand, it also implements a std method which might also not exist, so renaming point to mean might be more straightforward after all. @fkiraly what do you think?

fkiraly · 2019-02-16T23:20:42Z

Regarding point vs mean - I think this is a valid point. I mean, no pun intended, it's just the means to point this out.
Since, e.g., for Cauchy you wouldn't have a mean but only a median which might be the reasonable point prediction. Or, discrete predictions would only have a mode.

However, I personally think that all continuous distributions which support it should have a mean for the reason that this is an obvious method one would be looking for, especially if one is familiar with the scipy.stats distributions, see e.g.,
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html
Point may then call the mean, or another method, by default, but it's neat to have it as it automatically gives you a point prediction in all cases.

More generally to this point: as @murphyk points out in separate communication, for future development one might consider using scipy.stats distributions as return objects - similar to https://github.com/alan-turing-institute/MLJ.jl which uses https://github.com/JuliaStats/Distributions.jl as return objects (upon suggestion based on the skpro design, ahem,
JuliaAI/MLJ.jl#34
) without major problems so far (unless @ablaom from MLJ.jl wants to contradict this).

frthjf · 2019-02-18T22:11:13Z

Fixed in version v1.0.1. On using scipy.stats, how would this look like for non-standard distributions, e.g. the Bayesian case where we only obtain a posterior sample? Can scipy stats represent that? Feel free to open another issue to continue the discussion on that topic.

frthjf added a commit that referenced this issue Feb 18, 2019

Add regression test for #3

fadf991

frthjf closed this as completed Feb 18, 2019

fkiraly mentioned this issue Feb 18, 2019

Distributions as return objects #4

Closed

frthjf mentioned this issue Apr 11, 2020

skpro-refactoring (version-2) #6

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

documentation: np.mean(y_pred) does not work #3

documentation: np.mean(y_pred) does not work #3

murphyk commented Feb 16, 2019

frthjf commented Feb 16, 2019

fkiraly commented Feb 16, 2019

frthjf commented Feb 18, 2019

documentation: np.mean(y_pred) does not work #3

documentation: np.mean(y_pred) does not work #3

Comments

murphyk commented Feb 16, 2019

frthjf commented Feb 16, 2019

fkiraly commented Feb 16, 2019

frthjf commented Feb 18, 2019