Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

documentation: np.mean(y_pred) does not work #3

Closed
murphyk opened this issue Feb 16, 2019 · 3 comments
Closed

documentation: np.mean(y_pred) does not work #3

murphyk opened this issue Feb 16, 2019 · 3 comments

Comments

@murphyk
Copy link

murphyk commented Feb 16, 2019

I'm following along with this intro example.. However this line fails

(numpy.mean(y_pred) * 2).shape

Error below (seems to be because Distribution objects don't support the mean() function but instead insist on obscurely calling it point!)

np.mean(y_pred)
Traceback (most recent call last):

  File "<ipython-input-38-19819be87ab5>", line 1, in <module>
    np.mean(y_pred)

  File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2920, in mean
    out=out, **kwargs)

  File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 75, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)

TypeError: unsupported operand type(s) for +: 'Distribution' and 'Distribution'
@frthjf
Copy link
Collaborator

frthjf commented Feb 16, 2019

Thanks for flagging this up. You are right, the problem is that np.mean(y_pred) forwards to y_pred.mean which is not implemented in the baseline distribution object. np.std(y_pred) works as expected, I'll update that.

I guess the other question is why the abstract Distribution object implements an slightly obscure point prediction method rather than calling it mean in the first place to make it automatically compatible with numpy. I think the reasoning was to suggest that it corresponds to the non-probabilistic point prediction of a standard sklearn estimator and to avoid presuming the existence of a mean (if it happens to be a Cauchy etc). On the other hand, it also implements a std method which might also not exist, so renaming point to mean might be more straightforward after all. @fkiraly what do you think?

@fkiraly
Copy link
Collaborator

fkiraly commented Feb 16, 2019

Regarding point vs mean - I think this is a valid point. I mean, no pun intended, it's just the means to point this out.
Since, e.g., for Cauchy you wouldn't have a mean but only a median which might be the reasonable point prediction. Or, discrete predictions would only have a mode.

However, I personally think that all continuous distributions which support it should have a mean for the reason that this is an obvious method one would be looking for, especially if one is familiar with the scipy.stats distributions, see e.g.,
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html
Point may then call the mean, or another method, by default, but it's neat to have it as it automatically gives you a point prediction in all cases.

More generally to this point: as @murphyk points out in separate communication, for future development one might consider using scipy.stats distributions as return objects - similar to https://github.com/alan-turing-institute/MLJ.jl which uses https://github.com/JuliaStats/Distributions.jl as return objects (upon suggestion based on the skpro design, ahem,
JuliaAI/MLJ.jl#34
) without major problems so far (unless @ablaom from MLJ.jl wants to contradict this).

frthjf added a commit that referenced this issue Feb 18, 2019
@frthjf
Copy link
Collaborator

frthjf commented Feb 18, 2019

Fixed in version v1.0.1. On using scipy.stats, how would this look like for non-standard distributions, e.g. the Bayesian case where we only obtain a posterior sample? Can scipy stats represent that? Feel free to open another issue to continue the discussion on that topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants