
Stats: Added a standard function mean to compute arithmetic average #16314

Closed
wants to merge 1 commit into from


supreet11agrawal
Contributor

Until now, the mean in SymPy could be found using `E` (expectation).
With end users in mind, this adds a function `mean` that does essentially
the same thing but may be easier for new users to understand.

References to other Issues or PRs

Brief description of what is fixed or changed

Other comments

I am not sure about the change in `__init__.py`: https://github.com/supreet11agrawal/sympy/blob/7561a336c4923e0dfcac8c11475c9efd5acaf52e/sympy/stats/__init__.py#L20
It can be removed as well; I added it only to show the function working.

Release Notes

  • stats
    • Added a standard function mean for computing the arithmetic average

@sympy-bot

Hi, I am the SymPy bot (v142). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.4.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.


supreet11agrawal changed the title from "Stats: Added a standar function mean to compute arithmetic average" to "Stats: Added a standard function mean to compute arithmetic average" on Mar 18, 2019
@codecov

codecov bot commented Mar 18, 2019

Codecov Report

Merging #16314 into master will increase coverage by 0.015%.
The diff coverage is 100%.

@@              Coverage Diff              @@
##            master    #16314       +/-   ##
=============================================
+ Coverage   73.257%   73.272%   +0.015%     
=============================================
  Files          618       618               
  Lines       158200    158201        +1     
  Branches     37175     37175               
=============================================
+ Hits        115893    115918       +25     
+ Misses       36783     36761       -22     
+ Partials      5524      5522        -2

@@ -34,6 +35,8 @@
35/6
>>> simplify(P(Z>1)) # Probability of Z being greater than 1
1/2 - erf(sqrt(2)/2)/2
>>> mean(X) # Average value of outcome of dice
Member


I would merge this with the E(X+Y) example, as:

E(X + Y)  # or mean(X + Y), the expected value of the sum of two dice
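Because the proposed `mean` was described as doing essentially the same thing as `E`, the values it would have shown in the docstring can be checked with `E` directly. A minimal sketch using the same two six-sided dice as the `sympy.stats` docstring (only `Die` and `E` are used):

```python
from sympy.stats import Die, E

# Two fair six-sided dice, as in the sympy.stats docstring example
X, Y = Die('X', 6), Die('Y', 6)

print(E(X))      # 7/2, the value the proposed mean(X) would have returned
print(E(X + Y))  # 7, the expected value of the sum of two dice
```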

Contributor Author


Yes, this can be done, but I think a dedicated example should also be present.

Contributor Author


Let me know if it is better this way.

@oscarbenjamin
Contributor

There are other things we might want to use the name `mean` for in the future, e.g. something analogous to `np.mean`. It doesn't seem very useful to spend the name on something redundant like this.

@supreet11agrawal
Contributor Author

> There are other things we might want to use the name `mean` for in the future, e.g. something analogous to `np.mean`. It doesn't seem very useful to spend the name on something redundant like this.

Hmm, maybe `mean` would be more useful in that case. But I think a function like this should be present in stats. "Mean", the word itself, is quite important in the field of statistics.

@supreet11agrawal
Contributor Author

Maybe we could distinguish the two, e.g. `Mean` and `mean`, though that could be confusing.

@oscarbenjamin
Contributor

There is already `E` for expectation, which is the standard name for this sort of thing. "Mean" isn't normally used in an algebraic context. Functions called `mean` are widely used when working with data, though, i.e. for the sample mean.
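To make the distinction concrete: `E` is a property of a distribution, while a sample mean depends on the particular observations. A minimal sketch; `sample_mean` is a hypothetical helper (not part of `sympy.stats`) and the dice rolls are made-up data for illustration only:

```python
from sympy import Add, symbols
from sympy.stats import Die, E


def sample_mean(observations):
    """Hypothetical helper (not a SymPy API): arithmetic mean of observed data."""
    observations = list(observations)
    if not observations:
        raise ValueError("sample_mean() needs at least one observation")
    # Add(*...) keeps the sum exact (and symbolic when entries are symbols)
    return Add(*observations) / len(observations)


X = Die('X', 6)
rolls = [1, 6, 6, 2]  # made-up observations, for illustration only

print(E(X))                # 7/2: fixed by the distribution
print(sample_mean(rolls))  # 15/4: depends on which rolls were observed

a, b, c = symbols('a b c')
print(sample_mean([a, b, c]))  # symbolic sample mean of three values
```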

@supreet11agrawal
Contributor Author

> Functions called `mean` are widely used when working with data, though, i.e. for the sample mean.

I agree with you. Although, don't you think that 'data' processing should be part of the stats module as well? In that case, `mean` would serve the same purpose: for general data, we can say that it is uniformly distributed over the observed values and thus calculate its mean.
I am not sure, though.
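The "treat the data as uniformly distributed" idea corresponds to the empirical distribution: each of the n observations gets probability 1/n, and its expectation equals the sample mean. A minimal sketch using `FiniteRV` and `E` from `sympy.stats`; `empirical_mean` is a hypothetical helper, and `Counter` is used so that repeated observations get proportionally more weight:

```python
from collections import Counter

from sympy import Rational
from sympy.stats import E, FiniteRV


def empirical_mean(data):
    """Hypothetical helper: expectation of the empirical distribution of data."""
    data = list(data)
    if not data:
        raise ValueError("empirical_mean() needs at least one observation")
    n = len(data)
    # Map each distinct value to (number of occurrences) / n
    density = {value: Rational(count, n) for value, count in Counter(data).items()}
    X = FiniteRV('X', density)
    return E(X)


data = [1, 2, 2, 5]  # made-up observations, for illustration only
print(empirical_mean(data))  # 5/2, the same as (1 + 2 + 2 + 5)/4
```

This only reproduces the ordinary sample mean by a longer route, which is part of why a separate data-oriented `mean` was discussed.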

@smichr
Member

smichr commented Mar 19, 2019

I think I, too, am leaning towards not making this change in favor of "preferably one way" and (as @oscarbenjamin points out) "expectation" is the standard word in this context. A note in the docstring about what E is might be helpful: "E - the expectation (mean) value of the distribution" or words to that effect.

@oscargus
Contributor

I had a similar feeling. Strictly speaking, it seems OK to use "mean" for the expectation value, but, as noted, "mean" is more commonly used for the average value of a series of samples. (There is an alternative word for mean, "average", although I guess most people would expect `mean` to compute the average value of samples, as in e.g. MATLAB.) On the other hand, one can easily check the argument to determine whether it is a set of samples or a distribution.

That said, feel free to write a function that computes the average value of a list (or of a Matrix along a given dimension). It may be useful in certain situations, I guess, even from a symbolic perspective.
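A minimal sketch of such a function for a list or a SymPy `Matrix` along a given dimension. The names `average` and `matrix_average` are hypothetical, and the `axis` convention is an assumption borrowed from NumPy (axis 0 averages each column, axis 1 each row):

```python
from sympy import Add, Matrix


def average(values):
    """Hypothetical helper: arithmetic mean of a non-empty iterable."""
    values = list(values)
    if not values:
        raise ValueError("average() of an empty sequence")
    return Add(*values) / len(values)


def matrix_average(M, axis):
    """Hypothetical helper: axis=0 averages each column, axis=1 each row."""
    if axis == 0:
        return [average(M.col(j)) for j in range(M.cols)]
    if axis == 1:
        return [average(M.row(i)) for i in range(M.rows)]
    raise ValueError("axis must be 0 or 1")


M = Matrix([[1, 2], [3, 4]])
print(matrix_average(M, 0))  # [2, 3]: one mean per column
print(matrix_average(M, 1))  # [3/2, 7/2]: one mean per row
```

Returning a plain list (rather than a Matrix) keeps the result shape simple; both choices are reasonable.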

@supreet11agrawal
Contributor Author

So, I will close this PR. We can add the change in the docs in another PR.

@oscargus
Contributor

Naturally, such a `mean` function could return the expectation value if the input is a distribution rather than a list. But it would make sense to use it primarily for computing the average of a list/matrix (any iterable?).
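The dispatch described here could look roughly like the sketch below. The `mean` function is hypothetical and not a SymPy API; it checks for random symbols with `random_symbols` from `sympy.stats.rv`, sends random expressions to `E`, and averages anything else as an iterable of samples:

```python
from sympy import Add, Basic
from sympy.stats import Die, E
from sympy.stats.rv import random_symbols


def mean(arg):
    """Hypothetical dispatcher, not a SymPy API.

    Random expressions go to E; anything else is treated as an iterable
    of samples and averaged.
    """
    if isinstance(arg, Basic) and random_symbols(arg):
        return E(arg)  # expectation of a random expression
    values = list(arg)
    if not values:
        raise ValueError("mean() of an empty sequence")
    return Add(*values) / len(values)  # plain arithmetic average


X, Y = Die('X', 6), Die('Y', 6)
print(mean(X))             # 7/2, via E
print(mean(X + Y))         # 7, via E
print(mean([1, 2, 3, 4]))  # 5/2, arithmetic average of the list
```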

@supreet11agrawal
Contributor Author

I'll wait 24 hours in case anyone has anything else to say.

@oscarbenjamin
Contributor

I think a mean of an array makes sense, but not a mean of a matrix.

@supreet11agrawal
Contributor Author

Yes, we can add that. But where would this function go?

@supreet11agrawal
Contributor Author

@oscarbenjamin It might make sense if we implement it dimension-wise, e.g. one may want the mean of each row (the result would be a list in that case).

@supreet11agrawal
Contributor Author

See MATLAB's `mean`, which might be of some help here:
https://www.mathworks.com/help/matlab/ref/mean.html

@asmeurer
Member

> Although, don't you think that 'data' processing should be part of the stats module as well?

This tends to be out of scope for SymPy. Generally data is purely numeric, in which case a library like scipy.stats is much better. SymPy should focus on symbolic manipulations. See also #14261.

@supreet11agrawal
Contributor Author

OK, then I think we can close this.


6 participants