Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: stats.trap - adding trapezoidal distribution closes #6028 #6030

Merged
merged 12 commits into from
Apr 9, 2016

Conversation

andyfaff
Copy link
Contributor

@andyfaff andyfaff commented Apr 4, 2016

Adds a trapezoidal distribution to scipy.stats. For those au-fait with the stats module please inform me of the relevant tests I need to add. The testing framework seems to be automatic, but the test_stats.py is mammoth, so I don't know where to start.
Possible work to be done (to satisfaction of stats guru):

  • figure out the relevant statistics for the _stats method.
  • check that the trapezoidal distribution reduces to the uniform and triangular distributions.
  • add other tests
  • add further documentation.

@andyfaff andyfaff changed the title ENH: stats.trap - adding trapezoidal distribution ENH: stats.trap - adding trapezoidal distribution closes #6028 Apr 4, 2016
@andyfaff
Copy link
Contributor Author

andyfaff commented Apr 4, 2016

I noticed that _distr_params.py has all the continuous distributions listed. I can add trap there, but it's not clear what one is supposed to add. The file states "Sane parameters for stats.distributions.", but that comment is fairly cryptic.

@ev-br
Copy link
Member

ev-br commented Apr 4, 2016

Parameters: adding an entry to _distr_params.py runs this battery of tests: https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L72
Re what to add: ['trap', (1, 2)] would mean c=1, d=2.
These values will also be used for constructing the docstring example (this is what fails on Travis now).
(If you have an idea of how to document it better, please go ahead)

Additional tests (+1 for limiting cases) go to test_distributions.py, which has roughly a test case class per distribution.

"""
def _rvs(self, c, d):
rands = self._random_state.rand(self._size)
return self._ppf(rands, c, d)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ev-br
Copy link
Member

ev-br commented Apr 4, 2016

figure out the relevant statistics for the _stats method.

You can just define _munp(self, n, c, d) to compute n-th central moment. Or leave it be --- this would be nice to have, but not absolutely necessary IMO.

@ev-br ev-br added the enhancement A new feature or improvement label Apr 4, 2016
x**2 / c / (d - c + 1),
(c + 2 * (x - c)) / (d - c + 1),
1 - ((1 - x)**2 / (d - c + 1) / (1 - d))]
return np.select(condlist, choicelist)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.select is cute --- does it handle x, c or d being arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best solution found so far. It handles x being array or scalar, but c and d have to be scalar.
I had a few mental gymnastics figuring out how to do this without examining if x was array or scalar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One simple option is of course just two calls to np.where.

@andyfaff
Copy link
Contributor Author

andyfaff commented Apr 5, 2016

@ev-br I added sane values to the trap entry in _distr_params.py (so the doctests would pass) but now I'm getting a weird error, see below. Why on earth is it generating a shape tuple of (1.8589710689636982, 1.461789732784883) instead of (0.2, 0.8) like I specified in _distr_params.py?

ERROR: test_distributions.test_all_distributions('trap', (1.8589710689636982, 1.461789732784883), 0.01)

Traceback (most recent call last):
File "/Users/anz/miniconda3/envs/dev3/lib/python3.4/site-packages/nose/case.py", line 198, in runTest
self.test(_self.arg)
File "/Users/anz/Documents/Andy/programming/scipy/build/testenv/lib/python3.4/site-packages/scipy/stats/tests/test_distributions.py", line 61, in check_distribution
D, pval = stats.kstest(dist, '', args=args, N=1000)
File "/Users/anz/Documents/Andy/programming/scipy/build/testenv/lib/python3.4/site-packages/scipy/stats/stats.py", line 3904, in kstest
vals = np.sort(rvs(_args, **kwds))
File "/Users/anz/Documents/Andy/programming/scipy/build/testenv/lib/python3.4/site-packages/scipy/stats/_distn_infrastructure.py", line 855, in rvs
raise ValueError("Domain error in arguments.")
ValueError: Domain error in arguments.

@ev-br
Copy link
Member

ev-br commented Apr 5, 2016

(As well hidden in the traceback), this is a different set of tests, which generate random sets of parameters:
https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_distributions.py#L70

@ev-br
Copy link
Member

ev-br commented Apr 6, 2016

You seem to be hitting quite a few ugly corners of the distributions framework...

The last remaining failure can be taken care of by just listing trap as a failing fit, see the list in stats/tests/test_fit.py.

I've played with it a bit to check how it fares for array-valued shapes, and it seems OK -- there's an additional simple test in https://github.com/ev-br/scipy/tree/pr/6030, can you take it over?
In that branch, I also removed extraneous cases from the select: the out-of-bounds values are handled automatically by the framework, and _pdf does not even see x < 0.

From my POV, this is ready modulo blacklisting the test and renaming it to trapz.

@codecov-io
Copy link

@@            master   #6030   diff @@
======================================
  Files          238     238       
  Stmts        43831   43849    +18
  Branches      8215    8215       
  Methods          0       0       
======================================
+ Hit          34262   34280    +18
  Partial       2602    2602       
  Missed        6967    6967       

Review entire Coverage Diff as of 0605410

Powered by Codecov. Updated on successful CI builds.

@ev-br
Copy link
Member

ev-br commented Apr 9, 2016

Ok, I' m going to merge it. Edge cases, c=0 and d=1, might warrant additional care, but that can be taken care of in a sequel when this sees some usage. Thanks @andyfaff.

@ev-br ev-br merged commit 543aca9 into scipy:master Apr 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants