DOC: merge some more wiki edits.

commit 6beea2cdf3cdb7ed8999ff6ed79ff1b0b519ce1e 1 parent b904ccc
@rgommers rgommers authored
42 doc/source/tutorial/stats.rst
@@ -2,6 +2,7 @@ Statistics (`scipy.stats`)
==========================
.. sectionauthor:: Travis E. Oliphant
+.. sectionauthor:: Josef Perktold
Introduction
------------
@@ -108,7 +109,7 @@ The main public methods are:
* moment: non-central moments of the distribution
The main additional methods of the not frozen distribution are related to the estimation
-of distrition parameters:
+of distribution parameters:
* fit: maximum likelihood estimation of distribution parameters, including location
and scale
@@ -152,6 +153,43 @@ for 11 d.o.f. and the 1% tail for 12 d.o.f. by
>>> stats.t.isf([0.1, 0.05, 0.01], [10, 11, 12])
array([ 1.37218364, 1.79588482, 2.68099799])
+In the case of continuous distribution the cumulative distribution function
+is in most standard cases strictly monotonic in the bounds (a,b) and has
+therefore a unique inverse. The cdf of a discrete distribution is however
+a step function and the inverse of it, the percent point function, requires
+a different definition :ref:`(ppf of discrete random variables) <discrete-ppf>`:
+
+See the docs `here <http://docs.scipy.org/doc/scipy/reference/tutorial/stats/discrete.html#percent-point-function-inverse-cdf>`__.
+
+::
+
+ ppf(q) = min{x : cdf(x) >= q, x integer}
+
+We can look at the hypergeometric distribution as an example
+
+ >>> from scipy.stats import hypergeom
+ >>> [M, n, N] = [20, 7, 12]
+
+If we use the cdf at some integer points and then evaluate the ppf at those
+cdf values, we get the initial integers back, for example
+
+ >>> x = np.arange(4)*2
+ >>> x
+ array([0, 2, 4, 6])
+ >>> prb = hypergeom.cdf(x, M, n, N)
+ >>> prb
+ array([ 0.0001031991744066, 0.0521155830753351, 0.6083591331269301,
+ 0.9897832817337386])
+ >>> hypergeom.ppf(prb, M, n, N)
+ array([ 0., 2., 4., 6.])
+
+If we use values that are not at the kinks of the cdf step function, we get
+the next higher integer back:
+
+ >>> hypergeom.ppf(prb+1e-8, M, n, N)
+ array([ 1., 3., 5., 7.])
+ >>> hypergeom.ppf(prb-1e-8, M, n, N)
+ array([ 0., 2., 4., 6.])
Performance and Remaining Issues
@@ -313,7 +351,7 @@ the Student t distribution:
>>> x = stats.t.rvs(10, size=1000)
Here, we set the required shape parameter of the t distribution, which
-in statistics corresponds to the degrees of freedom, to 10. Using size=100 means
+in statistics corresponds to the degrees of freedom, to 10. Using size=1000 means
that our sample consists of 1000 independently drawn (pseudo) random numbers.
Since we did not specify the keyword arguments `loc` and `scale`, those are
set to their default values zero and one.
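
The definition added in the hunk above, ``ppf(q) = min{x : cdf(x) >= q, x integer}``, can be checked by hand. The following is a minimal pure-Python sketch, not scipy's implementation: `hypergeom_cdf` and `discrete_ppf` are hypothetical helper names, and the `(M, n, N)` argument order is assumed to follow the tutorial's `hypergeom` example (population size, number of success states, number of draws).

```python
from math import comb


def hypergeom_cdf(k, M, n, N):
    """P(X <= k) for a hypergeometric variable (hypothetical helper).

    Assumed convention: M = population size, n = success states,
    N = number of draws, as in the tutorial example.
    """
    lo, hi = max(0, N - (M - n)), min(n, N)  # support of the distribution
    upper = min(k, hi)
    # Sum the pmf over the support up to k; an empty range gives 0.
    favourable = sum(comb(n, j) * comb(M - n, N - j) for j in range(lo, upper + 1))
    return favourable / comb(M, N)


def discrete_ppf(q, M, n, N):
    """Smallest integer x in the support with cdf(x) >= q."""
    lo, hi = max(0, N - (M - n)), min(n, N)
    for x in range(lo, hi + 1):
        if hypergeom_cdf(x, M, n, N) >= q:
            return x
    return hi  # q exceeds every cdf value only through rounding


M, n, N = 20, 7, 12
prb = [hypergeom_cdf(x, M, n, N) for x in (0, 2, 4, 6)]

at_kinks = [discrete_ppf(p, M, n, N) for p in prb]
just_above = [discrete_ppf(p + 1e-8, M, n, N) for p in prb]
just_below = [discrete_ppf(p - 1e-8, M, n, N) for p in prb]
```

Under these assumptions, `at_kinks` and `just_below` recover the starting integers, while `just_above` moves to the next integer, which is the behaviour the added tutorial text describes for `hypergeom.ppf`.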
1  doc/source/tutorial/stats/discrete.rst
@@ -91,6 +91,7 @@ The survival function is just
the probability that the random variable is strictly larger than :math:`k` .
+.. _discrete-ppf:
Percent Point Function (Inverse CDF)
------------------------------------
6 scipy/interpolate/fitpack2.py
@@ -480,7 +480,8 @@ def __init__(self, x, y, t, w=None, bbox = [None]*2, k=3):
class BivariateSpline(object):
- """ Bivariate spline s(x,y) of degrees kx and ky on the rectangle
+ """
+ Bivariate spline s(x,y) of degrees kx and ky on the rectangle
[xb,xe] x [yb, ye] calculated from a given set of data points
(x,y,z).
@@ -488,10 +489,11 @@ class BivariateSpline(object):
--------
bisplrep, bisplev : an older wrapping of FITPACK
UnivariateSpline : a similar class for univariate spline interpolation
- SmoothUnivariateSpline :
+ SmoothBivariateSpline :
to create a BivariateSpline through the given points
LSQUnivariateSpline :
to create a BivariateSpline using weighted least-squares fitting
+
"""
def get_residual(self):
2  scipy/interpolate/ndgriddata.py
@@ -92,7 +92,7 @@ def griddata(points, values, xi, method='linear', fill_value=np.nan):
simplices, and interpolate linearly on each simplex. See
`LinearNDInterpolator` for more details.
- - ``cubic`` (1-D): return the value detemined from a cubic
+ - ``cubic`` (1-D): return the value determined from a cubic
spline.
- ``cubic`` (2-D): return the value determined from a
2  scipy/optimize/minpack.py
@@ -224,7 +224,7 @@ def leastsq(func, x0, args=(), Dfun=None, full_output=0,
estimate of the jacobian around the solution. ``None`` if a
singular matrix encountered (indicates very flat curvature in
some direction). This matrix must be multiplied by the
- residual standard deviation to get the covariance of the
+ residual variance to get the covariance of the
parameter estimates -- see curve_fit.
infodict : dict
a dictionary of optional outputs with the key s::
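
The corrected `leastsq` docstring says that `cov_x` must be multiplied by the residual variance to obtain the parameter covariance. The sketch below illustrates that scaling on a made-up straight-line fit; the model, the data and the names `model`, `residuals`, `s_sq` and `pcov` are assumptions chosen for illustration, and the residual variance is taken as the sum of squared residuals divided by the degrees of freedom (number of points minus number of parameters).

```python
import numpy as np
from scipy.optimize import curve_fit, leastsq


def model(x, a, b):
    # Illustrative straight-line model (an assumption, not from the commit).
    return a * x + b


def residuals(p, x, y):
    return y - model(x, *p)


# Synthetic data for the illustration.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = model(x, 2.0, 1.0) + rng.normal(scale=0.5, size=x.size)

p_opt, cov_x, infodict, mesg, ier = leastsq(
    residuals, [1.0, 0.0], args=(x, y), full_output=True
)

# Residual variance: sum of squared residuals over the degrees of freedom.
dof = x.size - len(p_opt)
s_sq = np.sum(infodict["fvec"] ** 2) / dof

# Covariance of the parameter estimates, as the docstring describes.
pcov = cov_x * s_sq

# curve_fit reports a covariance directly; it is used here only as a cross-check.
_, pcov_ref = curve_fit(model, x, y, p0=[1.0, 0.0])
```

For this example `pcov` and `pcov_ref` should agree up to numerical precision. Scaling by the residual standard deviation instead, as the old wording suggested, would give a matrix with the wrong units.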