From dae872d712dda53bad49f9bce6e4ccd2b403ff13 Mon Sep 17 00:00:00 2001
From: Raymond Hettinger
Date: Thu, 5 Sep 2019 18:21:41 -0700
Subject: [PATCH 1/3] Fix typo

---
 Doc/library/statistics.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 27d0518b4a7923..7422c9fefa0450 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -522,7 +522,7 @@ However, for reading convenience, most of the examples show sorted sequences.
    cut-point will evaluate to ``104``.
 
    The *method* for computing quantiles can be varied depending on
-   whether the data in *data* includes or excludes the lowest and
+   whether the *data* includes or excludes the lowest and
    highest possible values from the population.
 
    The default *method* is "exclusive" and is used for data sampled from

From 4425f3482ebcbe1a689da2362f87442d0c113a14 Mon Sep 17 00:00:00 2001
From: Raymond Hettinger
Date: Thu, 5 Sep 2019 22:28:01 -0700
Subject: [PATCH 2/3] Various edits to improve appearance and grammar.

Add a statement of purpose taken from the statistics PEP.

Add a more familiar and intelligible example for harmonic_mean().
---
 Doc/library/statistics.rst | 58 +++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 7422c9fefa0450..e7178df453249d 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -19,17 +19,21 @@
 --------------
 
 This module provides functions for calculating mathematical statistics of
-numeric (:class:`Real`-valued) data.
-
-.. note::
-
-   Unless explicitly noted otherwise, these functions support :class:`int`,
-   :class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
-   Behaviour with other types (whether in the numeric tower or not) is
-   currently unsupported. Collections with a mix of types are also undefined
-   and implementation-dependent. If your input data consists of mixed types,
-   you may be able to use :func:`map` to ensure a consistent result, for
-   example: ``map(float, input_data)``.
+numeric (:class:`~numbers.Real`-valued) data.
+
+The module is not intended to be a competitor to such third-party libraries
+as `NumPy `_, `SciPy `_, or
+proprietary full-featured statistics packages aimed at professional
+statisticians such as Minitab, SAS and Matlab. It is aimed at the level of
+graphing and scientific calculators.
+
+Unless explicitly noted otherwise, these functions support :class:`int`,
+:class:`float`, :class:`~decimal.Decimal` and :class:`~fractions.Fraction`.
+Behaviour with other types (whether in the numeric tower or not) is
+currently unsupported. Collections with a mix of types are also undefined
+and implementation-dependent. If your input data consists of mixed types,
+you may be able to use :func:`map` to ensure a consistent result, for
+example: ``map(float, input_data)``.
 
 Averages and measures of central location
 -----------------------------------------
@@ -107,7 +111,7 @@ However, for reading convenience, most of the examples show sorted sequences.
    :func:`median` and :func:`mode`.
 
    The sample mean gives an unbiased estimate of the true population mean,
-   which means that, taken on average over all the possible samples,
+   so that when taken on average over all the possible samples,
    ``mean(sample)`` converges on the true mean of the entire population. If
    *data* represents the entire population rather than a sample, then
    ``mean(data)`` is equivalent to calculating the true population mean μ.
@@ -163,8 +167,16 @@ However, for reading convenience, most of the examples show sorted sequences.
    will be equivalent to ``3/(1/a + 1/b + 1/c)``.
 
    The harmonic mean is a type of average, a measure of the central
-   location of the data. It is often appropriate when averaging quantities
-   which are rates or ratios, for example speeds. For example:
+   location of the data. It is often appropriate when averaging
+   rates or ratios, for example speeds.
+
+   Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr.
+   What is the average speed?
+
+   .. doctest::
+
+      >>> harmonic_mean([40, 60])
+      48.0
 
    Suppose an investor purchases an equal value of shares in each of
    three companies, with P/E (price/earning) ratios of 2.5, 3 and 10.
@@ -175,9 +187,6 @@ However, for reading convenience, most of the examples show sorted sequences.
       >>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio.
       3.6
 
-   Using the arithmetic mean would give an average of about 5.167, which
-   is well over the aggregate P/E ratio.
-
    :exc:`StatisticsError` is raised if *data* is empty, or any element
    is less than zero.
 
@@ -190,9 +199,9 @@ However, for reading convenience, most of the examples show sorted sequences.
    middle two" method. If *data* is empty, :exc:`StatisticsError` is raised.
    *data* can be a sequence or iterator.
 
-   The median is a robust measure of central location, and is less affected by
-   the presence of outliers in your data. When the number of data points is
-   odd, the middle data point is returned:
+   The median is a robust measure of central location and is less affected by
+   the presence of outliers. When the number of data points is odd, the
+   middle data point is returned:
 
    .. doctest::
 
@@ -210,13 +219,10 @@ However, for reading convenience, most of the examples show sorted sequences.
    This is suited for when your data is discrete, and you don't mind that the
    median may not be an actual data point.
 
-   If your data is ordinal (supports order operations) but not numeric (doesn't
-   support addition), you should use :func:`median_low` or :func:`median_high`
+   If the data is ordinal (supports order operations) but not numeric (doesn't
+   support addition), consider using :func:`median_low` or :func:`median_high`
    instead.
 
-   .. seealso:: :func:`median_low`, :func:`median_high`, :func:`median_grouped`
-
-
 .. function:: median_low(data)
 
    Return the low median of numeric data. If *data* is empty,
@@ -319,7 +325,7 @@ However, for reading convenience, most of the examples show sorted sequences.
    desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
    If the input *data* is empty, :exc:`StatisticsError` is raised.
 
-   ``mode`` assumes discrete data, and returns a single value. This is the
+   ``mode`` assumes discrete data and returns a single value. This is the
    standard treatment of the mode as commonly taught in schools:
 
    .. doctest::

From b5a9d4469e8a9c8c545ad6f05cd56a6c89decf12 Mon Sep 17 00:00:00 2001
From: Raymond Hettinger
Date: Thu, 5 Sep 2019 22:35:03 -0700
Subject: [PATCH 3/3] Fix awkward wording

---
 Doc/library/statistics.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index e7178df453249d..0798ae29118e33 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -21,13 +21,13 @@
 This module provides functions for calculating mathematical statistics of
 numeric (:class:`~numbers.Real`-valued) data.
 
-The module is not intended to be a competitor to such third-party libraries
+The module is not intended to be a competitor to third-party libraries such
 as `NumPy `_, `SciPy `_, or
 proprietary full-featured statistics packages aimed at professional
 statisticians such as Minitab, SAS and Matlab. It is aimed at the level of
 graphing and scientific calculators.
 
-Unless explicitly noted otherwise, these functions support :class:`int`,
+Unless explicitly noted, these functions support :class:`int`,
 :class:`float`, :class:`~decimal.Decimal` and :class:`~fractions.Fraction`.
 Behaviour with other types (whether in the numeric tower or not) is
 currently unsupported. Collections with a mix of types are also undefined
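The arithmetic behind the new harmonic_mean() car example can be checked directly. The following is a minimal sketch, assuming Python 3.8 or later; the names speeds, leg_km, distance_over_time and mixed are illustrative only and are not part of the statistics module. It confirms that the harmonic mean of the two speeds equals total distance divided by total time. It also shows the map(float, input_data) normalisation that the module introduction suggests for mixed-type input.

```python
import math
from decimal import Decimal
from fractions import Fraction
from statistics import harmonic_mean, mean

# Car example: two legs of equal distance (10 km each) at 40 and 60 km/hr.
speeds = [40, 60]
leg_km = 10  # illustrative leg length in km
total_hours = sum(leg_km / s for s in speeds)
distance_over_time = len(speeds) * leg_km / total_hours

# The harmonic mean matches total distance divided by total time ...
assert math.isclose(harmonic_mean(speeds), distance_over_time)
assert math.isclose(distance_over_time, 48.0)
# ... while the arithmetic mean (50) overstates the average speed.
assert mean(speeds) == 50

# Mixed numeric types: convert everything to float before averaging,
# following the map(float, input_data) suggestion in the module docs.
mixed = [Decimal("0.5"), Fraction(3, 4), 1]
print(mean(map(float, mixed)))  # 0.75
```

Using math.isclose rather than == for the distance-over-time value avoids depending on how the intermediate floating-point divisions round.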