Type handling policy for the statistics module #64774

Open
oscarbenjamin mannequin opened this issue Feb 9, 2014 · 3 comments

Comments

@oscarbenjamin
Mannequin

oscarbenjamin mannequin commented Feb 9, 2014

BPO 20575
Nosy @ncoghlan, @stevendaprano, @wm75

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/stevendaprano'
closed_at = None
created_at = <Date 2014-02-09.13:29:43.135>
labels = ['type-feature', 'library']
title = 'Type handling policy for the statistics module'
updated_at = <Date 2019-03-15.22:55:47.677>
user = 'https://bugs.python.org/oscarbenjamin'

bugs.python.org fields:

activity = <Date 2019-03-15.22:55:47.677>
actor = 'BreamoreBoy'
assignee = 'steven.daprano'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2014-02-09.13:29:43.135>
creator = 'oscarbenjamin'
dependencies = []
files = []
hgrepos = []
issue_num = 20575
keywords = []
message_count = 3.0
messages = ['210762', '235960', '236045']
nosy_count = 4.0
nosy_names = ['ncoghlan', 'steven.daprano', 'oscarbenjamin', 'wolma']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue20575'
versions = ['Python 3.5']

@oscarbenjamin
Mannequin Author

oscarbenjamin mannequin commented Feb 9, 2014

As of bpo-20481, the statistics module for Python 3.4 will disallow any mixing of numeric types, with the exception of int, which can mix with any other type (but only one at a time). My understanding is that this change was not necessarily considered to be a permanent policy but rather a quick fix for Python 3.4, made to explicitly prevent certain confusing situations that arise from mixing Decimal with other stdlib numeric types.

bpo-20499 has a lot of discussion about different ways to improve accuracy and speed for the mean, variance etc. functions in the statistics module. It's tricky though to come up with a concrete implementation without having a clear specification for how the module should handle different numeric types.

There are several related issues to do with type handling. Should the statistics module

  1. Use the same coercion rules as the numeric tower (pep-3141)?
  2. Allow Decimal to mix with any types from the numeric tower?
  3. Allow non-stdlib types that don't use the numeric tower?
  4. Allow any mixing of types at all?
  5. Strive to achieve the maximum possible accuracy for every type that it accepts?

I don't personally see much of a use-case for mixing e.g. Decimal and Fraction. I don't think it's unreasonable to require users to choose a numeric type and stick to it. The common cases will almost certainly be either all int or all float so those should be the main targets of any speed optimisation.

If a user is using Fraction/Decimal then they must have gone out of their way to do so and they may as well do so consistently for all of their data. When choosing to use Fraction you do so because you want perfect accuracy. Mixing those Fractions with floating point types such as float and Decimal doesn't make any sense. There is a sense in which Decimals are also exact, since the Decimal constructor is always exact. However, I don't think there's any case where the Decimal constructor can be used but the Fraction constructor cannot, so this mixing of types is unnecessary.
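
To illustrate the point about constructors, here is a small sketch showing that Fraction accepts both a Decimal and a float and converts each exactly. The variable names are made up for this example.

```python
from decimal import Decimal
from fractions import Fraction

# Fraction accepts a Decimal and converts its value exactly.
exact_from_decimal = Fraction(Decimal("0.1"))

# Fraction also accepts a float. The result is the exact value of the
# binary float, which differs from the decimal literal 0.1.
exact_from_float = Fraction(0.1)

print(exact_from_decimal)                    # 1/10
print(exact_from_float == Fraction(1, 10))   # False
```

So data that starts out as Decimal can be moved into Fraction without losing anything, whereas data that starts out as float already carries its binary rounding error.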

As with Fraction a user who chooses to use Decimal is going out of their way to do so because of the kind of accuracy guarantees that the type provides. It doesn't make any sense to mix these with floats that are inherently tainted with the wrong kind of rounding error. So mixing Decimal and float doesn't make any sense either.

Note that ordinary arithmetic prohibits the mixing of Decimal with Fraction/float so that on this point the statistics module is essentially maintaining a consistent position with respect to the policy of the Decimal type.
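
The behaviour of ordinary arithmetic referred to here can be checked directly. This is only an illustration; `can_add` is a hypothetical helper written for this example, not something from the stdlib.

```python
from decimal import Decimal
from fractions import Fraction


def can_add(a, b):
    """Hypothetical helper: report whether a + b is supported."""
    try:
        a + b
    except TypeError:
        return False
    return True


print(can_add(Decimal("1.5"), 2))                # True: Decimal mixes with int
print(can_add(Decimal("1.5"), 2.0))              # False: Decimal + float raises TypeError
print(can_add(Decimal("1.5"), Fraction(1, 2)))   # False: Decimal + Fraction raises TypeError
```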

On the other hand ordinary arithmetic allows all of int, float, Fraction and complex and indeed any other type subscribing to the ABCs in the numeric tower to be mixed. As of bpo-20481 the statistics module does not allow any type mixing except for int:
http://hg.python.org/cpython/rev/5db74cd953ab
Note also that it uses type identity rather than subclass relationships or ABCs so that it is not even possible to mix e.g. float with a float subclass.
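
A minimal sketch of the kind of type-identity policy described above follows. It is not the code from the linked changeset; `common_type` and `MyFloat` are hypothetical names used only to show the consequences of comparing exact types.

```python
from fractions import Fraction


class MyFloat(float):
    """Hypothetical float subclass, for illustration only."""


def common_type(values):
    """Hypothetical helper: allow int plus at most one other exact type."""
    types = {type(v) for v in values}   # exact types, not isinstance checks
    types.discard(int)                  # int may mix with any one other type
    if len(types) > 1:
        names = sorted(t.__name__ for t in types)
        raise TypeError("cannot mix types: " + ", ".join(names))
    return types.pop() if types else int


print(common_type([1, 2.5, 3]).__name__)   # float

try:
    common_type([1.5, MyFloat(2.5)])
except TypeError as exc:
    print(exc)   # cannot mix types: MyFloat, float
```

Under such a rule a float subclass counts as a different type from float, which is the restriction noted above.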

The most common case of mixing will almost certainly be int and float which will work. However I doubt that the current policy would be considered to be in keeping with Python's general policy on numeric types and anticipate that there will be a desire to change it in the future. The obvious candidate for a policy is the numeric tower and ABCs of PEP-3141. In that case the statistics module has a partial precedent on which to base its policy. The only tricky part is that Decimal is not part of the numeric tower. So there needs to be a special rule for Decimal such as "it only mixes with int/Integral".

Basing the policy on the numeric tower is attractive, but it is worth noting that the stdlib types int, float, Fraction and complex are the only types that actually implement and register with these ABCs. So it's not much different from saying that those particular types (and their subclasses) are accepted, but I think that is better than the current policy.
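
For reference, this is how the stdlib types relate to the ABCs in the `numbers` module. The `samples` and `membership` names are just for this illustration.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

samples = [1, 1.0, Fraction(1, 2), 1j, Decimal("1")]

# Map each type name to (is a Number, is a Real).
membership = {
    type(v).__name__: (isinstance(v, numbers.Number), isinstance(v, numbers.Real))
    for v in samples
}

for name, (is_number, is_real) in membership.items():
    print(name, is_number, is_real)
```

Decimal is registered as a `numbers.Number` but not as a `numbers.Real`, which is why it needs a special rule in any policy based on the tower.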

Third party numeric types don't implement the interfaces described in PEP-3141. However one thing that is implemented by every third-party numeric type that I know of is __float__. So if there was to be a desire to support those in the statistics module then the simplest extension of the policy on types is to say that any non-numeric-tower types will simply be coerced with float. This still leaves the issue about how type mixing works there but, again, perhaps the safest option before the need arises is just to say that no type mixing is allowed if any input object is not from the numeric tower.
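
The `__float__` fallback could be sketched roughly as below. This is only one possible reading of the suggestion; `ThirdPartyNumber` and `coerce_value` are hypothetical names, and the sketch leaves the question of type mixing aside.

```python
import numbers
from fractions import Fraction


class ThirdPartyNumber:
    """Hypothetical third-party type that supports only __float__."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        return float(self._value)


def coerce_value(x):
    """Hypothetical helper: keep registered numeric types, coerce the rest."""
    if isinstance(x, numbers.Number):   # covers the tower types and Decimal
        return x
    return float(x)                     # relies on the object's __float__


print(coerce_value(Fraction(1, 2)))        # 1/2
print(coerce_value(ThirdPartyNumber(2)))   # 2.0
```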

What do you think?

@oscarbenjamin oscarbenjamin mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 9, 2014
@BreamoreBoy
Mannequin

BreamoreBoy mannequin commented Feb 14, 2015

@steven would you please comment on this issue, thanks.

@stevendaprano
Member

Thanks for the note Mark. I need to give Oscar's comments some careful and distraction-free thought, but off the top of my head I think Oscar's suggestion to require consistent types seems reasonable, except that mixing int with any other type should be allowed. Otherwise, I think that the coercion rules for mixed float/Decimal/Fraction rapidly become intractable.

@stevendaprano stevendaprano self-assigned this Feb 15, 2015
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022