Add a function for computing binomial coefficients to the math module #79612

KellerFuchs · 2018-12-06T21:18:02Z

BPO	35431
Nosy	@tim-one, @rhettinger, @mdickinson, @jwilk, @stevendaprano, @serhiy-storchaka, @MojoVampire, @pablogsal, @KellerFuchs, @FR4NKESTI3N, @Radcliffe
PRs	bpo-35431: Add a function computing binomial numbers to the math module #11018 bpo-35431: Implemented math.comb #11414 bpo-35431: Refactor math.comb() implementation. #13725 bpo-37128: Add math.perm(). #13731 bpo-35431: Put math.comb() docs in correct place alphabetically #13734 bpo-35431: Drop the k <= n requirement #13798 bpo-35431: Fix grammar in docs #13801 bpo-35431: Simplify negativity checks in math.comb and math.perm. #13870 [3.8] bpo-35431: Simplify negativity checks in math.comb and math.perm. (GH-13870) #14125 bpo-35431: Test math.comb() and math.perm() for OverflowError only on CPython. #14146 [3.8] bpo-35431: Test math.comb() and math.perm() for OverflowError only on CPython. (GH-14146) #14226

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2019-06-05.14:38:26.241>
created_at = <Date 2018-12-06.21:18:02.072>
labels = ['type-feature', '3.8']
title = 'Add a function for computing binomial coefficients to the math module'
updated_at = <Date 2019-06-23.14:50:07.415>
user = 'https://github.com/KellerFuchs'

bugs.python.org fields:

activity = <Date 2019-06-23.14:50:07.415>
actor = 'serhiy.storchaka'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2019-06-05.14:38:26.241>
closer = 'rhettinger'
components = []
creation = <Date 2018-12-06.21:18:02.072>
creator = 'kellerfuchs'
dependencies = []
files = []
hgrepos = []
issue_num = 35431
keywords = ['patch']
message_count = 85.0
messages = ['331251', '331255', '331256', '331257', '331280', '331281', '331293', '331295', '331296', '331308', '331309', '331312', '331318', '331323', '331325', '331335', '331336', '331337', '331339', '331369', '331402', '331743', '331748', '331859', '332817', '332826', '332838', '332844', '332933', '332957', '332960', '332987', '332988', '332990', '334457', '334458', '334460', '334461', '334474', '334493', '334494', '334499', '334501', '334502', '334683', '334684', '336384', '336458', '336750', '342253', '342257', '344154', '344155', '344165', '344170', '344176', '344186', '344199', '344218', '344219', '344221', '344229', '344230', '344231', '344232', '344235', '344236', '344239', '344243', '344257', '344258', '344292', '344366', '344370', '344371', '344375', '344414', '344420', '344436', '344448', '344450', '344452', '344531', '345860', '346328']
nosy_count = 11.0
nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'jwilk', 'steven.daprano', 'serhiy.storchaka', 'josh.r', 'pablogsal', 'kellerfuchs', 'FR4NKESTI3N', 'David Radcliffe']
pr_nums = ['11018', '11414', '13725', '13731', '13734', '13798', '13801', '13870', '14125', '14146', '14226']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue35431'
versions = ['Python 3.8']

KellerFuchs · 2018-12-06T21:18:02Z

A recuring pain point, for me and for many others who use Python for mathematical computations, is that the standard library does not provide a function for computing binomial coefficients.

I would like to suggest adding a function, in the math module, with the following signature:

binomial(n: Integral, k: Integral) -> Integral

A simple native Python implementation would be:

  from functools import reduce
  from math import factorial
  from numbers import Integral

  def binomial(n: Integral, k: Integral) -> Integral:
      if k < 0 or n < 0:
          raise ValueError("math.binomial takes non-negative parameters.")

      k = min(k, n-k)
      num, den = 1, 1
      for i in range(k):
          num = num * (n - i)
          den = den * (i + 1)

  return num//den

As far as I can tell, all of the math module is implemented in C, so this should be done in C too, but the implemented behaviour should be equivalent.

I will submit a Github pull request once I have a ready-to-review patch.

Not starting a PEP, per PEP-1:

Small enhancements or patches often don't need a PEP and can be injected into the Python development workflow with a patch submission to the Python issue tracker.

stevendaprano · 2018-12-06T22:31:54Z

You import reduce but never use it :-)

+1 for this, I certainly miss it too.

KellerFuchs · 2018-12-06T23:22:43Z

Yes, that was a copypasta mistake (and I also import factorial needlessly) as the file I prototyped it in had some other code for testing my proposed implementation. :)

rhettinger · 2018-12-07T00:04:44Z

+1 I have wanted this a number of times.

FWIW, most recently I wrote it like this:

    def comb(n, r):
        'Equivalent to factorial(n) // (factorial(r) * factorial(n-r))'
        c = 1
        r = min(r, n-r)
        for i in range(1, r+1):
            c = c * (n - r + i) // i
        return c

I'm not sure is this is better than a single divide, but it kept the intermediate values from exploding in size, taking advantage of cancellation at each step.

Also, I'm not sure what the predominant choice for variable names should be, "n things taken r at a time" or "n things taken k at time".

Also, it's worth considering whether the function should be called "binomial", "choose", "combinations", or "comb". The word "binomial" seems too application specific but would make sense if we ever introduced a "multinomial" counterpart. The word "choose" is how we usually pronounce it. The word "combinations" fits nicely with the related counting functions, "combinations, permutations, and factorial". The word "comb" is short, works well with "perm" and "fact", and nicely differentiates itself as the integer counterparts of the combinatoric functions in the itertools module.

Wolfram uses both choose and Binomial[n,m]
SciPy uses comb(n, k).
Maple uses both numbcomb(n,m) and binomial(n,m).
TeX uses {n \choose k}

mdickinson · 2018-12-07T08:42:05Z

For some ranges of inputs, it may make sense to use the apparently naive implementation factorial(n) // factorial(k) // factorial(n - k). The current factorial implementation is significantly optimised, and using it directly may be faster than using an iterative solution.

Here are some timings (Python 3.7.1, macOS 10.13), using Raymond's comb function from msg331257:

In [5]: %timeit comb(1000, 600)
186 µs ± 442 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit factorial(1000) // factorial(600) // factorial(400)
97.8 µs ± 256 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit factorial(1000) // (factorial(600) * factorial(400))
91.1 µs ± 789 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

But that's just one set of inputs, on one system; your results may vary.

mdickinson · 2018-12-07T09:03:06Z

There's also the question of what inputs should be considered valid: binomial(n, k) for k > n should either return 0 or be a ValueError, but which? Same question for k < 0. There's a symmetry argument for allowing k < 0 if you allow k > n, but I can't think of pragmatic reasons to allow k < 0, while allowing k > n does seem potentially useful.

Note that this needs fixing with both of the code snippets shown so far: they both return 1 for k > n.

KellerFuchs · 2018-12-07T11:25:43Z

@Raymond Hettinger:

it's worth considering whether the function should be called "binomial", "choose", "combinations", or "comb"

Here goes the bike shed! :)
Kidding, this is a pretty important ergonomics/discoverability concern, and I hadn't given it much thought yet.

I'd rather not call it comb, because it collides with a completely-unrelated English word, and it's not obvious what it stands for unless one already knows.

The word "combinations" fits nicely with the related counting functions, "combinations, permutations, and factorial".

That's a good point; math.permutations doesn't exist, but itertools.permutations does, so I would expect (by analogy) a “combinations” functions to produce all possible k-sized subsets (rather than counting them), and that's exactly what itertools.combinations does.

On the other hand, combinations and permutations are names that make it perhaps more obvious what is being counted, so perhaps math.{combinations,permutations} should be aliases for math.{binomial,factorial} ? Is the name collision with itertools a problem?

TL;DR: “binomial” is more consistent with the current naming in math and itertools, but perhaps it makes sense to introduce math.{combination,permutations} as aliases?

(Note that I used math.binomial as the name in the PR so far, but that's only because I needed a name, not because I consider the discussion ended.)

@mark Dickinson:

The current factorial implementation is significantly optimised, and using it directly may be faster than using an iterative solution.

Yes, I avoided pushing specifically for a given algorithm (rather than getting upfront agreement on the functional behaviour) because the performance characteristics will likely be quite different once implemented in C in the math module.

(Unless I'm mistaken and there is a way to add pure-Python functions to the math module?)

binomial(n, k) for k > n should either return 0 or be a ValueError, but which?

From a mathematical standpoint, (n choose k) is defined for all non-negative k, n, with (n chooze k) = 0 when k>n or k=0.

It's necessary behaviour for the usual equations to hold (like (n + 1 choose k + 1) = (n choose k) + (n choose k + 1)).
As such, I'd argue that returning 0 is both more likely to be the thing the user wants (as in, it's necessary behaviour for combinatorics) and easier to reason about.

Negative k and n, on the other hand, should clearly be a ValueError, and so should non-integers inputs; this is consistent with factorial's behaviour.

I started a pull request and (for now) only added tests which document that (proposed) behaviour, so we can more easily discuss whether that's what we want.

Note that this needs fixing with both of the code snippets shown so far: they both return 1 for k > n.

Yes :)
I noticed last night, as I wrote Hypothesis tests for the snippet, but didn't think it was super important to send an updated version.

serhiy-storchaka · 2018-12-07T11:30:06Z

I think that it is better to add this function in a new module imath, which could contain other integer functions: factorial, gcs, as_integer_ration, isqrt, isprime, primes.

See https://mail.python.org/pipermail/python-ideas/2018-July/051917.html

serhiy-storchaka · 2018-12-07T11:32:22Z

Mathematically, binomial(n, k) for k > n is defined as 0.

mdickinson · 2018-12-07T13:08:26Z

Mathematically, binomial(n, k) for k > n is defined as 0.

It's not so clear cut. You can find different definitions out there. Knuth et. al., for example, in the book "Concrete Mathematics", extend the definition not just to negative k, but to negative n as well. Mathematicians aren't very good at agreeing on things. :-)

But that doesn't really matter: what we need to decide is what behaviour is useful for the users of the function.

mdickinson · 2018-12-07T13:10:09Z

@KellerFuchs:

From a mathematical standpoint, (n choose k) is defined for all non-negative k, n, with (n chooze k) = 0 when k>n or k=0.

You don't mean the "k=0" part of that, right?

mdickinson · 2018-12-07T13:37:36Z

One more decision that needs to be made: should the new function accept integer-valued floats? Or should any float input give a TypeError.

I'd personally prefer that floats not be accepted; I think this was a misfeature of factorial that we shouldn't compound.

stevendaprano · 2018-12-07T14:44:09Z

On Fri, Dec 07, 2018 at 12:04:44AM +0000, Raymond Hettinger wrote:

Also, I'm not sure what the predominant choice for variable names
should be, "n things taken r at a time" or "n things taken k at time".

Also, it's worth considering whether the function should be called
"binomial", "choose", "combinations", or "comb".

I've done a quick survey of some of the most common/popular scientific
calculators:

TI Nspire
TI-84 Plus
Casio Classpad
Casio FX-82AU Plus II

all call this nCr, and nPr for the permutation version. This matches
the notation taught in secondary school maths classes in Australia.
That's common and familiar notation for secondary school students, but
personally I'm not super-keen on it.

For what its worth, the colour I prefer for this bikeshed are "comb" and
"perm", which are the names used by the HP 48GX calculator. Second
choice would be to spell the names out in full, "combinations" and
"permutations".

stevendaprano · 2018-12-07T14:52:19Z

Mathematically, binomial(n, k) for k > n is defined as 0.

It's not so clear cut. You can find different definitions out there.
Knuth et. al., for example, in the book "Concrete Mathematics", extend
the definition not just to negative k, but to negative n as well.
Mathematicians aren't very good at agreeing on things. :-)

I think the key word there is *extend*. To the degree that any
mathematical definition is "obvious", the obvious definition for number
of combinations ("n choose r") is going to be 1 for r == 0 and 0 for r > n.

However, I think that it is too easy to get the order of n and r (n and
k) mixed up, and write combinations(5, 10) when you wanted to choose 5
from 10. I know I make that mistake on my calculator *all the time*, and
so do my students, even with the nPr notation. So I recommend we raise
ValueError for r > n.

stevendaprano · 2018-12-07T14:56:39Z

On Fri, Dec 07, 2018 at 01:37:36PM +0000, Mark Dickinson wrote:

I'd personally prefer that floats not be accepted;

Agreed. We can always add support for floats later, but its hard to
remove it if it turns out to be problematic.

We ought to require n, r (or n, k) to be non-negative ints with 0 <= r
<= n. Extending this to negative ints or floats is probably YAGNI, but
if somebody does need it, they can request an enhancement in the future.

KellerFuchs · 2018-12-07T16:47:27Z

@serhiy Storchaka:

I think that it is better to add this function in a new module imath, which could contain other integer functions

imath is a fine idea, and you already started a discussion in python-ideas@, but it's a much bigger undertaking than just adding this one function, and you can move it there once imath happens.
As such, I think it's out-of-scope in this issue.

KellerFuchs · 2018-12-07T16:49:40Z

@mark Dickinson:

You don't mean the "k=0" part of that, right?

Indeed not; the tests in the PR actually assert binomial(n, n) == binomial(n, 0) == 1.

KellerFuchs · 2018-12-07T16:52:50Z

I'd personally prefer that floats not be accepted; I think this was a misfeature of factorial that we shouldn't compound.

OK; I only went with that because I assumed there were Good Reasons© that factorial did it, but if rejecting integral floats isn't going to be a controversial move I'm also in favor of it.

Updating the PR momentarily to check that binomial rejects floats.

KellerFuchs · 2018-12-07T17:01:35Z

@steven D'Aprano:

all call this nCr, and nPr for the permutation version. This matches
the notation taught in secondary school maths classes in Australia.
That's common and familiar notation for secondary school students, but
personally I'm not super-keen on it.

It's also not universal; in my experience, most calculators are localized for a given market, and may use different notations (in particular, the notation for combinations/binomial numbers changes across countries).

stevendaprano · 2018-12-07T23:53:02Z

Brett, what was the purpose of the title change?

title: The math module should provide a function for computing
binomial coefficients -> Add a function for computing binomial
coefficients to the math module

rhettinger · 2018-12-09T01:11:28Z

For what its worth, the colour I prefer for this bikeshed are
"comb" and "perm", which are the names used by the
HP 48GX calculator. Second choice would be to spell the names
out in full, "combinations" and "permutations".

+1 These would be my preferences as well :-)

tim-one · 2018-12-13T05:58:17Z

My two cents:

Spell it comb(n, k).
TypeError if args aren't ints.
ValueError if not 0 <= k <= n.

Not concerned about speed. It's possible to do this with no divisions involving integers bigger than n and k(*), using O(k) space, but for "practical" arguments I bet that's slower than the dumbest possible loop.

(*) Sketch: make lists of the k numerators and k-1 denominators (skip 1). For each prime P <= k, a modulus operation can determine the index of the first multiple of P in each list. For that, and for each P'th later list element, divide out the multiples of P, adding 1 to a running count for numerators, subtracting 1 for denominators, and reducing each list element by the Ps divided out. Then if the final P count isn't 0 (it will always be >= 0), append pow(P, count) to a list of factors. pow() is efficient.

After that, all the denominators will be reduced to 1, so can be ignored thereafter. It just remains to multiply all the reduced numerators and prime-power factors.

Catenate them all in a single list, heapify it (min heap), and so long as at least 2 factors remain pop the two smallest and push their product. This attempts to balance bit lengths of incoming factors, allowing close-to-best-case-for-it Karatsuba multiplication to kick in.

But that's nuts ;-) To get even nutsier, special-case P=2 to use shifts instead, skip adding pow(2, count) to the list of factors, and just shift left by the count at the end.

That said, even the "dumbest possible loop" may benefit in C by shifting out all trailing zeroes, multiplying/dividing only odd integers, and shifting left at the end.

mdickinson · 2018-12-13T08:38:30Z

[Tim]

My two cents:

Spell it comb(n, k).

TypeError if args aren't ints.

ValueError if not 0 <= k <= n.

+1 to all of this.

tim-one · 2018-12-14T19:37:40Z

Just for fun, here's a gonzo implementation (without arg-checking) using ideas from the sketch. All factors of 2 are shifted out first, and all divisions are done before any multiplies.

For large arguments, this can run much faster than a dumb loop. For example, combp(10**100, 400) takes about a quarter the time of a dumb-loop divide-each-time-thru implementation.

    # Return number of trailing zeroes in `n`.
    def tzc(n):
        result = 0
        if n:
            mask = 1
            while n & mask == 0:
                result += 1
                mask <<= 1
        return result

    # Return exponent of prime `p` in prime factorization of
    # factorial(k).
    def facexp(k, p):
        result = 0
        k //= p
        while k:
            result += k
            k //= p
        return result

    def combp(n, k):
        from heapq import heappop, heapify, heapreplace

        if n-k < k:
            k = n-k
        if k == 0:
            return 1
        if k == 1:
            return n
        firstnum = n - k + 1
        nums = list(range(firstnum, n+1))
        assert len(nums) == k

        # Shift all factors of 2 out of numerators.
        shift2 = 0
        for i in range(firstnum & 1, k, 2):
            val = nums[i]
            c = tzc(val)
            assert c
            nums[i] = val >> c
            shift2 += c
        shift2 -= facexp(k, 2) # cancel all 2's in factorial(k)
        assert shift2 >= 0

        # Any prime generator is fine here.  `k` can't be
        # huge, and we only want the primes through `k`.
        pgen = psieve()
        p = next(pgen)
        assert p == 2

        for p in pgen:
            if p > k:
                break
            pcount = facexp(k, p)
            assert pcount
            # Divide that many p's out of numerators.
            i = firstnum % p
            if i:
                i = p - i
            for i in range(i, k, p):
                val, r = divmod(nums[i], p)
                assert r == 0
                pcount -= 1
                while pcount:
                    val2, r = divmod(val, p)
                    if r:
                        break
                    else:
                        val = val2
                        pcount -= 1
                nums[i] = val
                if pcount == 0:
                    break
            assert pcount == 0

        heapify(nums)
        while len(nums) > 1:
            a = heappop(nums)
            heapreplace(nums, a * nums[0])
        return nums[0] << shift2

I'm NOT suggesting to adopt this. Just for history in the unlikely case there's worldwide demand for faster comb of silly arguments ;-)

FR4NKESTI3N · 2018-12-31T13:44:17Z

Can I work on C implementation if no-one else is doing it right now?

mdickinson · 2018-12-31T18:26:09Z

Can I work on C implementation if no-one else is doing it right now?

Sounds fine to me. You might want to coordinate with @KellerFuchs to see what the status of their PR is; maybe the two of you can collaborate?

@KellerFuchs: are you still planning to work on this?

rhettinger · 2018-12-31T22:10:53Z

Kellar and Yash, my suggestion is to separate the work into two phases.

Start with an initial patch that implements this simplest possible implementation, accompanied by clean documentation and thorough testing.

Once everyone has agreed on the API (i.e. calling it "comb()", how to handle various input datatypes, and handling of corner cases), and the patch is applied, only then turn to a second pass for optimizations (special casing various types, minimizing how big intermediate values can get by doing early cancellation, exploiting even/odd patterns etc).

rhettinger · 2019-06-01T20:20:53Z

Why do we want __index__ support? This seems like an unnecessary extension without relevant use cases.

serhiy-storchaka · 2019-06-01T20:23:38Z

NumPy integer types are not subclasses of int.

mdickinson · 2019-06-01T20:31:46Z

I'd expect any math module function accepting integers to be sufficiently duck-typed to accept integer-like things (i.e., objects that implement __index__), in just the same way that the regular math module functions (sin, log, atan, sqrt, ...) accept arbitrary objects defining __float__. We already do this for gcd, factorial.

rhettinger · 2019-06-01T20:56:06Z

Mark, please see: https://twitter.com/daveinstpaul/status/1134919179361034240

What are your thoughts?

rhettinger · 2019-06-01T20:58:07Z

One other idea: it may be worth putting in an alias for "binomial" to improve findability and readability in contexts where a person really does what binomial coefficients and is not otherwise thinking about counting contexts.

pablogsal · 2019-06-01T21:01:21Z

Just a heads up from bpo-37125: The leak is was fixed by PR13725.

mdickinson · 2019-06-01T21:02:47Z

What are your thoughts?

Sigh. I don't object to extending to k < 0 and k > n, but once we've made that extension it's impossible to undo if we decide we'd rather have had the error checking. I'd really like to see some convincing use-cases. Quotes from Concrete Mathematics (fine book though it is) don't amount to use-cases.

I'd say leave it as-is for 3.8, see what the reaction is, and maybe relax constraints in 3.9 if that seems appropriate. But if a majority of others really want to make the change now, that's okay with me.

rhettinger · 2019-06-01T21:10:25Z

I'd say leave it as-is for 3.8, see what the reaction is,
and maybe relax constraints in 3.9 if that seems appropriate.

I concur. That said, the referenced tweet was a reaction :-)

FWIW, itertools.combinations(seq, r) returns 0 values when r > len(seq).

Tim, what do you think?

mdickinson · 2019-06-01T21:10:53Z

I'm particularly sceptical about real-world use-cases for k < 0. Maybe we should consider removing just the k > n requirement.

tim-one · 2019-06-01T21:19:47Z

Strongly prefer requiring 0 <= k <= n at first. This is a programming language: it will be applied to real problems, not to abstract proofs where some slop can be helpful in reducing the number of cases that need to be considered.

The Twitter fellow is certainly right that "0" is the clear answer to "how many 5-element subsets does have a 4-element set have?", but in the context of real problems it's more likely (to my eyes) that a programmer asking that question of comb() has gotten confused. It certainly would have been "an error" in any programming use for comb() I've ever had.

Raymond, I'm not clear on what you mean by "alias". You mean supplying the same function under more than one name? I'd be -1 on that (where would it end? the answer to every future name bikeshedding issue then would be "all of the above"). But +1 on, e.g., adding a doc index entry for "binomial" pointing to "comb".

Radcliffe · 2019-06-01T22:37:31Z

I think that binomial(n, k) should return 0 when k > n or k < 0. This is a practical consideration. I'm concerned about evaluating sums involving binomial coefficients. Mathematicians are often rather loose about specifying the upper and lower bounds of summation, because the unwanted terms are zero anyway. These formulas should not result in value errors when they are implemented directly.

To give a simplistic example, the sum of the first n positive integers is binomial(n+1, 2), and the formula should still work if n is zero.

mdickinson · 2019-06-02T07:41:56Z

@david Ratcliffe

the sum of the first n positive integers is binomial(n+1, 2), and the formula should still work if n is zero.

I agree that's a convincing use-case.

Can you think of any practical uses for allowing negative k? I can't.

mdickinson · 2019-06-02T07:43:36Z

@david Ratcliffe

Radcliffe! Apologies for the misspelling. It's still early in this timezone and I haven't had any coffee yet.

tim-one · 2019-06-02T15:19:39Z

I'm not convinced, although I agree relaxing k <= n is less damaging than relaxing k >= 0.

Python isn't aimed at mathematicians (although some 3rd-party packages certainly are, and they're free to define things however they like). We have to trade off convenience for experts in edge cases against the likelihood that an ordinary user is making a mistake.

For example, that's why, despite that Python supports complex numbers, math.sqrt(-1) raises ValueError. For _most_ users, trying that was probably an error in their logic that led to the argument being negative. Experts can use cmath.sqrt() instead.

Ordinary users think comb(n, k) is the value of n!//(k!*(n-k)!), and as far as they're concerned factorial is only defined for non-negative integers. 0 <= k <= n follows from that. (Indeed, the current docs for comb() define the result via the factorial expression.)

And ordinary users think of the sum of the first n integers as "number of terms times the average", or memorize directly that the answer is n*(n+1)/2. That works fine for n=0. Only an expert thinks of it as comb(n+1, 2).

So I would still rather enforce 0 <= k <= n at the start, and relax it later only if there's significant demand. In going on three decades of using Python and having written my own comb() at least a dozen times, I've never found that constraint limiting, and enforcing it has caught errors in my code.

rhettinger · 2019-06-03T01:59:54Z

Here are a few other data points. There is no consensus but either returning 0 or erroring out seem reasonable.

MS Excel's COMBIN: If number < 0, number_chosen < 0, or number < number_chosen, COMBIN returns the #NUM! error value.

Scipy.misc.comb: If k > N, N < 0, or k < 0, then a 0 is returned.

Matlib.nchoose: Error: K must an integer between 0 and N.

Maple.numbcomb: Unclear from the docs what this does

Wolfram alpha: Returns 0
https://www.wolframalpha.com/input/?i=10+choose+12

tim-one · 2019-06-03T02:27:43Z

I'm going to repeat part of an earlier comment :-)

"""
Please resist pointless feature creep. The original report was about comb(n, k) for integer n and k with 0 <= k <= n and that's all. Everyone who commented appeared to agree they'd find that useful.
"""

Why can't we take universal agreement for an initial answer? ;-)

Looking at what other packages do doesn't really interest me, unless they differ on the cases everyone agreed would be useful - they have different audiences than Python has. Guess what Wolfram returns for

(-4) choose (-5)

Go ahead. That's right: -4. Sheesh - I don't care who their audience is, any code calling choose with those arguments would be far better served if the function complained.

Radcliffe · 2019-06-03T02:32:23Z

I understand that pure mathematics is not the primary use case. But I would
expect that math.comb() would be used to implement mathematical formulas.
As a simple example, the number of edges in a complete graph on n vertices
is binomial(n, 2), and this should be valid if n < 2. Polynomial
interpolation is another problem where this can occur. However, I do agree
that binomial coefficients with negative arguments are relatively
unimportant.

On Sun, Jun 2, 2019 at 8:59 PM Raymond Hettinger <report@bugs.python.org>
wrote:

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

Here are a few other data points. There is no consensus but either
returning 0 or erroring out seem reasonable.

MS Excel's COMBIN: If number < 0, number_chosen < 0, or number <
number_chosen, COMBIN returns the #NUM! error value.

Scipy.misc.comb: If k > N, N < 0, or k < 0, then a 0 is returned.

Matlib.nchoose: Error: K must an integer between 0 and N.

Maple.numbcomb: Unclear from the docs what this does

Wolfram alpha: Returns 0
https://www.wolframalpha.com/input/?i=10+choose+12

----------

Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue35431\>

tim-one · 2019-06-03T03:46:26Z

I'm not fatally opposed to relaxing k <= n. David makes some good points about it, and - as Raymond already noted - "0" is consistent with the behavior of itertools.combinations().

The docs would need to change, though, because the factorial "definition" would no longer make sense. Defining it in terms of the falling factorial would, but that's too obscure for Python's audience. Probably a combinatorial definition would be best: `comb(n, k) is the number of k-element subsets of an n-element set". Then it's clear that n and k are cardinals (integers >= 0), and that the result is 0 when k > n.

stevendaprano · 2019-06-03T11:40:50Z

For what its worth, there are concrete, practical applications for
binomial coefficients with negative arguments. They are used in
fractional calculus

https://nrich.maths.org/1365

which in turn has applications in physics, chemistry and other sciences:

https://en.wikipedia.org/wiki/Fractional_calculus#Applications

Take that however you see fit :-)

My own preference is for a comb() function that returns 0 for out of
bounds values (e.g. "choose 5 items from 4"), and a seperate binomial()
function that accepts any positive or negative integer values. Even I
think that fractional and complex arguments are probably a bit too
exotic for the std lib -- that's Mathematica territory.

And yes, Mathematica does accept fractional and complex arguments to
the choose() function.

https://www.wolframalpha.com/input/?i=choose%28sqr%28-3.5%29,+sqrt%28-4%2Bi%29%29

mdickinson · 2019-06-03T13:31:15Z

Given that (a) we're right up against feature freeze, and (b) the discussions about API already happened in plenty of good time some months ago, leading to agreement on the initial API, I think we should avoid any last-minute changes and stick with what we have for 3.8. Then we can have a more relaxed discussion about what, if anything, should be changed for 3.9.

rhettinger · 2019-06-03T15:58:49Z

I don't have a preference one way or the other (both ways have plausible arguments, error checking vs utility beyond simple combinatorics, and both ways have ample precedents).

That said, if we're going to ultimately relax the constraints, it would be better to do it now before the API is released.

Even if we were already in feature freeze, it would be okay to change it (the purpose of a beta is to elicit end-user feedback which is what David has just given).

tim-one · 2019-06-03T16:56:29Z

Python needs a benevolent dictator ;-)

I'd be happy for Mark, Raymond, or me to play that role here. If it were me, I'd drop the k <= n requirement: both arguments must be non-negative integers. Because a combinatorial (not factorial-, and certainly not gamma-, based) definition of comb() is best suited for Python's general audience, and "number of ways to pick k things out of n without regard to order" makes clear sense for any pair of non-negative k and n. But anything beyond that does not.

rhettinger · 2019-06-03T17:13:13Z

If it were me, I'd drop the k <= n requirement

+1 from me.

mdickinson · 2019-06-03T17:23:13Z

Fine by me. The same change should be applied to math.perm.

rhettinger · 2019-06-04T08:23:27Z

New changeset 963eb0f by Raymond Hettinger in branch 'master':
bpo-35431: Drop the k <= n requirement (GH-13798)
963eb0f

serhiy-storchaka · 2019-06-17T13:58:42Z

New changeset 1b8a46d by Serhiy Storchaka in branch 'master':
bpo-35431: Test math.comb() and math.perm() for OverflowError only on CPython. (GH-14146)
1b8a46d

serhiy-storchaka · 2019-06-23T14:50:07Z

New changeset 914d6b7 by Serhiy Storchaka in branch '3.8':
[3.8] bpo-35431: Test math.comb() and math.perm() for OverflowError only on CPython. (GH-14146) (bpo-14226)
914d6b7

KellerFuchs mannequin added 3.8 only security fixes type-feature A feature request or enhancement labels Dec 6, 2018

brettcannon changed the title ~~The math module should provide a function for computing binomial coefficients~~ Add a function for computing binomial coefficients to the math module Dec 7, 2018

rhettinger added the 3.8 only security fixes label Jun 3, 2019

rhettinger closed this as completed Jun 5, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Add a function for computing binomial coefficients to the math module #79612

Add a function for computing binomial coefficients to the math module #79612

Comments

KellerFuchs mannequin commented Dec 6, 2018

KellerFuchs mannequin commented Dec 6, 2018

stevendaprano commented Dec 6, 2018

KellerFuchs mannequin commented Dec 6, 2018

rhettinger commented Dec 7, 2018

mdickinson commented Dec 7, 2018

mdickinson commented Dec 7, 2018

KellerFuchs mannequin commented Dec 7, 2018

serhiy-storchaka commented Dec 7, 2018

serhiy-storchaka commented Dec 7, 2018

mdickinson commented Dec 7, 2018

mdickinson commented Dec 7, 2018

mdickinson commented Dec 7, 2018

stevendaprano commented Dec 7, 2018

stevendaprano commented Dec 7, 2018

stevendaprano commented Dec 7, 2018

KellerFuchs mannequin commented Dec 7, 2018

KellerFuchs mannequin commented Dec 7, 2018

KellerFuchs mannequin commented Dec 7, 2018

KellerFuchs mannequin commented Dec 7, 2018

stevendaprano commented Dec 7, 2018

rhettinger commented Dec 9, 2018

tim-one commented Dec 13, 2018

mdickinson commented Dec 13, 2018

tim-one commented Dec 14, 2018

FR4NKESTI3N mannequin commented Dec 31, 2018

mdickinson commented Dec 31, 2018

rhettinger commented Dec 31, 2018

rhettinger commented Jun 1, 2019

serhiy-storchaka commented Jun 1, 2019

mdickinson commented Jun 1, 2019

rhettinger commented Jun 1, 2019

rhettinger commented Jun 1, 2019

pablogsal commented Jun 1, 2019

mdickinson commented Jun 1, 2019

rhettinger commented Jun 1, 2019

mdickinson commented Jun 1, 2019

tim-one commented Jun 1, 2019

Radcliffe mannequin commented Jun 1, 2019

mdickinson commented Jun 2, 2019

mdickinson commented Jun 2, 2019

tim-one commented Jun 2, 2019

rhettinger commented Jun 3, 2019

tim-one commented Jun 3, 2019

Radcliffe mannequin commented Jun 3, 2019

tim-one commented Jun 3, 2019

stevendaprano commented Jun 3, 2019

mdickinson commented Jun 3, 2019

rhettinger commented Jun 3, 2019

tim-one commented Jun 3, 2019

rhettinger commented Jun 3, 2019

mdickinson commented Jun 3, 2019

rhettinger commented Jun 4, 2019

serhiy-storchaka commented Jun 17, 2019

serhiy-storchaka commented Jun 23, 2019