Add an efficient popcount method for integers #74068
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = None closed_at = <Date 2020-05-29.16:28:39.179> created_at = <Date 2017-03-22.17:30:42.549> labels = ['interpreter-core', 'type-feature', '3.10'] title = 'Add an efficient popcount method for integers' updated_at = <Date 2022-01-23.09:59:43.073> user = 'https://github.com/niklasf'
activity = <Date 2022-01-23.09:59:43.073> actor = 'mark.dickinson' assignee = 'none' closed = True closed_date = <Date 2020-05-29.16:28:39.179> closer = 'mark.dickinson' components = ['Interpreter Core'] creation = <Date 2017-03-22.17:30:42.549> creator = 'niklasf' dependencies =  files =  hgrepos =  issue_num = 29882 keywords =  message_count = 26.0 messages = ['290003', '290013', '290014', '290015', '290016', '290017', '290073', '344215', '344216', '344223', '344224', '344225', '369860', '369878', '369879', '369881', '369887', '369966', '370323', '370423', '370447', '370456', '370987', '372497', '411211', '411357'] nosy_count = 12.0 nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'vstinner', 'casevh', 'njs', 'Mark.Shannon', 'serhiy.storchaka', 'veky', 'Jim Fasarakis-Hilliard', 'niklasf', 'gbtami'] pr_nums = ['771', '20518', '30774', '30794'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue29882' versions = ['Python 3.10']
An efficient popcount (something equivalent to bin(a).count("1")) would
gmpy calls the operation popcount and returns -1/None for negative values:
>>> import gmpy2
>>> gmpy2.popcount(-10)
-1
From the documentation :
(I am not a fan of the arbitrary return value).
The bitarray module has a count(value=True) method:
>>> from bitarray import bitarray
>>> bitarray(bin(123456789).strip("0b")).count()
16
Bitsets  exposes __len__.
There is an SSE4 POPCNT instruction. C compilers call the corresponding
Rust calls the operation count_ones . Ones are counted in the binary
Introducing popcount was previously considered here but closed for lack
Sensible names could be bit_count along the lines of the existing
$ ./python -m timeit "bin(123456789).count('1')"  # equivalent
1000000 loops, best of 5: 286 nsec per loop
$ ./python -m timeit "(123456789).bit_count()"  # fallback
5000000 loops, best of 5: 46.3 nsec per loop
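The two spellings timed above should always agree on the result. A minimal sketch, assuming Python 3.10 or later for `int.bit_count()` and falling back to the string-based form on older versions; the helper name `popcount` is hypothetical:

```python
def popcount(n: int) -> int:
    """Return the number of 1 bits in the binary representation of abs(n)."""
    if hasattr(int, "bit_count"):   # int.bit_count() exists from Python 3.10
        return n.bit_count()
    # bin(-10) == '-0b1010'; the sign and the '0b' prefix contain no '1'
    return bin(n).count("1")


# Both spellings give the same answer for a few sample values.
for value in (0, 1, 123456789, 2**100 - 1, -10):
    assert popcount(value) == bin(value).count("1")
```
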
Can you give some examples of concrete use-cases? I've spent the last six years or so writing scientific applications and parsing all sorts of odd binary formats, and haven't needed or wanted a popcount yet.
Agreed: if this were implemented, I think raising ValueError would be the most appropriate thing to do for negative inputs.
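For illustration only, the ValueError behaviour suggested here could be sketched as follows. `strict_popcount` is a hypothetical helper written for this example, not an existing API:

```python
def strict_popcount(n: int) -> int:
    """Count 1 bits, rejecting negative inputs instead of returning a sentinel."""
    if n < 0:
        raise ValueError("popcount is not defined for negative integers")
    return bin(n).count("1")


print(strict_popcount(10))  # 10 == 0b1010
try:
    strict_popcount(-10)
except ValueError as exc:
    print("rejected:", exc)
```
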
Searching popcount in Python files on GitHub yields
Probably most important:
Btw. not a concrete application. I just stumbled upon this.
Many of those applications are really for bitstrings (chess bitboards, hamming distance), which aren't really the same thing as integers.
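To make the bitstring use cases concrete, here is a minimal sketch of Hamming distance and a bitboard count on plain non-negative ints. The function names are hypothetical, and the string-based count is used so that the example runs on any Python 3:

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative ints differ."""
    return bin(a ^ b).count("1")


def occupied_squares(bitboard: int) -> int:
    """Number of set squares in a chess bitboard stored as a 64-bit int."""
    return bin(bitboard & 0xFFFFFFFFFFFFFFFF).count("1")


print(hamming_distance(0b1011, 0b1101))  # the values differ in two positions
print(occupied_squares(0xFF00))          # one full rank of eight squares
```
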
Nice find for the mathmodule.c case. I'd forgotten about that one (though according to git blame, apparently I'm responsible for checking it in). It's a fairly obscure corner case, though.
Overall, I'm -1 on adding this: I don't think it meets the bar of being useful enough to justify the extra method. I'd suggest that people needing this kind of efficient bitstring operation use a 3rd-party bitstring library instead.
I think that adding bitarray or bitset (or both) to the stdlib would better satisfy these needs. There are open issues for adding the ability to read or set selected bits or ranges of bits in an int, and for bitwise operations on bytes. I think that bitarray and bitset would provide a better interface for these operations.
As that says, there are a number of languages and processors with first class support for a popcount function. I've frequently implemented it in Python when using integers as integer bitsets (
Not entirely, but it's not terribly wrong and it's consistent with how
Adding a function to a new hypothetical imath module sounds reasonable.
I'm less comfortable with adding a new method to int type: it would mean that any int subtype "should" implement it.
For example, should numpy.int64 get this method as well?
What is the effect on https://docs.python.org/3.9/library/numbers.html?
Does it make sense to call (True).popcount()?
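As background for the last question: `bool` is a subclass of `int`, so it already inherits int methods such as `bit_length()`. A small sketch using only existing behaviour (the variable names are just for this example):

```python
# bool is an int subclass, so existing int methods are already callable on it.
is_int = isinstance(True, int)
true_length = True.bit_length()      # True == 1, which needs one bit
true_ones = bin(True).count("1")     # bin(True) == '0b1'
false_ones = bin(False).count("1")   # bin(False) == '0b0'

print(is_int, true_length, true_ones, false_ones)
```
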
Python/hamt.c contains an optimized function:
static inline uint32_t
Python/pymath.c provides an "unsigned int _Py_bit_length(unsigned long d)" function used by math.factorial, _PyLong_NumBits(), int.__format__(), long / long, _PyLong_Frexp() and PyLong_AsDouble(), etc.
Maybe we could add a _Py_bit_count().
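A common branch-free way to count bits in a fixed-width word is the parallel (SWAR) reduction. The sketch below shows that general technique in Python for 32-bit words. It is not claimed to match the hamt.c function or any eventual `_Py_bit_count()`, and both function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF


def popcount32(x: int) -> int:
    """Count 1 bits in the low 32 bits of x using a SWAR reduction."""
    x &= MASK32
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial sums
    # The multiply adds the four byte sums into the top byte; the mask
    # emulates 32-bit overflow, which Python ints do not have.
    return ((x * 0x01010101) & MASK32) >> 24


def popcount_chunks(n: int) -> int:
    """Count 1 bits of abs(n) by summing popcount32 over 32-bit chunks."""
    n = abs(n)
    total = 0
    while n:
        total += popcount32(n & MASK32)
        n >>= 32
    return total
```

In C the masking is implicit in the `uint32_t` type, and a compiler builtin can replace the arithmetic when one is available.
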
See also bpo-29782: "Use __builtin_clzl for bits_in_digit if available" which proposes to micro-optimize _Py_bit_length().
In the meantime, I also added the *internal* pycore_byteswap.h header, which provides static inline functions that *do* use builtin functions like __builtin_bswap32().
That's for the NumPy folks to decide (and I've added Nathaniel Smith to the nosy in case he wants to comment), but I don't see any particularly strong reason that NumPy would need to add it. It looks as though the NumPy integer types have survived happily without a bit_length method, for example - I don't even see any issues in the NumPy tracker suggesting that anyone missed it. (Though perhaps that's because in the case of a NumPy int one always has at least an upper bound on the bit_length available.)
No effect, just as int.bit_length has no effect.
It would be spelled
Why are we calling a population count method "bit_count()"?
I might reasonably expect 0b1000.bit_count() to be 4, not 1. It does have 4 bits.
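The two quantities being contrasted here can be checked with existing operations. A small sketch, using the string-based count so that it does not depend on the proposed method:

```python
n = 0b1000

width = n.bit_length()       # bits needed to represent n
ones = bin(n).count("1")     # population count: bits that are set

print(width, ones)
```
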
I have no objection to adding this method, just the choice of name.
A couple of other data points:
@Mark Shannon: what name would you suggest, and why? The term "population count" feels too non-obvious and specialist to me, and anything involving "Hamming" likewise.
"count_ones" isn't obviously a bit operation.