Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

listsort.txt wrongly assumes you cannot calculate leading zeros in O(1) time. #90646

Closed
chemoelectric mannequin opened this issue Jan 23, 2022 · 6 comments
Closed

listsort.txt wrongly assumes you cannot calculate leading zeros in O(1) time. #90646

chemoelectric mannequin opened this issue Jan 23, 2022 · 6 comments
Assignees
Labels
3.11 only security fixes docs Documentation in the Doc dir performance Performance or resource usage

Comments

@chemoelectric
Copy link
Mannequin

chemoelectric mannequin commented Jan 23, 2022

BPO 46488
Nosy @tim-one, @rhettinger, @chemoelectric

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/tim-one'
closed_at = <Date 2022-01-24.06:48:38.693>
created_at = <Date 2022-01-23.17:08:12.696>
labels = ['3.11', 'docs', 'invalid', 'performance']
title = 'listsort.txt wrongly assumes you cannot calculate leading zeros in O(1) time.'
updated_at = <Date 2022-01-24.06:48:38.692>
user = 'https://github.com/chemoelectric'

bugs.python.org fields:

activity = <Date 2022-01-24.06:48:38.692>
actor = 'rhettinger'
assignee = 'tim.peters'
closed = True
closed_date = <Date 2022-01-24.06:48:38.693>
closer = 'rhettinger'
components = ['Documentation']
creation = <Date 2022-01-23.17:08:12.696>
creator = 'chemoelectric'
dependencies = []
files = []
hgrepos = []
issue_num = 46488
keywords = []
message_count = 6.0
messages = ['411384', '411425', '411426', '411428', '411429', '411444']
nosy_count = 4.0
nosy_names = ['tim.peters', 'rhettinger', 'docs@python', 'chemoelectric']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue46488'
versions = ['Python 3.11']

@chemoelectric
Copy link
Mannequin Author

chemoelectric mannequin commented Jan 23, 2022

The Objects/listsort.txt incorrectly implies that it is not possible to compute leading zero bits in O(1) time, using only standard C. For a fixed integer size it can be done, for instance, using de Bruijn sequences. See https://www.chessprogramming.org/BitScan

(The existence of such methods is not as widely known as it ought to be.)

@chemoelectric chemoelectric mannequin added the 3.11 only security fixes label Jan 23, 2022
@chemoelectric chemoelectric mannequin added docs Documentation in the Doc dir 3.11 only security fixes performance Performance or resource usage labels Jan 23, 2022
@rhettinger
Copy link
Contributor

Presumably the OP is referring to this text:

"""
powerloop() emulates these divisions, 1 bit at a time, using comparisons,
subtractions, and shifts in a loop.

You'll notice the paper uses an O(1) method instead, but that relies on two
things we don't have:

  • An O(1) "count leading zeroes" primitive. We can find such a thing as a C
    extension on most platforms, but not all, and there's no uniform spelling
    on the platforms that support it.

  • Integer division on an integer type twice as wide as needed to hold the
    list length. But the latter is Py_ssize_t for us, and is typically the
    widest native signed integer type the platform supports.

But since runs in our algorithm are almost never very short, the once-per-run
overhead of powerloop() seems lost in the noise.

"""

@rhettinger rhettinger assigned tim-one and unassigned docspython Jan 23, 2022
@chemoelectric
Copy link
Mannequin Author

chemoelectric mannequin commented Jan 23, 2022

Yes. Actually the issue is branching, not order of complexity, because looping at most 64 times is a linear-bounded operation. The methods I point out involve no branching! And so can be rather fast. I don't suggest they be used, but that the listsort.txt be revised.

@tim-one
Copy link
Member

tim-one commented Jan 23, 2022

I'm not inclined to change anything here. It's a trivial point, and by "primitive" I had in mind a dedicated hardware instruction, blazing fast. Yes, I was aware of long-winded ways of doing it for specific fixed integer widths. But that's not what O(1) means. A dead obvious loop testing each bit, one at a time, starting with the MSB, until finding the first bit set, is also O(1) for any fixed-width int size.

So I'm not doing anything here. If someone else creates a PR with text they want to see instead, I'll review it, and if it's not unbearably pedantic ;-) I'll merge it.

@chemoelectric
Copy link
Mannequin Author

chemoelectric mannequin commented Jan 23, 2022

I meant constant bounded

@tim-one
Copy link
Member

tim-one commented Jan 24, 2022

For any fixed width integer type, the worst case of the dead simple loop (all bits are zero) is a fixed upper bound. So you don't mean "constant bounded" either. You mean something more like "clever C code that usually runs faster than the obvious loop".

See my "if it's not unbearably pedantic" comment earlier ;-) Again, by "primitive" I meant HW-level primitive. I agree that's not wholly clear from what I wrote, but really don't care - it's a trivial point that makes no difference in context. The lack of an integer type in C wide enough to support the division the paper uses is _the_ deal-breaker. That C doesn't define a count-leading-zero function either is just a flea on the tail of that dog.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes docs Documentation in the Doc dir performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

2 participants