Use PyUnicode_MAX_CHAR_VALUE instead of PyUnicode_KIND in some API's short path #73129

zhangyangyu · 2016-12-12T11:22:10Z

BPO	28943
Nosy	@serhiy-storchaka, @zhangyangyu
Files	short-path.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2016-12-12.12:44:42.679>
created_at = <Date 2016-12-12.11:22:10.428>
labels = ['interpreter-core', 'type-feature', '3.7']
title = "Use PyUnicode_MAX_CHAR_VALUE instead of PyUnicode_KIND in some API's short path"
updated_at = <Date 2016-12-12.12:45:04.968>
user = 'https://github.com/zhangyangyu'

bugs.python.org fields:

activity = <Date 2016-12-12.12:45:04.968>
actor = 'xiang.zhang'
assignee = 'none'
closed = True
closed_date = <Date 2016-12-12.12:44:42.679>
closer = 'xiang.zhang'
components = ['Interpreter Core']
creation = <Date 2016-12-12.11:22:10.428>
creator = 'xiang.zhang'
dependencies = []
files = ['45856']
hgrepos = []
issue_num = 28943
keywords = ['patch']
message_count = 3.0
messages = ['282982', '282983', '282990']
nosy_count = 2.0
nosy_names = ['serhiy.storchaka', 'xiang.zhang']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue28943'
versions = ['Python 3.7']

zhangyangyu · 2016-12-12T11:22:10Z

Some Unicode APIs, such as PyUnicode_Contains, have a short path that compares kinds. However, this check cannot tell ASCII from Latin-1, because both use the same 1-byte kind. PyUnicode_MAX_CHAR_VALUE could be used instead so that the short path also applies to ASCII and Latin-1. This technique is already used in PyUnicode_Replace.
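To make the proposal concrete, here is a minimal Python sketch of the two checks. It is a model only, not the CPython implementation: `kind`, `max_char_value`, `contains_kind_check`, and `contains_maxchar_check` are hypothetical helpers that imitate what PyUnicode_KIND and PyUnicode_MAX_CHAR_VALUE report for a canonical string. Each function returns the result together with a flag saying whether the short path was taken.

```python
def kind(s: str) -> int:
    """Model of PyUnicode_KIND: bytes per character (1, 2 or 4)."""
    m = max(map(ord, s), default=0)
    if m < 0x100:
        return 1
    if m < 0x10000:
        return 2
    return 4


def max_char_value(s: str) -> int:
    """Model of PyUnicode_MAX_CHAR_VALUE: upper bound of the string's range."""
    m = max(map(ord, s), default=0)
    if m < 0x80:
        return 0x7F
    if m < 0x100:
        return 0xFF
    if m < 0x10000:
        return 0xFFFF
    return 0x10FFFF


def contains_kind_check(haystack: str, needle: str):
    """Short path based on kinds, as described for PyUnicode_Contains."""
    if kind(needle) > kind(haystack):
        return False, True  # wider needle cannot occur; short path taken
    return needle in haystack, False  # fall through to the real search


def contains_maxchar_check(haystack: str, needle: str):
    """Proposed short path based on the maximum character value."""
    if max_char_value(needle) > max_char_value(haystack):
        return False, True  # short path taken
    return needle in haystack, False
```

With an ASCII haystack and a Latin-1 needle, both strings have kind 1, so the kind-based check falls through to the full search, while the maximum-character check returns early:

```python
print(contains_kind_check("abc", "é"))     # short path not taken
print(contains_maxchar_check("abc", "é"))  # short path taken
```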

serhiy-storchaka · 2016-12-12T11:37:39Z

PyUnicode_KIND() just extracts three bits from the state word. PyUnicode_MAX_CHAR_VALUE() extracts bits multiple times and performs a few conditional branches. I think it is much slower than PyUnicode_KIND(). In the common case you search for an ASCII needle, or a needle of the same kind as the string, so checking for the fast path just adds overhead. That is appropriate only while the overhead is tiny.
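The cost difference described here can be sketched with a toy state word. This is an illustration under stated assumptions, not the real macros: the bit positions (`KIND_SHIFT`, `ASCII_SHIFT`) and the helper names are hypothetical, and the actual bitfield layout depends on the compiler. The point is only that reading the kind is one shift-and-mask, while deriving the maximum character value needs an ASCII test followed by up to two kind comparisons.

```python
# Assumed layout for illustration only: 3 kind bits and 1 ascii bit.
KIND_SHIFT, KIND_MASK = 2, 0b111
ASCII_SHIFT = 6


def make_state(kind: int, is_ascii: bool) -> int:
    """Pack a kind (1, 2 or 4) and an ascii flag into a toy state word."""
    return (kind << KIND_SHIFT) | (int(is_ascii) << ASCII_SHIFT)


def get_kind(state: int) -> int:
    """One bit extraction, no branching."""
    return (state >> KIND_SHIFT) & KIND_MASK


def get_max_char_value(state: int) -> int:
    """Several bit extractions plus conditional branches."""
    if (state >> ASCII_SHIFT) & 1:  # first extraction and branch
        return 0x7F
    if get_kind(state) == 1:        # second extraction and branch
        return 0xFF
    if get_kind(state) == 2:        # third extraction and branch
        return 0xFFFF
    return 0x10FFFF
```

In this model a 4-byte string pays for every test before reaching the final return, whereas `get_kind` costs the same for any string.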

Optimize common cases, not rare and obscure cases.

zhangyangyu · 2016-12-12T12:44:43Z

I know the difference and thought the overhead should be tiny (not in a critical part). But benchmarks show it's not. :-(

zhangyangyu added the 3.7 (EOL), interpreter-core, and type-feature labels on Dec 12, 2016

zhangyangyu closed this as completed Dec 12, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022
