Use PyUnicode_MAX_CHAR_VALUE instead of PyUnicode_KIND in some API's short path #73129

Closed
zhangyangyu opened this issue Dec 12, 2016 · 3 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@zhangyangyu
Member

BPO 28943
Nosy @serhiy-storchaka, @zhangyangyu
Files
  • short-path.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-12-12.12:44:42.679>
    created_at = <Date 2016-12-12.11:22:10.428>
    labels = ['interpreter-core', 'type-feature', '3.7']
    title = "Use PyUnicode_MAX_CHAR_VALUE instead of PyUnicode_KIND in some API's short path"
    updated_at = <Date 2016-12-12.12:45:04.968>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2016-12-12.12:45:04.968>
    actor = 'xiang.zhang'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-12-12.12:44:42.679>
    closer = 'xiang.zhang'
    components = ['Interpreter Core']
    creation = <Date 2016-12-12.11:22:10.428>
    creator = 'xiang.zhang'
    dependencies = []
    files = ['45856']
    hgrepos = []
    issue_num = 28943
    keywords = ['patch']
    message_count = 3.0
    messages = ['282982', '282983', '282990']
    nosy_count = 2.0
    nosy_names = ['serhiy.storchaka', 'xiang.zhang']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue28943'
    versions = ['Python 3.7']

    @zhangyangyu
    Member Author

    Some Unicode APIs, such as PyUnicode_Contains, have a short path that compares kinds. The problem is that this check cannot tell ASCII from Latin-1, because both share the same 1-byte kind. PyUnicode_MAX_CHAR_VALUE could be used instead so that the short path also applies to ASCII and Latin-1. This technique is already used in PyUnicode_Replace.
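To make the difference between the two checks concrete, here is a small Python model of them. It is only a sketch under stated assumptions: `kind_of`, `max_char_value_of`, `kind_short_path`, and `maxchar_short_path` are hypothetical helper names, and they derive their answers from the code points of a canonical string instead of reading the C string header the way the real macros do.

```python
def kind_of(s: str) -> int:
    """Bytes per character a canonical CPython string would use: 1, 2 or 4."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


def max_char_value_of(s: str) -> int:
    """Upper bound on code points, using the same four buckets as
    PyUnicode_MAX_CHAR_VALUE (0x7f, 0xff, 0xffff, 0x10ffff)."""
    top = max(map(ord, s), default=0)
    if top < 0x80:
        return 0x7F
    if top < 0x100:
        return 0xFF
    if top < 0x10000:
        return 0xFFFF
    return 0x10FFFF


def kind_short_path(haystack: str, needle: str) -> bool:
    """True when comparing kinds alone proves the needle cannot occur."""
    return kind_of(needle) > kind_of(haystack)


def maxchar_short_path(haystack: str, needle: str) -> bool:
    """True when comparing max char values proves the needle cannot occur."""
    return max_char_value_of(needle) > max_char_value_of(haystack)


haystack = "plain ascii text"
for needle in ("\u20ac", "\xe9", "t"):
    print(repr(needle),
          kind_short_path(haystack, needle),
          maxchar_short_path(haystack, needle))
```

In this model a Latin-1 needle such as `"\xe9"` searched in an ASCII haystack is the one case where the two predicates disagree: the kinds are equal, so only the max-char comparison can rule it out without scanning.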

    @zhangyangyu zhangyangyu added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Dec 12, 2016
    @serhiy-storchaka
    Member

    PyUnicode_KIND() just extracts three bits from the state word. PyUnicode_MAX_CHAR_VALUE() extracts bits multiple times and does a few conditional branches, so I think it is much slower than PyUnicode_KIND(). In the common case you search for an ASCII needle or a needle of the same kind as the string, so checking for the fast path just adds overhead. That is appropriate only while the overhead is tiny.

    Optimize common cases, not rare and obscure cases.
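One rough way to see which searches the existing kind-based short path already rejects early is to time the `in` operator from Python. This is an illustrative sketch, not the benchmark referred to in this issue: the haystack size and repeat count are arbitrary assumptions, and absolute timings depend on the build and the machine.

```python
import timeit

# ASCII haystack, stored with the 1-byte kind.
haystack = "a" * 1_000_000

cases = {
    "ascii needle (absent)": "b",   # same kind: the string has to be scanned
    "latin-1 needle": "\xe9",       # same 1-byte kind: the kind check cannot reject it
    "2-byte needle": "\u20ac",      # wider kind: the kind comparison can reject it
}

results = {}
for label, needle in cases.items():
    # Every needle is absent, so each lookup returns False.
    results[label] = needle in haystack
    seconds = timeit.timeit(lambda: needle in haystack, number=100)
    print(f"{label:24s} {seconds:.6f}s")
```

Only the Latin-1 row would be expected to change under the proposal. The other two rows are the common cases, where an extra check can only add overhead.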

    @zhangyangyu
    Member Author

    I know the difference and thought the overhead should be tiny (not in a critical part). But benchmarks show it's not. :-(

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022