Duplicate entry in 'Objects/unicodetype_db.h' #91399

Closed
LiarPrincess mannequin opened this issue Apr 6, 2022 · 3 comments
Labels
3.11 (only security fixes), topic-unicode, type-feature (A feature request or enhancement)

Comments

@LiarPrincess
Mannequin

LiarPrincess mannequin commented Apr 6, 2022

BPO 47243
Nosy @vstinner, @ezio-melotti, @LiarPrincess
PRs
  • bpo-47243: Duplicate entry in 'Objects/unicodetype_db.h' #32376
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-04-06.17:23:31.334>
    labels = ['type-feature', 'expert-unicode', '3.11']
    title = "Duplicate entry in 'Objects/unicodetype_db.h'"
    updated_at = <Date 2022-04-06.17:48:43.752>
    user = 'https://github.com/LiarPrincess'

    bugs.python.org fields:

    activity = <Date 2022-04-06.17:48:43.752>
    actor = 'LiarPrincess'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Unicode']
    creation = <Date 2022-04-06.17:23:31.334>
    creator = 'LiarPrincess'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 47243
    keywords = ['patch']
    message_count = 2.0
    messages = ['416889', '416892']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'ezio.melotti', 'LiarPrincess']
    pr_nums = ['32376']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue47243'
    versions = ['Python 3.11']

    @LiarPrincess
    Mannequin Author

    LiarPrincess mannequin commented Apr 6, 2022

    This one is so tiny that I'm not really sure we want to merge it…

    === Problem ===

    [Objects/unicodetype_db.h](https://github.com/python/cpython/blob/main/Objects/unicodetype_db.h) starts in the following way:

    /* a list of unique character type descriptors */
    const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = {
        {0, 0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 32},
        {0, 0, 0, 0, 0, 48},
        …

    The 1st record ({0, 0, 0, 0, 0, 0}) is duplicated.
    This is not a problem, since the 1st occurrence is never used, but if we wanted to remove it then this is the ticket about it.

    === Detailed description ===

    [Objects/unicodetype_db.h](https://github.com/python/cpython/blob/main/Objects/unicodetype_db.h) is generated by [Tools/unicode/makeunicodedata.py](https://github.com/python/cpython/blob/main/Tools/unicode/makeunicodedata.py) (I removed irrelevant lines):

    def makeunicodetype(unicode, trace):
        dummy = (0, 0, 0, 0, 0, 0)
        table = [dummy] # (1)
        cache = {0: dummy} # (2)
    
        for char in unicode.chars:
            # Things…
    
            item = (upper, lower, title, decimal, digit, flags)
    
            i = cache.get(item) # (3)
            if i is None:
                cache[item] = i = len(table)
                table.append(item)
    
            index[char] = i
    • (1) - list which contains unique character properties (as (upper, lower, title, decimal, digit, flags) tuples)
    • (2) - mapping from character properties to index in table - improperly initialized as a mapping from index to character properties
    • (3) - we check if the current tuple is in cache

    === Result ===

    The first time we get to a character that has (0, 0, 0, 0, 0, 0) properties (which is code point 0 - NULL) we check if it is in cache. It is not (the cache only has an entry that goes from index 0 to (0, 0, 0, 0, 0, 0), which is the other way around), so we add this entry to table and cache.
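
A minimal sketch that reproduces this outside of makeunicodedata.py. The `build_table` helper and the three-element `sample` list are made up for illustration (they stand in for the real loop over `unicode.chars`); only the cache/table logic mirrors the snippet above.

```python
DUMMY = (0, 0, 0, 0, 0, 0)


def build_table(items, cache):
    """Deduplicate property tuples using the same cache/table logic as above.

    `items` is a hypothetical stand-in for the per-character property tuples.
    """
    table = [DUMMY]
    index = []
    for item in items:
        i = cache.get(item)
        if i is None:
            # Tuple not seen before: append it and remember its position.
            cache[item] = i = len(table)
            table.append(item)
        index.append(i)
    return table, index


# Hypothetical input: the first and last characters have all-zero properties.
sample = [DUMMY, (0, 0, 0, 0, 0, 32), DUMMY]

# Cache initialized as in line (2): index -> tuple, so the lookup by tuple misses.
buggy_table, buggy_index = build_table(sample, cache={0: DUMMY})

print(buggy_table.count(DUMMY))  # number of all-zero records in the table
print(buggy_index)               # which record each character points at
```

With the inverted cache, the all-zero tuple ends up in the table twice and no character ever points at record 0.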

    === Fix ===

    In line (2) we should have: cache = {dummy: 0}. Obviously, after doing so we have to re-run makeunicodedata.py, which is why this simple change modifies a lot of lines.
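
A small sketch of the effect of the fix, under the same caveat: `dedupe` and the `sample` list are hypothetical stand-ins for the real generator loop, and only the initial cache differs between the two calls.

```python
DUMMY = (0, 0, 0, 0, 0, 0)


def dedupe(items, cache):
    """Build the table of unique tuples plus the per-character index list."""
    table = [DUMMY]
    index = []
    for item in items:
        i = cache.get(item)
        if i is None:
            cache[item] = i = len(table)
            table.append(item)
        index.append(i)
    return table, index


# Hypothetical input: two characters with all-zero properties and one other.
sample = [DUMMY, (0, 0, 0, 0, 0, 32), DUMMY]

old_table, old_index = dedupe(sample, cache={0: DUMMY})  # current line (2)
new_table, new_index = dedupe(sample, cache={DUMMY: 0})  # proposed line (2)

print(len(old_table), len(new_table))  # table sizes before and after the fix
print(old_index, new_index)            # the indexes shift down by one
```

Since every index after the duplicate shifts by one, the regenerated header differs in many places even though the generator change is a single line.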

    I will submit a PR on GitHub in just a sec…

    @LiarPrincess LiarPrincess mannequin added topic-unicode type-feature A feature request or enhancement labels Apr 6, 2022
    @LiarPrincess
    Mannequin Author

    LiarPrincess mannequin commented Apr 6, 2022

    CLA is signed, but there is this 'it might take a few days before your tracker profile is updated'.

    Added version 3.11 (present also in previous versions, but no point in back-porting it).

    Github: #32376

    @LiarPrincess LiarPrincess mannequin added the 3.11 only security fixes label Apr 6, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @hauntsaninja
    Contributor

    Thanks, looks like this has been completed
