Speed up ASCII decoding #34362

Closed
loewis mannequin opened this issue Apr 18, 2001 · 8 comments
Assignee: malemburg
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@loewis

loewis mannequin commented Apr 18, 2001

BPO 416953
Nosy @malemburg, @loewis
Files
  • unicode_ascii.patch
  • unicode_ascii.patch2: Alternative patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/malemburg'
    closed_at = <Date 2001-04-23.14:44:32.000>
    created_at = <Date 2001-04-18.05:37:32.000>
    labels = ['interpreter-core']
    title = 'Speed up ASCII decoding'
    updated_at = <Date 2001-04-23.14:44:32.000>
    user = 'https://github.com/loewis'

    bugs.python.org fields:

    activity = <Date 2001-04-23.14:44:32.000>
    actor = 'lemburg'
    assignee = 'lemburg'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2001-04-18.05:37:32.000>
    creator = 'loewis'
    dependencies = []
    files = ['3275', '3276']
    hgrepos = []
    issue_num = 416953
    keywords = ['patch']
    message_count = 8.0
    messages = ['36413', '36414', '36415', '36416', '36417', '36418', '36419', '36420']
    nosy_count = 2.0
    nosy_names = ['lemburg', 'loewis']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue416953'
    versions = []

    @loewis

    loewis mannequin commented Apr 18, 2001

    In code that supports both byte and unicode strings,
    mixing unicode strings with plain character constants
    is frequent. E.g. both sre_compile and xmlproc look for
    specific characters in an input string. Every usage of
    such a character requires default decoding, which will
    create a temporary Unicode object.

    This patch caches Unicode objects that represent ASCII
    characters. On the benchmark

    import time
    u = u""
    t=time.time()
    for i in xrange(1000000):
        u+"("
    print time.time()-t

    it shows a 10% speed-up.
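
The caching idea described above can be modeled roughly as follows. This is an illustrative Python 3 sketch, not the C patch itself: `UnicodeObject`, `_ascii_cache`, and `decode_ascii` are hypothetical names chosen for the example, and the lazy fill-on-first-use policy is an assumption.

```python
class UnicodeObject:
    """Stand-in for a C-level Unicode object (hypothetical, for illustration)."""
    __slots__ = ("text",)

    def __init__(self, text):
        self.text = text


# One slot per ASCII code point, filled lazily on first use (assumed policy).
_ascii_cache = [None] * 128


def decode_ascii(data: bytes) -> UnicodeObject:
    """Decode ASCII bytes; a single ASCII character returns a shared object."""
    if len(data) == 1 and data[0] < 128:
        cached = _ascii_cache[data[0]]
        if cached is None:
            # First request for this character: create it once and remember it.
            cached = _ascii_cache[data[0]] = UnicodeObject(chr(data[0]))
        return cached
    # Longer inputs always get a fresh object.
    return UnicodeObject(data.decode("ascii"))
```

With such a cache, repeated decoding of a constant like `"("` would reuse one object instead of allocating a temporary each time, which is the effect the benchmark is meant to measure.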

    @loewis loewis mannequin closed this as completed Apr 18, 2001
    @loewis loewis mannequin assigned malemburg Apr 18, 2001
    @loewis loewis mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 18, 2001
    @loewis

    loewis mannequin commented Apr 18, 2001

    Logged In: YES
    user_id=21627

    Patch attached.

    @malemburg

    Logged In: YES
    user_id=38388

    I knew this would come one day :-)

    The patch looks OK, but please also add proper init and
    finalize code so that unicode_ascii[] gets cleared up
    properly when the interpreter shuts down (this is important
    for uses of Python in e.g. mod_snake).

    @loewis

    loewis mannequin commented Apr 18, 2001

    Logged In: YES
    user_id=21627

    Committed as 2.83 of unicodeobject.c, with the requested
    addition of init/fini code.

    @loewis

    loewis mannequin commented Apr 21, 2001

    Logged In: YES
    user_id=21627

    Reopened, since the previous patch broke test_unicodedata.

    In this version, the cache is only consulted in DecodeASCII,
    since PyUnicode_FromUnicode must not share objects. It also
    has the requested init/fini code.
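
Why a constructor "must not share objects" can be illustrated with a Python 3 model. This is a sketch under assumptions, not the actual C code: `MutableUnicode`, `from_unicode_shared`, and `from_unicode_fresh` are hypothetical names, and the scenario assumes that some callers fill in or modify the object they get back.

```python
class MutableUnicode:
    """Stand-in for a Unicode object whose buffer callers may write to."""

    def __init__(self, text):
        self.chars = list(text)


_cache = {}


def from_unicode_shared(ch):
    """Return a cached object: unsafe if any caller modifies the result."""
    if ch not in _cache:
        _cache[ch] = MutableUnicode(ch)
    return _cache[ch]


def from_unicode_fresh(ch):
    """Return a new object every time: safe for callers that modify it."""
    return MutableUnicode(ch)


# A caller that treats the result as its own private object...
a = from_unicode_shared("a")
a.chars[0] = "b"
# ...silently corrupts what every later caller receives.
b = from_unicode_shared("a")
```

Restricting the cache to a decoding path whose results are never modified afterwards avoids this hazard without auditing every caller of the general constructor.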

    @loewis

    loewis mannequin commented Apr 21, 2001

    Logged In: YES
    user_id=21627

    I've added an alternative patch, which does return shared
    objects from PyUnicode_FromUnicode, and corrects the two
    places where a caller modified the object returned by
    PyUnicode_FromUnicode.

    @malemburg

    Logged In: YES
    user_id=38388

    Thanks for the update. Digging a little deeper into the
    possibilities of sharing Unicode objects I found that there
    are some important issues to be taken into consideration
    which require a little more work on the sharing code.

    I will work on this during the week and get back to you next
    week.

    @malemburg

    Logged In: YES
    user_id=38388

    Checked in a modified patch.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022