Speed up ASCII decoding #34362

loewis · 2001-04-18T05:37:32Z

BPO	416953
Nosy	@malemburg, @loewis
Files	unicode_ascii.patch unicode_ascii.patch2: Alternative patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/malemburg'
closed_at = <Date 2001-04-23.14:44:32.000>
created_at = <Date 2001-04-18.05:37:32.000>
labels = ['interpreter-core']
title = 'Speed up ASCII decoding'
updated_at = <Date 2001-04-23.14:44:32.000>
user = 'https://github.com/loewis'

bugs.python.org fields:

activity = <Date 2001-04-23.14:44:32.000>
actor = 'lemburg'
assignee = 'lemburg'
closed = True
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2001-04-18.05:37:32.000>
creator = 'loewis'
dependencies = []
files = ['3275', '3276']
hgrepos = []
issue_num = 416953
keywords = ['patch']
message_count = 8.0
messages = ['36413', '36414', '36415', '36416', '36417', '36418', '36419', '36420']
nosy_count = 2.0
nosy_names = ['lemburg', 'loewis']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue416953'
versions = []

loewis · 2001-04-18T05:37:32Z

In code that supports both byte and unicode strings,
mixing unicode strings with plain character constants
is frequent. E.g. both sre_compile and xmlproc look for
specific characters in an input string. Every usage of
such a character requires default decoding, which will
create a temporary Unicode object.

This patch caches Unicode objects that represent ASCII
characters. On the benchmark

import time

u = u""
t = time.time()
for i in xrange(1000000):
    u + "("  # mixing unicode and str forces an implicit ASCII decode of "("
print time.time() - t

it shows a 10% speed-up.

loewis · 2001-04-18T06:04:31Z

Logged In: YES
user_id=21627

Attached patch.

malemburg · 2001-04-18T08:54:35Z

Logged In: YES
user_id=38388

I knew this would come one day :-)

The patch looks OK, but please also add proper init and
finalize code so that unicode_ascii[] gets cleaned up
properly when the interpreter shuts down (this is important
for uses of Python in e.g. mod_snake).

loewis · 2001-04-18T12:51:03Z

Logged In: YES
user_id=21627

Committed as 2.83 of unicodeobject.c, with the requested
addition of init/fini code.

loewis · 2001-04-21T12:15:56Z

Logged In: YES
user_id=21627

Reopened, since the previous patch broke test_unicodedata.

In this version, the cache is only consulted in DecodeASCII,
since PyUnicode_FromUnicode must not share objects. It also
has the requested init/fini code.
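
Why sharing is unsafe when a caller fills in the result afterwards can be illustrated with a mutable stand-in. This is a sketch only: `Buf` and `from_unicode_shared` are hypothetical, and it does not reproduce the actual test_unicodedata failure.

```python
class Buf:
    """Hypothetical mutable stand-in for a freshly allocated Unicode object."""

    def __init__(self, text):
        self.chars = list(text)


_cache = {}


def from_unicode_shared(text):
    """Return a shared Buf for single ASCII characters, a new one otherwise."""
    if len(text) == 1 and ord(text) < 128:
        return _cache.setdefault(text, Buf(text))
    return Buf(text)


a = from_unicode_shared("a")
a.chars[0] = "b"               # a caller modifies its result in place
b = from_unicode_shared("a")   # a later caller asks for "a" again
corrupted = b.chars            # the shared object now holds "b", not "a"
```

Any constructor whose callers are allowed to write into the returned object has to hand out a fresh object each time, which is why the cache is consulted only on the decoding path in this version.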

loewis · 2001-04-21T14:29:43Z

Logged In: YES
user_id=21627

I've added an alternative patch, which does return shared
objects from PyUnicode_FromUnicode, and corrects the two
places where a caller modified the object returned by
PyUnicode_FromUnicode.

malemburg · 2001-04-23T11:26:50Z

Logged In: YES
user_id=38388

Thanks for the update. Digging a little deeper into the
possibilities of sharing Unicode objects, I found some
important issues that need to be taken into consideration
and that require a little more work on the sharing code.

I will work on this during the week and get back to you next
week.

malemburg · 2001-04-23T14:44:32Z

Logged In: YES
user_id=38388

Checked in a modified patch.

loewis mannequin closed this as completed Apr 18, 2001

loewis mannequin assigned malemburg Apr 18, 2001

loewis mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 18, 2001

ezio-melotti transferred this issue from another repository Apr 9, 2022
