Use _PyBytesWriter for unicode escape and raw unicode escape encoders #69540

vstinner · 2015-10-09T12:13:43Z

BPO	25353
Nosy	@vstinner, @serhiy-storchaka
Files	unicode_escape.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2015-10-12.20:41:38.625>
created_at = <Date 2015-10-09.12:13:42.699>
labels = ['performance']
title = 'Use _PyBytesWriter for unicode escape and raw unicode escape encoders'
updated_at = <Date 2015-10-12.20:41:38.624>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2015-10-12.20:41:38.624>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2015-10-12.20:41:38.625>
closer = 'vstinner'
components = []
creation = <Date 2015-10-09.12:13:42.699>
creator = 'vstinner'
dependencies = []
files = ['40727']
hgrepos = []
issue_num = 25353
keywords = ['patch']
message_count = 4.0
messages = ['252599', '252600', '252601', '252888']
nosy_count = 3.0
nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue25353'
versions = ['Python 3.6']

vstinner · 2015-10-09T12:13:42Z

Attached patch modifies unicode escape and raw unicode escape encoders to use the new _PyBytesWriter API.

The patch is optimized for Latin1 strings: encoding a Latin1 string in which no character needs escaping should not have to call _PyBytes_Resize() at all.
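The size arithmetic behind the Latin1 fast path can be checked with a small standalone C sketch. It is not the patch's code, and the helper names `unicode_escape_char_len` and `unicode_escape_latin1_len` are hypothetical. Under the unicode escape codec, a printable ASCII character is copied as one byte. A backslash, tab, newline, or carriage return becomes a 2-byte escape, and any other Latin1 character becomes `\xHH` (4 bytes). So when nothing needs escaping, the output is exactly as long as the input, and a buffer sized to the input length never needs resizing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes the unicode escape codec emits
   for one code point.
     backslash, tab, newline, carriage return -> 2 bytes ("\\" "\t" ...)
     printable ASCII 0x20..0x7e               -> 1 byte (copied as-is)
     other code points below 0x100            -> 4 bytes ("\xHH")
     other BMP code points                    -> 6 bytes ("\uHHHH")
     non-BMP code points                      -> 10 bytes ("\UHHHHHHHH") */
static size_t unicode_escape_char_len(uint32_t ch)
{
    /* Checked first: tab, newline and CR are below 0x20. */
    if (ch == '\\' || ch == '\t' || ch == '\n' || ch == '\r')
        return 2;
    if (ch >= 0x20 && ch < 0x7f)
        return 1;
    if (ch < 0x100)
        return 4;
    if (ch < 0x10000)
        return 6;
    return 10;
}

/* Exact encoded size of a Latin1 string (one byte per character). */
static size_t unicode_escape_latin1_len(const unsigned char *s, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += unicode_escape_char_len(s[i]);
    assert(total >= n);   /* escaping never shrinks the output */
    return total;
}
```

For example, `"hello"` encodes to 5 bytes (no resize needed when 5 bytes are preallocated), while `"caf\xe9"` needs 7, because the `é` expands to `\xe9`.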

When characters are escaped, or when a BMP or non-BMP string is encoded, overallocation is used to reduce the number of _PyBytes_Resize() calls. It uses the _PyBytesWriter overallocation strategy instead of always overallocating for the worst case.
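The following is a minimal C model of why overallocation cuts down on resize calls. It only simulates capacities and does not allocate memory. The names `model_writer`, `model_write` and `count_resizes` are hypothetical, and the 25% growth factor is an assumption for illustration, not a statement of _PyBytesWriter's exact policy. Suppose the buffer grows to exactly the required size. Then every write past the initial allocation triggers a resize. Growing by an extra fraction of the required size instead makes the number of resizes roughly logarithmic in the output size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a growable output buffer. */
typedef struct {
    size_t size;      /* bytes written so far */
    size_t capacity;  /* bytes currently "allocated" */
    size_t resizes;   /* number of simulated resize calls */
} model_writer;

static void model_write(model_writer *w, size_t nbytes, int overallocate)
{
    size_t needed = w->size + nbytes;
    if (needed > w->capacity) {
        size_t newcap = needed;
        if (overallocate)
            newcap += needed / 4;   /* assumed 25% overallocation */
        w->capacity = newcap;
        w->resizes++;
    }
    w->size = needed;
    assert(w->size <= w->capacity);
}

/* Simulate n writes of `width` bytes each (e.g. 4-byte "\xHH" escapes),
   starting from `initial` bytes of capacity. */
static size_t count_resizes(size_t n, size_t width, size_t initial,
                            int overallocate)
{
    model_writer w = {0, initial, 0};
    for (size_t i = 0; i < n; i++)
        model_write(&w, width, overallocate);
    return w.resizes;
}
```

Take 1000 escaped characters with 1000 bytes initially reserved. Exact growth resizes on each of the last 750 writes. With 25% overallocation the count drops to a handful. Always overallocating for the worst case would instead reserve up to 10 bytes per character up front (the length of `\UHHHHHHHH`), which wastes memory when few characters are escaped.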

_PyBytesWriter also embeds a small buffer allocated on the stack, which avoids calls to _PyBytes_Resize() when the output fits into 512 bytes.
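The small-buffer idea can be sketched as a standalone C writer. This is a simplified illustration, not CPython's _PyBytesWriter: the struct `small_writer` and the functions `sw_init`, `sw_write` and `sw_free` are hypothetical, and the 25% overallocation on spill is an assumption. Output goes into an embedded 512-byte array, which lives on the stack when the writer is a local variable. Heap memory is allocated only once the output outgrows it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_BUFFER_SIZE 512

/* Hypothetical writer with an embedded small buffer. */
typedef struct {
    char *buf;               /* points to small_buffer or to heap memory */
    size_t size;             /* bytes written */
    size_t capacity;         /* bytes available in buf */
    int heap_allocations;    /* number of malloc/realloc calls made */
    char small_buffer[SMALL_BUFFER_SIZE];
} small_writer;

static void sw_init(small_writer *w)
{
    w->buf = w->small_buffer;
    w->size = 0;
    w->capacity = SMALL_BUFFER_SIZE;
    w->heap_allocations = 0;
}

/* Append n bytes; returns 0 on success, -1 on allocation failure. */
static int sw_write(small_writer *w, const char *data, size_t n)
{
    if (w->size + n > w->capacity) {
        size_t newcap = w->size + n;
        newcap += newcap / 4;                /* assumed overallocation */
        char *p;
        if (w->buf == w->small_buffer) {
            /* First spill: move the data out of the embedded buffer. */
            p = malloc(newcap);
            if (p == NULL)
                return -1;
            memcpy(p, w->small_buffer, w->size);
        } else {
            p = realloc(w->buf, newcap);
            if (p == NULL)
                return -1;
        }
        w->buf = p;
        w->capacity = newcap;
        w->heap_allocations++;
    }
    memcpy(w->buf + w->size, data, n);
    w->size += n;
    assert(w->size <= w->capacity);
    return 0;
}

/* Release heap memory (if any) and reset the writer. */
static void sw_free(small_writer *w)
{
    if (w->buf != w->small_buffer)
        free(w->buf);
    sw_init(w);
}
```

Writing 500 bytes stays entirely in the embedded buffer with no heap allocation. The first write that pushes the total past 512 bytes copies the data to the heap once. Because `buf` can point into the struct itself, a writer like this must not be copied or moved while in use.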

vstinner · 2015-10-09T12:15:28Z

A few more encoders should be updated to use the _PyBytesWriter API:

Code Page (Windows only)
Charmap
UTF-7
UTF-16
UTF-32

vstinner · 2015-10-09T12:16:37Z

The _PyBytesWriter API was added in issue bpo-25318. See also issue bpo-25349, which optimized bytes % args.

python-dev · 2015-10-12T20:39:29Z

New changeset 8e27f8398a4f by Victor Stinner in branch 'default':
Issue bpo-25353: Optimize unicode escape and raw unicode escape encoders to use
https://hg.python.org/cpython/rev/8e27f8398a4f

vstinner added the performance (Performance or resource usage) label Oct 9, 2015

vstinner closed this as completed Oct 12, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
