unicodedata_UCD_lookup() has theoretical buffer overflow #68185

tiran · 2015-04-18T22:32:38Z

BPO	23997
Nosy	@malemburg, @pitrou, @vstinner, @tiran, @benjaminp, @ezio-melotti, @serhiy-storchaka
Files	unicode_name_maxlen.patch unicode_name_maxlen_trunc.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2015-04-18.22:32:38.141>
labels = ['extension-modules', 'type-bug']
title = 'unicodedata_UCD_lookup() has theoretical buffer overflow'
updated_at = <Date 2015-12-19.23:12:32.273>
user = 'https://github.com/tiran'

bugs.python.org fields:

activity = <Date 2015-12-19.23:12:32.273>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation = <Date 2015-04-18.22:32:38.141>
creator = 'christian.heimes'
dependencies = []
files = ['39109', '41365']
hgrepos = []
issue_num = 23997
keywords = ['patch']
message_count = 2.0
messages = ['241461', '256744']
nosy_count = 7.0
nosy_names = ['lemburg', 'pitrou', 'vstinner', 'christian.heimes', 'benjamin.peterson', 'ezio.melotti', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23997'
versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

tiran · 2015-04-18T22:32:38Z

Coverity has found a potential buffer overflow in the unicodedata module. The lookup function calls _getcode(), which in turn calls _cmpname(). _cmpname() copies data into a fixed-size buffer of length NAME_MAXLEN. Neither lookup() nor _getcode() limits name_length to NAME_MAXLEN, so the buffer could theoretically overflow.

In practice the buffer overflow can't be abused because Tools/unicode/makeunicodedata.py already limits the maximum name length. I would still like to fix the bug because it is low-hanging fruit. In most versions of Python the code already checks that name_length does not exceed INT_MAX.

CID 1295028 (#1 of 1): Out-of-bounds access (OVERRUN)
overrun-call: Overrunning callee's array of size 256 by passing argument (int)name_length (which evaluates to 2147483647) in call to _getcode
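
For illustration only, the kind of guard discussed above can be modeled in Python. This is a sketch under assumptions, not the contents of either attached patch: `checked_lookup` is a hypothetical helper, the value 256 for `NAME_MAXLEN` is taken from the array size in the Coverity report, and the error text is made up.

```python
import unicodedata

# Assumption: 256 is the array size quoted in the Coverity report above.
NAME_MAXLEN = 256


def checked_lookup(name):
    """Hypothetical wrapper: reject over-long names before the real lookup."""
    # The C code works on the UTF-8 form of the name, so measure bytes here.
    if len(name.encode("utf-8")) > NAME_MAXLEN:
        # Illustrative message only; not the wording used by any patch.
        raise KeyError("undefined character name (too long)")
    return unicodedata.lookup(name)


print(checked_lookup("LATIN SMALL LETTER A"))
```

Valid character names are plain ASCII, so for them the byte length and the character length coincide; the two only differ for non-ASCII input that could never match anyway.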

serhiy-storchaka · 2015-12-19T23:12:32Z

Currently the error message virtually always contains the name (unless its UTF-8 representation is longer than INT_MAX bytes). With unicode_name_maxlen.patch, the message no longer contains the name even when the name is only a few hundred, or even a few tens of, characters long.

The proposed patch makes the error message always contain the name, but truncated to NAME_MAXLEN bytes.

>>> import unicodedata
>>> name = ''.join(map(chr, range(0x2c80, 0x2ce4)))
>>> unicodedata.lookup(name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: "undefined character name 'ⲀⲁⲂⲃⲄⲅⲆⲇⲈⲉⲊⲋⲌⲍⲎⲏⲐⲑⲒⲓⲔⲕⲖⲗⲘⲙⲚⲛⲜⲝⲞⲟⲠⲡⲢⲣⲤⲥⲦⲧⲨⲩⲪⲫⲬⲭⲮⲯⲰⲱⲲⲳⲴⲵⲶⲷⲸⲹⲺⲻⲼⲽⲾⲿⳀⳁⳂⳃⳄⳅⳆⳇⳈⳉⳊⳋⳌⳍⳎⳏⳐⳑⳒⳓⳔ�...'"
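
The arithmetic behind that output can be checked with a short sketch. It assumes NAME_MAXLEN is 256 (the array size from the Coverity report) and only models byte-level truncation in Python; it is not the patch's C code.

```python
# Assumption: 256 is the array size quoted in the Coverity report.
NAME_MAXLEN = 256

name = ''.join(map(chr, range(0x2c80, 0x2ce4)))
encoded = name.encode('utf-8')

print(len(name))     # 100 characters
print(len(encoded))  # 300 bytes: each of these characters takes 3 bytes in UTF-8

# Cutting at a byte boundary keeps 85 whole characters (255 bytes) plus the
# first byte of the 86th, which decodes to U+FFFD when errors are replaced.
truncated = encoded[:NAME_MAXLEN].decode('utf-8', errors='replace')
print(len(truncated))       # 86
print(ascii(truncated[-2:]))
```

This matches the message above: 85 intact characters followed by a single replacement character where the cut falls in the middle of a multi-byte sequence.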

tiran added the type-bug An unexpected behavior, bug, or error label Apr 18, 2015

serhiy-storchaka added the extension-modules C modules in the Modules dir label Dec 19, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
