array constructor and array.fromstring should accept bytearray. #53236

tjol · 2010-06-13T19:42:53Z

BPO	8990
Nosy	@terryjreedy, @abalkin, @pitrou, @vstinner, @tjol
Files	array.diff: array.fromstring patch array2.diff: patch including test array_3.2_fromstring.diff: updated version of the patch tofrombytes.diff: patch introducing to/frombytes tostring_usage.diff: remove to/fromstring usage in stdlib

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2010-09-01.20:30:46.820>
created_at = <Date 2010-06-13.19:42:52.931>
labels = ['extension-modules', 'type-bug', 'library']
title = 'array constructor and array.fromstring should accept bytearray.'
updated_at = <Date 2010-09-01.20:30:46.818>
user = 'https://github.com/tjol'

bugs.python.org fields:

activity = <Date 2010-09-01.20:30:46.818>
actor = 'pitrou'
assignee = 'none'
closed = True
closed_date = <Date 2010-09-01.20:30:46.820>
closer = 'pitrou'
components = ['Extension Modules', 'Library (Lib)']
creation = <Date 2010-06-13.19:42:52.931>
creator = 'tjollans'
dependencies = []
files = ['17658', '17749', '17826', '18697', '18698']
hgrepos = []
issue_num = 8990
keywords = ['patch']
message_count = 18.0
messages = ['107744', '108159', '108161', '108163', '108164', '108186', '108226', '108312', '108419', '109028', '109049', '109051', '111872', '114675', '114684', '115271', '115274', '115331']
nosy_count = 5.0
nosy_names = ['terry.reedy', 'belopolsky', 'pitrou', 'vstinner', 'tjollans']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8990'
versions = ['Python 3.2']

tjol · 2010-06-13T19:42:52Z

Currently, the array constructor, if given a bytearray, detects this with PyByteArray_Check, and hands it on to array_fromstring, which does not support bytearray (by using "s#" with PyArg_ParseTuple) and raises TypeError.

>>> array('h', bytearray(b'xyxyxyxyxyxyxyxyxy'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes or read-only buffer, not bytearray
>>>

I see no reason to insist on read-only buffers. I'm attaching a patch that I think fixes this.

terryjreedy · 2010-06-19T01:05:24Z

Summary: real bug, needs test, patch provided, looks plausible but I am not qualified to even read-review it, let alone apply and test.

(I am removing 3.3 because that is only listed for syntax feature requests that cannot go into 3.2 because of its moratorium on such.)

I agree that this is a bug. For 3.1, the '8.6. array' doc says
"class array.array(typecode[, initializer])
A new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, object supporting the buffer interface, or iterable over elements of the appropriate type."

I believe bytearray 'supports the buffer interface', whatever that means in 3.x. 'Readonly' is definitely *not* specified. Even if bytearray did not qualify on that, instances are *definitely* iterable.

The doc goes on: "If given a list or string, the initializer is passed to the new array’s fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method."

For 3.x, this is pretty bad as it is copied without change from 2.x and uses 'string' in the 2.x sense. It seems to me that fromstring/fromunicode should have been renamed to frombytes/fromstring with a 2to3 fixer. Too late now, I suppose. However, if indeed 'fromstring' is intended to mean 'from bytes only and not bytearrays', then bytearrays should be passed on to .extend, which accepts them as well as anything else.

>>> from array import array
>>> a=array('h')
>>> a.extend(bytearray(b'xjxjx'))
>>> a
array('h', [120, 106, 120, 106, 120])

The two possible fixes:

Extend .fromstring to accept bytearrays. Its current doc "Appends items from the string, interpreting the string as an array of machine values " to me supports doing that. This is what Thomas's patch does.
If that is rejected, then bytearray initializers should be passed to .extend.

Thomas: a complete patch needs to include a update to the array unit test suite. This can then move to the patch review stage.

abalkin · 2010-06-19T02:17:25Z

Thomas' patch does more than just allow bytearray. It allows any object that can present itself as a buffer with byte-size items. It is a bit unfortunate that such method will end up being called "fromstring" in 3.x. With string being unicode, this is hopelessly ambiguous. Is the string interpreted as array of bytes, UTF-8 style, 16 or 32-bit integers? It would be much better to have frombuffer, and even better extendfrombuffer method with this functionality.

I am +1 on Terry's option 2. It may be possible to optimize .extend to detect objects that support buffer interface and bypass iterator protocol. Of course, when receiving array is not type 'b', extend and fromxyz are not the same, so maybe we should just rename fromunicode to fromstring and fromstring to frombytes?

terryjreedy · 2010-06-19T03:19:03Z

Whatever is done, I think a bytearray should be handled the same as bytes. It must be that they give the same result. In basic operations, I believe that bytearrays can *always* substitute for bytes. "Bytes and bytearray objects contain single bytes – the former is immutable while the latter is a mutable sequence." For example:
>>> b'abc'.capitalize()
b'Abc'
>>> bytearray(b'abc').capitalize()
bytearray(b'Abc')

This, to me, implies that .fromstring should accept bytearray (though probably not general buffer objects). In 2.x, I understand .fromstring to initialize an array from machine bytes read into a string, but not .fromunicode. This is *not* a text method, and the result may vary for 2 and 4-byte unicode builds. So I can see that 3.x needs .fromstring(bytes) but not .fromunicode(string).

abalkin · 2010-06-19T03:34:13Z

This, to me, implies that .fromstring should accept bytearray (though
probably not general buffer objects). In 2.x, I understand .fromstring
to initialize an array from machine bytes read into a string, but not
.fromunicode. This is *not* a text method, and the result may vary for 2
and 4-byte unicode builds. So I can see that 3.x needs
.fromstring(bytes) but not .fromunicode(string).

No, fromunicode is still needed. It is used to fill type 'u' array with unicode wide characters. In 3.x unicode type was renamed to str and str to bytes, so fromunicode should become fromstring (or maybe fromstr to avoid confusion) and fromstring should become frombytes.

pitrou · 2010-06-19T15:24:09Z

On the principle the patch looks good. In practice, it lacks a call to PyBuffer_Release(&buffer) in the various error cases (before returning NULL). It also lacks some tests in Lib/test/test_array.py.

In 3.x unicode type was renamed to str and str to bytes, so fromunicode
should become fromstring (or maybe fromstr to avoid confusion) and
fromstring should become frombytes.

This should be discussed separately, perhaps on python-dev first.

vstinner · 2010-06-20T08:53:27Z

About the patch:

Why not reading itemsize from buffer.itemsize?
A test would be nice in Lib/test/test_array.py (in test_tofromstring?). Please add also a.extend(bytearray(b'xjxjx')) to the test suite.

I agree that it is a bugfix and so i can be applied to 3.1 (and 3.2).

--

fromstring should become frombytes.

I remember an issue proposing that. I also prefer .frombytes() name: we can add .frombytes() and keep .fromstring() as an alias to .frombytes() (and maybe mark it as deprecated).

A message:
http://bugs.python.org/issue3565#msg71205

An issue starting at:
http://bugs.python.org/issue1023290#msg92069

Can someone open a issue about .frombytes()?

fromunicode is still needed.

Can someone open a new issue about .fromunicodes()?

tjol · 2010-06-21T20:09:49Z

Thanks for the input. I'm going to re-work the patch a bit (releasing buffers and such) and add a test within the next few days.

The question remains whether or not to accept other buffers with itemsize == 1. The way I understand it, fromstring already accepted any read-only buffer object, no matter the item size / whether it actually makes sense to call it a "string". I don't think accepting a hypothetical read-only buffer with items wider than 1 in fromstring (yes, bad naming) is desirable behaviour - I see a few options on how to deal with input validation:

ignore the item size. This'd be similar to current behaviour, plus r/w buffers
only accept byte-based buffers. ("things that look like 'const char*'") - this is what I've been aiming at.
only accept bytes and bytearray, and let the user think about how to deal with other objects. Question is - shouldn't array('B') be treated like bytearray in this respect?

tjol · 2010-06-22T21:11:32Z

OK, here's the new patch. I added tests for array(typecode, bytearray(b'abab')), a.extend(b'123') and a.extend(bytearray(b'123')).

@victor: int itemsize is the array's item size, buffer.itemsize is the strings' (and must be 1)

PROBLEM with this patch:

I changed "s#" to "y*". This means that str arguments are no longer accepted by fromstring. I don't think they ever should have been in 3.x, but it is an incompatible change and this got the test suite, which (I assume the code hasn't changed since 2.x) used a str argument. (changed in patch). It might be best to use "s*" instead of "y*", especially if this is applied to 3.1?

vstinner · 2010-07-01T02:30:16Z

array2.diff:

can you reindent the line "Py_ssize_t old_size = Py_SIZE(self);"?(even if it's not part of your patch)
you can avoid the "char *str;" variable (use directly buffer.buf)

--

I changed "s#" to "y*". This means that str arguments are no longer
accepted by fromstring (...) it is an incompatible change

It's maybe time to create .frombytes() and .tobytes() methods:

.tostring() will be a (deprecated?) alias to .tobytes()
.frombytes() only accepts bytes, bytearray and buffer compatible objects: use "y*" format
.fromstring() accepts str, bytes, bytearray and buffer compatible objects (encode str to utf-8): use "s*" format

But I still don't understand why array.fromstring() accepts character strings. So an easier solution is to apply array2.diff to Python 3.2, and replace "y*" by "s*" when the patch is applied to 3.1.

tjol · 2010-07-01T14:00:51Z

Two more patches:

Firstly, this patch (array_3.2_fromstring.diff) is nearly identical to array2.diff. "y*" would (again) have to be changed to "s*" to apply this to 3.1

tjol · 2010-07-01T14:09:15Z

Secondly, this is my attempt to add the more sensibly named {to|from}bytes methods, and to deprecate {to|from}string.

I doubt it's perfect, maybe there's some policy on deprecating methods that I didn't find? This may be better discussed in a separate forum, eg a separate issue or python-dev maybe? I wouldn't know.

The unpatched test suite passes with the rest of this patch applied. (unless using -Werror)

vstinner · 2010-07-28T23:25:32Z

I prefer the second solution (add to/frombytes, deprecate to/fromstring) because I prefer the new method names and it keeps backward compatibility (until we choose to remove the old methods, which should be in Python 3.3).

About the patch (tofrombytes.diff). You should use PyErr_WarnEx() result: if it is not nul (eg. if the user choosed to raise exceptions on warnings), you have to exit.

tjol · 2010-08-22T12:35:41Z

Hello again, sorry for the absense.

Victor, thanks for the input. I've attached a new patch that checks the PyErr_WarnEx return value.

pitrou · 2010-08-22T17:59:31Z

From a quick glance, the latest patch looks ok.

pitrou · 2010-08-31T18:10:26Z

The patch shouldn't remove the tests for tostring()/fromstring() (they might be deprecated, but they are still supported).

tjol · 2010-08-31T19:35:13Z

That sounds reasonable. I've updated the patch to keep the old test_tofromstring testcase.

I'll also attach another patch in a moment that removes what I'm reasonably sure is all the uses of array.tostring and .fromstring in the standard library and the other modules' tests.

pitrou · 2010-09-01T20:30:47Z

The patches were committed in r84403. Thank you very much!

tjol mannequin added extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jun 13, 2010

pitrou closed this as completed Sep 1, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

array constructor and array.fromstring should accept bytearray. #53236

array constructor and array.fromstring should accept bytearray. #53236

tjol mannequin commented Jun 13, 2010

tjol mannequin commented Jun 13, 2010

terryjreedy commented Jun 19, 2010

abalkin commented Jun 19, 2010

terryjreedy commented Jun 19, 2010

abalkin commented Jun 19, 2010

pitrou commented Jun 19, 2010

vstinner commented Jun 20, 2010

tjol mannequin commented Jun 21, 2010

tjol mannequin commented Jun 22, 2010

vstinner commented Jul 1, 2010

tjol mannequin commented Jul 1, 2010

tjol mannequin commented Jul 1, 2010

vstinner commented Jul 28, 2010

tjol mannequin commented Aug 22, 2010

pitrou commented Aug 22, 2010

pitrou commented Aug 31, 2010

tjol mannequin commented Aug 31, 2010

pitrou commented Sep 1, 2010

array constructor and array.fromstring should accept bytearray. #53236

array constructor and array.fromstring should accept bytearray. #53236

Comments

tjol mannequin commented Jun 13, 2010

tjol mannequin commented Jun 13, 2010

terryjreedy commented Jun 19, 2010

abalkin commented Jun 19, 2010

terryjreedy commented Jun 19, 2010

abalkin commented Jun 19, 2010

pitrou commented Jun 19, 2010

vstinner commented Jun 20, 2010

tjol mannequin commented Jun 21, 2010

tjol mannequin commented Jun 22, 2010

vstinner commented Jul 1, 2010

tjol mannequin commented Jul 1, 2010

tjol mannequin commented Jul 1, 2010

vstinner commented Jul 28, 2010

tjol mannequin commented Aug 22, 2010

pitrou commented Aug 22, 2010

pitrou commented Aug 31, 2010

tjol mannequin commented Aug 31, 2010

pitrou commented Sep 1, 2010