Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array constructor and array.fromstring should accept bytearray. #53236

Closed
tjol mannequin opened this issue Jun 13, 2010 · 18 comments
Closed

array constructor and array.fromstring should accept bytearray. #53236

tjol mannequin opened this issue Jun 13, 2010 · 18 comments
Labels
extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@tjol
Copy link
Mannequin

tjol mannequin commented Jun 13, 2010

BPO 8990
Nosy @terryjreedy, @abalkin, @pitrou, @vstinner, @tjol
Files
  • array.diff: array.fromstring patch
  • array2.diff: patch including test
  • array_3.2_fromstring.diff: updated version of the patch
  • tofrombytes.diff: patch introducing to/frombytes
  • tostring_usage.diff: remove to/fromstring usage in stdlib
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-09-01.20:30:46.820>
    created_at = <Date 2010-06-13.19:42:52.931>
    labels = ['extension-modules', 'type-bug', 'library']
    title = 'array constructor and array.fromstring should accept bytearray.'
    updated_at = <Date 2010-09-01.20:30:46.818>
    user = 'https://github.com/tjol'

    bugs.python.org fields:

    activity = <Date 2010-09-01.20:30:46.818>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-09-01.20:30:46.820>
    closer = 'pitrou'
    components = ['Extension Modules', 'Library (Lib)']
    creation = <Date 2010-06-13.19:42:52.931>
    creator = 'tjollans'
    dependencies = []
    files = ['17658', '17749', '17826', '18697', '18698']
    hgrepos = []
    issue_num = 8990
    keywords = ['patch']
    message_count = 18.0
    messages = ['107744', '108159', '108161', '108163', '108164', '108186', '108226', '108312', '108419', '109028', '109049', '109051', '111872', '114675', '114684', '115271', '115274', '115331']
    nosy_count = 5.0
    nosy_names = ['terry.reedy', 'belopolsky', 'pitrou', 'vstinner', 'tjollans']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue8990'
    versions = ['Python 3.2']

    @tjol
    Copy link
    Mannequin Author

    tjol mannequin commented Jun 13, 2010

    Currently, the array constructor, if given a bytearray, detects this with PyByteArray_Check, and hands it on to array_fromstring, which does not support bytearray (by using "s#" with PyArg_ParseTuple) and raises TypeError.

    >>> array('h', bytearray(b'xyxyxyxyxyxyxyxyxy'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: must be bytes or read-only buffer, not bytearray
    >>>  

    I see no reason to insist on read-only buffers. I'm attaching a patch that I think fixes this.

    @tjol tjol mannequin added extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jun 13, 2010
    @terryjreedy
    Copy link
    Member

    terryjreedy commented Jun 19, 2010

    Summary: real bug, needs test, patch provided, looks plausible but I am not qualified to even read-review it, let alone apply and test.

    (I am removing 3.3 because that is only listed for syntax feature requests that cannot go into 3.2 because of its moratorium on such.)

    I agree that this is a bug. For 3.1, the '8.6. array' doc says
    "class array.array(typecode[, initializer])
    A new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, object supporting the buffer interface, or iterable over elements of the appropriate type."

    I believe bytearray 'supports the buffer interface', whatever that means in 3.x. 'Readonly' is definitely *not* specified. Even if bytearray did not qualify on that, instances are *definitely* iterable.

    The doc goes on: "If given a list or string, the initializer is passed to the new array’s fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method."

    For 3.x, this is pretty bad as it is copied without change from 2.x and uses 'string' in the 2.x sense. It seems to me that fromstring/fromunicode should have been renamed to frombytes/fromstring with a 2to3 fixer. Too late now, I suppose. However, if indeed 'fromstring' is intended to mean 'from bytes only and not bytearrays', then bytearrays should be passed on to .extend, which accepts them as well as anything else.

    >>> from array import array
    >>> a=array('h')
    >>> a.extend(bytearray(b'xjxjx'))
    >>> a
    array('h', [120, 106, 120, 106, 120])

    The two possible fixes:

    1. Extend .fromstring to accept bytearrays. Its current doc "Appends items from the string, interpreting the string as an array of machine values " to me supports doing that. This is what Thomas's patch does.

    2. If that is rejected, then bytearray initializers should be passed to .extend.

    Thomas: a complete patch needs to include a update to the array unit test suite. This can then move to the patch review stage.

    @abalkin
    Copy link
    Member

    abalkin commented Jun 19, 2010

    Thomas' patch does more than just allow bytearray. It allows any object that can present itself as a buffer with byte-size items. It is a bit unfortunate that such method will end up being called "fromstring" in 3.x. With string being unicode, this is hopelessly ambiguous. Is the string interpreted as array of bytes, UTF-8 style, 16 or 32-bit integers? It would be much better to have frombuffer, and even better extendfrombuffer method with this functionality.

    I am +1 on Terry's option 2. It may be possible to optimize .extend to detect objects that support buffer interface and bypass iterator protocol. Of course, when receiving array is not type 'b', extend and fromxyz are not the same, so maybe we should just rename fromunicode to fromstring and fromstring to frombytes?

    @terryjreedy
    Copy link
    Member

    terryjreedy commented Jun 19, 2010

    Whatever is done, I think a bytearray should be handled the same as bytes. It must be that they give the same result. In basic operations, I believe that bytearrays can *always* substitute for bytes. "Bytes and bytearray objects contain single bytes – the former is immutable while the latter is a mutable sequence." For example:
    >>> b'abc'.capitalize()
    b'Abc'
    >>> bytearray(b'abc').capitalize()
    bytearray(b'Abc')

    This, to me, implies that .fromstring should accept bytearray (though probably not general buffer objects). In 2.x, I understand .fromstring to initialize an array from machine bytes read into a string, but not .fromunicode. This is *not* a text method, and the result may vary for 2 and 4-byte unicode builds. So I can see that 3.x needs .fromstring(bytes) but not .fromunicode(string).

    @abalkin
    Copy link
    Member

    abalkin commented Jun 19, 2010

    This, to me, implies that .fromstring should accept bytearray (though
    probably not general buffer objects). In 2.x, I understand .fromstring
    to initialize an array from machine bytes read into a string, but not
    .fromunicode. This is *not* a text method, and the result may vary for 2
    and 4-byte unicode builds. So I can see that 3.x needs
    .fromstring(bytes) but not .fromunicode(string).

    No, fromunicode is still needed. It is used to fill type 'u' array with unicode wide characters. In 3.x unicode type was renamed to str and str to bytes, so fromunicode should become fromstring (or maybe fromstr to avoid confusion) and fromstring should become frombytes.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 19, 2010

    On the principle the patch looks good. In practice, it lacks a call to PyBuffer_Release(&buffer) in the various error cases (before returning NULL). It also lacks some tests in Lib/test/test_array.py.

    In 3.x unicode type was renamed to str and str to bytes, so fromunicode
    should become fromstring (or maybe fromstr to avoid confusion) and
    fromstring should become frombytes.

    This should be discussed separately, perhaps on python-dev first.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 20, 2010

    About the patch:

    • Why not reading itemsize from buffer.itemsize?
    • A test would be nice in Lib/test/test_array.py (in test_tofromstring?). Please add also a.extend(bytearray(b'xjxjx')) to the test suite.

    I agree that it is a bugfix and so i can be applied to 3.1 (and 3.2).

    --

    fromstring should become frombytes.

    I remember an issue proposing that. I also prefer .frombytes() name: we can add .frombytes() and keep .fromstring() as an alias to .frombytes() (and maybe mark it as deprecated).

    A message:
    http://bugs.python.org/issue3565#msg71205

    An issue starting at:
    http://bugs.python.org/issue1023290#msg92069

    Can someone open a issue about .frombytes()?

    fromunicode is still needed.

    Can someone open a new issue about .fromunicodes()?

    @tjol
    Copy link
    Mannequin Author

    tjol mannequin commented Jun 21, 2010

    Thanks for the input. I'm going to re-work the patch a bit (releasing buffers and such) and add a test within the next few days.

    The question remains whether or not to accept other buffers with itemsize == 1. The way I understand it, fromstring already accepted any read-only buffer object, no matter the item size / whether it actually makes sense to call it a "string". I don't think accepting a hypothetical read-only buffer with items wider than 1 in fromstring (yes, bad naming) is desirable behaviour - I see a few options on how to deal with input validation:

    1. ignore the item size. This'd be similar to current behaviour, plus r/w buffers

    2. only accept byte-based buffers. ("things that look like 'const char*'") - this is what I've been aiming at.

    3. only accept bytes and bytearray, and let the user think about how to deal with other objects. Question is - shouldn't array('B') be treated like bytearray in this respect?

    @tjol
    Copy link
    Mannequin Author

    tjol mannequin commented Jun 22, 2010

    OK, here's the new patch. I added tests for array(typecode, bytearray(b'abab')), a.extend(b'123') and a.extend(bytearray(b'123')).

    @victor: int itemsize is the array's item size, buffer.itemsize is the strings' (and must be 1)

    PROBLEM with this patch:

    I changed "s#" to "y*". This means that str arguments are no longer accepted by fromstring. I don't think they ever should have been in 3.x, but it is an incompatible change and this got the test suite, which (I assume the code hasn't changed since 2.x) used a str argument. (changed in patch). It might be best to use "s*" instead of "y*", especially if this is applied to 3.1?

    @vstinner
    Copy link
    Member

    vstinner commented Jul 1, 2010

    array2.diff:

    • can you reindent the line "Py_ssize_t old_size = Py_SIZE(self);"?(even if it's not part of your patch)
    • you can avoid the "char *str;" variable (use directly buffer.buf)

    --

    I changed "s#" to "y*". This means that str arguments are no longer
    accepted by fromstring (...) it is an incompatible change

    It's maybe time to create .frombytes() and .tobytes() methods:

    • .tostring() will be a (deprecated?) alias to .tobytes()
    • .frombytes() only accepts bytes, bytearray and buffer compatible objects: use "y*" format
    • .fromstring() accepts str, bytes, bytearray and buffer compatible objects (encode str to utf-8): use "s*" format

    But I still don't understand why array.fromstring() accepts character strings. So an easier solution is to apply array2.diff to Python 3.2, and replace "y*" by "s*" when the patch is applied to 3.1.

    @tjol
    Copy link
    Mannequin Author

    tjol mannequin commented Jul 1, 2010

    Two more patches:

    Firstly, this patch (array_3.2_fromstring.diff) is nearly identical to array2.diff. "y*" would (again) have to be changed to "s*" to apply this to 3.1

    @tjol
    Copy link
    Mannequin Author

    tjol mannequin commented Jul 1, 2010

    Secondly, this is my attempt to add the more sensibly named {to|from}bytes methods, and to deprecate {to|from}string.

    I doubt it's perfect, maybe there's some policy on deprecating methods that I didn't find? This may be better discussed in a separate forum, eg a separate issue or python-dev maybe? I wouldn't know.

    The unpatched test suite passes with the rest of this patch applied. (unless using -Werror)

    @vstinner
    Copy link
    Member

    vstinner commented Jul 28, 2010

    I prefer the second solution (add to/frombytes, deprecate to/fromstring) because I prefer the new method names and it keeps backward compatibility (until we choose to remove the old methods, which should be in Python 3.3).

    About the patch (tofrombytes.diff). You should use PyErr_WarnEx() result: if it is not nul (eg. if the user choosed to raise exceptions on warnings), you have to exit.

    @tjol
    Copy link
    Mannequin Author

    tjol mannequin commented Aug 22, 2010

    Hello again, sorry for the absense.

    Victor, thanks for the input. I've attached a new patch that checks the PyErr_WarnEx return value.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 22, 2010

    From a quick glance, the latest patch looks ok.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 31, 2010

    The patch shouldn't remove the tests for tostring()/fromstring() (they might be deprecated, but they are still supported).

    @tjol
    Copy link
    Mannequin Author

    tjol mannequin commented Aug 31, 2010

    That sounds reasonable. I've updated the patch to keep the old test_tofromstring testcase.

    I'll also attach another patch in a moment that removes what I'm reasonably sure is all the uses of array.tostring and .fromstring in the standard library and the other modules' tests.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 1, 2010

    The patches were committed in r84403. Thank you very much!

    @pitrou pitrou closed this as completed Sep 1, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants