Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base64 module should use memoryview #62039

Closed
ncoghlan opened this issue Apr 25, 2013 · 20 comments
Closed

base64 module should use memoryview #62039

ncoghlan opened this issue Apr 25, 2013 · 20 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ncoghlan
Copy link
Contributor

BPO 17839
Nosy @warsaw, @ncoghlan, @pitrou, @vstinner, @ezio-melotti, @serhiy-storchaka, @kushaldas
Dependencies
  • bpo-17842: Add base64 module tests for a bytearray argument
  • Files
  • issue17839_v1.patch: Code update and test case update
  • base64_buffer_input.patch
  • base64_buffer_input_2.patch
  • issue17839_base64_buffer_input_3.patch: Now with memoryview support for the codec interface
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2013-10-02.15:03:44.785>
    created_at = <Date 2013-04-25.07:32:34.773>
    labels = ['type-feature', 'library']
    title = 'base64 module should use memoryview'
    updated_at = <Date 2013-11-13.14:25:23.484>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2013-11-13.14:25:23.484>
    actor = 'python-dev'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2013-10-02.15:03:44.785>
    closer = 'python-dev'
    components = ['Library (Lib)']
    creation = <Date 2013-04-25.07:32:34.773>
    creator = 'ncoghlan'
    dependencies = ['17842']
    files = ['30022', '30044', '30337', '30346']
    hgrepos = []
    issue_num = 17839
    keywords = ['patch']
    message_count = 20.0
    messages = ['187760', '187778', '187837', '187860', '187897', '187911', '187919', '187984', '189537', '189580', '189793', '189858', '189860', '189861', '189900', '198832', '198835', '198843', '198844', '202752']
    nosy_count = 8.0
    nosy_names = ['barry', 'ncoghlan', 'pitrou', 'vstinner', 'ezio.melotti', 'python-dev', 'serhiy.storchaka', 'kushal.das']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17839'
    versions = ['Python 3.4']

    @ncoghlan
    Copy link
    Contributor Author

    The base64 module is currently restricted specifically to bytes and bytearray objects. By using memoryview, it could effectively decode any input type that provides an 8-bit C contiguous view of the underlying data.

    @ncoghlan ncoghlan added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 25, 2013
    @kushaldas
    Copy link
    Member

    Working on this.

    @kushaldas
    Copy link
    Member

    A patch with tests update to use memoryview in base64 module.

    @serhiy-storchaka
    Copy link
    Member

    Not only memoryview should be supported, but any class which supports the buffer protocol and is C-contiguous (i.e. array.array).

    @pitrou
    Copy link
    Member

    pitrou commented Apr 27, 2013

    Yes, this should probably be done the other way around: wrap the base64 argument in a memoryview, and encode/decode from it.

    @ncoghlan
    Copy link
    Contributor Author

    As Serhiy and Antoine noted, what I had in mind here was to try to wrap the input in a memoryview if it wasn't an instance of str, bytes or bytearray.

    An existing memoryview will be passed back unmodified, while something like array.array will provide a view into its raw data for encoding or decoding. The tests would then cover both memoryview and array.array to ensure it was all working as expected.

    @serhiy-storchaka
    Copy link
    Member

    Note that some functions use bytes/bytearray methods and an argument should converted to bytes for them. Other functions use C functions which supports the buffer protocol and do not required any conversion.

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch which allows bytes-like arguments in the base64 module. I just removed type checking if underlying function raises an exception with an appropriate message. I'm not sure about b32encode(), perhaps we can left an exception from memoryview().

    @serhiy-storchaka
    Copy link
    Member

    Nick, what you say about the patch?

    @ncoghlan
    Copy link
    Contributor Author

    Oops, my review comments don't actually make sense because I looked at the patch in isolation, rather than checking the full context in the module. Sorry about that.

    We have 2 different cases to deal with, only one of which currently has a helper function.

    I suggest renaming _bytes_from_decode_data to "_bytes_for_decoding" and adding "_bytes_for_encoding". The difference between them is the implicit encoding of pure ASCII strings to bytes in the decoding case and the details of the error message thrown.

    The encoding and decoding functions should then use the appropriate coercion helper for both the input data and for altchars.

    @serhiy-storchaka
    Copy link
    Member

    Thank you Ezio and Nick for your comments.

    I suggest renaming _bytes_from_decode_data to "_bytes_for_decoding" and adding "_bytes_for_encoding".

    I rather think a TypeError exception raised by memoryview(s).tobytes() is good enough and we don't need a special wrapper.

    @ncoghlan
    Copy link
    Contributor Author

    We now also need to update the new footnote that I added for bpo-17844 in http://hg.python.org/cpython/rev/801567d6302c

    @ncoghlan
    Copy link
    Contributor Author

    The codec uses the old API that breaks the encoded output up into multiple lines.

    The new patch:

    1. Also changes the behaviour of the older de/encodebytes API
    2. Checks that all the defined binary transform codecs actually support memoryview as input for both encoding and decoding (and that the data roundtrips correctly)

    @ncoghlan
    Copy link
    Contributor Author

    Just adding a note for easier cross-referencing: the old behaviour of these functions was one of the culprits that led to the removal of the codec aliases as discussed in bpo-7475

    @serhiy-storchaka
    Copy link
    Member

    It works only if input also supports slicing and this slicing is corresponded to bytes-slicing. It perhaps doesn't catch a non C-continuous buffers.

    Actually we need something like (unlikely cast() doesn't work with empty buffers):

        s = memoryview(s)
        if not s:
            return b''
        s = s.cast('B')
        ...

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Oct 2, 2013

    binascii already only supports simple C contiguous buffers, expanding it and the base64 module to handle anything else should be a separate RFE.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Oct 2, 2013

    However, _input_type_check should enforce that (as binascii does), so I'll add that before committing.

    @ncoghlan ncoghlan self-assigned this Oct 2, 2013
    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Oct 2, 2013

    After working through this, I found that the modern base64 API just relies on the checks in binascii. All that checks for is:

    1. Can the data by exported using PyBUF_SIMPLE?
    2. Is it C contiguous?

    It completely ignores the number of dimensions and the format information. I added tests to at least capture this behaviour, even though it seems a little dubious to me.

    For the legacy API, I didn't relax the input checks that far - the legacy API will still complain if there is more than 1 dimension and if the format code isn't one of 'c', 'b' or 'B'. That's already substantially more permissive than what it supported in previous versions.

    Just running the full test suite now, will push after that finishes.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 2, 2013

    New changeset d90f25e1a705 by Nick Coghlan in branch 'default':
    Close bpo-17839: support bytes-like objects in base64 module
    http://hg.python.org/cpython/rev/d90f25e1a705

    @python-dev python-dev mannequin closed this as completed Oct 2, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 13, 2013

    New changeset e53984133740 by Nick Coghlan in branch 'default':
    Issue bpo-17839: mention base64 change in What's New
    http://hg.python.org/cpython/rev/e53984133740

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants