Support bytes-like objects when base is given to int() #71759

Closed
zhangyangyu opened this issue Jul 19, 2016 · 14 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) pending The issue will be closed if no feedback is provided type-feature A feature request or enhancement

Comments

@zhangyangyu
Member

BPO 27572
Nosy @rhettinger, @bitdancer, @vadmium, @serhiy-storchaka, @zhangyangyu
PRs
  • gh-71759: Deprecate using bytes-like objects in builtins. #779
Files
  • bytes_like_support_to_int.patch
  • bytes_like_support_to_int_v2.patch
  • deprecate_byte_like_support_in_int.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-07-19.08:56:16.627>
    labels = ['interpreter-core', 'type-feature', '3.8']
    title = 'Support bytes-like objects when base is given to int()'
    updated_at = <Date 2018-09-17.07:17:27.531>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2018-09-17.07:17:27.531>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2016-07-19.08:56:16.627>
    creator = 'xiang.zhang'
    dependencies = []
    files = ['43790', '43825', '44841']
    hgrepos = []
    issue_num = 27572
    keywords = ['patch']
    message_count = 11.0
    messages = ['270818', '270972', '270973', '270975', '270989', '270994', '270999', '271009', '271101', '277509', '290032']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'r.david.murray', 'martin.panter', 'serhiy.storchaka', 'xiang.zhang']
    pr_nums = ['779']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue27572'
    versions = ['Python 3.8']

    @zhangyangyu
    Member Author

    Right now, int() supports bytes-like objects when *base* is not given:

    >>> int(memoryview(b'100'))
    100

    When *base* is given bytes-like objects are not supported:

    >>> int(memoryview(b'100'), base=2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: int() can't convert non-string with explicit base

    Is there any obvious reason not to support it when *base* is given? I suggest adding it.
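The asymmetry described above can be reproduced with a short script. This is only a sketch and assumes a CPython version where the reported behaviour still applies. The explicit `bytes()` copy at the end is a caller-side workaround, not part of any patch attached to this issue.

```python
mv = memoryview(b'100')

# Without base, the bytes-like object is accepted, as reported above.
print(int(mv))

# With an explicit base, anything other than str, bytes or bytearray
# is expected to be rejected with TypeError.
try:
    print(int(mv, base=2))
except TypeError as exc:
    print("TypeError:", exc)

# Workaround: make an explicit bytes copy before parsing.
workaround = int(bytes(mv), base=2)
print(workaround)
```

The workaround costs one copy of the buffer, the same copy the discussion below says `int()` already makes internally in the single-argument case.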

    @zhangyangyu zhangyangyu added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jul 19, 2016
    @zhangyangyu
    Member Author

    Thanks for the reviews, Martin. I have changed the doc and test.

    @vadmium
    Member

    vadmium commented Jul 22, 2016

    I am torn on this. On one hand, it would be good to be consistent with the single-argument behaviour. But on the other hand, APIs normally accept arbitrary bytes-like objects (like memoryview) to minimise unnecessary copying, whereas this case has to make a copy to append a null terminator.

    Perhaps another option is to deprecate int(byteslike) support instead, in favour of explicitly making a copy using bytes(byteslike). Similarly for float, compile, eval, exec, which also do copying thanks to bpo-24802. But PyNumber_Long() has called PyObject_AsCharBuffer() (predecessor of Python 3’s bytes-like objects) since 1.5.2 (revision 74b7213fb609). So this option would probably need wider discussion.

    @zhangyangyu
    Copy link
    Member Author

    It's reasonable. My original intention is to make the behaviour consistent. If the single-argument behaviour is OK with bytes-like objects, why not others? So I think we'd better wait for other developers to see what their opinions are.

    @bitdancer
    Member

    Since bytes are accepted in both cases, the inconsistency does seem odd. Looking at the history, I think the else statement that checks the types that can be handled was introduced during the initial py3k conversion, and I'm guessing that else was just forgotten in subsequent updates that added additional bytes-like types. The non-base branch calls PyNumber_Long, where I presume it picked up the additional type support.

    If a copy has to be done anyway, perhaps we can future proof the code by doing a bytes conversion internally in long_new?
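The idea of normalising every bytes-like input to `bytes` before parsing can be modelled in pure Python. The function name `int_with_base` is hypothetical and this is not the C code in `long_new`; it only sketches the dispatch being suggested.

```python
def int_with_base(x, base=10):
    """Hypothetical model: convert any bytes-like input to bytes, then parse."""
    if isinstance(x, (str, bytes, bytearray)):
        # Types that int() already accepts together with an explicit base.
        return int(x, base)
    # memoryview() raises TypeError for objects without the buffer protocol,
    # so non-bytes-like input is still rejected.
    with memoryview(x) as view:
        return int(view.tobytes(), base)
```

With this model, `int_with_base(memoryview(b'100'), 2)` and `int_with_base(b'100', 2)` go through the same parsing path, which is the consistency asked for in the original report.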

    Disallowing something that currently works without a good reason isn't good for backward compatibility, so I'd vote for making this work consistently one way or another.

    @serhiy-storchaka
    Member

    It looks to me that the support of bytes-like objects besides bytes and bytearray was added accidentally, as a side effect of supporting Unicode. Note that this support had a bug until bpo-24802, thus correct support of other bytes-like objects has existed for less than a year. The option of deprecating other bytes-like objects support looks reasonable to me, especially in the light of deprecating bytearray paths support (bpo-26800).

    On the other hand, the need to copy a buffer can be considered an implementation detail, since low-level int parsing functions require NUL-terminated C strings. We can add alternative low-level functions that work with non-NUL-terminated strings. This needs more work.

    @bitdancer
    Member

    So less than a year means only some versions of 3.5? So we could drop it in 3.6 and hope we don't break anybody's code? I'm not sure I like that...I think the real problem is the complexity of handling multiple bytes types, and that ought to have a more general solution. I'm not volunteering to work on it, though, so I'm not voting against dropping it.

    @serhiy-storchaka
    Member

    No, the fix was applied to all maintained versions (2.7 and 3.4+). This means that we need some deprecation period before dropping this feature (if we decide to drop it).

    What about other Python implementations? Do they support bytes-like objects besides bytes and bytearray? Do they correctly handle embedded NUL and non-NUL-terminated buffers?

    @zhangyangyu
    Member Author

    PyPy seems to.

    [PyPy 5.2.0-alpha0 with GCC 4.8.2] on linux
    >>>> int(memoryview(b'123A'[1:3]))
    23
    >>>> int(memoryview(b'123 '[1:3]))
    23
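The two edge cases raised earlier, embedded NUL bytes and buffers that are not NUL-terminated, can be probed on any implementation with a small script. The helper `try_int` is hypothetical. Unlike the session above, this sketch slices the memoryview itself rather than the bytes literal, so the view stops before the end of the underlying buffer.

```python
def try_int(value, base=None):
    """Return the parsed integer, or the name of the exception raised."""
    try:
        return int(value) if base is None else int(value, base)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

# A view over the middle of b'123A': the byte after the view is b'A', not NUL.
window = memoryview(b'123A')[1:3]

sliced = try_int(bytes(window))      # parse an explicit copy of the view
embedded_nul = try_int(b'12\x003')   # a NUL byte inside the digits
direct = try_int(window)             # result depends on the implementation

print(sliced, embedded_nul, direct)
```

An implementation that handles both cases correctly parses only the bytes inside the view and rejects the embedded NUL instead of silently truncating at it.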

    @serhiy-storchaka
    Member

    Here is a patch that deprecates support of bytes-like objects except bytes and bytearray in int(), float(), compile(), eval(), exec(). I'm not arguing for it; this is just a basis for the discussion.
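For readers who want a feel for what such a deprecation would mean for callers, here is a pure-Python illustration for the `int()` case only. It is not the attached patch, which changes the C implementation; the wrapper name `int_deprecating` and the warning text are assumptions made for this sketch.

```python
import warnings

def int_deprecating(x):
    """Hypothetical wrapper: warn when int() is given a bytes-like object
    other than bytes or bytearray."""
    if (isinstance(x, (str, bytes, bytearray))
            or hasattr(type(x), "__int__")
            or hasattr(type(x), "__index__")):
        # Numbers and the explicitly supported string types pass straight through.
        return int(x)
    try:
        view = memoryview(x)
    except TypeError:
        # Not bytes-like: let int() raise its usual error.
        return int(x)
    with view:
        data = view.tobytes()
    warnings.warn(
        "passing a bytes-like object other than bytes or bytearray is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return int(data)
```

Calling `int_deprecating(memoryview(b'42'))` still returns 42 but emits a `DeprecationWarning`, while `int_deprecating(b'42')` stays silent. After a deprecation period the first call would become an error.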

    @serhiy-storchaka
    Member

    Created PR 779 for the deprecation.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Mar 23, 2017
    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Sep 17, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added the pending The issue will be closed if no feedback is provided label Nov 9, 2022
    @iritkatriel
    Member

    According to the discussion on the PR, both @serhiy-storchaka and @brettcannon have concerns about this change. Marking as pending so we can make a decision either way.

    @arhadthedev

    This comment was marked as outdated.

    @JelleZijlstra
    Member

    My preference would be to close this issue with no action, leaving the current behavior in place. If we were adding these APIs today we shouldn't add support for arbitrary buffers, but the support is there now, it doesn't cost us a lot, and deprecating support may break some users, so I don't think it's worth deprecating the support as #779 proposes.

    @erlend-aasland erlend-aasland added 3.13 bugs and security fixes and removed 3.8 only security fixes labels Jan 5, 2024
    @encukou encukou closed this as not planned Mar 19, 2024