Support bytes-like objects when base is given to int() #71759
Right now, int() supports bytes-like objects when *base* is not given:
>>> int(memoryview(b'100'))
100
When *base* is given, bytes-like objects are not supported:
>>> int(memoryview(b'100'), base=2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: int() can't convert non-string with explicit base
Is there any obvious reason not to support it when *base* is given? I suggest adding it. |
Thanks for the reviews, Martin. I have changed the doc and test. |
I am torn on this. On one hand, it would be good to be consistent with the single-argument behaviour. But on the other hand, APIs normally accept arbitrary bytes-like objects (like memoryview) to minimise unnecessary copying, whereas this case has to make a copy to append a null terminator. Perhaps another option is to deprecate int(byteslike) support instead, in favour of explicitly making a copy using bytes(byteslike). Similarly for float, compile, eval, exec, which also do copying thanks to bpo-24802. But PyNumber_Long() has called PyObject_AsCharBuffer() (predecessor of Python 3’s bytes-like objects) since 1.5.2 (revision 74b7213fb609). So this option would probably need wider discussion. |
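The explicit-copy option mentioned above can be sketched roughly as follows. The helper name `int_from_buffer` is hypothetical and not part of any existing API. It assumes the caller is happy to pay for one copy of the buffer, after which int() only ever receives a bytes object, which it accepts with or without an explicit base.

```python
def int_from_buffer(data, base=10):
    """Parse an integer from any bytes-like object (hypothetical helper).

    memoryview() rejects objects that do not support the buffer protocol,
    and tobytes() makes the explicit copy, so int() is always handed a
    plain bytes object.
    """
    return int(memoryview(data).tobytes(), base)


print(int_from_buffer(memoryview(b'100')))          # parsed in base 10
print(int_from_buffer(memoryview(b'100'), base=2))  # parsed in base 2
```

This keeps the copy visible at the call site rather than hidden inside int(), which is the trade-off the comment above is weighing.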
That's reasonable. My original intention was to make the behaviour consistent. If the single-argument behaviour is OK with bytes-like objects, why not the others? So I think we'd better wait and see what other developers think. |
Since bytes are accepted in both cases, the inconsistency does seem odd. Looking at the history, I think the else statement that checks the types that can be handled was introduced during the initial py3k conversion, and I'm guessing that else was just forgotten in subsequent updates that added additional bytes-like types. The non-base branch calls PyNumber_Long, where I presume it picked up the additional type support. If a copy has to be done anyway, perhaps we can future proof the code by doing a bytes conversion internally in long_new? Disallowing something that currently works without a good reason isn't good for backward compatibility, so I'd vote for making this work consistently one way or another. |
It looks to me as though support for bytes-like objects other than bytes and bytearray was added accidentally, as a side effect of supporting Unicode. Note that this support had a bug until bpo-24802, so correct support for other bytes-like objects has existed for less than a year. The option of deprecating support for other bytes-like objects looks reasonable to me, especially in light of the deprecation of bytearray path support (bpo-26800). On the other hand, the need to copy the buffer can be considered an implementation detail, since the low-level int parsing functions require NUL-terminated C strings. We could add alternative low-level functions that work with strings that are not NUL-terminated. This needs more work. |
So less than a year means only some versions of 3.5? So we could drop it in 3.6 and hope we don't break anybody's code? I'm not sure I like that...I think the real problem is the complexity of handling multiple bytes types, and that ought to have a more general solution. I'm not volunteering to work on it, though, so I'm not voting against dropping it. |
No, the fix was applied to all maintained versions (2.7 and 3.4+). This means that we need a deprecation period before dropping this feature (if we decide to drop it). What about other Python implementations? Do they support bytes-like objects other than bytes and bytearray? Do they correctly handle embedded NULs and buffers that are not NUL-terminated? |
PyPy seems to:
[PyPy 5.2.0-alpha0 with GCC 4.8.2] on linux
>>>> int(memoryview(b'123A'[1:3]))
23
>>>> int(memoryview(b'123 '[1:3]))
23 |
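The two edge cases raised above (a buffer slice that is not NUL-terminated, and an embedded NUL) can be checked with a small sketch like the one below. The helper `parse_slice` is hypothetical. It uses an explicit bytes copy so that the result does not depend on how a given implementation handles memoryview arguments to int().

```python
def parse_slice(buf, start, stop, base=10):
    """Parse buf[start:stop] as an integer via an explicit copy (hypothetical helper)."""
    return int(memoryview(buf)[start:stop].tobytes(), base)


# The slice b'23' is followed by b'A' in the underlying buffer, not a NUL;
# only the two sliced bytes should be parsed.
print(parse_slice(b'123A', 1, 3))

# An embedded NUL should be rejected rather than silently truncating the input.
try:
    int(b'12\x003')
except ValueError as exc:
    print('rejected:', exc)
```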
Here is a patch that deprecates support for bytes-like objects other than bytes and bytearray in int(), float(), compile(), eval(), and exec(). I'm not arguing for it; this is just to ground the discussion. |
Created PR 779 for the deprecation. |
According to the discussion on the PR both @serhiy-storchaka and @brettcannon have concerns about this change. Marking as pending so we can make a decision either way. |
My preference would be to close this issue with no action, leaving the current behavior in place. If we were adding these APIs today we shouldn't add support for arbitrary buffers, but the support is there now, it doesn't cost us a lot, and deprecating support may break some users, so I don't think it's worth deprecating the support as #779 proposes. |