patch to implement PEP 461 (%-interpolation for bytes) #64483
Comments
This is a very rough, proof-of-concept patch that implements %-style formatting for bytes objects. Currently it calls __format__ with a bytes argument and expects a bytes result. I've only implemented
Expected behavior:
>>> b'%s' % b'hello'
b'hello'
>>> b'%s' % 'hello'
TypeError is raised
>>> b'%s' % 123
b'123'
>>> b'%d' % 123
b'123'
Some issues:
|
I'm attaching v2 of my proposed patch. This one is quite a bit better, IMHO.
|
I reviewed your second patch on Rietveld. |
Uploading new patch with the following changes:
I will upload a draft PEP (proposed as a replacement for 461). Victor, thanks for the review. My reply is:
|
Another revision of the patch, now quite close to PEP-461 as proposed. Changes from PEP-461:
Changes from previous patch:
Reference counting in PyBytes_Format is quite hairy, could use some review. The code is nearly the same as Python 2.x stringobject.c. |
I've updated my patch into a sequence, the first of which implements PEP-461.
02-code-a.patch adds support for %a (ascii() on arg).
03-py2-flag.patch makes %s and %r behave similarly to Python 2 if a command
04-py-eq.patch makes the command line flag also enable comparison between bytes() and str() (a warning is generated). |
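For illustration, the %a conversion mentioned above can be sketched as follows. This assumes Python 3.5 or later, where PEP 461 formatting is available; the sample string is made up for the example.

```python
# Sketch, assuming Python 3.5+ (where %-formatting for bytes is available).
text = "caf\u00e9"  # hypothetical sample value with one non-ASCII character

# %a applies ascii() to the argument and inserts the ASCII-encoded result.
formatted = b"%a" % text
expected = ascii(text).encode("ascii")

print(formatted)
```

The non-ASCII character comes out as a backslash escape, so the result is always pure ASCII regardless of the argument.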
PEP-461 has been accepted. I'll look over the code soon. |
Just noting I'm working on some significant updates to the bytes and bytearray docs in bpo-21777. I'll try to get that ready for review and merged relatively soon, so the docs for this can build on top of those changes. |
Hi. I proposed twice to Ethan to implement the PEP-461, but he replied that he wants to implement it. So, what's the status of the implementation? |
It would be nice to share as much code as possible with the Unicode implementation. My idea was to add a "_PyBytesWriter" API, very close to the "_PyUnicodeWriter", to share code. Old patch implementing the _PyBytesWriter API: issue bpo-17742 (rejected because it was less efficient; the compiler produced less efficient machine code). |
With the first alpha next month, unless we hear otherwise from Ethan in the next day or two, I'd suggest going ahead with the implementation. We can always tweak it during the alpha cycle if there are specific details he'd like to see changed. |
Here is what I have so far:
This is basically an adaptation of the 2.7 code for str, adjusted appropriately. I was planning on having bytearray convert to bytes, then call the bytes code, then integrate the results back into the existing bytearray (for %=) or create and return a new bytearray (for %). I can easily believe this is not the most efficient way to do it. ;) I should have the bytearray portion done, if not this weekend, then by the following weekend. I have no objections if Victor wants to combine and optimize with the unicode implementation (and no need to wait for me to finish the bytearray portion). |
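The bytearray plan described above (convert to bytes, run the bytes formatting, then either write back or build a new bytearray) might look roughly like this at the Python level. This is only a sketch of the idea, not the C patch; the helper names are hypothetical, and it assumes Python 3.5+ for the underlying bytes formatting.

```python
# Python-level sketch of the plan; helper names are hypothetical and this is
# not the actual C implementation from the patch.

def bytearray_mod(template: bytearray, args) -> bytearray:
    """Emulate `template % args`: format as bytes, return a new bytearray."""
    return bytearray(bytes(template) % args)


def bytearray_imod(template: bytearray, args) -> bytearray:
    """Emulate `template %= args`: format as bytes, write the result back."""
    template[:] = bytes(template) % args
    return template


buf = bytearray(b"%s has %d items")
fresh = bytearray_mod(buf, (b"cart", 3))   # new object, buf untouched
same = bytearray_imod(buf, (b"cart", 3))   # buf itself now holds the result
```

The extra copy through bytes is the inefficiency acknowledged in the comment; it trades speed for reusing a single formatting code path.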
Ethan, do you have a public repository? If no, you can for example |
Sorry, no. And time is scarce at the moment so figuring out server-side clones will have to wait as well. I uploaded the patch of what I have so far -- hopefully that will be helpful. Also attaching patch with just the tests. |
I've been digging into this over the last week and come to the realization that I won't be able to finish this patch. My apologies. Victor, can you take over? I would appreciate it. The tests I have written are only for the Python side. The patch I was working on (inherited from Neil and the Python 2 code base) also added a couple C ABI functions -- do we want/need these? How do we write tests for them? |
Removed the new ABI functions; all new functions are static. Duplicated bytes code in bytearray. In-place interpolation returns a new bytearray at this point. I'll work on getting in-place working, but otherwise I'll commit this in a week so we have something in for the first alpha. |
Better patch, along the lines of my original thought:
Now working on in-place format. |
I will not have time to work on optimization before alpha 1 (February 8, 2015). Ethan: just commit your patch when you consider that it's ready to be IMO it's more important to have the feature in alpha 1 than having |
Here's the patch -- the code for % and %= is in place for bytes and bytearray; I still need to get the doc patch done. I'll commit Wednesday-ish barring problems. Big question Background Actual Question My Thoughts |
Thanks, Victor, for the feedback. I was able to figure out some more of the C side thanks to Georg, and I think the code is looking pretty good. There may be room for optimization by having the bytes code call the unicode implementation for more of the conversions (currently it's only using the unicode fromlong function), but the docs should happen before that. ;) |
New changeset 8d802fb6ae32 by Ethan Furman in branch 'default': |
It's strange that the %s format raises an error about the %b format:
>>> b's? %s' % 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'int' |
It does seem a bit odd -- on the other hand, %s is an alias for %b, is deprecated for new 3-only code, and this might help serve as a reminder of that. Or we could fix it. ;) |
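The %s/%b aliasing discussed here can be checked with a small sketch. It assumes Python 3.5+; the Packet class and the raises_type_error helper are hypothetical names for illustration, and the exact error message wording is deliberately not relied on.

```python
# Sketch, assuming Python 3.5+; Packet and raises_type_error are hypothetical.
class Packet:
    def __bytes__(self):
        return b"\x01\x02"


def raises_type_error(spec: bytes, value) -> bool:
    """Return True if `spec % value` raises TypeError."""
    try:
        spec % value
    except TypeError:
        return True
    return False


ok_bytes = b"%s" % b"hello"     # bytes are accepted by %s
ok_custom = b"%b" % Packet()    # so is any object implementing __bytes__

int_rejected_s = raises_type_error(b"%s", 1)        # %s behaves like %b
int_rejected_b = raises_type_error(b"%b", 1)
str_rejected = raises_type_error(b"%s", "hello")    # str is not accepted
int_via_d = b"%d" % 1                               # integers go through %d
```

Both %s and %b reject an int in the same way, which is why the traceback above names %b even though the format string used %s.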
New changeset db7ec64aac39 by Victor Stinner in branch 'default': |