bytes and unicode splitlines() methods differ on what is a line break #68789
For bytes, \v (0x0b) is not considered a line break. For unicode, it is. This traces back to the Objects/stringlib/ code, where unicode defers to the decision made by Objects/unicodeobject.c's ascii_linebreak table, which contains 7 line breaks in the 0..127 character range: static unsigned char ascii_linebreak[] = { Whereas Objects/stringlib/stringdefs.h, used by bytes, only considers \r and \n. I think these should be consistent, but making this change likely breaks existing code in weird ways. This does come up when porting from 2 to 3: a str '' value with one of those other characters in it was not split at that character by splitlines in 2.x, but is split there in 3.x.
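A minimal sketch of the difference on Python 3, for anyone who wants to reproduce it. The list of seven characters is my reading of the documented str.splitlines() behaviour, not a copy of the C table, and splitlines_ascii is a hypothetical helper (not part of the stdlib) that assumes you want the bytes-style rules applied to a str when porting:

```python
import re

# Code points below 128 that str.splitlines() documents as line boundaries:
# LF, VT, FF, CR, FS, GS, RS. (Assumed to match the seven table entries.)
ASCII_STR_BREAKS = ["\n", "\x0b", "\x0c", "\r", "\x1c", "\x1d", "\x1e"]


def splits_str(ch):
    """True if str.splitlines() breaks the line at ch."""
    return len(f"a{ch}b".splitlines()) == 2


def splits_bytes(ch):
    """True if bytes.splitlines() breaks the line at ch."""
    return len(f"a{ch}b".encode("ascii").splitlines()) == 2


str_breaks = [ch for ch in ASCII_STR_BREAKS if splits_str(ch)]
bytes_breaks = [ch for ch in ASCII_STR_BREAKS if splits_bytes(ch)]

# Hypothetical porting helper: split a str only on \r\n, \r and \n,
# mirroring what bytes.splitlines() does.
_CR_LF = re.compile(r"\r\n|\r|\n")


def splitlines_ascii(text):
    parts = _CR_LF.split(text)
    # splitlines() does not report an empty final line after a trailing break.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


print([hex(ord(ch)) for ch in str_breaks])
print([hex(ord(ch)) for ch in bytes_breaks])
print(splitlines_ascii("a\x0bb\nc\r\nd\n"))
```

The helper is only one way to keep the old splitting rules in ported code; it does not implement keepends.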
On Fri, Jul 10, 2015 at 02:18:33AM +0000, Gregory P. Smith wrote:
I'm not sure that they should. Unicode includes other line breaks which
Hah, I should've searched the tracker first. Looks like the other open issues cover this.