reading UTF16-encoded text file crashes if \r on 64-char boundary #48824
Comments
Problem in the newline handling in io.py, class The attached script illustrates the problems.
The bug is in IncrementalNewlineDecoder, not in the codec nor
Smaller example to demonstrate the problem.
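The attached example is not reproduced on this page. As a rough sketch of the scenario in the issue title, the snippet below feeds UTF-16 bytes to `io.IncrementalNewlineDecoder` in two chunks, cutting right after the bytes of `'\r'`. The helper name `decode_in_chunks`, the sample text, and the split position are assumptions made for illustration; they are not taken from the attachment. On an interpreter that contains the fix, the `'\r\n'` pair straddling the cut should come out as a single `'\n'`.

```python
import codecs
import io

def decode_in_chunks(data, split_at, encoding="utf-16"):
    """Decode `data` in two chunks with universal-newline translation.

    Hypothetical helper: it only drives the public decoder objects by hand.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    newline_decoder = io.IncrementalNewlineDecoder(decoder, translate=True)
    first = newline_decoder.decode(data[:split_at])
    second = newline_decoder.decode(data[split_at:], final=True)
    return first + second

# 63 characters plus '\r' make up the first 64 characters; '\n' follows.
text = "a" * 63 + "\r\n" + "b"
data = text.encode("utf-16")   # 2-byte BOM, then 2 bytes per character
split_at = 2 + 2 * 64          # cut immediately after the bytes of '\r'

result = decode_in_chunks(data, split_at)
assert result == "a" * 63 + "\nb"
```

Driving the decoder by hand keeps the cut position explicit, so the check does not depend on whatever chunk size the text layer happens to use.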
Here is a patch for test_io.py: check the problem by adding new
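The patch itself is not shown here. A check in a similar spirit might look like the sketch below, which round-trips text with mixed line endings through `io.TextIOWrapper` and reads it back line by line. The function name `read_back_lines` and the test data are assumptions, not the contents of the patch. The text is made fairly long on the assumption that the wrapper then has to refill its internal buffer several times, so that some line endings land near a refill point.

```python
import io

def read_back_lines(text, encoding="utf-16"):
    """Encode `text`, then read it back line by line.

    Hypothetical helper; newline=None (the default) enables
    universal-newline translation in TextIOWrapper.
    """
    raw = io.BytesIO(text.encode(encoding))
    with io.TextIOWrapper(raw, encoding=encoding) as f:
        return list(f)

endings = ("\r\n", "\r", "\n")
# Lines of growing length, each length used once per line ending.
# Lengths start at 1 so that a bare '\r' is never directly followed by '\n'.
text = "".join("x" * n + end for n in range(1, 200) for end in endings)
expected = ["x" * n + "\n" for n in range(1, 200) for _ in endings]

lines = read_back_lines(text)
assert lines == expected
```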
Ugly patch to fix this issue:
A couple of suggestions:
- to encode '\r' without the BOM, you can e.g. use an incremental
  encoder and encode it twice:
>>> import codecs
>>> enc = codecs.getincrementalencoder('utf16')('strict')
>>> enc.encode('\r')
b'\xff\xfe\r\x00'
>>> enc.encode('\r')
b'\r\x00'
I think breaking the API can be ok since the original API is broken
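The incremental-encoder suggestion can be wrapped in a small helper when test data has to be produced chunk by chunk. This is only a sketch: `encode_in_parts` is a hypothetical name, and the sample strings are made up. Because one encoder instance is shared, only the first chunk carries the BOM, and the byte order is whatever the platform's native `'utf-16'` encoder produces.

```python
import codecs

def encode_in_parts(parts, encoding="utf-16"):
    """Encode each string with one shared incremental encoder.

    Hypothetical helper: the BOM is emitted once, with the first part.
    """
    encoder = codecs.getincrementalencoder(encoding)("strict")
    return [encoder.encode(part) for part in parts]

# '\r' ends the first chunk and '\n' starts the second.
chunks = encode_in_parts(["abc\r", "\ndef"])

# Only the first chunk starts with the native-order BOM.
assert chunks[0].startswith(codecs.BOM_UTF16)
# Joined together, the chunks match a one-shot encoding of the whole text.
assert b"".join(chunks) == "abc\r\ndef".encode("utf-16")
```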
Here is a simpler patch with a different approach and a lot of tests.
This new variant also removes the dangerous hack in getstate / setstate.
utf16_newlines2.patch looks good to me. This is a data corruption issue. If it is deferred for 3.0.1 it must be
+1 on putting this in 3.0.1.
Committed to py3k and release30-maint in r67760 and r67759. Needs
Backported to trunk and 2.6.2 in r67762 and r67764.