codecs.readline sometimes removes newline chars #41794
In Python 2.4.1 I observed a new bug in codecs.readline, probably related to bug bpo-1076985 (Incorrect … See the attached files that demonstrate the problem. (By the way, it seems bug bpo-1076985 was fixed in Python 2.4.1, …
Checked in a fix as: … Are you really sure that the fix for bpo-1098990 is not in …
Sorry to comment on a closed report, but perhaps this fix … Since the size parameter is already documented as being …
OK, I'm reopening the bug report. I didn't manage to install …
foo2.py from bpo-1163244 fails to import. Not being an expert in …
I think the foo2.py from bpo-1163244 is probably the same bug. The problem is caused by StreamReader.readline doing:

    if self.atcr and data.startswith(u"\n"):
        data = data[1:]

since the tokenizer relies on '\n' as the line break … FWIW (not much), I think the 2.4 StreamReader.readline … As to changes from 2.4, if the unicode object were to add a …
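To make the failure mode concrete, here is a simplified Python 3 model, not the actual 2.4.1 codecs.StreamReader code, of a chunked readline() that keeps an atcr flag and strips a leading "\n" from the next chunk, as in the snippet quoted above. The AtcrLineReader class and its list-of-chunks input (standing in for successive fixed-size reads) are hypothetical. When a "\r\n" pair straddles a chunk boundary, the first call returns a line ending in a bare "\r" and the "\n" is silently discarded, so a consumer that splits on "\n", such as the tokenizer, never sees that line break.

```python
import re
from typing import Iterator, List

# One line ending; "\r\n" is tried first so an intact pair stays together.
_LINEBREAK = re.compile(r"\r\n|\r|\n")


class AtcrLineReader:
    """Hypothetical model of a chunked readline() with an 'atcr' flag.

    Each element of `chunks` stands in for one fixed-size read from the
    underlying stream.
    """

    def __init__(self, chunks: List[str]) -> None:
        self._chunks: Iterator[str] = iter(chunks)
        self._buffer = ""
        self.atcr = False  # True if the last line returned ended in "\r"

    def readline(self) -> str:
        while True:
            m = _LINEBREAK.search(self._buffer)
            if m:
                line, self._buffer = self._buffer[: m.end()], self._buffer[m.end():]
                # Only remember the "\r" if nothing buffered follows it, i.e.
                # a matching "\n" (if any) would arrive with the next chunk.
                self.atcr = line.endswith("\r") and not self._buffer
                return line
            chunk = next(self._chunks, None)
            if chunk is None:  # end of input: return the unterminated rest
                line, self._buffer = self._buffer, ""
                return line
            if self.atcr and chunk.startswith("\n"):
                chunk = chunk[1:]  # the "\n" half of a split "\r\n" is discarded
            self.atcr = False
            self._buffer += chunk


reader = AtcrLineReader(["abc\r", "\ndef\n"])
print([reader.readline(), reader.readline()])
```

Joined back together, the two lines give "abc\rdef\n" instead of the original "abc\r\ndef\n", which is the symptom in the issue title.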
The current readline() is implemented so that even if the … IMHO the best solution would be to read the extra character … But in any case the tokenizer should be fixed to be able to … Of course the simplest solution would be: "If you want a … The other problem is if readline() returns data from the …
OK, I've checked in the following:

Lib/codecs.py 1.44

with the following changes, as suggested by glchapman: If a chunk read has a trailing "\r", read one more character even if the user has passed a size parameter. Remove the atcr logic. This should fix most of the problems. There are three open issues: …
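The checked-in change (read one more character when a chunk ends in "\r") can be sketched as below. This is a hedged model rather than the Lib/codecs.py 1.44 code: the LookaheadLineReader class, its readsize parameter, and the use of io.StringIO as the stream are assumptions for illustration. The extra read(1) may go past the requested chunk size, which is what lets a "\r\n" pair land in a single buffer instead of being split across two reads.

```python
import io
import re

# One line ending; "\r\n" is tried first so the pair is returned as a unit.
_LINEBREAK = re.compile(r"\r\n|\r|\n")


class LookaheadLineReader:
    r"""Hypothetical chunked readline() that never splits a "\r\n" pair."""

    def __init__(self, stream: io.StringIO, readsize: int = 4) -> None:
        self._stream = stream
        self._readsize = readsize  # kept tiny here to force chunk boundaries
        self._buffer = ""

    def readline(self) -> str:
        while True:
            m = _LINEBREAK.search(self._buffer)
            if m:
                line, self._buffer = self._buffer[: m.end()], self._buffer[m.end():]
                return line
            data = self._stream.read(self._readsize)
            if not data:  # end of stream: return the unterminated remainder
                line, self._buffer = self._buffer, ""
                return line
            if data.endswith("\r"):
                # Read one extra character (possibly "\n") even though this
                # goes past the requested chunk size.
                data += self._stream.read(1)
            self._buffer += data


# newline="" keeps "\r\n" untranslated when reading from the StringIO.
reader = LookaheadLineReader(io.StringIO("abc\r\ndef\n", newline=""))
print([reader.readline(), reader.readline()])
```

With readsize=4 the first read returns "abc\r"; the lookahead pulls in the "\n", so the line comes back intact and no atcr bookkeeping is needed.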
My suggestion is to make the top of the loop look like:

    while True:
        havecharbuffer = bool(self.charbuffer)

And then the break condition (when no line break found):

        # we didn't get anything or this was our only try
        if not data or (size is not None and not havecharbuffer):

(too many negatives in that). Anyway, the idea is that, if … Also, not sure about this, but should the size parameter … As to issue 2, it looks like it should be possible to get … By the way, using a findlinebreak function (using sre) turns …
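A rough model of the proposed loop follows. Only havecharbuffer and the break condition are taken from the comment; the SizedLineReader class, its _read helper (which returns buffered characters before touching the stream), the fallback chunk size of 72, and io.StringIO as the stream are hypothetical stand-ins for StreamReader internals. In this model a call with size=None keeps reading until it finds a line break or reaches end of input, while a call with a size reads from the stream at most once, after first draining any buffered characters.

```python
import io
import re
from typing import Optional

_LINEBREAK = re.compile(r"\r\n|\r|\n")


class SizedLineReader:
    """Hypothetical model of the proposed readline() loop structure."""

    def __init__(self, stream: io.StringIO, charbuffer: str = "") -> None:
        self._stream = stream
        self.charbuffer = charbuffer  # decoded characters left from earlier reads

    def _read(self, size: Optional[int]) -> str:
        # Hand back buffered characters first; only hit the stream when empty.
        if self.charbuffer:
            data, self.charbuffer = self.charbuffer, ""
            return data
        return self._stream.read(72 if size is None else size)

    def readline(self, size: Optional[int] = None) -> str:
        line = ""
        while True:
            havecharbuffer = bool(self.charbuffer)
            data = self._read(size)
            line += data
            m = _LINEBREAK.search(line)
            if m:
                # Push anything after the line break back into the buffer.
                self.charbuffer = line[m.end():]
                return line[: m.end()]
            # we didn't get anything or this was our only try
            if not data or (size is not None and not havecharbuffer):
                return line


reader = SizedLineReader(io.StringIO("hello world\n"), charbuffer="ab")
print(repr(reader.readline(size=3)))
print(repr(reader.readline()))
```

Here the first call drains "ab" and makes one stream read of 3 characters, returning "abhel" without a line break; the unbounded call then finishes the line.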
This makes sense. However, with the current state of the …
None seems to be a better default from an API viewpoint, …
The patch for bpo-1178484 fixes this. Combined with this patch …
Coding this on the C level and using Py_UNICODE_ISLINEBREAK() …
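The comment mentions doing the line-break test at the C level with Py_UNICODE_ISLINEBREAK(), and glchapman mentions a findlinebreak function built on sre. A Python-level regex version can be cross-checked against str.splitlines(), which in Python 3 treats "\r", "\n", "\r\n" and also "\x0b", "\x0c", "\x1c" through "\x1e", "\x85", "\u2028" and "\u2029" as line boundaries. The find_linebreak and first_line helpers are hypothetical, and the character set follows the Python 3 str.splitlines() documentation, which may differ from what Python 2.4 recognized.

```python
import re

# Line boundaries recognized by str.splitlines() in Python 3. "\r\n" is
# listed first so that the pair is matched as a single break.
_UNICODE_LINEBREAK = re.compile(r"\r\n|[\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029]")


def find_linebreak(text: str) -> int:
    """Return the index just past the first line break in text, or -1."""
    m = _UNICODE_LINEBREAK.search(text)
    return m.end() if m else -1


def first_line(text: str) -> str:
    """Return the first line of text including its terminator, if any."""
    end = find_linebreak(text)
    return text if end < 0 else text[:end]


for sample in ["a\r\nb", "a\u2028b", "no break"]:
    # Compare against splitlines(True), which keeps the line endings.
    print(repr(first_line(sample)), repr(sample.splitlines(True)[0]))
```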
Hello doerwalter,

Our thanks to you and to glchapman for working on this bug. I think the project I am working on may have run into this … Do you have any projection for when the fix will make it into … Our production is running on 2.3.5. If it looks like a long … Thanks again.
Build 204 of pywin32 has a workaround for bug 1163244 which …
2.4.2 is out the door, so I'm closing this bug as fixed.
Confirmed fixed with Python 2.4.2 on Mac OS X. But the …