py3k: duplicated line endings when using read(1) #45736
Comments
When reading a Windows text file one byte at a time, \r\n get split into
The following test fails (put it somewhere in test_io.py, inside
    def testReadOneByOne(self):
        txt = io.TextIOWrapper(io.BytesIO(b"AA\r\nBB"))
        reads = ""
        while True:
            c = txt.read(1)
            if not c:
                break
            reads += c
        self.assertEquals(reads, "AA\nBB")
        # AssertionError: 'AA\n\nBB' != 'AA\nBB'
Note that replacing read(1) by read(2) gives the correct result.
This problem is why test_netrc fails on Windows. It may also be the root |
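For reference, the failing test above can be run as a standalone script. This is a minimal sketch, assuming a current Python 3 in which this issue is fixed; the helper name `read_one_by_one` and the explicit `encoding="ascii"` are illustrative additions, not part of the original report.

```python
import io


def read_one_by_one(data: bytes) -> str:
    """Read a text stream one character at a time and join the pieces."""
    # Hypothetical helper. The encoding is passed explicitly so the
    # result does not depend on the locale.
    txt = io.TextIOWrapper(io.BytesIO(data), encoding="ascii")
    reads = ""
    while True:
        c = txt.read(1)
        if not c:
            break
        reads += c
    return reads


result = read_one_by_one(b"AA\r\nBB")
print(repr(result))
```

With the bug present, the `\r` and the `\n` were each translated separately, which is where the doubled line ending came from.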
Wow, thanks!
>>> f = open("@", "wb")
>>> f.write(b"a\r\n")
6
>>> f.close()
>>> f = open("@", "r")
>>> f.read(1)
'a'
>>> f.read(1)
'\n'
>>> |
I think the solution is to do the translation on a bigger chunk than on
Index: Lib/io.py
--- Lib/io.py (revision 58874)
+++ Lib/io.py (working copy)
@@ -1253,8 +1253,9 @@
             res += pending
             if not readahead:
                 break
+        res = self._replacenl(res)
         self._pending = res[n:]
-        return self._replacenl(res[:n])
+        return res[:n]

     def __next__(self):
         self._telling = False
Of course, we need to take care of the case when the last character in |
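The corner case mentioned here, a chunk that ends in `\r`, can be illustrated with a small stand-alone translator. This is only a sketch of one possible technique (holding a trailing `\r` back until the next chunk arrives); the class `NewlineTranslator` and its `feed` method are hypothetical names and are not taken from any of the patches in this thread.

```python
class NewlineTranslator:
    """Translate \\r\\n and lone \\r to \\n across chunk boundaries.

    Hypothetical illustration, not the code from Lib/io.py.
    """

    def __init__(self):
        self._pending_cr = False

    def feed(self, chunk: str, final: bool = False) -> str:
        # Put back a '\r' that was held over from the previous chunk.
        if self._pending_cr:
            chunk = "\r" + chunk
            self._pending_cr = False
        # A trailing '\r' may be the first half of '\r\n', so hold it
        # back unless the caller says there is no more input.
        if chunk.endswith("\r") and not final:
            chunk = chunk[:-1]
            self._pending_cr = True
        return chunk.replace("\r\n", "\n").replace("\r", "\n")


translator = NewlineTranslator()
pieces = [translator.feed(c) for c in "AA\r\nBB"]
pieces.append(translator.feed("", final=True))
joined = "".join(pieces)
print(repr(joined))
```

Holding the `\r` back avoids reading ahead, at the cost that a call may return fewer characters than it was given.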
Some thoughts:
|
I am attaching the patch io.diff that does the following:
I also incorporated the test case by Amaury and added one more. With this patch in place, the following tests failed (on SuSE 10.1): test_doctest test_mailbox test_nis test_old_mailbox The failures (other than known test_mailbox and test_old_mailbox) didn't |
This patch goes in the right direction, but has several problems IMO:
I will try to write these test cases if you want. |
IMO you shouldn't read another chunk when the last character you've seen (The problem with reading another character is that in interactive input |
On 11/6/07, Amaury Forgeot d'Arc <report@bugs.python.org> wrote:
Yes. reading another chunk is not an optimal solution but after seeing
That will definitely be useful. |
Guido van Rossum wrote:
In my opinion the check for \r should only happen when os.linesep or Christian |
No: it is not dependent on os.linesep but on the newline parameter |
I am attaching another patch (io2.diff). Please review. I am not sure BTW, PEP-3116 says: "If universal newlines without translation are requested on input (i.e. I suppose this issue is mainly talking about the latter (newline is |
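To make the role of the newline parameter concrete, here is a short sketch of how it affects reading in a current Python 3 `io.TextIOWrapper`. The sample data and the helper name `read_lines` are made up for illustration.

```python
import io

# Sample input mixing all three line-ending styles (illustrative only).
DATA = b"a\r\nb\rc\n"


def read_lines(newline):
    """Return the lines of DATA as seen with the given newline setting."""
    txt = io.TextIOWrapper(io.BytesIO(DATA), encoding="ascii", newline=newline)
    return txt.readlines()


# newline=None: universal newlines, every line ending is translated to '\n'.
translated = read_lines(None)
# newline='': universal newlines without translation, endings kept as-is.
untranslated = read_lines("")
# newline='\n': only '\n' ends a line, and nothing is translated.
only_lf = read_lines("\n")

print(translated, untranslated, only_lf)
```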
The new patch fixes test_netrc for me but test_csv and test_mailbox are |
Unfortunately, I am not able to build python on windows so I can not |
This looks promising, but my head hurts when I try to understand the Regarding "universal newlines without translation:" that means that \r\n |
For test_mailbox at least, I think I have a clue: the _pending member Can we try to somehow move the replacenl() call inside the _get_chunk |
Somebody needs to reverse-engineer the invariants applying to the |
By the way what happened to the SoC project related to Python's new IO |
On 11/7/07, Christian Heimes <report@bugs.python.org> wrote:
I think it was Alexandre Vassalotti. Is that right, Alexandre? Or am I |
Hi Amaury and Christian, io3.diff does replacenl() in adjust_chunk() (trying Amaury's |
Good work! The tests for mailbox, netrc and csv are passing with your test. I'm |
I take it back. I accidentally ran the unit tests on the trunk instead |
On 11/7/07, Guido van Rossum <guido@python.org> wrote:
I think so. My GSoC project was to merge the interface of |
Cool. How hard do you think it would be to extend your work on
On Nov 7, 2007 3:55 PM, Alexandre Vassalotti <alexandre@peadrop.com> wrote:
|
Unfortunately, it does not. And some tests now fail in test_io |
On 11/7/07, Guido van Rossum wrote:
Well, StringIO and TextIOWrapper are quite different. The only part
Nevertheless, that would be a neat project for me. I could start to work |
OK, I have taken another approach which seems to work (see io4.diff): While not completely finished, this approach seems much saner to me: it Next steps are:
About mailbox.py: it seems that the code cannot work: it uses statements |
The patch doesn't apply
$ patch -p0 < io4.diff
(Stripping trailing CRs from patch.)
patching file Lib/io.py
patch: **** malformed patch at line 41: @@ -1133,7 +1160,10 @@ |
Sorry, I think I corrupted the file by hand. Here is another version |
Of course: the file mode was recently changed from rb+ to r+ (revision |
The new io4.diff breaks test_io and test_univnewlines on Linux |
Here is a new version: io5.diff, which should handle the "seen newlines" Two more bug fixes found by test_univnewlines:
|
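The "seen newlines" bookkeeping discussed here is exposed in current Python 3 through the `newlines` attribute of text streams. The following sketch, with made-up sample data, shows what it reports in universal newlines mode.

```python
import io

# Illustrative input containing '\r\n', '\r' and '\n' line endings.
txt = io.TextIOWrapper(io.BytesIO(b"a\r\nb\rc\n"), encoding="ascii")

before = txt.newlines   # nothing has been read yet
content = txt.read()    # all line endings are translated to '\n'
seen = txt.newlines     # the kinds of line endings encountered so far

print(repr(before), repr(content), repr(seen))
```

When only one kind of line ending has been seen, the attribute is a single string rather than a tuple.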
Considering that test_csv is failing on windows even without any changes
-----------------
--- Lib/test/test_csv.py (revision 58914)
+++ Lib/test/test_csv.py (working copy)
@@ -375,7 +375,7 @@
 class TestCsvBase(unittest.TestCase):
     def readerAssertEqual(self, input, expected_result):
-        with TemporaryFile("w+") as fileobj:
+        with TemporaryFile("w+", newline='') as fileobj:
             fileobj.write(input)
             fileobj.seek(0)
             reader = csv.reader(fileobj, dialect = self.dialect)
Does this look ok? The tests pass on windows and Linux. |
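The effect of `newline=''` in the test change above can be seen in isolation. This is a small sketch with made-up rows, assuming a current Python 3: with `newline=''` the text layer leaves the `\r\n` written by the csv module untouched, so the reader sees exactly what the writer produced.

```python
import csv
from tempfile import TemporaryFile

rows = [["a", "b"], ["1", "2"]]  # illustrative data

with TemporaryFile("w+", newline="") as fileobj:
    # csv.writer terminates rows with '\r\n' by default.
    csv.writer(fileobj).writerows(rows)
    fileobj.seek(0)
    raw = fileobj.read()          # line endings are not translated
    fileobj.seek(0)
    parsed = list(csv.reader(fileobj))

print(repr(raw), parsed)
```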
On 11/8/07, Amaury Forgeot d'Arc <report@bugs.python.org> wrote:
I like this approach even though I haven't looked at the patch in detail. |
Updated patch (io6.diff):
|
Yes, especially since writerAssertEqual() already uses that. :-) I do think there is something iffy here -- the 2.x version of this |
On 11/8/07, Guido van Rossum <report@bugs.python.org> wrote:
I think that requirement (need to open in binary mode) is no more |
By the way I've found the daily builds you were asking for, Raghuram. |
Committed the patch in r59060. |
I am trying to get Python working when compiled with Visual Studio 2010 (cf bpo-13210). When running the tests with the python 2.7 branch compiled with VS2010, the "test_issue_1395_5" in test_io.py will cause Python to eat the whole memory within a few seconds and make the server completely unresponsive. |
You should open a new issue for this new problem. |
OK, sorry. Done in bpo-13461. |