Lost writes when writing to a FileObject that is a pipe #1711

Closed
ekimekim opened this issue Nov 29, 2020 · 3 comments · Fixed by #1715
Labels
PyVer: python2 (Affects Python 2) · Type: Question (User support and/or waiting for responses)

Comments

@ekimekim

  • gevent version: All versions from 1.5a3 onwards
  • Python version: Tested with CPython 2.7.15 from alpine 3.7 repositories, and CPython 2.7.18 from archlinux repositories
  • Operating System: Tested on Arch Linux (kernel 5.8.10-arch1-1) and in an Alpine 3.7 docker container on the same host, as well as in an Alpine 3.7 container on a Debian Stretch (kernel 4.9.0-11-amd64) host

Description:

I have an application that concatenates a number of input files and writes them to stdin of a subprocess.
I've been experiencing issues where the subprocess reports that its input is corrupted.
By replacing the subprocess with cat, I was able to inspect the exact content being written, and found that data, sometimes a large amount of data, was missing.

After further investigation, I've narrowed this down to an issue with the gevent FileObject API affecting all versions from 1.5a3 onwards, including the current release (20.9.0) and latest master (b7f8787 at time of writing). Using git bisect, I identified the commit where the behaviour begins as 0dc161e (which makes sense, as that commit involves significant changes to the FileObject code).

I have reduced the reproduction case to this:

import subprocess

def concatenate(paths):
    with open('out', 'wb') as out:
        cat = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=out)
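    # Note: 'out' is intentionally closed before the loop below; cat has
    # already inherited the descriptor (see the discussion further down).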
    for path in paths:
        with open(path, 'rb') as inp:
            # This block can be replaced with shutil.copyfileobj() but I do it manually here to reduce unknowns
            while True:
                c = inp.read(16*1024)
                if not c:
                    break
                cat.stdin.write(c)
    cat.stdin.close()
    if cat.wait() != 0:
        raise Exception()

Having multiple files to copy seems to be important for the bug to occur - I'm not sure why. Note that this code is entirely sequential, and indeed under normal python it runs correctly. It only breaks if I monkey patch subprocess.

When the bug occurs, it takes the form of blocks of missing data in the output - in effect, some calls to write() are ignored. I have confirmed that the bug is on the write side - if I count the amount of data read from the input files, I get the correct size.

The bug does not trigger every time, but seems to be more common under certain circumstances:

  • It seems more likely to occur when dealing with many small files instead of a few large ones
  • It does not occur if the file sizes are round numbers (e.g. exactly 1 MiB; I suspect what matters is whether the size is an exact multiple of the 16 KiB write block size)
  • It occurs regardless of whether the data is valid UTF-8 or arbitrary binary
  • It does not occur when writing to a regular file, only to a pipe. I suspect this is because regular file writes are always blocking (see the sketch after this list).
  • It does not occur in Python 3.8.5 - I suspect this is a Python 2-specific issue
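
For illustration (this sketch is my addition, not part of the original report): gevent's cooperative IO relies on non-blocking descriptors, and a single write to a non-blocking pipe can be short once the pipe buffer fills, unlike a write to a regular file.

import os

# A non-blocking pipe accepts only as much as fits in its buffer,
# so one write() call can transfer less than the data it was given.
r, w = os.pipe()
os.set_blocking(w, False)  # Python 3.5+; under Python 2, use fcntl with O_NONBLOCK
n = os.write(w, b'x' * (1024 * 1024))  # ask to write 1 MiB
print(n)  # typically 65536 on Linux: only one pipe buffer's worth was accepted
os.close(r)
os.close(w)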

Steps to reproduce

I suggest doing the following in a new directory, as it will write a large number of files.

  1. Run this script to generate 64 files of just over 1 MiB:
with open('correct', 'w') as correct:
    for a in range(64):
        with open("{:02x}".format(a), 'w') as f:
            for b in range(2**17+128):
                s = "{:02x}{:05x}\n".format(a, b)
                f.write(s)
                correct.write(s)

The generated files will contain lines of the form FFooooo, where FF is the file name and ooooo is the line number, both in hex (e.g. line 0x1a of file 3f is 3f0001a). I found this useful for seeing exactly what data was missing, since it was all regularly spaced and in order.

  2. Run this script using python2 and a gevent version later than 1.5a2:
import gevent.monkey
gevent.monkey.patch_all()

import itertools
import os
import subprocess

def concatenate():
    paths = ["%02x" % i for i in range(64)]
    with open('out', 'wb') as out:
        cat = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=out)
    for path in paths:
        with open(path, 'rb') as inp:
            while True:
                c = inp.read(16*1024)
                if not c:
                    break
                cat.stdin.write(c)
    cat.stdin.close()
    if cat.wait() != 0:
        raise Exception()

def main():
    correct_size = os.stat('correct').st_size
    for attempt in itertools.count():
        print("Attempt {}".format(attempt))
        concatenate()
        if os.stat('out').st_size != correct_size:
            break

main()

On my machine it generally takes up to 10 tries to trigger the bug. I have managed to trigger the bug even with only two files being read, but this required over 500 tries.

You can compare the expected output to actual output by looking at the correct and out files respectively.
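
For illustration (not part of the original report), a helper along these lines can pinpoint exactly where the two files diverge:

def first_difference(path_a, path_b, block=16 * 1024):
    # Return the byte offset of the first difference, or None if identical.
    offset = 0
    with open(path_a, 'rb') as a, open(path_b, 'rb') as b:
        while True:
            ca, cb = a.read(block), b.read(block)
            if ca != cb:
                common = min(len(ca), len(cb))
                for i in range(common):
                    if ca[i] != cb[i]:
                        return offset + i
                return offset + common  # one file is a prefix of the other
            if not ca:
                return None
            offset += len(ca)

print(first_difference('correct', 'out'))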

@jamadden
Member

jamadden commented Dec 8, 2020

(EDIT TO ADD: I saw "3.7" in the reported version string and interpreted all of this as if on Python 3. But it was actually Python 2, installed from Alpine 3.7's repositories.)

I argue that this example has a bug:

            while True:
                c = inp.read(16*1024)
                if not c:
                    break
                cat.stdin.write(c)

Much like #1706, this code is ignoring the fact that io.write(b) may not write all the bytes of b:

Write the given bytes-like object, b, to the underlying raw stream, and return the number of bytes written. This can be less than the length of b in bytes,

So when you use low-level IO like that, you may need to wrap it in a loop that makes write calls until the data is fully written:

                while True:
                    c = inp.read(16 * 1024)
                    if not c:
                        break
                    written = 0
                    to_write = len(c)
                    while to_write:
                        x = cat.stdin.write(c[written:])
                        to_write -= x
                        written += x

Doing so fixes the problem.

The documentation continues with weasel words:

depending on specifics of the underlying raw stream.

This explains the difference between Python 2 and Python 3. In Python 2, subprocess streams are unbuffered by default, while in Python 3 they are buffered by default. The bufsize argument to Popen controls this. Enabling buffering explicitly in Python 2 also solves the problem and makes it behave like Python 3:

        cat = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=out, bufsize=-1)

EDITED to add: Why does this start showing up in gevent 1.5? There was a bug fix to the file object that made it actually start respecting the buffering arguments. Previously, it always buffered even when asked not to.

(I suspect another bug is the scope of the with open('out', 'wb') as out: block; as written, it closes the file before it gets used, because the for block beneath it is not part of the with block. I immediately get the expected ValueError: write to closed file running this example, so I corrected that by moving the for block into the with body.)

@jamadden added the PyVer: python2 (Affects Python 2), Status: not gevent (Environment or otherwise not a gevent issue. No further work expected.), and Type: Question (User support and/or waiting for responses) labels on Dec 8, 2020
@ekimekim
Author

ekimekim commented Dec 8, 2020

Ah, I see what's happening.

I still think gevent is at fault here, but for a different reason - it means that the gevent.subprocess module is not acting as a drop-in replacement for the subprocess module.

The issue is that in python 2, calling cat.stdin.write() returns None (as indeed does file.write() for all file objects). The docstring states:

write(...)
    write(str) -> None.  Write string str to file.
    
    Note that due to buffering, flush() or close() may be needed before
    the file on disk reflects the data written.

I should note that my original bug happened when I was using shutil.copyfileobj (i.e. a standard library routine that should be writing to the file correctly), not my own write loop.
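
That version was roughly equivalent to this (a sketch, reusing cat and path from the reproduction above):

import shutil

with open(path, 'rb') as inp:
    # copyfileobj() never checks write()'s return value; it relies on the
    # file-object contract that a write is complete when it returns.
    shutil.copyfileobj(inp, cat.stdin, 16 * 1024)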

Just to confirm, I have this test program:

import os

if os.environ.get('GEVENT'):
    import gevent.monkey
    gevent.monkey.patch_all()

import subprocess
cat = subprocess.Popen(['cat'], stdin=subprocess.PIPE)

ret = cat.stdin.write(b"foo\n")
print(repr(ret))

Here are the results with and without gevent, in python2 and python3:

$ python test.py
None
$ GEVENT=true python test.py
4L
$ python3 test.py
4
$ GEVENT=true python3 test.py
4

So in Python 2, using gevent changes the contract: code goes from never needing to check the number of bytes written to being required to check it.

I'm not sure what the best behaviour would be to do here from gevent. It could:

  • Change its behaviour depending on Python version
  • Always ensure all bytes are written, and return the total (IMO returning an int from a function that normally has no useful return value isn't a problem, and this approach won't break any existing gevent-using py2 code that assumes the current behaviour; a rough sketch follows this list)
  • Keep the current behaviour, but put a warning in the documentation that monkey-patching subprocess has known issues under Python 2.
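
A rough sketch of the second option (illustrative only; not gevent's actual implementation, and _raw_write here is a hypothetical greenlet-friendly primitive):

class PipeFileObject(object):  # hypothetical wrapper, for illustration
    def __init__(self, raw_write):
        self._raw_write = raw_write  # e.g. a cooperative os.write

    def write(self, data):
        total = len(data)
        written = 0
        while written < total:
            # The raw write may be short on a non-blocking pipe; keep
            # going until everything has been handed to the kernel.
            written += self._raw_write(data[written:])
        return total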

In any case, thanks for looking into this.

Also about the with open('out', 'wb') as out: - it shouldn't matter that it's closed once the for loop starts, as we're only closing it in our process. The cat process should have already been started and inherited the open file. We never write to it directly. I'm not sure how you'd get a write to closed file error.
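
A self-contained sketch of that point (my illustration, not from the thread):

import subprocess

# The child gets its own duplicate of the fd at Popen() time, so closing
# the parent's handle afterwards does not affect the child.
with open('out', 'wb') as out:
    proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=out)
# 'out' is now closed in this process, but cat still holds its own copy.
proc.stdin.write(b'still reaches the file\n')
proc.stdin.close()
proc.wait()
with open('out', 'rb') as f:
    print(f.read())  # b'still reaches the file\n'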

@jamadden removed the Status: not gevent (Environment or otherwise not a gevent issue. No further work expected.) label on Dec 8, 2020
@jamadden
Member

jamadden commented Dec 8, 2020

I'm not sure how you'd get a write to closed file error.

User error on my end. Ignore please.

jamadden added a commit that referenced this issue Dec 8, 2020
…p to write all the data.

This is more like what the Python 2 standard library does. Beware of relying on that, though, during an upgrade.

Fixes #1711
ekimekim added a commit to ekimekim/wubloader that referenced this issue Jan 16, 2021
ekimekim added a commit to ekimekim/wubloader that referenced this issue Aug 17, 2021