open().write() and .read() fails on 2 GB+ data (OS X) #68846
On OS X, the Homebrew and MacPorts versions of Python 3.4.3 raise an exception when writing a 4 GB bytearray:
>>> open('/dev/null', 'wb').write(bytearray(2**31-1))
2147483647
>>> open('/dev/null', 'wb').write(bytearray(2**31))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
This has an impact on pickle, in particular (http://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb). |
PS: I should have written "2 GB" bytearray (so this looks like a signed 32 bit integer issue). |
This is likely a platform bug: it fails with os.write as well. Interestingly enough, file.write works fine on Python 2.7 (which uses stdio), which apparently works around this kernel misfeature. A possible partial workaround is to recognise this error in the implementation of os.write and then perform a partial write. The problem is that, while write(2) is documented as possibly writing less data than requested, most users writing to normal files (as opposed to sockets) probably don’t expect that behavior. On the other hand, os.write already limits writes to INT_MAX on Windows (see _Py_write in Python/fileutils.c). Because of this I’m in favour of adding a similar workaround on OSX (and can provide a patch). BTW, the manpage for write says that writev(2) might fail with EINVAL:
I wouldn’t be surprised if write(2) is implemented using writev(2) and that this explains the problem.
|
The attached patch is a first stab at a workaround. It unconditionally limits the write size in os.write to INT_MAX on OSX. I haven't yet tested whether this actually fixes the problem mentioned on Stack Overflow. |
Thank you for looking into this, Ronald. What does your patch do, exactly? Does it only limit the returned byte count, or does it really limit the size of the data written by truncating it? In any case, it would be very useful to have a warning from the Python interpreter. If the data is truncated, I would even prefer an explicit exception (e.g. "data too big for this platform (>= 2 GB)"), along with an explicit mention of it in the documentation. What do you think? |
The patch limits os.write to writing at most INT_MAX bytes on OSX. Buffered I/O using open("/some/file", "wb") should still write all data (at least according to the limited tests I've done so far). The same limitation is already present on Windows. And as I wrote before: according to the manpage for write(2), os.write may already write fewer bytes than requested. I'm -1 on using an explicit exception or printing a warning about this. |
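For callers who use os.write directly, the partial-write behaviour described above means the caller has to loop until everything is written. The following is a minimal sketch of such a loop, not the patch itself: the helper name write_all and the constant MAX_CHUNK are hypothetical, and 2**31 - 1 is assumed to match INT_MAX on the affected platforms.

```python
import os

# Assumption: 2**31 - 1 mirrors C's INT_MAX on the affected platforms.
MAX_CHUNK = 2**31 - 1


def write_all(fd, data, max_chunk=MAX_CHUNK):
    """Write all of data to fd, passing at most max_chunk bytes per os.write call.

    Hypothetical helper for illustration; it is not part of the os module.
    """
    view = memoryview(data).cast("B")  # slice without copying the payload
    total = 0
    while total < len(view):
        # os.write may write fewer bytes than requested, so advance by its result.
        written = os.write(fd, view[total:total + max_chunk])
        if written == 0:
            raise OSError("os.write made no progress")
        total += written
    return total
```

With a loop like this, a clamped os.write is invisible to the caller, which is also why buffered file.write can keep writing all data.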
I see, thanks. This sounds good to me too: no need for a warning or exception, indeed, since file.write() should work and the behavior of os.write() is documented. |
The Windows limit to INT_MAX is applied in many functions:
In the default branch, there is now _Py_write(), so only one place should be fixed. See the issue bpo-11395 which fixed the bug on Windows. If it's a bug, it should be fixed on Python 2.7, 3.4, 3.5 and default branches. |
The patch I attached earlier is for the default branch. More work is needed for the other active branches. |
I don't know how helpful it is at this point, but the issue happens while reading also. Here's some related discussion in the numpy tracker: numpy/numpy#3858 (The claim was that OSX Mavericks fixed this issue, it didn't, and there is an Apple bug ID in there somewhere, plus there is a link to a patch the torch folks used) and also in pandas: pandas-dev/pandas#10641 I'd be happy to try to test patches out. |
Indeed, read(2) has the same problem. I just tested this with a small C program. I'll rework the patch for this, and will work on patches for 3.4/3.5 and 2.7 as well. |
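On the read side, a caller-level workaround can be sketched the same way: request at most INT_MAX bytes per os.read call and accumulate the pieces. The helper name read_all and the constant MAX_CHUNK below are hypothetical, and 2**31 - 1 is again assumed to match INT_MAX; this is an illustration, not the reworked patch.

```python
import os

# Assumption: 2**31 - 1 mirrors C's INT_MAX on the affected platforms.
MAX_CHUNK = 2**31 - 1


def read_all(fd, size, max_chunk=MAX_CHUNK):
    """Read up to size bytes from fd, asking for at most max_chunk bytes per os.read call.

    Hypothetical helper for illustration; it is not part of the os module.
    """
    buf = bytearray()
    while len(buf) < size:
        chunk = os.read(fd, min(max_chunk, size - len(buf)))
        if not chunk:  # empty result means end of file
            break
        buf += chunk
    return buf
```

The function returns a bytearray so that a multi-gigabyte result is not copied a second time just to produce a bytes object.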
Write still fails on 3.5.1 and OS X 10.11.2. I'm no dev, so can someone explain how to use the patch while it's under review? |
Here is my patch for 3.6. I am going to provide a patch for 3.5 as well. |
Sorry, I was busy with another task, but here is my patch for 3.5. In fact, it is the same as the one for 3.6. |
ping |
Ned Deily, I added you because you are the expert for the OS X platform. |
Victor, could you check the new patch? |
Uploaded a new version. |
Hello, I just updated this ticket with a PR on GitHub. |
I see that we have other clamps on Windows using INT_MAX:
Are these functions ok on macOS? If not, a new issue should be opened ;-) |
What do you suggest? |
I don't say that something is broken. Just that it would be nice if someone On Windows, the bug was obvious: the function takes a C int... |
Hi all, could you test the PR on Windows? I don't have a Windows computer. Thank you, Stéphane |
Nosying myself since I just landed here based on an internal $work bug report. We're seeing it with reads. I'll try to set aside some work time to review the PRs. |
Hi @barry, this issue should be fixed for 3.x, but I still need to finish my PR for 2.7. I plan to fix 2.7 in the coming weeks. |
Since 3.x is fixed and 2.7 has reached EOL, I'm closing the issue. Thanks for getting it fixed in 3.x, Stephane and Victor! |