tofile() truncation on arrays >= 2**32 on 64-bit OSX (Trac #2114) #574

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 4 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/2114 on 2012-04-24 by trac user embray, assigned to unknown.

After a fair bit of debugging, we've tracked down a bug in OSX's fwrite() (actually in an internal function that affects fwrite(), fprintf(), and other functions that write to a file handle). The bug was originally discovered while trying to write out some large arrays with Numpy. As far as I can tell from some Google searches, it isn't otherwise well known yet.

The bug is that, at some point, the size passed to fwrite() is stuffed into a 32-bit register, which is then checked to see whether it's a multiple of 0x1000 (4096). If it is, the code branches off to a separate routine for writes that are a multiple of one block size.

Thus, if the size is a multiple of 4096 and >= 2**32, the size gets silently truncated to size & 0xffffffff.
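The truncation rule can be written down as plain arithmetic. The function below, `modeled_bytes_written`, is a hypothetical model of the behavior described above, not Apple's libc code: it never calls fwrite() and only computes how many bytes would actually reach the file under that rule. It uses `uint64_t` so the model works the same regardless of the width of `size_t`.

```c
#include <stdint.h>

/* Hypothetical model (NOT Apple's implementation) of the reported bug:
 * if the requested size is a multiple of 4096 (0x1000) and at least 2**32,
 * only the low 32 bits of the size survive; otherwise the size is intact. */
static uint64_t modeled_bytes_written(uint64_t requested)
{
    const uint64_t block = 0x1000;               /* 4096-byte block size */
    const uint64_t two_pow_32 = 0x100000000ULL;  /* 2**32 */

    if (requested % block == 0 && requested >= two_pow_32)
        return requested & 0xffffffffULL;        /* upper 32 bits are lost */
    return requested;
}
```

Under this model, `modeled_bytes_written(0x100000000)` is 0 and `modeled_bytes_written(0x100001000)` is 4096, which matches the file sizes `ls -l` reports in the test runs in this issue, while a size such as `0x100000001` (not a multiple of 4096) passes through unchanged.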

The attached test program illustrates the problem. It has been tested and shown to be buggy on Leopard and Lion (so the bug presumably also exists in Snow Leopard; I'm not sure about earlier OSX versions).

This is what the output looks like:

```
$ gcc -g -Wall -arch x86_64 -Wextra writetest.c -o writetest
$ ./writetest 0x100000000 && ls -l test.array
size_t bytes: 8
array size: 4294967296
array size cast as size_t: 4294967296
wrote 4294967296 bytes
-rw-r--r--  1 embray  31  0 Apr 24 11:03 test.array
```

As you can see, fwrite() even reports that it wrote 4294967296 bytes, although in reality it wrote zero bytes. Likewise:

```
$ ./writetest 0x100001000 && ls -l test.array
size_t bytes: 8
array size: 4294971392
array size cast as size_t: 4294971392
wrote 4294971392 bytes
-rw-r--r--  1 embray  31  4096 Apr 24 11:04 test.array
```

Further testing has shown that this holds for any size that is a multiple of 4096 and at least 2**32.
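Because fwrite() reports success even when the data is dropped, its return value alone cannot detect the problem; the file size has to be checked independently. The sketch below is a hypothetical check in the spirit of the attached writetest.c (whose source is not reproduced here): `fwrite_size_matches` and its return convention are illustrative assumptions. It writes `nbytes` zero bytes with a single fwrite() call, closes the file, and compares fwrite()'s count with the size reported by POSIX stat().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: write nbytes of zeros to path in ONE fwrite() call,
 * then compare fwrite()'s return value with the on-disk size from stat().
 * Returns 1 if they agree, 0 if they differ, -1 on allocation/I/O errors. */
static int fwrite_size_matches(const char *path, size_t nbytes)
{
    char *buf = calloc(nbytes ? nbytes : 1, 1);  /* avoid calloc(0, 1) */
    if (buf == NULL)
        return -1;

    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        free(buf);
        return -1;
    }

    size_t reported = fwrite(buf, 1, nbytes, fp);
    int close_failed = (fclose(fp) != 0);        /* fclose() flushes stdio */
    free(buf);
    if (close_failed)
        return -1;

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    /* On an affected system the report implies these would disagree. */
    return (off_t)reported == st.st_size ? 1 : 0;
}
```

Going by the report, calling this with `0x100000000` on an affected 64-bit system would be expected to return 0; that case needs a 4 GiB buffer, so small sizes are the practical way to exercise the helper itself.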

The fix implemented for #2256, which writes arrays in 2GB chunks, would also solve this problem, so I think it would be sufficient to enable the same chunked-write code block in PyArray_ToFile() on OSX as well.

Although the OSX bug only occurs on those 4K boundaries and only for sizes >= 2**32, for the sake of simplicity I think it's fine to just use more or less the same workaround.
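A chunked write of this kind can be sketched as below. This is a hypothetical helper, not NumPy's actual PyArray_ToFile() code: the name `write_chunked`, the `WRITE_CHUNK_MAX` constant, and its value of 2**31 bytes (taken from the "2GB chunks" mentioned above) are assumptions. Any chunk size below 2**32 keeps every individual fwrite() call out of the range the OSX bug affects, even though 2**31 is itself a multiple of 4096.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed chunk limit of 2 GiB (2**31 bytes); any value below 2**32 avoids
 * the reported truncation, since no single fwrite() sees a size >= 2**32. */
#define WRITE_CHUNK_MAX ((size_t)1 << 31)

/* Hypothetical helper: write nbytes from buf to fp in pieces of at most
 * `chunk` bytes. Returns the total number of bytes written; a count below
 * nbytes means a write failed (inspect ferror(fp) for details). */
static size_t write_chunked(const void *buf, size_t nbytes, size_t chunk, FILE *fp)
{
    const unsigned char *p = buf;
    size_t total = 0;

    if (chunk == 0)
        return 0;  /* a zero chunk size would never make progress */

    while (total < nbytes) {
        size_t remaining = nbytes - total;
        size_t n = remaining < chunk ? remaining : chunk;
        size_t written = fwrite(p + total, 1, n, fp);
        total += written;
        if (written != n)
            break;  /* short write: stop and report what was written */
    }
    return total;
}
```

In real use the call would look like `write_chunked(data, size, WRITE_CHUNK_MAX, fp)`; taking the chunk size as a parameter mainly makes the loop easy to exercise with small buffers.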

@numpy-gitbot

Attachment added by trac user embray on 2012-04-24: writetest.c

@numpy-gitbot

@charris wrote on 2012-04-27

Arrggghhhh...

I h8 these buggy OS workarounds. But we are here to serve ;) If you can put together a pull request with a fix and test using #2256 as a template I'll put it in. And thanks for tracking it down.

@numpy-gitbot

trac user embray wrote on 2012-04-27

Believe me, I hate them just as much. "OS X Lion - The world's most advanced OS...that can't write files properly."

Sure, I'll put together a pull request to fix this.

@charris

charris commented May 5, 2014

I believe this is fixed in Mavericks. Please reopen if there is a continuing problem.

charris closed this as completed on May 5, 2014