
MPI_File_write* bug due to uncaught write(v) calls for iov_len > 2,147,479,552 bytes #2399

cniethammer opened this issue Nov 11, 2016 · 11 comments


@cniethammer (Contributor)

There is a bug in the MPI_File_write* operations in Open MPI that truncates files or writes incorrect data to them.
I traced the problem down to the POSIX writev call used in ompi/mca/fbtl/posix/fbtl_posix_pwritev.c, which will not write more than 2,147,479,552 bytes to disk. From the NOTES section of man 2 write:

On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes,
returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)
At the moment we do not catch the case where the iov contains elements with iov_len > 2,147,479,552 bytes, which causes these issues with large I/O blocks and files.

My suggestions:
Either (1) check that each write completes with the expected number of written bytes and issue another write for the remainder, or (2) modify the convertor used in mca_io_ompio_build_io_array/ompi_io_ompio_decode_datatype to build an iov struct whose elements are no larger than 2,147,479,552 bytes when using POSIX I/O on Linux systems.
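
For illustration, here is a minimal sketch of suggestion (1) in plain C, assuming a regular file descriptor; write_all is a hypothetical helper for a single buffer, not the actual fbtl code:

    #include <errno.h>
    #include <unistd.h>

    /* Hypothetical helper: keep calling pwrite() until all 'len' bytes are
     * on disk. On Linux a single call transfers at most 0x7ffff000
     * (2,147,479,552) bytes, so large requests come back short. */
    static ssize_t write_all(int fd, const char *buf, size_t len, off_t offset)
    {
        size_t total = 0;
        while (total < len) {
            ssize_t ret = pwrite(fd, buf + total, len - total, offset + total);
            if (ret < 0) {
                if (errno == EINTR)
                    continue;            /* interrupted by a signal: retry */
                return -1;               /* real error */
            }
            total += (size_t) ret;       /* ret may be smaller than requested */
        }
        return (ssize_t) total;
    }

The same pattern applies to writev: compare the return value against the sum of all iov_len fields and resubmit the unwritten tail.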

@edgargabriel (Member)

Christoph,

thank you for your bug report and analysis. I think the second solution is already implemented, at least for blocking operations; it is just not active by default. Could you try to rerun your test case with the mca parameter

mca_io_ompio_cycle_buffer_size

set to something (e.g. 1 GB or similar; note that the value has to be given in bytes), and see whether this fixes the problem? If this solution works, we might just have to change the default value for this mca parameter.

This would not solve the issue for non-blocking operations. We already have the logic for splitting the write sequences into multiple chunks for the aio_write function calls in the posix fbtl, just in a slightly different context. I will look into that part as well.
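
In the spirit of suggestion (2), splitting might look like the following sketch; LINUX_WRITE_MAX and split_iov are illustrative assumptions, not the existing Open MPI code:

    #include <stddef.h>
    #include <sys/uio.h>

    #define LINUX_WRITE_MAX 0x7ffff000  /* 2,147,479,552 bytes */

    /* Hypothetical helper: copy the input iov into out[], splitting every
     * entry larger than LINUX_WRITE_MAX into multiple smaller entries.
     * Returns the number of entries produced (entries beyond out_max are
     * dropped; a real implementation would resize instead). */
    static int split_iov(const struct iovec *in, int in_count,
                         struct iovec *out, int out_max)
    {
        int n = 0;
        for (int i = 0; i < in_count; i++) {
            char  *base = in[i].iov_base;
            size_t left = in[i].iov_len;
            while (left > 0 && n < out_max) {
                size_t chunk = left > LINUX_WRITE_MAX ? LINUX_WRITE_MAX : left;
                out[n].iov_base = base;
                out[n].iov_len  = chunk;
                base += chunk;
                left -= chunk;
                n++;
            }
        }
        return n;
    }

Note that splitting the entries alone is not sufficient for writev: the 0x7ffff000-byte cap applies to the total transferred per call, not per entry, so the return value still has to be checked and the remainder resubmitted.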

Also, as a side note, I am fairly confident that the problem should not arise for collective write operations, since they are broken down into cycles of e.g. 32 MB anyway.

Thanks

Edgar


@cniethammer (Contributor, Author)

Hello Edgar,

Thanks for pointing out this option. I can confirm that setting
-mca mca_io_ompio_cycle_buffer_size 1073741824
results in writing the correct amount of data to disk.
I would welcome a default that is safe for the user (e.g. the Linux write limit), even at the cost of a possible performance loss; that is obviously better than losing data at this point.

Best
Christoph

@edgargabriel (Member)

Christoph,
I am open to any reasonable value; can you make a suggestion?
Thanks
Edgar

@jsquyres jsquyres added the bug label Nov 14, 2016
@jsquyres jsquyres added this to the v2.0.2 milestone Nov 14, 2016
@jsquyres (Member)

@edgargabriel @cniethammer If you guys want this in v2.0.2, please come to a decision ASAP (i.e., today/tomorrow). Otherwise we're going to have to push this to v2.0.3.

@edgargabriel (Member)

This is the bug fix that was merged a couple of days ago with the default value for the parameter, so it is done for now. I will work on the second solution that we discussed over the break.

@jsquyres (Member)

Ah! Ok, my bad -- I should have read closer / remembered better. 😊

@jsquyres jsquyres modified the milestones: v2.1.0, v2.0.2 Dec 13, 2016
@jsquyres (Member)

@edgargabriel @cniethammer It's been about 2 months since the last update -- any progress? The door for v2.1.0 is just about closed. I'm going to pre-emptively move this to v2.1.1; feel free to move to v3.0.0 if you think that is more realistic.

@jsquyres jsquyres modified the milestones: v2.1.1, v2.1.0 Feb 22, 2017
@edgargabriel (Member)

Feel free to move it to 3.0; I will not get to it. The fix that we committed for the 2.0 series (and 2.1, but I will double-check that) should do the trick for now.

@jsquyres jsquyres modified the milestones: v3.0.0, v2.1.1 Feb 22, 2017
@jsquyres (Member)

@edgargabriel Done; thanks.

@hppritcha (Member)

@edgargabriel should we just move this to future or do you think you can get a fix in to 3.0.1?

@edgargabriel (Member)

@hppritcha it is on my to-do list for this summer, but I don't think it will make it into 3.0.1; I hope to have it ready for 3.1 (or 4.0, whatever the next release is).
