-
Notifications
You must be signed in to change notification settings - Fork 932
Pr/sharedfp append fix v2.0.x #2787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pr/sharedfp append fix v2.0.x #2787
Conversation
Revert the logic of io_ompio_sharedfp_lazy_open. The user now has to explicitely disable shared fp in order for the structures not to be allocated. Otherwise, resetting the shared fp e.g. in case the file was opened in append mode will not work correctly, the code could deadlock. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
|
@edgargabriel Got a compile fail: Can you have a look? |
|
Build Failed with GNU compiler! Please review the log, and get in touch if you have questions. Gist: https://gist.github.com/67e65d9823b05c596456c25f9b560ef0 |
|
Build Failed with XL compiler! Please review the log, and get in touch if you have questions. Gist: https://gist.github.com/e5df2d6317ee8e58798fb2a9ac2a3d9d |
|
Build Failed with PGI compiler! Please review the log, and get in touch if you have questions. Gist: https://gist.github.com/d1f0dc9ac8383165963718d6a64e667d |
Fixes a bug reported on the mailing list. ompio did only reposition the individual file pointer when the file was opened in append mode. Set the shared file pointer also to point to the end of the file, similarly to the individual file pointer. This is the equivalent to commit d3a8d38 on master, cannot be cherry-picked because of differences in the organization of the ompio files ( usage of mca/common/ompio on master). Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
fa6bf24 to
537c652
Compare
|
fixed the patch, compiled and run my testsuite with v2.0.x to be safe, seems to work. |
|
Hm. I'm seeing on master some problems that I did not have yesterday, not sure what changed. I fixed the issues there. It does not happen on v2.0.x, right now, but it actually should. I would prefer to add one more commit to this pr to be on the safe side,. Will work on this later today. |
jsquyres
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a bunch of opal_output's in ompi/mca/io/ompio/io_ompio_file_open.c that really should be converted to opal_show_help()'s. But that's larger than this specific PR. It would be nice to see another PR to convert these opal_output's on the error cases to opal_show_help()s. 😄
|
ok, but please hold back with merging. As I mentioned, we might have to bring in another commit. |
|
@edgargabriel Oh! I missed your comment above -- got it. Will wait; thanks. If you have to choose on where to spend your time, fixing the bug should be considered more important than s/opal_output/opal_show_help/. Thanks! |
when a file is opened a second time for shared file pointer operations, avoid setting the create and exclusive flag. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
it looks like disabling the lazy_open flag for sharedfp components revealead a bug that lead to a crash in file_close in some tests. Make sure the SHAREDFP_IS_SET flag is correctly set (and not overwritten again), and we use that to avoid a double-free of the communicator. This is equivalent to commit commit f5289a1 on master. The commit cannot be cherry-picked due to the differences in the organization of the ompio files (ompio uses now common/ompio for many files). Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
|
travis did not finish for the master branch, everything else worked. I tested this patch on 2.0.x fairly thoroughly, so I think we can start to evaluate the overall patch in parallel to master. |
|
@jsquyres regarding the conversion to use show_help() instead of opal_output(), I would prefer to do that on master first. I can look into this, but am right now a little tight on time, so I would prefer to postpone this a bit (february/march roughly), unless you think it is critical. |
|
@edgargabriel Nah, I think the show_help() thing can wait. Not critical for v2.0.2, but it should be done at some point. |
|
@edgargabriel Is this now done? |
|
@edgargabriel We're going to make an RC tomorrow. Let us know if you can finish this in time. Thanks! |
|
it is in theory done, I would have prefered to see the results of the MTT overnight on master before we merge it into v2.0.x, but we can go ahead and look at both MTT results. |
|
@edgargabriel We're having lots of discussions about time-based releases here. We're trying this out by saying "2.0.2 RC tomorrow. Get your fixes in." I'll bring you up to speed out-of-band... 😄 |
|
@jsquyres I was actually ready with the fix yesterday around noon, but travis was basically hanging, and I couldn't merge the pr on master until in this morning. Anyway, I will support your decision whether you decide to merge before the rc candidate is created or not. |
|
Cool. Travis has been out to lunch recently. We'll go ahead and merge this for v2.0.2 tomorrow morning, just to check MTT overnight. |
|
@edgargabriel could you post the url of the email on the mail list where this problem was originally reported? thanks. |
|
The new MTT on master looks good, the MPI I/O errors from the night before are gone as far as I can see. |
Fixes a bug reported on the mailing list. ompio did only reposition the individual
file pointer when the file was opened in append mode. Set the shared file
pointer also to point to the end of the file, similarly to the individual
file pointer.
This is the equivalent to commit d3a8d38
on master, cannot be cherry-picked because of differences in the organization
of the ompio files ( usage of mca/common/ompio on master).