Skip to content

Conversation

@edgargabriel
Copy link
Member

@edgargabriel edgargabriel commented Jan 23, 2017

Fixes a bug reported on the mailing list. ompio did only reposition the individual
file pointer when the file was opened in append mode. Set the shared file
pointer also to point to the end of the file, similarly to the individual
file pointer.

This is the equivalent to commit d3a8d38
on master, cannot be cherry-picked because of differences in the organization
of the ompio files ( usage of mca/common/ompio on master).

Revert the logic of io_ompio_sharedfp_lazy_open. The user now has to explicitely
disable shared fp in order for the structures not to be allocated.
Otherwise, resetting the shared fp e.g. in case the file was opened
in append mode will not work correctly, the code could deadlock.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
@jsquyres jsquyres added this to the v2.0.2 milestone Jan 23, 2017
@jsquyres jsquyres added the bug label Jan 23, 2017
@jsquyres
Copy link
Member

@edgargabriel Got a compile fail:

io_ompio_file_open.c: In function 'ompio_io_ompio_file_open':
io_ompio_file_open.c:257:17: error: 'shared_fp_base_module' undeclared (first use in this function)
                 shared_fp_base_module = ompio_fh->f_sharedfp;
                 ^
io_ompio_file_open.c:257:17: note: each undeclared identifier is reported only once for each function it appears in

Can you have a look?

@ibm-ompi
Copy link

Build Failed with GNU compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/67e65d9823b05c596456c25f9b560ef0

@ibm-ompi
Copy link

Build Failed with XL compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/e5df2d6317ee8e58798fb2a9ac2a3d9d

@ibm-ompi
Copy link

Build Failed with PGI compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/d1f0dc9ac8383165963718d6a64e667d

Fixes a bug reported on the mailing list. ompio did only reposition the individual
file pointer when the file was opened in append mode. Set the shared file
pointer also to point to the end of the file, similarly to the individual
file pointer.

This is the equivalent to commit d3a8d38
on master, cannot be cherry-picked because of differences in the organization
of the ompio files ( usage of mca/common/ompio on master).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
@edgargabriel edgargabriel force-pushed the pr/sharedfp-append-fix-v2.0.x branch from fa6bf24 to 537c652 Compare January 24, 2017 14:23
@edgargabriel
Copy link
Member Author

fixed the patch, compiled and run my testsuite with v2.0.x to be safe, seems to work.

@hppritcha hppritcha requested a review from jsquyres January 24, 2017 17:43
@edgargabriel
Copy link
Member Author

Hm. I'm seeing on master some problems that I did not have yesterday, not sure what changed. I fixed the issues there. It does not happen on v2.0.x, right now, but it actually should. I would prefer to add one more commit to this pr to be on the safe side,. Will work on this later today.

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a bunch of opal_output's in ompi/mca/io/ompio/io_ompio_file_open.c that really should be converted to opal_show_help()'s. But that's larger than this specific PR. It would be nice to see another PR to convert these opal_output's on the error cases to opal_show_help()s. 😄

@edgargabriel
Copy link
Member Author

ok, but please hold back with merging. As I mentioned, we might have to bring in another commit.

@jsquyres
Copy link
Member

@edgargabriel Oh! I missed your comment above -- got it. Will wait; thanks. If you have to choose on where to spend your time, fixing the bug should be considered more important than s/opal_output/opal_show_help/. Thanks!

when a file is opened a second time for shared file pointer operations,
avoid setting the create and exclusive flag.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
it looks like disabling the lazy_open flag for sharedfp components
revealead a bug that lead to a crash in file_close in some tests. Make
sure the SHAREDFP_IS_SET flag is correctly set (and not overwritten again),
and we use that to avoid a double-free of the communicator.

This is equivalent to commit commit f5289a1
on master. The commit cannot be cherry-picked due to the differences in the
organization of the ompio files (ompio uses now common/ompio for many files).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
@edgargabriel
Copy link
Member Author

travis did not finish for the master branch, everything else worked. I tested this patch on 2.0.x fairly thoroughly, so I think we can start to evaluate the overall patch in parallel to master.

@edgargabriel
Copy link
Member Author

@jsquyres regarding the conversion to use show_help() instead of opal_output(), I would prefer to do that on master first. I can look into this, but am right now a little tight on time, so I would prefer to postpone this a bit (february/march roughly), unless you think it is critical.

@jsquyres
Copy link
Member

@edgargabriel Nah, I think the show_help() thing can wait. Not critical for v2.0.2, but it should be done at some point.

@jsquyres
Copy link
Member

@edgargabriel Is this now done?

@jsquyres
Copy link
Member

@edgargabriel We're going to make an RC tomorrow. Let us know if you can finish this in time. Thanks!

@edgargabriel
Copy link
Member Author

edgargabriel commented Jan 26, 2017

it is in theory done, I would have prefered to see the results of the MTT overnight on master before we merge it into v2.0.x, but we can go ahead and look at both MTT results.

@jsquyres
Copy link
Member

@edgargabriel We're having lots of discussions about time-based releases here. We're trying this out by saying "2.0.2 RC tomorrow. Get your fixes in." I'll bring you up to speed out-of-band... 😄

@edgargabriel
Copy link
Member Author

edgargabriel commented Jan 26, 2017

@jsquyres I was actually ready with the fix yesterday around noon, but travis was basically hanging, and I couldn't merge the pr on master until in this morning. Anyway, I will support your decision whether you decide to merge before the rc candidate is created or not.

@jsquyres
Copy link
Member

Cool. Travis has been out to lunch recently. We'll go ahead and merge this for v2.0.2 tomorrow morning, just to check MTT overnight.

@hppritcha
Copy link
Member

@edgargabriel could you post the url of the email on the mail list where this problem was originally reported? thanks.

@edgargabriel
Copy link
Member Author

@edgargabriel
Copy link
Member Author

The new MTT on master looks good, the MPI I/O errors from the night before are gone as far as I can see.

@hppritcha hppritcha merged commit d05205c into open-mpi:v2.0.x Jan 26, 2017
@edgargabriel edgargabriel deleted the pr/sharedfp-append-fix-v2.0.x branch July 17, 2017 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants