This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

ompi/win: use type int* for MPI_WIN_DISP_UNIT, MPI_WIN_CREATE_FLAVOR … #1018

Merged
hppritcha merged 1 commit into open-mpi:v2.x on Mar 14, 2016

Conversation

ggouaillardet
Contributor

…and MPI_WIN_MODEL

Thanks Alastair McKinstry for the report.

(cherry picked from commit open-mpi/ompi@d08fb46)

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1427/ for details.

@ggouaillardet
Contributor Author

:bot:assign: @hjelmn
:bot:milestone:v2.0.0
:bot:label:bug

@ompiteam-bot ompiteam-bot added this to the v2.0.0 milestone Mar 11, 2016
@hjelmn
Member

hjelmn commented Mar 11, 2016

@jsquyres I have no idea what the change means. Fortran and attributes.

@ggouaillardet
Contributor Author

@hjelmn ompi_attr_set_fortran_mpi2 expects an attribute of type MPI_Aint whereas ompi_attr_set_fortran_mpi1 expects an attribute of type MPI_Fint.

Attribute types are defined in Table 11.1 (Section 11.2.6, page 414 of the MPI 3.1 standard).

I can only guess there are historical reasons for having both the fortran_mpi1 and fortran_mpi2 variants ...
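
For context, a minimal sketch (not part of this PR) of why the declared type matters: per MPI-3.1 Table 11.1, C user code reads MPI_WIN_DISP_UNIT (and MPI_WIN_CREATE_FLAVOR, MPI_WIN_MODEL) back as a pointer to an int, while MPI_WIN_SIZE comes back as a pointer to an MPI_Aint. The helper below is purely illustrative.

```c
/* Illustrative only -- not part of this PR.  In C, the value returned
 * through the void* out-parameter of MPI_Win_get_attr() is a pointer to
 * the attribute, so the pointed-to type must match the standard:
 * int for MPI_WIN_DISP_UNIT, MPI_Aint for MPI_WIN_SIZE. */
#include <mpi.h>
#include <stdio.h>

static void print_win_attrs(MPI_Win win)
{
    int flag;
    int *disp_unit;      /* MPI_WIN_DISP_UNIT is an int in C */
    MPI_Aint *size;      /* MPI_WIN_SIZE is an MPI_Aint in C */

    MPI_Win_get_attr(win, MPI_WIN_DISP_UNIT, &disp_unit, &flag);
    if (flag) printf("disp_unit = %d\n", *disp_unit);

    MPI_Win_get_attr(win, MPI_WIN_SIZE, &size, &flag);
    if (flag) printf("size = %lld\n", (long long) *size);
}
```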

@@ -167,18 +167,18 @@ config_window(void *base, size_t size, int disp_unit,
MPI_WIN_SIZE, size, true);
Member

"size" here isn't an MPI_Aint -- it's a size_t. In most situations, they'll likely be the same -- but shouldn't this be cast to be 100% safe?

Member

@hjelmn Actually, size_t is unsigned here, and MPI_Aint is a ptrdiff_t, which is signed. Can't this cause a problem in some cases?
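
For illustration, one way to make the size_t-to-MPI_Aint conversion explicit, assuming MPI_Aint is a signed, ptrdiff_t-like type as noted above. The helper name and the guard are illustrative only, not the code in this PR.

```c
/* Illustrative only -- not the code in this PR.  An explicit, guarded
 * conversion from the unsigned size_t to the (typically signed,
 * ptrdiff_t-like) MPI_Aint; a size above PTRDIFF_MAX would otherwise
 * wrap to a negative attribute value. */
#include <mpi.h>
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

static MPI_Aint size_as_aint(size_t size)
{
    assert(size <= (size_t) PTRDIFF_MAX);  /* guard the signed conversion */
    return (MPI_Aint) size;
}
```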

@ggouaillardet
Contributor Author

This is the bug reported at open-mpi/ompi#1446.
It occurs only on big-endian platforms; I updated ibm/onesided/c_win_attr to test this.
I agree that, strictly speaking, I assumed size_t is equivalent to MPI_Aint and int is equivalent to MPI_Fint.

@jsquyres
Member

@ggouaillardet I guess I'm confused as to how this fixed the bug. 😕 Is there some subtlety about 4-byte integer to 8-byte integer promotion that is different between little and big endian machines that would explain what is happening here? Or maybe something about a signed to unsigned promotion...?

@ggouaillardet
Contributor Author

My latest theory is that the change is only visible when retrieving the attribute ...
I still need to think about it.

@jsquyres
Member

I talked a little about this with @goodell -- our theory is that it's not a problem of the promotion of the argument, but rather in a difference between how the _mpi1() vs. _mpi2() functions store the value, and ultimately how the values are read.
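
A stand-alone sketch of that store/read mismatch (not the OMPI code): if the attribute value lands in an 8-byte slot but is later read back through a 4-byte, int-sized view of the same bytes, a little-endian machine happens to see the right value while a big-endian machine sees 0, because the low-order bytes sit at opposite ends of the slot.

```c
/* Stand-alone illustration of the store/read width mismatch. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int64_t slot = 4;    /* value stored into an 8-byte slot */
    int32_t view;

    memcpy(&view, &slot, sizeof view);        /* read the first 4 bytes */
    printf("read back: %d\n", (int) view);    /* 4 on little-endian, 0 on big-endian */
    return 0;
}
```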

@jsquyres
Member

👍

hppritcha added a commit that referenced this pull request Mar 14, 2016
ompi/win: use type int* for MPI_WIN_DISP_UNIT, MPI_WIN_CREATE_FLAVOR …
@hppritcha hppritcha merged commit 3cb696d into open-mpi:v2.x Mar 14, 2016
@dalcinl

dalcinl commented Mar 16, 2016

@jsquyres What about the (maybe unlikely) case where sizeof(MPI_Fint) > sizeof(int)? In that case, the proposed fix will break in exactly the same way.

@jsquyres
Member

@dalcinl I think we should be ok. I'm pretty sure the problem was not a promotion issue, but rather how _mpi1() was storing the value (vs. _mpi2()). When calling _mpi1(), the int will be promoted to an MPI_Fint. In the case you cited, it'll go from 4 bytes to 8 bytes. It'll then be stored as an 8-byte value, and be marked as such.
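
A small sketch of the distinction being drawn here (illustrative only): passing an int where a wider MPI_Fint parameter is expected is a value conversion, so no bytes are lost; the trouble only shows up if the stored bytes are later reinterpreted at a different width, as in the earlier sketch.

```c
/* Illustrative only.  The argument conversion preserves the value;
 * MPI_Fint here could be an 8-byte integer (e.g. built with -i8). */
#include <mpi.h>

static void store_fint(MPI_Fint v)
{
    (void) v;              /* v == 4 regardless of endianness */
}

static void caller(void)
{
    int disp_unit = 4;
    store_fint(disp_unit); /* implicit int -> MPI_Fint value conversion */
}
```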

@dalcinl

dalcinl commented Mar 16, 2016

@jsquyres But then, if they are stored as 8-byte values (sizeof(MPI_Fint)==8), C user code will read them as 4-byte values (because these attributes are C int values according to the MPI standard), and endianness will bite you again.

IOW, in the hypothetical case that sizeof(MPI_Fint) == sizeof(MPI_Aint), it seems to me that the *_mpi1() and *_mpi2() routines will work equivalently: they will store the values in the same way. Then, if sizeof(MPI_Fint) > sizeof(int), neither set of routines is appropriate on a big-endian system. Maybe I'm missing something in this transitive reasoning.

@ggouaillardet
Contributor Author

@dalcinl IIRC, the Fortran binding/glue does the conversion between int and MPI_Fint.
The only corner case I can think of is if you create a window in Fortran with a 64-bit MPI_Fint and a disp unit that does not fit in 32 bits ...

@dalcinl

dalcinl commented Mar 16, 2016

@ggouaillardet This is not about Fortran bindings or inter-language operability. It is about how attributes are stored/read.

In MPI, attribute values can be either int, MPI_Aint, or MPI_Fint (the last one to support MPI-1 attributes). Open MPI handles the second and the third correctly, but not the first. If sizeof(int)==sizeof(MPI_Fint), you can support the first using the implementation for the third (this is the proposed fix under discussion); however, if sizeof(int)<sizeof(MPI_Fint), it will break on big-endian architectures.

As I do not expect sizeof(int)<sizeof(MPI_Fint) to be a common case, we can just ignore all my previous comments. But I wanted to put a word about it here, just in the very unlikely case this issue pops up again.

@jsquyres
Member

@dalcinl Are you sure? See https://github.com/open-mpi/ompi/blob/master/ompi/attribute/attribute.c#L27-L192. When the _mpi1() function is used to write the variable (i.e., an MPI_Fint), you're asking whether cases 4, 5, and 6 are implemented correctly if sizeof(int) != sizeof(MPI_Fint).

These functions (https://github.com/open-mpi/ompi/blob/master/ompi/attribute/attribute.c#L27-L192) read the attribute values, and then call the relevant translate_to_FOO(). Are you saying that there's an error in cases 4, 5, or 6 in those 3 translate_to_FOO() functions?

@dalcinl

dalcinl commented Mar 16, 2016

Yes, it seems to me that something is wrong. I'll try to elaborate on what I'm getting from the actual code:

From now on, let's assume sizeof(MPI_Fint) == sizeof(MPI_Aint) < sizeof(int)

  • Look at ompi_attr_init(): this routine will initialize int_pos to 0, no matter what the platform's endianness is, simply because void* and MPI_Fint are the same size. Am I right?
  • Now look at attribute_value_construct(): as int_pos==0, the slots av_address_kind_pointer and av_integer_pointer will be initialized to the same address. Am I right?
  • Finally, when you store an attribute using either the *_mpi1() or *_mpi2() routines, the final bits stored in the attribute should be the same. Then, if using *_mpi2() to store the value was wrong on big-endian, I fail to see how *_mpi1() would be right!! (See the sketch below.)
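
A rough sketch of the mechanism as described in this thread (the struct and field names below are illustrative, not the actual OMPI code): the attribute is kept in one MPI_Aint-sized slot, and an int-sized view pointer is aimed at the byte offset where an int's low-order bytes live; that offset is 0 on little-endian, but nonzero on big-endian when sizeof(MPI_Aint) > sizeof(int).

```c
/* Illustrative only -- names do not match the actual OMPI code. */
#include <mpi.h>
#include <stddef.h>

struct attr_slot {
    MPI_Aint value;     /* full-width storage for the attribute */
    int     *int_view;  /* points into 'value' at the int-sized part */
};

static void attr_slot_construct(struct attr_slot *s, size_t int_pos)
{
    s->value    = 0;
    /* int_pos: 0 on little-endian; sizeof(MPI_Aint) - sizeof(int) on
     * big-endian, so that *int_view sees the int's low-order bytes. */
    s->int_view = (int *) ((char *) &s->value + int_pos);
}
```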

@jsquyres
Member

I'm not sure I understand your premise:

For now on, let's assume sizeof(MPI_Fint) == sizeof(MPI_Aint) < sizeof(int)

I don't know of any platform where that is true. Specifically if MPI_Aint is smaller than an int, I think lots of places in OMPI will have problems...!

Did you mean >?

Regardless, I think I follow your example. But I think the last line is wrong, because your premise (either with > or <) wasn't correct: in the bug that was reported, sizeof(MPI_Fint) < sizeof(MPI_Aint). Hence, the problem wasn't an endian issue in that case -- it was a size issue.

So in your example, if sizeof(MPI_Fint) == sizeof(MPI_Aint) > sizeof(int) (note the >), then technically using _mpi1() or _mpi2() would be correct, because they're both the same size (although semantically it's still better to use _mpi1(), because it expects an MPI_Fint).

Does that sound right?

@dalcinl

dalcinl commented Mar 17, 2016

Sorry, yes, I meant >. I agree with you that using *_mpi1() is semantically better, and the right thing as long as sizeof(int) == sizeof(MPI_Fint). However, if sizeof(int) < sizeof(MPI_Fint), AFAICT the endian issue is not solved. Anyway, I would say sizeof(int) < sizeof(MPI_Fint) would be a rather rare configure choice, so we can ignore the issue. Still, I would prefer if the code had some preprocessor guards and an #error for such a situation, but at this point I think we have already lost quite a bit of time discussing a potential issue that has not shown up and that we have no easy way to test.
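
Something like the following guard would do it; the size macros are illustrative stand-ins for whatever configure-generated macros the tree actually defines.

```c
/* Illustrative guard; the macro names are stand-ins for the
 * configure-generated size macros in the real tree. */
#if SIZEOF_INT < SIZEOF_MPI_FINT
#error "sizeof(int) < sizeof(MPI_Fint) is not supported by this attribute code"
#endif
```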

@jsquyres
Member

No worries, I agree that it would be good to fix this case. And sizeof(int) < sizeof(MPI_Fint) is a case that Open MPI is supposed to be able to handle (a small number of users use Fortran compiler -i8-style options to force 8-byte INTEGERs).

@ggouaillardet Do you have a Fortran compiler on a big endian machine where you can force sizeof(INTEGER)==8 while leaving sizeof(int)==4?

@ggouaillardet
Contributor Author

@dalcinl I finally got your point ... and I prepared a fix I will test tomorrow.
@jsquyres I should be able to find that ...
Out of curiosity, is there any compiler out there that can make a C int 64 bits?

@jsquyres
Member

@ggouaillardet I thought icc might have an option for making sizeof(int)==8, but I don't see an obvious option in icc --help.
