This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

ompi/win: use type int* for MPI_WIN_DISP_UNIT, MPI_WIN_CREATE_FLAVOR … #1018

Merged
hppritcha merged 1 commit into open-mpi:v2.x on Mar 14, 2016

Conversation

ggouaillardet
Contributor

…and MPI_WIN_MODEL

Thanks Alastair McKinstry for the report.

(cherry picked from commit open-mpi/ompi@d08fb46)

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1427/ for details.

@ggouaillardet
Contributor Author

:bot:assign: @hjelmn
:bot:milestone:v2.0.0
:bot:label:bug

@ompiteam-bot ompiteam-bot added this to the v2.0.0 milestone Mar 11, 2016
@hjelmn
Member

hjelmn commented Mar 11, 2016

@jsquyres I have no idea what the change means. Fortran and attributes.

@ggouaillardet
Contributor Author

@hjelmn ompi_attr_set_fortran_mpi2 expects an attribute of type MPI_Aint whereas ompi_attr_set_fortran_mpi1 expects an attribute of type MPI_Fint.

Attribute types are defined in Table 11.1 (Section 11.2.6, page 414 of the MPI 3.1 standard).

I can only guess there are historical reasons for having both the fortran_mpi1 and fortran_mpi2 variants ...
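
For context, a minimal sketch (not part of this PR) of why the declared type matters: per MPI-3.1 Table 11.1, C user code reads MPI_WIN_DISP_UNIT (and MPI_WIN_CREATE_FLAVOR, MPI_WIN_MODEL) back as a pointer to an int, while MPI_WIN_SIZE comes back as a pointer to an MPI_Aint. The helper below is purely illustrative.

```c
/* Illustrative only -- not part of this PR.  In C, the value returned
 * through the void* out-parameter of MPI_Win_get_attr() is a pointer to
 * the attribute, so the pointed-to type must match the standard:
 * int for MPI_WIN_DISP_UNIT, MPI_Aint for MPI_WIN_SIZE. */
#include <mpi.h>
#include <stdio.h>

static void print_win_attrs(MPI_Win win)
{
    int flag;
    int *disp_unit;      /* MPI_WIN_DISP_UNIT is an int in C */
    MPI_Aint *size;      /* MPI_WIN_SIZE is an MPI_Aint in C */

    MPI_Win_get_attr(win, MPI_WIN_DISP_UNIT, &disp_unit, &flag);
    if (flag) printf("disp_unit = %d\n", *disp_unit);

    MPI_Win_get_attr(win, MPI_WIN_SIZE, &size, &flag);
    if (flag) printf("size = %lld\n", (long long) *size);
}
```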

@@ -167,18 +167,18 @@ config_window(void *base, size_t size, int disp_unit,
MPI_WIN_SIZE, size, true);
Member

"size" here isn't an MPI_Aint -- it's a size_t. In most situations, they'll likely be the same -- but shouldn't this be cast to be 100% safe?

Member

@hjelmn Actually, size_t is unsigned here, and MPI_Aint is a ptrdiff_t, which is signed. Can't this cause a problem in some cases?
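
For illustration, one way to make the size_t-to-MPI_Aint conversion explicit, assuming MPI_Aint is a signed, ptrdiff_t-like type as noted above. The helper name and the guard are illustrative only, not the code in this PR.

```c
/* Illustrative only -- not the code in this PR.  An explicit, guarded
 * conversion from the unsigned size_t to the (typically signed,
 * ptrdiff_t-like) MPI_Aint; a size above PTRDIFF_MAX would otherwise
 * wrap to a negative attribute value. */
#include <mpi.h>
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

static MPI_Aint size_as_aint(size_t size)
{
    assert(size <= (size_t) PTRDIFF_MAX);  /* guard the signed conversion */
    return (MPI_Aint) size;
}
```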

@ggouaillardet
Contributor Author

This is the bug reported at open-mpi/ompi#1446.
It occurs only on big-endian platforms; I updated ibm/onesided/c_win_attr to test this.
I agree that, strictly speaking, I assumed size_t is equivalent to MPI_Aint and int is equivalent to MPI_Fint.

@jsquyres
Member

@ggouaillardet I guess I'm confused as to how this fixed the bug. 😕 Is there some subtlety about 4-byte integer to 8-byte integer promotion that is different between little and big endian machines that would explain what is happening here? Or maybe something about a signed to unsigned promotion...?

@ggouaillardet
Contributor Author

My latest theory is that the change is only visible when retrieving the attribute ...
I still need to think about it.

@jsquyres
Member

I talked a little about this with @goodell -- our theory is that it's not a problem of the promotion of the argument, but rather in a difference between how the _mpi1() vs. _mpi2() functions store the value, and ultimately how the values are read.
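
A stand-alone sketch of that store/read mismatch (not the OMPI code): if the attribute value lands in an 8-byte slot but is later read back through a 4-byte, int-sized view of the same bytes, a little-endian machine happens to see the right value while a big-endian machine sees 0, because the low-order bytes sit at opposite ends of the slot.

```c
/* Stand-alone illustration of the store/read width mismatch. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int64_t slot = 4;    /* value stored into an 8-byte slot */
    int32_t view;

    memcpy(&view, &slot, sizeof view);        /* read the first 4 bytes */
    printf("read back: %d\n", (int) view);    /* 4 on little-endian, 0 on big-endian */
    return 0;
}
```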

@jsquyres
Member

👍

hppritcha added a commit that referenced this pull request Mar 14, 2016
ompi/win: use type int* for MPI_WIN_DISP_UNIT, MPI_WIN_CREATE_FLAVOR …
@hppritcha hppritcha merged commit 3cb696d into open-mpi:v2.x Mar 14, 2016
@dalcinl

dalcinl commented Mar 16, 2016

@jsquyres What about the (maybe unlikely) case where sizeof(MPI_Fint) > sizeof(int)? In that case, the proposed fix will break in exactly the same way.

@jsquyres
Member

@dalcinl I think we should be ok. I'm pretty sure the problem was not a promotion issue, but rather how _mpi1() was storing the value (vs. _mpi2()). When calling _mpi1(), the int will be promoted to an MPI_Fint. In the case you cited, it'll go from 4 bytes to 8 bytes. It'll then be stored as an 8-byte value, and be marked as such.
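
A small sketch of the distinction being drawn here (illustrative only): passing an int where a wider MPI_Fint parameter is expected is a value conversion, so no bytes are lost; the trouble only shows up if the stored bytes are later reinterpreted at a different width, as in the earlier sketch.

```c
/* Illustrative only.  The argument conversion preserves the value;
 * MPI_Fint here could be an 8-byte integer (e.g. built with -i8). */
#include <mpi.h>

static void store_fint(MPI_Fint v)
{
    (void) v;              /* v == 4 regardless of endianness */
}

static void caller(void)
{
    int disp_unit = 4;
    store_fint(disp_unit); /* implicit int -> MPI_Fint value conversion */
}
```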

@dalcinl

dalcinl commented Mar 16, 2016

@jsquyres But then, if they are stored as 8-byte values (sizeof(MPI_Fint)==8), C user code will read them as 4-byte values (because these attributes are C int values according to the MPI standard), and endianness will bite you again.

IOW, in the hypothetical case that sizeof(MPI_Fint) == sizeof(MPI_Aint), it seems to me that the *_mpi1() and *_mpi2() routines will work equivalently: they will store the values in the same way. Then, if sizeof(MPI_Fint) > sizeof(int), neither set of routines is appropriate on a big-endian system. Maybe I'm missing something in this transitive reasoning.

@ggouaillardet
Contributor Author

@dalcinl IIRC, the Fortran binding/glue does the conversion between int and MPI_Fint.
The only corner case I can think of is if you create a window in Fortran with a 64-bit MPI_Fint and a disp unit that does not fit in 32 bits ...

@dalcinl

dalcinl commented Mar 16, 2016

@ggouaillardet This is not about Fortran bindings or inter-language operability. It is about how attributes are stored/read.

In MPI, attribute values can be either int, MPI_Aint, or MPI_Fint (the last one to support MPI-1 attributes). Open MPI handles the second and the third correctly, but not the first. If sizeof(int)==sizeof(MPI_Fint), you can support the first using the implementation for the third (this is the proposed fix under discussion); however, if sizeof(int)<sizeof(MPI_Fint), it will break on big-endian architectures.

As I do not expect sizeof(int)<sizeof(MPI_Fint) to be a common case, we can just ignore all my previous comments. But I wanted to put a word about it here, just in the very unlikely case this issue pops up again.

@jsquyres
Member

@dalcinl Are you sure? See https://github.com/open-mpi/ompi/blob/master/ompi/attribute/attribute.c#L27-L192. When the _mpi1() function is used to write the variable (i.e., an MPI_Fint), you're asking whether cases 4, 5, and 6 are implemented correctly if sizeof(int) != sizeof(MPI_Fint).

These functions (https://github.com/open-mpi/ompi/blob/master/ompi/attribute/attribute.c#L27-L192) read the attribute values, and then call the relevant translate_to_FOO(). Are you saying that there's an error in cases 4, 5, or 6 in those 3 translate_to_FOO() functions?

@dalcinl

dalcinl commented Mar 16, 2016

Yes, it seems to me that something is wrong. I'll try to elaborate on what I'm getting from the actual code:

From now on, let's assume sizeof(MPI_Fint) == sizeof(MPI_Aint) < sizeof(int)

  • Look at ompi_attr_init(): this routine will initialize int_pos to 0, no matter what the platform's endianness is, simply because void* and MPI_Fint are the same size. Am I right?
  • Now look at attribute_value_construct(): as int_pos==0, the slots av_address_kind_pointer and av_integer_pointer will be initialized to the same address. Am I right?
  • Finally, when you store an attribute using either the *_mpi1() or *_mpi2() routines, the final bits stored in the attribute should be the same. Then, if using *_mpi2() to store the value was wrong on big-endian, I fail to see how *_mpi1() would be right!! (See the sketch below.)
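
A rough sketch of the mechanism as described in this thread (the struct and field names below are illustrative, not the actual OMPI code): the attribute is kept in one MPI_Aint-sized slot, and an int-sized view pointer is aimed at the byte offset where an int's low-order bytes live; that offset is 0 on little-endian, but nonzero on big-endian when sizeof(MPI_Aint) > sizeof(int).

```c
/* Illustrative only -- names do not match the actual OMPI code. */
#include <mpi.h>
#include <stddef.h>

struct attr_slot {
    MPI_Aint value;     /* full-width storage for the attribute */
    int     *int_view;  /* points into 'value' at the int-sized part */
};

static void attr_slot_construct(struct attr_slot *s, size_t int_pos)
{
    s->value    = 0;
    /* int_pos: 0 on little-endian; sizeof(MPI_Aint) - sizeof(int) on
     * big-endian, so that *int_view sees the int's low-order bytes. */
    s->int_view = (int *) ((char *) &s->value + int_pos);
}
```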

@jsquyres
Member

I'm not sure I understand your premise:

For now on, let's assume sizeof(MPI_Fint) == sizeof(MPI_Aint) < sizeof(int)

I don't know of any platform where that is true. Specifically if MPI_Aint is smaller than an int, I think lots of places in OMPI will have problems...!

Did you mean >?

Regardless, I think I follow your example. But I think the last line is wrong, because your premise (either with > or <) wasn't correct: in the bug that was reported, sizeof(MPI_Fint) < sizeof(MPI_Aint). Hence, the problem wasn't an endian issue in that case -- it was a size issue.

So in your example, if sizeof(MPI_Fint) == sizeof(MPI_Aint) > sizeof(int) (note the >), then technically using _mpi1() or _mpi2() would be correct, because they're both the same size (although semantically it's still better to use _mpi1(), because it expects an MPI_Fint).

Does that sound right?

@dalcinl

dalcinl commented Mar 17, 2016

Sorry, yes, I meant >. I agree with you that using *_mpi1() is semantically better, and the right thing as long as sizeof(int) == sizeof(MPI_Fint). However, if sizeof(int) < sizeof(MPI_Fint), AFAICT the endian issue is not solved. Anyway, I would say sizeof(int) < sizeof(MPI_Fint) would be a rather rare configure choice, so we can ignore the issue. Still, I would prefer if the code had some preprocessor guards and an #error for such a situation, but at this point I think we have already lost quite a bit of time discussing a potential issue that has not shown up and that we have no easy way to test.
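
Something like the following guard would do it; the size macros are illustrative stand-ins for whatever configure-generated macros the tree actually defines.

```c
/* Illustrative guard; the macro names are stand-ins for the
 * configure-generated size macros in the real tree. */
#if SIZEOF_INT < SIZEOF_MPI_FINT
#error "sizeof(int) < sizeof(MPI_Fint) is not supported by this attribute code"
#endif
```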

@jsquyres
Member

No worries, I agree that it would be good to fix this case. And sizeof(int) < sizeof(MPI_Fint) is a case that Open MPI is supposed to be able to handle (a small number of users use Fortran compiler -i8-style options to force 8-byte INTEGERs).

@ggouaillardet Do you have a Fortran compiler on a big endian machine where you can force sizeof(INTEGER)==8 while leaving sizeof(int)==4?

@ggouaillardet
Contributor Author

@dalcinl I finally got your point ... and I prepared a fix I will test tomorrow.
@jsquyres I should be able to find that ...
Out of curiosity, is there any compiler out there that can make a C int 64 bits?

@jsquyres
Member

@ggouaillardet I thought icc might have an option for making sizeof(int)==8, but I don't see an obvious option in icc --help.
