@jjhursey jjhursey commented Feb 6, 2017

Using MPI_MINLOC or MPI_MAXLOC with the following data types
leads to data corruption:

  • MPI_DOUBLE_INT
  • MPI_LONG_INT
  • MPI_SHORT_INT
  • MPI_LONG_DOUBLE_INT

Detect this case, print an error message, and abort.
This workaround should be removed once the following issue is resolved:

  • open-mpi#1666

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 94f92f6)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
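
The kind of guard described above could be sketched roughly as follows. This is an illustration only and not the code from the patch: the enums and the `sketch_` functions are hypothetical stand-ins for the real MPI op and datatype handles, and the sketch returns an error code where the actual workaround prints a message and aborts.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for MPI op and datatype handles (not Open MPI types). */
typedef enum { SKETCH_OP_SUM, SKETCH_OP_MINLOC, SKETCH_OP_MAXLOC } sketch_op_t;
typedef enum {
    SKETCH_DT_INT,
    SKETCH_DT_2INT,
    SKETCH_DT_DOUBLE_INT,
    SKETCH_DT_LONG_INT,
    SKETCH_DT_SHORT_INT,
    SKETCH_DT_LONG_DOUBLE_INT
} sketch_dt_t;

/* True when the op/datatype pair is one of the combinations listed in this PR. */
static bool sketch_is_unsupported(sketch_op_t op, sketch_dt_t dt)
{
    if (op != SKETCH_OP_MINLOC && op != SKETCH_OP_MAXLOC) {
        return false;
    }
    switch (dt) {
    case SKETCH_DT_DOUBLE_INT:
    case SKETCH_DT_LONG_INT:
    case SKETCH_DT_SHORT_INT:
    case SKETCH_DT_LONG_DOUBLE_INT:
        return true;
    default:
        return false;
    }
}

/* Reports the problem and returns -1 so the caller can decide to abort. */
static int sketch_check_accumulate(sketch_op_t op, sketch_dt_t dt)
{
    if (sketch_is_unsupported(op, dt)) {
        fprintf(stderr,
                "MINLOC/MAXLOC with this pair datatype is not supported "
                "in one-sided accumulate operations\n");
        return -1;
    }
    return 0;
}
```

A caller would run `sketch_check_accumulate()` before starting the accumulate and abort on a non-zero return.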

Using MPI_MINLOC or MPI_MAXLOC with the following data types
leads to data corruption:
 * MPI_DOUBLE_INT
 * MPI_LONG_INT
 * MPI_SHORT_INT
 * MPI_LONG_DOUBLE_INT

Detect this case, print an error message, and abort.
This workaround should be removed once the following issue is resolved:
 * open-mpi#1666

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 94f92f6)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey jjhursey added the bug label Feb 6, 2017
@jjhursey jjhursey added this to the v2.0.3 milestone Feb 6, 2017
@jjhursey jjhursey requested a review from gpaulsen February 6, 2017 16:13
jjhursey commented Feb 6, 2017

Refs PR #2832

hjelmn commented Feb 6, 2017

These datatypes and ops need to die :-/.

@jsquyres jsquyres left a comment

Please consolidate the opal_output and opal_show_help into a single, descriptive/useful show_help message.

nysal commented Feb 8, 2017

@jjhursey I think this patch will only work for osc/pt2pt. osc/rdma will probably need a similar check in ompi_osc_rdma_rget_accumulate_internal()

@gpaulsen gpaulsen left a comment

Can you put this in now and create a new PR for ompi_osc_rdma_rget_accumulate_internal()?

hjelmn commented Feb 16, 2017

osc/pt2pt should be trivial to update to support minloc/maxloc. It is a slight modification to osc_pt2pt_accumulate_buffer: the hetero #ifdef needs to be removed, and the if needs to check whether the datatype is non-contiguous.

I have no interest in making either MPI_MINLOC or MPI_MAXLOC work with osc/rdma (outside of just treating MPI_SHORT_INT as contiguous), so I welcome a check that makes it an error.
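
To illustrate what "non-contiguous" can mean for these pair types, here is a small sketch in plain C. It assumes the pair types mirror the layout of the matching C struct (a value followed by an `int` index), which is how they are commonly used; the struct and function names are hypothetical, and the actual sizes depend on the platform ABI. The thread does not confirm that padding is the exact cause of the corruption, so treat this only as background.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C-side layouts for two of the affected pair types. */
struct sketch_double_int { double val; int loc; };
struct sketch_short_int  { short  val; int loc; };

/* Bytes of real data in one element (value plus index). */
static size_t sketch_double_int_payload(void) { return sizeof(double) + sizeof(int); }
static size_t sketch_short_int_payload(void)  { return sizeof(short) + sizeof(int); }

/* Bytes one element occupies in an array, including any padding. */
static size_t sketch_double_int_extent(void) { return sizeof(struct sketch_double_int); }
static size_t sketch_short_int_extent(void)  { return sizeof(struct sketch_short_int); }

/* An element whose extent exceeds its payload contains padding bytes, so a
 * buffer of such elements is not one dense run of data. */
static bool sketch_has_padding(size_t payload, size_t extent)
{
    return extent > payload;
}

/* Offset of the index field; any gap before it is padding between fields. */
static size_t sketch_short_int_loc_offset(void)
{
    return offsetof(struct sketch_short_int, loc);
}
```

On many common 64-bit ABIs the `double`/`int` struct has trailing padding and the `short`/`int` struct has padding between the two fields, but that is a property of the platform and should be checked with `sizeof` and `offsetof` instead of assumed.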

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey jjhursey force-pushed the topic/ibm/v2.0.x/osc-base-dt-abort branch from fe1a6a3 to e2f0e43 Compare February 17, 2017 16:32
@jjhursey
Member Author

@nysal I updated the PR to include a commit for ompi_osc_rdma_rget_accumulate_internal() - can you and/or @hjelmn confirm that this is the correct/best placement for that logic in the osc/rdma component?

@jjhursey jjhursey force-pushed the topic/ibm/v2.0.x/osc-base-dt-abort branch from e2f0e43 to 2ab65cb Compare February 17, 2017 19:33
@jsquyres
Member

@hppritcha Good to go

@hppritcha hppritcha merged commit e2c5ce0 into open-mpi:v2.0.x Feb 21, 2017
@jjhursey jjhursey deleted the topic/ibm/v2.0.x/osc-base-dt-abort branch February 21, 2017 16:44