jjhursey (Member) commented Feb 6, 2017

Using MPI_MINLOC or MPI_MAXLOC with the following data types
leads to data corruption:

  • MPI_DOUBLE_INT
  • MPI_LONG_INT
  • MPI_SHORT_INT
  • MPI_LONG_DOUBLE_INT

Detect this, print an error message, and abort.
This workaround should be removed once the following issue is resolved:

  • open-mpi#1666

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 94f92f6)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
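
The detect-and-abort behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the enums are stand-ins so the example compiles without MPI, and every `sketch_` name is hypothetical. Real code would compare `MPI_Op` and `MPI_Datatype` handles instead.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in identifiers for this sketch only (assumption: real code
 * would inspect MPI_Op / MPI_Datatype handles, not enums). */
typedef enum { OP_SUM, OP_MINLOC, OP_MAXLOC } sketch_op_t;

typedef enum {
    DT_INT,
    DT_2INT,              /* pair type that is not in the PR's list */
    DT_DOUBLE_INT,
    DT_LONG_INT,
    DT_SHORT_INT,
    DT_LONG_DOUBLE_INT
} sketch_dt_t;

/* Returns true when the (op, datatype) pair is one that the PR
 * description lists as leading to data corruption. */
bool sketch_is_blocked(sketch_op_t op, sketch_dt_t dt)
{
    if (op != OP_MINLOC && op != OP_MAXLOC) {
        return false;
    }
    switch (dt) {
    case DT_DOUBLE_INT:
    case DT_LONG_INT:
    case DT_SHORT_INT:
    case DT_LONG_DOUBLE_INT:
        return true;
    default:
        return false;
    }
}

/* Prints an error message and aborts if the pair is blocked;
 * otherwise returns so the caller can proceed. */
void sketch_guard(sketch_op_t op, sketch_dt_t dt)
{
    if (sketch_is_blocked(op, dt)) {
        fprintf(stderr,
                "MPI_MINLOC/MPI_MAXLOC with this datatype is known to "
                "corrupt data; aborting\n");
        abort();
    }
}
```

Keeping the predicate separate from the abort path makes the condition easy to test and easy to delete once the underlying issue is fixed.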

Using MPI_MINLOC or MPI_MAXLOC with the following data types
leads to data corruption:
 * MPI_DOUBLE_INT
 * MPI_LONG_INT
 * MPI_SHORT_INT
 * MPI_LONG_DOUBLE_INT

Detect this, print an error message, and abort.
This workaround should be removed once the following issue is resolved:
 * open-mpi#1666

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 94f92f6)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
jjhursey added the bug label Feb 6, 2017
jjhursey added this to the v2.1.0 milestone Feb 6, 2017
jjhursey requested a review from gpaulsen February 6, 2017 16:13
jjhursey (Member, Author) commented Feb 6, 2017

Refs PR #2832

jsquyres (Member) left a comment

Please consolidate the opal_output and opal_show_help into a single, descriptive/useful show_help message.
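
One way to read this request is that the user should see a single message naming the operation, the datatype, and the tracking issue, instead of two separate outputs. The sketch below only illustrates that idea with plain `snprintf`; it does not call `opal_show_help` or `opal_output`, and `sketch_format_help` and its wording are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: builds one descriptive message for the user.
 * op_name and dt_name are human-readable names supplied by the caller.
 * Returns the snprintf result: the number of characters the full
 * message needs, excluding the terminating NUL. */
int sketch_format_help(char *buf, size_t len,
                       const char *op_name, const char *dt_name)
{
    return snprintf(buf, len,
                    "The operation %s with datatype %s is known to lead to\n"
                    "data corruption (see open-mpi#1666).\n"
                    "The job will now abort.\n",
                    op_name, dt_name);
}
```

A caller would format the message once, emit it through whichever single reporting path the project prefers, and then abort.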

gpaulsen (Member) left a comment

Looks fine to me for a temporary stop-gap.

I'd like to see a write up of what Nathan's proposed solution might look like. We may (or may not) have time to work on this.

nysal (Member) commented Feb 8, 2017

@jjhursey I think this patch will only work for osc/pt2pt. osc/rdma will probably need a similar check in ompi_osc_rdma_rget_accumulate_internal()

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
jjhursey force-pushed the topic/ibm/v2.x/osc-base-dt-abort branch from 362ff7b to 13d965f February 17, 2017 16:41
jjhursey (Member, Author) commented

I've updated this PR to reflect the changes in PR #2927 - I'll keep them in sync if there are any further changes necessary.

  Using MPI_MINLOC or MPI_MAXLOC with the following data types
  leads to data corruption:
   * MPI_DOUBLE_INT
   * MPI_LONG_INT
   * MPI_SHORT_INT
   * MPI_LONG_DOUBLE_INT

  Detect this, print an error message, and abort.
  This workaround should be removed once the following issue is resolved:
   * open-mpi#1666

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
jjhursey force-pushed the topic/ibm/v2.x/osc-base-dt-abort branch from 13d965f to cc5747f February 17, 2017 19:32
jjhursey (Member, Author) commented

bot:mellanox:retest

jsquyres (Member) commented

@hppritcha Once CI finishes, good to go

hppritcha merged commit 6697514 into open-mpi:v2.x Feb 21, 2017
jjhursey deleted the topic/ibm/v2.x/osc-base-dt-abort branch February 21, 2017 16:43