This repository was archived by the owner on Sep 30, 2022. It is now read-only.

@hjelmn hjelmn commented Mar 15, 2016

A bus error occurs in sm OSC under the following conditions.

- The architecture is sparc64 or any other architecture that requires strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
  by sm OSC.
- The communicator size is odd and greater than 3.

Lines 283-285 of the current `ompi/mca/osc/sm/osc_sm_component.c` contain
the following code.

```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```

The size of `ompi_osc_sm_node_state_t` is a multiple of 4 but not a
multiple of 8. So if `comm_size` is odd, `module->posts[0]` is not
aligned to 8 bytes. This causes a bus error when accessing
`module->posts[i][j]`.
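
The offset arithmetic behind this can be sketched as follows. The two sizes are hypothetical values chosen only to match the description above (a global state whose size is assumed to be a multiple of 8, and a node state whose size is a multiple of 4 but not of 8); they are not taken from the Open MPI sources, and `posts_offset_old` and `is_aligned` are illustrative helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
enum {
    GLOBAL_STATE_SIZE = 64, /* assumed to be a multiple of 8 */
    NODE_STATE_SIZE   = 28  /* a multiple of 4 but not of 8 */
};

/* Byte offset of posts[0] from the segment base when it is placed
 * after one global state and comm_size node states. */
static size_t posts_offset_old(size_t comm_size)
{
    return GLOBAL_STATE_SIZE + comm_size * NODE_STATE_SIZE;
}

/* Nonzero if offset is a multiple of align. */
static int is_aligned(size_t offset, size_t align)
{
    assert(align != 0);
    return offset % align == 0;
}
```

With these assumed sizes, an even `comm_size` keeps the offset a multiple of 8, while an odd `comm_size` leaves it 4 bytes short of the next 8-byte boundary.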

This patch fixes the alignment of `module->posts[0]` by assigning it
first, before `module->global_state` and `module->node_states`.
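
A minimal sketch of the idea, not the actual patch: if the `uint64_t` post area is carved from the start of an 8-byte-aligned segment, it inherits the segment's alignment, and the structures that follow only need the weaker alignment. The struct definitions, `layout_t`, `carve_posts_first`, and the `post_words` parameter are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real state structures. */
typedef struct { uint64_t words[8]; } global_state_t; /* size is a multiple of 8 */
typedef struct { uint32_t words[7]; } node_state_t;   /* 28 bytes: multiple of 4, not 8 */

typedef struct {
    uint64_t       *posts0;
    global_state_t *global_state;
    node_state_t   *node_states;
} layout_t;

/* Carve the segment with the post area first. post_words is the
 * (hypothetical) total number of uint64_t words reserved for posts. */
static layout_t carve_posts_first(void *segment_base, size_t post_words)
{
    layout_t l;
    assert(segment_base != NULL);
    l.posts0       = (uint64_t *) segment_base;                   /* at the base */
    l.global_state = (global_state_t *) (l.posts0 + post_words);  /* offset is a multiple of 8 */
    l.node_states  = (node_state_t *) (l.global_state + 1);       /* 4-byte alignment suffices */
    return l;
}
```

In this ordering the number of node states no longer affects where the 64-bit words start, so an odd `comm_size` cannot misalign them.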

(cherry picked from open-mpi/ompi@ad26899)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hjelmn commented Mar 15, 2016

:bot:assign: @kawashima-fj
:bot:label:bug
:bot:milestone:v2.0.0

@mellanox-github

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1439/ for details.

@hppritcha
Member

@miked-mellanox could you check the ucx/oshmem problems we seem to be hitting now?
This PR shouldn't impact oshmem.

@mike-dubman
Member

disabled VG for ucx for now.

@mike-dubman
Member

bot:retest

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1440/ for details.

@kawashima-fj
Member

👍

hppritcha added a commit that referenced this pull request Mar 17, 2016
osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
@hppritcha hppritcha merged commit ea97656 into open-mpi:v2.x Mar 17, 2016