Skip to content

Cache alignment in osc/sm #6950

@guserav

Description

@guserav

Background information

I have been working on a new osc/ofi component in an internship at HPE. To get started I tried to understand the existing osc/sm component. While jumping through the code I confirmed what @hkuno and @jlbyrne-hpe already told me. Everything in the sm component gets shifted over by 4 bytes. This means that nothing is cache-aligned, even though the code tries to make everything cache-aligned.

What version of Open MPI are you using?

I only looked at the code in master

Details of the problem

The problem is that each opal/shmem component introduces a header to the shared memory segment:

struct opal_shmem_seg_hdr_t {
/* segment lock */
opal_atomic_lock_t lock;
/* pid of the segment creator */
pid_t cpid;
};

This causes everything else in the segment to get shifted over by the size of the header:

/* update returned base pointer with an offset that hides our stuff */
return (ds_buf->seg_base_addr + sizeof(opal_shmem_seg_hdr_t));

This causes two problems:

  • The returned adress is no longer cache- and page- aligned.
  • The allocated memory is larger than requested, potentially mapping a page more than intended.

The first problem is potentially an issue because e.g. osc/sm is not aware of how much the allocated memory shifted away from a cache line boundary:

state_size = sizeof(ompi_osc_sm_global_state_t) + sizeof(ompi_osc_sm_node_state_t) * comm_size;
state_size += OPAL_ALIGN_PAD_AMOUNT(state_size, 64);
posts_size = comm_size * post_size * sizeof (module->posts[0][0]);
posts_size += OPAL_ALIGN_PAD_AMOUNT(posts_size, 64);

for (i = 0, total = state_size + posts_size ; i < comm_size ; ++i) {
if (i > 0) {
module->posts[i] = module->posts[i - 1] + post_size;
}
module->sizes[i] = rbuf[i];
if (module->sizes[i]) {
module->bases[i] = ((char *) module->segment_base) + total;
total += rbuf[i];
} else {
module->bases[i] = NULL;
}
}


Possible solutions

Remove the segment header

I couldn't find any evidence the segment base header is used apart from the initialisation of the lock in it. So if I am not mistaken the header could easily be removed and all problems would be solved.

Move the segment header to the end of the segment

This would solve the alignment issue but not the size issue.

Pad the header to the size of a cache line

This would be the method that probably requires the least amount of code change. Similar to the previous solution, this would only solve one of the "problems" described.

Add a method to the framework to get the header size

Now we get into more invasive idea but I want to name it anyway.
This would shift the issue over to the user of the opal/shmem components. It would allow the user to request exactly the amount of memory he wants, taking the header size into account. This especially means that if the user is padding to a page anyway he can take the header size into account and pad with less.

On the other hand this would be a lot of overhead on the user side and could lead to a potential misuse of the component if the user is not aware of the fact that his segment base will not be at a page boundary.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions