Introduce memory space and CUDA memkind space heap #68

minsii · 2020-08-25T17:21:58Z

No description provided.

minsii · 2020-08-25T22:13:38Z

test:oshmpi/mpich/builtin

minsii · 2020-08-25T23:02:11Z

test:oshmpi/mpich/builtin

minsii · 2020-08-25T23:03:37Z

test:oshmpi/mpich/builtin

minsii · 2020-08-26T03:20:46Z

test:oshmpi/mpich/builtin

minsii · 2020-08-26T04:01:44Z

test:oshmpi/mpich/builtin

minsii · 2020-08-26T05:35:15Z

test:oshmpi/mpich
test:oshmpi/ompi

minsii · 2020-08-26T16:28:40Z

test:oshmpi/mpich
test:oshmpi/ompi

minsii · 2020-08-26T16:29:50Z

test:oshmpi/mpich
test:oshmpi/ompi

minsii · 2020-08-27T23:09:40Z

test:oshmpi/mpich
test:oshmpi/ompi

minsii · 2020-08-27T23:10:39Z

test:oshmpi/mpich/builtin
test:oshmpi/ompi/builtin

minsii · 2020-08-27T23:36:58Z

test:oshmpi/mpich/builtin
test:oshmpi/ompi/builtin
test:oshmpi/mpich/sos
test:oshmpi/ompi/sos

minsii · 2020-08-27T23:38:38Z

test:oshmpi/mpich/builtin
test:oshmpi/ompi/builtin

minsii · 2020-08-27T23:42:57Z

@raffenet Sometimes I see some Jenkins jobs pick up an older commit while others pick up the latest commit.
For example:

The job that got an old commit: https://jenkins-pmrs.cels.anl.gov/view/oshmpi/job/oshmpi-review-builtin-ompi-ucx/6/
The job that got the latest commit: https://jenkins-pmrs.cels.anl.gov/view/oshmpi/job/oshmpi-review-sostest-mpich-ch4-ofi/69/
Both were triggered by the same comment, which I posted 5 minutes ago.

Have you seen this issue with other Jenkins jobs?

minsii · 2020-08-28T03:47:33Z

test:oshmpi/mpich/sos
test:oshmpi/ompi/sos

minsii · 2020-08-28T03:48:17Z

test:oshmpi/mpich/sos
test:oshmpi/ompi/sos

minsii · 2020-08-28T04:40:42Z

test:oshmpi/mpich/sos

We define an sobj handle for each symmetric segment type (sheap, sdata, space heap) and transfer it to the target PE when issuing AM-based operations. The target PE calculates the absolute vaddr of the destination from the handle. Note: this is the first refactoring patch for adding the space heap. The code will be changed significantly again by a later commit.
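The handle-based translation described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `sobj_kind_t`, `sobj_handle_t`, `sobj_seg_t`, `sobj_make_handle`, and `sobj_translate` are hypothetical and are not taken from the OSHMPI source. The sketch assumes the origin PE packs a segment kind plus a byte displacement, and the target PE adds that displacement to its own local base address for the same segment kind.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment kinds; the names are illustrative only. */
typedef enum {
    SOBJ_SHEAP = 0,
    SOBJ_SDATA = 1,
    SOBJ_SPACE_HEAP = 2,
    SOBJ_KIND_MAX
} sobj_kind_t;

/* Hypothetical handle carried inside an active message. */
typedef struct {
    sobj_kind_t kind;   /* which symmetric segment the address belongs to */
    size_t disp;        /* byte offset from the segment base; zero is valid */
} sobj_handle_t;

/* Local description of one symmetric segment on a PE (assumed layout). */
typedef struct {
    char *base;
    size_t size;
} sobj_seg_t;

/* Origin side: find the segment containing addr and build a handle.
 * Returns 0 on success, -1 if addr is not inside any known segment. */
static int sobj_make_handle(const sobj_seg_t *segs, const void *addr,
                            sobj_handle_t *out)
{
    uintptr_t p = (uintptr_t) addr;
    for (int k = 0; k < SOBJ_KIND_MAX; k++) {
        uintptr_t b = (uintptr_t) segs[k].base;
        if (segs[k].base != NULL && p >= b && p - b < segs[k].size) {
            out->kind = (sobj_kind_t) k;
            out->disp = (size_t) (p - b);
            return 0;
        }
    }
    return -1;
}

/* Target side: turn a received handle back into a local absolute address.
 * Returns NULL if the handle does not describe a valid local location. */
static void *sobj_translate(const sobj_seg_t *segs, sobj_handle_t h)
{
    if ((int) h.kind < 0 || (int) h.kind >= SOBJ_KIND_MAX)
        return NULL;
    if (segs[h.kind].base == NULL || h.disp >= segs[h.kind].size)
        return NULL;
    return segs[h.kind].base + h.disp;
}
```

Carrying a kind plus a displacement, rather than a raw pointer, means the two PEs do not need identical virtual addresses for their symmetric segments.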

Use --enable-rma=direct|am|runtime and the runtime env var OSHMPI_DIRECT_RMA=1 to control the implementation of RMA.
- direct: use MPI RMA
- am: use MPI PT2PT-based active messages

Direct mode is the default. The am mode is useful when buffers are in GPU memory and the underlying MPI does not support GPU RMA.

Define three methods for RMA at configure: auto, direct, am. If auto is set, we check the buffer and symmetric object attributes at runtime. If any of the buffers is a GPU pointer, we check the GPU features supported by MPI (OSHMPI_MPI_GPU_FEATURES, set by the user, default all):
- If PUT/GET is supported, choose direct RMA in OSHMPI;
- If not, but PT2PT is supported, choose am RMA in OSHMPI;
- Otherwise, abort.
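The runtime selection for the auto method can be sketched as a small decision function. This is a sketch only: `rma_path_t`, the `GPU_FEAT_*` flags, and `choose_rma_path` are hypothetical names, and the bit-flag encoding of the feature set is an assumption made for illustration. Two further assumptions are labeled in the comments: host-only operations take the direct path, and "PUT/GET is supported" is read as "the feature matching the current operation is supported".

```c
#include <stdbool.h>

/* Possible outcomes of the runtime selection (hypothetical names). */
typedef enum { RMA_PATH_DIRECT, RMA_PATH_AM, RMA_PATH_ABORT } rma_path_t;

/* Hypothetical bit flags standing in for the feature set that the user
 * would describe through OSHMPI_MPI_GPU_FEATURES. */
enum {
    GPU_FEAT_PT2PT = 1 << 0,
    GPU_FEAT_PUT   = 1 << 1,
    GPU_FEAT_GET   = 1 << 2
};

static rma_path_t choose_rma_path(bool any_gpu_buffer, bool is_put,
                                  unsigned mpi_gpu_features)
{
    /* Assumption: when no buffer is a GPU pointer, direct RMA is used. */
    if (!any_gpu_buffer)
        return RMA_PATH_DIRECT;

    /* Assumption: check the feature that matches the current operation. */
    unsigned needed = is_put ? GPU_FEAT_PUT : GPU_FEAT_GET;
    if (mpi_gpu_features & needed)
        return RMA_PATH_DIRECT;

    /* MPI cannot do GPU RMA for this op; fall back to active messages
     * if GPU-aware point-to-point is available. */
    if (mpi_gpu_features & GPU_FEAT_PT2PT)
        return RMA_PATH_AM;

    /* Neither path can handle the GPU buffer. */
    return RMA_PATH_ABORT;
}
```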

This is a large refactoring patch that touches almost everything:
- Define sobj_attr for each symm object (heap, data, space heap)
- Separate sobj_attr and ictx query, and vaddr/disp translation at every op (RMA, AMO, AM, coll, P2P, lock)
- Set namespace for sobj-related routines
- One icc compiler warning fix

This option is enabled only when --enable-fast is off

1. Do not set the noinline pragma if ENABL_IPO is off, because some compilers (e.g., gcc) may not understand the pragma and throw a compile warning.
2. Minor bug fix at print env.

Use unique tag for multi-packets am to avoid mismatch in multithreading.

Configuring with --enable-threads=multiple only enables the thread support code at compile time. The default level at runtime is always single. The user should request a higher safety level at shmem_init_thread. When the user-requested level is higher than the one provided, OSHMPI does not abort but internally degrades the thread level support. The error code SHMEM_OTHER_ERR and the actual provided level are returned. One exception: if the async thread is required, OSHMPI aborts if it cannot enable THREAD_MULTIPLE.
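The degradation rule above can be illustrated with a small standalone function. This is a sketch under assumptions, not OSHMPI's code: `thr_level_t`, the `INIT_*` result codes, and `resolve_thread_level` are hypothetical stand-ins (in particular, `INIT_OTHER_ERR` only plays the role of SHMEM_OTHER_ERR and does not claim its value), and the abort case is modeled as a return code so the logic stays testable.

```c
/* Hypothetical thread levels, ordered from weakest to strongest. */
typedef enum {
    THR_SINGLE = 0,
    THR_FUNNELED,
    THR_SERIALIZED,
    THR_MULTIPLE
} thr_level_t;

/* Hypothetical result codes for the sketch. */
enum { INIT_OK = 0, INIT_OTHER_ERR = 1, INIT_ABORT = 2 };

/* requested: level asked for by the user at init time.
 * available: strongest level the build and the underlying MPI can give.
 * async_thread_required: nonzero if the async progress thread must run.
 * provided: output, the level actually granted (unset on INIT_ABORT). */
static int resolve_thread_level(thr_level_t requested, thr_level_t available,
                                int async_thread_required,
                                thr_level_t *provided)
{
    if (async_thread_required) {
        /* The async thread needs full multithreading; no degradation. */
        if (available < THR_MULTIPLE)
            return INIT_ABORT;
        /* Providing a higher level than requested is valid. */
        *provided = THR_MULTIPLE;
        return INIT_OK;
    }

    if (requested > available) {
        /* Degrade instead of aborting and report the real level. */
        *provided = available;
        return INIT_OTHER_ERR;
    }

    *provided = requested;
    return INIT_OK;
}
```

A caller would compare the returned level against the one it requested before spawning threads that make SHMEM calls.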

When auto is set, the runtime enables the async thread when either RMA or AMO is AM-based. The user can use the env var OSHMPI_ENABLE_ASYNC_THREAD to override the setting. We change the default configure option of the async thread from "no" to "auto" because it does not add any overhead on the perf-critical RMA/AMO/fence/quiet path. For wait_until and test, it may add one branch check based on the global variable enable_async_thread, which is set only at shmem_init. We assume the compiler can optimize such a branch. Nevertheless, if the branch overhead is a concern, one can disable the async thread feature at configure.
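A rough sketch of the auto rule follows. The function name `resolve_async_thread` is hypothetical, and the accepted override strings ("1" to force on, "0" to force off) are an assumption made for illustration; the commit message does not say which values OSHMPI_ENABLE_ASYNC_THREAD accepts. The override is passed in as a string so the logic can be exercised without touching the process environment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* rma_is_am / amo_is_am: whether each path is active-message based.
 * env_override: value of the override variable, or NULL if unset.
 * Assumption: "1" forces the thread on, "0" forces it off, and any
 * other value is ignored. */
static bool resolve_async_thread(bool rma_is_am, bool amo_is_am,
                                 const char *env_override)
{
    if (env_override != NULL) {
        if (strcmp(env_override, "1") == 0)
            return true;
        if (strcmp(env_override, "0") == 0)
            return false;
    }
    /* Auto rule from the commit message: enable the async thread when
     * either RMA or AMO relies on active messages. */
    return rma_is_am || amo_is_am;
}
```

In a real program the caller would pass `getenv("OSHMPI_ENABLE_ASYNC_THREAD")` once during initialization and store the result in a global flag, which matches the single branch check mentioned above.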

The current async thread implementation does not consider performance and may cause expensive overhead when oversubscribing cores. Thus, we disable it by default. Semantically complete async progress has to rely on the async thread, but it is really needed only in the few cases where the remote PE does not make any SHMEM call. For such cases, the user can manually enable the async thread.

minsii · 2020-08-28T17:10:54Z

test:oshmpi/mpich/builtin
test:oshmpi/ompi/builtin

minsii · 2020-08-28T17:11:18Z

test:oshmpi/mpich/builtin
test:oshmpi/ompi/builtin

minsii force-pushed the pr/mem-space branch 4 times, most recently from aa733e9 to c3ff602 on August 25, 2020 22:12

minsii force-pushed the pr/mem-space branch 3 times, most recently from cd7e438 to 8240252 on August 26, 2020 16:27

minsii force-pushed the pr/mem-space branch 3 times, most recently from 5014883 to 0ed71ac on August 27, 2020 23:07

minsii added 4 commits August 28, 2020 12:03

space: add shmemx space routines and ctx in RMA/AMO/sync

b889adc

gpu: enable CUDA memory kind

942cea7

impl: allow displacement with zero value

4bee46b

minsii added 16 commits August 28, 2020 12:06

amo: fix op tracking for fetch and cswap

fc1ecbc

am: skip am routines if both rma and amo are direct

cf7f879

env: add debug env vars for AMO/RMA direct/am switch at runtime

434372e

This option is enabled only when --enable-fast is off

ipo: do not set noinline program by default and a bug fix

9479597

1. Do not set the noinline pragma if ENABL_IPO is off, because some compilers (e.g., gcc) may not understand the pragma and throw a compile warning.
2. Minor bug fix at print env.

rma am type_free fix

d7b32af

util: add finc routine for atomic counter

819b1ef

am: generate unique tag for multi-packets am operation

37c6dbe

Use unique tag for multi-packets am to avoid mismatch in multithreading.
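One way to generate such tags is an atomic fetch-and-increment counter, sketched below with C11 atomics. The names `AM_PKT_TAG_BASE`, `AM_PKT_TAG_RANGE`, `am_pkt_tag_counter`, and `am_next_pkt_tag` are hypothetical, and the base and range values are arbitrary choices for illustration. The only external fact relied on is that the MPI standard guarantees MPI_TAG_UB is at least 32767, so the sketch keeps every tag below that bound.

```c
#include <stdatomic.h>

/* Hypothetical tag window reserved for multi-packet AM traffic.
 * BASE + RANGE stays below 32767, the minimum MPI_TAG_UB guaranteed
 * by the MPI standard. */
#define AM_PKT_TAG_BASE  1000
#define AM_PKT_TAG_RANGE 30000

/* Shared by all threads; static storage starts at zero. */
static atomic_uint am_pkt_tag_counter;

/* Each call returns a tag no other concurrent caller receives, until
 * the counter wraps around the window after AM_PKT_TAG_RANGE calls. */
static int am_next_pkt_tag(void)
{
    /* fetch-and-increment: returns the old value, atomically adds one */
    unsigned int seq = atomic_fetch_add(&am_pkt_tag_counter, 1u);
    return AM_PKT_TAG_BASE + (int) (seq % AM_PKT_TAG_RANGE);
}
```

With one tag per multi-packet operation, the follow-up packets of two operations issued by different threads to the same PE can no longer be matched against each other, as long as fewer than AM_PKT_TAG_RANGE such operations are in flight at once.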

am: tag ub fix

4901033

am: simplify async thread check in am_progress_impl.h

7e26b80

test: it is valid to provide higher thread level than required

28861d1

minsii force-pushed the pr/mem-space branch from 6191ef2 to f956791 on August 28, 2020 17:10

minsii merged commit 3ceffa6 into pmodels:master Aug 28, 2020

minsii deleted the pr/mem-space branch August 28, 2020 18:39

minsii mentioned this pull request Nov 9, 2020

shmemx: define gpu symm heap alloc|free #24

Closed

This was linked to issues Nov 18, 2020

Using single dynamic window for symm heap and symm data in OSHMPI #63

Closed

Develop memory space routines #41

Closed

CUDA memory kind in space_create #49

Closed
