Introduce memory space and CUDA memkind space heap #68
aa733e9
to
c3ff602
Compare
test:oshmpi/mpich/builtin |
test:oshmpi/mpich |
cd7e438
to
8240252
Compare
test:oshmpi/mpich |
5014883
to
0ed71ac
Compare
test:oshmpi/mpich |
test:oshmpi/mpich/builtin |
test:oshmpi/mpich/builtin |
test:oshmpi/mpich/builtin |
@raffenet Sometimes I see some Jenkins jobs pick up an older commit while others get the latest commit.
Have you seen this issue with other Jenkins jobs?
test:oshmpi/mpich/sos |
We define an sobj handle for each symmetric segment type (sheap, sdata, space heap) and transfer it to the target PE when issuing AM-based operations. The target PE calculates the absolute vaddr of dest from the handle. Note: this is the first refactoring patch for adding the space heap. The code will change significantly again in a later commit.
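The handle-based translation described above can be sketched roughly as follows. This is a minimal illustration, not the OSHMPI code: the type and function names (`sobj_kind_t`, `sobj_ref_t`, `sobj_segment_t`, `sobj_encode`, `sobj_decode`) are hypothetical, and the sketch assumes the handle carries a segment kind plus a displacement from that segment's base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment kinds; the names are illustrative, not OSHMPI's. */
typedef enum {
    SOBJ_SHEAP = 0,
    SOBJ_SDATA = 1,
    SOBJ_SPACE_HEAP = 2,
    SOBJ_KIND_COUNT
} sobj_kind_t;

/* Hypothetical handle sent with an active message. */
typedef struct {
    sobj_kind_t kind;   /* which symmetric segment */
    size_t disp;        /* byte offset from that segment's base */
} sobj_ref_t;

/* Per-PE description of one symmetric segment. */
typedef struct {
    char *base;
    size_t size;
} sobj_segment_t;

/* Origin side: find the local segment containing addr and record its kind
 * and displacement. Returns 0 on success, -1 if addr is not symmetric. */
static int sobj_encode(const sobj_segment_t segs[SOBJ_KIND_COUNT],
                       const void *addr, sobj_ref_t *out)
{
    assert(out != NULL);
    uintptr_t a = (uintptr_t) addr;
    for (int k = 0; k < SOBJ_KIND_COUNT; k++) {
        uintptr_t b = (uintptr_t) segs[k].base;
        if (segs[k].base != NULL && a >= b && a - b < segs[k].size) {
            out->kind = (sobj_kind_t) k;
            out->disp = (size_t) (a - b);
            return 0;
        }
    }
    return -1;
}

/* Target side: rebuild the absolute address from the target's own base.
 * Returns NULL if the handle does not fit the local segment. */
static void *sobj_decode(const sobj_segment_t segs[SOBJ_KIND_COUNT],
                         sobj_ref_t ref)
{
    if ((int) ref.kind < 0 || ref.kind >= SOBJ_KIND_COUNT)
        return NULL;
    if (segs[ref.kind].base == NULL || ref.disp >= segs[ref.kind].size)
        return NULL;
    return segs[ref.kind].base + ref.disp;
}
```

Sending a kind plus displacement rather than a raw pointer means the origin never needs to know the target's base addresses; only the target's own segment table is consulted when decoding.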
Use --enable-rma=direct|am|runtime at configure and the runtime env var OSHMPI_DIRECT_RMA=1 to control the implementation of RMA:
- direct: use MPI RMA
- am: use MPI PT2PT-based active messages
The default is direct mode. The am mode is useful when buffers are in GPU memory and the underlying MPI does not support GPU RMA.
Define three methods for RMA at configure: auto, direct, am. If auto is set, we check the buffer and symmetric object attributes at runtime. If any of the buffers is a GPU pointer, we check the GPU features supported by MPI (OSHMPI_MPI_GPU_FEATURES, set by the user, default all):
- If PUT/GET is supported, then choose direct RMA in OSHMPI;
- If not, but PT2PT is supported, then choose am RMA in OSHMPI;
- Otherwise, abort.
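The auto-mode decision above could look roughly like the sketch below. All names here (`rma_choice_t`, the `GPU_FEATURE_*` bits, `choose_rma`) are hypothetical, and two points are assumptions not stated in the commit message: that the PUT/GET check is made per operation, and that host-only buffers always take the direct path.

```c
#include <stdbool.h>

/* Hypothetical outcome of the runtime decision. */
typedef enum { RMA_DIRECT, RMA_AM, RMA_ABORT } rma_choice_t;

/* Hypothetical bit flags standing in for the parsed value of
 * OSHMPI_MPI_GPU_FEATURES; the real encoding may differ. */
enum {
    GPU_FEATURE_PT2PT = 1 << 0,
    GPU_FEATURE_PUT = 1 << 1,
    GPU_FEATURE_GET = 1 << 2
};

/* Pick the RMA path for one operation.
 *   any_gpu_buffer   : true if any buffer involved is a GPU pointer
 *   mpi_gpu_features : features the MPI library supports on GPU memory
 *   op_feature       : GPU_FEATURE_PUT or GPU_FEATURE_GET for this op */
static rma_choice_t choose_rma(bool any_gpu_buffer,
                               unsigned mpi_gpu_features,
                               unsigned op_feature)
{
    if (!any_gpu_buffer)
        return RMA_DIRECT;      /* assumption: host buffers use MPI RMA */
    if (mpi_gpu_features & op_feature)
        return RMA_DIRECT;      /* MPI can do this RMA op on GPU memory */
    if (mpi_gpu_features & GPU_FEATURE_PT2PT)
        return RMA_AM;          /* fall back to PT2PT-based active message */
    return RMA_ABORT;           /* no usable path for GPU buffers */
}
```

The function returns `RMA_ABORT` instead of aborting so that the caller decides how to report the failure.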
This is a large refactoring patch that touches almost everything:
- Define sobj_attr for each symmetric object (heap, data, space heap)
- Separate the sobj_attr and ictx query from the vaddr/disp translation at every op (RMA, AMO, AM, coll, P2P, lock)
- Set a namespace for sobj-related routines
- Fix one icc compiler warning
This option is enabled only when --enable-fast is off
1. Do not set the noinline pragma if ENABL_IPO is off, because some compilers (e.g., gcc) may not understand the pragma and emit a compile warning.
2. Minor bug fix in print env.
Use a unique tag for each multi-packet AM to avoid message mismatch in multithreaded runs.
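One way to hand out such tags is an atomic counter shared by all threads, as in the sketch below. This is an assumption about the approach, not the patch itself: the names `AM_PKT_TAG_BASE`, `AM_PKT_TAG_RANGE`, and `am_next_pkt_tag` are hypothetical, and the sketch assumes the tag range is large enough that a tag is not reused while an earlier transfer with the same tag is still in flight.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical tag window reserved for multi-packet AM payloads. */
#define AM_PKT_TAG_BASE 1000
#define AM_PKT_TAG_RANGE 4096

/* Shared by all threads of the PE; static storage starts at zero. */
static atomic_uint am_pkt_tag_counter;

/* Return the next tag in the window. Concurrent callers each get a
 * distinct counter value, so two threads starting multi-packet AMs at
 * the same time do not share a tag. */
static int am_next_pkt_tag(void)
{
    unsigned seq = atomic_fetch_add(&am_pkt_tag_counter, 1u);
    int tag = AM_PKT_TAG_BASE + (int) (seq % AM_PKT_TAG_RANGE);
    assert(tag >= AM_PKT_TAG_BASE);
    return tag;
}
```

The origin would then carry the chosen tag in the AM header so the target posts its receives for the follow-up packets with the same tag.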
Configuring with --enable-threads=multiple only enables the thread support code at compile time. The default level at runtime is always single. The user should request a higher thread level at shmem_init_thread. When the requested level is higher than the level that can be provided, OSHMPI does not abort but internally degrades the thread level. The error code SHMEM_OTHER_ERR and the actual provided level are returned. One exception: if the async thread is required, OSHMPI aborts when it cannot enable THREAD_MULTIPLE.
When auto is set, the runtime enables the async thread when either RMA or AMO is AM-based. The user can set the env var OSHMPI_ENABLE_ASYNC_THREAD to override this setting. We change the default configure option for the async thread from "no" to "auto" because it adds no overhead on the perf-critical RMA/AMO/fence/quiet path. For wait_until and test, it may add one branch check on the global variable enable_async_thread, which is set only at shmem_init. We assume the compiler can optimize such a branch. Nevertheless, if the branch overhead is a concern, one can disable the async thread feature at configure.
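The auto rule and its override can be sketched as below. The function name `async_thread_enabled` is hypothetical, and how OSHMPI actually parses OSHMPI_ENABLE_ASYNC_THREAD is not stated in the commit message: this sketch assumes that an unset or empty value means "no override", "0" disables, and any other value enables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Decide once, at init time, whether to start the async progress thread.
 *   rma_is_am    : RMA is implemented with active messages
 *   amo_is_am    : AMO is implemented with active messages
 *   env_override : value of OSHMPI_ENABLE_ASYNC_THREAD, or NULL if unset
 *                  (value parsing here is an assumption of this sketch) */
static bool async_thread_enabled(bool rma_is_am, bool amo_is_am,
                                 const char *env_override)
{
    if (env_override != NULL && env_override[0] != '\0')
        return strcmp(env_override, "0") != 0;  /* user setting wins */
    return rma_is_am || amo_is_am;              /* auto rule */
}
```

In a real program the third argument would come from `getenv("OSHMPI_ENABLE_ASYNC_THREAD")`, and the result would be stored in a global read by wait_until and test, which is the single branch check mentioned above.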
The current async thread implementation is not tuned for performance and may cause expensive overhead when cores are oversubscribed. Thus, we disable it by default. Semantically complete async progress has to rely on the async thread, but it is really needed only in the few cases where the remote PE does not make any SHMEM call. For such cases, the user can manually enable the async thread.
test:oshmpi/mpich/builtin |
No description provided.