-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Libomptarget] Remove requires information from plugin #80345
Conversation
I need to look more carefully at the patch, but automatic zero copy decision is taken in the amdgpu plugin if the program did not have 'requires unified_shared_memory'. If you remove that information from the plugin how can it know that automatic zero-copy needs to be turned on? |
This was the only use upstream. I see there are some more uses downstream that I wasn't aware of. Obviously the APU stuff is device dependent which is why it stayed where it is. The use downstream seems to be some kind of checks for prepopulating some table? I don't fully understand that. It seems you and @ThorBl were involved in those changes, so maybe you know what the requirements are there. Either way, I don't think that we should use the same "requires" flags. |
Current upstream implementation of auto zero-copy does not support global variables. They need to be handled differently from unified_shared_memory. Thinking about it, we should have all the information in libomptarget as we surfaced it as a requires flag. That is, in libomptarget we know that we are in auto zero-copy and not usm, and that should be sufficient to decide on how to handle global variables. I am attending the OpenMP face-2-face this week and I didn't have time to work on the patch to support global variables, but will do next week. What we have downstream is support to prepopulate GPU page tables from the CPU without using XNACK. I haven't yet figured out if we need to exclude the usm case (prepopulating could apply to usm as well, as a potential optimization), so I don't yet know if we need the info in the plugin. I think we can go on with this and still be able to support global variables for automatic zero-copy. |
Alright, sounds good. I'll leave you to it. Hope you're having fun at the F2F. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test on an MI300A? If not, please do before merging.
I don't think I have access to any currently. But I would expect just looking at the code upstream it would be inconsequential because I simply moved the |
This patch moves to libomptarget all OpenMP logic that depends on the requirement flags. This is due to upcoming trunk patch llvm#80345. The consequence of this change is that we have far less calls to the plugin during non-initialization phases. In addition, it - introduces diagnostic prints for zero-copy modes, and - make unified_shared_memory program fail on XNACK-Disabled environments when env variable OMPX_STRICT_SANITY_CHECKS=true. Note that unified_shared_memory can run on an XNACK-Disabled system by setting OMPX_EAGER_ZERO_COPY_MAPS, because it supports accesses to CPU-allocated memory using GPU page table prefaulting, instead of XNACK. Change-Id: I252e6557b3db1a13dd0fb1fe55f2f508361d9957
This patch cleans up coarse graining upon mapping for MI200 and it removes a dependency on the requirement flags in the plugin, to ease downstream merging of llvm#80345. Change-Id: I0bad9113c106c238c2089bf8097789a27f6899ea
Summary: Currently this is only used for the zero-copy handling. However, this can easily be moved into libomptarget so that we do not need to bother setting the requires flags in the plugin. The advantage here is that we no longer need to do this for every device redundently. Additionally, these requires flags are specifically OpenMP related, so they should live in libomptarget.
@llvm/pr-subscribers-offload Author: Joseph Huber (jhuber6) ChangesSummary: Full diff: https://github.com/llvm/llvm-project/pull/80345.diff 3 Files Affected:
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index c396099ac6252..d8d71e3e65a4a 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -958,8 +958,7 @@ struct GenericPluginTy {
/// Construct a plugin instance.
GenericPluginTy(Triple::ArchType TA)
- : RequiresFlags(OMP_REQ_UNDEFINED), GlobalHandler(nullptr), JIT(TA),
- RPCServer(nullptr) {}
+ : GlobalHandler(nullptr), JIT(TA), RPCServer(nullptr) {}
virtual ~GenericPluginTy() {}
@@ -1028,12 +1027,6 @@ struct GenericPluginTy {
return *RPCServer;
}
- /// Get the OpenMP requires flags set for this plugin.
- int64_t getRequiresFlags() const { return RequiresFlags; }
-
- /// Set the OpenMP requires flags for this plugin.
- void setRequiresFlag(int64_t Flags) { RequiresFlags = Flags; }
-
/// Initialize a device within the plugin.
Error initDevice(int32_t DeviceId);
@@ -1074,9 +1067,6 @@ struct GenericPluginTy {
/// Return the number of devices this plugin can support.
int32_t number_of_devices();
- /// Initializes the OpenMP register requires information.
- int64_t init_requires(int64_t RequiresFlags);
-
/// Returns non-zero if the data can be exchanged between the two devices.
int32_t is_data_exchangable(int32_t SrcDeviceId, int32_t DstDeviceId);
@@ -1203,9 +1193,6 @@ struct GenericPluginTy {
/// device was not initialized yet.
llvm::SmallVector<GenericDeviceTy *> Devices;
- /// OpenMP requires flags.
- int64_t RequiresFlags;
-
/// Pointer to the global handler for this plugin.
GenericGlobalHandlerTy *GlobalHandler;
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 737a8b2a4064f..253acacc3a9dc 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -1618,11 +1618,6 @@ int32_t GenericPluginTy::init_device(int32_t DeviceId) {
int32_t GenericPluginTy::number_of_devices() { return getNumDevices(); }
-int64_t GenericPluginTy::init_requires(int64_t RequiresFlags) {
- setRequiresFlag(RequiresFlags);
- return OFFLOAD_SUCCESS;
-}
-
int32_t GenericPluginTy::is_data_exchangable(int32_t SrcDeviceId,
int32_t DstDeviceId) {
return isDataExchangable(SrcDeviceId, DstDeviceId);
@@ -1964,11 +1959,6 @@ int32_t GenericPluginTy::set_device_offset(int32_t DeviceIdOffset) {
}
int32_t GenericPluginTy::use_auto_zero_copy(int32_t DeviceId) {
- // Automatic zero-copy only applies to programs that did
- // not request unified_shared_memory and are deployed on an
- // APU with XNACK enabled.
- if (getRequiresFlags() & OMP_REQ_UNIFIED_SHARED_MEMORY)
- return false;
return getDevice(DeviceId).useAutoZeroCopy();
}
diff --git a/offload/src/device.cpp b/offload/src/device.cpp
index 749b4c567f8e4..943c778278730 100644
--- a/offload/src/device.cpp
+++ b/offload/src/device.cpp
@@ -77,15 +77,7 @@ DeviceTy::~DeviceTy() {
}
llvm::Error DeviceTy::init() {
- // Make call to init_requires if it exists for this plugin.
- int32_t Ret = 0;
- Ret = RTL->init_requires(PM->getRequirements());
- if (Ret != OFFLOAD_SUCCESS)
- return llvm::createStringError(
- llvm::inconvertibleErrorCode(),
- "Failed to initialize requirements for device %d\n", DeviceID);
-
- Ret = RTL->init_device(RTLDeviceID);
+ int32_t Ret = RTL->init_device(RTLDeviceID);
if (Ret != OFFLOAD_SUCCESS)
return llvm::createStringError(llvm::inconvertibleErrorCode(),
"Failed to initialize device %d\n",
@@ -285,5 +277,7 @@ void DeviceTy::dumpOffloadEntries() {
}
bool DeviceTy::useAutoZeroCopy() {
+ if (PM->getRequirements() & OMP_REQ_UNIFIED_SHARED_MEMORY)
+ return false;
return RTL->use_auto_zero_copy(RTLDeviceID);
}
|
Summary:
Currently this is only used for the zero-copy handling. However, this
can easily be moved into
libomptarget
so that we do not need to bothersetting the requires flags in the plugin. The advantage here is that we
no longer need to do this for every device redundently. Additionally,
these requires flags are specifically OpenMP related, so they should
live in
libomptarget
.