Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Libomptarget] Remove requires information from plugin #80345

Merged
merged 1 commit into from
May 16, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Feb 1, 2024

Summary:
Currently this is only used for the zero-copy handling. However, this
can easily be moved into libomptarget so that we do not need to bother
setting the requires flags in the plugin. The advantage here is that we
no longer need to do this for every device redundently. Additionally,
these requires flags are specifically OpenMP related, so they should
live in libomptarget.

@carlobertolli
Copy link
Member

I need to look more carefully at the patch, but automatic zero copy decision is taken in the amdgpu plugin if the program did not have 'requires unified_shared_memory'. If you remove that information from the plugin how can it know that automatic zero-copy needs to be turned on?
At the same time, runtime decision on zero-copy for amdgpu's is based on the machine being an APU, and that is a vendor specific information that cannot be moved to libomptarget.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 1, 2024

I need to look more carefully at the patch, but automatic zero copy decision is taken in the amdgpu plugin if the program did not have 'requires unified_shared_memory'. If you remove that information from the plugin how can it know that automatic zero-copy needs to be turned on? At the same time, runtime decision on zero-copy for amdgpu's is based on the machine being an APU, and that is a vendor specific information that cannot be moved to libomptarget.

This was the only use upstream. I see there are some more uses downstream that I wasn't aware of. Obviously the APU stuff is device dependent which is why it stayed where it is.

The use downstream seems to be some kind of checks for prepopulating some table? I don't fully understand that. It seems you and @ThorBl were involved in those changes, so maybe you know what the requirements are there. Either way, I don't think that we should use the same "requires" flags.

@carlobertolli
Copy link
Member

Current upstream implementation of auto zero-copy does not support global variables. They need to be handled differently from unified_shared_memory. Thinking about it, we should have all the information in libomptarget as we surfaced it as a requires flag. That is, in libomptarget we know that we are in auto zero-copy and not usm, and that should be sufficient to decide on how to handle global variables.

I am attending the OpenMP face-2-face this week and I didn't have time to work on the patch to support global variables, but will do next week.

What we have downstream is support to prepopulate GPU page tables from the CPU without using XNACK. I haven't yet figured out if we need to exclude the usm case (prepopulating could apply to usm as well, as a potential optimization), so I don't yet know if we need the info in the plugin.

I think we can go on with this and still be able to support global variables for automatic zero-copy.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 1, 2024

Current upstream implementation of auto zero-copy does not support global variables. They need to be handled differently from unified_shared_memory. Thinking about it, we should have all the information in libomptarget as we surfaced it as a requires flag. That is, in libomptarget we know that we are in auto zero-copy and not usm, and that should be sufficient to decide on how to handle global variables.

I am attending the OpenMP face-2-face this week and I didn't have time to work on the patch to support global variables, but will do next week.

What we have downstream is support to prepopulate GPU page tables from the CPU without using XNACK. I haven't yet figured out if we need to exclude the usm case (prepopulating could apply to usm as well, as a potential optimization), so I don't yet know if we need the info in the plugin.

I think we can go on with this and still be able to support global variables for automatic zero-copy.

Alright, sounds good. I'll leave you to it. Hope you're having fun at the F2F.

Copy link
Member

@carlobertolli carlobertolli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test on an MI300A? If not, please do before merging.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 1, 2024

Did you test on an MI300A? If not, please do before merging.

I don't think I have access to any currently. But I would expect just looking at the code upstream it would be inconsequential because I simply moved the if condition out of the plugins and into libomptarget. Thanks for the review, I'll hold off on merging it until we have a better solution downstream to make @ronlieb's life easier.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 21, 2024
This patch moves to libomptarget all OpenMP logic that depends on the requirement flags.
This is due to upcoming trunk patch llvm#80345.
The consequence of this change is that we have far less calls to the plugin
during non-initialization phases.

In addition, it
- introduces diagnostic prints for zero-copy modes, and
- make unified_shared_memory program fail on XNACK-Disabled environments
when env variable OMPX_STRICT_SANITY_CHECKS=true.

Note that unified_shared_memory can run on an XNACK-Disabled
system by setting OMPX_EAGER_ZERO_COPY_MAPS, because it supports
accesses to CPU-allocated memory using GPU page table prefaulting,
instead of XNACK.

Change-Id: I252e6557b3db1a13dd0fb1fe55f2f508361d9957
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 2, 2024
This patch cleans up coarse graining upon mapping for MI200 and it removes
a dependency on the requirement flags in the plugin, to ease downstream merging of
llvm#80345.

Change-Id: I0bad9113c106c238c2089bf8097789a27f6899ea
Summary:
Currently this is only used for the zero-copy handling. However, this
can easily be moved into libomptarget so that we do not need to bother
setting the requires flags in the plugin. The advantage here is that we
no longer need to do this for every device redundently. Additionally,
these requires flags are specifically OpenMP related, so they should
live in libomptarget.
@llvmbot
Copy link
Collaborator

llvmbot commented May 16, 2024

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Summary:
Currently this is only used for the zero-copy handling. However, this
can easily be moved into libomptarget so that we do not need to bother
setting the requires flags in the plugin. The advantage here is that we
no longer need to do this for every device redundently. Additionally,
these requires flags are specifically OpenMP related, so they should
live in libomptarget.


Full diff: https://github.com/llvm/llvm-project/pull/80345.diff

3 Files Affected:

  • (modified) offload/plugins-nextgen/common/include/PluginInterface.h (+1-14)
  • (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (-10)
  • (modified) offload/src/device.cpp (+3-9)
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index c396099ac6252..d8d71e3e65a4a 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -958,8 +958,7 @@ struct GenericPluginTy {
 
   /// Construct a plugin instance.
   GenericPluginTy(Triple::ArchType TA)
-      : RequiresFlags(OMP_REQ_UNDEFINED), GlobalHandler(nullptr), JIT(TA),
-        RPCServer(nullptr) {}
+      : GlobalHandler(nullptr), JIT(TA), RPCServer(nullptr) {}
 
   virtual ~GenericPluginTy() {}
 
@@ -1028,12 +1027,6 @@ struct GenericPluginTy {
     return *RPCServer;
   }
 
-  /// Get the OpenMP requires flags set for this plugin.
-  int64_t getRequiresFlags() const { return RequiresFlags; }
-
-  /// Set the OpenMP requires flags for this plugin.
-  void setRequiresFlag(int64_t Flags) { RequiresFlags = Flags; }
-
   /// Initialize a device within the plugin.
   Error initDevice(int32_t DeviceId);
 
@@ -1074,9 +1067,6 @@ struct GenericPluginTy {
   /// Return the number of devices this plugin can support.
   int32_t number_of_devices();
 
-  /// Initializes the OpenMP register requires information.
-  int64_t init_requires(int64_t RequiresFlags);
-
   /// Returns non-zero if the data can be exchanged between the two devices.
   int32_t is_data_exchangable(int32_t SrcDeviceId, int32_t DstDeviceId);
 
@@ -1203,9 +1193,6 @@ struct GenericPluginTy {
   /// device was not initialized yet.
   llvm::SmallVector<GenericDeviceTy *> Devices;
 
-  /// OpenMP requires flags.
-  int64_t RequiresFlags;
-
   /// Pointer to the global handler for this plugin.
   GenericGlobalHandlerTy *GlobalHandler;
 
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 737a8b2a4064f..253acacc3a9dc 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -1618,11 +1618,6 @@ int32_t GenericPluginTy::init_device(int32_t DeviceId) {
 
 int32_t GenericPluginTy::number_of_devices() { return getNumDevices(); }
 
-int64_t GenericPluginTy::init_requires(int64_t RequiresFlags) {
-  setRequiresFlag(RequiresFlags);
-  return OFFLOAD_SUCCESS;
-}
-
 int32_t GenericPluginTy::is_data_exchangable(int32_t SrcDeviceId,
                                              int32_t DstDeviceId) {
   return isDataExchangable(SrcDeviceId, DstDeviceId);
@@ -1964,11 +1959,6 @@ int32_t GenericPluginTy::set_device_offset(int32_t DeviceIdOffset) {
 }
 
 int32_t GenericPluginTy::use_auto_zero_copy(int32_t DeviceId) {
-  // Automatic zero-copy only applies to programs that did
-  // not request unified_shared_memory and are deployed on an
-  // APU with XNACK enabled.
-  if (getRequiresFlags() & OMP_REQ_UNIFIED_SHARED_MEMORY)
-    return false;
   return getDevice(DeviceId).useAutoZeroCopy();
 }
 
diff --git a/offload/src/device.cpp b/offload/src/device.cpp
index 749b4c567f8e4..943c778278730 100644
--- a/offload/src/device.cpp
+++ b/offload/src/device.cpp
@@ -77,15 +77,7 @@ DeviceTy::~DeviceTy() {
 }
 
 llvm::Error DeviceTy::init() {
-  // Make call to init_requires if it exists for this plugin.
-  int32_t Ret = 0;
-  Ret = RTL->init_requires(PM->getRequirements());
-  if (Ret != OFFLOAD_SUCCESS)
-    return llvm::createStringError(
-        llvm::inconvertibleErrorCode(),
-        "Failed to initialize requirements for device %d\n", DeviceID);
-
-  Ret = RTL->init_device(RTLDeviceID);
+  int32_t Ret = RTL->init_device(RTLDeviceID);
   if (Ret != OFFLOAD_SUCCESS)
     return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                    "Failed to initialize device %d\n",
@@ -285,5 +277,7 @@ void DeviceTy::dumpOffloadEntries() {
 }
 
 bool DeviceTy::useAutoZeroCopy() {
+  if (PM->getRequirements() & OMP_REQ_UNIFIED_SHARED_MEMORY)
+    return false;
   return RTL->use_auto_zero_copy(RTLDeviceID);
 }

@jhuber6 jhuber6 merged commit 3abd3d6 into llvm:main May 16, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
offload openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants