Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Libomptarget] Rework Record & Replay to be a plugin member (#88928) #89097

Merged
merged 1 commit into from
May 16, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Apr 17, 2024

Summary:
Previously, the R&R support was global state initialized by a global
constructor. This is bad because it prevents us from adequately
constraining the lifetime of the library. Additionally, we want to
minimize the amount of global state floating around.

This patch moves the R&R support into a plugin member like everything
else. This means there will be multiple copies of the R&R implementation
floating around, but this was already the case given the fact that we
currently handle everything with dynamic libraries.

@llvmbot llvmbot added the openmp:libomptarget OpenMP offload runtime label Apr 17, 2024
@jhuber6
Copy link
Contributor Author

jhuber6 commented Apr 17, 2024

New PR since GH is dumb and doesn't let you re-open a merged PR.

I tested this locally and couldn't reproduce any issues, I have no clue what could be going wrong. Could one of you test this locally and tell me where it's failing if it does?

@jplehr
Copy link
Contributor

jplehr commented Apr 17, 2024

I ran the tests locally and the segmentation fault reproduces. I randomly picked the complex_reduction.cpp test case to run in isolation and it appears to be a segmentation fault in the deinit. For reference: This machine has no CUDA installed, in case that makes a difference.

Thread 1 "complex_reducti" received signal SIGSEGV, Segmentation fault.
0x00007ffff103ebf9 in llvm::omp::target::plugin::GenericPluginTy::deinit() () from /path/bot-tester-builds/openmp/./lib/libomptarget.rtl.cuda.so
(gdb) bt
#0  0x00007ffff103ebf9 in llvm::omp::target::plugin::GenericPluginTy::deinit() () from /path/bot-tester-builds/openmp/./lib/libomptarget.rtl.cuda.so
#1  0x00007ffff1047982 in llvm::omp::target::plugin::PluginTy::deinit() () from /path/bot-tester-builds/openmp/./lib/libomptarget.rtl.cuda.so
#2  0x00007ffff104780d in llvm::omp::target::plugin::PluginTy::~PluginTy() () from /path/bot-tester-builds/openmp/./lib/libomptarget.rtl.cuda.so

@jhuber6
Copy link
Contributor Author

jhuber6 commented Apr 17, 2024

As far as I can tell, the actual handling of this class is all over the place. It was a global reference that takes a single device and, is deinitialized along with the device, but must be initialized externally for a single device? I don't know, I just tried to retain the old semantics. Hopefully it works again.

@jplehr
Copy link
Contributor

jplehr commented Apr 18, 2024

I still get the segmentation fault error when I do a local build of this PR with our buildbot config.

@llvmbot
Copy link
Collaborator

llvmbot commented May 16, 2024

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Summary:
Previously, the R&R support was global state initialized by a global
constructor. This is bad because it prevents us from adequately
constraining the lifetime of the library. Additionally, we want to
minimize the amount of global state floating around.

This patch moves the R&R support into a plugin member like everything
else. This means there will be multiple copies of the R&R implementation
floating around, but this was already the case given the fact that we
currently handle everything with dynamic libraries.


Full diff: https://github.com/llvm/llvm-project/pull/89097.diff

3 Files Affected:

  • (modified) offload/plugins-nextgen/common/CMakeLists.txt (+1-1)
  • (modified) offload/plugins-nextgen/common/include/PluginInterface.h (+12-1)
  • (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+23-11)
diff --git a/offload/plugins-nextgen/common/CMakeLists.txt b/offload/plugins-nextgen/common/CMakeLists.txt
index 6064468ea5b57..1237b5a38dcb6 100644
--- a/offload/plugins-nextgen/common/CMakeLists.txt
+++ b/offload/plugins-nextgen/common/CMakeLists.txt
@@ -60,7 +60,7 @@ target_compile_definitions(PluginCommon PRIVATE
   DEBUG_PREFIX="PluginInterface"
 )
 
-target_compile_options(PluginCommon PUBLIC ${offload_compile_flags})
+target_compile_options(PluginCommon PUBLIC ${offload_compile_flags} -g -O0)
 target_link_options(PluginCommon PUBLIC ${offload_link_flags})
 
 target_include_directories(PluginCommon PUBLIC 
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index d8d71e3e65a4a..83f6e8d76fec7 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -54,6 +54,7 @@ namespace plugin {
 struct GenericPluginTy;
 struct GenericKernelTy;
 struct GenericDeviceTy;
+struct RecordReplayTy;
 
 /// Class that wraps the __tgt_async_info to simply its usage. In case the
 /// object is constructed without a valid __tgt_async_info, the object will use
@@ -958,7 +959,8 @@ struct GenericPluginTy {
 
   /// Construct a plugin instance.
   GenericPluginTy(Triple::ArchType TA)
-      : GlobalHandler(nullptr), JIT(TA), RPCServer(nullptr) {}
+      : GlobalHandler(nullptr), JIT(TA), RPCServer(nullptr),
+        RecordReplay(nullptr) {}
 
   virtual ~GenericPluginTy() {}
 
@@ -1027,6 +1029,12 @@ struct GenericPluginTy {
     return *RPCServer;
   }
 
+  /// Get a reference to the record and replay interface for the plugin.
+  RecordReplayTy &getRecordReplay() {
+    assert(RecordReplay && "RR interface not initialized");
+    return *RecordReplay;
+  }
+
   /// Initialize a device within the plugin.
   Error initDevice(int32_t DeviceId);
 
@@ -1204,6 +1212,9 @@ struct GenericPluginTy {
 
   /// The interface between the plugin and the GPU for host services.
   RPCServerTy *RPCServer;
+
+  /// The interface between the plugin and the GPU for host services.
+  RecordReplayTy *RecordReplay;
 };
 
 namespace Plugin {
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 253acacc3a9dc..550ebc9c28b25 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -39,6 +39,7 @@ using namespace target;
 using namespace plugin;
 
 // TODO: Fix any thread safety issues for multi-threaded kernel recording.
+namespace llvm::omp::target::plugin {
 struct RecordReplayTy {
 
   // Describes the state of the record replay mechanism.
@@ -358,8 +359,7 @@ struct RecordReplayTy {
     }
   }
 };
-
-static RecordReplayTy RecordReplay;
+} // namespace llvm::omp::target::plugin
 
 // Extract the mapping of host function pointers to device function pointers
 // from the entry table. Functions marked as 'indirect' in OpenMP will have
@@ -470,7 +470,7 @@ GenericKernelTy::getKernelLaunchEnvironment(
   // Ctor/Dtor have no arguments, replaying uses the original kernel launch
   // environment. Older versions of the compiler do not generate a kernel
   // launch environment.
-  if (RecordReplay.isReplaying() ||
+  if (GenericDevice.Plugin.getRecordReplay().isReplaying() ||
       Version < OMP_KERNEL_ARG_MIN_VERSION_WITH_DYN_PTR)
     return nullptr;
 
@@ -559,6 +559,7 @@ Error GenericKernelTy::launch(GenericDeviceTy &GenericDevice, void **ArgPtrs,
 
   // Record the kernel description after we modified the argument count and num
   // blocks/threads.
+  RecordReplayTy &RecordReplay = GenericDevice.Plugin.getRecordReplay();
   if (RecordReplay.isRecording()) {
     RecordReplay.saveImage(getName(), getImage());
     RecordReplay.saveKernelInput(getName(), getImage());
@@ -833,6 +834,7 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
     delete MemoryManager;
   MemoryManager = nullptr;
 
+  RecordReplayTy &RecordReplay = Plugin.getRecordReplay();
   if (RecordReplay.isRecordingOrReplaying())
     RecordReplay.deinit();
 
@@ -886,7 +888,8 @@ GenericDeviceTy::loadBinary(GenericPluginTy &Plugin,
     return std::move(Err);
 
   // Setup the global device memory pool if needed.
-  if (!RecordReplay.isReplaying() && shouldSetupDeviceMemoryPool()) {
+  if (!Plugin.getRecordReplay().isReplaying() &&
+      shouldSetupDeviceMemoryPool()) {
     uint64_t HeapSize;
     auto SizeOrErr = getDeviceHeapSize(HeapSize);
     if (SizeOrErr) {
@@ -1301,8 +1304,8 @@ Expected<void *> GenericDeviceTy::dataAlloc(int64_t Size, void *HostPtr,
                                             TargetAllocTy Kind) {
   void *Alloc = nullptr;
 
-  if (RecordReplay.isRecordingOrReplaying())
-    return RecordReplay.alloc(Size);
+  if (Plugin.getRecordReplay().isRecordingOrReplaying())
+    return Plugin.getRecordReplay().alloc(Size);
 
   switch (Kind) {
   case TARGET_ALLOC_DEFAULT:
@@ -1338,7 +1341,7 @@ Expected<void *> GenericDeviceTy::dataAlloc(int64_t Size, void *HostPtr,
 
 Error GenericDeviceTy::dataDelete(void *TgtPtr, TargetAllocTy Kind) {
   // Free is a noop when recording or replaying.
-  if (RecordReplay.isRecordingOrReplaying())
+  if (Plugin.getRecordReplay().isRecordingOrReplaying())
     return Plugin::success();
 
   int Res;
@@ -1405,7 +1408,8 @@ Error GenericDeviceTy::launchKernel(void *EntryPtr, void **ArgPtrs,
                                     KernelArgsTy &KernelArgs,
                                     __tgt_async_info *AsyncInfo) {
   AsyncInfoWrapperTy AsyncInfoWrapper(
-      *this, RecordReplay.isRecordingOrReplaying() ? nullptr : AsyncInfo);
+      *this,
+      Plugin.getRecordReplay().isRecordingOrReplaying() ? nullptr : AsyncInfo);
 
   GenericKernelTy &GenericKernel =
       *reinterpret_cast<GenericKernelTy *>(EntryPtr);
@@ -1416,6 +1420,7 @@ Error GenericDeviceTy::launchKernel(void *EntryPtr, void **ArgPtrs,
   // 'finalize' here to guarantee next record-replay actions are in-sync
   AsyncInfoWrapper.finalize(Err);
 
+  RecordReplayTy &RecordReplay = Plugin.getRecordReplay();
   if (RecordReplay.isRecordingOrReplaying() &&
       RecordReplay.isSaveOutputEnabled())
     RecordReplay.saveKernelOutputInfo(GenericKernel.getName());
@@ -1503,6 +1508,9 @@ Error GenericPluginTy::init() {
   RPCServer = new RPCServerTy(*this);
   assert(RPCServer && "Invalid RPC server");
 
+  RecordReplay = new RecordReplayTy();
+  assert(RecordReplay && "Invalid RR interface");
+
   return Plugin::success();
 }
 
@@ -1523,6 +1531,9 @@ Error GenericPluginTy::deinit() {
   if (RPCServer)
     delete RPCServer;
 
+  if (RecordReplay)
+    delete RecordReplay;
+
   // Perform last deinitializations on the plugin.
   return deinitImpl();
 }
@@ -1633,12 +1644,12 @@ int32_t GenericPluginTy::initialize_record_replay(int32_t DeviceId,
       isRecord ? RecordReplayTy::RRStatusTy::RRRecording
                : RecordReplayTy::RRStatusTy::RRReplaying;
 
-  if (auto Err = RecordReplay.init(&Device, MemorySize, VAddr, Status,
-                                   SaveOutput, ReqPtrArgOffset)) {
+  if (auto Err = RecordReplay->init(&Device, MemorySize, VAddr, Status,
+                                    SaveOutput, ReqPtrArgOffset)) {
     REPORT("WARNING RR did not intialize RR-properly with %lu bytes"
            "(Error: %s)\n",
            MemorySize, toString(std::move(Err)).data());
-    RecordReplay.setStatus(RecordReplayTy::RRStatusTy::RRDeactivated);
+    RecordReplay->setStatus(RecordReplayTy::RRStatusTy::RRDeactivated);
 
     if (!isRecord) {
       return OFFLOAD_FAIL;
@@ -1982,6 +1993,7 @@ int32_t GenericPluginTy::get_global(__tgt_device_binary Binary, uint64_t Size,
   assert(DevicePtr && "Invalid device global's address");
 
   // Save the loaded globals if we are recording.
+  RecordReplayTy &RecordReplay = Device.Plugin.getRecordReplay();
   if (RecordReplay.isRecording())
     RecordReplay.addEntry(Name, Size, *DevicePtr);
 

Summary:
Previously, the R&R support was global state initialized by a global
constructor. This is bad because it prevents us from adequately
constraining the lifetime of the library. Additionally, we want to
minimize the amount of global state floating around.

This patch moves the R&R support into a plugin member like everything
else. This means there will be multiple copies of the R&R implementation
floating around, but this was already the case given the fact that we
currently handle everything with dynamic libraries.
Copy link
Contributor

@jplehr jplehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jplehr jplehr self-requested a review May 16, 2024 19:56
@jhuber6 jhuber6 merged commit f42f57b into llvm:main May 16, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
offload openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants