Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Libomptarget] Minimize use of 'REPORT' in the plugin interface #78182

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jan 15, 2024

Summary:
The REPORT macro is coded to abort() on non-debug builds. There is a
lot of code scattered around that uses this to spuriously halt execution
in the middle of the plugin. This patch simply tries to replace them
with actual error messages or just consumes them.

The main offender here is the malloc and free interface. Right now
I'm just consuming them, which is consisent with how free and malloc
work. But I think the way forward is to make these functions also return
an expected value so users are forced to check it.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 15, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)

Changes

Summary:
The REPORT macro is coded to abort() on non-debug builds. There is a
lot of code scattered around that uses this to spuriously halt execution
in the middle of the plugin. This patch simply tries to replace them
with actual error messages or just consumes them.

The main offender here is the malloc and free interface. Right now
I'm just consuming them, which is consisent with how free and malloc
work. But I think the way forward is to make these functions also return
an expected value so users are forced to check it.


Full diff: https://github.com/llvm/llvm-project/pull/78182.diff

3 Files Affected:

  • (modified) openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp (+9-11)
  • (modified) openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp (+23-32)
  • (modified) openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp (+6-5)
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index e7a38a93c9acbe..086c10c06f28d5 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -2141,13 +2141,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
       break;
     }
 
-    if (!MemoryPool) {
-      REPORT("No memory pool for the specified allocation kind\n");
+    if (!MemoryPool)
       return OFFLOAD_FAIL;
-    }
 
     if (Error Err = MemoryPool->deallocate(TgtPtr)) {
-      REPORT("%s\n", toString(std::move(Err)).data());
+      consumeError(std::move(Err));
       return OFFLOAD_FAIL;
     }
 
@@ -3300,7 +3298,9 @@ Error Plugin::check(int32_t Code, const char *ErrFmt, ArgsTy... Args) {
   const char *Desc = "Unknown error";
   hsa_status_t Ret = hsa_status_string(ResultCode, &Desc);
   if (Ret != HSA_STATUS_SUCCESS)
-    REPORT("Unrecognized " GETNAME(TARGET_NAME) " error code %d\n", Code);
+    createStringError(inconvertibleErrorCode(),
+                      "Unrecognized " GETNAME(TARGET_NAME) " error code %d\n",
+                      Code);
 
   return createStringError<ArgsTy..., const char *>(inconvertibleErrorCode(),
                                                     ErrFmt, Args..., Desc);
@@ -3320,7 +3320,7 @@ void *AMDGPUMemoryManagerTy::allocate(size_t Size, void *HstPtr,
 
   // Allow all kernel agents to access the allocation.
   if (auto Err = MemoryPool->enableAccess(Ptr, Size, KernelAgents)) {
-    REPORT("%s\n", toString(std::move(Err)).data());
+    consumeError(std::move(Err));
     return nullptr;
   }
   return Ptr;
@@ -3346,15 +3346,13 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
     break;
   }
 
-  if (!MemoryPool) {
-    REPORT("No memory pool for the specified allocation kind\n");
+  if (!MemoryPool)
     return nullptr;
-  }
 
   // Allocate from the corresponding memory pool.
   void *Alloc = nullptr;
   if (Error Err = MemoryPool->allocate(Size, &Alloc)) {
-    REPORT("%s\n", toString(std::move(Err)).data());
+    consumeError(std::move(Err));
     return nullptr;
   }
 
@@ -3365,7 +3363,7 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
 
     // Enable all kernel agents to access the buffer.
     if (auto Err = MemoryPool->enableAccess(Alloc, Size, KernelAgents)) {
-      REPORT("%s\n", toString(std::move(Err)).data());
+      consumeError(std::move(Err));
       return nullptr;
     }
   }
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
index 1bd70b85da3414..4a1ea43ce342de 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
@@ -130,7 +130,7 @@ struct RecordReplayTy {
     } else if (MemoryOffset) {
       // If we are off and in a situation we cannot just "waste" memory to force
       // a match, we hope adjusting the arguments is sufficient.
-      REPORT(
+      FAILURE_MESSAGE(
           "WARNING Failed to allocate replay memory at required location %p, "
           "got %p, trying to offset argument pointers by %" PRIi64 "\n",
           VAddr, MemoryStart, MemoryOffset);
@@ -147,9 +147,9 @@ struct RecordReplayTy {
     if (Device->supportVAManagement()) {
       auto Err = preAllocateVAMemory(DeviceMemorySize, ReqVAddr);
       if (Err) {
-        REPORT("WARNING VA mapping failed, fallback to heuristic: "
-               "(Error: %s)\n",
-               toString(std::move(Err)).data());
+        FAILURE_MESSAGE("WARNING VA mapping failed, fallback to heuristic: "
+                        "(Error: %s)\n",
+                        toString(std::move(Err)).data());
       }
     }
 
@@ -848,12 +848,8 @@ GenericDeviceTy::loadBinary(GenericPluginTy &Plugin,
   DP("Load data from image " DPxMOD "\n", DPxPTR(InputTgtImage->ImageStart));
 
   auto PostJITImageOrErr = Plugin.getJIT().process(*InputTgtImage, *this);
-  if (!PostJITImageOrErr) {
-    auto Err = PostJITImageOrErr.takeError();
-    REPORT("Failure to jit IR image %p on device %d: %s\n", InputTgtImage,
-           DeviceId, toString(std::move(Err)).data());
-    return nullptr;
-  }
+  if (!PostJITImageOrErr)
+    return PostJITImageOrErr.takeError();
 
   // Load the binary and allocate the image object. Use the next available id
   // for the image id, which is the number of previously loaded images.
@@ -879,8 +875,8 @@ GenericDeviceTy::loadBinary(GenericPluginTy &Plugin,
     uint64_t HeapSize;
     auto SizeOrErr = getDeviceHeapSize(HeapSize);
     if (SizeOrErr) {
-      REPORT("No global device memory pool due to error: %s\n",
-             toString(std::move(SizeOrErr)).data());
+      return Plugin::error("No global device memory pool due to error: %s\n",
+                           toString(std::move(SizeOrErr)).data());
     } else if (auto Err = setupDeviceMemoryPool(Plugin, *Image, HeapSize))
       return std::move(Err);
   }
@@ -952,26 +948,6 @@ Error GenericDeviceTy::setupDeviceEnvironment(GenericPluginTy &Plugin,
 Error GenericDeviceTy::setupDeviceMemoryPool(GenericPluginTy &Plugin,
                                              DeviceImageTy &Image,
                                              uint64_t PoolSize) {
-  // Free the old pool, if any.
-  if (DeviceMemoryPool.Ptr) {
-    if (auto Err = dataDelete(DeviceMemoryPool.Ptr,
-                              TargetAllocTy::TARGET_ALLOC_DEVICE))
-      return Err;
-  }
-
-  DeviceMemoryPool.Size = PoolSize;
-  auto AllocOrErr = dataAlloc(PoolSize, /*HostPtr=*/nullptr,
-                              TargetAllocTy::TARGET_ALLOC_DEVICE);
-  if (AllocOrErr) {
-    DeviceMemoryPool.Ptr = *AllocOrErr;
-  } else {
-    auto Err = AllocOrErr.takeError();
-    REPORT("Failure to allocate device memory for global memory pool: %s\n",
-           toString(std::move(Err)).data());
-    DeviceMemoryPool.Ptr = nullptr;
-    DeviceMemoryPool.Size = 0;
-  }
-
   // Create the metainfo of the device environment global.
   GenericGlobalHandlerTy &GHandler = Plugin.getGlobalHandler();
   if (!GHandler.isSymbolInImage(*this, Image,
@@ -986,6 +962,21 @@ Error GenericDeviceTy::setupDeviceMemoryPool(GenericPluginTy &Plugin,
   if (auto Err = GHandler.writeGlobalToDevice(*this, Image, TrackerGlobal))
     return Err;
 
+  // Free the old pool, if any.
+  if (DeviceMemoryPool.Ptr) {
+    if (auto Err = dataDelete(DeviceMemoryPool.Ptr,
+                              TargetAllocTy::TARGET_ALLOC_DEVICE))
+      return Err;
+  }
+
+  DeviceMemoryPool.Size = PoolSize;
+  auto AllocOrErr = dataAlloc(PoolSize, /*HostPtr=*/nullptr,
+                              TargetAllocTy::TARGET_ALLOC_DEVICE);
+  if (!AllocOrErr)
+    return AllocOrErr.takeError();
+
+  DeviceMemoryPool = {*AllocOrErr, 0};
+
   // Create the metainfo of the device environment global.
   GlobalTy DevEnvGlobal("__omp_rtl_device_memory_pool",
                         sizeof(DeviceMemoryPoolTy), &DeviceMemoryPool);
diff --git a/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
index ce6b39898ae95a..e621cf943237e2 100644
--- a/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
@@ -545,7 +545,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
       return nullptr;
 
     if (auto Err = setContext()) {
-      REPORT("Failure to alloc memory: %s\n", toString(std::move(Err)).data());
+      consumeError(std::move(Err));
       return nullptr;
     }
 
@@ -580,7 +580,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     if (auto Err =
             Plugin::check(Res, "Error in cuMemAlloc[Host|Managed]: %s")) {
-      REPORT("Failure to alloc memory: %s\n", toString(std::move(Err)).data());
+      consumeError(std::move(Err));
       return nullptr;
     }
     return MemAlloc;
@@ -592,7 +592,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
       return OFFLOAD_SUCCESS;
 
     if (auto Err = setContext()) {
-      REPORT("Failure to free memory: %s\n", toString(std::move(Err)).data());
+      consumeError(std::move(Err));
       return OFFLOAD_FAIL;
     }
 
@@ -618,7 +618,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
     }
 
     if (auto Err = Plugin::check(Res, "Error in cuMemFree[Host]: %s")) {
-      REPORT("Failure to free memory: %s\n", toString(std::move(Err)).data());
+      consumeError(std::move(Err));
       return OFFLOAD_FAIL;
     }
     return OFFLOAD_SUCCESS;
@@ -1503,7 +1503,8 @@ Error Plugin::check(int32_t Code, const char *ErrFmt, ArgsTy... Args) {
   const char *Desc = "Unknown error";
   CUresult Ret = cuGetErrorString(ResultCode, &Desc);
   if (Ret != CUDA_SUCCESS)
-    REPORT("Unrecognized " GETNAME(TARGET_NAME) " error code %d\n", Code);
+    createStringError(inconvertibleErrorCode(),
+                      "Unrecognized CUDA error code %d\n", Code);
 
   return createStringError<ArgsTy..., const char *>(inconvertibleErrorCode(),
                                                     ErrFmt, Args..., Desc);

Summary:
The `REPORT` macro is coded to `abort()` on non-debug builds. There is a
lot of code scattered around that uses this to spuriously halt execution
in the middle of the plugin. This patch simply tries to replace them
with actual error messages or just consumes them.

The main offender here is the `malloc` and `free` interface. Right now
I'm just consuming them, which is consisent with how `free` and `malloc`
work. But I think the way forward is to make these functions also return
an expected value so users are forced to check it.
@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 15, 2024

@ronlieb this should apply cleanly downstream.

@shiltian
Copy link
Contributor

The early abortion can help to locate the error much easier because we don't have to wait for the handling of offloading failure that might happen much later than when and where it happens.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 15, 2024

The early abortion can help to locate the error much easier because we don't have to wait for the handling of offloading failure that might happen much later than when and where it happens.

A runtime rolling over and dying when something goes wrong is generally unacceptable, but currently most of these will still die at the plugin boundary. The one concession is the allocation bits, I could probably keep those messages around with a debugging message. I think in the future we should make these allocation routines return an error as well so it forces the user to handle the nullptr case explicitly.

@shiltian
Copy link
Contributor

shiltian commented Jan 15, 2024

A runtime rolling over and dying when something goes wrong is generally unacceptable, but currently most of these will still die at the plugin boundary.

I agree. I think generally in debug build we might want to abort immediately to make it easy to pinpoint where things start to go south. In release build we can just report the error and return failure, etc., and eventually libomptarget will crash due to error.

@jplehr
Copy link
Contributor

jplehr commented Jan 16, 2024

A runtime rolling over and dying when something goes wrong is generally unacceptable, but currently most of these will still die at the plugin boundary.

I agree. I think generally in debug build we might want to abort immediately to make it easy to pinpoint where things start to go south. In release build we can just report the error and return failure, etc., and eventually libomptarget will crash due to error.

With a debug build, do you mean LIBOMPTARGET_DEBUG or CMake Debug build?
If we want to go that route, can we potentially not tie the ability to debug-print stuff and die in case of error? I think it would be good to be able to have a debug log w/o enabling dying immediately. Like have LIBOMPTARGET_DEBUG_ABORT_EARLY or something.

@jdoerfert
Copy link
Member

Generally, fewer switches please, especially CMAKE.

I believe this is conceptually the right way to go. That said, our errors are not great, they should contain source information from where they were fired. That would make the pinpointing similarly easy. I can see it might be a little more involved in 1-2 places, but a single consistent error model is better than 2, or more.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 16, 2024

Generally, fewer switches please, especially CMAKE.

I believe this is conceptually the right way to go. That said, our errors are not great, they should contain source information from where they were fired. That would make the pinpointing similarly easy. I can see it might be a little more involved in 1-2 places, but a single consistent error model is better than 2, or more.

I've been thinking about this. First off I think we should ditch LIBOMPTARGET_DEBUG entirely and just key off of NDEBUG like assertions do. Then we can improve the DP macro to make it easier to use.

Right now we make every error use inconvertibleErrorCode() which could be replaced with an actual value at a minimum. We could also make the interface automatically insert source location information if present. I think the easiest way to do this is to have TLS that contains the source location at a top level, then the rest of the runtime can simply read from it if it's been set. One difficulty would be how to present these errors to the user as part of an API, perhaps we return an index into some table that contains the last error message printed from the error type?

I think we should focus on this once we find a good way to remove the dlopen barrier between the plugins, primarily because it would allow us to have a single state space within the library.

That being said, I think this patch is a reasonable step forward. Future work would be an error handling overhaul.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants