-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Libomptarget] Minimize use of 'REPORT' in the plugin interface #78182
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Joseph Huber (jhuber6) ChangesSummary: The main offender here is the Full diff: https://github.com/llvm/llvm-project/pull/78182.diff 3 Files Affected:
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index e7a38a93c9acbe..086c10c06f28d5 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -2141,13 +2141,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
break;
}
- if (!MemoryPool) {
- REPORT("No memory pool for the specified allocation kind\n");
+ if (!MemoryPool)
return OFFLOAD_FAIL;
- }
if (Error Err = MemoryPool->deallocate(TgtPtr)) {
- REPORT("%s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return OFFLOAD_FAIL;
}
@@ -3300,7 +3298,9 @@ Error Plugin::check(int32_t Code, const char *ErrFmt, ArgsTy... Args) {
const char *Desc = "Unknown error";
hsa_status_t Ret = hsa_status_string(ResultCode, &Desc);
if (Ret != HSA_STATUS_SUCCESS)
- REPORT("Unrecognized " GETNAME(TARGET_NAME) " error code %d\n", Code);
+ createStringError(inconvertibleErrorCode(),
+ "Unrecognized " GETNAME(TARGET_NAME) " error code %d\n",
+ Code);
return createStringError<ArgsTy..., const char *>(inconvertibleErrorCode(),
ErrFmt, Args..., Desc);
@@ -3320,7 +3320,7 @@ void *AMDGPUMemoryManagerTy::allocate(size_t Size, void *HstPtr,
// Allow all kernel agents to access the allocation.
if (auto Err = MemoryPool->enableAccess(Ptr, Size, KernelAgents)) {
- REPORT("%s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return nullptr;
}
return Ptr;
@@ -3346,15 +3346,13 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
break;
}
- if (!MemoryPool) {
- REPORT("No memory pool for the specified allocation kind\n");
+ if (!MemoryPool)
return nullptr;
- }
// Allocate from the corresponding memory pool.
void *Alloc = nullptr;
if (Error Err = MemoryPool->allocate(Size, &Alloc)) {
- REPORT("%s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return nullptr;
}
@@ -3365,7 +3363,7 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
// Enable all kernel agents to access the buffer.
if (auto Err = MemoryPool->enableAccess(Alloc, Size, KernelAgents)) {
- REPORT("%s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return nullptr;
}
}
diff --git a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
index 1bd70b85da3414..4a1ea43ce342de 100644
--- a/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
@@ -130,7 +130,7 @@ struct RecordReplayTy {
} else if (MemoryOffset) {
// If we are off and in a situation we cannot just "waste" memory to force
// a match, we hope adjusting the arguments is sufficient.
- REPORT(
+ FAILURE_MESSAGE(
"WARNING Failed to allocate replay memory at required location %p, "
"got %p, trying to offset argument pointers by %" PRIi64 "\n",
VAddr, MemoryStart, MemoryOffset);
@@ -147,9 +147,9 @@ struct RecordReplayTy {
if (Device->supportVAManagement()) {
auto Err = preAllocateVAMemory(DeviceMemorySize, ReqVAddr);
if (Err) {
- REPORT("WARNING VA mapping failed, fallback to heuristic: "
- "(Error: %s)\n",
- toString(std::move(Err)).data());
+ FAILURE_MESSAGE("WARNING VA mapping failed, fallback to heuristic: "
+ "(Error: %s)\n",
+ toString(std::move(Err)).data());
}
}
@@ -848,12 +848,8 @@ GenericDeviceTy::loadBinary(GenericPluginTy &Plugin,
DP("Load data from image " DPxMOD "\n", DPxPTR(InputTgtImage->ImageStart));
auto PostJITImageOrErr = Plugin.getJIT().process(*InputTgtImage, *this);
- if (!PostJITImageOrErr) {
- auto Err = PostJITImageOrErr.takeError();
- REPORT("Failure to jit IR image %p on device %d: %s\n", InputTgtImage,
- DeviceId, toString(std::move(Err)).data());
- return nullptr;
- }
+ if (!PostJITImageOrErr)
+ return PostJITImageOrErr.takeError();
// Load the binary and allocate the image object. Use the next available id
// for the image id, which is the number of previously loaded images.
@@ -879,8 +875,8 @@ GenericDeviceTy::loadBinary(GenericPluginTy &Plugin,
uint64_t HeapSize;
auto SizeOrErr = getDeviceHeapSize(HeapSize);
if (SizeOrErr) {
- REPORT("No global device memory pool due to error: %s\n",
- toString(std::move(SizeOrErr)).data());
+ return Plugin::error("No global device memory pool due to error: %s\n",
+ toString(std::move(SizeOrErr)).data());
} else if (auto Err = setupDeviceMemoryPool(Plugin, *Image, HeapSize))
return std::move(Err);
}
@@ -952,26 +948,6 @@ Error GenericDeviceTy::setupDeviceEnvironment(GenericPluginTy &Plugin,
Error GenericDeviceTy::setupDeviceMemoryPool(GenericPluginTy &Plugin,
DeviceImageTy &Image,
uint64_t PoolSize) {
- // Free the old pool, if any.
- if (DeviceMemoryPool.Ptr) {
- if (auto Err = dataDelete(DeviceMemoryPool.Ptr,
- TargetAllocTy::TARGET_ALLOC_DEVICE))
- return Err;
- }
-
- DeviceMemoryPool.Size = PoolSize;
- auto AllocOrErr = dataAlloc(PoolSize, /*HostPtr=*/nullptr,
- TargetAllocTy::TARGET_ALLOC_DEVICE);
- if (AllocOrErr) {
- DeviceMemoryPool.Ptr = *AllocOrErr;
- } else {
- auto Err = AllocOrErr.takeError();
- REPORT("Failure to allocate device memory for global memory pool: %s\n",
- toString(std::move(Err)).data());
- DeviceMemoryPool.Ptr = nullptr;
- DeviceMemoryPool.Size = 0;
- }
-
// Create the metainfo of the device environment global.
GenericGlobalHandlerTy &GHandler = Plugin.getGlobalHandler();
if (!GHandler.isSymbolInImage(*this, Image,
@@ -986,6 +962,21 @@ Error GenericDeviceTy::setupDeviceMemoryPool(GenericPluginTy &Plugin,
if (auto Err = GHandler.writeGlobalToDevice(*this, Image, TrackerGlobal))
return Err;
+ // Free the old pool, if any.
+ if (DeviceMemoryPool.Ptr) {
+ if (auto Err = dataDelete(DeviceMemoryPool.Ptr,
+ TargetAllocTy::TARGET_ALLOC_DEVICE))
+ return Err;
+ }
+
+ DeviceMemoryPool.Size = PoolSize;
+ auto AllocOrErr = dataAlloc(PoolSize, /*HostPtr=*/nullptr,
+ TargetAllocTy::TARGET_ALLOC_DEVICE);
+ if (!AllocOrErr)
+ return AllocOrErr.takeError();
+
+ DeviceMemoryPool = {*AllocOrErr, 0};
+
// Create the metainfo of the device environment global.
GlobalTy DevEnvGlobal("__omp_rtl_device_memory_pool",
sizeof(DeviceMemoryPoolTy), &DeviceMemoryPool);
diff --git a/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
index ce6b39898ae95a..e621cf943237e2 100644
--- a/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
@@ -545,7 +545,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
return nullptr;
if (auto Err = setContext()) {
- REPORT("Failure to alloc memory: %s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return nullptr;
}
@@ -580,7 +580,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
if (auto Err =
Plugin::check(Res, "Error in cuMemAlloc[Host|Managed]: %s")) {
- REPORT("Failure to alloc memory: %s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return nullptr;
}
return MemAlloc;
@@ -592,7 +592,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
return OFFLOAD_SUCCESS;
if (auto Err = setContext()) {
- REPORT("Failure to free memory: %s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return OFFLOAD_FAIL;
}
@@ -618,7 +618,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
}
if (auto Err = Plugin::check(Res, "Error in cuMemFree[Host]: %s")) {
- REPORT("Failure to free memory: %s\n", toString(std::move(Err)).data());
+ consumeError(std::move(Err));
return OFFLOAD_FAIL;
}
return OFFLOAD_SUCCESS;
@@ -1503,7 +1503,8 @@ Error Plugin::check(int32_t Code, const char *ErrFmt, ArgsTy... Args) {
const char *Desc = "Unknown error";
CUresult Ret = cuGetErrorString(ResultCode, &Desc);
if (Ret != CUDA_SUCCESS)
- REPORT("Unrecognized " GETNAME(TARGET_NAME) " error code %d\n", Code);
+ createStringError(inconvertibleErrorCode(),
+ "Unrecognized CUDA error code %d\n", Code);
return createStringError<ArgsTy..., const char *>(inconvertibleErrorCode(),
ErrFmt, Args..., Desc);
|
Summary: The `REPORT` macro is coded to `abort()` on non-debug builds. There is a lot of code scattered around that uses this to spuriously halt execution in the middle of the plugin. This patch simply tries to replace them with actual error messages or just consumes them. The main offender here is the `malloc` and `free` interface. Right now I'm just consuming them, which is consisent with how `free` and `malloc` work. But I think the way forward is to make these functions also return an expected value so users are forced to check it.
@ronlieb this should apply cleanly downstream. |
The early abortion can help to locate the error much easier because we don't have to wait for the handling of offloading failure that might happen much later than when and where it happens. |
A runtime rolling over and dying when something goes wrong is generally unacceptable, but currently most of these will still die at the plugin boundary. The one concession is the allocation bits, I could probably keep those messages around with a debugging message. I think in the future we should make these allocation routines return an error as well so it forces the user to handle the |
I agree. I think generally in debug build we might want to abort immediately to make it easy to pinpoint where things start to go south. In release build we can just report the error and return failure, etc., and eventually |
With a debug build, do you mean |
Generally, fewer switches please, especially CMAKE. I believe this is conceptually the right way to go. That said, our errors are not great, they should contain source information from where they were fired. That would make the pinpointing similarly easy. I can see it might be a little more involved in 1-2 places, but a single consistent error model is better than 2, or more. |
I've been thinking about this. First off I think we should ditch Right now we make every error use I think we should focus on this once we find a good way to remove the That being said, I think this patch is a reasonable step forward. Future work would be an error handling overhaul. |
Summary:
The
REPORT
macro is coded toabort()
on non-debug builds. There is alot of code scattered around that uses this to spuriously halt execution
in the middle of the plugin. This patch simply tries to replace them
with actual error messages or just consumes them.
The main offender here is the
malloc
andfree
interface. Right nowI'm just consuming them, which is consisent with how
free
andmalloc
work. But I think the way forward is to make these functions also return
an expected value so users are forced to check it.