Skip to content

Commit

Permalink
[OpenMP][FIX] Avoid races in the handling of to be deleted mapping en…
Browse files Browse the repository at this point in the history
…tries

If we decided to delete a mapping entry we did not act on it right away
but first issued and waited for memory copies. In the meantime some
other thread might reuse the entry. While there was some logic to avoid
colliding on the actual "deletion" part, there were two races happening:

1) The data transfer back of the thread deleting the entry and
   the data transfer back of the thread taking over the entry raced.
2) The update to the shadow map happened regardless if the entry was
   actually reused by another thread which left the shadow map in a
   inconsistent state.

To fix both issues we will now update the shadow map and delete the
entry only if we are sure the thread is responsible for deletion, hence
no other thread took over the entry and reused it. We also wait for a
potential former data transfer from the device to finish before we issue
another one that would race with it.

Fixes #54216

Differential Revision: https://reviews.llvm.org/D121058
  • Loading branch information
jdoerfert committed Mar 29, 2022
1 parent ba93e4e commit b316126
Show file tree
Hide file tree
Showing 4 changed files with 214 additions and 104 deletions.
40 changes: 30 additions & 10 deletions openmp/libomptarget/include/device.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

#include "ExclusiveAccess.h"
Expand Down Expand Up @@ -60,7 +61,8 @@ struct HostDataToTargetTy {
struct StatesTy {
StatesTy(uint64_t DRC, uint64_t HRC)
: DynRefCount(DRC), HoldRefCount(HRC),
MayContainAttachedPointers(false) {}
MayContainAttachedPointers(false), DeleteThreadId(std::thread::id()) {
}
/// The dynamic reference count is the standard reference count as of OpenMP
/// 4.5. The hold reference count is an OpenMP extension for the sake of
/// OpenACC support.
Expand Down Expand Up @@ -98,6 +100,14 @@ struct HostDataToTargetTy {
/// mechanism for D2H, and if the event cannot be shared between them, Event
/// should be written as <tt>void *Event[2]</tt>.
void *Event = nullptr;

/// The id of the thread responsible for deleting this entry. This thread
/// set the reference count to zero *last*. Other threads might reuse the
/// entry while it is marked for deletion but not yet deleted (e.g., the
/// data is still being moved back). If another thread reuses the entry we
/// will have a non-zero reference count *or* the thread will have changed
/// this id, effectively taking over deletion responsibility.
std::thread::id DeleteThreadId;
};
// When HostDataToTargetTy is used by std::set, std::set::iterator is const
// use unique_ptr to make States mutable.
Expand Down Expand Up @@ -138,6 +148,14 @@ struct HostDataToTargetTy {
/// Returns OFFLOAD_FAIL if something went wrong, OFFLOAD_SUCCESS otherwise.
int addEventIfNecessary(DeviceTy &Device, AsyncInfoTy &AsyncInfo) const;

/// Indicate that the current thread expected to delete this entry.
void setDeleteThreadId() const {
States->DeleteThreadId = std::this_thread::get_id();
}

/// Return the thread id of the thread expected to delete this entry.
std::thread::id getDeleteThreadId() const { return States->DeleteThreadId; }

/// Set the event bound to this data map.
void setEvent(void *Event) const { States->Event = Event; }

Expand Down Expand Up @@ -172,7 +190,7 @@ struct HostDataToTargetTy {
if (ThisRefCount > 0)
--ThisRefCount;
else
assert(OtherRefCount > 0 && "total refcount underflow");
assert(OtherRefCount >= 0 && "total refcount underflow");
}
return getTotalRefCount();
}
Expand Down Expand Up @@ -362,14 +380,16 @@ struct DeviceTy {
bool UseHoldRefCount, bool &IsHostPtr,
bool MustContain = false,
bool ForceDelete = false);
/// For the map entry for \p HstPtrBegin, decrement the reference count
/// specified by \p HasHoldModifier and, if the the total reference count is
/// then zero, deallocate the corresponding device storage and remove the map
/// entry. Return \c OFFLOAD_SUCCESS if the map entry existed, and return
/// \c OFFLOAD_FAIL if not. It is the caller's responsibility to skip calling
/// this function if the map entry is not expected to exist because
/// \p HstPtrBegin uses shared memory.
int deallocTgtPtr(void *HstPtrBegin, int64_t Size, bool HasHoldModifier);

/// Deallocate \p LR and remove the entry. Assume the total reference count is
/// zero and the calling thread is the deleting thread for \p LR. \p HDTTMap
/// ensure the caller holds exclusive access and can modify the map. Return \c
/// OFFLOAD_SUCCESS if the map entry existed, and return \c OFFLOAD_FAIL if
/// not. It is the caller's responsibility to skip calling this function if
/// the map entry is not expected to exist because \p HstPtrBegin uses shared
/// memory.
int deallocTgtPtr(HDTTMapAccessorTy &HDTTMap, LookupResult LR, int64_t Size);

int associatePtr(void *HstPtrBegin, void *TgtPtrBegin, int64_t Size);
int disassociatePtr(void *HstPtrBegin);

Expand Down
98 changes: 52 additions & 46 deletions openmp/libomptarget/src/device.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include <cstdint>
#include <cstdio>
#include <string>
#include <thread>

int HostDataToTargetTy::addEventIfNecessary(DeviceTy &Device,
AsyncInfoTy &AsyncInfo) const {
Expand Down Expand Up @@ -207,9 +208,10 @@ TargetPointerResultTy DeviceTy::getTargetPointer(
((LR.Flags.ExtendsBefore || LR.Flags.ExtendsAfter) && IsImplicit)) {
auto &HT = *LR.Entry;
const char *RefCountAction;
assert(HT.getTotalRefCount() > 0 && "expected existing RefCount > 0");
if (UpdateRefCount) {
// After this, RefCount > 1.
// After this, reference count >= 1. If the reference count was 0 but the
// entry was still there we can reuse the data on the device and avoid a
// new submission.
HT.incRefCount(HasHoldModifier);
RefCountAction = " (incremented)";
} else {
Expand Down Expand Up @@ -349,27 +351,30 @@ DeviceTy::getTgtPtrBegin(void *HstPtrBegin, int64_t Size, bool &IsLast,
if (lr.Flags.IsContained ||
(!MustContain && (lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter))) {
auto &HT = *lr.Entry;
// We do not zero the total reference count here. deallocTgtPtr does that
// atomically with removing the mapping. Otherwise, before this thread
// removed the mapping in deallocTgtPtr, another thread could retrieve the
// mapping, increment and decrement back to zero, and then both threads
// would try to remove the mapping, resulting in a double free.
IsLast = HT.decShouldRemove(UseHoldRefCount, ForceDelete);
const char *RefCountAction;
if (!UpdateRefCount) {
RefCountAction = " (update suppressed)";
} else if (ForceDelete) {

if (ForceDelete) {
HT.resetRefCount(UseHoldRefCount);
assert(IsLast == HT.decShouldRemove(UseHoldRefCount) &&
"expected correct IsLast prediction for reset");
if (IsLast)
RefCountAction = " (reset, deferred final decrement)";
else {
HT.decRefCount(UseHoldRefCount);
RefCountAction = " (reset)";
}
}

const char *RefCountAction;
if (!UpdateRefCount) {
RefCountAction = " (update suppressed)";
} else if (IsLast) {
RefCountAction = " (deferred final decrement)";
// Mark the entry as to be deleted by this thread. Another thread might
// reuse the entry and take "ownership" for the deletion while this thread
// is waiting for data transfers. That is fine and the current thread will
// simply skip the deletion step then.
HT.setDeleteThreadId();
HT.decRefCount(UseHoldRefCount);
assert(HT.getTotalRefCount() == 0 &&
"Expected zero reference count when deletion is scheduled");
if (ForceDelete)
RefCountAction = " (reset, delayed deletion)";
else
RefCountAction = " (decremented, delayed deletion)";
} else {
HT.decRefCount(UseHoldRefCount);
RefCountAction = " (decremented)";
Expand Down Expand Up @@ -411,37 +416,38 @@ void *DeviceTy::getTgtPtrBegin(HDTTMapAccessorTy &HDTTMap, void *HstPtrBegin,
return NULL;
}

int DeviceTy::deallocTgtPtr(void *HstPtrBegin, int64_t Size,
bool HasHoldModifier) {
HDTTMapAccessorTy HDTTMap = HostDataToTargetMap.getExclusiveAccessor();

int DeviceTy::deallocTgtPtr(HDTTMapAccessorTy &HDTTMap, LookupResult LR,
int64_t Size) {
// Check if the pointer is contained in any sub-nodes.
int Ret = OFFLOAD_SUCCESS;
LookupResult lr = lookupMapping(HDTTMap, HstPtrBegin, Size);
if (lr.Flags.IsContained || lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter) {
auto &HT = *lr.Entry;
if (HT.decRefCount(HasHoldModifier) == 0) {
DP("Deleting tgt data " DPxMOD " of size %" PRId64 "\n",
DPxPTR(HT.TgtPtrBegin), Size);
deleteData((void *)HT.TgtPtrBegin);
INFO(OMP_INFOTYPE_MAPPING_CHANGED, DeviceID,
"Removing map entry with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD
", Size=%" PRId64 ", Name=%s\n",
DPxPTR(HT.HstPtrBegin), DPxPTR(HT.TgtPtrBegin), Size,
(HT.HstPtrName) ? getNameFromMapping(HT.HstPtrName).c_str()
: "unknown");
void *Event = lr.Entry->getEvent();
HDTTMap->erase(lr.Entry);
delete lr.Entry;
if (Event && destroyEvent(Event) != OFFLOAD_SUCCESS) {
REPORT("Failed to destroy event " DPxMOD "\n", DPxPTR(Event));
Ret = OFFLOAD_FAIL;
}
}
} else {
if (!(LR.Flags.IsContained || LR.Flags.ExtendsBefore ||
LR.Flags.ExtendsAfter)) {
REPORT("Section to delete (hst addr " DPxMOD ") does not exist in the"
" allocated memory\n",
DPxPTR(HstPtrBegin));
DPxPTR(LR.Entry->HstPtrBegin));
return OFFLOAD_FAIL;
}

auto &HT = *LR.Entry;
// Verify this thread is still in charge of deleting the entry.
assert(HT.getTotalRefCount() == 0 &&
HT.getDeleteThreadId() == std::this_thread::get_id() &&
"Trying to delete entry that is in use or owned by another thread.");

DP("Deleting tgt data " DPxMOD " of size %" PRId64 "\n",
DPxPTR(HT.TgtPtrBegin), Size);
deleteData((void *)HT.TgtPtrBegin);
INFO(OMP_INFOTYPE_MAPPING_CHANGED, DeviceID,
"Removing map entry with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD
", Size=%" PRId64 ", Name=%s\n",
DPxPTR(HT.HstPtrBegin), DPxPTR(HT.TgtPtrBegin), Size,
(HT.HstPtrName) ? getNameFromMapping(HT.HstPtrName).c_str() : "unknown");
void *Event = LR.Entry->getEvent();
HDTTMap->erase(LR.Entry);
delete LR.Entry;

int Ret = OFFLOAD_SUCCESS;
if (Event && destroyEvent(Event) != OFFLOAD_SUCCESS) {
REPORT("Failed to destroy event " DPxMOD "\n", DPxPTR(Event));
Ret = OFFLOAD_FAIL;
}

Expand Down
Loading

0 comments on commit b316126

Please sign in to comment.