-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[Offload] Initialize all platforms before plugin device creation #160702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-offload Author: Piotr Balcer (pbalcer) ChangesDevices store a raw pointer to back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This change moves the host platform emplace to happen before device object creation, eliminating the risk of vector reallocation after platform pointers are set. This popped up when trying to use #158900 with liboffload. The platforms vector was reallocated and liboffload segfaulted when iterating over the platforms. This is at attempt at the simplest possible fix. Since the number of platforms is fixed, I initially wanted to refactor this code so that it became impossible to modify the platforms container after Context is created, but, when that patch grew large, I pivoted to this instead. Another option is to use shared_ptr/weak_ptr to allocate the Platform instead of storing it in the vector directly. Given, #159636, which requires a non-trivial change to this same code anyway, I'm thinking I can do follow-up refactor that tackles both issues at the same time. Full diff: https://github.com/llvm/llvm-project/pull/160702.diff 1 Files Affected:
diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp
index 08a2e25b97d85..0701ff12c4953 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -256,9 +256,18 @@ Error initPlugins(OffloadContext &Context) {
pluginNameToBackend(#Name)}); \
} while (false);
#include "Shared/Targets.def"
+ // Host platform must be added before initializing plugin devices because
+ // devices contain a pointer back to the owning platform, and modifying the
+ // Platforms vector risks reallocating the underlying storage, thus invalidating
+ // all the platform pointers.
+ Context.Platforms.emplace_back(
+ ol_platform_impl_t{nullptr, OL_PLATFORM_BACKEND_HOST});
// Preemptively initialize all devices in the plugin
for (auto &Platform : Context.Platforms) {
+ if (Platform.BackendType == OL_PLATFORM_BACKEND_HOST)
+ continue;
+
auto Err = Platform.Plugin->init();
[[maybe_unused]] std::string InfoMsg = toString(std::move(Err));
for (auto DevNum = 0; DevNum < Platform.Plugin->number_of_devices();
@@ -273,10 +282,11 @@ Error initPlugins(OffloadContext &Context) {
}
}
}
+ // The Context.Platforms cannot be modified after this point without updating
+ // all the Device.Platform pointers.
// Add the special host device
- auto &HostPlatform = Context.Platforms.emplace_back(
- ol_platform_impl_t{nullptr, OL_PLATFORM_BACKEND_HOST});
+ auto &HostPlatform = Context.Platforms.back();
HostPlatform.Devices.emplace_back(
std::make_unique<ol_device_impl_t>(-1, nullptr, nullptr, InfoTreeNode{}));
Context.HostDevice()->Platform = &HostPlatform;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How difficult would it be to just make SmallVector<ol_platform_impl_t, 4> Platforms{};
take a unique pointer instead? In the future I really want to remove this restriction that all platforms and devices must be initialized at init time, as it's extremely expensive if you don't end up needing them.
I was thinking of
I did see #159636, maybe we could represent that as a fixed-size collection of optionals that are initialized on first use? Given we have an enum of backends ( |
Devices store a raw pointer to back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This change moves the host platform emplace to happen before device object creation, eliminating the risk of vector reallocation after platform pointers are set.
I figured we'd just keep a reference since the platform outlives all the devices, but I don't know the code intimately.
Since all access goes through this, I was wondering if we turned it first into a smart pointers we could then wrap that interface with some kind of lazy initializer that checks if it's active and does so on first use. Would mean we have some fixed overhead on every single API call, but I'm assuming it's worthwhile. |
Closing in favor of #160888 |
Devices store a raw pointer to back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices.
This change moves the host platform emplace to happen before device object creation, eliminating the risk of vector reallocation after platform pointers are set.
This popped up when trying to use #158900 with liboffload. The platforms vector was reallocated and liboffload segfaulted when iterating over the platforms. This is an attempt at the simplest possible fix.
Since the number of platforms is fixed, I initially wanted to refactor this code so that it became impossible to modify the platforms container after Context is created, but, when that patch grew large, I pivoted to this instead. Another option is to use shared_ptr/weak_ptr to allocate the Platform instead of storing it in the vector directly.
Given, #159636, which requires a non-trivial change to this same code anyway, I'm thinking I can do follow-up refactor that tackles both issues at the same time.
Thoughts?