forked from intel/llvm
-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specifying the implications of profiling a graph submission. #52
Labels
Graph Specification
Extension Specification related
Comments
EwanC
added a commit
that referenced
this issue
Mar 10, 2023
Elaborates on what the semantics are of queries on events returned by executable graph submission. The event enters the running state when the first node starts executing on device, and completes once the last node finishes. Since a graph could be divided up into multiple PI commands-buffers that may have work scheduled in-between this could give a pessimistic view of graph execution time, however I don't think we can guarantee better. In the future we could add explicit graph profiling nodes for a more fine grained view of performance, but that's out of scope for now. Closes Issue #52
EwanC
added a commit
that referenced
this issue
Mar 16, 2023
Elaborates on what the semantics are of queries on events returned by executable graph submission. The event enters the running state when the first node starts executing on device, and completes once the last node finishes. Since a graph could be divided up into multiple PI commands-buffers that may have work scheduled in-between this could give a pessimistic view of graph execution time, however I don't think we can guarantee better. In the future we could add explicit graph profiling nodes for a more fine grained view of performance, but that's out of scope for now. Closes Issue #52
reble
pushed a commit
that referenced
this issue
May 15, 2023
…callback The `TypeSystemMap::m_mutex` guards against concurrent modifications of members of `TypeSystemMap`. In particular, `m_map`. `TypeSystemMap::ForEach` iterates through the entire `m_map` calling a user-specified callback for each entry. This is all done while `m_mutex` is locked. However, there's nothing that guarantees that the callback itself won't call back into `TypeSystemMap` APIs on the same thread. This lead to double-locking `m_mutex`, which is undefined behaviour. We've seen this cause a deadlock in the swift plugin with following backtrace: ``` int main() { std::unique_ptr<int> up = std::make_unique<int>(5); volatile int val = *up; return val; } clang++ -std=c++2a -g -O1 main.cpp ./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b ``` ``` frame #4: std::lock_guard<std::mutex>::lock_guard frame #5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock #2 frame #6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage frame #7: lldb_private::Target::GetScratchTypeSystemForLanguage ... frame #26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths frame #27: lldb_private::SwiftASTContext::LoadModule frame #30: swift::ModuleDecl::collectLinkLibraries frame #31: lldb_private::SwiftASTContext::LoadModule frame #34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl frame #35: lldb_private::SwiftASTContext::PerformCompileUnitImports frame #36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext frame #37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState frame #38: lldb_private::Target::GetPersistentSymbol frame #41: lldb_private::TypeSystemMap::ForEach <<<< Lock #1 frame #42: lldb_private::Target::GetPersistentSymbol frame #43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols frame #44: lldb_private::IRExecutionUnit::FindSymbol frame #45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence frame #46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame #47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame #48: llvm::LinkingSymbolResolver::findSymbol frame #49: llvm::LegacyJITSymbolResolver::lookup frame #50: llvm::RuntimeDyldImpl::resolveExternalSymbols frame #51: llvm::RuntimeDyldImpl::resolveRelocations frame #52: llvm::MCJIT::finalizeLoadedModules frame #53: llvm::MCJIT::finalizeObject frame #54: lldb_private::IRExecutionUnit::ReportAllocations frame #55: lldb_private::IRExecutionUnit::GetRunnableInfo frame #56: lldb_private::ClangExpressionParser::PrepareForExecution frame #57: lldb_private::ClangUserExpression::TryParse frame #58: lldb_private::ClangUserExpression::Parse ``` Our solution is to simply iterate over a local copy of `m_map`. **Testing** * Confirmed on manual reproducer (would reproduce 100% of the time before the patch) Differential Revision: https://reviews.llvm.org/D149949
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Motivated by feedback at - intel#5626 (comment)
We should specify what it means to get profiling information from the event returned from the queue submission of a graph. All the wording in the core SYCL spec around profiling is centred on command-groups - https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.event.profilinginfo - so we can't defer to that.
The text was updated successfully, but these errors were encountered: