Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying the implications of profiling a graph submission. #52

Closed
EwanC opened this issue Dec 12, 2022 · 0 comments
Closed

Specifying the implications of profiling a graph submission. #52

EwanC opened this issue Dec 12, 2022 · 0 comments
Assignees
Labels
Graph Specification Extension Specification related

Comments

@EwanC
Copy link
Collaborator

EwanC commented Dec 12, 2022

Motivated by feedback at - intel#5626 (comment)

We should specify what it means to get profiling information from the event returned from the queue submission of a graph. All the wording in the core SYCL spec around profiling is centred on command-groups - https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.event.profilinginfo - so we can't defer to that.

@EwanC EwanC added the Graph Specification Extension Specification related label Dec 12, 2022
EwanC added a commit that referenced this issue Mar 10, 2023
Elaborates on what the semantics are of queries on events
returned by executable graph submission.

The event enters the running state when the first node starts
executing on device, and completes once the last node finishes.

Since a graph could be divided up into multiple PI commands-buffers
that may have work scheduled in-between this could give a pessimistic
view of graph execution time, however I don't think we can guarantee
better.

In the future we could add explicit graph profiling nodes for a
more fine grained view of performance, but that's out of scope
for now.

Closes Issue #52
@EwanC EwanC self-assigned this Mar 10, 2023
EwanC added a commit that referenced this issue Mar 16, 2023
Elaborates on what the semantics are of queries on events
returned by executable graph submission.

The event enters the running state when the first node starts
executing on device, and completes once the last node finishes.

Since a graph could be divided up into multiple PI commands-buffers
that may have work scheduled in-between this could give a pessimistic
view of graph execution time, however I don't think we can guarantee
better.

In the future we could add explicit graph profiling nodes for a
more fine grained view of performance, but that's out of scope
for now.

Closes Issue #52
@EwanC EwanC closed this as completed Mar 16, 2023
reble pushed a commit that referenced this issue May 15, 2023
…callback

The `TypeSystemMap::m_mutex` guards against concurrent modifications
of members of `TypeSystemMap`. In particular, `m_map`.

`TypeSystemMap::ForEach` iterates through the entire `m_map` calling
a user-specified callback for each entry. This is all done while
`m_mutex` is locked. However, there's nothing that guarantees that
the callback itself won't call back into `TypeSystemMap` APIs on the
same thread. This lead to double-locking `m_mutex`, which is undefined
behaviour. We've seen this cause a deadlock in the swift plugin with
following backtrace:

```

int main() {
    std::unique_ptr<int> up = std::make_unique<int>(5);

    volatile int val = *up;
    return val;
}

clang++ -std=c++2a -g -O1 main.cpp

./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b
```

```
frame #4: std::lock_guard<std::mutex>::lock_guard
frame #5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock #2
frame #6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage
frame #7: lldb_private::Target::GetScratchTypeSystemForLanguage
...
frame #26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths
frame #27: lldb_private::SwiftASTContext::LoadModule
frame #30: swift::ModuleDecl::collectLinkLibraries
frame #31: lldb_private::SwiftASTContext::LoadModule
frame #34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl
frame #35: lldb_private::SwiftASTContext::PerformCompileUnitImports
frame #36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext
frame #37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState
frame #38: lldb_private::Target::GetPersistentSymbol
frame #41: lldb_private::TypeSystemMap::ForEach                 <<<< Lock #1
frame #42: lldb_private::Target::GetPersistentSymbol
frame #43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols
frame #44: lldb_private::IRExecutionUnit::FindSymbol
frame #45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence
frame #46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame #47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame #48: llvm::LinkingSymbolResolver::findSymbol
frame #49: llvm::LegacyJITSymbolResolver::lookup
frame #50: llvm::RuntimeDyldImpl::resolveExternalSymbols
frame #51: llvm::RuntimeDyldImpl::resolveRelocations
frame #52: llvm::MCJIT::finalizeLoadedModules
frame #53: llvm::MCJIT::finalizeObject
frame #54: lldb_private::IRExecutionUnit::ReportAllocations
frame #55: lldb_private::IRExecutionUnit::GetRunnableInfo
frame #56: lldb_private::ClangExpressionParser::PrepareForExecution
frame #57: lldb_private::ClangUserExpression::TryParse
frame #58: lldb_private::ClangUserExpression::Parse
```

Our solution is to simply iterate over a local copy of `m_map`.

**Testing**

* Confirmed on manual reproducer (would reproduce 100% of the time
  before the patch)

Differential Revision: https://reviews.llvm.org/D149949
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Graph Specification Extension Specification related
Projects
None yet
Development

No branches or pull requests

1 participant