# [pull] main from llvm:main #702
Merged
Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered (his previous [attempt](#160225)). This change can be summarized as follows:

* Plumb a boolean flag that forces no preload in `GetOrCreateModules` all the way through to `LoadModuleAtAddress`.
* Parallelize `Module::PreloadSymbols` separately from `Target::GetOrCreateModule` and its caller `LoadModuleAtAddress` (this is what avoids the deadlock).

These changes roughly maintain the performance characteristics of the previous implementation of parallel module loading. Testing on targets with between 5,000 and 14,000 modules, I saw numbers similar to before, often more than 10% faster in the new implementation across multiple trials for these massive targets. I think this is because we have less lock contention with this approach.

# The deadlock

See [bt.txt](https://github.com/user-attachments/files/22524471/bt.txt) for a sample backtrace of LLDB when the deadlock occurs. As @GeorgeHuyubo explains in his [PR](#160225), this is an ABBA deadlock that happens when a thread context-switches out of `Module::PreloadSymbols` and goes into `Target::GetOrCreateModule` for another module, possibly entering this block:

```
if (!module_sp) {
  // The platform is responsible for finding and caching an appropriate
  // module in the shared module cache.
  if (m_platform_sp) {
    error = m_platform_sp->GetSharedModule(module_spec, m_process_sp.get(),
                                           module_sp, &search_paths,
                                           &old_modules, &did_create_module);
  } else {
    error = Status::FromErrorString("no platform is currently set");
  }
}
```

`Module::PreloadSymbols` holds a module-level mutex, and then `GetSharedModule` *attempts* to hold the mutex of the global shared `ModuleList`. So this thread holds the module mutex and waits on the global shared `ModuleList` mutex. A competing thread may execute `Target::GetOrCreateModule`, enter the same block as above, and grab the global shared `ModuleList` mutex. Then, in `ModuleList::GetSharedModule`, we eventually call `ModuleList::FindModules`, which in turn waits for the `Module` mutex held by the first thread (via `Module::GetUUID`). Thus, we deadlock.

## Reproducing the deadlock

It might be worth noting that I've never been able to observe this deadlock during live debugging (e.g. launching or attaching to processes); however, we were able to consistently reproduce it with coredumps when using the following settings:

```
(lldb) settings set target.parallel-module-load true
(lldb) settings set target.preload-symbols true
(lldb) settings set symbols.load-on-demand false
(lldb) target create --core /some/core/file/here
# deadlock happens
```

## How this change avoids this deadlock

This change avoids concurrent executions of `Module::PreloadSymbols` and `Target::GetOrCreateModule` by waiting until after the `Target::GetOrCreateModule` executions to run `Module::PreloadSymbols` in parallel. This avoids the ordering of holding a Module lock *then* the ModuleList lock, as `Target::GetOrCreateModule` executions maintain the ordering of the shared ModuleList lock first (from what I've read and tested).

## Why not a read-write lock?

Some feedback on #160225 was to replace the mutexes used in these components with read-write locks. This might be a good idea overall, but I don't think it would *easily* resolve this specific deadlock. `Module::PreloadSymbols` would probably need a write lock on the Module, so even if we had a read lock in `Module::GetUUID` we would still contend, as the sketch below illustrates.
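A minimal, self-contained sketch of that contention, using standard mutexes and hypothetical stand-in names rather than LLDB's actual classes: the "PreloadSymbols" thread takes a write lock and then needs the list lock, while the "GetOrCreateModule" thread holds the list lock and cannot acquire even a *shared* lock on the module.

```
#include <chrono>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>

std::shared_timed_mutex module_mutex; // per-Module lock (read-write)
std::mutex list_mutex;                // global shared ModuleList lock

int main() {
  using namespace std::chrono_literals;

  // Thread A: PreloadSymbols-like path. Needs a *write* lock on the module,
  // then the list lock (Module mutex -> ModuleList mutex ordering). The
  // sleeps only sequence the acquisitions for demonstration.
  std::thread a([] {
    std::unique_lock<std::shared_timed_mutex> mod(module_mutex);
    std::this_thread::sleep_for(100ms); // let thread B grab list_mutex
    std::lock_guard<std::mutex> list(list_mutex); // blocks: B holds it
  });

  // Thread B: GetOrCreateModule-like path. Holds the list lock, then only
  // *reads* the module (ModuleList mutex -> Module mutex ordering).
  std::thread b([] {
    std::this_thread::sleep_for(50ms);
    std::lock_guard<std::mutex> list(list_mutex);
    // Even a shared (read) lock cannot be taken while A holds the write
    // lock, so the ABBA cycle survives the read-write upgrade.
    if (!module_mutex.try_lock_shared_for(1s))
      std::cout << "reader timed out: still deadlocked\n";
    else
      module_mutex.unlock_shared();
  });

  a.join();
  b.join();
}
```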
Maybe the `ModuleList` lock could be a read lock that converts to a write lock if it chooses to update the module, but it seems likely that some thread would try to update the shared module list, and then the write lock would contend again. Perhaps with deeper architectural changes we could fix this issue?

# Other attempts

One downside of this approach (and the former approach of parallel module loading) is that each DYLD would need to implement this pattern itself. With @clayborg's help, I looked at a few other approaches:

* In `Target::GetOrCreateModule`, backgrounding the `Module::PreloadSymbols` call by adding it directly to the thread pool via `Debugger::GetThreadPool().async()`. This required adding a lock to `Module::SetLoadAddress` (there probably should be one there already) since `ObjectFileELF::SetLoadAddress` is not thread-safe (it updates sections). Unfortunately, during execution this causes the symbol preloads to run synchronously with `Target::GetOrCreateModule`, preventing us from truly parallelizing the execution.
* In `Module::PreloadSymbols`, backgrounding the `symtab` and `sym_file` `PreloadSymbols` calls individually, with similar issues as the above.
* Passing a callback function, as in swiftlang#10746, instead of the boolean I use in this change. It's functionally the same change IMO, with some design tradeoffs:
  * Pro: the caller doesn't need to explicitly call `Module::PreloadSymbols` itself, and can instead call whatever function is passed into the callback.
  * Con: the caller needs to delay the execution of the callback so that it occurs after the `GetOrCreateModule` logic; otherwise we run into the same issue. I thought this would be trickier for the caller, requiring some kind of condition variable or otherwise storing the calls to execute afterward.

# Test Plan

```
ninja check-lldb
```

---------

Co-authored-by: Tom Yang <toyang@fb.com>
…sCommutable

Need to check that the non-copyable element is an instruction before actually trying to check its NSW attribute.
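As an illustration, one safe way to express that guard with LLVM's casting utilities (a sketch only; the actual change lives in the SLP vectorizer and may differ):

```
#include "llvm/IR/Operator.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Sketch of the guard: an arbitrary Value (constant, argument, ...) carries
// no NSW flag, so only query it once we know we have an overflowing binary
// operator; the dyn_cast fails cleanly for any non-instruction element.
static bool elementHasNSW(const Value *V) {
  if (const auto *OBO = dyn_cast<OverflowingBinaryOperator>(V))
    return OBO->hasNoSignedWrap();
  return false; // not an overflowing-binary-op instruction: nothing to check
}
```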
This is an alternative solution to the issue described in #167990, which can be summarized as follows: we cannot target Python 3.8 with the stable API while also supporting builds for Python 3.13 and later, due to the buffer protocol. The approach taken in this PR, proposed by Ismail, sidesteps the issue by dropping support for the buffer protocol. The only two users are SBFile::Read and SBFile::Write. Instead, we support PyBytes and PyByteArray, the builtin types that conform to the buffer protocol. Technically this is a small regression: those methods could previously take custom types that conform to Python's buffer protocol. Like Ismail, I think this is acceptable given the alternatives.

Co-authored-by: Med Ismail Bennani <ismail@bennani.ma>
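For illustration, the substitution could look roughly like this on the C API side (a sketch with a hypothetical helper name, not the actual SWIG typemap; all Py* calls used here are part of the stable ABI):

```
#include <Python.h>

// Accept exactly the two builtin byte types instead of arbitrary
// buffer-protocol objects. Returns false for anything else.
static bool getBytes(PyObject *obj, const char *&data, Py_ssize_t &size) {
  if (PyBytes_Check(obj)) {
    char *buf = nullptr;
    if (PyBytes_AsStringAndSize(obj, &buf, &size) != 0)
      return false;
    data = buf;
    return true;
  }
  if (PyByteArray_Check(obj)) {
    data = PyByteArray_AsString(obj);
    size = PyByteArray_Size(obj);
    return data != nullptr;
  }
  return false; // custom buffer-protocol types are no longer accepted
}
```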
docker-compose could not find the configuration. After this patch it should be able to.
We missed a handful of uses of the private Python API in the SWIG typemaps because they are included before we include the Python header that defines Py_LIMITED_API. This fixes that and guards the last private use based on whether or not you're targeting the limited API. Unfortunately, there doesn't appear to be an alternative, so we have to resort to being slightly less defensive.
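For context, the mechanics look roughly like this (a sketch; the version value is just an example): Py_LIMITED_API only takes effect if it is defined before Python.h is first included, and the same macro can then guard any remaining private calls.

```
// The ordering bug in a nutshell: if some other header includes Python.h
// first, this define comes too late and the limited API is never requested.
// #include "SomeHeaderThatIncludesPythonItself.h" // would defeat the macro

#define Py_LIMITED_API 0x03080000 // request the Python 3.8 stable ABI
#include <Python.h>

// Any remaining private use is then guarded on the same macro:
#ifndef Py_LIMITED_API
// ... underscore-prefixed CPython calls compile only in full-API builds
#endif
```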
To avoid scaling offsets back and forth. This is also what the SelectionDAG equivalent (`ComputeValueVTs`) does, and it will allow reusing `ComputeValueTypes` with less effort.
…DWARF32 DWARF units in .dwp files in LLDB (#167997)

This patch updates the reading capabilities of the LLDB DWARF parser for an llvm-dwp patch (#167457) that will emit .dwp files in which the compile units are DWARF32 while the .debug_str_offsets tables are emitted as DWARF64, to allow .debug_str sections that exceed 4GB in size.
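Concretely, the entry width of the string-offsets table follows the table's own format, not the unit's: a DWARF64 table stores 8-byte entries, which is what lets string offsets exceed 4GB. A sketch with a hypothetical helper (not LLDB's actual parser; it assumes the data matches host byte order for simplicity):

```
#include <cstdint>
#include <cstring>

// Read entry `Index` from a .debug_str_offsets table. A DWARF64 table has
// 8-byte entries even when the referencing unit itself is DWARF32.
uint64_t readStrOffset(const uint8_t *Table, bool IsDwarf64, uint64_t Index) {
  if (IsDwarf64) {
    uint64_t V;
    std::memcpy(&V, Table + Index * 8, 8);
    return V;
  }
  uint32_t V;
  std::memcpy(&V, Table + Index * 4, 4);
  return V;
}
```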
...according to the Coding Guidelines: `[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue.

Changes to:

- [x] `inout_ptr()`
- [x] `out_ptr()`

At the time of implementation, the `[[nodiscard]]` policy had not been established yet.

---------

Co-authored-by: Hristo Hristov <zingam@outlook.com>
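To see the intent: these functions only make sense as an argument to a C-style out-parameter call. An adaptor that is constructed and immediately discarded can never be written through, so the statement is at best a no-op and almost certainly a bug. A minimal C++23 sketch (`make_widget` is a made-up stand-in for any such C API):

```
#include <cstdio>
#include <memory>

// Hypothetical C-style factory with an out-parameter, standing in for any
// real API that returns ownership through a pointer-to-pointer.
static int make_widget(int **out) {
  *out = new int(42);
  return 0;
}

int main() {
  std::unique_ptr<int> w;

  // Correct use: the adaptor from std::out_ptr is consumed by the call and
  // writes the produced pointer back into `w` when it is destroyed.
  make_widget(std::out_ptr(w));
  std::printf("%d\n", *w); // prints 42

  // What [[nodiscard]] now flags: an adaptor nothing ever writes through.
  // std::out_ptr(w); // no call receives it -- a no-op, and surely a bug
}
```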
Since instructions are always stored little-endian, on a big-endian system the opcodes will be loaded byte-reversed. Therefore, define two sets of opcodes: one for LE and one for BE.
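The shape of the idea, as a sketch (the constant is the AArch64 `nop` encoding, used here purely as an example; the actual patch defines its own opcode tables):

```
#include <bit>
#include <cstdint>

// Instructions are stored little-endian in memory. A big-endian host that
// loads a 32-bit word directly therefore sees the bytes reversed, so keep
// one constant per host byte order and pick at compile time.
constexpr uint32_t kNopLE = 0xD503201F; // bytes as stored: 1F 20 03 D5
constexpr uint32_t kNopBE = 0x1F2003D5; // the same bytes read on a BE host

constexpr uint32_t kNop =
    std::endian::native == std::endian::little ? kNopLE : kNopBE;
```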
Not all RegisterId values are registers, so Id is a more appropriate name. Use asMCReg() in some places that assumed it was a register.
Currently, the SpecialCaseList is created at least twice: once by the `Driver`, for diagnostics only, and then the real one by the `ASTContext`. Also, depending on the enabled sanitizers, not all sections will be used. In both cases there is unnecessary RadixTree construction. This patch changes `GlobMatcher` to initialize lazily, only when needed, and removes the empty one from `RegexMatcher`. This saves 0.5% of clang time when building a large project.
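The shape of the change, as a generic sketch (hypothetical class, not the actual `GlobMatcher` interface): store the patterns cheaply up front and defer the expensive index construction to the first query, so sections that are never consulted never pay for it.

```
#include <mutex>
#include <optional>
#include <set>
#include <string>
#include <vector>

class LazyMatcher {
public:
  void addPattern(std::string P) { Patterns.push_back(std::move(P)); }

  bool matches(const std::string &S) const {
    // Build the index on first query only; call_once keeps this thread-safe.
    std::call_once(Built, [this] {
      // Stand-in for the RadixTree construction the patch defers.
      Index.emplace(Patterns.begin(), Patterns.end());
    });
    return Index->count(S) != 0; // exact match as a stand-in for glob match
  }

private:
  std::vector<std::string> Patterns;                 // cheap to store
  mutable std::optional<std::set<std::string>> Index; // built on demand
  mutable std::once_flag Built;
};
```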
`getLanesWithProperty()` is called with virtual registers only.
Some cases rely on SIFixSGPRCopies to force VALU reg_sequences with SGPR inputs to use all-VGPR inputs, but this doesn't always happen if the reg_sequence isn't invalid. Make sure we use a VGPR up front here so we don't rely on something later.
As in the title. Without this, fpext behaves in SelectionDAG as if it always had no fast-math flags.
This PR improves the lowering of vectors of fp16 when using fpext. Previously, vectors of fp16 were scalarized, leading to lots of extra instructions. Now, vectors of fp16 extended to fp64 are lowered via the preexisting lowering logic for extends. To make use of the existing logic, we need to pad with elements until we reach the next power of two.
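The padding step corresponds to rounding the element count up to the next power of two, e.g. (illustrative helper, not the actual lowering code):

```
#include <bit>

// Pad the element count up to the next power of two so the vector can go
// through the preexisting extend-lowering path instead of being scalarized.
constexpr unsigned paddedNumElements(unsigned NumElts) {
  return std::bit_ceil(NumElts); // e.g. 3 -> 4, 5 -> 8, 8 -> 8
}

static_assert(paddedNumElements(3) == 4);
static_assert(paddedNumElements(8) == 8);
```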
Handle this for consistency with the zext case.
…168168)

This probably should have been turned into a regular integer constant earlier. This defends against future regressions.