
Conversation

@pull pull bot commented Nov 18, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

nico and others added 24 commits November 17, 2025 18:57

LDS block size should be 2048 bytes (512 dwords) based on the current spec.

This reverts commit bde9062.

This caused failures on Darwin that were not caught by upstream
buildbots. Reverting for now to give myself some time to fix.

There seem to be cases where the workflow status is completed but the
jobs have not completed. We need to handle these cases gracefully to
avoid a crash loop in the metrics container.

Arm64EC indirect calls go through a function, __os_arm64x_check_icall...
This has one obvious return value, x11, which is the function to call.
However, it actually returns one other important value: x9, which is the
final destination for the emulator after the call. If the call is
calling into x64 code, x9 is used by the thunk.

Previously, we didn't model this, and it mostly worked because the
compiler usually doesn't modify x9 in the narrow window between the
check and the call. That said, it can happen in some cases; one
reliable way is to do an indirect tail call with stack protectors
enabled. (You can also just get unlucky with register allocation, but
that's harder to write a test case for.)

This patch uses the cfguardtarget bundle to simplify the calling
convention handling, for similar reasons that x64 uses it: modifying
arbitrary calls is difficult without a separate marking.

Fixes #167430.
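
For illustration, a minimal C sketch of the kind of pattern described
above (an indirect tail call with stack protectors enabled); the
function and variable names here are hypothetical, not taken from the
patch:

```c
/* Illustration only: with stack protectors enabled (e.g. -fstack-protector
 * or /GS), the guard check runs between the __os_arm64x_check_icall call
 * and the indirect tail call, and could previously clobber x9 because x9
 * was not modeled as a return value of the check function. */
typedef int (*fn_t)(int);

int call_indirect(fn_t fn, int arg) {
    char buf[64];        /* local buffer so a stack guard is emitted */
    buf[0] = (char)arg;  /* keep the buffer live */
    return fn(buf[0]);   /* indirect call, eligible for tail-call lowering */
}
```
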
Add documentation about CMAKE_OSX_SYSROOT so that folks bringing up a
build on OSX can get a clean test run.

These APIs are MachO-specific, and the interfaces are about to be
extended to support more MachO-specific behavior. For now it makes sense
to group them with other MachO-specific APIs in MachO.h.

This patch adds the POSIX function `inet_addr`. Since most of the
parsing logic is delegated to `inet_aton`, only some basic smoke tests
are included.
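
For reference, a minimal standalone smoke test of this API (an
illustration, not code from the patch) might look like:

```c
#include <arpa/inet.h> /* inet_addr, INADDR_NONE */
#include <assert.h>
#include <stdio.h>

int main(void) {
    /* Valid dotted-decimal input: result is the address in network byte order. */
    in_addr_t addr = inet_addr("127.0.0.1");
    assert(addr != INADDR_NONE);
    printf("127.0.0.1 -> 0x%08x (network byte order)\n", (unsigned)addr);

    /* Invalid input: inet_addr signals failure by returning INADDR_NONE. */
    assert(inet_addr("not.an.ip") == INADDR_NONE);
    return 0;
}
```
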
When scanning an interface source (dylib or TBD file), consider
"fallback" architectures (CPUType / CPUSubType pairs) in addition to the
process's CPUType / CPUSubType.

Background:

When dyld loads a dylib into a process it may load a dylib or slice
whose CPU type / subtype isn't an exact match for the process's CPU
type / subtype. E.g. arm64 processes can load arm64e dylibs / slices.

When building an interface we need to follow the same logic, otherwise
we risk generating a spurious "does not contain a compatible slice"
error. E.g. if we're running an arm64 JIT'd program and loading an
interface from a TBD file that has no arm64 slice, then we should fall
back to looking for an arm64e slice.

rdar://164510783
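
A rough C sketch of the selection logic described above; the types and
helper names (ArchPair, hasSlice, selectSlice) are hypothetical
stand-ins, not the actual interfaces touched by this change:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a MachO CPUType / CPUSubType pair. */
typedef struct { int cpu_type; int cpu_subtype; } ArchPair;

/* Hypothetical query: does the dylib / TBD file contain a slice for arch? */
bool hasSlice(const void *interfaceFile, ArchPair arch);

/* Try the process's own arch first, then any registered fallbacks
 * (e.g. arm64 -> arm64e), mirroring what dyld would accept. */
bool selectSlice(const void *interfaceFile, ArchPair process,
                 const ArchPair *fallbacks, size_t numFallbacks,
                 ArchPair *out) {
    if (hasSlice(interfaceFile, process)) {
        *out = process;
        return true;
    }
    for (size_t i = 0; i < numFallbacks; ++i) {
        if (hasSlice(interfaceFile, fallbacks[i])) {
            *out = fallbacks[i];
            return true;
        }
    }
    return false; /* would otherwise surface "no compatible slice" */
}
```
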
This patch makes objc-imageinfo.S work with the internal shell. The
test used a subshell to temporarily change the directory; the internal
shell does not support subshells, so this construct was replaced with a
pushd/popd sequence.

It's not reachable because the custom parser will accept or fail the
whole instruction.

https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

---------

Co-authored-by: Hristo Hristov <zingam@outlook.com>
Co-authored-by: Nikolas Klauser <nikolasklauser@berlin.de>

This is a small code size optimization that lets us avoid both shifting
and comparing to a constant if we need the shifted value anyway. On most
architectures the zero comparison is cheaper than a constant comparison
(or free if the shift sets flags).

Although this change appears to remove the optimization entirely, the
transform still happens in the one-use case: the code below the removed
code turns the shift into an and, and then the PR10267 case in
InstCombinerImpl::foldICmpAndConstConst turns the and into a ult/ugt.
A test case was added to verify this explicitly.
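
As a source-level illustration of the shape of code this helps (not the
exact IR-level transform):

```c
/* Before: compare x against a constant, and also compute the shift. */
unsigned before(unsigned x) {
    if (x < 0x100000)    /* compare against a materialized constant */
        return 0;
    return x >> 20;      /* shifted value is needed anyway */
}

/* After: reuse the shifted value and compare it against zero, which is
 * cheaper on most architectures (or free if the shift sets flags). */
unsigned after(unsigned x) {
    unsigned hi = x >> 20;
    if (hi == 0)
        return 0;
    return hi;
}
```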

Per [1], this reduces clang .text size by 0.09% and dynamic instruction
count by 0.01%.

[1] https://llvm-compile-time-tracker.com/compare.php?from=1f38d49ebe96417e368a567efa4d650b8a9ac30f&to=0873787a12b8f2eab019d8211ace4bccc1807343&stat=size-text

Reviewers: nikic, dtcxzyw

Reviewed By: dtcxzyw

Pull Request: #168007

(#167836)

This change fixes the SM requirement of the f32 to tf32 conversion with
the `rna` rounding mode and `.satfinite` modifier. The currently
specified requirement is `sm_89`, but this conversion has been supported
from `sm_80` onwards since it was added in PTX 8.1.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt

Add pass options to run lowerings to NVVM without pattern rollback. This
makes the dialect conversions easier to debug and improves
performance/memory usage.

@pull pull bot locked and limited conversation to collaborators Nov 18, 2025
@pull pull bot added the ⤵️ pull label Nov 18, 2025
@pull pull bot merged commit 951ab04 into optimizecompile:main Nov 18, 2025