@medhatiwari
Contributor

Allow the AVX512 kshiftli/kshiftri mask intrinsics to be used in constexpr.

Enables constexpr evaluation for:

  • _kshiftli_mask8/16/32/64
  • _kshiftri_mask8/16/32/64

Fixes #162056
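
For readers without an AVX512 toolchain at hand, the shift semantics that the new constant evaluation follows can be sketched in portable C++14. This is only a model, not the Clang implementation: `kshiftl_model` and `kshiftr_model` are hypothetical names, and the single assumption carried over from the patch and its tests is that a shift count greater than or equal to the mask width yields zero.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Portable model of the mask shifts (hypothetical helpers, not Clang code).
template <typename Mask>
constexpr Mask kshiftl_model(Mask m, unsigned count) {
  static_assert(std::is_unsigned<Mask>::value, "mask types are unsigned");
  constexpr unsigned width = std::numeric_limits<Mask>::digits;
  // A count at or beyond the mask width clears every bit.
  if (count >= width)
    return Mask{0};
  // Shift in 64 bits, then truncate back to the mask width.
  return static_cast<Mask>(static_cast<std::uint64_t>(m) << count);
}

template <typename Mask>
constexpr Mask kshiftr_model(Mask m, unsigned count) {
  static_assert(std::is_unsigned<Mask>::value, "mask types are unsigned");
  constexpr unsigned width = std::numeric_limits<Mask>::digits;
  if (count >= width)
    return Mask{0};
  // Logical right shift: the value is unsigned, so zeros are shifted in.
  return static_cast<Mask>(m >> count);
}

// The same boundary cases that the tests in this PR exercise.
static_assert(kshiftl_model<std::uint8_t>(0x01, 7) == 0x80, "");
static_assert(kshiftl_model<std::uint8_t>(0x01, 8) == 0x00, "");
static_assert(kshiftr_model<std::uint16_t>(0xFF00, 4) == 0x0FF0, "");
static_assert(kshiftl_model<std::uint32_t>(0x00000001u, 31) == 0x80000000u, "");
static_assert(kshiftr_model<std::uint64_t>(0x8000000000000000ULL, 64) == 0, "");
```

The explicit width check matters in the model because shifting a C++ integer by its full width or more is undefined behaviour.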

@llvmbot llvmbot added the clang, backend:X86, clang:frontend, and clang:bytecode labels Dec 3, 2025
@llvmbot
Member

llvmbot commented Dec 3, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Medha Tiwari (medhatiwari)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/170480.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsX86.td (+6-6)
  • (modified) clang/lib/AST/ByteCode/InterpBuiltin.cpp (+22)
  • (modified) clang/lib/AST/ExprConstant.cpp (+22)
  • (modified) clang/test/CodeGen/X86/avx512bw-builtins.c (+17)
  • (modified) clang/test/CodeGen/X86/avx512dq-builtins.c (+9)
  • (modified) clang/test/CodeGen/X86/avx512f-builtins.c (+9)
diff --git a/clang/include/clang/Basic/BuiltinsX86.td b/clang/include/clang/Basic/BuiltinsX86.td
index b62928008dd2e..560f94ff2427e 100644
--- a/clang/include/clang/Basic/BuiltinsX86.td
+++ b/clang/include/clang/Basic/BuiltinsX86.td
@@ -3148,28 +3148,28 @@ let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr] in {
   def kxordi : X86Builtin<"unsigned long long int(unsigned long long int, unsigned long long int)">;
 }
 
-let Features = "avx512dq", Attributes = [NoThrow, Const] in {
+let Features = "avx512dq", Attributes = [NoThrow, Const, Constexpr] in {
   def kshiftliqi : X86Builtin<"unsigned char(unsigned char, _Constant unsigned int)">;
 }
 
-let Features = "avx512f", Attributes = [NoThrow, Const] in {
+let Features = "avx512f", Attributes = [NoThrow, Const, Constexpr] in {
   def kshiftlihi : X86Builtin<"unsigned short(unsigned short, _Constant unsigned int)">;
 }
 
-let Features = "avx512bw", Attributes = [NoThrow, Const] in {
+let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr] in {
   def kshiftlisi : X86Builtin<"unsigned int(unsigned int, _Constant unsigned int)">;
   def kshiftlidi : X86Builtin<"unsigned long long int(unsigned long long int, _Constant unsigned int)">;
 }
 
-let Features = "avx512dq", Attributes = [NoThrow, Const] in {
+let Features = "avx512dq", Attributes = [NoThrow, Const, Constexpr] in {
   def kshiftriqi : X86Builtin<"unsigned char(unsigned char, _Constant unsigned int)">;
 }
 
-let Features = "avx512f", Attributes = [NoThrow, Const] in {
+let Features = "avx512f", Attributes = [NoThrow, Const, Constexpr] in {
   def kshiftrihi : X86Builtin<"unsigned short(unsigned short, _Constant unsigned int)">;
 }
 
-let Features = "avx512bw", Attributes = [NoThrow, Const] in {
+let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr] in {
   def kshiftrisi : X86Builtin<"unsigned int(unsigned int, _Constant unsigned int)">;
   def kshiftridi : X86Builtin<"unsigned long long int(unsigned long long int, _Constant unsigned int)">;
 }
diff --git a/clang/lib/AST/ByteCode/InterpBuiltin.cpp b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
index 12e5e7d142aa4..eec6ff66bfc02 100644
--- a/clang/lib/AST/ByteCode/InterpBuiltin.cpp
+++ b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
@@ -4255,6 +4255,28 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
           return APInt(sizeof(unsigned char) * 8, (A | B) == 0);
         });
 
+  case clang::X86::BI__builtin_ia32_kshiftliqi:
+  case clang::X86::BI__builtin_ia32_kshiftlihi:
+  case clang::X86::BI__builtin_ia32_kshiftlisi:
+  case clang::X86::BI__builtin_ia32_kshiftlidi:
+    return interp__builtin_elementwise_int_binop(
+        S, OpPC, Call, [](const APSInt &LHS, const APSInt &RHS) {
+          if (RHS.uge(LHS.getBitWidth()))
+            return APInt::getZero(LHS.getBitWidth());
+          return LHS.shl(RHS.getZExtValue());
+        });
+
+  case clang::X86::BI__builtin_ia32_kshiftriqi:
+  case clang::X86::BI__builtin_ia32_kshiftrihi:
+  case clang::X86::BI__builtin_ia32_kshiftrisi:
+  case clang::X86::BI__builtin_ia32_kshiftridi:
+    return interp__builtin_elementwise_int_binop(
+        S, OpPC, Call, [](const APSInt &LHS, const APSInt &RHS) {
+          if (RHS.uge(LHS.getBitWidth()))
+            return APInt::getZero(LHS.getBitWidth());
+          return LHS.lshr(RHS.getZExtValue());
+        });
+
   case clang::X86::BI__builtin_ia32_lzcnt_u16:
   case clang::X86::BI__builtin_ia32_lzcnt_u32:
   case clang::X86::BI__builtin_ia32_lzcnt_u64:
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index e707c10e5089c..5443e284beb9d 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -17054,6 +17054,28 @@ bool IntExprEvaluator::VisitBuiltinCallExpr(const CallExpr *E,
     return Success(Val, E);
   }
 
+  case X86::BI__builtin_ia32_kshiftliqi:
+  case X86::BI__builtin_ia32_kshiftlihi:
+  case X86::BI__builtin_ia32_kshiftlisi:
+  case X86::BI__builtin_ia32_kshiftlidi: {
+    return HandleMaskBinOp([](const APSInt &LHS, const APSInt &RHS) {
+      if (RHS.uge(LHS.getBitWidth()))
+        return APSInt(APInt::getZero(LHS.getBitWidth()), LHS.isUnsigned());
+      return APSInt(LHS.shl(RHS.getZExtValue()), LHS.isUnsigned());
+    });
+  }
+
+  case X86::BI__builtin_ia32_kshiftriqi:
+  case X86::BI__builtin_ia32_kshiftrihi:
+  case X86::BI__builtin_ia32_kshiftrisi:
+  case X86::BI__builtin_ia32_kshiftridi: {
+    return HandleMaskBinOp([](const APSInt &LHS, const APSInt &RHS) {
+      if (RHS.uge(LHS.getBitWidth()))
+        return APSInt(APInt::getZero(LHS.getBitWidth()), LHS.isUnsigned());
+      return APSInt(LHS.lshr(RHS.getZExtValue()), LHS.isUnsigned());
+    });
+  }
+
   case clang::X86::BI__builtin_ia32_vec_ext_v4hi:
   case clang::X86::BI__builtin_ia32_vec_ext_v16qi:
   case clang::X86::BI__builtin_ia32_vec_ext_v8hi:
diff --git a/clang/test/CodeGen/X86/avx512bw-builtins.c b/clang/test/CodeGen/X86/avx512bw-builtins.c
index fd50ab97d2c42..f67e88d63577c 100644
--- a/clang/test/CodeGen/X86/avx512bw-builtins.c
+++ b/clang/test/CodeGen/X86/avx512bw-builtins.c
@@ -556,6 +556,23 @@ __mmask64 test_kshiftri_mask64(__m512i A, __m512i B, __m512i C, __m512i D) {
   return _mm512_mask_cmpneq_epu8_mask(_kshiftri_mask64(_mm512_cmpneq_epu8_mask(A, B), 32), C, D);
 }
 
+TEST_CONSTEXPR(_kshiftli_mask32(0x00000001, 1) == 0x00000002);
+TEST_CONSTEXPR(_kshiftli_mask32(0x00000001, 31) == 0x80000000);
+TEST_CONSTEXPR(_kshiftli_mask32(0x00000001, 32) == 0x00000000);
+TEST_CONSTEXPR(_kshiftli_mask32(0x0000FFFF, 8) == 0x00FFFF00);
+TEST_CONSTEXPR(_kshiftri_mask32(0x80000000, 1) == 0x40000000);
+TEST_CONSTEXPR(_kshiftri_mask32(0x80000000, 31) == 0x00000001);
+TEST_CONSTEXPR(_kshiftri_mask32(0x80000000, 32) == 0x00000000);
+TEST_CONSTEXPR(_kshiftri_mask32(0xFFFF0000, 8) == 0x00FFFF00);
+TEST_CONSTEXPR(_kshiftli_mask64(0x0000000000000001ULL, 1) == 0x0000000000000002ULL);
+TEST_CONSTEXPR(_kshiftli_mask64(0x0000000000000001ULL, 63) == 0x8000000000000000ULL);
+TEST_CONSTEXPR(_kshiftli_mask64(0x0000000000000001ULL, 64) == 0x0000000000000000ULL);
+TEST_CONSTEXPR(_kshiftli_mask64(0x00000000FFFFFFFFULL, 16) == 0x0000FFFFFFFF0000ULL);
+TEST_CONSTEXPR(_kshiftri_mask64(0x8000000000000000ULL, 1) == 0x4000000000000000ULL);
+TEST_CONSTEXPR(_kshiftri_mask64(0x8000000000000000ULL, 63) == 0x0000000000000001ULL);
+TEST_CONSTEXPR(_kshiftri_mask64(0x8000000000000000ULL, 64) == 0x0000000000000000ULL);
+TEST_CONSTEXPR(_kshiftri_mask64(0xFFFFFFFF00000000ULL, 16) == 0x0000FFFFFFFF0000ULL);
+
 unsigned int test_cvtmask32_u32(__m512i A, __m512i B) {
   // CHECK-LABEL: test_cvtmask32_u32
   return _cvtmask32_u32(_mm512_cmpneq_epu16_mask(A, B));
diff --git a/clang/test/CodeGen/X86/avx512dq-builtins.c b/clang/test/CodeGen/X86/avx512dq-builtins.c
index b8d9587af0394..839ae7ab5d5ea 100644
--- a/clang/test/CodeGen/X86/avx512dq-builtins.c
+++ b/clang/test/CodeGen/X86/avx512dq-builtins.c
@@ -372,6 +372,15 @@ __mmask8 test_kshiftri_mask8(__m512i A, __m512i B, __m512i C, __m512i D) {
   return _mm512_mask_cmpneq_epu64_mask(_kshiftri_mask8(_mm512_cmpneq_epu64_mask(A, B), 2), C, D);
 }
 
+TEST_CONSTEXPR(_kshiftli_mask8(0x01, 1) == 0x02);
+TEST_CONSTEXPR(_kshiftli_mask8(0x01, 7) == 0x80);
+TEST_CONSTEXPR(_kshiftli_mask8(0x01, 8) == 0x00);
+TEST_CONSTEXPR(_kshiftli_mask8(0x0F, 2) == 0x3C);
+TEST_CONSTEXPR(_kshiftri_mask8(0x80, 1) == 0x40);
+TEST_CONSTEXPR(_kshiftri_mask8(0x80, 7) == 0x01);
+TEST_CONSTEXPR(_kshiftri_mask8(0x80, 8) == 0x00);
+TEST_CONSTEXPR(_kshiftri_mask8(0xF0, 2) == 0x3C);
+
 unsigned int test_cvtmask8_u32(__m512i A, __m512i B) {
   // CHECK-LABEL: test_cvtmask8_u32
   // CHECK: zext i8 %{{.*}} to i32
diff --git a/clang/test/CodeGen/X86/avx512f-builtins.c b/clang/test/CodeGen/X86/avx512f-builtins.c
index 49e606e4ee1cb..8cb859e94556f 100644
--- a/clang/test/CodeGen/X86/avx512f-builtins.c
+++ b/clang/test/CodeGen/X86/avx512f-builtins.c
@@ -9581,6 +9581,15 @@ __mmask16 test_kshiftri_mask16(__m512i A, __m512i B, __m512i C, __m512i D) {
   return _mm512_mask_cmpneq_epu32_mask(_kshiftri_mask16(_mm512_cmpneq_epu32_mask(A, B), 1), C, D);
 }
 
+TEST_CONSTEXPR(_kshiftli_mask16(0x0001, 1) == 0x0002);
+TEST_CONSTEXPR(_kshiftli_mask16(0x0001, 15) == 0x8000);
+TEST_CONSTEXPR(_kshiftli_mask16(0x0001, 16) == 0x0000);
+TEST_CONSTEXPR(_kshiftli_mask16(0x00FF, 4) == 0x0FF0);
+TEST_CONSTEXPR(_kshiftri_mask16(0x8000, 1) == 0x4000);
+TEST_CONSTEXPR(_kshiftri_mask16(0x8000, 15) == 0x0001);
+TEST_CONSTEXPR(_kshiftri_mask16(0x8000, 16) == 0x0000);
+TEST_CONSTEXPR(_kshiftri_mask16(0xFF00, 4) == 0x0FF0);
+
 unsigned int test_cvtmask16_u32(__m512i A, __m512i B) {
   // CHECK-LABEL: test_cvtmask16_u32
   // CHECK: bitcast <16 x i1> %{{.*}} to i16

@RKSimon RKSimon self-requested a review December 3, 2025 15:17
…cement

Signed-off-by: Medha Tiwari <medhatiwari@ibm.com>
@medhatiwari medhatiwari requested a review from RKSimon December 3, 2025 16:38
Collaborator

@RKSimon RKSimon left a comment

LGTM - cheers

@RKSimon RKSimon enabled auto-merge (squash) December 3, 2025 18:13
@RKSimon RKSimon merged commit 907c94b into llvm:main Dec 3, 2025
7 of 9 checks passed
naveen-seth pushed a commit to naveen-seth/llvm-project that referenced this pull request Dec 3, 2025
commit 1054a6e9dee0198da0a3d234fd3254aa9e143319
Author: Florian Hahn <flo@fhahn.com>
Date:   Wed Dec 3 21:13:11 2025 +0000

    [SCEV] Handle non-constant start values in AddRec UDiv canonicalization. (#170474)

    Follow-up to https://github.com/llvm/llvm-project/pull/169576 to enable
    UDiv canonicalization if the start of the AddRec is not constant.

    The fold is not restricted to constant start values, as long as we are
    able to compute a constant remainder. The fold is only applied if the
    subtraction of the remainder can be folded into the start expression, but
    that is just to avoid creating more complex AddRecs.

    For reference, the proof from #169576 is
    https://alive2.llvm.org/ce/z/iu2tav

    PR: https://github.com/llvm/llvm-project/pull/170474

commit 095f8e07933636bba726e3a903f215ce9fc7e2dd
Author: Florian Hahn <flo@fhahn.com>
Date:   Wed Dec 3 21:06:36 2025 +0000

    [LV] Add more tests for finding the first-iv of argmin.

    Adds more test coverage for
    https://github.com/llvm/llvm-project/pull/170223.

commit 2fb2d7eb412f25fbe48f47a31b017a87d2398f8a
Author: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
Date:   Wed Dec 3 13:05:28 2025 -0800

    [flang][cuda] Change how to handle static shared memory variables (#170388)

    Generate one global per static shared variable so the alignment can be
    set separately. Dynamic shared memory is unchanged.

commit d2accd386f3e9727309c97ecea1e22f11b617237
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 13:02:02 2025 -0800

    [Github] Make issue-write workflow support reading from multiple files

    This is so that we can read from multiple files emitted by the premerge
    workflow.

    Reviewers: tstellar, cmtice

    Reviewed By: cmtice

    Pull Request: https://github.com/llvm/llvm-project/pull/170411

commit b13b41a891dbd4bb7b49f3bfec4ebe4f42983f58
Author: Vladislav Dzhidzhoev <vdzhidzhoev@accesssoftek.com>
Date:   Wed Dec 3 22:01:34 2025 +0100

    Revert "[LLDB] Add SBFrameExtensions Tests (#169236)" (#170555)

    This reverts commit 5e5937c3d2e493a48837b2bdf179a53e8b80a66a, since the
    added test fails on the `lldb-x86_64-win` buildbot.

    https://lab.llvm.org/buildbot/#/builders/211/builds/4246

commit 04c81a99735c04b2018eeb687e74f9860e1d0e1b
Author: Matt Arsenault <Matthew.Arsenault@amd.com>
Date:   Wed Dec 3 16:00:12 2025 -0500

    CodeGen: Add LibcallLoweringInfo analysis pass (#168622)

    The libcall lowering decisions should be program dependent,
    depending on the current module's RuntimeLibcallInfo. We need
    another related analysis derived from that plus the current
    function's subtarget to provide concrete lowering decisions.

    This takes on a somewhat unusual form. It's a Module analysis,
    with a lookup keyed on the subtarget. This is a separate module
    analysis from RuntimeLibraryAnalysis to avoid that depending on
    codegen. It's not a function pass to avoid depending on any
    particular function, to avoid repeated subtarget map lookups in
    most of the use passes, and to avoid any recomputation in the
    common case of one subtarget (and keeps it reusable across
    repeated compilations).

    This also switches ExpandFp and PreISelIntrinsicLowering as
    a sample function and module pass. Note this is not yet wired
    up to SelectionDAG, which is still using the LibcallLoweringInfo
    constructed inside of TargetLowering.

commit 5cbd294ca2390069181d984644dac6ca34b5e95c
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 20:57:45 2025 +0000

    [Github] Fix issue-write workflow after #170216

    This changed the name of one of the outputs that issue-write used to
    control whether or not it ran. This patch should fix that.

commit 3d598c33350a6691807441666f9c5014c18aff39
Author: Koakuma <koachan@protonmail.com>
Date:   Thu Dec 4 03:38:48 2025 +0700

    [SPARC] Remove CCIfConsecutiveRegs for f128 returns (#170133)

    It appears that using it will result in callers mistakenly thinking that
    complex f128 returns are done the sret way, when they should be returned
    in registers.

commit 58dd3a4fef51b11d4ea5f6c4f7c349589fb12255
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 20:26:09 2025 +0000

    [Github] Also run test issue write when touching issue-write.yml

    We should actually run the test workflow when touching the workflow we
    are attempting to test.

commit 7b3ec5191a701dcebbf3c05a53b938ddd5f3c2d1
Author: Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>
Date:   Wed Dec 3 20:25:52 2025 +0000

    [VPlan] Consolidate logic for narrowToSingleScalars (NFCI) (#167360)

    The logic for narrowing to single scalar recipes is in two different
    places: narrowToSingleScalarRecipes and legalizeAndOptimizeInductions.
    Consolidate them.

commit 562d911857d9e050b002b9904d64d0f08bf4a762
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 12:23:21 2025 -0800

    [Github] Make unprivileged-download-artifact download multiple artifacts

    This is designed to allow a workflow (e.g., premerge) to upload comments
    across multiple jobs. Subsequent PRs will wire this up within the
    issue-write workflow to support reading comments from multiple files.

    Reviewers: tstellar, cmtice

    Reviewed By: cmtice

    Pull Request: https://github.com/llvm/llvm-project/pull/170216

commit 8f6e95ef45d20709f338b0753a362c172a51eff7
Author: Ahmed Nour <ahmednour.mohamed2012@gmail.com>
Date:   Wed Dec 3 22:19:54 2025 +0200

    [Clang][X86] Add constexpr support for permute4x64_pd and permute4x64_epi64 (#170442)

    This PR adds constexpr support for the AVX2 cross-lane permute
    intrinsics _mm256_permute4x64_pd and _mm256_permute4x64_epi64

    Resolves https://github.com/llvm/llvm-project/issues/169304

commit 6164b0785efcf6d9565cdcf42eada2187897e434
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 12:17:54 2025 -0800

    [Github] Add workflow to test the issue write workflow

    This does not test most of the functionality (i.e., that writing to an
    existing comment still works), but does ensure that the plumbing works
    and things are not completely broken.

    Reviewers: tstellar, cmtice

    Reviewed By: cmtice

    Pull Request: https://github.com/llvm/llvm-project/pull/170209

commit c5e6f4e99d6a1d74614cdfd866cf0f81ecc43984
Author: Florian Hahn <flo@fhahn.com>
Date:   Wed Dec 3 20:14:58 2025 +0000

    [AArch64] Add unrolling test with -mcpu=apple-a17.

    Currently Apple unrolling preferences are not applied to apple-a17.

commit 43b69166e7df5f82c15b7536e61f251428df07af
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 20:14:23 2025 +0000

    Revert "[clangd] Enable lit internal shell by default (#170186)"

    This reverts commit 671a8ce6bed475830ee9eb67cd3afb950e5a17e1.

    This still broke the clangd-ubuntu-tsan bot. It seems like somehow the
    PATH variable is not getting propagated in the
    system-include-extractor.test test.

commit c656bf30e6fd84bbc2aa8d7b8bacf32ee7d13d09
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 20:09:32 2025 +0000

    [Github] Add user of issue-write for #170209

    So that we can actually test the workflow before committing into tree.

commit fc1e91112b8388ec684b8f59c5b03337331db8c2
Author: Charles Zablit <c_zablit@apple.com>
Date:   Wed Dec 3 21:14:05 2025 +0100

    [lldb] ensure comment conforms to LLVM guidelines (#170533)

    This patch is a follow up to
    https://github.com/llvm/llvm-project/pull/170471.

commit 671a8ce6bed475830ee9eb67cd3afb950e5a17e1
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Wed Dec 3 12:05:39 2025 -0800

    Reapply "[clangd] Enable lit internal shell by default" (#170186)

    This reverts commit 4cfbc44ebe26692c209655c37aeb0b6cbf1d479b.

    This was failing due to a missing chmod binary on one of the bots
    (clangd-ubuntu-tsan). This patch fixes that by explicitly checking for
    the presence of a chmod binary. This should not be necessary (I have
    added a TODO for myself to update once I have looked at the bot setup
    which I am currently waiting on access to) as check-llvm works with
    requiring chmod unconditionally.

commit 4715e525648dde9abc50dfc93fa2cd3a67708cc7
Author: Fateme Hosseini <quic_fhossein@quicinc.com>
Date:   Wed Dec 3 14:05:07 2025 -0600

    [Hexagon] Add an option to use fast FP to int convert for some HVX cases (#169562)

    Lowering several flavors of fptosi for HVX can be done faster, but doing
    so violates the C/C++ convention on some arch tags. Nevertheless, customers
    are using direct intrinsics with the "incorrect" rounding mode and want the
    compiler to do the same.

    Default behavior is not changed.

    Patch By: Fateme Hosseini

    Co-authored-by: Sergei Larin <slarin@codeaurora.org>
    Co-authored-by: Sergei Larin <slarin@qti.qualcomm.com>

commit 50916a4adc106e140fc389097aa21eb93ea2f798
Author: Florian Hahn <flo@fhahn.com>
Date:   Wed Dec 3 19:48:23 2025 +0000

    [VPlan] Use predicate in VPInstruction::computeCost for selects. (#170278)

    In some cases, the lowering of a select depends on the predicate. If the
    condition of a select is a compare instruction, thread the predicate
    through to the TTI hook.

    PR: https://github.com/llvm/llvm-project/pull/170278

commit c5fa1f8c4bcc097ec8336bda8ef0b0a223abc2e6
Author: Valeriy Savchenko <vsavchenko@apple.com>
Date:   Wed Dec 3 19:34:21 2025 +0000

    [DAGCombiner] Handle type-promoted constants in UDIV lowering (#169491)

commit d041d5d4e07ba0eddd5120efd66520b3984a2b9b
Author: Daniel Thornburgh <dthorn@google.com>
Date:   Wed Dec 3 11:24:56 2025 -0800

    [clang] "modular_format" attribute for functions using format strings (#147431)

    This provides a C language `modular_format` attribute. This combines
    with information from the existing `format` to set the new IR
    `modular-format` attribute.

    The purpose of these attributes is to enable "modular printf". A
    statically linked libc can provide a modular variant of printf that only
    weakly references implementation routines. The link-time symbol `printf`
    would strongly reference aspect symbols (e.g. for float, fixed point,
    etc.) that are provided by those routines, restoring the status quo.
    However, the compiler could transform calls with constant format strings
    to calls to the modular printf instead, and at the same time, it would
    emit strong references to the aspect symbols that are needed to
    implement the format string. Then, the printf implementation would
    contain only the union of the aspects requested.

    See issue #146159 for context.

commit bdf90227abd55b24821b126a50ab89e49a39a2b5
Author: Jason Rice <ricejasonf@gmail.com>
Date:   Wed Dec 3 11:15:00 2025 -0800

    [MLIR] Test generated build functions with move-only parameter types (#170391)

    This adds a test of the MLIR TableGen `OpBuilder` syntax with move-only
    parameter types. Additionally, an overload is added to test defining a
    builder outside of the TableGen interface.

commit d7cc82b9c53fa03dd25f7ae9b8f07871a89e7b56
Author: Philip Reames <preames@rivosinc.com>
Date:   Wed Dec 3 11:06:40 2025 -0800

    [IndVars] Split NumElimCmp statistic into three pieces (#170514)

    Only one of the three update paths actually eliminates the comparison.

    While here, use early return to clarify the code structure.

commit 33a80a7d8e34b4448f7a3af64ba1ec3a56c1e553
Author: Charles Zablit <c_zablit@apple.com>
Date:   Wed Dec 3 19:59:45 2025 +0100

    [lldb][windows] fix a use before allocation crash (#170530)

commit 4ca61f56619c6ed2e4a1113682503bdb3da79b35
Author: Yonah Goldberg <ygoldberg@nvidia.com>
Date:   Wed Dec 3 10:58:30 2025 -0800

    [NFC][SROA] Clean up rewritePartition type selection process (#169106)

    This change reverts
    https://github.com/llvm/llvm-project/commit/257251247a267c3fa30fdeef17ffa4987d8a52e5,
    which landed on Aug 8, 2022. This change addressed the problem that if
    you have IR that looks something like:

    ```
    %alloca = alloca <4 x float>
    store <4 x float> %data, ptr %alloca
    %load = load half, ptr %alloca
    ```

    `getCommonType` would return `<4 x float>` because the `load half` isn't
    to the entire partition, so we skip the first `getTypePartition` check.
    https://github.com/llvm/llvm-project/commit/257251247a267c3fa30fdeef17ffa4987d8a52e5
    added a later check that sees that `<4 x float>` is not vector
    promotable because of the `load half`, and then calls
    `getTypePartition`, which changes the `sliceTy` to `<8 x half>`, which
    is vector promotable because the store can be changed to `store <8 x
    half>`. So we set the `sliceTy` to `<8 x half>`, we can promote the
    alloca, and everyone is happy.

    This code became unnecessary after
    https://github.com/llvm/llvm-project/commit/529eafd9beff233ba8debfc73e0b5c04cac36835,
    which landed ~3 months later, which fixes the issue in a different way.
    `isVectorPromotionViable` was already smart enough to try `<8 x half>`
    as a type candidate because it sees the `load half`. However, this
    candidate didn't work because it conflicts with `store <4 x float>`.
    This commit added logic to try integer-ifying candidates if there is no
    common type. So the `<8 x half>` candidate gets converted to `<8 x
    i16>`, which works because we can convert the alloca to `alloca <8 x
    i16>` and the load to `load i16`, allowing promotion.

    After
    https://github.com/llvm/llvm-project/commit/529eafd9beff233ba8debfc73e0b5c04cac36835,
    the original commit is pointless. It tries to refine the `SliceTy`, but
    if `isVectorPromotionViable` succeeds, it returns a new type to use and
    we will ignore the `SliceTy`.

    This change is my first patch to try to simplify the type selection
    process in rewritePartition. I had some other ideas that I tried in
    https://github.com/llvm/llvm-project/pull/167771 and
    https://github.com/llvm/llvm-project/pull/168796, but they need
    refinement.

commit c2472be3fb359e640587f84ea668c98a2d86888b
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Wed Dec 3 18:40:48 2025 +0000

    [VectorCombine][X86] foldShuffleOfIntrinsics - provide the arguments to a getShuffleCost call (#170465)

    Ensure the arguments are passed to the getShuffleCost calls to improve
    cost analysis; in particular, if these are constant, the costs will be
    recognised as free.

    Noticed while reviewing #170052

commit 907c94b3c2cc271a06afe9fe149d954578188c31
Author: Medha Tiwari <75640645+medhatiwari@users.noreply.github.com>
Date:   Thu Dec 4 00:08:40 2025 +0530

    [X86][Clang] Add constexpr support for AVX512 kshift intrinsics (#170480)

    Allow the AVX512 kshiftli/kshiftri mask intrinsics to be used in constexpr.

    Enables constexpr evaluation for:
    - `_kshiftli_mask8/16/32/64`
    - `_kshiftri_mask8/16/32/64`

    Fixes #162056

commit 7c33b8247d7ed0f8ff0e5ac8cc899ca3d6f8d183
Author: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
Date:   Wed Dec 3 12:30:53 2025 -0600

    [flang][OpenMP] Move two functions to check-omp-loop.cpp, NFC (#170526)

    These are checks for clauses that apply to loop constructs.

commit 106edbdabef8bcd914ec1720f7fa6adb07aa4e6b
Author: Jacob Lalonde <jalalonde@fb.com>
Date:   Wed Dec 3 10:29:18 2025 -0800

    [LLDB] Fix deadlock in module callback when running in parallel (#168425)

    When the target is being created, the target list acquires the mutex for
    the duration of the target creation process. However, if a module
    callback is enabled and is being called in parallel, there is an
    opportunity to deadlock if the callback calls into the target list. I've
    created a minimal repro
    [here](https://gist.github.com/Jlalond/2557e06fa09825f338eca08b1d21884f).

    ```
    command script import dead-lock-example (from above gist)
    ...
    target create a.out
    [hangs]
    ```

    This looks like a straightforward fix, where `CreateTargetInternal`
    doesn't access any state directly, and instead calls methods which are
    themselves thread-safe. So I've moved the lock to when we update the
    list with the created target. I'm not sure if this is a comprehensive
    fix, but it does fix my above example and in my (albeit limited)
    testing, doesn't cause any strange change in behavior.
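
The locking change described in this commit message can be illustrated with a small, single-threaded simplification. Everything below is a hypothetical stand-in (`Target`, `TargetList`, `create_target`, `ModuleCallback` are not LLDB's classes or signatures); the sketch only shows the idea of building the target without the list mutex held and taking the lock just for the list update.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not LLDB's classes.
struct Target {
  std::string name;
};

class TargetList {
public:
  using ModuleCallback = std::function<void(TargetList &)>;

  // Builds the target without holding the list mutex, so a callback that
  // calls back into the list does not wait on a lock held for the whole
  // creation process.
  std::shared_ptr<Target> create_target(const std::string &name,
                                        const ModuleCallback &callback) {
    auto target = std::make_shared<Target>(Target{name});
    if (callback)
      callback(*this); // may call size() or other locking methods

    // Take the lock only while the shared list is updated.
    std::lock_guard<std::mutex> guard(mutex_);
    targets_.push_back(target);
    return target;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return targets_.size();
  }

private:
  mutable std::mutex mutex_;
  std::vector<std::shared_ptr<Target>> targets_;
};
```

If `create_target` instead held `mutex_` across the callback, the call to `size()` inside the callback would block in this model.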

commit a8ccd42ab23af6848929a638cd6b099953c7e491
Author: Tom Stellard <tstellar@redhat.com>
Date:   Wed Dec 3 10:27:28 2025 -0800

    workflows: Factor out artifact attestation and upload into a composite action (#169621)

    Also, switch the release-sources workflow over to use this new action.
    As a result of this change, the attestation file for the sources will be
    renamed from attestation.jsonl to $TAG-sources-attestation.jsonl.

commit 2221f4a06ec2409f7396ce4408442f115aca1ae0
Author: Jay Foad <jay.foad@amd.com>
Date:   Wed Dec 3 18:26:40 2025 +0000

    [AMDGPU] Add a RUN line to check VGPR MSBs for VOPD pairs (#170494)

    Some tests were added in #157168. This patch makes failures more obvious
    because they will hit an "Invalid VOPD pair was created" assertion
    during VGPR lowering.

commit 63ea3537d55f75be0d6fb92aef16465b291fa9ed
Author: Nathan Corbyn <n_corbyn@apple.com>
Date:   Wed Dec 3 18:17:54 2025 +0000

    [libunwind](TestOnly) Mark failing tests as unsupported on Apple targets (#170488)

    #167642 introduced a number of test failures on one of our stage 2
    builds:
    https://ci.swift.org/job/llvm.org/job/clang-stage2-Rthinlto/1403/. This
    PR marks these tests as unsupported on `.*-apple.*` targets.

commit 0006cd694f8640cb3820d16c9d49d1155b06cda6
Author: Jasmine Tang <jjasmine@igalia.com>
Date:   Wed Dec 3 18:08:40 2025 +0000

    [CIR] Upstream builtin scatter from ClangIR incubator (#170353)

    Part of [#167752](https://github.com/llvm/llvm-project/issues/167752)

commit 94232f9f560f84d2ae7f50b2d1df5bc26b2ce69e
Author: Jan André Reuter <j.reuter@fz-juelich.de>
Date:   Wed Dec 3 19:03:34 2025 +0100

    [OpenMP][OMPT] Use global thread id for `codeptr_ra` in `end_critical` (#169826)

    When a critical construct has finished, it will trigger a
    critical-released event. If a tool is attached, and the `mutex_released`
    callback was registered, the tool will receive an event containing the
    `codeptr_ra`, the return address of the callback invocation.

    All the way back in 82e94a593433f36734e2d34898d353a2ecb65b8b, this
    `codeptr_ra` was implemented by calling `__ompt_load_return_address`
    with a fixed global thread id of `0`. However, this approach results in
    a race condition and can yield incorrect results to the tool.

    `__ompt_load_return_address(0)` points to the current return address of
    the thread 0 in `__kmp_threads`. This thread may already execute some
    other construct. A tool might therefore receive the return address of
    e.g. some `libomp` internals, or other parts of the user code.
    Additionally, a call to `__ompt_load_return_address` resets the
    `th.ompt_thread_info.return_address` to `NULL`, therefore also affecting
    the return address of thread 0. Another dispatched event, e.g.
    parallel-begin might therefore not transfer any `codeptr_ra`.

    To fix this, replace the fixed thread id by the `global_tid`, which is
    stored just before dispatching the `mutex_released` callback.
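
    As a rough illustration of the race described above (a simplified
    model, not the libomp code; `ReturnAddress`, `loadReturnAddress` and
    `endCritical` are hypothetical names), loading through a fixed id of
    `0` both reports and clears the slot of thread 0:

    ```cpp
    #include <array>
    #include <cassert>

    // Hypothetical stand-in for the per-thread return address slots kept in
    // `__kmp_threads`; this is not the actual libomp data structure.
    static std::array<const void *, 4> ReturnAddress{};

    // Models the behaviour described for `__ompt_load_return_address`:
    // read the slot of the given thread and reset it to null.
    const void *loadReturnAddress(int Gtid) {
      const void *Addr = ReturnAddress[Gtid];
      ReturnAddress[Gtid] = nullptr;
      return Addr;
    }

    // Thread `Gtid` finishes a critical construct. Returns the address that
    // would be handed to the tool as `codeptr_ra`.
    const void *endCritical(int Gtid, bool UseFixedThreadZero) {
      // Old behaviour: always thread 0. New behaviour: the calling thread.
      return loadReturnAddress(UseFixedThreadZero ? 0 : Gtid);
    }
    ```

    With the fixed id, a call from thread 2 returns whatever thread 0 had
    stored and leaves thread 0's slot empty; with `Gtid`, only the calling
    thread's slot is touched.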

    Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>

commit 540fd18568deb299a35b009d34ce32f96e3944bd
Author: Matt Arsenault <Matthew.Arsenault@amd.com>
Date:   Wed Dec 3 13:01:21 2025 -0500

    DAG: Avoid using getLibcallName when looking for a divrem call (#170413)

    Also introduce an error if it's not available, which is not yet
    testable.

commit cdb501064f35dbe5a1d49721daf59eca261057e9
Author: Matt Arsenault <Matthew.Arsenault@amd.com>
Date:   Wed Dec 3 13:01:04 2025 -0500

    DAG: Avoid more uses of getLibcallName (#170402)

commit 8d6c5cddf245d652bb2badc065848d280ef8aa9f
Author: Matt Arsenault <Matthew.Arsenault@amd.com>
Date:   Wed Dec 3 13:00:45 2025 -0500

    DAG: Use LibcallImpl in various getLibFunc helpers (#170400)

    Avoid using getLibcallName in favor of querying the
    libcall impl, and getting the ABI details from that.

commit 14ed98271bb55cfb72ba1045fb1dec6c285a7456
Author: Charles Zablit <c_zablit@apple.com>
Date:   Wed Dec 3 18:42:46 2025 +0100

    [NFC][lldb][windows] refactor the creation of inherited handles (#170301)

    Co-authored-by: Saleem Abdulrasool <compnerd@compnerd.org>

commit 817ab49ece9b0ccafd9a01ad7bd910c102f161c2
Author: Andy Kaylor <akaylor@nvidia.com>
Date:   Wed Dec 3 09:30:16 2025 -0800

    [CIR][NFC] Add infrastructure for AArch64 builtins (#170386)

    This change adds the basic code structure for handling AArch64 builtins.
    The structure of this code is brought over from classic codegen to make
    implementing missing builtins easier. In some cases, the handling
    involved too much logic for a simple NFC change, so those parts were
    replaced with a MissingFeature assert.

    The actual handling for all builtins is left for later changes.

commit bd4c21b3c8a897e5ca467134d26ec6d831c8087a
Author: Mehdi Amini <joker.eph@gmail.com>
Date:   Thu Aug 21 08:04:12 2025 -0700

    [MLIR] Apply clang-tidy fixes for performance-move-const-arg in NVGPUTransformOps.cpp (NFC)

commit c379f7cc0151fdf39cca8bfaf65e701308c77de0
Author: Sang Ik Lee <sang.ik.lee@intel.com>
Date:   Wed Dec 3 09:22:18 2025 -0800

    [MLIR][XeGPU] Add integration with XeGPU load / store ops to / from memref subview. (#170385)

    Add XeGPU integration test for missing usage case: base memory from
    memref subview.

commit 70dd63b7804255daba4154c7cc5061c1072923f7
Author: Craig Topper <craig.topper@sifive.com>
Date:   Wed Dec 3 09:22:01 2025 -0800

    [RISCV] Move tuning features below non-tuning features. Put CPU family in their own section. NFC (#170352)

    We had 4 features after all the tuning features, but there didn't seem
    to be a particular reason for it.

    Put the CPU family tuning features in their own section after the tuning
    features instead of in the middle.

commit 93832466cc40c142eb39d96876f98b49927c255b
Author: Sebastian Pop <spop@nvidia.com>
Date:   Wed Dec 3 11:19:56 2025 -0600

    [DA] Fix zero coeff bug in Strong SIV test with runtime assumptions (#149991) (#155037)

    Fix GitHub issue #149991 where Strong SIV test incorrectly concludes
    'none!' for symbolic coefficients that could be zero, leading to 0/0
    undef behavior.

    The Strong SIV test was incorrectly concluding "no dependence" when the
    coefficient is symbolic and the delta (difference between source and
    destination) is zero.

    When delta=0, the Strong SIV test divides delta/coeff to get the
    distance. The bug occurs when coeff is an unknown symbolic value: if
    coeff=0 at runtime, then 0/0 is undefined and all iterations access the
    same memory location, creating a true dependence that was being missed.
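
    A minimal sketch of the conservative handling, assuming a much
    simplified Strong SIV model (no loop bounds, no runtime assumptions;
    `strongSIV`, `Dep` and `Result` are hypothetical names, not the
    DependenceAnalysis API):

    ```cpp
    #include <cassert>
    #include <optional>

    enum class Dep { None, Distance, Unknown };

    struct Result {
      Dep Kind;
      long Distance; // only meaningful when Kind == Dep::Distance
    };

    // `Coeff` is std::nullopt when the coefficient is symbolic and cannot be
    // proven non-zero at compile time.
    Result strongSIV(long Delta, std::optional<long> Coeff) {
      // A possibly-zero coefficient means delta/coeff may be 0/0, in which
      // case every iteration touches the same location. Concluding "none"
      // would be wrong, so stay conservative.
      if (!Coeff || *Coeff == 0)
        return {Dep::Unknown, 0};
      // The distance must be an integer, otherwise there is no dependence.
      if (Delta % *Coeff != 0)
        return {Dep::None, 0};
      return {Dep::Distance, Delta / *Coeff};
    }
    ```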

commit d18d53fda8755a6f29be00b9bf0a6672a85dd444
Author: Sebastian Pop <spop@nvidia.com>
Date:   Wed Dec 3 11:16:05 2025 -0600

    [DA] add testcases for bug #148435 (#154980)

    Add regression tests from issue #148435.

commit 0ffabf4d084ffb40345c4660c2182b7067475df5
Author: Jan Svoboda <jan_svoboda@apple.com>
Date:   Wed Dec 3 09:11:43 2025 -0800

    [clang][deps] Use the caching VFS even in the 'preprocess' mode (#168970)

    The dependency scanner worker's VFS originally unconditionally did two
    things: file system access caching and dependency directives extraction.
    That's why `clang-scan-deps -mode preprocess` avoided using the VFS
    entirely. Since then, the dependency directives extraction was made
    lazy/on-demand/optional, meaning it should be possible to use only the
    caching parts of the VFS. This PR does exactly that, speeding up
    `clang-scan-deps -mode preprocess` on my config of Clang/LLVM from ~80s
    to ~38s. (For comparison, `clang-scan-deps -mode
    preprocess-dependency-directives` runs in ~13s.)

    (The real motivation was to simplify the VFS handling in the scanner,
    this is just a nice side-effect.)

commit 838ad0efbf57dfcd6c42c2c5497b30f26492e925
Author: Nicolai Hähnle <nicolai.haehnle@amd.com>
Date:   Wed Dec 3 09:11:05 2025 -0800

    AMDGPU: Generalize and normalize some tests to avoid future churn (#170508)

commit 836935197b8ff38459bb86c5aa592ef018311250
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Wed Dec 3 10:08:33 2025 -0700

    [flang][docs] Fix title and text in the release notes page

    The title of the release notes page always showed "|version|
    (In-Progress)". This has been fixed so the release version is shown as
    expected. '(In-Progress)' is now only shown on non-release branches.
    Unlike clang, flang does not use ${LLVM_VERSION_SUFFIX}, so even on
    non-release branches the 'git' suffix will not be shown.

commit 9f9e15f71553a2cfad040b87cb8e9a3ab5bee808
Author: Amr Hesham <amr96@programmer.net>
Date:   Wed Dec 3 17:17:56 2025 +0100

    [CIR] Upstream SizeOf for VariableArrayType (#169993)

    Upstream SizeOf support for VariableArrayType

commit c752bb9203954ebb2c6032d462e020fbbad4757e
Author: Philip Reames <preames@rivosinc.com>
Date:   Wed Dec 3 08:16:22 2025 -0800

    [IndVars] Strengthen inference of samesign flags (#170363)

    When reviewing another change, I noticed that we were failing to infer
    samesign for two cases: 1) an unsigned comparison, and 2) when both
    arguments were known negative.

    Using CVP and InstCombine as a reference, we need to be careful to not
    allow eq/ne comparisons. I'm a bit unclear on the why of that, and for
    now am going with the low risk change. I may return to investigate that
    in a follow up.
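
    For illustration only, the sign condition can be modelled on plain
    signed ranges (a sketch; `SignedRange` and `canInferSameSign` are
    hypothetical and do not model the predicate restriction on eq/ne
    mentioned above):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical closed signed range [Min, Max]; not LLVM's ConstantRange.
    struct SignedRange {
      int64_t Min, Max;
    };

    constexpr bool isKnownNonNegative(SignedRange R) { return R.Min >= 0; }
    constexpr bool isKnownNegative(SignedRange R) { return R.Max < 0; }

    // Both operands have the same sign if both are known non-negative or
    // both are known negative.
    constexpr bool canInferSameSign(SignedRange L, SignedRange R) {
      return (isKnownNonNegative(L) && isKnownNonNegative(R)) ||
             (isKnownNegative(L) && isKnownNegative(R));
    }

    static_assert(canInferSameSign({0, 10}, {5, 7}), "both non-negative");
    static_assert(canInferSameSign({-8, -1}, {-3, -2}), "both negative");
    static_assert(!canInferSameSign({-1, 1}, {0, 4}), "sign of LHS unknown");
    ```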

    Compile time results look like noise to me, see:
    https://llvm-compile-time-tracker.com/compare.php?from=49a978712893fcf9e5f40ac488315d029cf15d3d&to=2ddb263604fd7d538e09dc1f805ebc30eb3ffab0&stat=instructions:u

commit ec6a15f84db135186f5075e15146c7f2ec532d3a
Author: Folkert de Vries <folkert@folkertdev.nl>
Date:   Wed Dec 3 17:04:16 2025 +0100

    [X86] optimize masked truncated saturating stores (#169827)

    Combine the saturating operation into the masked truncating store.

    https://godbolt.org/z/n1YfavKP6

    ```asm
    _mm256_mask_cvtusepi16_storeu_epi8_manual: # @_mm256_mask_cvtusepi16_storeu_epi8_manual
            kmovd   k1, esi
            vmovdqa ymm0, ymmword ptr [rdx]
            vpminuw ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
            vpmovwb xmmword ptr [rdi] {k1}, ymm0
            vzeroupper
            ret
    _mm256_mask_cvtusepi16_storeu_epi8_intrinsic: # @_mm256_mask_cvtusepi16_storeu_epi8_intrinsic
            kmovd   k1, esi
            vmovdqa ymm0, ymmword ptr [rdx]
            vpmovuswb       xmmword ptr [rdi] {k1}, ymm0
            vzeroupper
            ret
    ```
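
    A scalar model of why the two sequences above are interchangeable (a
    sketch assuming 16 lanes and one mask bit per lane; the function names
    are hypothetical):

    ```cpp
    #include <algorithm>
    #include <array>
    #include <cassert>
    #include <cstdint>

    constexpr int Lanes = 16;
    using Src = std::array<uint16_t, Lanes>;
    using Dst = std::array<uint8_t, Lanes>;

    // Models vpminuw with 255 followed by a masked truncating store (vpmovwb).
    void storeManual(Dst &Out, uint16_t Mask, const Src &In) {
      for (int I = 0; I < Lanes; ++I)
        if (Mask & (1u << I))
          Out[I] = static_cast<uint8_t>(std::min<uint16_t>(In[I], 255));
    }

    // Models a masked unsigned-saturating truncating store (vpmovuswb).
    void storeSaturating(Dst &Out, uint16_t Mask, const Src &In) {
      for (int I = 0; I < Lanes; ++I)
        if (Mask & (1u << I))
          Out[I] = In[I] > 255 ? uint8_t{255} : static_cast<uint8_t>(In[I]);
    }
    ```

    Lanes whose mask bit is clear are left untouched in both variants.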

commit bd21095d8ba0bff04f5718096601638ecf9270db
Author: Hongyu Chen <xxs_chy@outlook.com>
Date:   Wed Dec 3 23:55:59 2025 +0800

    [MachineBasicBlock] Don't split loop header successor if the terminator is unanalyzable (#170146)

    Fixes https://github.com/llvm/llvm-project/issues/170051
    The previous implementation allows splitting the successor if it's the
    loop header, regardless of whether the terminator of `this` is
    analyzable.

commit 58d74febfa3958f7d870c9dca35eb20264c211e8
Author: Bertik23 <39457484+Bertik23@users.noreply.github.com>
Date:   Wed Dec 3 16:50:55 2025 +0100

    [SupportLSP] Add ShowMessageParams (#164626)

    Adds ShowMessageParams to LSP support according to the [LSP
    specification](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#showMessageRequestParams).

commit eb7db0b9ecceed9719f841cc789ecaa6d5c9aaef
Author: Charitha Saumya <136391709+charithaintc@users.noreply.github.com>
Date:   Wed Dec 3 07:48:00 2025 -0800

    [mlir][xegpu] Change `index` arithmetic ops to `arith` ops. (#170390)

    Index ops cause some issues during SIMT distribution because they don't
    have the `Elementwise` mappable trait. This PR replaces all index
    arithmetic ops with matching `arith` dialect ops.

commit 267865a7b54dd84dc22f147623ec281d34bf7a3f
Author: Philip Reames <preames@rivosinc.com>
Date:   Wed Dec 3 07:31:29 2025 -0800

    [SCEV] Factor out utility for proving same sign of two SCEVs [nfc] (#170376)

    This is a slightly different API than ConstantRange's
    areInsensitiveToSignednessOfICmpPredicate. The only actual difference
    (beyond naming) is the handling of empty ranges (i.e. unreachable code).
    I wanted to keep the existing SCEV behavior for the unreachable code as
    we should be folding that to poison, not reasoning about samesign. I
    tried the other variant locally, and saw no test changes.

commit ccd4e7b1ed3858c64b4667787929b939513bc929
Author: John Brawn <john.brawn@arm.com>
Date:   Wed Dec 3 15:28:46 2025 +0000

    [LSR] Make OptimizeLoopTermCond able to handle some non-cmp conditions (#165590)

    Currently OptimizeLoopTermCond can only convert a cmp instruction to
    using a postincrement induction variable, which means it can't handle
    predicated loops where the termination condition comes from
    get_active_lane_mask. Relax this restriction so that we can handle any
    kind of instruction, though only if it's the instruction immediately
    before the branch (except for possibly an extractelement).

commit c128fd9bebf7d281ac7cf12d8258573e8928672b
Author: Oleksandr T. <oleksandr.tarasiuk@outlook.com>
Date:   Wed Dec 3 17:24:33 2025 +0200

    [Clang] prevent crash on invalid nested name specifiers with a single colon (#169246)

    Fixes #167905

    ---

    This patch addresses an issue where invalid nested name specifier
    sequences containing a single colon (`a:c::`) could be treated during
    recovery as valid scope specifiers, which in turn led to a crash

    https://github.com/llvm/llvm-project/blob/c543615744d61e0967b956c402e310946d741570/clang/lib/Parse/ParseExprCXX.cpp#L404-L418

    For malformed inputs like `a:c::`, the single colon recovery incorrectly
    triggers and produces an `annot_cxxscope`. When tentative parsing later
    runs

    https://github.com/llvm/llvm-project/blob/996213c6ea0dc2e47624c6b06c0833a882c1c1f7/clang/lib/Parse/ParseTentative.cpp#L1739-L1740

    the classifier returns `Ambiguous`, which doesn't stop parsing. The
    parser then enters the

    https://github.com/llvm/llvm-project/blob/996213c6ea0dc2e47624c6b06c0833a882c1c1f7/clang/lib/Parse/ParseTentative.cpp#L1750-L1752

    and consumes the invalid scope annotation, eventually reaching `EOF` and
    crashing.

commit d0f5a49fb6f3604dbb7d6692ad0f81ed1cdf3a86
Author: Nikita Popov <npopov@redhat.com>
Date:   Wed Dec 3 16:21:47 2025 +0100

    [Support] Support debug counters in non-assertion builds (#170468)

    This enables the use of debug counters in (non-assertion) release
    builds. This is useful to enable debugging without having to switch to
    an assertion-enabled build, which may not always be easy.

    After some recent improvements, always supporting debug counters no
    longer has measurable overhead.

commit 5ab8c3a590681b557b117827f8cfcded6dd72015
Author: Sohaib Iftikhar <sohaib1692@gmail.com>
Date:   Wed Dec 3 16:05:25 2025 +0100

    [LLDB|BUILD] Fix for c50802cb (#170484)

    Fix after #170236

commit 4c09e45f1d54730bd1e50efdca8df5c768558376
Author: Mehdi Amini <joker.eph@gmail.com>
Date:   Thu Aug 21 11:03:23 2025 -0700

    [MLIR] Apply clang-tidy fixes for llvm-qualified-auto in OpenMPToLLVMIRTranslation.cpp (NFC)

commit 45b697e610fd24b4114d78f9d7819fa5e9461371
Author: Björn Pettersson <bjorn.a.pettersson@ericsson.com>
Date:   Wed Dec 3 15:45:26 2025 +0100

    [MemoryBuiltins] Consider index type size when aggregating gep offsets (#132365)

    [MemoryBuiltins] Consider index type size when aggregating gep offsets
    Main goal here is to fix some bugs seen with LowerConstantIntrinsics
    pass and the lowering of llvm.objectsize.

    In ObjectSizeOffsetVisitor::computeImpl we are using an external
    analysis together with stripAndAccumulateConstantOffsets. The idea
    is to compute the Min/Max value of individual offsets within a GEP.
    The bug solved here is that when doing the Min/Max comparisons the
    external analysis wasn't considering the index type size (given by
    the data layout), it was simply using the type from the IR. Since a
    GEP is defined as sext/truncating indices we need to consider the
    index type size in the external analysis.
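
    To illustrate the sext/truncate step (a standalone sketch;
    `toIndexWidth` is a hypothetical helper and not part of the patch), an
    i64 index of 4294967295 used with a 32-bit index type behaves as -1:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Truncate a 64-bit index value to `IndexBits` bits and sign-extend the
    // result back to 64 bits. Assumes 1 <= IndexBits <= 64.
    constexpr int64_t toIndexWidth(int64_t Value, unsigned IndexBits) {
      if (IndexBits >= 64)
        return Value;
      const uint64_t Mask = (uint64_t{1} << IndexBits) - 1;
      const uint64_t SignBit = uint64_t{1} << (IndexBits - 1);
      const uint64_t Low = static_cast<uint64_t>(Value) & Mask;
      // Flip the sign bit and subtract it again to sign-extend.
      return static_cast<int64_t>((Low ^ SignBit) - SignBit);
    }

    static_assert(toIndexWidth(4294967295LL, 32) == -1, "wraps to -1");
    static_assert(toIndexWidth(5, 32) == 5, "small values are unchanged");
    ```

    Comparing Min/Max on the untruncated 64-bit values would order these
    offsets differently from how the GEP actually evaluates them.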

    This solves a regression (false ubsan warnings) seen after commit

    https://github.com/llvm/llvm-project/commit/02b8ee281947f6cb39c7eb3c4bbba59322e9015b
    (https://github.com/llvm/llvm-project/pull/117849).

commit 045331e4a035fa5dd4e91db03c5c7d6335443c03
Author: Mehdi Amini <joker.eph@gmail.com>
Date:   Thu Aug 21 13:29:40 2025 -0700

    [MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in SymbolTableTest.cpp (NFC)

commit b1d06058a39579cfc6ea48c496a1f63f023c5cb5
Author: Oleksandr T. <oleksandr.tarasiuk@outlook.com>
Date:   Wed Dec 3 16:20:12 2025 +0200

    [Clang] adjust caret placement for the suggested attribute location for enum class (#168092)

    Fixes #163224

    ---

    This patch addresses the issue by correcting the caret insertion
    location for attributes incorrectly positioned before an enum. The
    location is now derived from the associated `EnumDecl`: for named enums,
    the attribute is placed before the identifier, while for anonymous enum
    definitions, it is placed before the opening brace, with a fallback to
    the semicolon when no brace is present.

    For example:

    ```cpp
      [[nodiscard]] enum class E1 {};
    ```

    is now suggested as:

    ```cpp
      enum class [[nodiscard]] E1 {};
    ```

commit be3204a59d53f1e44080b99813fb69db0672b5d1
Author: sstwcw <su3e8a96kzlver@posteo.net>
Date:   Wed Dec 3 14:14:01 2025 +0000

    [clang-format] Ignore C++ keywords when formatting Verilog (#167984)

    In the sample below, the `private` identifier is the name of the type,
    and the `try` identifier is the name of the variable.

    new

    ```SystemVerilog
    begin
      private try;
    end
    ```

    old

    ```SystemVerilog
    begin
    private
      try
        ;
    end
    ```

commit 75c85bafb830e5a7bd7fda13d2648180538ff513
Author: sstwcw <su3e8a96kzlver@posteo.net>
Date:   Wed Dec 3 14:13:42 2025 +0000

    [clang-format] Continue aligned lines without parentheses (#167979)

    before, with the options `AlignConsecutiveDeclarations` and
    `AlignConsecutiveAssignments` enabled

    ```C++
    veryverylongvariablename = somethingelse;
    shortervariablename      = anotherverylonglonglongvariablename + //
                          somevariablethatwastoolongtofitonthesamerow;

    double i234 = 0;
    auto   v    = false ? type{}
                        : type{
                         1,
                     };
    ```

    after

    ```C++
    veryverylongvariablename = somethingelse;
    shortervariablename      = anotherverylonglonglongvariablename + //
                               somevariablethatwastoolongtofitonthesamerow;

    double i234 = 0;
    auto   v    = false ? type{}
                        : type{
                              1,
                          };
    ```

    Fixes #126873.

    Fixes #57612.

    Previously, the part for determining whether aligning a line should move
    the next line relied on having a pair of tokens such as parentheses
    surrounding both lines. There are often no such tokens, for example in
    the first block above. This patch removes the requirement for those
    tokens.

    Now the program keeps track of how the position is calculated. The
    alignment step moves the next line if its position is based on a column
    to the right of the token that gets aligned.

    The column that the position of the line is based on is more detailed
    than the `IsAligned` property that the program used before this patch.
    It enables the program to handle cases where parts that should not
    usually move with the previous line and parts that should are nested
    like in the second block above. That is why the patch uses it instead of
    fake parentheses.

commit f83f6f565f408c8d24ff024146a002f6a1ea77c7
Author: Mehdi Amini <joker.eph@gmail.com>
Date:   Wed Dec 3 14:59:34 2025 +0100

    Fix lit testing to support standalone testing (#170365)

    To be able to test lit without having a configuration of LLVM, we need
    to support invocations that are not going through the lit.site.cfg and
    thus don't have a llvm_config set-up.

commit cb5362a43329c0e9747e1d63202b00d461db4831
Author: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
Date:   Wed Dec 3 07:35:33 2025 -0600

    [flang][OpenMP] Rename OmpLoopRangeClause to OmpLooprangeClause, NFC (#170370)

    The convention is to change spelling from snake_case to UpperCamel, and
    use the result as a stem in derived names, e.g.
    - spelling is "some_clause" -> stem is SomeClause
    - spelling is "someclause" -> stem is Someclause

    Member of the OmpClause variant is <stem> itself, e.g. Looprange as in
    parser::OmpClause::Looprange.

    Specific clause class name is Omp<stem>Clause, e.g. OmpLooprangeClause.
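
    The convention can be sketched as a small string transformation
    (illustration only; `clauseStem` and `clauseClassName` are hypothetical
    helpers, not functions in flang):

    ```cpp
    #include <cassert>
    #include <cctype>
    #include <string>

    // Convert a clause spelling to its UpperCamel stem: the first letter and
    // every letter following an underscore are capitalized, underscores are
    // dropped, and everything else is kept as spelled.
    std::string clauseStem(const std::string &Spelling) {
      std::string Stem;
      bool UpperNext = true;
      for (char C : Spelling) {
        if (C == '_') {
          UpperNext = true;
          continue;
        }
        Stem += UpperNext ? static_cast<char>(
                                std::toupper(static_cast<unsigned char>(C)))
                          : C;
        UpperNext = false;
      }
      return Stem;
    }

    // Derive the specific clause class name Omp<stem>Clause.
    std::string clauseClassName(const std::string &Spelling) {
      return "Omp" + clauseStem(Spelling) + "Clause";
    }
    ```

    Since the clause is spelled "looprange" without an underscore, the stem
    is Looprange rather than LoopRange.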

commit 21d006c4828a2f547e861c23549796834a377d2b
Author: Eugene Epshteyn <eepshteyn@nvidia.com>
Date:   Wed Dec 3 08:29:43 2025 -0500

    [flang] Support kind/index lookup inside of EQUIVALENCE (#170056)

    Turn off "in EQUIVALENCE" check for processing of array subscripts,
    since subscripts themselves are not part of the EQUIVALENCE.

    Fixes #169590

commit 00c8e615e30a6f38698b7bb7e426f83abb8b5798
Author: Lukacma <Marian.Lukac@arm.com>
Date:   Wed Dec 3 12:55:19 2025 +0000

    [AArch64] Add bitcasts for lowering saturating add/sub and shift intrinsics.  (#161840)

    This is a follow-up patch to #157680. In this patch, we are adding
    explicit bitcasts to floating-point type when lowering saturating
    add/sub and shift NEON scalar intrinsics using SelectionDAG, so they can
    be picked up by patterns added in the first part of this series. To do that,
    we have to create new nodes for these intrinsics, which operate on
    floating-point types and wrap them in bitcast nodes.

commit 8b94997a475192d0e519d03cf009f5c51d6a389e
Author: Charles Zablit <c_zablit@apple.com>
Date:   Wed Dec 3 13:53:17 2025 +0100

    [lldb][windows] fix invalid corefile error message (#170471)

commit 2fc12754009b835f00dd8b604096b68bad96e3c1
Author: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
Date:   Wed Dec 3 20:42:12 2025 +0800

    [RISCV] Fix corner cases after #170070 (#170438)

    There are two fixes:

    1. Clear kill flags for `FalseReg` in foldVMergeToMask or we can't
    pass the MachineVerifier because of using a killed virtual register.
    2. Restrict `lookThruCopies` to only look through COPYs with
    one non-debug use.

    This was found when backporting #170070 to 21.x branch.

commit 6af1c3f3a927497081d114f202501667cbbf80c2
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Wed Dec 3 20:37:30 2025 +0800

    [ValueTracking] Support scalable vector splats in computeKnownBits (#170345)

    Similar to https://github.com/llvm/llvm-project/pull/170325, this patch
    adds support for scalable vector splats in computeKnownBits.

commit 2e87463603171a61713c9b9c3c07fc90b31a555e
Author: Nathan Corbyn <n_corbyn@apple.com>
Date:   Wed Dec 3 12:15:39 2025 +0000

    [Clang] Fix `PPChainedCallbacks::EmbedFileNotFound()` (#170293)

    We've had internal test failures since #166188 landed. The root cause is
    that `PPChainedCallbacks::EmbedFileNotFound()` incorrectly calls
    `PPCallbacks::FileNotFound()` not `PPCallbacks::EmbedFileNotFound()`.
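
    The shape of the fix, reduced to a toy model (the classes below are
    simplified stand-ins with made-up, parameterless signatures, not the
    real clang interfaces):

    ```cpp
    #include <cassert>

    // Simplified stand-in for a callbacks interface with two distinct hooks.
    struct Callbacks {
      virtual ~Callbacks() = default;
      virtual bool FileNotFound() { return false; }
      virtual bool EmbedFileNotFound() { return false; }
    };

    // Forwards each hook to two wrapped callback objects.
    struct Chained : Callbacks {
      Callbacks *First, *Second;
      Chained(Callbacks *F, Callbacks *S) : First(F), Second(S) {}

      bool EmbedFileNotFound() override {
        // The bug class described above: forwarding to FileNotFound() here
        // instead of the matching EmbedFileNotFound() hook.
        bool Skip = First->EmbedFileNotFound();
        // Invoke the second callback regardless of the first result.
        Skip |= Second->EmbedFileNotFound();
        return Skip;
      }
    };

    // Records which hook was invoked.
    struct Recorder : Callbacks {
      int FileCalls = 0, EmbedCalls = 0;
      bool FileNotFound() override { ++FileCalls; return false; }
      bool EmbedFileNotFound() override { ++EmbedCalls; return false; }
    };
    ```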

commit 09efb48991dd86ed6a2db89a3eb126aff7337090
Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Date:   Wed Dec 3 12:12:15 2025 +0000

    [gn build] Port e9bda498e6a0

commit e9bda498e6a061354b3a3e97c29b93e775d721d3
Author: Ebuka Ezike <yerimyah1@gmail.com>
Date:   Wed Dec 3 12:09:23 2025 +0000

    [lldb] add libstdcpp span formatter (#168705)

commit e947139f082f16c654e6536a90221e15bc0fc96c
Author: Benjamin Maxwell <benjamin.maxwell@arm.com>
Date:   Wed Dec 3 12:06:03 2025 +0000

    [SDAG] Disable illegal extract_subvector splitting for scalable vectors (#170315)

    The "half spanning" legalization of extract_subvector is only valid for
    fixed-length vectors. This patch disables it for scalable vectors and
    makes more careful use of ElementCount in the lowering.

    Fixes regression from https://github.com/llvm/llvm-project/pull/154101,
    which was encountered here:
    https://github.com/llvm/llvm-project/pull/166748#issuecomment-3600498185

    Note: We could optimize this case given the known vscale, but this patch
    only attempts to fix the miscompile.

commit 22d354a2f25e3817ab2e9816eff43fc7ad4de472
Author: Hamza Hassanain <53662962+HamzaHassanain@users.noreply.github.com>
Date:   Wed Dec 3 13:59:37 2025 +0200

    [X86][Clang] Support constexpr evaluation of cvtpd2ps intrinsics (#169980)

    This patch implements constant evaluation support for the following X86
    intrinsics:
    - _mm_cvtpd_ps, _mm256_cvtpd_ps (Packed Double to Float)
    - _mm_cvtsd_ss (Scalar Double to Float merge)
    - Masked variants of the above

    It implements the strict "Exact and Finite" rule: conversions that are
    inexact, infinite, or NaN are rejected in constexpr contexts.
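
    A scalar sketch of the "Exact and Finite" rule, assuming IEEE-754
    float/double (this is an illustrative model with a hypothetical
    `convertExact` helper, not the constant evaluator code):

    ```cpp
    #include <cassert>
    #include <cfloat>
    #include <cmath>
    #include <optional>

    // Returns the float value only if the double converts exactly and the
    // result is finite; otherwise returns std::nullopt to signal that the
    // conversion would be rejected.
    std::optional<float> convertExact(double D) {
      // Reject NaN, infinities, and magnitudes that do not fit in a float.
      if (!std::isfinite(D) || std::fabs(D) > FLT_MAX)
        return std::nullopt;
      const float F = static_cast<float>(D);
      // Reject conversions that had to round.
      if (static_cast<double>(F) != D)
        return std::nullopt;
      return F;
    }
    ```

    For example, 1.5 converts exactly, while 0.1 (rounds), 1e300 (out of
    range) and NaN are rejected.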

    Fixes #169370

commit d68f5432532bb2bb641258b9f9236f0eba53c4fd
Author: Baranov Victor <bar.victor.2002@gmail.com>
Date:   Wed Dec 3 14:35:09 2025 +0300

    [clang-tidy] Remove 'clang-analyzer-*' checks from default checks. (#157306)

    Closes https://github.com/llvm/llvm-project/issues/146482.

commit 4497c53298a6121dae51da490b3c228beb053e89
Author: Timm Baeder <tbaeder@redhat.com>
Date:   Wed Dec 3 12:21:36 2025 +0100

    [clang][bytecode] Accept current PC argument in Function::dump() (#170449)

    This is useful since we can highlight the opcode that OpPC points to.

commit dd9a516e0eb3b3a55890adbdc2221e70a3bf7719
Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Date:   Wed Dec 3 11:14:16 2025 +0000

    [gn build] Port aeb36a925234

commit 0dcbc870ed9baa54dc7c46e483d40a26dff28f96
Author: Med Ismail Bennani <ismail@bennani.ma>
Date:   Wed Dec 3 03:10:49 2025 -0800

    [lldb/docs] Add ScriptingFrameProvider documentation to the website

    This patch adds the documentation for ScriptedFrameProviders to the
    lldb website.

    Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

commit 4286a474b476e300079efa127d084593e833b1d6
Author: Med Ismail Bennani <ismail@bennani.ma>
Date:   Wed Dec 3 03:10:22 2025 -0800

    Revert "[lldb/docs] Add ScriptingFrameProvider documentation to the website"

    This reverts commit bfde296d081098605bdad0e4487c4bad9ca19c95.

commit 1ca763b76423a17a2101a4579b5d74bade4f0ce4
Author: Martin Storsjö <martin@martin.st>
Date:   Wed Dec 3 13:09:14 2025 +0200

    [llvm-readobj] [ARMWinEH] Fix printing of packed unwind with H=1, RegI=RegF=0, CR!=1 (#170294)

    In these cases, there are no other GPRs or float registers that would
    have been backed up before the register homing area, that would have
    allocated space on the stack for the saved registers.

    Normally, the register homing part of the prologue consists of 4 nop
    unwind codes. However, if we haven't allocated stack space for those
    arguments yet, there's no space to store them in. The previous printout,
    printing "stp x0, x1, [sp, #-N]!" wouldn't work when interpreted as a
    nop unwind code.

    Based on "dumpbin -unwindinfo", and from empirical inspection with
    RtlVirtualUnwind, it turns out that the homing of argument registers is
    done outside of the prologue. In these cases, "dumpbin -unwindinfo"
    prints an annotation "(argument registers homed post-prolog)".

    Adjust the printout accordingly. In these cases, the later stack
    allocation (either "stp x29, x30, [sp, #-LocSZ]!" or "sub sp, sp,
    #LocSZ") is adjusted to include the space for the homed registers (i.e.
    be the full size from FrameSize).

commit 6822e3c91b5df96ea980c94655a5d547c5f510b8
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Wed Dec 3 11:07:26 2025 +0000

    [VectorCombine][X86] Add tests showing failure to push a shuffle through a fma with multiple constants (#170458)

    Despite 2 of the 3 arguments of the fma intrinsics calls being constant
    (free shuffle), foldShuffleOfIntrinsics fails to fold the shuffle
    through

commit aeb36a92523427b63466555d92b35bd3aa26ee40
Author: Lang Hames <lhames@gmail.com>
Date:   Wed Dec 3 22:03:23 2025 +1100

    [ORC] Port CallableTraitsHelper from the new ORC runtime. (#170441)

    The code for this commit was taken with minimal modification to fit LLVM
    style from llvm-project/orc-rt/include/CallableTraitsHelper.h and
    llvm-project/orc-rt/unittests/CallableTraitsHelperTest.cpp (originally
    committed in 40fce325011)

    CallableTraitsHelper identifies the return type and argument types of a
    callable type and passes those to an implementation class template to
    operate on. E.g. the CallableArgInfoImpl class exposes these types as
    typedefs.

    Porting CallableTraitsHelper from the new ORC runtime will allow us to
    simplify existing and upcoming "callable-traits" classes in ORC.
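
    A rough sketch of the idea, written from the description above rather
    than from the ported header (all names below, including
    `CallableTraits`, `ArgInfoImpl` and `ArgInfo`, are hypothetical and the
    real CallableTraitsHelper may differ):

    ```cpp
    #include <cassert>
    #include <tuple>
    #include <type_traits>

    // Primary template: for class-type callables, look at operator().
    template <template <typename...> class ImplT, typename C>
    struct CallableTraits : CallableTraits<ImplT, decltype(&C::operator())> {};

    // Plain function types.
    template <template <typename...> class ImplT, typename RetT,
              typename... ArgTs>
    struct CallableTraits<ImplT, RetT(ArgTs...)> : ImplT<RetT, ArgTs...> {};

    // Function pointers.
    template <template <typename...> class ImplT, typename RetT,
              typename... ArgTs>
    struct CallableTraits<ImplT, RetT (*)(ArgTs...)> : ImplT<RetT, ArgTs...> {};

    // Const call operators (covers non-mutable lambdas and functors).
    template <template <typename...> class ImplT, typename ClassT,
              typename RetT, typename... ArgTs>
    struct CallableTraits<ImplT, RetT (ClassT::*)(ArgTs...) const>
        : ImplT<RetT, ArgTs...> {};

    // Implementation template: exposes the identified types as typedefs.
    template <typename RetT, typename... ArgTs> struct ArgInfoImpl {
      using ReturnType = RetT;
      using ArgsTupleType = std::tuple<ArgTs...>;
    };

    template <typename C> using ArgInfo = CallableTraits<ArgInfoImpl, C>;

    struct AddOne {
      long operator()(int X) const { return X + 1L; }
    };

    static_assert(std::is_same_v<ArgInfo<AddOne>::ReturnType, long>);
    static_assert(std::is_same_v<ArgInfo<long(int, char)>::ArgsTupleType,
                                 std::tuple<int, char>>);
    ```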

commit c0f0936f5a47270d47486f6d5860b5f8e30e0e32
Author: Felipe de Azevedo Piovezan <fpiovezan@apple.com>
Date:   Wed Dec 3 10:51:27 2025 +0000

    [lldb] Fix ThreadPlanStepOut::DoPlanExplainsStop inspection of BreakpointSite (#169799)

    Suppose two threads are performing the exact same step out plan. They
    will both have an internal breakpoint set at their parent frame. Now
    suppose both of those breakpoints are at the same address (i.e. the
    same BreakpointSite).

    At the end of `ThreadPlanStepOut::DoPlanExplainsStop`, we see this:

    ```
    // If there was only one owner, then we're done.  But if we also hit
    // some user breakpoint on our way out, we should mark ourselves as
    // done, but also not claim to explain the stop, since it is more
    // important to report the user breakpoint than the step out
    // completion.

    if (site_sp->GetNumberOfConstituents() == 1)
      return true;
    ```

    In other words, the plan looks at the number of constituents of the
    site to decide whether it explains the stop, the logic being that a
    _user_ might have put a breakpoint there. However, the implementation is
    not correct; in particular, it will fail in the situation described
    above. We should only care about non-internal breakpoints that would
    stop for the current thread.
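
    A simplified model of the intended check (a sketch only; `Constituent`,
    `hasUserBreakpointFor` and `planExplainsStop` are hypothetical and do
    not reflect the lldb API):

    ```cpp
    #include <cassert>
    #include <vector>

    // Hypothetical description of one breakpoint sharing a site.
    struct Constituent {
      bool IsInternal; // set by a thread plan rather than by the user
      int ThreadId;    // thread it applies to, or -1 for any thread
    };

    // True if some non-internal breakpoint at the site would stop the
    // current thread.
    bool hasUserBreakpointFor(const std::vector<Constituent> &Site,
                              int CurrentThread) {
      for (const Constituent &C : Site)
        if (!C.IsInternal &&
            (C.ThreadId == -1 || C.ThreadId == CurrentThread))
          return true;
      return false;
    }

    // The step-out plan explains the stop unless a user breakpoint also
    // applies. Counting all constituents (size() == 1) would misfire when
    // two threads share a site with their internal breakpoints.
    bool planExplainsStop(const std::vector<Constituent> &Site,
                          int CurrentThread) {
      return !hasUserBreakpointFor(Site, CurrentThread);
    }
    ```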

    It is tricky to test this, as it depends on the timing of threads, but I
    was able to consistently reproduce the issue with a swift program using
    concurrency.

    rdar://165481473

commit bfde296d081098605bdad0e4487c4bad9ca19c95
Author: Med Ismail Bennani <ismail@bennani.ma>
Date:   Wed Dec 3 02:46:01 2025 -0800

    [lldb/docs] Add ScriptingFrameProvider documentation to the website

    This patch adds the documentation for ScriptedFrameProviders to the
    lldb website.

    Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

commit 4e4763a8a4659dc252429a003c613f762d5a1083
Author: Felipe de Azevedo Piovezan <fpiovezan@apple.com>
Date:   Wed Dec 3 10:43:48 2025 +0000

    [lldb] Handle backwards branches in UnwindAssemblyInstEmulation (#169633)

    This allows the unwinder to handle code with mid-function epilogues
    where the subsequent code is reachable through a backwards branch.

    Two changes are required to accomplish this:

    1. Do not enqueue the subsequent instruction if the current instruction
       is a barrier(*).
    2. When processing an instruction, stop ignoring branches with negative
       offsets.

    (*) As per the definition in LLVM's MC layer, a barrier is any
    instruction that "stops control flow from executing the instruction
    immediately following it". See `MCInstrDesc::isBarrier` in MCInstrDesc.h
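
    The two changes can be sketched as a worklist visit over instruction
    indices (a toy model; `Inst` and `reachable` are hypothetical and much
    simpler than UnwindAssemblyInstEmulation):

    ```cpp
    #include <cassert>
    #include <optional>
    #include <set>
    #include <vector>

    struct Inst {
      bool IsBarrier = false;          // control never falls through
      std::optional<int> BranchOffset; // relative target, may be negative
    };

    // Returns the indices of all instructions reachable from index 0.
    std::set<int> reachable(const std::vector<Inst> &Insts) {
      std::set<int> Visited;
      std::vector<int> Worklist{0};
      const int N = static_cast<int>(Insts.size());
      while (!Worklist.empty()) {
        const int I = Worklist.back();
        Worklist.pop_back();
        if (I < 0 || I >= N || !Visited.insert(I).second)
          continue;
        const Inst &In = Insts[I];
        // 1. Do not enqueue the next instruction after a barrier.
        if (!In.IsBarrier)
          Worklist.push_back(I + 1);
        // 2. Follow branches regardless of the sign of their offset.
        if (In.BranchOffset)
          Worklist.push_back(I + *In.BranchOffset);
      }
      return Visited;
    }
    ```

    In a layout such as `b +3; nop; ret; nop; b -3`, the `nop; ret` pair in
    the middle is only reached through the final backwards branch.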

    Part of a sequence of PRs:
    [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
    #169630
    [lldb][NFC] Rename forward_branch_offset to branch_offset in
    UnwindAssemblyInstEmulation #169631
    [lldb] Add DisassemblerLLVMC::IsBarrier API #169632
    [lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633

    commit-id:fd266c13

commit 49774448d69b55f5c46aef2147b45537fd61276a
Author: mitchell <mitchell.xu2@gmail.com>
Date:   Wed Dec 3 18:41:45 2025 +0800

    [clang-tidy] Fix false positive in `readability-redundant-typename` (#170034)

    Closes #169166

    ---------

    Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>

commit f17abc280c708c16f622be2de2ab7d0710cc8bc1
Author: Gil Rapaport <gil.rapaport@mobileye.com>
Date:   Wed Dec 3 12:41:29 2025 +0200

    [mlir][emitc] Add address-of and dereference ops (#72569)

    EmitC currently models C's `&` and `*` operators via its `apply` op,
    which has several drawbacks:

    - Its pre-lvalue semantics combines dereferencing with memory access.

    - Representing multiple opcodes (selected by an attribute) in a single
    op complicates the code by adding a second, attribute-based selection
    layer on top of MLIR's standard `isa<>` mechanism.

    This patch adds two distinct, lvalue-based ops to model these C operators.
    EmitC passes were converted to use the new ops instead of `apply`, which
    is now deprecated.

commit 2697c8cb459c1705f6c3a60c908462ca099e657f
Author: Fabian Ritter <fabian.ritter@amd.com>
Date:   Wed Dec 3 11:13:52 2025 +0100

    [LowerMemIntrinsics] Factor control flow generation out of the memcpy lowering (#169039)

    So far, memcpy with known size, memcpy with unknown size, memmove with known
    size, and memmove with unknown size have individual optimized loop lowering
    implementations, while memset and memset.pattern use an unoptimized loop
    lowering. This patch extracts the parts of the memcpy lowerings (for known and
    unknown sizes) that generate the control flow for the loop expansion into an
    `insertLoopExpansion` function. The `createMemCpyLoop(Unk|K)nownSize` functions
    then only collect the necessary arguments for `insertLoopExpansion`, call it,
    and fill the generated loop basic blocks.

    The immediate benefit of this is that logic from the two memcpy lowerings is
    deduplicated. Moreover, it enables follow-up patches that will use
    `insertLoopExpansion` to optimize the memset and memset.pattern implementations
    similarly to memcpy, since they can use the exact same control flow patterns.
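
    In C++ terms, the control flow that the loop expansion produces looks
    roughly like the following (an analogy only, assuming an 8-byte loop
    access size; `copyExpanded` is a hypothetical name and the pass itself
    emits LLVM IR basic blocks, not source code):

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstring>

    // Copy `Len` bytes using a main loop over wide chunks followed by a
    // residual loop over the remaining bytes.
    void copyExpanded(unsigned char *Dst, const unsigned char *Src,
                      std::size_t Len) {
      constexpr std::size_t LoopOpSize = 8; // assumed access size
      const std::size_t MainBytes = Len - Len % LoopOpSize;
      std::size_t I = 0;
      // Main loop: one LoopOpSize-wide access per iteration.
      for (; I < MainBytes; I += LoopOpSize)
        std::memcpy(Dst + I, Src + I, LoopOpSize);
      // Residual loop: skipped entirely when Len is a multiple of
      // LoopOpSize.
      for (; I < Len; ++I)
        Dst[I] = Src[I];
    }
    ```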

    The test changes are due to more consistent and useful basic block names in the
    loop expansion and an improvement in basic block ordering: previously, the
    basic block that determines if the residual loop is executed would be put at
    the end of the function, now it is put before the residual loop body.
    Otherwise, the generated code should be equivalent.

    This patch doesn't affect memmove; deduplicating its logic would also be nice,
    but to extract all CF generation from the memmove lowering,
    `insertLoopExpansion` would need to be able to also create code that iterates
    backwards over the argument buffers. That would make `insertLoopExpansion` a
    lot more complex for a code path that's only used for memmove, so it's probably
    not worth refactoring.

    For SWDEV-543208.

commit 8feb6762ba9fb83f8e13ef9486c3b743e1b5cfa7
Author: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
Date:   Wed Dec 3 10:37:58 2025 +0100

    [AMDGPU] Take BUF instructions into account in mayAccessScratchThroughFlat (#170274)

    BUF instructions can access the scratch address space, so
    SIInsertWaitCnt needs to be able to track the SCRATCH_WRITE_ACCESS
    event for such BUF instructions.

    The release-vgprs.mir test had to be updated because BUF instructions
    without an MMO are now tracked as a SCRATCH_WRITE_ACCESS. I added an MMO
    that touches global memory to keep the test result unchanged. I also
    added a couple of test cases with no MMO to test the corrected behavior.

commit 5ccf8c90d1e4020d5f9bc255fe521aa0763f2b2b
Author: Tom Eccles <tom.eccles@arm.com>
Date:   Wed Dec 3 09:36:22 2025 +0000

    [flang] implement VECTOR VECTORLENGTH directive (#170114)

    This should match exactly the llvm attributes generated by classic
    flang.

commit 114ca6522e4ea425115adb778c39fd89745a6853
Author: Shih-Po Hung <shihpo.hung@sifive.com>
Date:   Wed Dec 3 17:24:40 2025 +0800

    [TTI] Use MemIntrinsicCostAttributes for getStridedOpCost (#170436)

    - Following #168029. This is a step toward a unified interface for
    masked/gather-scatter/strided/expand-compress cost modeling.
    - Replace the ad-hoc parameter list with a single attributes object.

    API change:
    ```
    - InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
                                                                           bool VariableMask, Align Alignment,
                                                                           TTI::TargetCostKind CostKind,
                                                                           const Instruction *I = nullptr);
    + InstructionCost getStridedMemoryOpCost(MemIntrinsicCostAttributes,
    +                                                                      CostKind);
    ```

    Notes:
    - NFCI intended: callers populate MemIntrinsicCostAttributes with same
    information as before.
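
The move from a long parameter list to a single attributes object can be illustrated with a self-contained toy. Everything here is hypothetical: `StridedCostAttrs`, its fields, and the toy cost formula are illustrative assumptions and do not reproduce the real `MemIntrinsicCostAttributes` or any target's cost model.

```cpp
#include <cassert>

// Hypothetical stand-in for an attributes object; the field names are
// illustrative and do not reproduce the real MemIntrinsicCostAttributes.
struct StridedCostAttrs {
  unsigned Opcode = 0;
  unsigned NumElements = 0;
  bool VariableMask = false;
  unsigned AlignBytes = 1;
};

// Old style: every piece of information is a separate parameter.
inline unsigned stridedCostOld(unsigned Opcode, unsigned NumElements,
                               bool VariableMask, unsigned AlignBytes) {
  (void)Opcode;
  (void)AlignBytes;
  // Toy cost model: one unit per element, doubled for a variable mask.
  return NumElements * (VariableMask ? 2u : 1u);
}

// New style: one attributes object, so adding a field later does not change
// the function signature.
inline unsigned stridedCostNew(const StridedCostAttrs &Attrs) {
  return stridedCostOld(Attrs.Opcode, Attrs.NumElements, Attrs.VariableMask,
                        Attrs.AlignBytes);
}
```

Because the new entry point forwards the same information, both styles return the same cost in this toy, which is the sense in which such a change is intended to be NFC.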

commit 9296223b28029095c1e734ba9373b9bcfc853d7b
Author: mitchell <mitchell.xu2@gmail.com>
Date:   Wed Dec 3 17:22:34 2025 +0800

    [clang-tidy] Fix `cppcoreguidelines-pro-type-member-init` check (#169832)

    Closes [#169677](https://github.com/llvm/llvm-project/issues/169677)

    ---------

    Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>

commit 2b725ab8bf08b0bde29910ec4fa1c610eaaffa63
Author: Felipe de Azevedo Piovezan <fpiovezan@apple.com>
Date:   Wed Dec 3 09:08:05 2025 +0000

    [lldb] Add DisassemblerLLVMC::IsBarrier API (#169632)

    This will allow the instruction emulation unwinder to reason about
    instructions that prevent the subsequent instruction from executing.

    Part of a sequence of PRs:
    [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
    #169630
    [lldb][NFC] Rename forward_branch_offset to branch_offset in
    UnwindAssemblyInstEmulation #169631
    [lldb] Add DisassemblerLLVMC::IsBarrier API #169632
    [lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633

    commit-id:bb5df4aa

commit 7cdb27a4b3757879446596d6f042f87b5119c638
Author: Aaditya <115080342+easyonaadit@users.noreply.github.com>
Date:   Wed Dec 3 14:36:25 2025 +0530

    [NFC][AMDGPU] Refactor wave reduce test files (#170440)

    Separate out float wave-reduce intrinsic tests from the overloaded call.
    Moved float add/sub/min/max ops from:
    `llvm.amdgcn.reduce.add/sub/min/max` to
    `llvm.amdgcn.reduce.fadd/fsub/fmin/fmax`.

commit 8b7a07a5f7e7b2a96417665f807cbf79a3161a76
Author: Ebuka Ezike <yerimyah1@gmail.com>
Date:   Wed Dec 3 08:55:11 2025 +0000

    [lldb]  Fix abi_tag parsing for operator<< and operator-named tags (#170224)

    The parser now correctly handles:
    - abi_tags attached to operator<<: `operator<<[abi:SOMETAG]`
    - abi_tags with "operator" as the tag name: `func[abi:operator]`
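
As a rough illustration of the expected results for the two cases listed above, the sketch below splits a name of the form `name[abi:TAG]` into its base name and tag. `splitAbiTag` is a hypothetical helper written for this example; it is not the lldb name parser and handles only the first tag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: splits "name[abi:TAG]" into {name, TAG}.
// Returns an empty tag when no well-formed "[abi:...]" part is present.
inline std::pair<std::string, std::string> splitAbiTag(std::string_view Name) {
  constexpr std::string_view Marker = "[abi:";
  const std::size_t Open = Name.find(Marker);
  if (Open == std::string_view::npos)
    return {std::string(Name), std::string()};
  const std::size_t TagStart = Open + Marker.size();
  const std::size_t Close = Name.find(']', TagStart);
  if (Close == std::string_view::npos)
    return {std::string(Name), std::string()};
  return {std::string(Name.substr(0, Open)),
          std::string(Name.substr(TagStart, Close - TagStart))};
}
```

Under these assumptions, `operator<<[abi:SOMETAG]` yields the base name `operator<<` with tag `SOMETAG`, and `func[abi:operator]` yields `func` with tag `operator`.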

commit 4b0a9759395f3e9cbefa9c194ca331f4d88003bf
Author: Hongyu Chen <xxs_chy@outlook.com>
Date:   Wed Dec 3 16:53:25 2025 +0800

    [OpenCL][NVPTX] Don't set calling convention for OpenCL kernel (#170170)

    Fixes #154772
    We previously set `ptx_kernel` for all kernels. But it's incorrect to
    add `ptx_kernel` to the stub version of kernel introduced in #115821.
    This patch copies the workaround of AMDGPU.

commit 6638d59c972512d45da474c214abc67ec3cfe333
Author: Felipe de Azevedo Piovezan <fpiovezan@apple.com>
Date:   Wed Dec 3 08:31:34 2025 +0000

    [lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation (#169631)

    This will reduce the diff in subsequent patches

    Part of a sequence of PRs:
    [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
    #169630
    [lldb][NFC] Rename forward_branch_offset to branch_offset in
    UnwindAssemblyInstEmulation #169631
    [lldb] Add DisassemblerLLVMC::IsBarrier API #169632
    [lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633
    commit-id:5e758a22

commit c5ecdec9fb84e6865fe44f69e380afa1291c2adf
Author: Ebuka Ezike <yerimyah1@gmail.com>
Date:   Wed Dec 3 08:30:35 2025 +0000

    [lldb-dap] start all sent protocol message from number one. (#170378)

    This aligns with the DAP
    [specification](https://microsoft.github.io/debug-adapter-protocol//specification.html#Base_Protocol_ProtocolMessage)

    Force it to be an error in test cases.

commit cd86b2ab32bb2c444fb48e41a40f43c80a7eaeae
Author: Vikash Gupta <Vikash.Gupta@amd.com>
Date:   Wed Dec 3 13:56:15 2025 +0530

    [CodeGen] Add MO_LaneMask type and a new COPY_LANEMASK instruction (#151944)

    Introduce MO_LaneMask as a new machine operand type. This can be used to
    hold liveness information at sub-register granularity for register-type
    operands. We also introduce a new COPY_LANEMASK instruction that uses an
    MO_LaneMask operand to perform a partial copy from the source register
    operand.

    One such use case of MO_LaneMask can be seen in #151123, where it can be
    used to store live regUnits information corresponding to the source
    register of the COPY instructions, which can later be used during
    CopyPhysReg expansion.
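
The idea of a partial copy driven by a lane mask can be sketched with a toy register model. The names `ToyRegister` and `copyLaneMask`, the four-lane layout, and the bit-per-lane encoding are all assumptions made for illustration; real lane masks in LLVM are target-defined and this is not the COPY_LANEMASK implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a register is four 32-bit lanes, and bit I of the mask
// selects lane I. LLVM's real lane masks are target-defined; this is a toy.
using ToyRegister = std::array<std::uint32_t, 4>;

// Copies only the lanes of Src selected by LaneMask; other Dst lanes are kept.
inline ToyRegister copyLaneMask(ToyRegister Dst, const ToyRegister &Src,
                                std::uint8_t LaneMask) {
  for (std::size_t Lane = 0; Lane < Dst.size(); ++Lane)
    if (LaneMask & (1u << Lane))
      Dst[Lane] = Src[Lane];
  return Dst;
}
```

With a mask of `0b0101`, only lanes 0 and 2 are taken from the source in this model, while lanes 1 and 3 keep their previous contents.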

commit ae4289f0e6e1bf61f45f88870aec220c9164800b
Author: Shih-Po Hung <shihpo.hung@sifive.com>
Date:   Wed Dec 3 16:24:56 2025 +0800

    [Hexagon][NFC] Drop no-op getMaskedMemoryOpCost/getGatherScatterOpCost stubs (#170426)

    These stubs (from 4bdf1aa416b02) don’t actually override anything.
    Removing them eliminates the need for a local getMemIntrinsicCost()
    forwarder in #169885.

commit befa4e85e4fab6a109203903a2fbeb979164cd2e
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date:   Wed Dec 3 00:14:32 2025 -0800

    [AMDGPU] Avoid undefs in hazard-gfx1250-flat-scr-hi.mir. NFC (#170396)

commit 5ee6cff90ba5d8e08066eeeef0c27aa0b6f24d2c
Author: Jonas Hahnfeld <jonas.hahnfeld@cern.ch>
Date:   Wed Dec 3 09:10:56 2025 +0100

    [clang] Propagate definition data to all redecls (#170090)

    Fix the propagation added in commit 0d490ae55f to include all redecls,
    not only previous ones. This fixes another instance of the assertion
    "Cannot get layout of forward declarations" in getASTRecordLayout().

    Kudos to Alexander Kornienko for providing an initial version of the
    reproducer that I further simplified.

    Fixes #170084

commit 98182f4d209ded292cb6030f45bcae132096acae
Author: Sven van Haastregt <sven.vanhaastregt@arm.com>
Date:   Wed Dec 3 08:58:31 2025 +0100

    Move CodeGenFunction::EmitScalarOrConstFoldImmArg; NFC (#170286)

    This function is called from various .cpp files under `TargetBuiltins/`,
    and was moved unintentionally into `AMDGPU.cpp` in PR #132252. Move it
    to a common place.

commit e6110cb3395b22a941cba4726c9e36308e5b5613
Author: Matthias Springer <me@m-sp.org>
Date:   Wed Dec 3 08:35:05 2025 +0100

    [mlir][Transforms] Fix crash in `-remove-dead-values` on private functions (#169269)

    This commit fixes two crashes in the `-remove-dead-values` pass related
    to private functions.

    Private functions are considered entirely "dead" by the liveness
    analysis, which drives the `-remove-dead-values` pass.

    The `-remove-dead-values` pass removes dead block arguments from private
    functions. Private functions are entirely dead, so all of their block
    arguments are removed. However, the pass did not correctly update all
    users of these dropped block arguments.

    1. A side-effecting operation must be removed if one of its operands is
    dead. Otherwise, the operation would end up with a NULL operand. Note:
    The liveness analysis would not have marked an SSA value as "dead" if it
    had reachable side-effecting users. (Therefore, it is safe to erase
    such side-effecting operations.)
    2. A branch operation must be removed if one of its non-forwarded
    operands is dead. (E.g., the condition value of a `cf.cond_br`.)
    Whenever a terminator is removed, a `ub.unreachable` operation is
    inserted. This fixes #158760.

commit 30f479fa2b08d6e480939a57384996f7a276eb91
Author: Henrich Lauko <xlauko@mail.muni.cz>
Date:   Wed Dec 3 08:20:50 2025 +0100

    [CIR] Use default attribute printer/parser (NFC) (#170366)

commit 042a38f0bfe5c9f49df5d4cb5e23092e512c9fbe
Author: Nikita Popov <npopov@redhat.com>
Date:   Wed Dec 3 07:55:06 2025 +0100

    [Support] Optimize DebugCounter (#170305)

    Currently, DebugCounters work by creating a unique counter ID during
    registration, and then using that ID to look up the counter information
    in the global registry.

    However, this means that anything working with counters has to always go
    through the global instance. This includes the fast path that checks
    whether any counters are enabled.

    Instead, we can drop the counter IDs, and make the counter variables use
    CounterInfo themselves. We can then directly check whether the specific
    counter is active without going through the global registry. This is
    both faster for the fast-path where all counters are disabled, and also
    faster for the case where only one counter is active (as the fast-path
    can now still be used for all the disabled counters).

    After this change, disabled counters become essentially free at runtime,
    and we should be able to enable them in non-assert builds as well.
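
The fast path described above can be sketched as follows. This is a simplified, hypothetical model: `ToyCounterInfo`, its skip/limit fields, and `shouldExecute` are illustrative assumptions and do not reproduce the actual `DebugCounter` data structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-counter state, held directly by the counter variable
// instead of being looked up by ID in a global registry.
struct ToyCounterInfo {
  bool Active = false;     // Set when the user configured this counter.
  std::int64_t Count = 0;  // Number of times an active counter was queried.
  std::int64_t Skip = 0;   // Queries to reject before executing.
  std::int64_t Limit = 0;  // Queries to accept after the skipped ones.
};

// Fast path: an inactive counter costs one load and one branch, with no
// access to shared global state.
inline bool shouldExecute(ToyCounterInfo &Info) {
  if (!Info.Active)
    return true;
  const std::int64_t Current = Info.Count++;
  return Current >= Info.Skip && Current < Info.Skip + Info.Limit;
}
```

In this model, enabling one counter leaves every other counter on the early-return path, which matches the message's point about the single-active-counter case.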

commit 9f634c6777701794a6ed5577857ffb8f202513b8
Author: Jianjian Guan <jacquesguan@me.com>
Date:   Wed Dec 3 14:19:20 2025 +0800

    [RISCV][GISel] Fix legalize G_EXTRACT_SUBVECTOR (#169877)

    Fix the wrong mask type used by G_VSLIDEDOWN_VL.

commit 689b3cc7c700b1687cf4aaaf4ef2c81a4e988917
Author: Jinjie Huang <huangjinjie@bytedance.com>
Date:   Wed Dec 3 14:08:20 2025 +0800

    [clang] Support header shadowing diagnostics in Clang header search (#162491)

    When including a header file, multiple files with the same name may
    exist across different search paths, like:
      |-- main.cpp
      |-- **header.h**
      |-- include
      |   └── **header.h**
    The compiler usually picks the first match it finds (typically following
    MSVC rules for current/include-chain paths first, then regular -I
    paths), which may not be the user’s intended header.
    This silent behavior can lead to subtle runtime API mismatches or
    increase the cost of resolving errors such as “error: use of undeclared
    identifier”, especially in large projects.

    Therefore, this patch tries to provide a diagnostic message without
    changing the current header selection. It does this by performing an
    additional search for duplicate filenames across all search paths (both
    MSVC rules and standard paths). This informs the user about a potential
    "header shadowing" issue and clarifies which header path was actually
    used.

    Since header searching is much cheaper than file loading, the added
    overhead should be within an acceptable range -- assuming the diagnostic
    message is valuable.
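
The additional search can be pictured as collecting every search directory that contains the requested file name, rather than stopping at the first hit. The sketch below is a hypothetical illustration: `findHeaderCandidates` and the injected `FileExists` callback are assumptions for this example and are not part of Clang's header search API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: returns every search directory that contains FileName,
// in search order. The first entry is the header that would be picked; any
// further entries are candidates for a "shadowed header" diagnostic.
// FileExists is injected so the sketch does not depend on a real file system.
inline std::vector<std::string>
findHeaderCandidates(const std::vector<std::string> &SearchDirs,
                     const std::string &FileName,
                     const std::function<bool(const std::string &)> &FileExists) {
  std::vector<std::string> Found;
  for (const std::string &Dir : SearchDirs) {
    const std::string Path = Dir + "/" + FileName;
    if (FileExists(Path))
      Found.push_back(Path);
  }
  return Found;
}
```

If the result holds more than one path, a tool built on this sketch could report the first as the header in use and the rest as shadowed, without changing which header is selected.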

commit 73036cf9113b4748d4fbb28037e8714ff2486238
Author: Baranov Victor <bar.victor.2002@gmail.com>
Date:   Wed Dec 3 09:02:50 2025 +0300

    [clang-tidy][NFC] Fix miscellaneous clang-tidy warnings (#170424)

commit d05370e6863e28fcf988b8491dc583fcf5e4e1be
Author: Baranov Victor <bar.victor.2002@gmail.com>
Date:   Wed Dec 3 08:56:24 2025 +0300

    [clang-tidy][NFC] Enable readability-any-all-of check (#167134)

    Closes https://github.com/llvm/llvm-project/issues/156161.
    Assisted-by: Claude Sonnet 4.5 via Claude Code

commit 822fc449985553c609e44915374f935672c0db50
Author: Rajat Bajpai <rbajpai@nvidia.com>
Date:   Wed Dec 3 10:49:17 2025 +0530

    [LLVM][Intrinsics] Adds an API to automatically resolve overload types (#169007)

    Currently, the getOrInsertDeclaration API requires callers to explicitly
    provide overload types for overloaded intrinsics, placing a significant
    burden on callers who must determine whether overload types are needed.
    This typically results in conditional logic at each call site to check
    if the intrinsic is overloaded and manually match the intrinsic
    signature.

    This patch introduces a new getOrInsertDeclaration overload that
    automatically deduces overload types from the provided return type and
    argument types, then uses this API to simplify
    IRBuilder::CreateIntrinsic. The new API uses
    Intrinsic::matchIntrinsicSignature internally to resolve overloaded
    t…
Labels

backend:X86 clang:bytecode Issues for the clang bytecode constexpr interpreter clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Development

Successfully merging this pull request may close these issues.

[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - add AVX512 kshift intrinsics to be used in constexp

3 participants