Skip to content

Conversation

@adams381
Copy link
Contributor

This PR migrates support for wide string literals from the incubator to upstream.

Changes

  • Implement wide string literal support in getConstantArrayFromStringLiteral
  • Handle wchar_t, char16_t, and char32_t string literals
  • Collect code units and create constant arrays with IntAttr elements
  • Use ZeroAttr for null-filled strings

Testing

  • Copied wide-string.cpp test file from incubator
  • Expanded test to include wchar_t test cases (incubator only had char16_t and char32_t)
  • All tests pass

Implement support for wide string literals (wchar_t, char16_t, char32_t)
in getConstantArrayFromStringLiteral. This migrates the feature from
the incubator to upstream.

The implementation handles wide character string literals by:
- Collecting code units using getCodeUnit()
- Creating constant arrays with IntAttr elements
- Using ZeroAttr for null-filled strings

Add test file wide-string.cpp copied from incubator, expanded with
wchar_t test cases.
@llvmbot llvmbot added clang Clang issues not falling into any other category ClangIR Anything related to the ClangIR project labels Dec 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Dec 10, 2025

@llvm/pr-subscribers-clangir

Author: None (adams381)

Changes

This PR migrates support for wide string literals from the incubator to upstream.

Changes

  • Implement wide string literal support in getConstantArrayFromStringLiteral
  • Handle wchar_t, char16_t, and char32_t string literals
  • Collect code units and create constant arrays with IntAttr elements
  • Use ZeroAttr for null-filled strings

Testing

  • Copied wide-string.cpp test file from incubator
  • Expanded test to include wchar_t test cases (incubator only had char16_t and char32_t)
  • All tests pass

Full diff: https://github.com/llvm/llvm-project/pull/171541.diff

2 Files Affected:

  • (modified) clang/lib/CIR/CodeGen/CIRGenModule.cpp (+34-3)
  • (added) clang/test/CIR/CodeGen/wide-string.cpp (+40)
diff --git a/clang/lib/CIR/CodeGen/CIRGenModule.cpp b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
index 41a5d9db83e2b..c0dcd3f55f328 100644
--- a/clang/lib/CIR/CodeGen/CIRGenModule.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
@@ -21,7 +21,9 @@
 #include "clang/AST/GlobalDecl.h"
 #include "clang/AST/RecordLayout.h"
 #include "clang/Basic/SourceManager.h"
+#include "clang/CIR/Dialect/IR/CIRAttrs.h"
 #include "clang/CIR/Dialect/IR/CIRDialect.h"
+#include "clang/CIR/Dialect/IR/CIRTypes.h"
 #include "clang/CIR/Interfaces/CIROpInterfaces.h"
 #include "clang/CIR/MissingFeatures.h"
 
@@ -31,6 +33,8 @@
 #include "mlir/IR/MLIRContext.h"
 #include "mlir/IR/Verifier.h"
 
+#include <algorithm>
+
 using namespace clang;
 using namespace clang::CIRGen;
 
@@ -960,9 +964,36 @@ CIRGenModule::getConstantArrayFromStringLiteral(const StringLiteral *e) {
     return builder.getString(str, eltTy, finalSize);
   }
 
-  errorNYI(e->getSourceRange(),
-           "getConstantArrayFromStringLiteral: wide characters");
-  return mlir::Attribute();
+  auto arrayTy = mlir::dyn_cast<cir::ArrayType>(convertType(e->getType()));
+  assert(arrayTy && "string literals must be emitted as an array type");
+
+  auto arrayEltTy = mlir::dyn_cast<cir::IntType>(arrayTy.getElementType());
+  assert(arrayEltTy &&
+         "string literal elements must be emitted as integral type");
+
+  auto arraySize = arrayTy.getSize();
+  auto literalSize = e->getLength();
+
+  // Collect the code units.
+  SmallVector<uint32_t, 32> elementValues;
+  elementValues.reserve(arraySize);
+  for (unsigned i = 0; i < literalSize; ++i)
+    elementValues.push_back(e->getCodeUnit(i));
+  elementValues.resize(arraySize);
+
+  // If the string is full of null bytes, emit a #cir.zero instead.
+  if (std::all_of(elementValues.begin(), elementValues.end(),
+                  [](uint32_t x) { return x == 0; }))
+    return cir::ZeroAttr::get(arrayTy);
+
+  // Otherwise emit a constant array holding the characters.
+  SmallVector<mlir::Attribute, 32> elements;
+  elements.reserve(arraySize);
+  for (uint64_t i = 0; i < arraySize; ++i)
+    elements.push_back(cir::IntAttr::get(arrayEltTy, elementValues[i]));
+
+  auto elementsAttr = mlir::ArrayAttr::get(&getMLIRContext(), elements);
+  return builder.getConstArray(elementsAttr, arrayTy);
 }
 
 bool CIRGenModule::supportsCOMDAT() const {
diff --git a/clang/test/CIR/CodeGen/wide-string.cpp b/clang/test/CIR/CodeGen/wide-string.cpp
new file mode 100644
index 0000000000000..9f145d022a943
--- /dev/null
+++ b/clang/test/CIR/CodeGen/wide-string.cpp
@@ -0,0 +1,40 @@
+// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-linux-gnu -fclangir -emit-cir %s -o %t.cir
+// RUN: FileCheck --input-file=%t.cir %s
+
+const char16_t *test_utf16() {
+  return u"你好世界";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.const_array<[#cir.int<20320> : !u16i, #cir.int<22909> : !u16i, #cir.int<19990> : !u16i, #cir.int<30028> : !u16i, #cir.int<0> : !u16i]> : !cir.array<!u16i x 5>
+
+const char32_t *test_utf32() {
+  return U"你好世界";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.const_array<[#cir.int<20320> : !u32i, #cir.int<22909> : !u32i, #cir.int<19990> : !u32i, #cir.int<30028> : !u32i, #cir.int<0> : !u32i]> : !cir.array<!u32i x 5>
+
+const char16_t *test_zero16() {
+  return u"\0\0\0\0";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.zero : !cir.array<!u16i x 5>
+
+const char32_t *test_zero32() {
+  return U"\0\0\0\0";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.zero : !cir.array<!u32i x 5>
+
+#include <stddef.h>
+
+const wchar_t *test_wchar() {
+  return L"1234";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.const_array<[#cir.int<49> : !s32i, #cir.int<50> : !s32i, #cir.int<51> : !s32i, #cir.int<52> : !s32i, #cir.int<0> : !s32i]> : !cir.array<!s32i x 5>
+
+const wchar_t *test_wchar_zero() {
+  return L"";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.zero : !cir.array<!s32i x 1>

@llvmbot
Copy link
Member

llvmbot commented Dec 10, 2025

@llvm/pr-subscribers-clang

Author: None (adams381)

Changes

This PR migrates support for wide string literals from the incubator to upstream.

Changes

  • Implement wide string literal support in getConstantArrayFromStringLiteral
  • Handle wchar_t, char16_t, and char32_t string literals
  • Collect code units and create constant arrays with IntAttr elements
  • Use ZeroAttr for null-filled strings

Testing

  • Copied wide-string.cpp test file from incubator
  • Expanded test to include wchar_t test cases (incubator only had char16_t and char32_t)
  • All tests pass

Full diff: https://github.com/llvm/llvm-project/pull/171541.diff

2 Files Affected:

  • (modified) clang/lib/CIR/CodeGen/CIRGenModule.cpp (+34-3)
  • (added) clang/test/CIR/CodeGen/wide-string.cpp (+40)
diff --git a/clang/lib/CIR/CodeGen/CIRGenModule.cpp b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
index 41a5d9db83e2b..c0dcd3f55f328 100644
--- a/clang/lib/CIR/CodeGen/CIRGenModule.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenModule.cpp
@@ -21,7 +21,9 @@
 #include "clang/AST/GlobalDecl.h"
 #include "clang/AST/RecordLayout.h"
 #include "clang/Basic/SourceManager.h"
+#include "clang/CIR/Dialect/IR/CIRAttrs.h"
 #include "clang/CIR/Dialect/IR/CIRDialect.h"
+#include "clang/CIR/Dialect/IR/CIRTypes.h"
 #include "clang/CIR/Interfaces/CIROpInterfaces.h"
 #include "clang/CIR/MissingFeatures.h"
 
@@ -31,6 +33,8 @@
 #include "mlir/IR/MLIRContext.h"
 #include "mlir/IR/Verifier.h"
 
+#include <algorithm>
+
 using namespace clang;
 using namespace clang::CIRGen;
 
@@ -960,9 +964,36 @@ CIRGenModule::getConstantArrayFromStringLiteral(const StringLiteral *e) {
     return builder.getString(str, eltTy, finalSize);
   }
 
-  errorNYI(e->getSourceRange(),
-           "getConstantArrayFromStringLiteral: wide characters");
-  return mlir::Attribute();
+  auto arrayTy = mlir::dyn_cast<cir::ArrayType>(convertType(e->getType()));
+  assert(arrayTy && "string literals must be emitted as an array type");
+
+  auto arrayEltTy = mlir::dyn_cast<cir::IntType>(arrayTy.getElementType());
+  assert(arrayEltTy &&
+         "string literal elements must be emitted as integral type");
+
+  auto arraySize = arrayTy.getSize();
+  auto literalSize = e->getLength();
+
+  // Collect the code units.
+  SmallVector<uint32_t, 32> elementValues;
+  elementValues.reserve(arraySize);
+  for (unsigned i = 0; i < literalSize; ++i)
+    elementValues.push_back(e->getCodeUnit(i));
+  elementValues.resize(arraySize);
+
+  // If the string is full of null bytes, emit a #cir.zero instead.
+  if (std::all_of(elementValues.begin(), elementValues.end(),
+                  [](uint32_t x) { return x == 0; }))
+    return cir::ZeroAttr::get(arrayTy);
+
+  // Otherwise emit a constant array holding the characters.
+  SmallVector<mlir::Attribute, 32> elements;
+  elements.reserve(arraySize);
+  for (uint64_t i = 0; i < arraySize; ++i)
+    elements.push_back(cir::IntAttr::get(arrayEltTy, elementValues[i]));
+
+  auto elementsAttr = mlir::ArrayAttr::get(&getMLIRContext(), elements);
+  return builder.getConstArray(elementsAttr, arrayTy);
 }
 
 bool CIRGenModule::supportsCOMDAT() const {
diff --git a/clang/test/CIR/CodeGen/wide-string.cpp b/clang/test/CIR/CodeGen/wide-string.cpp
new file mode 100644
index 0000000000000..9f145d022a943
--- /dev/null
+++ b/clang/test/CIR/CodeGen/wide-string.cpp
@@ -0,0 +1,40 @@
+// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-linux-gnu -fclangir -emit-cir %s -o %t.cir
+// RUN: FileCheck --input-file=%t.cir %s
+
+const char16_t *test_utf16() {
+  return u"你好世界";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.const_array<[#cir.int<20320> : !u16i, #cir.int<22909> : !u16i, #cir.int<19990> : !u16i, #cir.int<30028> : !u16i, #cir.int<0> : !u16i]> : !cir.array<!u16i x 5>
+
+const char32_t *test_utf32() {
+  return U"你好世界";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.const_array<[#cir.int<20320> : !u32i, #cir.int<22909> : !u32i, #cir.int<19990> : !u32i, #cir.int<30028> : !u32i, #cir.int<0> : !u32i]> : !cir.array<!u32i x 5>
+
+const char16_t *test_zero16() {
+  return u"\0\0\0\0";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.zero : !cir.array<!u16i x 5>
+
+const char32_t *test_zero32() {
+  return U"\0\0\0\0";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.zero : !cir.array<!u32i x 5>
+
+#include <stddef.h>
+
+const wchar_t *test_wchar() {
+  return L"1234";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.const_array<[#cir.int<49> : !s32i, #cir.int<50> : !s32i, #cir.int<51> : !s32i, #cir.int<52> : !s32i, #cir.int<0> : !s32i]> : !cir.array<!s32i x 5>
+
+const wchar_t *test_wchar_zero() {
+  return L"";
+}
+
+// CHECK: cir.global "private" constant cir_private dso_local @{{.+}} = #cir.zero : !cir.array<!s32i x 1>

Comment on lines 967 to 968
auto arrayTy = mlir::dyn_cast<cir::ArrayType>(convertType(e->getType()));
assert(arrayTy && "string literals must be emitted as an array type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto arrayTy = mlir::dyn_cast<cir::ArrayType>(convertType(e->getType()));
assert(arrayTy && "string literals must be emitted as an array type");
auto arrayTy = mlir::cast<cir::ArrayType>(convertType(e->getType()));

Comment on lines 970 to 972
auto arrayEltTy = mlir::dyn_cast<cir::IntType>(arrayTy.getElementType());
assert(arrayEltTy &&
"string literal elements must be emitted as integral type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto arrayEltTy = mlir::dyn_cast<cir::IntType>(arrayTy.getElementType());
assert(arrayEltTy &&
"string literal elements must be emitted as integral type");
auto arrayEltTy = mlir::cast<cir::IntType>(arrayTy.getElementType());

Comment on lines 974 to 975
auto arraySize = arrayTy.getSize();
auto literalSize = e->getLength();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use auto here.

auto literalSize = e->getLength();

// Collect the code units.
SmallVector<uint32_t, 32> elementValues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallVector<uint32_t, 32> elementValues;
SmallVector<uint32_t> elementValues;

This is an artifact from the classic codegen implementation. The size used to be required, but it's not anymore and unless there is a known reason to reserve a specific size, it's usually better to omit it. In this case, since you're calling reserve on the next line the explicit size is entirely redundant.

elementValues.resize(arraySize);

// If the string is full of null bytes, emit a #cir.zero instead.
if (std::all_of(elementValues.begin(), elementValues.end(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this before we build the element value vector? Maybe something like this?

bool isAllZero = false;
for (unsigned i = 0; i < literalSize; ++i) {
  if (!e->getCodeUnit(i) == 0) {
    isAllZero = true;
    break;
  }
}
if (isAllZero)
  return cir::ZeroAttr::get(arrayTy);

In most non-zero cases, this will break out of the loop on the first element.

SmallVector<mlir::Attribute, 32> elements;
elements.reserve(arraySize);
for (uint64_t i = 0; i < arraySize; ++i)
elements.push_back(cir::IntAttr::get(arrayEltTy, elementValues[i]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elements.push_back(cir::IntAttr::get(arrayEltTy, elementValues[i]));
elements.push_back(cir::IntAttr::get(arrayEltTy, e->getCodeUnit(i)));

If we do the zero check as I suggested, we can avoid looping twice to build our vector of attributes.

return cir::ZeroAttr::get(arrayTy);

// Otherwise emit a constant array holding the characters.
SmallVector<mlir::Attribute, 32> elements;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallVector<mlir::Attribute, 32> elements;
SmallVector<mlir::Attribute> elements;

@@ -0,0 +1,40 @@
// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-linux-gnu -fclangir -emit-cir %s -o %t.cir
// RUN: FileCheck --input-file=%t.cir %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add LLVM and OGCG checks here.

Comment on lines 992 to 993
for (uint64_t i = 0; i < arraySize; ++i)
elements.push_back(cir::IntAttr::get(arrayEltTy, elementValues[i]));
Copy link
Contributor

@xlauko xlauko Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (uint64_t i = 0; i < arraySize; ++i)
elements.push_back(cir::IntAttr::get(arrayEltTy, elementValues[i]));
for (uint32_t elementValue : elementValues)
elements.push_back(cir::IntAttr::get(arrayEltTy, elementValue));

- Use mlir::cast<> instead of dyn_cast<> + assert
- Use explicit types (uint64_t, unsigned) instead of auto for method returns
- Remove unnecessary SmallVector size parameters
- Optimize zero-check to happen before building vector (early exit)
- Avoid double-looping by building elements directly
- Add LLVM and OGCG checks to test file
@adams381
Copy link
Contributor Author

Addressed all reviewer feedback:

Code Style Improvements:

  • Changed dyn_cast<> + assert to cast<> (lines 967, 969)
  • Replaced auto with explicit types (uint64_t, unsigned) for method returns (lines 971, 972)
  • Removed unnecessary SmallVector size parameters (line 989)

Performance Optimizations:

  • Moved zero-check before building the vector to enable early exit (lines 977-983)
  • Eliminated double-looping by building elements directly from string literal instead of intermediate vector
  • Removed intermediate elementValues vector entirely

Test Improvements:

  • Added LLVM and OGCG checks to test file (as requested)

All reviewer suggestions have been implemented. The code now follows LLVM coding standards and includes performance optimizations.

Copy link
Contributor

@andykaylor andykaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I just have a few more suggestions.

for (unsigned i = 0; i < literalSize; ++i)
elements.push_back(cir::IntAttr::get(arrayEltTy, e->getCodeUnit(i)));
// Pad with zeros if needed.
for (uint64_t i = literalSize; i < arraySize; ++i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why you'd be inclined to do this, but classic codegen doesn't. I wonder if it's necessary. Have you seen a case where arraySize is greater than literalSize? Maybe you could just assert their equivalence.

// LLVM: @{{.+}} = private constant [5 x i32] zeroinitializer
// OGCG: @{{.+}} = private unnamed_addr constant [5 x i32] zeroinitializer

#include <stddef.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <stddef.h>
typedef __WCHAR_TYPE__ wchar_t;

const wchar_t *test_wchar_zero() {
return L"";
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add test cases using typedef __CHAR16_TYPE__ char16_t;?

- Replace padding logic with assert that arraySize == literalSize
- Improve assert message clarity
- Replace #include <stddef.h> with typedef __WCHAR_TYPE__ wchar_t
- Add test case using typedef __CHAR16_TYPE__ char16_t
- Add explanatory comments for test organization

Addresses reviewer comments on PR llvm#171541.
@adams381
Copy link
Contributor Author

Addressed remaining reviewer feedback:

Code Changes:

  • Replaced padding logic with assert(arraySize == literalSize) as suggested (asserts equivalence instead of handling uncertain edge case)
  • Improved assert message: "wide string literal length must match array type size"

Test Improvements:

  • Replaced #include <stddef.h> with typedef __WCHAR_TYPE__ wchar_t; as suggested
  • Added test case test_char16_typedef() using typedef __CHAR16_TYPE__ char16_t; as requested
  • Added explanatory comments to clarify test organization (built-in vs typedef types)

All changes comply with LLVM coding standards and .cursorrules requirements.

@github-actions
Copy link

github-actions bot commented Dec 11, 2025

🐧 Linux x64 Test Results

  • 85931 tests passed
  • 758 tests skipped
  • 2 tests failed

Failed Tests

(click on a test name to see its output)

Clang

Clang.CIR/CodeGen/mms-bitfields.c
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang -cc1 -internal-isystem /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/22/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -mms-bitfields -fclangir -emit-cir /home/gha/actions-runner/_work/llvm-project/llvm-project/clang/test/CIR/CodeGen/mms-bitfields.c -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/clang/test/CIR/CodeGen/Output/mms-bitfields.c.tmp.cir
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang -cc1 -internal-isystem /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/22/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -mms-bitfields -fclangir -emit-cir /home/gha/actions-runner/_work/llvm-project/llvm-project/clang/test/CIR/CodeGen/mms-bitfields.c -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/clang/test/CIR/CodeGen/Output/mms-bitfields.c.tmp.cir
# .---command stderr------------
# | error: unknown argument: '-mms-bitfields'
# `-----------------------------
# error: command failed with exit status: 1

--

Clang.CIR/CodeGen/mms-bitfields.c
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang -cc1 -internal-isystem /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/22/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -mms-bitfields -fclangir -emit-cir /home/gha/actions-runner/_work/llvm-project/llvm-project/clang/test/CIR/CodeGen/mms-bitfields.c -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/clang/test/CIR/CodeGen/Output/mms-bitfields.c.tmp.cir
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang -cc1 -internal-isystem /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/22/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -mms-bitfields -fclangir -emit-cir /home/gha/actions-runner/_work/llvm-project/llvm-project/clang/test/CIR/CodeGen/mms-bitfields.c -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/clang/test/CIR/CodeGen/Output/mms-bitfields.c.tmp.cir
# .---command stderr------------
# | error: unknown argument: '-mms-bitfields'
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.


uint64_t arraySize = arrayTy.getSize();
unsigned literalSize = e->getLength();
assert(arraySize == literalSize &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this assert is firing in testing. I wasn't expecting that. Looking closer at the classic codegen implementation, it fills the elements array from zero to E->getLength() (i.e. literalSize here) and then resizes the array to NumElements. Given that classic codegen is using a vector of integers, maybe the resize handles the zero padding. I guess you need to bring back the zero-padding loop. Sorry about the bad suggestion.

- Change assert to arraySize == literalSize + 1 to account for null terminator
- Explicitly add null terminator to elements array
- Remove typedef statements that conflict with C++17 built-in types

Fixes CI assertion failure in PR llvm#171541.
@adams381
Copy link
Contributor Author

Fixed the CI assertion failure by accounting for the null terminator in the assertion (arraySize == literalSize + 1) and explicitly adding it to the elements array. Also removed typedef statements from the test file that conflict with C++17 built-in types.

All tests pass locally. Ready for review.

Copy link
Contributor

@andykaylor andykaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great.

@andykaylor andykaylor enabled auto-merge (squash) December 12, 2025 20:02
@andykaylor andykaylor merged commit f195d52 into llvm:main Dec 12, 2025
9 of 10 checks passed
anonymouspc pushed a commit to anonymouspc/llvm that referenced this pull request Dec 15, 2025
This PR migrates support for wide string literals from the incubator to
upstream.

## Changes

- Implement wide string literal support in
`getConstantArrayFromStringLiteral`
- Handle wchar_t, char16_t, and char32_t string literals
- Collect code units and create constant arrays with IntAttr elements
- Use ZeroAttr for null-filled strings

## Testing

- Copied `wide-string.cpp` test file from incubator
- Expanded test to include wchar_t test cases (incubator only had
char16_t and char32_t)
- All tests pass

---------

Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang Clang issues not falling into any other category ClangIR Anything related to the ClangIR project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants