[Clang] Do not warn on UTF-16 -> UTF-32 conversions. (#163927) #164654

cor3ntin · 2025-10-22T16:11:16Z

UTF-16 to UTF-32 conversions seem widespread,
and lone surrogates have a distinct representation in UTF-32.

Let's not warn on this case, to make the warning easier to adopt. This follows the SG-16 guideline:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1

Fixes #163719
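
For illustration, a minimal sketch of why widening is value-preserving while narrowing is not. The helpers `widen` and `narrow` are hypothetical names used only for this example; they are not part of Clang or of the patch.

```cpp
#include <cassert>

// Hypothetical helper: implicit char16_t -> char32_t conversion.
// The numeric value of the code unit is kept as-is.
constexpr char32_t widen(char16_t unit) { return unit; }

// Hypothetical helper: char32_t -> char16_t keeps only the low 16 bits.
constexpr char16_t narrow(char32_t cp) { return static_cast<char16_t>(cp); }

// A BMP character keeps its value when widened.
static_assert(widen(u'\u2615') == U'\u2615', "BMP code unit is preserved");

// A lone surrogate also keeps its value: 0xD800 is representable in char32_t.
static_assert(widen(char16_t(0xD800)) == char32_t(0xD800),
              "lone surrogate is preserved");

// Narrowing a code point outside the BMP drops the high bits.
static_assert(narrow(U'\U0001F409') == char16_t(0xF409),
              "supplementary code point is truncated");
```

The widening direction is the one this PR stops diagnosing; the narrowing direction still loses information for code points above 0xFFFF.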

llvmbot · 2025-10-22T16:12:04Z

@llvm/pr-subscribers-clang

Author: Corentin Jabot (cor3ntin)

Changes

UTF-16 to UTF-32 conversions seem widespread,
and lone surrogates have a distinct representation in UTF-32.

Let's not warn on this case, to make the warning easier to adopt. This follows the SG-16 guideline:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1

Fixes #163719

Full diff: https://github.com/llvm/llvm-project/pull/164654.diff

2 Files Affected:

(modified) clang/lib/Sema/SemaChecking.cpp (+8-1)
(modified) clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp (+4-4)

diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index dd5b710d7e1d4..41bcf8fd493fc 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -12014,13 +12014,20 @@ static void DiagnoseMixedUnicodeImplicitConversion(Sema &S, const Type *Source,
                                                    SourceLocation CC) {
   assert(Source->isUnicodeCharacterType() && Target->isUnicodeCharacterType() &&
          Source != Target);
+
+  // Lone surrogates have a distinct representation in UTF-32.
+  // Converting between UTF-16 and UTF-32 codepoints seems very widespread,
+  // so don't warn on such conversion.
+  if (Source->isChar16Type() && Target->isChar32Type())
+    return;
+
   Expr::EvalResult Result;
   if (E->EvaluateAsInt(Result, S.getASTContext(), Expr::SE_AllowSideEffects,
                        S.isConstantEvaluatedContext())) {
     llvm::APSInt Value(32);
     Value = Result.Val.getInt();
     bool IsASCII = Value <= 0x7F;
-    bool IsBMP = Value <= 0xD7FF || (Value >= 0xE000 && Value <= 0xFFFF);
+    bool IsBMP = Value <= 0xDFFF || (Value >= 0xE000 && Value <= 0xFFFF);
     bool ConversionPreservesSemantics =
         IsASCII || (!Source->isChar8Type() && !Target->isChar8Type() && IsBMP);
 
diff --git a/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp b/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
index fcff006d0e028..f17f20ca25295 100644
--- a/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
+++ b/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
@@ -14,7 +14,7 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c16(u32); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' may lose precision and change the meaning of the represented code unit}}
 
     c32(u8);  // expected-warning {{implicit conversion from 'char8_t' to 'char32_t' may change the meaning of the represented code unit}}
-    c32(u16); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' may change the meaning of the represented code unit}}
+    c32(u16);
     c32(u32);
 
 
@@ -30,7 +30,7 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c16(char32_t(0x7f));
     c16(char32_t(0x80));
     c16(char32_t(0xD7FF));
-    c16(char32_t(0xD800)); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' changes the meaning of the code unit '<0xD800>'}}
+    c16(char32_t(0xD800));
     c16(char32_t(0xE000));
     c16(char32_t(U'🐉')); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' changes the meaning of the code point '🐉'}}
 
@@ -44,8 +44,8 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c32(char16_t(0x80));
 
     c32(char16_t(0xD7FF));
-    c32(char16_t(0xD800)); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' changes the meaning of the code unit '<0xD800>'}}
-    c32(char16_t(0xDFFF)); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' changes the meaning of the code unit '<0xDFFF>'}}
+    c32(char16_t(0xD800));
+    c32(char16_t(0xDFFF));
     c32(char16_t(0xE000));
     c32(char16_t(u'☕'));
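
As a rough reading aid, the constant-operand logic in the patched hunk can be modelled in isolation as below. This is a simplified sketch, not the Clang implementation: `CharKind` and `constantConversionIsAccepted` are hypothetical names, and the assumption that a `false` result corresponds to a diagnostic is inferred from the updated tests rather than from code shown in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three Unicode character types; Clang itself
// queries Type::isChar8Type(), isChar16Type() and isChar32Type().
enum class CharKind { Char8, Char16, Char32 };

// Simplified model of the patched check for a constant operand.
// Returns true when the conversion is treated as preserving semantics.
constexpr bool constantConversionIsAccepted(CharKind source, CharKind target,
                                            std::uint32_t value) {
  // New early exit from the patch: char16_t -> char32_t is never diagnosed.
  if (source == CharKind::Char16 && target == CharKind::Char32)
    return true;

  const bool isASCII = value <= 0x7F;
  // Same expression as the patched IsBMP line in the diff.
  const bool isBMP = value <= 0xDFFF || (value >= 0xE000 && value <= 0xFFFF);
  return isASCII ||
         (source != CharKind::Char8 && target != CharKind::Char8 && isBMP);
}

// Mirrors the updated test expectations.
static_assert(constantConversionIsAccepted(CharKind::Char16, CharKind::Char32, 0xD800),
              "c32(char16_t(0xD800)) no longer warns");
static_assert(constantConversionIsAccepted(CharKind::Char32, CharKind::Char16, 0xD800),
              "c16(char32_t(0xD800)) no longer warns");
static_assert(!constantConversionIsAccepted(CharKind::Char32, CharKind::Char16, 0x1F409),
              "c16(char32_t(U+1F409)) still warns");
```

In this model the early exit covers every `char16_t` to `char32_t` conversion, and the relaxed `IsBMP` bound is what changes the result for `c16(char32_t(0xD800))`.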

cor3ntin · 2025-10-23T09:58:30Z

@AaronBallman

AaronBallman · 2025-10-23T12:11:29Z

> UTF-16 to UTF-16 conversions seems widespread,

UTF-16 to UTF-32, right?

AaronBallman

LGTM aside from the PR summary, should have a release note (perhaps on the release branch).

h-vetinari · 2025-10-26T11:51:52Z

> should have a release note (perhaps on the release branch).

sidenote: updated release notes on the maintenance branches almost never get published (though they should).

| Version | Last Patch Release | Release Notes (RN) exist | RN don't exist | RN for most recent patch release published? |
|---|---|---|---|---|
| v21 | 21.1.4 (currently) | ✅ 21.1.0 ✅ 21.1.2 | ❌ 21.1.1 ❌ 21.1.3 ❌ 21.1.4 | ❌ |
| v20 | 20.1.8 | ✅ 20.1.0 | ❌ 20.1.1 and onward | ❌ |
| v19 | 19.1.7 | ✅ 19.1.0 | ❌ 19.1.1 and onward | ❌ |
| v18 | 18.1.8 | ✅ 18.1.0 ✅ 18.1.1 ✅ 18.1.4 ✅ 18.1.6 ✅ 18.1.7 ✅ 18.1.8 | ❌ 18.1.2 ❌ 18.1.3 ❌ 18.1.5 | ✅ |
| v17 | 17.0.6 | ✅ 17.0.1 | ❌ 17.0.0 ❌ 17.0.2 and onward | ❌ |
| v16 | 16.0.6 | ✅ 16.0.0 | ❌ 16.0.1 and onward | ❌ |

It would help immensely if the release notes were built and published in an automated fashion per branch (some prior thoughts on this), rather than being dependent on the availability and goodwill of the release managers. Though I realize this would be a big initial lift.

c-rhodes · 2025-10-27T09:55:25Z

> sidenote: updated release notes on the maintenance branches almost never get published (though they should).

I'm new to release maintenance so this is new to me, thanks for mentioning. I'll raise it with the other release maintainers.

UTF-16 to UTF-32 conversions seem widespread, and lone surrogates have a distinct representation in UTF-32. Let's not warn on this case, to make the warning easier to adopt. This follows the SG-16 guideline: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1 Fixes llvm#163719

github-actions · 2025-10-27T10:20:31Z

@cor3ntin (or anyone else): if you would like to add a note about this fix to the release notes (completely optional), please reply to this comment with a one- or two-sentence description of the fix. When you are done, please add the release:note label to this PR.

cor3ntin requested a review from AaronBallman October 22, 2025 16:11

cor3ntin added the release:backport label Oct 22, 2025

llvmbot added the clang and clang:frontend labels Oct 22, 2025

efriedma-quic added this to the LLVM 21.x Release milestone Oct 22, 2025

github-project-automation bot added this to LLVM Release Status Oct 22, 2025

github-project-automation bot moved this to Needs Triage in LLVM Release Status Oct 22, 2025

c-rhodes moved this from Needs Triage to Needs Review in LLVM Release Status Oct 23, 2025

AaronBallman approved these changes Oct 23, 2025

c-rhodes moved this from Needs Review to Needs Merge in LLVM Release Status Oct 27, 2025

c-rhodes force-pushed the wconversion_backport branch from 9b3789a to 5c802f9 Compare October 27, 2025 10:19

c-rhodes merged commit 5c802f9 into llvm:release/21.x Oct 27, 2025
4 of 8 checks passed

github-project-automation bot moved this from Needs Merge to Done in LLVM Release Status Oct 27, 2025
