Skip to content

Conversation

a-tarasyuk
Copy link
Member

Fixes #153745


This PR addresses a limitation in -Wstring-concatenation, where only the first missing comma in an initializer list was diagnosed.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 17, 2025

@llvm/pr-subscribers-clang

Author: Oleksandr T. (a-tarasyuk)

Changes

Fixes #153745


This PR addresses a limitation in -Wstring-concatenation, where only the first missing comma in an initializer list was diagnosed.


Full diff: https://github.com/llvm/llvm-project/pull/154018.diff

3 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+2)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+26-19)
  • (modified) clang/test/Sema/string-concat.c (+14)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index b35f4ea42818a..48575b6e821f0 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -162,6 +162,8 @@ Improvements to Clang's diagnostics
   an override of a virtual method.
 - Fixed fix-it hint for fold expressions. Clang now correctly places the suggested right
   parenthesis when diagnosing malformed fold expressions. (#GH151787)
+- ``-Wstring-concatenation`` now diagnoses every missing comma in an initializer list,
+  rather than stopping after the first. (#GH153745)
 
 - Fixed an issue where emitted format-signedness diagnostics were not associated with an appropriate
   diagnostic id. Besides being incorrect from an API standpoint, this was user visible, e.g.:
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 8ddbaf34a7f47..b63c157d81dc8 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -14708,7 +14708,16 @@ void Sema::CheckCompleteVariableDeclaration(VarDecl *var) {
       isa<InitListExpr>(var->getInit())) {
     const auto *ILE = cast<InitListExpr>(var->getInit());
     unsigned NumInits = ILE->getNumInits();
-    if (NumInits > 2)
+    if (NumInits > 2) {
+      auto concatenatedPartsAt = [&](unsigned Index) -> unsigned {
+        const Expr *E = ILE->getInit(Index);
+        if (E) {
+          if (const auto *S = dyn_cast<StringLiteral>(E->IgnoreImpCasts()))
+            return S->getNumConcatenated();
+        }
+        return 0;
+      };
+
       for (unsigned I = 0; I < NumInits; ++I) {
         const auto *Init = ILE->getInit(I);
         if (!Init)
@@ -14721,24 +14730,23 @@ void Sema::CheckCompleteVariableDeclaration(VarDecl *var) {
         // Diagnose missing comma in string array initialization.
         // Do not warn when all the elements in the initializer are concatenated
         // together. Do not warn for macros too.
-        if (NumConcat == 2 && !SL->getBeginLoc().isMacroID()) {
-          bool OnlyOneMissingComma = true;
-          for (unsigned J = I + 1; J < NumInits; ++J) {
-            const auto *Init = ILE->getInit(J);
-            if (!Init)
-              break;
-            const auto *SLJ = dyn_cast<StringLiteral>(Init->IgnoreImpCasts());
-            if (!SLJ || SLJ->getNumConcatenated() > 1) {
-              OnlyOneMissingComma = false;
-              break;
-            }
-          }
+        if (NumConcat == 2) {
+          if (SL->getBeginLoc().isMacroID())
+            continue;
+
+          unsigned L = I > 0 ? concatenatedPartsAt(I - 1) : 0;
+          unsigned R = I + 1 < NumInits ? concatenatedPartsAt(I + 1) : 0;
+
+          // Skip neighbors with multi-part concatenations.
+          if (L > 1 || R > 1)
+            continue;
 
-          if (OnlyOneMissingComma) {
+          // Diagnose when at least one neighbor is a single literal.
+          if (L || R) {
             SmallVector<FixItHint, 1> Hints;
-            for (unsigned i = 0; i < NumConcat - 1; ++i)
-              Hints.push_back(FixItHint::CreateInsertion(
-                  PP.getLocForEndOfToken(SL->getStrTokenLoc(i)), ","));
+            // Insert a comma between the two tokens of this element.
+            Hints.push_back(FixItHint::CreateInsertion(
+                PP.getLocForEndOfToken(SL->getStrTokenLoc(0)), ", "));
 
             Diag(SL->getStrTokenLoc(1),
                  diag::warn_concatenated_literal_array_init)
@@ -14746,10 +14754,9 @@ void Sema::CheckCompleteVariableDeclaration(VarDecl *var) {
             Diag(SL->getBeginLoc(),
                  diag::note_concatenated_string_literal_silence);
           }
-          // In any case, stop now.
-          break;
         }
       }
+    }
   }
 
 
diff --git a/clang/test/Sema/string-concat.c b/clang/test/Sema/string-concat.c
index 63abf100c020f..ae6c0897878a7 100644
--- a/clang/test/Sema/string-concat.c
+++ b/clang/test/Sema/string-concat.c
@@ -168,3 +168,17 @@ const char *extra_parens_to_suppress_warning[] = {
     "promise"),
     "shared_future"
 };
+
+const char *multiple_missing_commas[] = {
+    "1",
+    "2"  // expected-note {{place parentheses around the string literal to silence warning}}
+    "3", // expected-warning {{suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma?}}
+    "4",
+    "5",
+    "6"  // expected-note {{place parentheses around the string literal to silence warning}}
+    "7", // expected-warning {{suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma?}}
+    "8",
+    "9",
+    "10",
+    "11",
+};

@a-tarasyuk a-tarasyuk requested a review from Fznamznon August 18, 2025 11:07
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@a-tarasyuk a-tarasyuk merged commit c2eb895 into llvm:main Aug 18, 2025
10 checks passed
@nathanchance
Copy link
Member

For what it's worth, this causes -Wstring-concatenation to pop up a few times in the Linux kernel. Two of the instances are simple enough to resolve under the kernel's "Do not split user visible strings regardless of length" rule.

drivers/scsi/lpfc/lpfc_attr.c:76:3: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
   75 |         "link negotiated speed does not match existing"
      |
      |                                                        ,
   76 |                 " trunk - link was \"high\" speed",
      |                 ^
drivers/scsi/lpfc/lpfc_attr.c:75:2: note: place parentheses around the string literal to silence warning
   75 |         "link negotiated speed does not match existing"
      |         ^
const char *const trunk_errmsg[] = {    /* map errcode */
    "", /* There is no such error code at index 0*/
    "link negotiated speed does not match existing"
        " trunk - link was \"low\" speed",
    "link negotiated speed does not match"
        " existing trunk - link was \"middle\" speed",
    "link negotiated speed does not match existing"
        " trunk - link was \"high\" speed",
    "Attached to non-trunking port - F_Port",
    "Attached to non-trunking port - N_Port",
    "FLOGI response timeout",
    "non-FLOGI frame received",
    "Invalid FLOGI response",
    "Trunking initialization protocol",
    "Trunk peer device mismatch",
};
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 33582d48ec09..433cddcae4b6 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -68,12 +68,9 @@

 const char *const trunk_errmsg[] = {   /* map errcode */
        "",     /* There is no such error code at index 0*/
-       "link negotiated speed does not match existing"
-               " trunk - link was \"low\" speed",
-       "link negotiated speed does not match"
-               " existing trunk - link was \"middle\" speed",
-       "link negotiated speed does not match existing"
-               " trunk - link was \"high\" speed",
+       "link negotiated speed does not match existing trunk - link was \"low\" speed",
+       "link negotiated speed does not match existing trunk - link was \"middle\" speed",
+       "link negotiated speed does not match existing trunk - link was \"high\" speed",
        "Attached to non-trunking port - F_Port",
        "Attached to non-trunking port - N_Port",
        "FLOGI response timeout",
drivers/edac/i7300_edac.c:197:9: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
  196 |         [0]  = "Memory Write error on non-redundant retry or "
      |
      |                                                               ,
  197 |                "FBD configuration Write error on retry",
      |                ^
drivers/edac/i7300_edac.c:196:9: note: place parentheses around the string literal to silence warning
  196 |         [0]  = "Memory Write error on non-redundant retry or "
      |                ^
static const char *ferr_fat_fbd_name[] = {
    [22] = "Non-Redundant Fast Reset Timeout",
    [2]  = ">Tmid Thermal event with intelligent throttling disabled",
    [1]  = "Memory or FBD configuration CRC read error",
    [0]  = "Memory Write error on non-redundant retry or "
           "FBD configuration Write error on retry",
};
diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 69068f8d0cad..939b855e8fbf 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -193,8 +193,7 @@ static const char *ferr_fat_fbd_name[] = {
        [22] = "Non-Redundant Fast Reset Timeout",
        [2]  = ">Tmid Thermal event with intelligent throttling disabled",
        [1]  = "Memory or FBD configuration CRC read error",
-       [0]  = "Memory Write error on non-redundant retry or "
-              "FBD configuration Write error on retry",
+       [0]  = "Memory Write error on non-redundant retry or FBD configuration Write error on retry",
 };
 #define GET_FBD_FAT_IDX(fbderr)        (((fbderr) >> 28) & 3)
 #define FERR_FAT_FBD_ERR_MASK ((1 << 0) | (1 << 1) | (1 << 2) | (1 << 22))

However, this third one is a little interesting because the warning happens on the last instance of a sequence of multi-line concatenations:

drivers/net/dsa/sja1105/sja1105_static_config.c:991:3: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
  990 |                 "vl-lookup-table present, but one of vl-policing-table, "
      |
      |                                                                          ,
  991 |                 "vl-forwarding-table or vl-forwarding-parameters-table is empty",
      |                 ^
drivers/net/dsa/sja1105/sja1105_static_config.c:990:3: note: place parentheses around the string literal to silence warning
  990 |                 "vl-lookup-table present, but one of vl-policing-table, "
      |                 ^
const char *sja1105_static_config_error_msg[] = {
    [SJA1105_CONFIG_OK] = "",
    [SJA1105_TTETHERNET_NOT_SUPPORTED] =
        "schedule-table present, but TTEthernet is "
        "only supported on T and Q/S",
    [SJA1105_INCORRECT_TTETHERNET_CONFIGURATION] =
        "schedule-table present, but one of "
        "schedule-entry-points-table, schedule-parameters-table or "
        "schedule-entry-points-parameters table is empty",
    [SJA1105_INCORRECT_VIRTUAL_LINK_CONFIGURATION] =
        "vl-lookup-table present, but one of vl-policing-table, "
        "vl-forwarding-table or vl-forwarding-parameters-table is empty",
    [SJA1105_MISSING_L2_POLICING_TABLE] =
        "l2-policing-table needs to have at least one entry",
    [SJA1105_MISSING_L2_FORWARDING_TABLE] =
        "l2-forwarding-table is either missing or incomplete",
    [SJA1105_MISSING_L2_FORWARDING_PARAMS_TABLE] =
        "l2-forwarding-parameters-table is missing",
    [SJA1105_MISSING_GENERAL_PARAMS_TABLE] =
        "general-parameters-table is missing",
    [SJA1105_MISSING_VLAN_TABLE] =
        "vlan-lookup-table needs to have at least the default untagged VLAN",
    [SJA1105_MISSING_XMII_TABLE] =
        "xmii-table is missing",
    [SJA1105_MISSING_MAC_TABLE] =
        "mac-configuration-table needs to contain an entry for each port",
    [SJA1105_OVERCOMMITTED_FRAME_MEMORY] =
        "Not allowed to overcommit frame memory. L2 memory partitions "
        "and VL memory partitions share the same space. The sum of all "
        "16 memory partitions is not allowed to be larger than 929 "
        "128-byte blocks (or 910 with retagging). Please adjust "
        "l2-forwarding-parameters-table.part_spc and/or "
        "vl-forwarding-parameters-table.partspc.",
};

I will probably suggest fixing it like so just to be consistent but I am sure someone will ask why it only flags the third instance and none of the others (I realize because of the heuristic but still).

diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.c b/drivers/net/dsa/sja1105/sja1105_static_config.c
index ffece8a400a6..45564b62ce41 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.c
@@ -980,15 +980,15 @@ static u64 blk_id_map[BLK_IDX_MAX] = {
 const char *sja1105_static_config_error_msg[] = {
        [SJA1105_CONFIG_OK] = "",
        [SJA1105_TTETHERNET_NOT_SUPPORTED] =
-               "schedule-table present, but TTEthernet is "
-               "only supported on T and Q/S",
+               ("schedule-table present, but TTEthernet is "
+               "only supported on T and Q/S"),
        [SJA1105_INCORRECT_TTETHERNET_CONFIGURATION] =
-               "schedule-table present, but one of "
+               ("schedule-table present, but one of "
                "schedule-entry-points-table, schedule-parameters-table or "
-               "schedule-entry-points-parameters table is empty",
+               "schedule-entry-points-parameters table is empty"),
        [SJA1105_INCORRECT_VIRTUAL_LINK_CONFIGURATION] =
-               "vl-lookup-table present, but one of vl-policing-table, "
-               "vl-forwarding-table or vl-forwarding-parameters-table is empty",
+               ("vl-lookup-table present, but one of vl-policing-table, "
+               "vl-forwarding-table or vl-forwarding-parameters-table is empty"),
        [SJA1105_MISSING_L2_POLICING_TABLE] =
                "l2-policing-table needs to have at least one entry",
        [SJA1105_MISSING_L2_FORWARDING_TABLE] =
@@ -1004,12 +1004,12 @@ const char *sja1105_static_config_error_msg[] = {
        [SJA1105_MISSING_MAC_TABLE] =
                "mac-configuration-table needs to contain an entry for each port",
        [SJA1105_OVERCOMMITTED_FRAME_MEMORY] =
-               "Not allowed to overcommit frame memory. L2 memory partitions "
+               ("Not allowed to overcommit frame memory. L2 memory partitions "
                "and VL memory partitions share the same space. The sum of all "
                "16 memory partitions is not allowed to be larger than 929 "
                "128-byte blocks (or 910 with retagging). Please adjust "
                "l2-forwarding-parameters-table.part_spc and/or "
-               "vl-forwarding-parameters-table.partspc.",
+               "vl-forwarding-parameters-table.partspc."),
 };

 static sja1105_config_valid_t

Maybe this warning could take into account using designators probably means the concatenation is intentional? Not sure if that is possible.

@AaronBallman
Copy link
Collaborator

For what it's worth, this causes -Wstring-concatenation to pop up a few times in the Linux kernel. Two of the instances are simple enough to resolve under the kernel's "Do not split user visible strings regardless of length" rule.

drivers/scsi/lpfc/lpfc_attr.c:76:3: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
   75 |         "link negotiated speed does not match existing"
      |
      |                                                        ,
   76 |                 " trunk - link was \"high\" speed",
      |                 ^
drivers/scsi/lpfc/lpfc_attr.c:75:2: note: place parentheses around the string literal to silence warning
   75 |         "link negotiated speed does not match existing"
      |         ^
const char *const trunk_errmsg[] = {    /* map errcode */
    "", /* There is no such error code at index 0*/
    "link negotiated speed does not match existing"
        " trunk - link was \"low\" speed",
    "link negotiated speed does not match"
        " existing trunk - link was \"middle\" speed",
    "link negotiated speed does not match existing"
        " trunk - link was \"high\" speed",
    "Attached to non-trunking port - F_Port",
    "Attached to non-trunking port - N_Port",
    "FLOGI response timeout",
    "non-FLOGI frame received",
    "Invalid FLOGI response",
    "Trunking initialization protocol",
    "Trunk peer device mismatch",
};

That looks like perfectly reasonable code we probably should not be warning on IMO. Requiring the user to parenthesize it or concatenate it is kind of annoying. I wonder if another suppression mechanism would be to silence in the presence of a \ line splice at the end of the line?

However, this third one is a little interesting because the warning happens on the last instance of a sequence of multi-line concatenations:

Maybe this warning could take into account using designators probably means the concatenation is intentional? Not sure if that is possible.

I'm not certain that a designator is enough of a signal to mean the concatenation is intentional. The tradeoff here is false positives vs false negatives, and I think this is starting to show that we're introducing more false positives. Another idea would be to revert these changes and instead make a clang-tidy check that's more aggressive.

@ilovepi
Copy link
Contributor

ilovepi commented Aug 19, 2025

FWIW, we're seeing this in Fuchsia too.

Bot: https://ci.chromium.org/ui/p/fuchsia/builders/global.ci/clang_toolchain.ci.core.x64-debug-tot-build_only/b8706156533673690033/overview

Logs: https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8706156533673690033/+/u/build/ninja/stdout

The logs show that this is mostly in third party code, but that also means we may not be able to easily get it changed, and will have to suppress these on a case-by-case basis.

In particular the tcpdump cases also seem benign to me:
https://cs.opensource.google/fuchsia/fuchsia/+/main:third_party/tcpdump/src/print-802_11.c;l=835

Another idea would be to revert these changes and instead make a clang-tidy check that's more aggressive.

I think that would be our preference.

@a-tarasyuk
Copy link
Member Author

@ilovepi @nathanchance @AaronBallman I’ve opened a PR to roll back the changes for now, to finalize the intended behavior of this flag.

a-tarasyuk added a commit that referenced this pull request Aug 19, 2025
…ng comma in initializer lists" (#154369)

Revert #154018 changes due to excessive _false
positives_. The warning caused multiple benign reports in large
codebases (e.g. _Linux kernel_, _Fuchsia_, _tcpdump_). Since many of
these concatenations are intentional and follow project style rules, the
diagnostic introduced more false positives than value. This will be
revisited as a potential `clang-tidy` check instead.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 19, 2025
…every missing comma in initializer lists" (#154369)

Revert llvm/llvm-project#154018 changes due to excessive _false
positives_. The warning caused multiple benign reports in large
codebases (e.g. _Linux kernel_, _Fuchsia_, _tcpdump_). Since many of
these concatenations are intentional and follow project style rules, the
diagnostic introduced more false positives than value. This will be
revisited as a potential `clang-tidy` check instead.
@r-barnes
Copy link

We noticed false positives in our internal codebase as well, and used parens to fix them.

As expect, the warning also identified additional bugs.

I hope there will either be a flag that performs as expected (concatenation is bad unless it is explicitly marked as good) or a clang-tidy to capture the same effect. It seems perverse to rely on indentation and project formatting to signal safety, as many of the above false positives do.

This reminds me of implicit fallthroughs where projects used many inconsistent variants of // fallthrough to indicate intentional fallthrough. While GCC's flags allowed this diversity to persist clang, wisely, require use of the standard [[fallthrough]].

Here there are many ways to signal intended concatenation, but they all rely on fallible human inspection to determine that they're fine. I don't think it's good to rely on visual inspection.

clang-tidy is less desirable from our perspective because it is often a retroactive catch rather than an immediate catch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-Wstring-concatenation seems insufficiently alarmist
7 participants