Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][C23] N3006 Underspecified object declarations #79845

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

to268
Copy link
Contributor

@to268 to268 commented Jan 29, 2024

The N3006 Underspecified object declarations paper describes that underspecified declarations should be diagnosed as an error.
Example snippet:

struct S1 { int x, y; };
union U1 { int a; double b; };
enum E1 { FOO, BAR };

auto valid_struct = (struct S1){ 1, 2 };
auto underspecified_struct = (struct S2 { int x, y; }){ 1, 2 }; // Error

auto valid_union = (union U1){ .a = 12 };
auto underspecified_union = (union U2 { int a; double b; }){ .a = 34 }; // Error

auto valid_enum = (enum E1){ FOO };
auto underspecified_enum = (enum E2 { BAZ, QUX }){ BAZ }; // Error

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 29, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 29, 2024

@llvm/pr-subscribers-clang

Author: Guillot Tony (to268)

Changes

The N3006 Underspecified object declarations paper describes that underspecified declarations should be diagnosed as an error.
Example snippet:

struct S1 { int x, y; };
union U1 { int a; double b; };
enum E1 { FOO, BAR };

auto valid_struct = (struct S1){ 1, 2 };
auto underspecified_struct = (struct S2 { int x, y; }){ 1, 2 }; // Error

auto valid_union = (union U1){ .a = 12 };
auto underspecified_union = (union U2 { int a; double b; }){ .a = 34 }; // Error

auto valid_enum = (enum E1){ FOO };
auto underspecified_enum = (enum E2 { BAZ, QUX }){ BAZ }; // Error

Full diff: https://github.com/llvm/llvm-project/pull/79845.diff

6 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+3)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+2)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+27)
  • (added) clang/test/C/C2x/n3006.c (+27)
  • (added) clang/test/Parser/c2x-underspecified-decls.c (+12)
  • (modified) clang/www/c_status.html (+1-1)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 254e0a9cb72979..65885968204a64 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -83,6 +83,9 @@ C Language Changes
 C23 Feature Support
 ^^^^^^^^^^^^^^^^^^^
 
+- Clang now diagnoses `N3006 Underspecified object declarations
+  <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3006.htm>`_.
+
 Non-comprehensive list of changes in this release
 -------------------------------------------------
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 24d32cb87c89e2..8508208a3acedd 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -7639,6 +7639,8 @@ def err_attribute_arm_mve_polymorphism : Error<
   "'__clang_arm_mve_strict_polymorphism' attribute can only be applied to an MVE/NEON vector type">;
 def err_attribute_webassembly_funcref : Error<
   "'__funcref' attribute can only be applied to a function pointer type">;
+def err_c23_underspecified_object_declaration: Error<
+  "'%select{struct|<ERROR>|union|<ERROR>|enum}0 %1' is defined as an underspecified object initializer">;
 
 def warn_setter_getter_impl_required : Warning<
   "property %0 requires method %1 to be defined - "
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 2f1ddfb215116d..8166748c38e6e9 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -7805,6 +7805,33 @@ Sema::BuildCompoundLiteralExpr(SourceLocation LParenLoc, TypeSourceInfo *TInfo,
                                            diag::err_variable_object_no_init))
         return ExprError();
     }
+  } else if (LangOpts.C23 &&
+             (literalType->isRecordType() || literalType->isEnumeralType())) {
+    // C23 6.2.1p7: Structure, union, and enumeration tags have scope that
+    // begins just after the appearance of the tag in a type specifier that
+    // declares the tag.
+    // [...]
+    // An ordinary identifier that has an underspecified definition has scope
+    // that starts when the definition is completed; if the same ordinary
+    // identifier declares another entity with a scope that encloses the current
+    // block, that declaration is hidden as soon as the inner declarator is
+    // completed*.)
+    // [...]
+    // *) That means, that the outer declaration is not visible for the
+    // initializer.
+    auto Range = SourceRange(LParenLoc, RParenLoc);
+    const auto *Tag = literalType->castAs<TagType>();
+    const auto &TagRange = Tag->getDecl()->getSourceRange();
+
+    // We should diagnose underspecified declaration, unless the identifier has
+    // been diagnosed as being a redefinition, since the tag is made anonymous.
+    if (Range.fullyContains(TagRange) && Tag->getDecl()->getIdentifier()) {
+      Diag(TagRange.getBegin(),
+           diag::err_c23_underspecified_object_declaration)
+          << (unsigned)Tag->getDecl()->getTagKind()
+          << Tag->getDecl()->getName() << TagRange;
+      return ExprError();
+    }
   } else if (!literalType->isDependentType() &&
              RequireCompleteType(LParenLoc, literalType,
                diag::err_typecheck_decl_incomplete_type,
diff --git a/clang/test/C/C2x/n3006.c b/clang/test/C/C2x/n3006.c
new file mode 100644
index 00000000000000..15efc0ccd6d323
--- /dev/null
+++ b/clang/test/C/C2x/n3006.c
@@ -0,0 +1,27 @@
+// RUN: %clang_cc1 -std=c2x -verify %s
+
+/* WG14 N3006: Full
+ * Underspecified object declarations
+ */
+
+struct S1 { int x, y; };        // expected-note {{previous definition is here}}
+union U1 { int a; double b; };  // expected-note {{previous definition is here}}
+enum E1 { FOO, BAR };           // expected-note {{previous definition is here}}
+
+auto normal_struct = (struct S1){ 1, 2 };
+auto underspecified_struct = (struct S2 { int x, y; }){ 1, 2 };           // expected-error {{'struct S2' is defined as an underspecified object initializer}}
+auto underspecified_struct_redef = (struct S1 { char x, y; }){ 'A', 'B'}; // expected-error {{redefinition of 'S1'}}
+auto underspecified_empty_struct = (struct S3 { }){ };                    // expected-error {{'struct S3' is defined as an underspecified object initializer}}
+
+auto normal_union_int = (union U1){ .a = 12 };
+auto normal_union_double = (union U1){ .b = 2.4 };
+auto underspecified_union = (union U2 { int a; double b; }){ .a = 34 };         // expected-error {{'union U2' is defined as an underspecified object initializer}}
+auto underspecified_union_redef = (union U1 { char a; double b; }){ .a = 'A' }; // expected-error {{redefinition of 'U1'}}
+auto underspecified_empty_union = (union U3 {  }){  };                          // expected-error {{'union U3' is defined as an underspecified object initializer}}
+
+auto normal_enum_foo = (enum E1){ FOO };
+auto normal_enum_bar = (enum E1){ BAR };
+auto underspecified_enum = (enum E2 { BAZ, QUX }){ BAZ };       // expected-error {{'enum E2' is defined as an underspecified object initializer}}
+auto underspecified_enum_redef = (enum E1 { ONE, TWO }){ ONE }; // expected-error {{redefinition of 'E1'}}
+auto underspecified_empty_enum = (enum E3 {  }){ };             // expected-error {{'enum E3' is defined as an underspecified object initializer}} \
+                                                                   expected-error {{use of empty enum}}
diff --git a/clang/test/Parser/c2x-underspecified-decls.c b/clang/test/Parser/c2x-underspecified-decls.c
new file mode 100644
index 00000000000000..5a7e935cda5444
--- /dev/null
+++ b/clang/test/Parser/c2x-underspecified-decls.c
@@ -0,0 +1,12 @@
+// RUN: %clang_cc1 -fsyntax-only -verify=expected,c23 -std=c23 %s
+// RUN: %clang_cc1 -fsyntax-only -verify=expected,c17 -std=c17 %s
+
+auto underspecified_struct = (struct S1 { int x, y; }){ 1, 2 };         // c23-error {{'struct S1' is defined as an underspecified object initializer}} \
+                                                                           c17-error {{type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int}} \
+                                                                           c17-error {{illegal storage class on file-scoped variable}}
+auto underspecified_union = (union U1 { int a; double b; }){ .a = 34 }; // c23-error {{'union U1' is defined as an underspecified object initializer}} \
+                                                                           c17-error {{type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int}} \
+                                                                           c17-error {{illegal storage class on file-scoped variable}}
+auto underspecified_enum = (enum E1 { FOO, BAR }){ BAR };               // c23-error {{'enum E1' is defined as an underspecified object initializer}} \
+                                                                           c17-error {{type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int}} \
+                                                                           c17-error {{illegal storage class on file-scoped variable}}
diff --git a/clang/www/c_status.html b/clang/www/c_status.html
index 3955a1d7963976..fc7d3f5f8ad79c 100644
--- a/clang/www/c_status.html
+++ b/clang/www/c_status.html
@@ -1191,7 +1191,7 @@ <h2 id="c2x">C23 implementation status</h2>
     <tr>
       <td>Underspecified object definitions</td>
       <td><a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3006.htm">N3006</a></td>
-      <td class="none" align="center">No</td>
+      <td class="unreleased" align="center">Clang 19</td>
     </tr>
     <tr>
       <td>Type inference for object declarations</td>

@@ -7639,6 +7639,8 @@ def err_attribute_arm_mve_polymorphism : Error<
"'__clang_arm_mve_strict_polymorphism' attribute can only be applied to an MVE/NEON vector type">;
def err_attribute_webassembly_funcref : Error<
"'__funcref' attribute can only be applied to a function pointer type">;
def err_c23_underspecified_object_declaration: Error<
"'%select{struct|<ERROR>|union|<ERROR>|enum}0 %1' is defined as an underspecified object initializer">;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the diagnostic message need to be corrected.

Copy link
Contributor

@cor3ntin cor3ntin Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think underspecified object is useful for users
Maybe error: the deduced type of foo refers to a struct|union|enum bar which is defined in its initializer.

Copy link
Contributor

@Fznamznon Fznamznon Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK "underspecified" is how the standard calls it. And constexpr declarations in C are also "underspecified".
gcc agrees https://godbolt.org/z/1KcW9zTcd .
I agree that underspecified sounds weird for ordinary people though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also didn't wanted to just use the same wording as GCC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the terminology used in the standard is really hard for mortals to understand, but I also think the definition of "underspecified" makes it hard to capture this in a single diagnostic and still be precise. But I think existing diagnostics cover the other cases, mostly.

C23 6.7.1p12: A declaration such that the declaration specifiers contain no type specifier or that is declared with constexpr is said to be underspecified. If such a declaration is not a definition, if it declares no or more than one ordinary identifier, if the declared identifier already has a declaration in the same scope, if the declared entity is not an object, or if anywhere within the sequence of tokens making up the declaration identifiers that are not ordinary are declared, the behavior is implementation-defined.

So I think the best I can come up with is:

%select{declaration using a deduced type|constexpr declaration}0 %1 also declares %2 which is not an ordinary identifier

I think "if such a declaration is not a definition" is already covered by constexpr variable 'i' must be initialized by a constant expression: https://godbolt.org/z/nvKEzv975
I think "if it declares no or more than one ordinary identifier" is already covered by expected identifier or '(' or is supported by Clang: https://godbolt.org/z/Yas4od5Ev (Note, we should add documentation for the implementation-defined extension accepting multiple declarations in a group.)
I think "if the declared identifier already has a declaration in the same scope" is not correctly supported: https://godbolt.org/z/Kab7GvW85 but this should be handled by an existing redeclaration diagnostic when we fix it.
I think "if the declared entity is not an object" is already covered by "'auto' not allowed in function return type": https://godbolt.org/z/KzeTeczY7
I think "if anywhere within the sequence of tokens making up the declaration identifiers that are not ordinary are declared" is the only thing we're missing a diagnostic for, and the above suggested wording would handle it.

So I think we can get away without inflicting "underspecified" on users.

The other half of this paper has to do with scope of identifiers, we may end up needing to get creative with diagnostic wording when lookup fails to find an identifier from an underspecified declaration.

Copy link

github-actions bot commented Jan 29, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 65ae09eeb6773b14189fc67051870c8fc4eb9ae3 76ee2339d7eb5dde94bfbb3c6bf20cb28229447b -- clang/test/C/C2x/n3006.c clang/test/Parser/c2x-underspecified-decls.c clang/lib/Sema/SemaExpr.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index bb6004d6fc..7ec2c0a5fd 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -7840,9 +7840,11 @@ Sema::BuildCompoundLiteralExpr(SourceLocation LParenLoc, TypeSourceInfo *TInfo,
       return ExprError();
     }
   } else if (!literalType->isDependentType() &&
-             RequireCompleteType(LParenLoc, literalType,
-               diag::err_typecheck_decl_incomplete_type,
-               SourceRange(LParenLoc, LiteralExpr->getSourceRange().getEnd())))
+             RequireCompleteType(
+                 LParenLoc, literalType,
+                 diag::err_typecheck_decl_incomplete_type,
+                 SourceRange(LParenLoc,
+                             LiteralExpr->getSourceRange().getEnd())))
     return ExprError();
 
   InitializedEntity Entity

@to268
Copy link
Contributor Author

to268 commented Jan 29, 2024

CC: @AaronBallman

Comment on lines 7835 to 7839
RequireCompleteType(
LParenLoc, literalType,
diag::err_typecheck_decl_incomplete_type,
SourceRange(LParenLoc,
LiteralExpr->getSourceRange().getEnd())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

White space changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the formatting CI was yelling at me for this.
I'll probably revert this since it's not part of the feature.

@@ -7639,6 +7639,8 @@ def err_attribute_arm_mve_polymorphism : Error<
"'__clang_arm_mve_strict_polymorphism' attribute can only be applied to an MVE/NEON vector type">;
def err_attribute_webassembly_funcref : Error<
"'__funcref' attribute can only be applied to a function pointer type">;
def err_c23_underspecified_object_declaration: Error<
"'%select{struct|<ERROR>|union|<ERROR>|enum}0 %1' is defined as an underspecified object initializer">;
Copy link
Contributor

@cor3ntin cor3ntin Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think underspecified object is useful for users
Maybe error: the deduced type of foo refers to a struct|union|enum bar which is defined in its initializer.

@to268
Copy link
Contributor Author

to268 commented Feb 5, 2024

Ping: @AaronBallman

@to268
Copy link
Contributor Author

to268 commented Mar 18, 2024

I have rebased my PR to include N3018 constexpr feature and reverted a formatted snippet of code that is not part of the PR.
I will add constexpr tests but I need to investigate why the constexpr keyword does not trigger the diagnostic (not sure if it's a constexpr implementation bug or something I need to handle in my PR (it does not really matter TBO)).

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! I think there's something here worth considering: how an implementation handles use of underspecified declarations is left implementation-defined. I think we want to diagnose many of the situations it covers, but I think we already have most of those diagnostics covered.

The one thing that's not covered is declaring or defining new identifiers as part of an underspecified declaration. However, we might want to accept that as an extension rather than follow GCC's footsteps. C users are used to the fact that you can define a type basically anywhere you can spell a type name. Is it confusing? Absolutely. But I think it's just as confusing to allow users to define a type in some places but not others, along fairly arbitrary lines. In other words, it's weird to me that we would allow:

void func() {
  int x = (struct Foo { int x; }){ 0 }.x;
}

but would reject:

void func() {
  constexpr int x = (struct Foo { int x; }){ 0 }.x;
}

So support for this paper might entail adding documentation for what our implementation-defined behavior is when we accept code + test coverage, rather than rejecting code because GCC rejects it.

CC @zygoloid @jyknight @jrtc27 @nickdesaulniers for some more opinions on this.

@@ -7639,6 +7639,8 @@ def err_attribute_arm_mve_polymorphism : Error<
"'__clang_arm_mve_strict_polymorphism' attribute can only be applied to an MVE/NEON vector type">;
def err_attribute_webassembly_funcref : Error<
"'__funcref' attribute can only be applied to a function pointer type">;
def err_c23_underspecified_object_declaration: Error<
"'%select{struct|<ERROR>|union|<ERROR>|enum}0 %1' is defined as an underspecified object initializer">;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the terminology used in the standard is really hard for mortals to understand, but I also think the definition of "underspecified" makes it hard to capture this in a single diagnostic and still be precise. But I think existing diagnostics cover the other cases, mostly.

C23 6.7.1p12: A declaration such that the declaration specifiers contain no type specifier or that is declared with constexpr is said to be underspecified. If such a declaration is not a definition, if it declares no or more than one ordinary identifier, if the declared identifier already has a declaration in the same scope, if the declared entity is not an object, or if anywhere within the sequence of tokens making up the declaration identifiers that are not ordinary are declared, the behavior is implementation-defined.

So I think the best I can come up with is:

%select{declaration using a deduced type|constexpr declaration}0 %1 also declares %2 which is not an ordinary identifier

I think "if such a declaration is not a definition" is already covered by constexpr variable 'i' must be initialized by a constant expression: https://godbolt.org/z/nvKEzv975
I think "if it declares no or more than one ordinary identifier" is already covered by expected identifier or '(' or is supported by Clang: https://godbolt.org/z/Yas4od5Ev (Note, we should add documentation for the implementation-defined extension accepting multiple declarations in a group.)
I think "if the declared identifier already has a declaration in the same scope" is not correctly supported: https://godbolt.org/z/Kab7GvW85 but this should be handled by an existing redeclaration diagnostic when we fix it.
I think "if the declared entity is not an object" is already covered by "'auto' not allowed in function return type": https://godbolt.org/z/KzeTeczY7
I think "if anywhere within the sequence of tokens making up the declaration identifiers that are not ordinary are declared" is the only thing we're missing a diagnostic for, and the above suggested wording would handle it.

So I think we can get away without inflicting "underspecified" on users.

The other half of this paper has to do with scope of identifiers, we may end up needing to get creative with diagnostic wording when lookup fails to find an identifier from an underspecified declaration.

@@ -7813,6 +7813,32 @@ Sema::BuildCompoundLiteralExpr(SourceLocation LParenLoc, TypeSourceInfo *TInfo,
diag::err_variable_object_no_init))
return ExprError();
}
} else if (LangOpts.C23 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these changes are quite sufficient because I think we need to handle more cases than just compound literals. Consider:

constexpr typeof(struct s *) x = 0; // declares `s` which is not an ordinary identifier
constexpr struct S { int a, b; } y = { 0 }; // declares `S`, `a`, and `b`, none of which are ordinary identifiers
constexpr int a = 0, b = 0; // declares `a` and `b` as ordinary identifiers
auto c = (struct T { int x, y; }){0, 0}; // declares `T`, `x`, and `y`, none of which are ordinary identifiers
constexpr int (*fp)(struct X { int x; } val) = 0; // declares `X` and `x` which are not ordinary identifiers

(https://godbolt.org/z/Pq87aMnch)

auto underspecified_enum = (enum E2 { BAZ, QUX }){ BAZ }; // expected-error {{'enum E2' is defined as an underspecified object initializer}}
auto underspecified_enum_redef = (enum E1 { ONE, TWO }){ ONE }; // expected-error {{redefinition of 'E1'}}
auto underspecified_empty_enum = (enum E3 { }){ }; // expected-error {{'enum E3' is defined as an underspecified object initializer}} \
expected-error {{use of empty enum}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need additional test coverage for the changes to scope that came from the paper, but we should already have diagnostics implemented for them. I read that as changing the behavior of this code:

constexpr int i = i;
auto j = j;

The only bit I'm not certain of is "... if the same ordinary identifier declares another entity with a scope that encloses the current block, that declaration is hidden as soon as the inner declarator is completed".

int func() {
  struct S { int x, y; };
  constexpr int i = (struct T { int a, b; }){0, 1}.a;

  struct T t = { 1, 2 };
}

so that the use of struct T outside of the compound literal is no longer valid. I'm not certain we want to conform to that restriction if we're deciding we want to accept the use in the compound literal. Again, it's mysterious behavior for the constexpr specifier to change behavior of type definitions.

union U1 { int a; double b; }; // expected-note {{previous definition is here}}
enum E1 { FOO, BAR }; // expected-note {{previous definition is here}}

auto normal_struct = (struct S1){ 1, 2 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding a test using compound literals

auto normal_struct2 = (struct S1) { .x = 1, .y = 2 };

(I'm surprised we don't currently support this (normal_struct2 or normal_struct). That alone seems like an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants