[clang] Dump Auto Type Inference #95509

Open
wants to merge 1 commit into
base: main
nisarga3

This pull request adds the ability to dump the deduced types of variables and function return types declared with the C++ auto keyword. When the -fdump-auto-type-inference option is specified, the compiler emits remarks that describe the deduced type of each auto declaration.

Problem Statement
The auto keyword lets the compiler deduce the type of a variable (since C++11) or the return type of a function (since C++14), which simplifies code and reduces redundancy. However, it can obscure the actual types in use, making the code's behaviour harder to understand, especially in complex codebases or during debugging. Developers need a way to see the deduced types to improve their understanding of, and confidence in, the code.
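To make the problem concrete, here is a minimal sketch of declarations where the deduced type is easy to misread. It is not part of the patch, and the function name autoCanSurprise is hypothetical; the deduction rules it relies on are standard C++.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical helper: returns true when every deduction below matches
// the type named in the adjacent comment.
bool autoCanSurprise() {
  int value = 42;
  const int &ref = value;

  auto copy = ref;    // deduced as int: const and the reference are dropped
  auto &alias = ref;  // deduced as const int &

  std::vector<bool> flags{true, false};
  auto bit = flags[0];  // deduced as std::vector<bool>::reference, not bool

  return std::is_same<decltype(copy), int>::value &&
         std::is_same<decltype(alias), const int &>::value &&
         std::is_same<decltype(bit), std::vector<bool>::reference>::value &&
         !std::is_same<decltype(bit), bool>::value;
}
```

None of these types is visible at the declaration site, which is the kind of situation the proposed remarks are meant to help with.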

Proposed Solution
The proposed solution is a compiler feature that dumps the deduced types of auto variables and function return types when the -fdump-auto-type-inference option is used. The compiler emits a remark for each deduced type, which provides clarity and aids debugging.

Compilation Command (from build directory)
./bin/clang++ -mllvm -fdump-auto-type-inference ./Hello.cpp

Hello.cpp

#include <iostream>
using namespace std;
void testAuto() {
  auto w = 5;
  auto z = 3.14;

  auto add = [](auto a, auto b) {
    return a + b;
  };
}
int main() {
    testAuto();

    auto x = 5;            // int
    auto y = 3.14;         // double
    auto z = 'c';          // char
    auto arr = {1, 2, 3};  // std::initializer_list<int>

    auto add = [](auto a, auto b) {
        return a + b;
    };

    auto divide = [](auto a, auto b) -> decltype(a / b) {
        return a / b;
    };

    struct Foo {
        auto getVal() const {
            return val;
        }
        int val = 42;
    };

    return 0;
}

Output

../hello.cpp:5:8: remark: type of 'w' deduced as 'int'
   5 |   auto w = 5;
     |        ^
../hello.cpp:6:8: remark: type of 'z' deduced as 'double'
   6 |   auto z = 3.14;
     |        ^
../hello.cpp:8:8: remark: type of 'add' deduced as '(lambda at ../hello.cpp:8:14)'
   8 |   auto add = [](auto a, auto b) {
     |        ^
../hello.cpp:16:10: remark: type of 'x' deduced as 'int'
  16 |     auto x = 5;            // int
     |          ^
../hello.cpp:17:10: remark: type of 'y' deduced as 'double'
  17 |     auto y = 3.14;         // double
     |          ^
../hello.cpp:18:10: remark: type of 'z' deduced as 'char'
  18 |     auto z = 'c';          // char
     |          ^
../hello.cpp:19:10: remark: type of 'arr' deduced as 'std::initializer_list<int>'
  19 |     auto arr = {1, 2, 3};  // std::initializer_list<int>
     |          ^
../hello.cpp:27:10: remark: type of 'add' deduced as '(lambda at ../hello.cpp:27:16)'
  27 |     auto add = [](auto a, auto b) {
     |          ^
../hello.cpp:31:10: remark: type of 'divide' deduced as '(lambda at ../hello.cpp:31:19)'
  31 |     auto divide = [](auto a, auto b) -> decltype(a / b) {
     |          ^
../hello.cpp:36:14: remark: return type of function 'getVal' deduced as 'int'
  36 |         auto getVal() const {
     |              ^

Benefits to the Community

  • Improved Code Comprehension: Developers can easily see the types inferred by the compiler, which enhances their understanding of the code.
  • Enhanced Debugging: During debugging, knowing the exact types can help diagnose type-related issues more efficiently.

…uto-type-inference option to Clang. When enabled, the compiler will emit messages describing the inferred types for auto declarations in C++11. This feature aids in understanding and debugging by providing clarity on the types deduced by the compiler.
@nisarga3 requested a review from Endilll as a code owner on June 14, 2024, 06:25

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels on Jun 14, 2024
@llvmbot
Member

llvmbot commented Jun 14, 2024

@llvm/pr-subscribers-clang

Author: Nisarga V (nisarga3)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/95509.diff

6 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/include/clang/Sema/Sema.h (+12)
  • (modified) clang/lib/Sema/Sema.cpp (+25)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+6)
  • (modified) clang/lib/Sema/SemaStmt.cpp (+6)
  • (added) clang/test/Sema/fdump_auto-type-inference.cpp (+51)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f04c220d6e1db..655bfcd66e970 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -7467,6 +7467,10 @@ def ast_dump_filter : Separate<["-"], "ast-dump-filter">,
   MarshallingInfoString<FrontendOpts<"ASTDumpFilter">>;
 def ast_dump_filter_EQ : Joined<["-"], "ast-dump-filter=">,
   Alias<ast_dump_filter>;
+def fdump_auto_type_inference : Flag<["-"], "fdump-auto-type-inference">, Group<f_Group>,
+    HelpText<"Dump auto type inference information">;
+def fno_dump_auto_type_inference : Flag<["-"], "fno-dump-auto-type-inference">,Group<f_Group>,
+    HelpText<"Disable dumping auto type inference information">;
 def fno_modules_global_index : Flag<["-"], "fno-modules-global-index">,
   HelpText<"Do not automatically generate or update the global module index">,
   MarshallingInfoNegativeFlag<FrontendOpts<"UseGlobalModuleIndex">>;
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 4d4579fcfd456..502ef30fe5a21 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -70,6 +70,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/TinyPtrVector.h"
+#include "llvm/Support/CommandLine.h"
 #include <deque>
 #include <memory>
 #include <optional>
@@ -77,6 +78,11 @@
 #include <tuple>
 #include <vector>
 
+namespace opts {
+// Option for dumping auto type inference
+extern llvm::cl::OptionCategory DumpAutoInference;
+extern llvm::cl::opt<bool> DumpAutoTypeInference;
+} // namespace opts
 namespace llvm {
 class APSInt;
 template <typename ValueT, typename ValueInfoT> class DenseSet;
@@ -560,6 +566,12 @@ class Sema final : public SemaBase {
   /// Warn that the stack is nearly exhausted.
   void warnStackExhausted(SourceLocation Loc);
 
+  /// Emits diagnostic remark indicating the compiler-deduced types and return
+  /// type for variables and functions
+  void DumpAutoTypeInference(SourceManager &SM, SourceLocation Loc, bool isVar,
+                             ASTContext &Context, llvm::StringRef Name,
+                             QualType DeducedType);
+
   /// Run some code with "sufficient" stack space. (Currently, at least 256K is
   /// guaranteed). Produces a warning if we're low on stack space and allocates
   /// more in that case. Use this in code that may recurse deeply (for example,
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index a612dcd4b4d03..13d137ec0e784 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -80,6 +80,14 @@
 using namespace clang;
 using namespace sema;
 
+namespace opts {
+llvm::cl::OptionCategory DumpAutoInference("DumpAutoInference");
+llvm::cl::opt<bool> DumpAutoTypeInference{
+    "fdump-auto-type-inference",
+    llvm::cl::desc("Dump compiler-deduced type for variables and return expressions declared using C++ 'auto' keyword"), llvm::cl::ZeroOrMore,
+    llvm::cl::cat(DumpAutoInference)};
+} // namespace opts
+
 SourceLocation Sema::getLocForEndOfToken(SourceLocation Loc, unsigned Offset) {
   return Lexer::getLocForEndOfToken(Loc, Offset, SourceMgr, LangOpts);
 }
@@ -553,6 +561,23 @@ void Sema::warnStackExhausted(SourceLocation Loc) {
   }
 }
 
+// Emits diagnostic remark indicating the compiler-deduced types and return type
+// for variables and functions
+void Sema::DumpAutoTypeInference(SourceManager &SM, SourceLocation Loc,
+                                 bool isVar, ASTContext &Context,
+                                 llvm::StringRef Name, QualType DeducedType) {
+  if (SM.isWrittenInMainFile(Loc) &&
+      opts::DumpAutoTypeInference.getNumOccurrences()) {
+    DiagnosticsEngine &Diag = Context.getDiagnostics();
+    unsigned DiagID = isVar ? Diag.getCustomDiagID(DiagnosticsEngine::Remark,
+                                                   "type of '%0' deduced as %1")
+                            : Diag.getCustomDiagID(
+                                  DiagnosticsEngine::Remark,
+                                  "return type of function '%0' deduced as %1");
+    Diag.Report(Loc, DiagID) << Name << DeducedType;
+  }
+}
+
 void Sema::runWithSufficientStackSpace(SourceLocation Loc,
                                        llvm::function_ref<void()> Fn) {
   clang::runWithSufficientStackSpace([&] { warnStackExhausted(Loc); }, Fn);
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 4b9b735f1cfb4..ba5cf1868384e 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -13148,6 +13148,12 @@ bool Sema::DeduceVariableDeclarationType(VarDecl *VDecl, bool DirectInit,
   VDecl->setType(DeducedType);
   assert(VDecl->isLinkageValid());
 
+  // Emit a remark indicating the compiler-deduced type for variables declared
+  // using the C++ 'auto' keyword
+  SourceManager &SM = getSourceManager();
+  Sema::DumpAutoTypeInference(SM, VDecl->getLocation(), true, Context,
+                              VDecl->getNameAsString(), DeducedType);
+
   // In ARC, infer lifetime.
   if (getLangOpts().ObjCAutoRefCount && ObjC().inferObjCARCLifetime(VDecl))
     VDecl->setInvalidDecl();
diff --git a/clang/lib/Sema/SemaStmt.cpp b/clang/lib/Sema/SemaStmt.cpp
index 57465d4a77ac2..e14f86f2450cd 100644
--- a/clang/lib/Sema/SemaStmt.cpp
+++ b/clang/lib/Sema/SemaStmt.cpp
@@ -3799,6 +3799,12 @@ bool Sema::DeduceFunctionTypeFromReturnExpr(FunctionDecl *FD,
     // Update all declarations of the function to have the deduced return type.
     Context.adjustDeducedFunctionResultType(FD, Deduced);
 
+  // Emit a remark indicating the compiler-deduced return type for functions
+  // declared using the C++ 'auto' keyword
+  SourceManager &SM = getSourceManager();
+  Sema::DumpAutoTypeInference(SM, FD->getLocation(), false, Context,
+                              FD->getNameAsString(), Deduced);
+
   return false;
 }
 
diff --git a/clang/test/Sema/fdump_auto-type-inference.cpp b/clang/test/Sema/fdump_auto-type-inference.cpp
new file mode 100644
index 0000000000000..0509933a0b4a8
--- /dev/null
+++ b/clang/test/Sema/fdump_auto-type-inference.cpp
@@ -0,0 +1,51 @@
+// RUN: %clang_cc1 -std=c++14 -mllvm -fdump-auto-type-inference %s
+
+void testAuto() {
+  // Test auto variables
+  auto x = 5;
+  auto y = 3.14;
+
+  // Test auto return type of a lambda function
+  auto add = [](int a, double b) -> double {
+    return a + b;
+  };
+
+  // Expected remarks based on the compiler output
+  // expected-remark@-5 {{type of 'x' deduced as 'int'}}
+  // expected-remark@-4 {{type of 'y' deduced as 'double'}}
+  // expected-remark@-3 {{type of 'add' deduced as '(lambda at %s'}}
+}
+
+int main() {
+    testAuto();
+    // Testing auto variables
+    auto x = 5;            // int
+    auto y = 3.14;         // double
+    auto z = 'c';          // char
+    
+    // expected-remark@+1{{type of 'x' deduced as 'int'}}
+    // expected-remark@+1{{type of 'y' deduced as 'double'}}
+    // expected-remark@+1{{type of 'z' deduced as 'char'}}
+    
+    // Testing auto return type of a function
+    auto add = [](auto a, auto b) {
+        return a + b;
+    };
+    
+    auto divide = [](auto a, auto b) -> decltype(a / b) {
+        return a / b;
+    };
+    
+    struct Foo {
+        auto getVal() const {
+            return val;
+        }
+        int val = 42;
+    };
+    
+    // expected-remark@+2{{type of 'add' deduced as '(lambda}}
+    // expected-remark@+1{{type of 'divide' deduced as '(lambda}}
+    // expected-remark@+1{{function return type of 'getVal' deduced as 'int'}}
+    
+    return 0;
+}

@Sirraide
Member

I candidly don’t really see the use case for this feature. This seems like it should be done by an LSP rather than by Clang itself—and clangd for instance can already do this: when I hover over a variable, it shows me the type of that variable, irrespective of whether it was declared with auto or not.

Furthermore, in cases where the auto ends up being confusing, you can always just... not use it. That’s pretty much how we use auto in LLVM (i.e. only if the type is obvious, e.g. in a cast<>()).
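As a rough illustration of that convention, the sketch below uses the standard dynamic_cast as a stand-in for LLVM's cast<>(). The Shape, Square, and Triangle types and the sidesIfSquare function are hypothetical and appear nowhere in LLVM.

```cpp
#include <cassert>

// Hypothetical class hierarchy, used only for illustration.
struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const = 0;
};
struct Square : Shape {
  int sides() const override { return 4; }
};
struct Triangle : Shape {
  int sides() const override { return 3; }
};

// Returns the side count if shape is a Square, or -1 otherwise.
int sidesIfSquare(const Shape &shape) {
  // The target type is spelled out in the cast, so auto hides nothing here.
  const auto *square = dynamic_cast<const Square *>(&shape);
  return square ? square->sides() : -1;
}
```

Because the cast names the type on the same line, a reader does not need tooling to know what square is.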

@Sirraide
Member

when I hover over a variable, it shows me the type of that variable

For example:

[image: screenshot of the hover tooltip showing the variable's type]

Contributor

@Endilll Endilll left a comment

Thank you for working on this, but I'm not sure this is the right direction.

The flag you're proposing is going to add an unwieldy number of remarks if used on a typical translation unit as opposed to a small example, because, from what I understand, it dumps the type of every single auto in the TU. I believe this is bad ergonomics. You can improve it by not adding remarks to declarations in #includes, but this only gets you so far.

We already have more targeted means to achieve the same result, but they are reserved for the purpose of debugging Clang itself. I'm referring to #pragma clang __debug dump (docs):

template <typename T>
T f(T) {
  return {};
}

int main() {
  auto a = f(0);
  #pragma clang __debug dump a
}

yields (colorized in the actual terminal):

lookup results for a:
VarDecl 0xe93dce8 <<source>:7:3, col:15> col:8 a 'int' cinit
`-CallExpr 0xe93e108 <col:12, col:15> 'int'
  |-ImplicitCastExpr 0xe93e0f0 <col:12> 'int (*)(int)' <FunctionToPointerDecay>
  | `-DeclRefExpr 0xe93e060 <col:12> 'int (int)' lvalue Function 0xe93df60 'f' 'int (int)' (FunctionTemplate 0xe93d9e8 'f')
  `-IntegerLiteral 0xe93dd98 <col:14> 'int' 0

https://godbolt.org/z/Ko5PqdvEh

If you're going to pursue this direction, having a pragma or an attribute that adds a remark to a particular declaration seems more useful by being less noisy.
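For comparison, a per-declaration check that already works in standard C++ is a static_assert on decltype. This is only a sketch of the "targeted, less noisy" idea and not the pragma or attribute suggested above; it reuses f from the earlier example, and the function name checkDeducedTypes is hypothetical.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>

template <typename T>
T f(T) {
  return {};
}

// Hypothetical helper: each static_assert pins the deduced type of one
// declaration and fails the build with the given message if it differs.
bool checkDeducedTypes() {
  auto a = f(0);
  static_assert(std::is_same<decltype(a), int>::value,
                "'a' should be deduced as int");

  auto arr = {1, 2, 3};
  static_assert(std::is_same<decltype(arr), std::initializer_list<int>>::value,
                "'arr' should be deduced as std::initializer_list<int>");

  // Use the variables at run time so the function has an observable result.
  return a == 0 && arr.size() == 3;
}
```

Unlike a translation-unit-wide flag, this says nothing about declarations the author did not ask about, but it requires already having a guess at the type.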

-mllvm -fdump-auto-type-inference

This patch adds frontend behavior, so it should be -Xclang, not -mllvm. But in any case, if we intend this to be a user interface, it should be a regular driver flag, not a frontend flag, because we shouldn't make promises to users about the latter.


⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 11c08327dab425b67d80e99850e654e4c6c17864 df232a67ac0f5a294e8db4c86e10b6bdf664d673 -- clang/test/Sema/fdump_auto-type-inference.cpp clang/include/clang/Sema/Sema.h clang/lib/Sema/Sema.cpp clang/lib/Sema/SemaDecl.cpp clang/lib/Sema/SemaStmt.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index 13d137ec0e..dbb7edd826 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -84,8 +84,9 @@ namespace opts {
 llvm::cl::OptionCategory DumpAutoInference("DumpAutoInference");
 llvm::cl::opt<bool> DumpAutoTypeInference{
     "fdump-auto-type-inference",
-    llvm::cl::desc("Dump compiler-deduced type for variables and return expressions declared using C++ 'auto' keyword"), llvm::cl::ZeroOrMore,
-    llvm::cl::cat(DumpAutoInference)};
+    llvm::cl::desc("Dump compiler-deduced type for variables and return "
+                   "expressions declared using C++ 'auto' keyword"),
+    llvm::cl::ZeroOrMore, llvm::cl::cat(DumpAutoInference)};
 } // namespace opts
 
 SourceLocation Sema::getLocForEndOfToken(SourceLocation Loc, unsigned Offset) {

Collaborator

@AaronBallman AaronBallman left a comment

I agree with @Endilll regarding the chattiness of this approach. AST dumping is a better way to do this until reflection support comes to C++ (at that point, I expect we'll have more nuanced tools for inspecting the deduced type).
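A minimal sketch of the AST-dumping route, assuming a file named sample.cpp (hypothetical) and the existing cc1 options -ast-dump and -ast-dump-filter passed through -Xclang. The function name inspectedAdd is also hypothetical and is chosen only so the filter has something unique to match.

```cpp
#include <cassert>

// One way to inspect the deduced types below:
//   clang++ -fsyntax-only -Xclang -ast-dump \
//       -Xclang -ast-dump-filter=inspectedAdd sample.cpp
// The dump lists a VarDecl node for each local, including its type.
int inspectedAdd() {
  auto lhs = 5;          // deduced as int
  auto rhs = 3.5;        // deduced as double
  auto sum = lhs + rhs;  // deduced as double
  return static_cast<int>(sum);
}
```

The filter keeps the dump limited to declarations whose qualified name contains the given substring, which avoids the noise of dumping the whole translation unit.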

5 participants