Skip to content

Conversation

@da-viper
Copy link
Contributor

@da-viper da-viper commented Nov 26, 2025

The simplified parser incorrectly assumes if there is a context, there is no return type.

Fixed the case where functions have both a context and a return type. For example,

int foo::bar::func()
Type<int> foo::bar::func()

Also fixed the case where there is no space between the context and return.
std::vector<int>foo::bar()

The simplified parser incorrectly assumes if there is a context,
there is no return type.
@llvmbot
Copy link
Member

llvmbot commented Nov 26, 2025

@llvm/pr-subscribers-lldb

Author: Ebuka Ezike (da-viper)

Changes

The simplified parser incorrectly assumes if there is a context, there is no return type.


Full diff: https://github.com/llvm/llvm-project/pull/169652.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp (+14-10)
  • (modified) lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp (+5)
diff --git a/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp b/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
index 4b66ff814935a..d347b57996c65 100644
--- a/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
+++ b/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
@@ -481,18 +481,22 @@ bool CPlusPlusLanguage::CxxMethodName::TrySimplifiedParse() {
       m_basename = full.substr(basename_begin, basename_end - basename_begin);
     }
 
-    if (IsTrivialBasename(m_basename)) {
+    // if the context has a white space it may have a return type.
+    // e.g. `int foo::bar::func()` or `Type<int > foo::bar::func(int)`
+    const bool no_whitespace =
+        m_context.find_first_of(" \t\n\v\f\r") == llvm::StringRef::npos;
+
+    if (no_whitespace && IsTrivialBasename(m_basename)) {
       return true;
-    } else {
-      // The C++ basename doesn't match our regular expressions so this can't
-      // be a valid C++ method, clear everything out and indicate an error
-      m_context = llvm::StringRef();
-      m_basename = llvm::StringRef();
-      m_arguments = llvm::StringRef();
-      m_qualifiers = llvm::StringRef();
-      m_return_type = llvm::StringRef();
-      return false;
     }
+    // The C++ basename doesn't match our regular expressions so this can't
+    // be a valid C++ method, clear everything out and indicate an error
+    m_context = llvm::StringRef();
+    m_basename = llvm::StringRef();
+    m_arguments = llvm::StringRef();
+    m_qualifiers = llvm::StringRef();
+    m_return_type = llvm::StringRef();
+    return false;
   }
   return false;
 }
diff --git a/lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp b/lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp
index 23f2f4218601a..f0c4b0a83c890 100644
--- a/lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp
+++ b/lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp
@@ -30,6 +30,9 @@ TEST(CPlusPlusLanguage, MethodNameParsing) {
       {"foo::~bar(baz)", "", "foo", "~bar", "(baz)", "", "foo::~bar"},
       {"a::b::c::d(e,f)", "", "a::b::c", "d", "(e,f)", "", "a::b::c::d"},
       {"void f(int)", "void", "", "f", "(int)", "", "f"},
+      {"int main()", "int", "", "main", "()", "", "main"},
+      {"int foo::bar::func01(int a, double b)", "int", "foo::bar", "func01",
+       "(int a, double b)", "", "foo::bar::func01"},
 
       // Operators
       {"std::basic_ostream<char, std::char_traits<char> >& "
@@ -101,6 +104,8 @@ TEST(CPlusPlusLanguage, MethodNameParsing) {
        "std::forward<decltype(nullptr)>"},
 
       // Templates
+      {"vector<int > foo::bar::func(int)", "vector<int >", "foo::bar", "func",
+       "(int)", "", "foo::bar::func"},
       {"void llvm::PM<llvm::Module, llvm::AM<llvm::Module>>::"
        "addPass<llvm::VP>(llvm::VP)",
        "void", "llvm::PM<llvm::Module, llvm::AM<llvm::Module>>",

@da-viper da-viper requested a review from dmpots November 26, 2025 15:32
}

if (IsTrivialBasename(m_basename)) {
// if the context has a white space it may have a return type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more details to the summary about what cases are fixed with this change and what happens without the fix (e.g. crash, incorrect output).

It looks like previously we were incorrectly handling cases that had both a context and a return type, but seems like it did handle the case where we had a return type without a context based on this test case

{"int main()", "int", "", "main", "()", "", "main"},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did handle int main() correctly, I can remove that test case.

@da-viper da-viper requested a review from dmpots November 26, 2025 20:20
@Michael137
Copy link
Member

Any chance we could remove the simplified parse entirely? I had something up for it a while back but didnt land it: #137072

Seems like this is even more motivation to remove it

Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a fix to existing code. I agree with @Michael137 that it would be nice to completely remove the Trivial parser if possible.

}

/// A context is trivial if an only if it matches this pattern.
/// "^\s*([A-Za-z_:]*)\s*$".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have an example context we are trying to match. For example, in the code foo::bar::baz() the context would be foo::bar.

@Michael137
Copy link
Member

@da-viper will you need to make additional upcoming changes to the simplified parse? If it's a standalone fix then happy to land it as a one-off.

@da-viper
Copy link
Contributor Author

@da-viper will you need to make additional upcoming changes to the simplified parse? If it's a standalone fix then happy to land it as a one-off.

I do not mind closing the PR in favour of removing the simple parser entirely if there is no noticeable difference in the search time performance.

@da-viper
Copy link
Contributor Author

will you need to make additional upcoming changes to the simplified parse?

The changes are in the name matching as I stumbled upon the parsing error

@Michael137
Copy link
Member

@da-viper will you need to make additional upcoming changes to the simplified parse? If it's a standalone fix then happy to land it as a one-off.

I do not mind closing the PR in favour of removing the simple parser entirely if there is no noticeable difference in the search time performance.

My only worry was that there were more upcoming changes in this component, which become increasingly difficult to maintain. Probably wont have time to revive the other PR until in two weeks time. So I dont mind landing it for now if it unblocks you

@da-viper da-viper merged commit 90e8889 into llvm:main Nov 27, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants