-
Notifications
You must be signed in to change notification settings - Fork 15k
[lldb] Re-use clang's keyword enable/disable mechanism in CPlusPlusNameParser.cpp #164284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[lldb] Re-use clang's keyword enable/disable mechanism in CPlusPlusNameParser.cpp #164284
Conversation
…meParser.cpp The problem is that CPlusPlusNameParser doesn't respect the language dialect for `target variable Class::constant`. If `constant` is a keyword in some (downstream in this case) dialects then it lexes `constant` as `kw_constant` instead of `kw_raw_identifier` even if that dialect is not active. As a result: * `target variable constant` works * `target variable Class::constant` fails to look up constant because it sees `kw_raw_identifier, kw_coloncolon, kw_constant` instead of `kw_raw_identifier, kw_coloncolon, kw_raw_identifier` * `expr -l c++ -- Class::constant` works because clang is parsing it Note: There's seemingly a separate bug where GetLangOptions() does not account for the process/frame language and just uses a pre-defined set of language options. I have not attempted to fix that since, for now, at least hardcoding my dialect to be disabled by that function does the right thing for existing tests. Also fixed a trivial return-after-else that clangd pointed out in the same area
|
@llvm/pr-subscribers-lldb @llvm/pr-subscribers-clang Author: Daniel Sanders (dsandersllvm) ChangesThe problem is that CPlusPlusNameParser doesn't respect the language dialect for
Note: There's seemingly a separate bug where GetLangOptions() does not account for the process/frame language and just uses a pre-defined set of language options. I have not attempted to fix that since, for now, at least hardcoding my dialect to be disabled by that function does the right thing for existing tests. Also fixed a trivial return-after-else that clangd pointed out in the same area Full diff: https://github.com/llvm/llvm-project/pull/164284.diff 3 Files Affected:
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index e4044bcdfcc60..8177029f13ca9 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -46,6 +46,59 @@ class LangOptions;
class MultiKeywordSelector;
class SourceLocation;
+// This is an inline namespace to allow both clang and lldb to use them without
+// namespace prefixes (via `using namespace` in lldb's case).
+inline namespace TokenKeyEnumerators {
+enum TokenKey : unsigned {
+ KEYC99 = 0x1,
+ KEYCXX = 0x2,
+ KEYCXX11 = 0x4,
+ KEYGNU = 0x8,
+ KEYMS = 0x10,
+ BOOLSUPPORT = 0x20,
+ KEYALTIVEC = 0x40,
+ KEYNOCXX = 0x80,
+ KEYBORLAND = 0x100,
+ KEYOPENCLC = 0x200,
+ KEYC23 = 0x400,
+ KEYNOMS18 = 0x800,
+ KEYNOOPENCL = 0x1000,
+ WCHARSUPPORT = 0x2000,
+ HALFSUPPORT = 0x4000,
+ CHAR8SUPPORT = 0x8000,
+ KEYOBJC = 0x10000,
+ KEYZVECTOR = 0x20000,
+ KEYCOROUTINES = 0x40000,
+ KEYMODULES = 0x80000,
+ KEYCXX20 = 0x100000,
+ KEYOPENCLCXX = 0x200000,
+ KEYMSCOMPAT = 0x400000,
+ KEYSYCL = 0x800000,
+ KEYCUDA = 0x1000000,
+ KEYZOS = 0x2000000,
+ KEYNOZOS = 0x4000000,
+ KEYHLSL = 0x8000000,
+ KEYFIXEDPOINT = 0x10000000,
+ KEYMAX = KEYFIXEDPOINT, // The maximum key
+ KEYALLCXX = KEYCXX | KEYCXX11 | KEYCXX20,
+ KEYALL = (KEYMAX | (KEYMAX - 1)) & ~KEYNOMS18 & ~KEYNOOPENCL &
+ ~KEYNOZOS // KEYNOMS18, KEYNOOPENCL, KEYNOZOS are excluded.
+};
+} // namespace TokenKeyEnumerators
+
+/// How a keyword is treated in the selected standard. This enum is ordered
+/// intentionally so that the value that 'wins' is the most 'permissive'.
+enum KeywordStatus {
+ KS_Unknown, // Not yet calculated. Used when figuring out the status.
+ KS_Disabled, // Disabled
+ KS_Future, // Is a keyword in future standard
+ KS_Extension, // Is an extension
+ KS_Enabled, // Enabled
+};
+
+KeywordStatus getKeywordStatus(const LangOptions &LangOpts,
+ unsigned Flags);
+
enum class ReservedIdentifierStatus {
NotReserved = 0,
StartsWithUnderscoreAtGlobalScope,
diff --git a/clang/lib/Basic/IdentifierTable.cpp b/clang/lib/Basic/IdentifierTable.cpp
index 4a2b77cd16bfc..65ed1273350c5 100644
--- a/clang/lib/Basic/IdentifierTable.cpp
+++ b/clang/lib/Basic/IdentifierTable.cpp
@@ -77,57 +77,6 @@ IdentifierTable::IdentifierTable(const LangOptions &LangOpts,
// Language Keyword Implementation
//===----------------------------------------------------------------------===//
-// Constants for TokenKinds.def
-namespace {
-
-enum TokenKey : unsigned {
- KEYC99 = 0x1,
- KEYCXX = 0x2,
- KEYCXX11 = 0x4,
- KEYGNU = 0x8,
- KEYMS = 0x10,
- BOOLSUPPORT = 0x20,
- KEYALTIVEC = 0x40,
- KEYNOCXX = 0x80,
- KEYBORLAND = 0x100,
- KEYOPENCLC = 0x200,
- KEYC23 = 0x400,
- KEYNOMS18 = 0x800,
- KEYNOOPENCL = 0x1000,
- WCHARSUPPORT = 0x2000,
- HALFSUPPORT = 0x4000,
- CHAR8SUPPORT = 0x8000,
- KEYOBJC = 0x10000,
- KEYZVECTOR = 0x20000,
- KEYCOROUTINES = 0x40000,
- KEYMODULES = 0x80000,
- KEYCXX20 = 0x100000,
- KEYOPENCLCXX = 0x200000,
- KEYMSCOMPAT = 0x400000,
- KEYSYCL = 0x800000,
- KEYCUDA = 0x1000000,
- KEYZOS = 0x2000000,
- KEYNOZOS = 0x4000000,
- KEYHLSL = 0x8000000,
- KEYFIXEDPOINT = 0x10000000,
- KEYMAX = KEYFIXEDPOINT, // The maximum key
- KEYALLCXX = KEYCXX | KEYCXX11 | KEYCXX20,
- KEYALL = (KEYMAX | (KEYMAX - 1)) & ~KEYNOMS18 & ~KEYNOOPENCL &
- ~KEYNOZOS // KEYNOMS18, KEYNOOPENCL, KEYNOZOS are excluded.
-};
-
-/// How a keyword is treated in the selected standard. This enum is ordered
-/// intentionally so that the value that 'wins' is the most 'permissive'.
-enum KeywordStatus {
- KS_Unknown, // Not yet calculated. Used when figuring out the status.
- KS_Disabled, // Disabled
- KS_Future, // Is a keyword in future standard
- KS_Extension, // Is an extension
- KS_Enabled, // Enabled
-};
-
-} // namespace
-
// This works on a single TokenKey flag and checks the LangOpts to get the
// KeywordStatus based exclusively on this flag, so that it can be merged in
// getKeywordStatus. Most should be enabled/disabled, but some might imply
@@ -222,7 +171,7 @@ static KeywordStatus getKeywordStatusHelper(const LangOptions &LangOpts,
/// Translates flags as specified in TokenKinds.def into keyword status
/// in the given language standard.
-static KeywordStatus getKeywordStatus(const LangOptions &LangOpts,
+KeywordStatus clang::getKeywordStatus(const LangOptions &LangOpts,
unsigned Flags) {
// KEYALL means always enabled, so special case this one.
if (Flags == KEYALL) return KS_Enabled;
diff --git a/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp b/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
index d8c095d6edeb7..13cea5e513005 100644
--- a/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
+++ b/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
@@ -719,9 +719,8 @@ CPlusPlusNameParser::ParseFullNameImpl() {
}
start_position.Remove();
return result;
- } else {
- return std::nullopt;
}
+ return std::nullopt;
}
llvm::StringRef CPlusPlusNameParser::GetTextForRange(const Range &range) {
@@ -755,12 +754,19 @@ static const clang::LangOptions &GetLangOptions() {
return g_options;
}
-static const llvm::StringMap<tok::TokenKind> &GetKeywordsMap() {
- static llvm::StringMap<tok::TokenKind> g_map{
-#define KEYWORD(Name, Flags) {llvm::StringRef(#Name), tok::kw_##Name},
+static const llvm::StringMap<tok::TokenKind> GetKeywordsMap() {
+ using namespace clang::TokenKeyEnumerators;
+ llvm::StringMap<tok::TokenKind> g_map;
+
+ auto LangOpts = GetLangOptions();
+
+ clang::KeywordStatus AddResult;
+#define KEYWORD(NAME, FLAGS) \
+ AddResult = getKeywordStatus(LangOpts, FLAGS); \
+ if (AddResult != clang::KS_Disabled) \
+ g_map[llvm::StringRef(#NAME)] = tok::kw_##NAME;
#include "clang/Basic/TokenKinds.def"
#undef KEYWORD
- };
return g_map;
}
@@ -769,7 +775,7 @@ void CPlusPlusNameParser::ExtractTokens() {
return;
clang::Lexer lexer(clang::SourceLocation(), GetLangOptions(), m_text.data(),
m_text.data(), m_text.data() + m_text.size());
- const auto &kw_map = GetKeywordsMap();
+ const auto kw_map = GetKeywordsMap();
clang::Token token;
for (lexer.LexFromRawLexer(token); !token.is(clang::tok::eof);
lexer.LexFromRawLexer(token)) {
|
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions h,cpp -- clang/include/clang/Basic/IdentifierTable.h clang/lib/Basic/IdentifierTable.cpp lldb/source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp --diff_from_common_commit
View the diff from clang-format here.diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 54500b95b..7062aea31 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -92,8 +92,7 @@ enum KeywordStatus {
KS_Enabled, // Enabled
};
-KeywordStatus getKeywordStatus(const LangOptions &LangOpts,
- unsigned Flags);
+KeywordStatus getKeywordStatus(const LangOptions &LangOpts, unsigned Flags);
enum class ReservedIdentifierStatus {
NotReserved = 0,
|
| @@ -46,6 +46,59 @@ class LangOptions; | |||
| class MultiKeywordSelector; | |||
| class SourceLocation; | |||
|
|
|||
| // This is an inline namespace to allow both clang and lldb to use them without | |||
| // namespace prefixes (via `using namespace` in lldb's case). | |||
| inline namespace TokenKeyEnumerators { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make the Clang changes in a separate PR. I don't think you need the inline namespace either. Can we just follow the convention of the other enums in this header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind splitting it but that PR won't make sense without the context of this PR
The reason we need the inline namespace is because of this code:
#define KEYWORD(NAME, FLAGS) \
AddResult = getKeywordStatus(LangOpts, FLAGS); \
...
#include "clang/Basic/TokenKinds.def"
I can't reference FLAGS as clang::FLAGS because some of the KEYWORD() declaration's in TokenKinds.def are the | of multiple values such as BOOLSUPPORT|KEYC23 and I'd need the namespace to appear on each identifier. enum class is ruled out for the same reason, and enum class combined with using enum ... requires C++20 and therefore can't be used.
inline namespace allows lldb to import them with using namespace ... for use as bare names without changing how clang uses them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind splitting it but that PR won't make sense without the context of this PR
I don't have a huge preference. Typically we split out cross-project changes into separate PRs. It makes the area maintainers more likely to review it in time (just based filtering email subjects) and has the benefit of making the main non-NFC PR smaller. You can always just reference this PR as a motivating example.
I can't reference FLAGS as clang::FLAGS because some of the KEYWORD() declaration's in TokenKinds.def are the | of multiple values such as BOOLSUPPORT|KEYC23 and I'd need the namespace to appear on each identifier. enum class is ruled out for the same reason, and enum class combined with using enum ... requires C++20 and therefore can't be used.
Can't we just do using namespace clang inside that LLDB function then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated out the clang portion into #165323. The commits are still in this PR as it's a dependency for this to work
Can't we just do using namespace clang inside that LLDB function then?
I've switched it to using namespace clang. I assume lldb is generally not doing that because of name collisions and wanting to know when it's using other components but this is at least scoped to a small function so it should still be fine from that perspective
|
I should mention which test was broken. It was test/Shell/SymbolFile/DWARF/x86/debug-names-static-constexpr-member.s |
…usPlusNameParser.cpp" This is just to allow me to separate the clang and lldb parts without breaking the github PR's comment history
…db to re-use it lldb's CPlusPlusNameParser is currently identifying keywords using it's own map implemented using clang/Basic/TokenKinds.def. However, it does not respect the language options so identifiers can incorrectly determined to be keywords when using languages in which they are not keywords. Rather than implement the logic to enable/disable keywords in both projects it makes sense for lldb to use clang's implementation. See llvm#164284 for more information
…meParser.cpp The problem is that CPlusPlusNameParser doesn't respect the language dialect for `target variable Class::constant`. If `constant` is a keyword in some (downstream in this case) dialects then it lexes `constant` as `kw_constant` instead of `kw_raw_identifier` even if that dialect is not active. As a result: * `target variable constant` works * `target variable Class::constant` fails to look up constant because it sees `kw_raw_identifier, kw_coloncolon, kw_constant` instead of `kw_raw_identifier, kw_coloncolon, kw_raw_identifier` * `expr -l c++ -- Class::constant` works because clang is parsing it Note: There's seemingly a separate bug where GetLangOptions() does not account for the process/frame language and just uses a pre-defined set of language options. I have not attempted to fix that since, for now, at least hardcoding my dialect to be disabled by that function does the right thing for existing tests. Also fixed a trivial return-after-else that clangd pointed out in the same area
…db to re-use it lldb's CPlusPlusNameParser is currently identifying keywords using it's own map implemented using clang/Basic/TokenKinds.def. However, it does not respect the language options so identifiers can incorrectly determined to be keywords when using languages in which they are not keywords. Rather than implement the logic to enable/disable keywords in both projects it makes sense for lldb to use clang's implementation. See llvm#164284 for more information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor nit, but could we move this using namespace down to just before the macro. Just so it gives the reader some indication that we may be using it for the include machinery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lldb/unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp has some unit-tests for CPlusPlusNameParser (and friends). If we added a constructor to CPlusPlusNameParser which took a user-provided LangOptions (but defaults to the existing one) that we could add a unit-test which constructs the CPlusPlusNameParser with some LangOptions which triggers the issue you were seeing. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate we have to reconstruct this StringMap every time. Do you have an intuition for how expensive that is?
Why can't we construct the map statically? The LangOptions never change right? So you could just wrap all of this new logic in an immediately-invoked-lambda and still make the g_map static. Wdyt?
The problem is that CPlusPlusNameParser doesn't respect the language dialect for
target variable Class::constant. Ifconstantis a keyword in some (downstream in this case) dialects then it lexesconstantaskw_constantinstead ofkw_raw_identifiereven if that dialect is not active. As a result:target variable constantworkstarget variable Class::constantfails to look up constant because it seeskw_raw_identifier, kw_coloncolon, kw_constantinstead ofkw_raw_identifier, kw_coloncolon, kw_raw_identifierexpr -l c++ -- Class::constantworks because clang is parsing itNote: There's seemingly a separate bug where GetLangOptions() does not account for the process/frame language and just uses a pre-defined set of language options. I have not attempted to fix that since, for now, at least hardcoding my dialect to be disabled by that function does the right thing for existing tests.
Also fixed a trivial return-after-else that clangd pointed out in the same area