Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clangd] Language server gets stuck parsing comments with # in asm files #62090

Closed
Ferdi265 opened this issue Apr 12, 2023 · 11 comments
Closed
Labels
clangd hang Compiler hang (infinite loop)

Comments

@Ferdi265
Copy link
Member

Ferdi265 commented Apr 12, 2023

When parsing .S assembler files that contain comments starting with # , clangd gets stuck in clang::syntax::TokenCollector::Builder::build() at 100% CPU usage.

This would be an invalid preprocessing directive in a .c file, but is ignored by the preprocessor in .S files as it is just another comment syntax on some targets. Somehow this trips up the clangd parser and it gets stuck forever. Some targets even have this as the only available comment syntax (; comments are not accepted on e.g. Arm, Xtensa)

Example that doesn't hang:

main:
#define comment
    bx lr

Example that hangs:

main:
    # test comment
    bx lr

Clang version: 15.0.7 (Arch Linux), target: arm-none-eabi, native target: x86_64-pc-linux-gnu

@Ferdi265
Copy link
Member Author

Stack trace:

#0  clang::syntax::TokenCollector::Builder::build() && () at /usr/src/debug/clang/clang-15.0.7.src/lib/Tooling/Syntax/Tokens.cpp:781
#1  clang::syntax::TokenCollector::consume() && () at /usr/src/debug/clang/clang-15.0.7.src/lib/Tooling/Syntax/Tokens.cpp:948
#2  0x000055e421e67d11 in clang::clangd::ParsedAST::build ()
    at /usr/src/debug/clang/clang-15.0.7.src/tools/extra/clangd/ParsedAST.cpp:630
#3  0x000055e421eb29d3 in generateDiagnostics () at /usr/src/debug/clang/clang-15.0.7.src/tools/extra/clangd/TUScheduler.cpp:1186
#4  0x000055e421eb370a in clang::clangd::(anonymous namespace)::ASTWorker::updatePreamble(std::unique_ptr<clang::CompilerInvocation, std::default_delete<clang::CompilerInvocation> >, clang::clangd::ParseInputs, std::shared_ptr<clang::clangd::PreambleData const>, std::vector<clang::clangd::Diag, std::allocator<clang::clangd::Diag> >, clang::clangd::WantDiagnostics)::{lambda()#1}::operator()() [clone .part.0] [clone .lto_priv.0] () at /usr/src/debug/clang/clang-15.0.7.src/tools/extra/clangd/TUScheduler.cpp:1119
#5  0x000055e4229129bd in llvm::unique_function<void ()>::operator()() () at /usr/include/llvm/ADT/FunctionExtras.h:384
#6  llvm::function_ref<void ()>::callback_fn<llvm::unique_function<void ()> >(long) ()
    at /usr/include/llvm/ADT/STLFunctionalExtras.h:45
#7  llvm::function_ref<void ()>::operator()() const () at /usr/include/llvm/ADT/STLFunctionalExtras.h:68
#8  clang::clangd::(anonymous namespace)::ASTWorker::runTask(llvm::StringRef, llvm::function_ref<void ()>) [clone .constprop.0] ()
    at /usr/src/debug/clang/clang-15.0.7.src/tools/extra/clangd/TUScheduler.cpp:1299
#9  0x000055e421eac12c in run () at /usr/src/debug/clang/clang-15.0.7.src/tools/extra/clangd/TUScheduler.cpp:1432
#10 0x000055e422004187 in llvm::unique_function<void ()>::operator()() () at /usr/include/llvm/ADT/FunctionExtras.h:384
#11 operator() () at /usr/src/debug/clang/clang-15.0.7.src/tools/extra/clangd/support/Threading.cpp:100
#12 Apply<clang::clangd::AsyncTaskRunner::runAsync(const llvm::Twine&, llvm::unique_function<void()>)::<lambda()> > ()
    at /usr/include/llvm/Support/thread.h:42
#13 GenericThreadProxy<std::tuple<clang::clangd::AsyncTaskRunner::runAsync(const llvm::Twine&, llvm::unique_function<void()>)::<lambda()> > > () at /usr/include/llvm/Support/thread.h:50
#14 ThreadProxy<std::tuple<clang::clangd::AsyncTaskRunner::runAsync(const llvm::Twine&, llvm::unique_function<void()>)::<lambda()> > >(void) () at /usr/include/llvm/Support/thread.h:60
#15 0x00007f0fe469ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#16 0x00007f0fe4720d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@Ferdi265
Copy link
Member Author

potentially related to #58482, since it also involves the preprocessor and clangd getting stuck, but the stack trace looks different

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 12, 2023

@llvm/issue-subscribers-clangd

@HighCommander4
Copy link
Collaborator

Clangd does not support assembly source files.

What editor are you using? The editor should not be trying to use clangd as the language server for .S files.

@Ferdi265
Copy link
Member Author

Hmm, I think I might have misconfigured something then. I'm using a rather custom setup. I initially enabled clangd on .S files mostly for getting macro expansion and go-to-definition on #include directives. Though I still think that, supported or not, clangd getting stuck and using 100% CPU is a bug.

I will see if I find the root cause of the issue on my own and will check if either I find a reproducer on a supported input file format or find a way to make clangd more robust in this case (i.e., just error out and fail in case the file seems invalid).

I totally understand if this issue is closed as unsupported/wontfix, since it is not a supported file format.

@HighCommander4
Copy link
Collaborator

I'm sympathetic to the use case of browsing an assembly file in an editor with features like go-to-def on #include directives. This is a feature I've seen in some other C/C++ editors (such as Eclipse CDT), and it seems to me that it would be a reasonable fit for clangd to add such a feature too. It's just not a feature it has today.

If we were to add that feature, it would involve feeding the assembly source code to an assembly parser (possibly the same one used by the actual assembler during building), not to the C/C++ parser.

I agree that ideally the C/C++ parser should gracefully (i.e. without crashing or hanging) handle any invalid input, and assembly source is a type of invalid input, but I think our time is better spent exploring a solution that doesn't try to feed assembly code to the C/C++ parser in the first place.

@Ferdi265
Copy link
Member Author

I think another simpler solution — instead of a full-blown assembly parser — would be a pure preprocessing parser, that gets out of the way of any of the rest of the syntax and just parses the # directives (if any).

@HighCommander4
Copy link
Collaborator

I think another simpler solution — instead of a full-blown assembly parser — would be a pure preprocessing parser, that gets out of the way of any of the rest of the syntax and just parses the # directives (if any).

I agree, this approach sounds promising.

@HighCommander4
Copy link
Collaborator

By the way, I reproduced the issue with the vscode client by tricking the client into using clangd as the language server for .S files by adding the following to my vscode config:

"files.associations": {
  "*.S": "cpp"
}

Note that this is only needed so the .S file passes the vscode client's filter for which files to pass to clangd -- the client doesn't actually convey information about the file's language to clangd. Clangd determines the language itself based on the extension and any -x flag in the compile command (which takes precedence over the extension).

In my case, I did not add any -x flag (for example, -x c++) to the compile command, so the only information clangd had about the file type was the extension, .S -- and yet, clangd tried to parse this file as C/C++.

So, I think there is something to fix in clangd here: if it gets a file whose extension looks like assembly (such as .S), and this is not overridden by a -x flag, it should bail out of trying to parse the file (until some sort of support for assembly files is added).

@HighCommander4
Copy link
Collaborator

So, I think there is something to fix in clangd here: if it gets a file whose extension looks like assembly (such as .S), and this is not overridden by a -x flag, it should bail out of trying to parse the file (until some sort of support for assembly files is added).

Posted https://reviews.llvm.org/D149236 to fix this.

razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
The previous behaviour is to try to parse such files, and in some
cases assert or hang in components that don't expect these forms of
input, like TokenBuffer.

Fixes llvm#62090

Differential Revision: https://reviews.llvm.org/D149236
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
The previous behaviour is to try to parse such files, and in some
cases assert or hang in components that don't expect these forms of
input, like TokenBuffer.

Fixes llvm#62090

Differential Revision: https://reviews.llvm.org/D149236
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
The previous behaviour is to try to parse such files, and in some
cases assert or hang in components that don't expect these forms of
input, like TokenBuffer.

Fixes llvm#62090

Differential Revision: https://reviews.llvm.org/D149236
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 3, 2023
The previous behaviour is to try to parse such files, and in some
cases assert or hang in components that don't expect these forms of
input, like TokenBuffer.

Fixes llvm#62090

Differential Revision: https://reviews.llvm.org/D149236
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 3, 2023
The previous behaviour is to try to parse such files, and in some
cases assert or hang in components that don't expect these forms of
input, like TokenBuffer.

Fixes llvm#62090

Differential Revision: https://reviews.llvm.org/D149236
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 6, 2023
The previous behaviour is to try to parse such files, and in some
cases assert or hang in components that don't expect these forms of
input, like TokenBuffer.

Fixes llvm#62090

Differential Revision: https://reviews.llvm.org/D149236
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 11, 2023
The previous behaviour is to try to parse such files, and in some
cases assert or hang in components that don't expect these forms of
input, like TokenBuffer.

Fixes llvm#62090

Differential Revision: https://reviews.llvm.org/D149236
@HighCommander4
Copy link
Collaborator

I'm sympathetic to the use case of browsing an assembly file in an editor with features like go-to-def on #include directives. This is a feature I've seen in some other C/C++ editors (such as Eclipse CDT), and it seems to me that it would be a reasonable fit for clangd to add such a feature too. It's just not a feature it has today.

If we were to add that feature, it would involve feeding the assembly source code to an assembly parser (possibly the same one used by the actual assembler during building), not to the C/C++ parser.

It looks like someone has prototyped proper assembly support in clangd: https://discourse.llvm.org/t/prototype-assembly-support-for-clangd/74417

@EugeneZelenko EugeneZelenko added the hang Compiler hang (infinite loop) label Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clangd hang Compiler hang (infinite loop)
Projects
None yet
Development

No branches or pull requests

4 participants