Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using clang as CPP makes unexpected indentation changes #78778

Open
trini opened this issue Jan 19, 2024 · 10 comments
Open

Using clang as CPP makes unexpected indentation changes #78778

trini opened this issue Jan 19, 2024 · 10 comments
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema"

Comments

@trini
Copy link

trini commented Jan 19, 2024

For background, over in U-Boot we use the same "Kconfig" language as the Linux kernel, and so have a config file for a build that is defined with CONFIG_FOO=y and # CONFIG_BAR is not set. What we're doing now is allowing for a config file to #include another valid config file fragment, and using $(CC) -E to handle the include directive.

When using GCC, this works as expected and nothing about the files is changed. However, with clang-16 # CONFIG_FOO is not set becomes # CONFIG_FOO is not set, which is invalid for the Kconfig language. I can pass in -traditional-cpp as another flag and in some cases this will keep the expected behavior, but in others, it does not.

I've put an example of what I'm doing at: https://gist.github.com/trini/a9fec05a1010611f287661e58b17cbd0 which shows the trivial case. Using the starfive_visionfive2_defconfig file as non-trivial testcase shows that even with -traditional-cpp we still get some whitespace changes.

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Jan 19, 2024
@EugeneZelenko EugeneZelenko added clang:frontend Language frontend issues, e.g. anything involving "Sema" and removed clang Clang issues not falling into any other category labels Jan 19, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 19, 2024

@llvm/issue-subscribers-clang-frontend

Author: Tom Rini (trini)

For background, over in [U-Boot](https://docs.u-boot.org/) we use the same "Kconfig" language as the Linux kernel, and so have a config file for a build that is defined with `CONFIG_FOO=y` and `# CONFIG_BAR is not set`. What we're doing now is allowing for a config file to `#include` another valid config file fragment, and using `$(CC) -E` to handle the include directive.

When using GCC, this works as expected and nothing about the files is changed. However, with clang-16 # CONFIG_FOO is not set becomes # CONFIG_FOO is not set, which is invalid for the Kconfig language. I can pass in -traditional-cpp as another flag and in some cases this will keep the expected behavior, but in others, it does not.

I've put an example of what I'm doing at: https://gist.github.com/trini/a9fec05a1010611f287661e58b17cbd0 which shows the trivial case. Using the starfive_visionfive2_defconfig file as non-trivial testcase shows that even with -traditional-cpp we still get some whitespace changes.

@shafik
Copy link
Collaborator

shafik commented Jan 19, 2024

CC @cor3ntin @tahonermann

@tahonermann
Copy link
Contributor

Hi @trini. I was able to reproduce the reported issue.

I spent a few minutes looking at this, but was unable to quickly identify where the leading space character is coming from. Any chance that you are able to investigate further? Alternatively, you might consider use of a sed script or similar to discard unwanted whitespace.

@trini
Copy link
Author

trini commented Jan 23, 2024

Sorry @tahonermann but I have no idea where to start on the llvm side of the issue. I can work around the issue, yes, but I do worry about other cases where cpp is (ab)used and white space matters.

@trini
Copy link
Author

trini commented Feb 15, 2024

Having just hit a much larger example of the problem, I'm just checking in to see if looking further in to this is on anyones radar, it sounds like it's unfortunately not clear where the problem might be and so possibly a difficult fix? Thanks.

@AaronBallman
Copy link
Collaborator

The leading whitespace is added by this block:

if (ColNo <= 1 && Tok.is(tok::hash))

It looks like that hack may need to be improved.

@tahonermann
Copy link
Contributor

That hack goes back to at least 2006. Git history stops at 22eb972.

@trini
Copy link
Author

trini commented Feb 15, 2024

Yeah, I'm not saying this is a recent regression in llvm, to be clear. A long-standing but just now problematic for someone situation is reasonable.

@tahonermann
Copy link
Contributor

For ease of reference, here is the hack Aaron linked to:

clang/lib/Frontend/PrintPreprocessedOutput.cpp:
 703       // This hack prevents stuff like:
 704       // #define HASH #
 705       // HASH define foo bar
 706       // From having the # character end up at column 1, which makes it so it
 707       // is not handled as a #define next time through the preprocessor if in
 708       // -fpreprocessed mode.
 709       if (ColNo <= 1 && Tok.is(tok::hash))
 710         *OS << ' ';

I played around a bit and discovered the following interesting things.

  1. Clang doesn't support a -fpreprocessed option as either a driver or a -cc1 option.
  2. Gcc does support a -fpreprocessed option and its behavior is sensitive to the hack.
  3. Clang expands macros when compiling .i files or when invoked with -x cpp-output; gcc does not.

The following demonstrates how gcc's behavior changes with the hack in place. Note that -fpreprocessed is implied when compiling a .i, .ii, or .mi file and -x isn't used.

$ cat t.i
# define X foo
int i = X;
 # define Y bar
int j = Y;

$ gcc -c t.i
t.i:2:9: error: ‘X’ undeclared here (not in a function)
    2 | int i = X;
      |         ^
t.i:3:2: error: stray ‘#’ in program
    3 |  # define Y bar
      |  ^
t.i:3:4: error: unknown type name ‘define’
    3 |  # define Y bar
      |    ^~~~~~
t.i:3:13: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘bar’
    3 |  # define Y bar
      |             ^~~

The first define directive is ignored and X is not expanded. The second define directive is not ignored, but is not treated as a directive either.

In contrast, Clang processes both define directives as directives and performs macro expansion.

$ clang -c t.i
t.i:2:9: error: use of undeclared identifier 'foo'
    2 | int i = X;
      |         ^
t.i:1:12: note: expanded from macro 'X'
    1 | # define X foo
      |            ^
t.i:4:9: error: use of undeclared identifier 'bar'
    4 | int j = Y;
      |         ^
t.i:3:13: note: expanded from macro 'Y'
    3 |  # define Y bar
      |             ^
2 errors generated.

My conclusions:

  1. This hack appears to date back to a time when someone (presumably Chris Lattner) was using Clang to generate preprocessed output that was then consumed by gcc. Perhaps this was part of very early Clang development when the preprocessor was being developed and fed to gcc to validate compilation.
  2. The hack is probably no longer needed.
  3. Clang should probably emulate gcc's -fpreprocessed behavior by ignoring preprocessor directives and suppressing macro expansion when processing files named .i, .ii, or .mi or when invoked with -x cpp-output or similar.

@tahonermann
Copy link
Contributor

I locally removed the hack and ran the Clang testsuite. The following tests failed:

  • clang/test/Preprocessor/hash_space.c
  • clang/test/Preprocessor/print-assembler.s

The hash_space.c test exists specifically to test this hack, so removal is appropriate.

The print-assembler.s test is more interesting. It was introduced by commit 400a64e. This needs more investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema"
Projects
None yet
Development

No branches or pull requests

6 participants