Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concatenation operator ".=" does not work for &statusline with GCC 12 Release build #19125

Closed
giddie opened this issue Jun 27, 2022 · 24 comments
Closed
Labels
bug issues reporting wrong behavior build building and installing Neovim using the provided scripts
Milestone

Comments

@giddie
Copy link

giddie commented Jun 27, 2022

Neovim version (nvim -v)

NVIM v0.7.2

Vim (not Nvim) behaves the same?

no, vim 8.2.5046

Operating system/version

ArchLinux

Terminal name/version

Konsole 22.04.2 / TMux 3.3_a

$TERM environment variable

screen-256color

Installation

ArchLinux community repository

How to reproduce the issue

nvim --clean
:let &stl = "hello"
:let &stl .= " world"

Expected behavior

Status line should read: hello world

Actual behavior

Status line reads: world

@giddie giddie added the bug issues reporting wrong behavior label Jun 27, 2022
@clason
Copy link
Member

clason commented Jun 27, 2022

Works for me. (Sorry, not sure what else I can say here.)

@zeertzjq zeertzjq added bug-regression wrong behavior that was introduced in a previous commit (please bisect) bug issues reporting wrong behavior distribution packaging and distributing Nvim to users and removed bug issues reporting wrong behavior bug-regression wrong behavior that was introduced in a previous commit (please bisect) labels Jun 27, 2022
@zeertzjq
Copy link
Member

zeertzjq commented Jun 27, 2022

Seems only reproducible with Arch Linux community package, not with appimage.

@zeertzjq zeertzjq added build building and installing Neovim using the provided scripts distribution packaging and distributing Nvim to users and removed distribution packaging and distributing Nvim to users labels Jun 27, 2022
@zeertzjq
Copy link
Member

It's very strange that this bug is only present in Release build, but not in RelWithDebInfo or Debug build, so this may be a bug in GCC 12.

charmander added a commit to charmander/conf that referenced this issue Jun 28, 2022
including a fix for the latest Arch Linux version when used with <https://github.com/equalsraf/neovim-qt>, where `let &rtp.=,…` appears to have stopped working (neovim/neovim#19125).
@zeertzjq
Copy link
Member

zeertzjq commented Jun 28, 2022

Using objdump, I found a problem:

At the beginning of ex_let_one(), arg is stored in %rdi, and is copied to both %r14 and 0x28(%rsp):

  112070:	41 57                	push   %r15
  112072:	41 89 cf             	mov    %ecx,%r15d
  112075:	41 56                	push   %r14
  112077:	49 89 fe             	mov    %rdi,%r14
  11207a:	41 55                	push   %r13
  11207c:	49 89 f5             	mov    %rsi,%r13
  11207f:	41 54                	push   %r12
  112081:	4d 89 c4             	mov    %r8,%r12
  112084:	55                   	push   %rbp
  112085:	4c 89 cd             	mov    %r9,%rbp
  112088:	53                   	push   %rbx
  112089:	48 81 ec d8 00 00 00 	sub    $0xd8,%rsp
  112090:	48 89 7c 24 28       	mov    %rdi,0x28(%rsp)
  112095:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax

find_option_end() modifies the value stored in 0x28(%rsp), while the value in %r14 is still the initial value of arg:

  1121e4:	48 8d 74 24 34       	lea    0x34(%rsp),%rsi
  1121e9:	48 8d 7c 24 28       	lea    0x28(%rsp),%rdi
  1121ee:	e8 8d f3 00 00       	call   121580 <find_option_end.lto_priv.0>

%r14 instead of 0x28(%rsp) is passed to get_option_value() here, causing get_option_value() to fail to find the option:

  1126e6:	8b 4c 24 34          	mov    0x34(%rsp),%ecx
  1126ea:	0f 84 b0 fb ff ff    	je     1122a0 <ex_let_one+0x230>
  1126f0:	4c 89 44 24 18       	mov    %r8,0x18(%rsp)
  1126f5:	48 8d 54 24 38       	lea    0x38(%rsp),%rdx
  1126fa:	4c 89 fe             	mov    %r15,%rsi
  1126fd:	4c 89 f7             	mov    %r14,%rdi
  112700:	89 4c 24 08          	mov    %ecx,0x8(%rsp)
  112704:	48 89 44 24 10       	mov    %rax,0x10(%rsp)
  112709:	67 e8 71 2f 14 00    	addr32 call 255680 <get_option_value>

Meanwhile, 0x28(%rsp) is passed correctly to set_option_value():

  112296:	8b 4c 24 34          	mov    0x34(%rsp),%ecx
  11229a:	45 31 d2             	xor    %r10d,%r10d
  11229d:	45 31 c9             	xor    %r9d,%r9d
  1122a0:	4c 89 54 24 08       	mov    %r10,0x8(%rsp)
  1122a5:	48 8b 7c 24 28       	mov    0x28(%rsp),%rdi
  1122aa:	4c 89 ca             	mov    %r9,%rdx
  1122ad:	4c 89 c6             	mov    %r8,%rsi
  1122b0:	67 e8 4a 4f 14 00    	addr32 call 257200 <set_option_value>

@zeertzjq
Copy link
Member

What's stranger is that if I pass the value of p to an external function in find_option_end(), this bug no longer happens:

diff --git a/src/nvim/eval.c b/src/nvim/eval.c
index 7b6e954b3..732837f4f 100644
--- a/src/nvim/eval.c
+++ b/src/nvim/eval.c
@@ -9547,6 +9547,7 @@ static const char *find_option_end(const char **const arg, int *const opt_flags)
     return NULL;
   }
   *arg = p;
+  (void)_(p);
 
   if (p[0] == 't' && p[1] == '_' && p[2] != NUL && p[3] != NUL) {
     p += 4;  // t_xx/termcap option

@heftig
Copy link
Contributor

heftig commented Jun 28, 2022

My setup broke because I used let &runtimepath .= and got hit with this bug.

Building with -fno-strict-aliasing helps. Maybe because char and char_u shouldn't alias?

Looks like there's been some work to replace char_u with char. Is this still broken on master?

archlinux-github pushed a commit to archlinux/svntogit-community that referenced this issue Jun 28, 2022
git-svn-id: file:///srv/repos/svn-community/svn@1239811 9fca08f4-af9d-4005-b8df-a31f2cc04f65
archlinux-github pushed a commit to archlinux/svntogit-community that referenced this issue Jun 28, 2022
git-svn-id: file:///srv/repos/svn-community/svn@1239811 9fca08f4-af9d-4005-b8df-a31f2cc04f65
@zeertzjq zeertzjq added the has:workaround issue is not fixed but can be circumvented until then label Jun 28, 2022
@zeertzjq zeertzjq changed the title Concatenation operator ".=" does not work for &statusline Concatenation operator ".=" does not work for &statusline with GCC 12 Release build Jun 28, 2022
@bfredl
Copy link
Member

bfredl commented Jun 28, 2022

char_u and char are allowed to alias. Seems like a compiler bug.

@foonathan
Copy link

I've ran into the same issue earlier today.

char_u and char are allowed to alias. Seems like a compiler bug.

Yes, char_u and char are allowed to alias, but we're casting char_u** to char**, which are unrelated pointer types.

@bfredl
Copy link
Member

bfredl commented Jun 29, 2022

we're casting char_u** to char**, which are unrelated pointer types.

Both are addresses of pointers to arbitrary bytes of memory, they are pretty much as related as two pointer types ever could be. But it seems even byte-level addressing is something that GCC wants to fuck up because some footnote to a footnote in the standard could be interpreted as allowing to fuck it up.

@heftig
Copy link
Contributor

heftig commented Jun 29, 2022

I believe type compatibility is preserved by pointers, i.e. if T and U are compatible, then T* and U* are compatible, and thus T** and U** are compatible, etc.

That said, if you don't want to worry about the standard's aliasing rules you should add the -fno-strict-aliasing to the compiler flags.

@foonathan
Copy link

foonathan commented Jun 29, 2022

I believe type compatibility is preserved by pointers, i.e. if T and U are compatible, then T* and U* are compatible, and thus T** and U** are compatible, etc.

Type compatibility isn't relevant here, strict aliasing is. The code tries to access a char_u* via a pointer to char*, which is undefined behavior: https://eel.is/c++draft/basic.lval

That said, if you don't want to worry about the standard's aliasing rules you should add the -fno-strict-aliasing to the compiler flags.

Can the flag be added to the build system by default? Because otherwise, the compiler is pretty much allowed to do that, and getting a ton of unknown option &foo errors on startup because every single let &foo = bar is broken, having to debug that, and then having to rebuild, isn't great user experience.

@heftig
Copy link
Contributor

heftig commented Jun 29, 2022

Type compatibility isn't relevant here, strict aliasing is. The code tries to access a char_u* via a pointer to char*, which is undefined behavior: https://eel.is/c++draft/basic.lval

That's C++, not C. The languages have very different rules. See https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 for a pretty good overview.

@foonathan
Copy link

Ahh, it's a C source file, my bad.

I believe type compatibility is preserved by pointers, i.e. if T and U are compatible, then T* and U* are compatible, and thus T** and U** are compatible, etc.

Then this applies, but char and unsigned char are not compatible types (C11 standard, section 6.2.7 - it's copyrighted, so I can only link to cppreference: https://en.cppreference.com/w/c/language/type#Compatible_types).

@heftig
Copy link
Contributor

heftig commented Jun 29, 2022

Look for document N2176; it's the final draft of the C17 standard and publicly available.

And yeah, you're right. The types aren't compatible.

@justinmk
Copy link
Member

justinmk commented Jun 29, 2022

So it looks like this is a plan:

  1. set -fno-strict-aliasing for now, to fix the build (backport to 0.7.x)
  2. in order to remove -fno-strict-aliasing, we need these changes:
    • anywhere we are casting to (char **), change the char_u** source to char** and remove the cast
    • anywhere we are casting to (char_u **), change the char_u** target to char** and remove the cast

Added wiki page: https://github.com/neovim/neovim/wiki/Removing-char_u

@justinmk
Copy link
Member

justinmk commented Jun 29, 2022

Look for document N2176; it's the final draft of the C17 standard and publicly available

idk about N2176, but here's an older version: http://port70.net/~nsz/c/c99/n1256.html#6.5p7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types
...
a type that is the signed or unsigned type corresponding to the effective type of the object,

That seems to allow char_u <=> char, no?

Though this part is interesting:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.[72)](http://port70.net/~nsz/c/c99/n1256.html#note72) Furthermore, the prior value shall be read only to determine the value to be stored.[73)](http://port70.net/~nsz/c/c99/n1256.html#note73)

This renders undefined statement expressions such as

            i = ++i + 1;
            a[i++] = i;
while allowing
            i = i + 1;
            a[i] = i;

@heftig
Copy link
Contributor

heftig commented Jun 29, 2022

That seems to allow char_u <=> char, no?

Yes, but only directly: You can access a char via a char_u*. Accessing a char* via a char_u** remains undefined.

char* is a pointer and has no "corresponding unsigned type".

@justinmk
Copy link
Member

justinmk commented Jun 29, 2022

Yes, but only directly: You can access a char via a char_u*. Accessing a char* via a char_u** remains undefined.

Ah, undefined here literally means "they don't address it at all in the standard"? That's hilarious. So the way to figure out how C works is to comb through every sentence in the standard and then decide if the exact thing you want to do was or wasn't explicitly mentioned.

@bfredl
Copy link
Member

bfredl commented Jun 29, 2022

Yes, anything which is not specified in exact detail in the standard in enough lawyer-proof language can be reinterpreted in any new compiler version to screw over working code that more or less universally was accepted by all existing compilers.

@foonathan
Copy link

An object shall have its stored value accessed only by an lvalue expression that has one of the following types
...
a type that is the signed or unsigned type corresponding to the effective type of the object,

That seems to allow char_u <=> char, no?

char_u <=> char is allowed, but only because char_u can alias anything, not because of the quoted line. The signed type corresponding to unsigned char isn't char, it's signed char. char, signed char, and unsigned char are three separate types (but int and signed int is the same type). C's weird.

@giddie

This comment was marked as off-topic.

@justinmk
Copy link
Member

@foonathan thanks for that note, added to https://github.com/neovim/neovim/wiki/Removing-char_u

@zeertzjq zeertzjq added this to the 0.7.3 milestone Jun 30, 2022
@justinmk
Copy link
Member

Is this still an issue on latest nightly?

@zeertzjq
Copy link
Member

IIRC it has been fixed by a PR removing char_u.

@zeertzjq zeertzjq modified the milestones: 0.7.3, 0.8 Sep 5, 2022
@zeertzjq zeertzjq removed the has:workaround issue is not fixed but can be circumvented until then label Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issues reporting wrong behavior build building and installing Neovim using the provided scripts
Projects
None yet
Development

No branches or pull requests

7 participants