Inconsistent deduplication of string literals #60476

Closed
slonopotamus opened this issue Feb 2, 2023 · 5 comments
Labels
llvm:optimizations, question

Comments

@slonopotamus
slonopotamus commented Feb 2, 2023

Given the following code

#include <stdio.h>

static const char a[] = "bla";
static const char b[] = "bla";
static const char* const c = "bla";

int main() {
    puts(a);
    puts(b);
    puts(c);
}

and compiling it at -O1 or higher, I would expect either one or three copies of the string in the object file: one if this falls under "String literals, and compound literals with const-qualified types, need not designate distinct objects." from the standard, and three if it doesn't. What clang 15.0.0 actually produces is two:

main:                                   # @main
        push    rbx
        lea     rbx, [rip + .L.str]
        mov     rdi, rbx
        call    puts@PLT
        lea     rdi, [rip + _ZL1b]
        call    puts@PLT
        mov     rdi, rbx
        call    puts@PLT
        xor     eax, eax
        pop     rbx
        ret
_ZL1b:
        .asciz  "bla"

.L.str:
        .asciz  "bla"

So clang considers it safe to merge a and c, but not to place b at the same address. Is this a misbehaving optimization, or am I misunderstanding something?

Also, there is an ongoing discussion on the GCC bug tracker about whether merging a and c is valid at all or whether it violates the standard. GCC doesn't merge anything in this test case and emits three literals.

Godbolt playground: https://godbolt.org/z/Td67dfrEE

@slonopotamus
Author

Also, quote from C17 standard:

Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. (6.5.2.5 p 13).

@efriedma-quic
Collaborator

"a" and "b" are global variables, and global variables can't be merged with each other. The initializers aren't "string literals" in the sense that's relevant here; "bla" is just a shortcut for { 'b', 'l', 'a', '\0' }.

clang interprets "need not designate distinct objects" to mean that string literals can be merged with anything, including variables.

@slonopotamus
Author

Okay, but in what case does "const-qualified compound literals can even be shared" apply then?

@efriedma-quic
Collaborator

efriedma-quic commented Feb 10, 2023

A "const-qualified compound literal" is something like const int* x = (const int[]){1,2,3};.

@slonopotamus
Author

Okay, closing. Thanks for the explanations.

EugeneZelenko added the question label Feb 10, 2023