Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

const char* string literal propagation and const string merge inconsistencies #898

Closed
smolt opened this issue Apr 14, 2015 · 5 comments
Closed

Comments

@smolt
Copy link
Member

smolt commented Apr 14, 2015

The DMD front end code used by LDC propagates const char* string literals such that identical strings must be merged or a program can exhibit pathological behavior. A fix for LDC is being prepared with pull #896 which should close this issue, but it is still good to document the details of the problem.

The problem manifests as two pointers failing to be equal when they should really be equal. In the dmd-testsuite, this causes runnable/externmangle.cpp to fail on OS X.

In the following snippet, pointers s and p are initialized to point to the same address:

import core.stdc.stdio;
void main()
{
    const char* s = "hi";
    const(char)* p = s;
    printf("%p %p\n", s, p);
    assert(s == p);
}

DMD provides the correct outcome.

$ rdmd charptrprob.d
0x1032bb200 0x1032bb200

Compiling with ldc2 0.15.1 has printf unexpectedly showing different addresses and the assertion fails.

$ ldc2 --run charptrprob.d 
0x10a981e7d 0x10a981e73
core.exception.AssertError@charptrprob.d(7): Assertion failure

Using the optimizer (ldc2 -O) gives another result: printf shows identical addresses but the assertion still fails.

$ ldc2 -O --run charptrprob.d 
0x10bcf4ec7 0x10bcf4ec7
core.exception.AssertError@charptrprob.d(7): Assertion failure

Finally, to make it more interesting, changing const(char)* p = s to const char* p = s and trying again does something else. Now printf shows different addresses but the assertion passes saying they are identical.

$ ldc2 --run charptrprob.d 
0x10bc7ae7d 0x10bc7ae80

What is going on here? Three different outcomes for code behavior that should be defined. And none of these outcomes are correct.

It turns out that the D front end code shared by DMD and LDC expands constants to their initializer. In the sample code, s is a constant, so expands to literal "hi" wherever it is used. This happens with or without optimization enabled, essentially becoming:

import core.stdc.stdio;
void main()
{
    const char* s = "hi";
    const(char)* p = "hi";
    printf("%p %p\n", "hi", p);
    assert("hi" == p);
}

This is very different from the original code. If this were C (and I assume D follows the same rules), each occurrence of "hi" may or may not be at the same address; it's implementation dependent. But this is just an internal step in compilation and to make it behave as the original code, all the "hi" string literals must be one and the same (EQ). LDC does some merging of string literals but only when optimization is enabled and not all cases are covered. This is why the inconsistent results occur.

Even DMD can be inconsistent here. It's backend string cache only holds 16 strings, so code can be crafted that produces inconsistent results too.

@dnadlinger
Copy link
Member

Thanks for the excellent analysis (!). Did you file an upstream bug yet? Just linking to here and your earlier crafted DMD test case should be enough.

@kinke
Copy link
Member

kinke commented Apr 16, 2015

I'm experiencing a related issue on Linux x64 with merge-2.067, concerning runnable/interpret.d (test 109).

In this snippet:

struct Test109S { this(int){ this.s = &this; } Test109S* s; }
const t109s = new Test109S(0);

void main()
{
    import core.stdc.stdio;
    printf("t109s = %p, t109s.s = %p\n", t109s, t109s.s);
    assert(t109s.s is t109s);
}

t109s is expanded 4 times as struct literal instead of folding to the single global:

...
@.structliteral1 = internal global { %test109.Test109S* } { %test109.Test109S* bitcast ({ %test109.Test109S* }* @.structliteral1 to %test109.Test109S*) }, align 8 ; [#uses = 2 type = { %test109.Test109S* }*]
@_D7test1095t109sxPS7test1098Test109S = constant { %test109.Test109S* }* @.structliteral1 ; [#uses = 1 type = { %test109.Test109S* }**]
...
define i32 @_Dmain({ i64, { i64, i8* }* } %unnamed) #0 {
  %.structliteral = alloca %test109.Test109S      ; [#uses = 3 type = %test109.Test109S*]
  %.structliteral1 = alloca %test109.Test109S     ; [#uses = 3 type = %test109.Test109S*]
  %.structliteral2 = alloca %test109.Test109S     ; [#uses = 3 type = %test109.Test109S*]
  %.structliteral3 = alloca %test109.Test109S     ; [#uses = 3 type = %test109.Test109S*]
  %1 = getelementptr %test109.Test109S* %.structliteral, i32 0, i32 0 ; [#uses = 2 type = %test109.Test109S**]
  store %test109.Test109S* %.structliteral, %test109.Test109S** %1
  %2 = load %test109.Test109S** %1                ; [#uses = 0 type = %test109.Test109S*]
  %3 = getelementptr %test109.Test109S* %.structliteral1, i32 0, i32 0 ; [#uses = 2 type = %test109.Test109S**]
  store %test109.Test109S* %.structliteral1, %test109.Test109S** %3
  %4 = load %test109.Test109S** %3                ; [#uses = 0 type = %test109.Test109S*]
  %tmp = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([26 x i8]* @.str2, i32 0, i32 0), %test109.Test109S* %.structliteral, %test109.Test109S* %.structliteral1) ; [#uses = 0 type = i32]
  %5 = getelementptr %test109.Test109S* %.structliteral2, i32 0, i32 0 ; [#uses = 2 type = %test109.Test109S**]
  store %test109.Test109S* %.structliteral2, %test109.Test109S** %5
  %6 = load %test109.Test109S** %5                ; [#uses = 0 type = %test109.Test109S*]
  %7 = getelementptr %test109.Test109S* %.structliteral3, i32 0, i32 0 ; [#uses = 2 type = %test109.Test109S**]
  store %test109.Test109S* %.structliteral3, %test109.Test109S** %7
  %8 = load %test109.Test109S** %7                ; [#uses = 0 type = %test109.Test109S*]
  %9 = icmp eq %test109.Test109S* %.structliteral2, %.structliteral3 ; [#uses = 1 type = i1]
  br i1 %9, label %noassert, label %assert
...

LDC 0.15.2:

...
@.structliteral2 = internal global { %test109.Test109S* } { %test109.Test109S* bitcast ({ %test109.Test109S* }* @.structliteral2 to %test109.Test109S*) }, align 8
@.structliteral3 = internal global { { %test109.Test109S* }* } { { %test109.Test109S* }* @.structliteral2 }, align 8
@_D7test1095t109sxPS7test1098Test109S = constant { { %test109.Test109S* }* }* @.structliteral3
...
define i32 @_Dmain({ i64, { i64, i8* }* } %unnamed) #0 {
  %tmp = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([26 x i8]* @.str4, i32 0, i32 0), %test109.Test109S* bitcast ({ %test109.Test109S* }* @.structliteral2 to %test109.Test109S*), %test109.Test109S* bitcast ({ %test109.Test109S* }* @.structliteral2 to %test109.Test109S*))
  br i1 true, label %noassert, label %assert
...

@smolt
Copy link
Member Author

smolt commented Apr 17, 2015

@smolt
Copy link
Member Author

smolt commented Jun 22, 2016

I think this can be closed after adding a test. @kinke, is comment above still valid? If so, can it become its own issue?

@dnadlinger
Copy link
Member

dnadlinger commented Jun 22, 2016

The upstream dmd-testsuite includes a test case for this now, so we should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants