Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize static array comparisons to a memcmp call for types for which this is valid. #1719

Merged
merged 2 commits into from
Mar 1, 2017

Conversation

JohanEngelen
Copy link
Member

Resolves #1632.

@JohanEngelen JohanEngelen mentioned this pull request Aug 25, 2016
gen/arrays.cpp Outdated
if (ltype->ty != Tsarray)
return false;

auto *elemType = ltype->nextOf()->toBasetype();
Copy link
Member

@kinke kinke Aug 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a function somewhere (I don't recall its name right now) which descends to the first non-static-array element type. By using that, you can get rid of the recursion in the function above.

You also need to check for a compatible rhs type. This is valid but obviously not suited for memcmp (and should be part of a test):

int[3] ia = [ 1, 2, 3 ];
short[3] sa = [ 1, 2, 3 ];
assert(ia == sa);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, didn't know that.
But the recursion isn't so bad, I think? validCompareWithMemcmpType will become recursive again when someone implements the logic for Tstruct, and then the Tsarray logic is also needed there.

Copy link
Member Author

@JohanEngelen JohanEngelen Aug 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot about the testcase, bad assumption on my part. I'll just check that both lhs and rhs types are exactly the same then?

Edit: interestingly, the memcmp call is not emitted for int[3] == short[3]. I wasn't expecting that lol. So it already works, but testcase needs to be added certainly. Gotta go now.

Copy link
Member

@kinke kinke Aug 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, the recursion may really be needed for structs later on.

I'll just check that both lhs and rhs types are exactly the same then?

I always find this a bit tricky. 'Exactly the same' includes const/immutable modifiers, which don't matter here. There's a stripModifiers() function or so (which already offers recursion, which we need here), but I find that one a bit tedious to use. An idea might be checking the LLVM types: DtoMemType(l->type) == DtoMemType(r->type) (e.g., this would work for char[] == byte[] and even char[] == bool[], both are allowed by the front-end, and work for integer signedness-mismatches too).

Thanks for doing this, people will appreciate it I'm sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For when the types are different (int[3] == short[3] but also byte[3] == char[3]), the front-end lowers the code into a call to (e.g.) object._ArrayEq!(byte, char)._ArrayEq(byte[], char[]). So optimizing those will require much more and different work.
I will add a check that the types have to be equivalent (ignoring constness), just in case.

@JohanEngelen
Copy link
Member Author

tests\debuginfo\codeview.d fails on Windows, but I don't see how it is related to this PR (no array comparisons happening in codeview.d afaict). @rainers Do you know what's going on? Thanks.

@rainers
Copy link
Contributor

rainers commented Sep 1, 2016

@rainers Do you know what's going on? Thanks.

Sorry for the trouble. I suspect it is again displaying a different definition of "string" from another DLL or static library. Making the symbol search case sensitive might help. You can switch this by adding a line // CDB: .symopt-1 before // CDB: dt string in codeview.d.

gen/arrays.cpp Outdated
bool validCompareWithMemcmp(DValue *l, DValue *r) {
auto *ltype = l->type->toBasetype();

// Only static arrays are potentially compared using memcmp.
Copy link
Member

@dnadlinger dnadlinger Sep 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not dynamic arrays? It seems like all this would require is an extra icmp for the length members.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe a fast-return if the pointers match too, to optimize checks against the same memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All left for future work. It just adds complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is not very useful, since it exactly replicates the content of the code itself, yet leaves the question as to why unanswered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've improved the comments.

@rainers
Copy link
Contributor

rainers commented Sep 2, 2016

You can switch this by adding a line // CDB: .symopt-1 before // CDB: dt string in codeview.d.

Also improved symbol loading in #1731

@kinke
Copy link
Member

kinke commented Sep 2, 2016

AppVeyor jobs retriggered.

@kinke
Copy link
Member

kinke commented Sep 2, 2016

Wow, something bad is happening here. In master, tests/debuginfo/codeview.d takes 1:43 mins for x64 and 0:23 for x86 (when hovering over an AppVeyor log line, a tooltip pops up with a timestamp).

With this PR, the x86 test takes a massive 6:21, and the x64 job times out after more than 26 minutes for that single test! Pinging @rainers.

@rainers
Copy link
Contributor

rainers commented Sep 2, 2016

Pinging @rainers.

I suspect cdb is trying to load symbols from the MS symbol servers. Passing -snul (disables automatic symbol loading for unqualified names) to the cdb command line might help.

@JohanEngelen
Copy link
Member Author

Now the Windows unittest fails on a uuid.d assert, line 944. I've looked at the IR (Mac and Windows), but for that particular comparison, memcmp is not used (it is a dyn array comparison).
Anyone know what's going on?

@kinke
Copy link
Member

kinke commented Sep 4, 2016

AppVeyor is very strange today. For your x64 run:

1/1 Test #957: lit-tests ........................   Passed  1230.67 sec

tests/codegen/inlining_templates.d takes 17 minutes. For my LLVM 3.9 final PR, it even took almost 25 minutes. So I'm not sure AppVeyor can be trusted today...

Edit: Sorry, this has nothing to do with your issue. It's the debuginfo tests with cdb that take forever (unfortunately, on my box too).

@dnadlinger
Copy link
Member

On your box too? Should be easy to fix then? Otherwise, I'd say we back out the tests for now.

@rainers
Copy link
Contributor

rainers commented Sep 4, 2016

tests/codegen/inlining_templates.d takes 17 minutes

I suspect that it is again the next test that takes so long: codeview.d ;-( cvbasictypes.d also too more than 3 minutes.

@rainers
Copy link
Contributor

rainers commented Sep 4, 2016

It's the debuginfo tests with cdb that take forever (unfortunately, on my box too).

Can you run them without redirecting the output and see what it is doing? The only issue I've seen as a cause for slowdoen is loading symbols from the symbol servers.

@dnadlinger
Copy link
Member

dnadlinger commented Sep 4, 2016

The only issue I've seen as a cause for slowdoen is loading symbols from the symbol servers.

But that should be disabled in master now?

@rainers
Copy link
Contributor

rainers commented Sep 4, 2016

But that should be disabled in master now?

Yes, that's what I thought. With cdb from the Windows 10 SDK, I also see a pause (less than a minute) when loading the symbols of the executable explicitely. That doesn't happen for cdb from the Windows 8.1 SDK.

@dnadlinger
Copy link
Member

With cdb from the Windows 10 SDK, I also see a pause (less than a minute) when loading the symbols of the executable explicitely.

Do you mean, due to network traffic? 30s or whatever it is still seems awfully long for loading symbols for an executable.

@rainers
Copy link
Contributor

rainers commented Sep 4, 2016

Do you mean, due to network traffic? 30s or whatever it is still seems awfully long for loading symbols for an executable.

Yes. It is looking for codeview.exe`s symbols on the MS symbol servers, too. I'm trying to disable that...

@rainers
Copy link
Contributor

rainers commented Sep 4, 2016

I hope this helps: #1743

@kinke
Copy link
Member

kinke commented Sep 5, 2016

Anyone know what's going on?

Nope. I can reproduce it here though, it starts with -O2. It works fine if the unittest is isolated in its own module. But a printf in the original source right before the assertion reveals that str (the runtime array) is all set to nulls...

str = 0x000000C2199FF4E0 (00000000-0000-0000-0000-000000000000)
s   = 0x00007FF7276097B0 (8ab3060e-2cba-4f23-b74c-b52db3bdfb46)

@kinke
Copy link
Member

kinke commented Sep 5, 2016

Well, it's not all nulls, the dashes are part of the string. This is more complete:

// printf("id = 0x%p 0x%p\n", id.ulongs[0], id.ulongs[1]);
id = 0x234FBA2C0E06B38A 0x46FBBDB32DB54CB7
// printf("str = 0x%p (%.36s)\ns   = 0x%p (%.36s)\n", str.ptr, str.ptr, s.ptr, s.ptr);
str = 0x000000C43AAFF710 (00000000-0000-0000-0000-000000000000)
s   = 0x00007FF60AA297B0 (8ab3060e-2cba-4f23-b74c-b52db3bdfb46)

So it looks as if the this pointer in id.toString(str[]) (id is an enum struct std.uuid.UUID) could point to something else with all zeros.

@kinke
Copy link
Member

kinke commented Sep 5, 2016

Well, this could very well be another symptom of #1324, as we have a compile-time instance of a struct with a union again.

@JohanEngelen
Copy link
Member Author

JohanEngelen commented Sep 6, 2016

I don't know what's going on here :(
I can reproduce it too on Windows, but with very slight changes, the bug is gone. Definitely something with unions+CTFE. I can fix Phobos so the bug is gone but... :'( :'(
It is also very strange that it depends on the LLVM optimization level.

Edit: "fix Phobos" would mean changing "enum" to "auto" in the unittest, basically going from ctfe to runtime, hiding the union bug.

@kinke
Copy link
Member

kinke commented Oct 25, 2016

AppVeyor retriggered after merging #1846 and also verified locally - no change unfortunately, std.uuid:944 still asserts starting with -O2 on Win64. :(

@JohanEngelen
Copy link
Member Author

:(

@dnadlinger
Copy link
Member

@UplinkCoder: Sure. I might have implemented the struct/class reference lowering in LDC. My point is that the lack of identity is supposed to be non-observable at runtime. If it is (e.g. if the data is mutable, like in issue 15989), it's mainly an accepts-invalid DMD bug.

@UplinkCoder
Copy link
Contributor

@klickverbot You know certainly alot more about ldc internals then I do (I know nothing about them).
@JohanEngelen If this were an issue with ctfe it should not be affected by the optimization level.
Maybe the optimizer assumes some things that are not true ?

@kinke
Copy link
Member

kinke commented Feb 10, 2017

The unoptimized IR (I've used the merge-2.072 branch, this PR not merged in) looks absolutely fine:

import std.uuid;

void main()
{
    import std.encoding : Char = AsciiChar;
    enum  utfstr = "8ab3060e-2cba-4f23-b74c-b52db3bdfb46";
    alias String = immutable(Char)[];
    enum String s = cast(String)utfstr;
    enum id = UUID(utfstr);
    Char[36] str;
    id.toString(str[]);
    assert(str == s);
}
%std.uuid.UUID = type { [16 x i8] }

; data for `enum UUID id`
@.arrayliteral = internal unnamed_addr constant [16 x i8] c"\8A\B3\06\0E,\BAO#\B7L\B5-\B3\BD\FBF" ; [#uses = 1]
; `enum String s`, used in the comparison
@.str = private unnamed_addr constant [37 x i8] c"8ab3060e-2cba-4f23-b74c-b52db3bdfb46\00" ; [#uses = 1]

  %str = alloca [36 x i8], align 1                ; [#uses = 4, size/byte = 36]
  ; enum UUID id
  %.structliteral = alloca %std.uuid.UUID, align 8 ; [#uses = 2, size/byte = 16]
  ; zero-initialize `str`
  %1 = bitcast [36 x i8]* %str to i8*             ; [#uses = 1]
  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 36, i32 1, i1 false)
  ; [...]
  ; initialize single `id` field via memcpy from @.arrayliteral
  %4 = getelementptr inbounds %std.uuid.UUID, %std.uuid.UUID* %.structliteral, i32 0, i32 0 ; [#uses = 1, type = [16 x i8]*]
  %5 = bitcast [16 x i8]* %4 to i8*               ; [#uses = 1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %5, i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.arrayliteral, i32 0, i32 0), i64 16, i32 1, i1 false)
  ; str[]
  %6 = bitcast [36 x i8]* %str to i8*             ; [#uses = 1]
  %7 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %6, 1 ; [#uses = 1]
  ; id.toString(str[])
  call void @_D3std4uuid4UUID39__T8toStringTAE3std8encoding9AsciiCharZ8toStringMxFNaNbNiNfMAE3std8encoding9AsciiCharZv(%std.uuid.UUID* nonnull %.structliteral, { i64, i8* } %7) #0
  ; str[]
  %8 = bitcast [36 x i8]* %str to i8*             ; [#uses = 1]
  %9 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %8, 1 ; [#uses = 1]
  ; str == s
  %10 = call i32 @_adEq2({ i64, i8* } %9, { i64, i8* } { i64 36, i8* getelementptr inbounds ([37 x i8], [37 x i8]* @.str, i32 0, i32 0) }, %object.TypeInfo* bitcast (%"typeid(AsciiChar[])"* @_D34TypeInfo_AE3std8encoding9AsciiChar6__initZ to %object.TypeInfo*)) #2 ; [#uses = 1]

My results back then via printf clearly indicated that the run-time data of id was garbage/zero-initialized. The this pointer of id (the %.structliteral alloca) is fine, and it's initialized. I don't think the LLVM optimizer chokes on something as simple as this. So it's very likely that the problem lies somewhere in the front-end when constructing enum id = UUID(utfstr); at compile-time and ending up with the (parsed) 16 bytes of UUID runtime data (@.arrayliteral in LLVM IR).

@UplinkCoder: Please have a look at the interesting comment in https://github.com/dlang/phobos/blob/master/std/uuid.d#L353.

@kinke
Copy link
Member

kinke commented Feb 16, 2017

For the record: I'm absolutely eager to merge this and am in favor of just using static immutable id = UUID(utfstr); for LDC to work around this apparently invalid or at least problematic unittest.

@kinke
Copy link
Member

kinke commented Feb 17, 2017

This PR doesn't affect the problematic unittest itself IR-wise; two slices are compared, and this PR only supports static arrays so far.

@JohanEngelen
Copy link
Member Author

Shall I modify Phobos and merge this?
(and open a new issue with the reduced problematic testcase from this PR, so we can investigate there what is going on?)

@dnadlinger
Copy link
Member

Seems reasonable, although introducing a known miscompilation leaves a bit of a stale taste… I don't have time to look into the issue any more closely right now, though.

@JohanEngelen
Copy link
Member Author

Yeah... this is a strange bug. I'm not sure whether it is a miscompile or a unittest bug. My current feeling is that it is a unittest bug, exposed by aggressive optimization enabled by this PR.

@dnadlinger
Copy link
Member

Yep… I'd say let's merge it and keep a close eye on it throughout the 1.2 beta phase.

@JohanEngelen
Copy link
Member Author

green after phobos modification.

@kinke
Copy link
Member

kinke commented Feb 22, 2017

Stefan suggested a static immutable instead of auto, did you try that?

@JohanEngelen
Copy link
Member Author

With static immutable we don't have to change the nothrow @nogc. So if that works, great. Updated the PR.

@JohanEngelen
Copy link
Member Author

Travis is not retriggering :(

@JohanEngelen JohanEngelen reopened this Mar 1, 2017
@kinke
Copy link
Member

kinke commented Mar 1, 2017

Yay, finally. ;)

I don't think it'll make a huge impact in the current form. As soon as slices are supported (should be straight foward), that will definitely change, and I actually expect noticeable performance improvements for client code (incl. synthetic benchmarks, where LDC and D in general could even shine a bit more). And we should work on supporting suited structs soon too.

@kinke kinke merged commit 33f226e into ldc-developers:master Mar 1, 2017
@JohanEngelen JohanEngelen deleted the gh1632 branch March 1, 2017 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants