Optimize static array comparisons to a memcmp call for types for which this is valid. #1719
gen/arrays.cpp
Outdated
if (ltype->ty != Tsarray)
  return false;

auto *elemType = ltype->nextOf()->toBasetype();
There's a function somewhere (I don't recall its name right now) which descends to the first non-static-array element type. By using that, you can get rid of the recursion in the function above.
You also need to check for a compatible rhs type. This is valid but obviously not suited for memcmp (and should be part of a test):
int[3] ia = [ 1, 2, 3 ];
short[3] sa = [ 1, 2, 3 ];
assert(ia == sa);
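A minimal C++ sketch of why the mixed-type case above cannot go through memcmp: the arrays are equal element by element, but their byte images differ in size. The helper names `elementwiseEqual` and `bytewiseEqual` are hypothetical and not part of LDC; the sketch assumes `sizeof(int) != sizeof(short)`, which holds on common targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Element-wise equality: each pair is compared after the usual
// arithmetic conversions, so int and short elements can match.
template <typename L, typename R, std::size_t N>
bool elementwiseEqual(const L (&lhs)[N], const R (&rhs)[N]) {
  for (std::size_t i = 0; i < N; ++i)
    if (lhs[i] != rhs[i])
      return false;
  return true;
}

// Raw byte equality: only meaningful if both arrays occupy the same
// number of bytes, so a size mismatch is reported as "not equal".
template <typename L, typename R, std::size_t N>
bool bytewiseEqual(const L (&lhs)[N], const R (&rhs)[N]) {
  if (sizeof(lhs) != sizeof(rhs))
    return false;
  return std::memcmp(lhs, rhs, sizeof(lhs)) == 0;
}
```

With `int ia[3]` and `short sa[3]` both holding 1, 2, 3, the first helper reports equality and the second does not, which is the case a test would need to cover.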
OK, didn't know that.
But the recursion isn't so bad, I think? validCompareWithMemcmpType will become recursive again when someone implements the logic for Tstruct, and then the Tsarray logic is also needed there.
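To illustrate the recursion argument, here is a rough sketch over a toy type model. `ToyType`, `Kind`, `validForMemcmp` and the helper constructors are hypothetical and unrelated to LDC's actual `Type` classes. The rules are assumptions of the sketch: floating-point elements are rejected because NaN != NaN and +0.0 == -0.0, and structs with padding are rejected because padding bytes need not compare equal.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy type model (hypothetical, not LDC's Type hierarchy).
enum class Kind { Integer, Float, StaticArray, Struct };

struct ToyType;
using TypePtr = std::shared_ptr<ToyType>;

struct ToyType {
  Kind kind;
  TypePtr element;             // used by StaticArray only
  std::vector<TypePtr> fields; // used by Struct only
  bool hasPadding;             // used by Struct only
};

TypePtr scalar(Kind k) {
  return std::make_shared<ToyType>(ToyType{k, nullptr, {}, false});
}
TypePtr arrayOf(TypePtr elem) {
  return std::make_shared<ToyType>(
      ToyType{Kind::StaticArray, elem, {}, false});
}
TypePtr structOf(std::vector<TypePtr> fields, bool hasPadding) {
  return std::make_shared<ToyType>(
      ToyType{Kind::Struct, nullptr, fields, hasPadding});
}

// Recursive check: arrays defer to their element type, structs to
// every field, so nested arrays and arrays inside structs both work.
bool validForMemcmp(const ToyType &t) {
  switch (t.kind) {
  case Kind::Integer:
    return true;
  case Kind::Float:
    return false; // NaN != NaN, and +0.0 == -0.0 with different bits
  case Kind::StaticArray:
    return validForMemcmp(*t.element);
  case Kind::Struct:
    if (t.hasPadding)
      return false;
    for (const auto &field : t.fields)
      if (!validForMemcmp(*field))
        return false;
    return true;
  }
  return false;
}
```

In this model a struct containing a static array reuses the array branch, which is the reason given above for keeping the function recursive.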
Thanks a lot about the testcase, bad assumption on my part. I'll just check that both lhs and rhs types are exactly the same then?
Edit: interestingly, the memcmp call is not emitted for int[3] == short[3]. I wasn't expecting that lol. So it already works, but testcase needs to be added certainly. Gotta go now.
Alright, the recursion may really be needed for structs later on.
I'll just check that both lhs and rhs types are exactly the same then?
I always find this a bit tricky. 'Exactly the same' includes const/immutable modifiers, which don't matter here. There's a stripModifiers() function or so (which already offers recursion, which we need here), but I find that one a bit tedious to use. An idea might be checking the LLVM types: DtoMemType(l->type) == DtoMemType(r->type) (e.g., this would work for char[] == byte[] and even char[] == bool[], both are allowed by the front-end, and work for integer signedness-mismatches too).
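As a loose analogy in plain C++ (not LDC's or LLVM's API), comparing the storage representation rather than the exact source type could look like the sketch below. `sameIntegerStorage` is a hypothetical name; it strips const/volatile and ignores signedness, roughly the way two D types can map to the same LLVM integer type.

```cpp
#include <cassert>
#include <type_traits>

// True if A and B are integer types of the same size once
// const/volatile qualifiers are stripped; signedness is ignored.
template <typename A, typename B>
constexpr bool sameIntegerStorage() {
  using UA = typename std::remove_cv<A>::type;
  using UB = typename std::remove_cv<B>::type;
  return std::is_integral<UA>::value && std::is_integral<UB>::value &&
         sizeof(UA) == sizeof(UB);
}
```

Under this check `const char` and `unsigned char` count as compatible, while `int` and `short` do not.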
Thanks for doing this, people will appreciate it I'm sure.
For when the types are different (int[3] == short[3] but also byte[3] == char[3]), the front-end lowers the code into a call to (e.g.) object._ArrayEq!(byte, char)._ArrayEq(byte[], char[]). So optimizing those will require much more and different work.
I will add a check that the types have to be equivalent (ignoring constness), just in case.
|
|
Sorry for the trouble. I suspect it is again displaying a different definition of "string" from another DLL or static library. Making the symbol search case sensitive might help. You can switch this by adding a line |
gen/arrays.cpp
Outdated
bool validCompareWithMemcmp(DValue *l, DValue *r) {
  auto *ltype = l->type->toBasetype();

  // Only static arrays are potentially compared using memcmp.
Why not dynamic arrays? It seems like all this would require is an extra icmp for the length members.
And maybe a fast-return if the pointers match too, to optimize checks against the same memory.
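The two suggestions for dynamic arrays (a length check plus a pointer fast path) can be sketched in C++ as follows. `Slice` is a hypothetical stand-in for a D slice, and the sketch assumes `T` is a type for which byte-wise comparison is valid; it is not what the PR emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a D slice: a length plus a pointer.
template <typename T>
struct Slice {
  std::size_t length;
  const T *ptr;
};

template <typename T>
bool sliceEqual(Slice<T> lhs, Slice<T> rhs) {
  // Extra integer comparison for the length members.
  if (lhs.length != rhs.length)
    return false;
  // Fast return for identical memory; empty slices are equal too,
  // and skipping memcmp avoids passing null pointers to it.
  if (lhs.ptr == rhs.ptr || lhs.length == 0)
    return true;
  return std::memcmp(lhs.ptr, rhs.ptr, lhs.length * sizeof(T)) == 0;
}
```

The pointer check only pays off when the same memory is compared often; otherwise it is one more branch before the memcmp call.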
All left for future work. It just adds complexity.
The comment is not very useful, since it exactly replicates the content of the code itself, yet leaves the question as to why unanswered.
I've improved the comments.
Also improved symbol loading in #1731 |
|
AppVeyor jobs retriggered. |
|
Wow, something bad is happening here. In master, With this PR, the x86 test takes a massive 6:21, and the x64 job times out after more than 26 minutes for that single test! Pinging @rainers. |
I suspect cdb is trying to load symbols from the MS symbol servers. Passing |
|
Now the Windows unittest fails on a uuid.d assert, line 944. I've looked at the IR (Mac and Windows), but for that particular comparison, memcmp is not used (it is a dyn array comparison). |
|
AppVeyor is very strange today. For your x64 run:
Edit: Sorry, this has nothing to do with your issue. It's the debuginfo tests with cdb that take forever (unfortunately, on my box too). |
|
On your box too? Should be easy to fix then? Otherwise, I'd say we back out the tests for now. |
I suspect that it is again the next test that takes so long: codeview.d ;-( cvbasictypes.d also took more than 3 minutes. |
Can you run them without redirecting the output and see what it is doing? The only issue I've seen as a cause for slowdown is loading symbols from the symbol servers. |
But that should be disabled in master now? |
Yes, that's what I thought. With cdb from the Windows 10 SDK, I also see a pause (less than a minute) when loading the symbols of the executable explicitly. That doesn't happen for cdb from the Windows 8.1 SDK. |
Do you mean, due to network traffic? 30s or whatever it is still seems awfully long for loading symbols for an executable. |
Yes. It is looking for codeview.exe's symbols on the MS symbol servers, too. I'm trying to disable that... |
|
I hope this helps: #1743 |
Nope. I can reproduce it here though, it starts with |
|
Well, it's not all nulls, the dashes are part of the string. This is more complete:
// printf("id = 0x%p 0x%p\n", id.ulongs[0], id.ulongs[1]);
id = 0x234FBA2C0E06B38A 0x46FBBDB32DB54CB7
// printf("str = 0x%p (%.36s)\ns = 0x%p (%.36s)\n", str.ptr, str.ptr, s.ptr, s.ptr);
str = 0x000000C43AAFF710 (00000000-0000-0000-0000-000000000000)
s = 0x00007FF60AA297B0 (8ab3060e-2cba-4f23-b74c-b52db3bdfb46)
So it looks as if the |
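As a cross-check of the printed values, the two `id` words can be decoded back into the textual form. The sketch below is C++, not the D code from the unittest; `uuidFromWords` is a hypothetical helper, and it assumes the words were read on a little-endian host, so each word's bytes are taken least-significant first.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats two 64-bit words as a UUID string, taking the bytes of each
// word least-significant first (memory order on a little-endian host).
std::string uuidFromWords(std::uint64_t first, std::uint64_t second) {
  unsigned char bytes[16];
  for (int i = 0; i < 8; ++i) {
    bytes[i] = static_cast<unsigned char>(first >> (8 * i));
    bytes[8 + i] = static_cast<unsigned char>(second >> (8 * i));
  }
  std::string out;
  char buf[3];
  for (int i = 0; i < 16; ++i) {
    // Dashes split the 8-4-4-4-12 hex digit groups.
    if (i == 4 || i == 6 || i == 8 || i == 10)
      out += '-';
    std::snprintf(buf, sizeof buf, "%02x", bytes[i]);
    out += buf;
  }
  return out;
}
```

Feeding in 0x234FBA2C0E06B38A and 0x46FBBDB32DB54CB7 gives 8ab3060e-2cba-4f23-b74c-b52db3bdfb46, i.e. the 16 bytes of `id` match `s`, while `str` holds only zeros and dashes.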
|
Well, this could very well be another symptom of #1324, as we have a compile-time instance of a struct with a union again. |
|
I don't know what's going on here :( Edit: "fix Phobos" would mean changing "enum" to "auto" in the unittest, basically going from ctfe to runtime, hiding the union bug. |
|
AppVeyor retriggered after merging #1846 and also verified locally - no change unfortunately, |
|
:( |
|
@UplinkCoder: Sure. I might have implemented the struct/class reference lowering in LDC. My point is that the lack of identity is supposed to be non-observable at runtime. If it is (e.g. if the data is mutable, like in issue 15989), it's mainly an accepts-invalid DMD bug. |
|
@klickverbot You certainly know a lot more about LDC internals than I do (I know nothing about them). |
|
The unoptimized IR (I've used the merge-2.072 branch, this PR not merged in) looks absolutely fine:
import std.uuid;
void main()
{
import std.encoding : Char = AsciiChar;
enum utfstr = "8ab3060e-2cba-4f23-b74c-b52db3bdfb46";
alias String = immutable(Char)[];
enum String s = cast(String)utfstr;
enum id = UUID(utfstr);
Char[36] str;
id.toString(str[]);
assert(str == s);
}
%std.uuid.UUID = type { [16 x i8] }
; data for `enum UUID id`
@.arrayliteral = internal unnamed_addr constant [16 x i8] c"\8A\B3\06\0E,\BAO#\B7L\B5-\B3\BD\FBF" ; [#uses = 1]
; `enum String s`, used in the comparison
@.str = private unnamed_addr constant [37 x i8] c"8ab3060e-2cba-4f23-b74c-b52db3bdfb46\00" ; [#uses = 1]
%str = alloca [36 x i8], align 1 ; [#uses = 4, size/byte = 36]
; enum UUID id
%.structliteral = alloca %std.uuid.UUID, align 8 ; [#uses = 2, size/byte = 16]
; zero-initialize `str`
%1 = bitcast [36 x i8]* %str to i8* ; [#uses = 1]
call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 36, i32 1, i1 false)
; [...]
; initialize single `id` field via memcpy from @.arrayliteral
%4 = getelementptr inbounds %std.uuid.UUID, %std.uuid.UUID* %.structliteral, i32 0, i32 0 ; [#uses = 1, type = [16 x i8]*]
%5 = bitcast [16 x i8]* %4 to i8* ; [#uses = 1]
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %5, i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.arrayliteral, i32 0, i32 0), i64 16, i32 1, i1 false)
; str[]
%6 = bitcast [36 x i8]* %str to i8* ; [#uses = 1]
%7 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %6, 1 ; [#uses = 1]
; id.toString(str[])
call void @_D3std4uuid4UUID39__T8toStringTAE3std8encoding9AsciiCharZ8toStringMxFNaNbNiNfMAE3std8encoding9AsciiCharZv(%std.uuid.UUID* nonnull %.structliteral, { i64, i8* } %7) #0
; str[]
%8 = bitcast [36 x i8]* %str to i8* ; [#uses = 1]
%9 = insertvalue { i64, i8* } { i64 36, i8* undef }, i8* %8, 1 ; [#uses = 1]
; str == s
%10 = call i32 @_adEq2({ i64, i8* } %9, { i64, i8* } { i64 36, i8* getelementptr inbounds ([37 x i8], [37 x i8]* @.str, i32 0, i32 0) }, %object.TypeInfo* bitcast (%"typeid(AsciiChar[])"* @_D34TypeInfo_AE3std8encoding9AsciiChar6__initZ to %object.TypeInfo*)) #2 ; [#uses = 1]
My results back then via printf clearly indicated that the run-time data of @UplinkCoder: Please have a look at the interesting comment in https://github.com/dlang/phobos/blob/master/std/uuid.d#L353. |
|
For the record: I'm absolutely eager to merge this and am in favor of just using |
|
This PR doesn't affect the problematic unittest itself IR-wise; two slices are compared, and this PR only supports static arrays so far. |
|
Shall I modify Phobos and merge this? |
|
Seems reasonable, although introducing a known miscompilation leaves a bit of a stale taste… I don't have time to look into the issue any more closely right now, though. |
|
Yeah... this is a strange bug. I'm not sure whether it is a miscompile or a unittest bug. My current feeling is that it is a unittest bug, exposed by aggressive optimization enabled by this PR. |
|
Yep… I'd say let's merge it and keep a close eye on it throughout the 1.2 beta phase. |
|
green after phobos modification. |
|
Stefan suggested a |
|
With |
…h this is valid. Resolves ldc-developers#1632
|
Travis is not retriggering :( |
|
Yay, finally. ;) I don't think it'll make a huge impact in the current form. As soon as slices are supported (should be straightforward), that will definitely change, and I actually expect noticeable performance improvements for client code (incl. synthetic benchmarks, where LDC and D in general could even shine a bit more). And we should work on supporting suitable structs soon too. |
Resolves #1632.