Add {load, store}Unaligned and prefetch wrappers in core.simd #163
src/core/simd.d
Outdated
Returns:
Vector
*/
pragma(inline, true);
This statement-form probably doesn't have any effect (should only be used inside a function and then affects that function, but requires semantic analysis of that function), so please get rid of the trailing semicolon.
Yes, my bad. But just for info, what did you mean by this?
"should only be used inside a function and then affects that function, but requires semantic analysis of that function"
What does pragma(inline) have to do with the semantic analysis of a function?
Meaning that if the body of the function isn't analyzed, it won't be inlined.
a.d:
void foo()
{
    pragma(inline, true);
}
// supposedly equivalent, but not in practice:
pragma(inline, true)
void bar() {}
main.d:
import a;
void main()
{
    foo();
    bar();
}
ldc2 -output-ll main.d, then viewing main.ll with a text editor: the foo() call isn't inlined/eliminated. [It is if foo is a template.]
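For illustration, a minimal sketch of the template variant mentioned in the bracketed remark. The int return value is an addition made here so the result can be checked; it is not part of the original a.d. A template is instantiated in the importing module, so its body (and the pragma inside it) presumably gets analyzed there.

```d
// a.d, hypothetical variant: the empty template parameter list makes foo a template.
int foo()()
{
    // Statement form of the pragma, seen once the body is semantically analyzed.
    pragma(inline, true);
    return 42; // return value added only for illustration
}
```

main.d can call foo() exactly as before; whether the call is actually eliminated can be checked in main.ll as described above.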
Oh, now I see. I guess LDC analyzes functions that have at least one statement. I don't know what a pragma is called formally, but I guess it's not an actual statement.
And so foo has nothing, and its body is not analyzed (hence the compiler never sees the pragma). bar is not analyzed either, but the pragma is seen because it's not part of its body.
Is this LDC-specific? Also, is the reason you put them in separate files that a single file is one "compilation unit", so the compiler can reason about both and simply not declare either of them? I'm not sure, but with a separate file it presumably has to declare foo as an external symbol.
If it's all in the same file, then the foo-body is fully analyzed (=> codegen into object file), and then the pragma is applied, and all is well. If you're only compiling main.d, then foo isn't codegen'd, the frontend doesn't analyze more than it has to and misses the pragma in the body. And if you compile both modules into one object file (ldc2 -output-ll -singleobj a.d main.d), all is well again. And if you omit the -singleobj and compile the 2 modules into 2 separate object files in a single cmdline, I've just seen that both calls aren't inlined... => bug ;)
Give me a second as I'm trying to comprehend this. To rephrase:
- If I compile a single file, then a single object file is generated, and the front-end has to analyze all the symbols, as they're in the same file. So everything is OK.
- Now, if I compile main.d only, the compiler has to generate one object file for main.d. That object file calls the functions foo and bar, which will be in some other object file, and the only thing the front-end cares about is their API, not their bodies. So it misses the body.
- The bug is that with 2 object files, only bar should have been inlined? As if just compiling main.d?
src/core/simd.d
Outdated
is(V == int4) ||
is(V == uint4) ||
is(V == long2) ||
is(V == ulong2))
Fulfilling the DMD interface doesn't mean we cannot add all vector types and get rid of this 128-bit vector limitation.
I guess you mean what I added, although it's starting to look like a bit too much. I think this can be done with some kind of isVector compile-time function.
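One way such an isVector check could look, as a sketch only: the names isVector and vectorBytes are hypothetical and not taken from the PR. It relies on the `is(T == __vector)` form of D's IsExpression and assumes a target where `__vector` types are available (e.g. x86_64).

```d
// Hypothetical helper (not from the PR): true for any __vector type, regardless of size.
enum isVector(T) = is(T == __vector);

// Sketch of a template constraint built on it; the body just reports the size in bytes.
size_t vectorBytes(V)() if (isVector!V)
{
    return V.sizeof;
}
```

A constraint of this shape would replace the explicit list of `is(V == ...)` alternatives, since it accepts every vector type the compiler supports for the target.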
By all vectors, I really meant all ;), not just adding the 256-bit ones (there's also AVX-512 and non-x86 archs...). E.g., even this works:
void foo()
{
    alias V = __vector(float[1023]);
    float[V.length] f = 1;
    import ldc.simd;
    auto v = loadUnaligned!V(f.ptr);
    v += 2;
}
Hmm.. Ok, let me see how I can do this.
If it's what I uploaded, I'll feel quite dumb haha. I spent 10 minutes searching for how to pattern match arbitrary types in
The existing unittests (nested You can build the druntime test runner like this, in the build dir:
EDIT: Ignore this message freely.
I don't know exactly how to handle submodules. druntime is a submodule in ldc, so I can't switch to my branch. But it seems that there is an error in the old unittests. Or rather, something that works in DMD doesn't work here.
That points to this: Line 808 in 4995c11
And it points there even if I remove the new tests. It seems that the problem is with test!void16() in the storeUnaligned tests. If I remove this line, then there are weird linker problems. There are fewer if I add -shared, but there's still one:
I guess it's in the
As for the unittests per se, I definitely see this problem: Line 741 in 4995c11
which is that it doesn't have enough space for 32-byte vectors and doesn't test all 32 alignments. I suppose when you say generalizing, you mean the array size and the alignments.
Sorry I was late on this. I rebuilt LDC and everything ran just fine, meaning all 4 druntime-test-runner combinations output PASS. Now, the thing that remains is to generalize the unittests, which I guess includes fixing this
Exactly.
Nope, that's a separate issue in the code generator and a nice finding: we error out when default-initializing a void-vector, so that needs to be fixed in the compiler.
Good, but where do we stop? As you pointed out, and as I can see here LLVM Vector Type, there doesn't seem to be a limit on the size. Edit: Although, one guess is to stop at 512 or 1024, since I think there's no CPU currently supporting more than 512.
Ah, great. I'll take a look at the codegen just for fun.
Compile this with
void foo()
{
    alias V = __vector(void[16]);
    V v;
}
=> last lines of output before the error:
You can then grep for
Thank you! I see it has the DMD AST structure (which well.. expected since AFAIK it gets an AST from DMD) with
Edit: Unrelated, but LDC is really -- slow -- to -- build. :P
It took me 1 hour of running the installed LDC and not the one I was building (hey, it was 12 midnight here). I feel like it's very hacky. I'll see what the CI says and continue to look at it tomorrow.
Compared to DMD, yes, but the backend complexity isn't comparable. Rebuilding
Oh.. it is ninja ldc2. What a satisfactory comment that was. Well, see, I didn't know that. :P
Is getting LLVM and LDC running on Windows pretty much the same procedure as on Linux?
More or less, since CMake is cross-platform. The prerequisites, incl. shell setup, are more involved though. See https://wiki.dlang.org/Building_and_hacking_LDC_on_Windows_using_MSVC. Prebuilt LDC-LLVM v7.0.0 (excl. 7.0.0-2) probably works out of the box, so you may be able to skip building LLVM yourself.
Experimental meaning I don't know how good they are. They don't seem that good to me.
because of alignment but can do the code that kinke wrote in a comment above.
My bad that I hadn't run the testsuite. Now that I did, 2 tests fail:
Indeed, no good. Maybe we should instead support the upstream signature in
Something like:
private alias ElementType(V) = typeof(V.array[0]);
template loadUnaligned(V)
{
    deprecated("bla please use other version bla")
    V loadUnaligned(const V* p);
    V loadUnaligned(const ElementType!V* p);
}
void foo()
{
    alias float4 = __vector(float[4]);
    const float[4] a;
    static assert(is(typeof(loadUnaligned(cast(const float4*) a.ptr)) == float4));
    static assert(is(typeof(loadUnaligned!float4(a.ptr)) == float4));
}
Good indeed! No need for fully qualified name. I'll see what I can do tomorrow.
src/ldc/simd.di
Outdated
alias BaseType!V ElementType;

pragma(inline, true)
deprecated("This is the DMD interface, use it only if cross-compiling. Otherwise, please use the LDC interface.")
'Cross-compiling' usually means something completely different, compiling on a host for a different target. 'use it only for DMD compatibility' would be better IMO.
Indeed. I meant to say "if you plan to compile with both DMD and LDC", but that's too verbose. "DMD compatibility" sells it better.
Let's not add the deprecation message here. If you work on a codebase where deprecations == errors, you force people to immediately start using LDC's interface, while the DMD interface has not been deprecated by Dlang. Adding the deprecation here would mean that LDC would deprecate something that GDC and DMD don't; let's not fork things here :)
Ok, it seems good to have it in a comment though. What do you think?
Adding it to the documentation of the function is fine. dmd -de ... should work as before :)
Isn't -de for features that are officially deprecated? i.e. we don't have control.
https://dlang.org/deprecate.html
If you add deprecated, calling the function will no longer compile with -de.
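A small sketch of that effect, using hypothetical names (oldLoad, newLoad, caller) that are not part of the PR. It assumes the compiler's default deprecation handling, where a use of a deprecated symbol only prints a message.

```d
// Hypothetical names, for illustration only.
deprecated("prefer newLoad")
int oldLoad(int x) { return x; }

int newLoad(int x) { return x; }

int caller()
{
    // Default build: a deprecation message is printed and compilation continues.
    // With -de: this call is reported as an error and the build stops.
    return oldLoad(1);
}
```

This is why attaching the attribute to the DMD-compatible overload would break users who build with -de, even though upstream has not deprecated that interface.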
Ah got it :)
src/core/simd.d
Outdated
static string generalTests(string T, string size)()
{
string res;
res ~= "test!(__vector("~T~"[8 / "~size~"]))();";
IMO, we can do without these extra tests (the loadUnaligned tests haven't been extended). Sorry if I put you on the wrong track; I said you'd need to generalize the tests if you want to test non-128-bit vectors too. If you do want to keep the tests, then the mixin orgy and passing type and size as strings can be handled more elegantly:
static void generalTests(T)()
{
    static foreach (size; [8, 16, 32, 64])
        test!(__vector(T[size / T.sizeof]));
}
generalTests!void();
generalTests!byte();
...
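To see the static foreach approach in isolation, here is a self-contained sketch. It deliberately swaps in plain static arrays (T[n]) for the `__vector` types so it does not depend on which vector sizes the target supports, and the test function below is a hypothetical stand-in that only records the byte size of each type it is instantiated with.

```d
// Records the byte size of every type the stand-in test is instantiated with.
size_t[] seen;

// Hypothetical stand-in for the PR's test!V.
void test(V)()
{
    seen ~= V.sizeof;
}

void generalTests(T)()
{
    // Unrolled at compile time: one instantiation per total size in bytes.
    static foreach (size; [8, 16, 32, 64])
        test!(T[size / T.sizeof])();
}
```

Calling generalTests!byte() and then generalTests!int() instantiates test eight times, with no string mixins and no type names passed as strings.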
Cool indeed, I hadn't thought about it. TBH, since we moved the code to ldc/simd, I don't know if there is any point in adding anything to core/simd.
I can't push to your branch; could you please resync your local repo with LDC's,
I don't know what happened. The command above gives me:
Edit: Btw, I did the bad thing of doing the changes in the ldc branch. Maybe that's the problem.
Ok, I fetched the upstream. Now it should be ok.
Thx; I meant 'fetch' when saying 'resync', my bad.
Issue: ldc-developers/ldc#3121
Needs a check on the handling of unittests (which are common).