Skip to content
This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

Add {load, store}Unaligned and prefetch wrappers in core.simd #163

Merged
merged 12 commits into from
Sep 11, 2019
Merged

Add {load, store}Unaligned and prefetch wrappers in core.simd #163

merged 12 commits into from
Sep 11, 2019

Conversation

baziotis
Copy link

@baziotis baziotis commented Aug 8, 2019

Issue: ldc-developers/ldc#3121

Needs a check on the handling of unittests (which are common).

@baziotis baziotis changed the title Ldc Add {load, store}Unaligned and prefetch wrappers in core.simd Aug 8, 2019
src/core/simd.d Outdated
Returns:
Vector
*/
pragma(inline, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement-form probably doesn't have any effect (should only be used inside a function and then affects that function, but requires semantic analysis of that function), so please get rid of the trailing semicolon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes my bad. But just for info, what did you mean by this?

should only be used inside a function and then affects that function, but requires semantic analysis of that function

What pragma(inline) has to do with the semantic analysis of a function?

Copy link
Member

@kinke kinke Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning that if the body of the function isn't analyzed, it won't be inlined.

a.d:

void foo()
{
    pragma(inline, true);
}

// supposedly equivalent, but not in practice:
pragma(inline, true)
void bar() {}

main.d:

import a;
void main()
{
    foo();
    bar();
}

ldc2 -output-ll main.d, then viewing main.ll with text editor: foo() call isn't inlined/eliminated. [It is if foo is a template.]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh now I see. I guess LDC analyzes functions that have at least one statement. I don't know how a pragma is called formally, but I guess it's not an actual statement.
And so, foo has nothing and so it's body is not analyzed (hence the compiler never sees the pragma). bar is not analyzed either, but the pragma is seen because it's not part of its body.

Is this LDC specific? Also, the reason that you put them in separate files is because in one file, it's one "compilation unit" and so the compiler can reason and just not declare any of them? While I don't know, on a separate it has to declare foo as external symbol.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's all in the same file, then the foo-body is fully analyzed (=> codegen into object file), and then the pragma is applied, and all is well. If you're only compiling main.d, then foo isn't codegen'd, the frontend doesn't analyze more than it has to and misses the pragma in the body. And if you compile both modules into one object file (ldc2 -output-ll -singleobj a.d main.d), all is well again. And if you omit the -singleobj and compile the 2 modules into 2 separate object files in a single cmdline, I've just seen that both calls aren't inlined... => bug ;)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give me a second as I'm trying to comprehend this. To rephrase:

  • If I try to compile a single file, then a single object file is generated. And the front-end has to analyze all the symbols, as they're in the same file. So everything ok.
  • Now, if compile main.d only, the compiler has to generate one object file for main.d. That object file calls some functions foo and bar which though will be in some other object file and the only thing that the front-end cares about is the API of these, not the body. So, it misses the body.
  • The bug is that in 2 object files, only bar should have been inlined? As if just compiling main.d?

src/core/simd.d Outdated
is(V == int4) ||
is(V == uint4) ||
is(V == long2) ||
is(V == ulong2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fulfilling the DMD interface doesn't mean we cannot add all vector types and get rid of these 128-bit vectors limitation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you mean what I added. Although it starts to look a little too much. I think this can be done with some kind of isVector compile-time function.

src/core/simd.d Outdated Show resolved Hide resolved
@kinke
Copy link
Member

kinke commented Aug 8, 2019

By all vectors, I really meant all ;), not just adding the 256-bit ones (there's also AVX-512 and non-x86 archs...). E.g., even this works:

void foo()
{
    alias V = __vector(float[1023]);
    float[V.length] f = 1;
    import ldc.simd;
    auto v = loadUnaligned!V(f.ptr);
    v += 2;
}

@baziotis
Copy link
Author

baziotis commented Aug 8, 2019

By all vectors, I really meant all ;), not just adding the 256-bit ones.

Hmm.. Ok, let me see how I can do this.

@baziotis
Copy link
Author

baziotis commented Aug 8, 2019

If it's what I uploaded, I'll feel quite dumb haha. I was searching 10 minutes how to pattern match arbitrary types in __vector in the function contract. Like:
if(is(V == __vector(T[len])) where T would be some kind of template parameter and len also, but size_t. -.-

@kinke
Copy link
Member

kinke commented Aug 10, 2019

The existing unittests (nested test(T)) don't work with 256-bit vectors, they are specialized for 128-bit and need to be generalized if you want to instantiate them with the additional ones.

You can build the druntime test runner like this, in the build dir: ninja druntime-test-runner[-debug][-shared]. Then run the core.simd tests with runtime/druntime-test-runner[-debug][-shared] core.simd.

@baziotis
Copy link
Author

baziotis commented Aug 10, 2019

EDIT: Ignore this message freely.


The existing unittests (nested test(T)) don't work with 256-bit vectors, they are specialized for 128-bit and need to be generalized if you want to instantiate them with the additional ones.

I don't know exactly to handle submodules. druntime is submodule in ldc and so I can't switch to my branch.
But that's not a real problem, I will reapply the commit there and then checkout it. And I built the druntime-test-runner-debug.

But, it seems that there is an error in the old unittests. Or maybe better, something doesn't works that works in DMD

core/simd.d(808): Error: invalid cast from `ubyte` to `void`

That points to this:

line.
And it points there even if I remove the new tests.
It seems that the problem is with test!void16() in storeUnaligned tests. If I remove this line, then there are weird linker problems. They're less if I add -shared. But still, there's one:

undefined reference to `_D2rt4util7utility10safeAssertFNbNiNfbMAyaMQemZv'

I guess it's in the assert in the unittests but I don't understand why.

As for the unittests per se, I surely see this problem:

// Should have enough data to test all 16-byte alignments, and still

whici is it doesn't have enough space for 32 bytes vectors and it doesn't test all 32 alignments. I suppose when you say generalizing, you mean the array size and the alignments.

@baziotis
Copy link
Author

Sorry I was late on this. I rebuilt LDC and everything ran just fine, meaning, all 4 druntime-test-runner combinations output PASS.

Now, the thing that remains is to generalize the unittests which I guess includes fixing this void16 thing on the storeUnaligned unittest.

@kinke
Copy link
Member

kinke commented Aug 10, 2019

I suppose when you say generalizing, you mean the array size and the alignments.

Exactly.

I guess includes fixing this void16 thing

Nope, that's a separate issue in the codegenerator and a nice finding - we error out when default-initializing a void-vector, so that needs to be fixed in the compiler.

@baziotis
Copy link
Author

baziotis commented Aug 10, 2019

I suppose when you say generalizing, you mean the array size and the alignments.

Exactly.

Good, but where do we put a stop? Meaning, since you pointed and as I can see here LLVM Vector Type, it doesn't seem that there's a limit on the size.

Edit: Although, one guess is to stop on 512 or 1024 since I think there's no CPU currently supporting more than 512.

Nope, that's a separate issue in the codegenerator and a nice finding - we error out when default-initializing a void-vector, so that needs to be fixed in the compiler.

Ah, great. I'll take a look on the codegen just for fun.

@kinke
Copy link
Member

kinke commented Aug 10, 2019

I'll take a look on the codegen just for fun.

Compile this with ldc2 -vv -c:

void foo()
{
    alias V = __vector(void[16]);
    V v;
}

=> last lines of output before the error:

VectorExp::toElem() cast(__vector(void[16]))cast(ubyte)0u
* normal (splat) expression
* IntegerExp::toElem: cast(ubyte)0u @ ubyte
* * IntegerExp::toConstElem: cast(ubyte)0u @ ubyte
* * * Building type: ubyte
* * * value = i8 0
* Casting from 'ubyte' to 'void'

You can then grep for VectorExp::toElem() (the string, there's no such function) and try to fix it. :)

@baziotis
Copy link
Author

baziotis commented Aug 10, 2019

Thank you! I see it has the DMD AST structure (which well.. expected since AFAIK it gets an AST from DMD) with toChars() etc. Great!

Edit: Unrelated, but LDC is really -- slow -- to -- build. :P
C++ hits.

@baziotis
Copy link
Author

It took me 1 hour of running the installed LDC and not the one I was building (hey, it was 12 midnight here).
Another 1 hour that I broke the LDCs I had by generating wrong code (fun times).
And about 2 hours of building LDC and trying to make sense of the codegen.
And here's the PR: ldc-developers/ldc#3130

I feel like it's very hacky. I'll see what the CI says and continue to look at it tomorrow.

@kinke
Copy link
Member

kinke commented Aug 12, 2019

LDC is really -- slow -- to -- build. C++ hits.

Compared to DMD, yes, but the backend complexity isn't comparable. Rebuilding ninja ldc2 after changing a single .cpp shouldn't take much longer than 10 secs though (most of which is spent for re-linking the ldc2 executable) - that's my experience on Windows. [There are useful CMake options when working on the D files.]

@baziotis
Copy link
Author

baziotis commented Aug 12, 2019

Rebuilding ninja ldc2 after changing a single .cpp shouldn't take much longer than 10 secs

Oh.. it is ninja ldc2. What a satisfactory comment that was. Well, see, I didn't know that. :P
So, every time I was doing ninja. Which takes about 10 minutes, for every change in a .cpp file. :)

that's my experience on Windows. [There are useful CMake options when working on the D files.]

To get LLVM and LDC running on Windows is pretty much the same procedure as in Linux?
I'm especially interested in Visual Studio for the debug experience.
Because for DMD, it's really not at all the same procedure. :)

@kinke
Copy link
Member

kinke commented Aug 12, 2019

To get LLVM and LDC running on Windows is pretty much the same procedure as in Linux?

More or less, since CMake is cross-platform. The prerequisites, incl. shell setup, are more involved though. See https://wiki.dlang.org/Building_and_hacking_LDC_on_Windows_using_MSVC. Prebuilt LDC-LLVM v7.0.0 (excl. 7.0.0-2) probably works out of the box, so you may be able to skip building LLVM yourself.

@baziotis
Copy link
Author

Experimental meaning I don't know how good they are. They don't seem that good to me.
For one, you can't use static foreach because of @nogc so there are these ugly mixins.
But also, the whole test function I think needs rethinking because you can't do this:

alias V = __vector(float[1023]);
V v;

because of alignment but can do the code that kinke wrote in a comment above.

@baziotis
Copy link
Author

My bad that I hadn't run the testsuite. Now that I did, 2 tests fail:
codegen/output_s_affect_codegen.d and codegen/simd_unaligned.d. This is expected but I don't know what should be done. With these wrappers, one can't import core.simd and ldc.simd together and use load/storeUnaligned as there's a conflict.
That seems like a breaking change.
So, do we leave a breaking change and require that the user fully qualifies the intrinsic (i.e. static import) ? Which means that the test cases have to be changed.
Alternatively.. well I can't think of an alternative except for not providing these wrappers.

@kinke
Copy link
Member

kinke commented Aug 12, 2019

That seems like a breaking change.

Indeed, no good. Maybe we should instead support the upstream signature in ldc.simd and then publicly import the template in core.simd. The upstream implementation could be streamlined with that version too then.

Something like:

private alias ElementType(V) = typeof(V.array[0]);

template loadUnaligned(V)
{
    deprecated("bla please use other version bla")
    V loadUnaligned(const V* p);
    V loadUnaligned(const ElementType!V* p);
}

void foo()
{
    alias float4 = __vector(float[4]);
    const float[4] a;
    static assert(is(typeof(loadUnaligned(cast(const float4*) a.ptr)) == float4));
    static assert(is(typeof(loadUnaligned!float4(a.ptr)) == float4));
}

@baziotis
Copy link
Author

The upstream implementation could be streamlined with that version too then.

Good indeed! No need for fully qualified name. I'll see what I can do tomorrow.

src/ldc/simd.di Outdated
alias BaseType!V ElementType;

pragma(inline, true)
deprecated("This is the DMD interface, use it only if cross-compiling. Otherwise, please use the LDC interface.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Cross-compiling' usually means something completely different, compiling on a host for a different target. 'use it only for DMD compatibility' would be better IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I meant to say "if you plan to compile with both DMD and LDC" but too versbose. "DMD compatibility" sells it better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not add the deprecation message here. If you work on a codebase where deprecations == errors, you force people to immediately start using LDC's interface, while the DMD interface has not been deprecated by Dlang. Adding the deprecation here would mean that LDC would deprecate something that GDC and DMD don't; let's not fork things here :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it seems good to have it in a comment though. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding it to the documentation of the function is fine. dmd -de ... should work as before :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't -de for features that are officially deprecated? i.e. we don't have control.
https://dlang.org/deprecate.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add deprecated calling the function will no longer compile with -de.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it :)

src/core/simd.d Outdated
static string generalTests(string T, string size)()
{
string res;
res ~= "test!(__vector("~T~"[8 / "~size~"]))();";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we can do without these extra tests (the loadUnaligned tests haven't been extended). Sorry if I put you on the wrong track; I said you'd need to generalize the tests if you want to test non-128-bit vectors too. If you do want to keep the tests, then the mixin orgy and passing type and size as strings can be handled more elegantly:

static void generalTests(T)()
{
    static foreach (size; [8, 16, 32, 64])
        test!(__vector(T[size / T.sizeof]));
}

generalTests!void();
generalTests!byte();
...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool indeed, I haven't thought about it. TBH, since we moved the code to ldc/simd, I don't know if there is any point in adding anything to core/simd.

@kinke
Copy link
Member

kinke commented Sep 11, 2019

I can't push to your branch; could you please resync your local repo with LDC's, git cherry-pick 34f28ceafaff43c32963bab58db421637759251e and then push? I'll squash-merge it then.

@baziotis
Copy link
Author

baziotis commented Sep 11, 2019

I can't push to your branch; could you please resync your local repo with LDC's, git cherry-pick 34f28ceafaff43c32963bab58db421637759251e and then push? I'll squash-merge it then.

I don't know what happened. The command above gives me: fatal: bad object 34f28ceafaff43c32963bab58db421637759251e

Edit: Btw, I did the bad thing of doing the changes in the ldc branch. Maybe that's the problem.

@baziotis
Copy link
Author

Ok, I fetched the upstream. Now it should be ok.

@kinke
Copy link
Member

kinke commented Sep 11, 2019

Thx; I meant 'fetch' when saying 'resync', my bad.

@kinke kinke merged commit d995865 into ldc-developers:ldc Sep 11, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants