Check Arch overrides on Windows #311

Closed
noc0lour opened this issue Nov 21, 2019 · 10 comments · Fixed by #393 or #411
Labels
Enhancement (new kernel entirely or for some specific ARCH), question

Comments

@noc0lour
Member

Check whether all arch overrides are still valid for Windows. For example, AVX should be available on AppVeyor, but it is ruled out.

@jdemel
Contributor

jdemel commented Nov 25, 2019

@noc0lour would you mind taking care of this?

@noc0lour
Member Author

@jdemel If I find time to test this on one of my few Windows installations, sure. But don't count on it.

@noc0lour
Member Author

Also, this is best checked across a variety of architectures to make sure we are doing the correct thing here. I don't know which SIMD instructions are supported by which VS version.

@jdemel
Contributor

jdemel commented Nov 26, 2019

Let's start with the newest version available. If it works with that version, we can discuss if and which older versions we want to add.

@noc0lour
Member Author

I've just checked the latest AppVeyor build, and Windows only runs generic tests. https://ci.appveyor.com/project/gnuradio/volk/builds/35919812

I think I can see the issue there with the way machines and archs are defined and enabled at compile time. Because SSE is removed from the list of archs for Windows, the only matching "machine" whose archs are all available is generic, since every other machine contains an sse entry. This is checked in volk_compile_utils.py

Maybe this needs a rework now that we have cpu_features to help with this at runtime?

@jdemel
Contributor

jdemel commented Oct 24, 2020

From the #393 discussion:
MSVC docs state that on x64 only AVX|AVX2|AVX512 are available. For 32-bit x86 it is IA32|SSE|SSE2|AVX|AVX2|AVX512.
Currently, we run our Windows CI on x64 and remove mmx and sse from our available archs. That causes all further archs to be removed as well, because we rely on previous archs being present to select available machines.

We do not distinguish between 32-bit x86 and x64 in our arch checks. Thus, it is currently difficult to reliably choose one or the other architecture. What we do instead is choose AVX for SSSE3 etc. kernels with MSVC. It would be great to be able to compile all kernels with MSVC. I wonder if we should set the minimum compile flag to /arch:AVX and be done with it.
AVX has been around for quite a while, but some really small CPUs didn't support it until very recently. Maybe this is irrelevant because Windows doesn't support them either and we can rely on AVX, but as long as we don't know, we can't really enable optimized kernels in MSVC builds. This doesn't make the current situation any worse than before. But it would be great to make all kernels work.

This exemplifies the problem:

volk/gen/machines.xml

Lines 45 to 47 in 24c7c84

<machine name="avx">
<archs>generic 32|64| mmx| sse sse2 sse3 ssse3 sse4_1 sse4_2 popcount avx orc|</archs>
</machine>

We require all those archs, but MSVC fails to provide the SSE archs. In the kernels, we rely on the assumption that these chains are valid. With cpu_features we should have a way to reliably detect all archs at runtime, but this is a compiler issue.
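To make the failure mode concrete, here is a minimal Python sketch of the "all archs must be available" rule described above. It is not the code from volk_compile_utils.py: the `MACHINES` table, `enabled_machines`, and the arch lists are hypothetical and simplified, and the optional `|` alternatives from machines.xml are left out.

```python
# Simplified, hypothetical machine table. The optional "|" alternatives from
# machines.xml (32|64|, mmx|, orc|) are left out to keep the sketch short.
MACHINES = {
    "generic": ["generic"],
    "sse2": ["generic", "sse", "sse2"],
    "avx": [
        "generic", "sse", "sse2", "sse3", "ssse3",
        "sse4_1", "sse4_2", "popcount", "avx",
    ],
}


def enabled_machines(available_archs, machines=MACHINES):
    """Return the machines whose required archs are all available."""
    available = set(available_archs)
    return [name for name, archs in machines.items() if set(archs) <= available]


# Every arch the compiler could provide (assumed list for this sketch).
ALL_ARCHS = MACHINES["avx"]

# Mimic the Windows override: "sse" is removed from the available archs.
MSVC_ARCHS = [arch for arch in ALL_ARCHS if arch != "sse"]

print(enabled_machines(ALL_ARCHS))   # ['generic', 'sse2', 'avx']
print(enabled_machines(MSVC_ARCHS))  # ['generic']
```

Dropping a single early arch such as `sse` disqualifies every machine that lists it, which would explain why only generic is left on Windows.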

I'm reopening this issue to make sure we don't forget about it.

@noc0lour
Member Author

It seems that all x64 machines support SSE instructions, and thus MSVC opted to remove these options from the command line since they are always available? I have only found this: https://stackoverflow.com/questions/1067630/sse2-option-in-visual-c-x64 on the topic, but tbh everything else would be a bit weird in terms of enabling speedups.

@jdemel
Contributor

jdemel commented Oct 24, 2020

Wikipedia, x86-64:

"Floating point operations are supported via mandatory SSE2-like instructions"

ergo we can expect SSE2 but nothing further. Since MSVC goes directly from SSE2 to AVX, we can't use such fine-grained controls like we do for other compilers.
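A rough Python sketch of what this loss of granularity looks like on x64. The table is hypothetical: it assumes SSE and SSE2 need no flag because they are the x64 baseline, and that everything between SSE3 and AVX is built with /arch:AVX, as described for the SSSE3 kernels in this thread. The arch name `avx512f` and both helper functions are illustrative only.

```python
# Hypothetical table: the lowest MSVC x64 /arch flag assumed to cover each arch.
# None means "no flag needed" because SSE2 is the x64 baseline.
MSVC_X64_FLAGS = {
    "sse": None,
    "sse2": None,
    "sse3": "/arch:AVX",
    "ssse3": "/arch:AVX",
    "sse4_1": "/arch:AVX",
    "sse4_2": "/arch:AVX",
    "avx": "/arch:AVX",
    "avx2": "/arch:AVX2",
    "avx512f": "/arch:AVX512",
}


def msvc_x64_flag(arch):
    """Look up the flag for an arch; raises KeyError for unknown names."""
    return MSVC_X64_FLAGS[arch]


def archs_sharing_flag(flag):
    """List every arch that collapses onto the same compiler flag."""
    return [arch for arch, f in MSVC_X64_FLAGS.items() if f == flag]


print(archs_sharing_flag("/arch:AVX"))
# ['sse3', 'ssse3', 'sse4_1', 'sse4_2', 'avx']
```

Five distinct archs end up behind one flag, so the compiler flag alone cannot tell an SSE4.2-only target apart from an AVX one.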

Further:

"The 32-bit edition of Windows 8, for example, requires the presence of SSE2 instructions."

I assume that requirement holds up for later versions as well.

Intel Celeron N4000 is an x64 CPU that only supports SSE4.2 and no AVX. It was released in Q4'17. I wonder how this is supposed to be handled? Ignore SSE4, or does the SSE2 flag enable it implicitly? Or are the intrinsics available but MSVC doesn't optimize for these archs?

@noc0lour
Member Author

@jdemel Somehow I think we have to separate two issues here. The first is building as many intrinsics as possible for a given compiler, which only depends on this compiler's ability to support the extended notation for each instruction set.

The second issue is then loading only the intrinsics supported by the CPU this is executed on. For MSVC, we use the flags to test whether the compiler is able to generate the intrinsics; runtime detection is now done with cpu_features?
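The split between the two issues can be sketched in Python as follows. This is only an illustration under assumptions: `COMPILED_MACHINES` stands in for whatever the compiler managed to build, and the feature set passed to the hypothetical `pick_machine` would come from a runtime detector such as cpu_features, which is not called here.

```python
# Machines that were compiled in, ordered from least to most capable
# (hypothetical, simplified arch sets).
COMPILED_MACHINES = [
    ("generic", {"generic"}),
    ("sse4_2", {"generic", "sse2", "sse4_2"}),
    ("avx", {"generic", "sse2", "sse4_2", "avx"}),
]


def pick_machine(cpu_features, machines=COMPILED_MACHINES):
    """Return the most capable compiled machine the CPU fully supports."""
    best = "generic"
    for name, archs in machines:
        if archs <= cpu_features:
            best = name
    return best


# A CPU like the Celeron N4000 mentioned above: SSE4.2 but no AVX.
no_avx_cpu = {"generic", "sse2", "sse4_2"}
avx_cpu = no_avx_cpu | {"avx"}

print(pick_machine(no_avx_cpu))  # sse4_2
print(pick_machine(avx_cpu))     # avx
```

With this separation, compiling the AVX kernels does no harm on a CPU without AVX, as long as the runtime check never selects them there.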

@jdemel
Contributor

jdemel commented Oct 24, 2020

Yeah. Runtime detection should work. At least we have every reason to believe that.

Your PR #411 looks promising.

jdemel linked a pull request on Nov 7, 2020 that will close this issue