Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[scsynth][supernova] fix Windows not guarding denormals #4504

Merged
merged 2 commits into from Feb 15, 2020

Conversation

jrsurge
Copy link
Member

@jrsurge jrsurge commented Jul 27, 2019

Purpose and Motivation

[Edit] Updated in line with #4504 (comment)

This fixes #4046 - Windows not guarding denormals.

MSVC doesn't set the usual __SSE__ and __SSE2__ etc definitions, so denormals aren't properly handled on Windows.

This PR uses boost.predef to check for requested SIMD instructions.

Types of changes

To-do list

  • Code is tested
    scsynth, yes. supernova doesn't compile on msvc
  • All tests are passing
    - [ ] Updated documentation
  • This PR is ready for review

@jrsurge jrsurge added bug Issues that relate to unexpected/unwanted behavior. Don't use for PRs. comp: build CMake build system comp: scsynth comp: supernova labels Jul 28, 2019
@sonoro1234
Copy link
Contributor

sonoro1234 commented Jul 30, 2019

Travis is failing because it has CMake 3.9 instead of 3.10

Perhaps can be done without CMake

this sets the flags

#ifdef _MSC_VER
    #if (defined(_M_AMD64) || defined(_M_X64))
        #define SC_SSE2
        #define SC_SSE
    #elif _M_IX86_FP == 2
        #define  SC_SSE2
        #define SC_SSE
    #elif _M_IX86_FP == 1
        #define SC_SSE
    #endif
#else
    #ifdef __SSE2__
        #define SC_SSE2
    #endif
    #ifdef __SSE__
        #define SC_SSE
    #endif
#endif 

which are then used as in this PR

CMakeLists.txt Outdated
# so we have to set our own

cmake_host_system_information(RESULT SC_SYSTEM_HAS_SSE QUERY HAS_SSE)
cmake_host_system_information(RESULT SC_SYSTEM_HAS_SSE2 QUERY HAS_SSE2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every x86 that doesn't have sse1/sse2 should probably go to a museum (especially they are available on all 64bit intel CPUs)

it might be better to port the feature detection macros to boost.predef to ensure portability:
https://www.boost.org/doc/libs/1_66_0/doc/html/predef/reference.html#predef.reference.boost_hw_hardware_macros

@sonoro1234
Copy link
Contributor

Except for requiring cmake 3.10 @jsurge solución is the simplest one.

@jrsurge
Copy link
Member Author

jrsurge commented Aug 3, 2019

after discussions with other devs, the case for bumping CMake is not strong enough here.

I'm currently re-writing to incorporate @sonoro1234's solution and remove the build system from this change.

@jrsurge jrsurge force-pushed the topic/fix-windows-denormals branch 2 times, most recently from 13b5ac2 to a2c3fb9 Compare August 3, 2019 23:45
@jrsurge jrsurge removed the comp: build CMake build system label Aug 3, 2019
@mossheim
Copy link
Contributor

mossheim commented Aug 4, 2019

@jrsurge why not use boost.predef as @timblechmann suggested? then we don't need to add an entirely new file that does the same thing.

Also, just want to make sure this is clear to everyone -- these macros do not detect whether SSE/SSE2/AVX/etc. are available on the hardware; rather they are set by the compiler to indicate that the corresponding instruction set may be used during compilation. it's totally legitimate to disallow certain instruction sets even if they are available on the machine used for compiling; for instance, so that the build can be copied to another machine without those instruction sets.

@jrsurge
Copy link
Member Author

jrsurge commented Aug 4, 2019

@jrsurge why not use boost.predef as @timblechmann suggested? then we don't need to add an entirely new file that does the same thing.

boost didn't seem to make this any easier, if not harder.
having looked at it again, I think I've got it working with boost, but I'm much less confident I'm correct.

Also, just want to make sure this is clear to everyone -- these macros do not detect whether SSE/SSE2/AVX/etc. are available on the hardware; rather they are set by the compiler to indicate that the corresponding instruction set may be used during compilation. it's totally legitimate to disallow certain instruction sets even if they are available on the machine used for compiling; for instance, so that the build can be copied to another machine without those instruction sets.

(with reference to the boost docs more specifically) the word detection is everywhere with reference to these macros. I can make the wording in commits less strong?

I'm slightly out of my depth on this one, I'm just trying to do what I'm told to fix an issue.

@jrsurge jrsurge force-pushed the topic/fix-windows-denormals branch 2 times, most recently from 514a0ab to bda6a96 Compare August 4, 2019 03:21
#include <boost/predef/architecture.h>
#include <boost/predef/hardware.h>

#ifdef BOOST_HW_SIMD_AVAILABLE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be BOOST_HW_SIMD_X86 >= BOOST_HW_SIMD_X86_SSE_VERSION (BOOST_HW_SIMD_AVAILABLE is for all SIMD sets, but xmmintrin.h is x86-specific)

@@ -207,7 +210,7 @@ static void name_current_thread(int thread_index) {
}

static void set_daz_ftz(void) {
#ifdef __SSE__
#if defined(BOOST_ARCH_X86_64) || (BOOST_HW_SIMD_X86 >= BOOST_X86_SSE_VERSION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can simply be BOOST_HW_SIMD_X86 >= BOOST_X86_SSE_VERSION, and you can remove the architecture.h include


void sc_SetDenormalFlags() {
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_mm_setcsr(_mm_getcsr() | 0x40); // DAZ
}

#elif defined(__SSE__)
# include <xmmintrin.h>
#elif BOOST_HW_SIMD_X86 >= BOOST_X86_SSE_VERSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did some looking around -- this code can be changed to match the supernova implementation below. all the operations in it are available and work in the SSE instruction set. in fact the entire thing can be reduced to

static void sc_SetDenormalFlags() {
#if BOOST_HW_SIMD_X86 >= BOOST_HW_SIMD_X86_SSE_VERSION
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _mm_setcsr(_mm_getcsr() | 0x40); // DAZ
#endif
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also then remove the architecture.h include above

@jrsurge
Copy link
Member Author

jrsurge commented Dec 29, 2019

thanks for the review @brianlheim!

Working on it.

this fixes denormals on Windows
this fixes denormals on Windows
@jrsurge jrsurge force-pushed the topic/fix-windows-denormals branch from bda6a96 to ff2c03d Compare January 4, 2020 12:38
Copy link
Contributor

@mossheim mossheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@mossheim mossheim merged commit 2f42c47 into supercollider:develop Feb 15, 2020
@totalgee
Copy link
Contributor

Thanks, here's some feedback on 3.11.0-beta1. I tested my previous version (3.10.3) and the new beta version, and I can definitely see an improvement -- a major reduction in CPU spikes in the cases where I saw them before. And all this time I thought the issue was with my Windows laptop, doing CPU throttling or power management or something! (-;

In some cases where I used to regularly experience CPU spikes (using VSTs and also doing ambisonic stuff with SC-HOA), I no longer seem to get them. The most obvious "test case" was with some ambisonic stuff (lots of convolution)...when I was actively spatializing sources, I would get (depending on the setup), say 20-25% CPU usage. When I stopped playing ambisonic sources so nothing was going to the decoder, the CPU usage would (unintuitively) go up, often to around 60%. Then when I resumed playing spatialized sources, the CPU would go back to 20-25%. When I stopped, back up to 60-65%. In other cases, the variation was from ~10-12% up to 18-20%. With the new version, the CPU never goes above the "regular" values, of say 25% in the first case (decoding with HOADecLebedev50) and about 12% in the second (decoding with HOADecLebedev26).

And with another test case from a live-coding set with extensive use of VSTPlugin, I would sometimes get CPU spikes of 70% or even beyond 100% (causing audible glitches). With this new version, so far I've never seen the CPU usage go above 20-25% in any of those cases...

So I'm very happy, this seems to be from this fix -- thanks to those who figured this out! It's like I've got a new laptop... (-;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issues that relate to unexpected/unwanted behavior. Don't use for PRs. comp: scsynth comp: supernova
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scsynth on Windows does not guard denormals
5 participants