Showing 2 changed files with 3 additions and 3 deletions.
1a6ea48
According to http://store.steampowered.com/hwsurvey (click "Other Setting" at the bottom), SSE4.1 covers 80% of the market, whereas SSE2 is a standard feature. So, to throw the question out there: are SSE2-only machines something that should still be supported?
@lemire, do you think SSE2 is not sensible anymore? I haven't dug deep into the code yet, but I could take that on as well, if you think it makes sense.
Thanks.
1a6ea48
I don't think using SSE4.1 will improve performance. SSE3 through SSE4.x contain only small extensions to SSE2. In any case, the library uses SSE2 intrinsics, so to improve performance you would have to rewrite it using SSE4.1 instructions. If you want better performance from newer instructions, you will need AVX2, because AVX is only for floating point, and AVX2 was introduced with Haswell.
1a6ea48
@weltling @yjh0502
We do not use SSE 4.1 to improve performance per se. But we need SSE 4.1 for intrinsics such as _mm_min_epu32 and _mm_max_epu32 which are very handy for Frame-Of-Reference coding (a new feature of the library).
If we have a portable way to check for SSE 4.1 support, then we can possibly activate this feature only when SSE 4.1 is available. The catch is that clang is not yet very good at reliably detecting SSE 4.1 support: it is likely that even if your CPU does support SSE 4.1, clang will not tell you (with -march=native).
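As an illustration of why these two intrinsics matter here: Frame-Of-Reference coding stores each value in a block as an offset from the block minimum, so the encoder needs the unsigned minimum and maximum of the block to pick a bit width. The following is only a sketch, not the library's code. The helper name `for_minmax` is hypothetical, and the `__SSE4_1__` guard (a macro that GCC and clang define when SSE4.1 code generation is enabled) is there so the sketch still compiles, with a scalar loop, when SSE4.1 is not enabled.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1: _mm_min_epu32, _mm_max_epu32 */
#endif

/* Hypothetical helper: unsigned min and max of n 32-bit values (n >= 1). */
static void for_minmax(const uint32_t *in, size_t n,
                       uint32_t *min_out, uint32_t *max_out) {
    uint32_t lo = in[0], hi = in[0];
    size_t i = 0;
#if defined(__SSE4_1__)
    if (n >= 4) {
        /* Keep four running minima and maxima, one per 32-bit lane. */
        __m128i vlo = _mm_loadu_si128((const __m128i *)in);
        __m128i vhi = vlo;
        for (i = 4; i + 4 <= n; i += 4) {
            __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
            vlo = _mm_min_epu32(vlo, v);
            vhi = _mm_max_epu32(vhi, v);
        }
        /* Reduce the four lanes to a single min and max. */
        uint32_t tlo[4], thi[4];
        _mm_storeu_si128((__m128i *)tlo, vlo);
        _mm_storeu_si128((__m128i *)thi, vhi);
        for (int k = 0; k < 4; k++) {
            if (tlo[k] < lo) lo = tlo[k];
            if (thi[k] > hi) hi = thi[k];
        }
    }
#endif
    /* Scalar tail, or the whole array when SSE4.1 is not enabled. */
    for (; i < n; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```

Calling `for_minmax(block, len, &lo, &hi)` then gives `hi - lo` as the largest offset that has to be representable in the packed block.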
1a6ea48
@lemire yeah, instead of run-time detection, I'd rather suggest compile-time enablement, maybe through a makefile option or the like. If we had an emulation layer, users could still compile an SSE2-capable lib. I share the opinion that playing games with cpuid is not reliable enough.
The only question is how important it is not to lose those 20% of potential machines. We could maintain our own emulation layer, or require an external one such as http://sseplus.sourceforge.net/ , or even bundle one.
@yjh0502 performance is a good thing, but it should not compete with compatibility. E.g., a couple of months ago I introduced a bug by using the LZCNT intrinsic: on processors where it's not present, it kinda "works", but in a completely unexpected way :)
Cheers.
1a6ea48
@weltling I agree. We could make it a compile-time option.
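To make the two approaches in this exchange concrete, here is a rough sketch that contrasts a compile-time check with a run-time one. It assumes GCC or clang: `__SSE4_1__` is the macro those compilers define when SSE4.1 code generation is enabled (for example via `-msse4.1`), and `__builtin_cpu_supports` is their x86 builtin that queries the CPU. The function names are hypothetical and nothing here is taken from the library.

```c
/* Returns 1 if this file was compiled with SSE4.1 enabled, else 0.
   This is the compile-time enablement: a build flag decides. */
static int compiled_with_sse41(void) {
#if defined(__SSE4_1__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the running CPU reports SSE4.1, 0 if it does not,
   and -1 when this sketch has no way to ask (non-x86 target or a
   compiler without the builtin). This is the run-time detection. */
static int cpu_has_sse41(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return -1;
#endif
}
```

With the compile-time route, a binary built with SSE4.1 enabled simply must not be run on a CPU without it; the run-time route would instead need the SSE4.1 code paths compiled separately and selected through a check like `cpu_has_sse41()`.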
1a6ea48
@lemire, great :) I've put this on my todo list. I think picking some useful parts from SSEPlus would go quickly, and they would also be compatible with the license.
Thanks.
1a6ea48
@weltling
Sure. It would be best to avoid complicated dependencies, however, since C does not have a great portable way to cope with them.
1a6ea48
@lemire yep, I meant just picking the functions needed, starting with the two you mentioned. Including the whole of SSEPlus would obviously be overkill. And even if it didn't exist, those functions could be implemented. This emulation layer could then be extended incrementally as needed.
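For reference, an SSE2-only stand-in for the two intrinsics mentioned earlier could look roughly like the sketch below. It is not taken from SSEPlus or from the library, and the `emu_` names are hypothetical. It relies on a standard trick: SSE2 only offers a signed 32-bit compare (`_mm_cmpgt_epi32`), but flipping the top bit of both operands makes the signed compare order the values as if they were unsigned. The sketch assumes an x86 target with SSE2.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 only */

/* Lane-wise mask: all ones where a > b as unsigned 32-bit values. */
static __m128i emu_cmpgt_epu32(__m128i a, __m128i b) {
    const __m128i bias = _mm_set1_epi32(INT32_MIN); /* 0x80000000 per lane */
    return _mm_cmpgt_epi32(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* Hypothetical SSE2 emulation of _mm_min_epu32. */
static __m128i emu_mm_min_epu32(__m128i a, __m128i b) {
    __m128i a_gt_b = emu_cmpgt_epu32(a, b);
    /* Where a > b take b, elsewhere take a. */
    return _mm_or_si128(_mm_and_si128(a_gt_b, b),
                        _mm_andnot_si128(a_gt_b, a));
}

/* Hypothetical SSE2 emulation of _mm_max_epu32. */
static __m128i emu_mm_max_epu32(__m128i a, __m128i b) {
    __m128i a_gt_b = emu_cmpgt_epu32(a, b);
    /* Where a > b take a, elsewhere take b. */
    return _mm_or_si128(_mm_and_si128(a_gt_b, a),
                        _mm_andnot_si128(a_gt_b, b));
}
```

A compile-time switch could then map the SSE4.1 names onto these functions when SSE4.1 is not enabled, at the cost of a few extra instructions per call.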