Forcing SSE 4.1 support.
lemire committed May 25, 2015
1 parent c2842b9 commit 1a6ea48
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@ format. It is up to the (sophisticated) user to create a compressed format.
Requirements
-------------

-- Your processor should support SSE2 (Pentium4 or better)
+- Your processor should support SSE4.1 (It is supported by most Intel and AMD processors released since 2008.)
- C99 compliant compiler (GCC is assumed)
- A Linux-like distribution is assumed by the makefile

4 changes: 2 additions & 2 deletions makefile
@@ -3,9 +3,9 @@
#
.SUFFIXES: .cpp .o .c .h
ifeq ($(DEBUG),1)
-CFLAGS = -fPIC -std=c89 -ggdb -march=native -Wall -Wextra -pedantic
+CFLAGS = -fPIC -std=c89 -ggdb -msse4.1 -march=native -Wall -Wextra -pedantic
else
-CFLAGS = -fPIC -std=c89 -O3 -march=native -Wall -Wextra -pedantic
+CFLAGS = -fPIC -std=c89 -O3 -msse4.1 -march=native -Wall -Wextra -pedantic
endif # debug
LDFLAGS = -shared
LIBNAME=libsimdcomp.so.0.0.3

8 comments on commit 1a6ea48

@weltling
Collaborator

According to http://store.steampowered.com/hwsurvey (click "Other Setting" at the bottom), SSE4.1 covers 80% of the market, whereas SSE2 is a standard feature. So, to put the question out there: should SSE2-only machines still be supported?

@lemire, do you think SSE2 is not sensible anymore? I haven't dug deep into the code yet, but I could take that on as well if you think it makes sense.

Thanks.

@yjh0502

@yjh0502 yjh0502 commented on 1a6ea48 Jun 8, 2015

I don't think using SSE4.1 will improve performance. SSE3 through SSE4.x only add small extensions on top of SSE2. In any case, the library uses SSE2 intrinsics, so to improve performance you would have to rewrite it using SSE4.1 instructions. If you want better performance from newer instructions, you will need AVX2, because the 256-bit operations in AVX are floating-point only, and AVX2 was introduced with Haswell.

@lemire
Owner Author

@lemire lemire commented on 1a6ea48 Jun 8, 2015

@weltling @yjh0502

We do not use SSE 4.1 to improve performance per se. But we need SSE 4.1 for intrinsics such as _mm_min_epu32 and _mm_max_epu32 which are very handy for Frame-Of-Reference coding (a new feature of the library).

If we have a portable way to check for SSE 4.1 support, then we could activate this feature only when SSE 4.1 is available. The catch is that clang is not yet very good at reliably detecting SSE 4.1 support: even if your CPU does support SSE 4.1, it is likely that clang will not tell you (with -march=native).
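
To illustrate why these two intrinsics are handy, here is a minimal sketch of the first step of Frame-Of-Reference coding: find the minimum and maximum of a block, then store each value as an offset from the minimum using only as many bits as the range needs. The names `minmax_u32` and `bits_needed` are hypothetical and are not the library's API. The sketch assumes the compiler defines `__SSE4_1__` when SSE 4.1 is enabled (as GCC and clang do with `-msse4.1`) and falls back to scalar code otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h> /* SSE 4.1: _mm_min_epu32, _mm_max_epu32 */
#endif

/* Hypothetical helper: minimum and maximum of n unsigned 32-bit integers. */
static void minmax_u32(const uint32_t *in, size_t n,
                       uint32_t *min, uint32_t *max) {
    uint32_t lo, hi;
    size_t i = 0;
    assert(n > 0);
    lo = in[0];
    hi = in[0];
#ifdef __SSE4_1__
    if (n >= 4) {
        /* Process four values per step with unsigned 32-bit min/max. */
        __m128i vlo = _mm_loadu_si128((const __m128i *)in);
        __m128i vhi = vlo;
        uint32_t tmp[4];
        size_t k;
        for (i = 4; i + 4 <= n; i += 4) {
            __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
            vlo = _mm_min_epu32(vlo, v);
            vhi = _mm_max_epu32(vhi, v);
        }
        /* Reduce the four lanes to a single scalar minimum and maximum. */
        _mm_storeu_si128((__m128i *)tmp, vlo);
        for (k = 0; k < 4; k++) if (tmp[k] < lo) lo = tmp[k];
        _mm_storeu_si128((__m128i *)tmp, vhi);
        for (k = 0; k < 4; k++) if (tmp[k] > hi) hi = tmp[k];
    }
#endif
    /* Scalar tail; this is the whole loop when SSE 4.1 is not enabled. */
    for (; i < n; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    *min = lo;
    *max = hi;
}

/* Number of bits needed to represent every offset in [0, range]. */
static unsigned bits_needed(uint32_t range) {
    unsigned b = 0;
    while (range != 0) { b++; range >>= 1; }
    return b;
}
```

With the minimum as the reference, each value `x` can be stored as `x - min` in `bits_needed(max - min)` bits. The `#ifdef` is one way to activate the SSE 4.1 path only when the compiler reports support.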

@weltling
Collaborator

@lemire yeah, instead of run-time detection, I'd rather suggest compile-time enablement, maybe through a makefile option or the like. If we had an emulation layer, users could still compile an SSE2-capable lib. I share the opinion that games with cpuid are not reliable enough.

The only question is how important it is not to lose those 20% of potential machines. We could maintain our own emulation layer, or require an external one like http://sseplus.sourceforge.net/ , or even bundle one.

@yjh0502 performance is a good thing, but it should not compete with compatibility. For example, a couple of months ago I introduced a bug by using the LZCNT intrinsic: on processors where it's not present, it kinda "works", but in a completely unexpected way :)

Cheers.
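
The comment above does not say what went wrong with LZCNT. One documented behaviour that fits the description: on x86 processors without LZCNT support, the same instruction encoding is executed as BSR, which returns the index of the highest set bit rather than the number of leading zeros. The sketch below models both results in portable C; the function names are hypothetical and only illustrate the difference.

```c
#include <stdint.h>

/* Models LZCNT: number of leading zero bits, 32 when x is 0. */
static unsigned lzcnt32_ref(uint32_t x) {
    unsigned n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) { x <<= 1; n++; }
    return n;
}

/* Models BSR for non-zero x: index of the highest set bit.
   The hardware result for x == 0 is undefined, so callers of this
   sketch must pass a non-zero value. */
static unsigned bsr32_ref(uint32_t x) {
    unsigned i = 0;
    while ((x >>= 1) != 0) i++;
    return i;
}
```

For any non-zero `x`, `bsr32_ref(x) == 31 - lzcnt32_ref(x)`, so code that expects a leading-zero count still gets a plausible-looking number, just the wrong one.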

@lemire
Owner Author

@lemire lemire commented on 1a6ea48 Jun 8, 2015

@weltling I agree. We could make it a compile-time option.

@weltling
Collaborator

@lemire, great :) I've put this on my todo list. I think picking some useful parts from SSEPlus would go quickly and would also be compatible with the license.

Thanks.

@lemire
Owner Author

@lemire lemire commented on 1a6ea48 Jun 8, 2015

@weltling

Sure. It would be best to avoid complicated dependencies however, since C does not have a great portable way to cope with that.

@weltling
Collaborator

@lemire yep, I meant just picking the functions needed, starting with the two you mentioned. Including the whole of SSEPlus would obviously be overkill. And even if it didn't exist, they could be implemented. This emulation layer could then be extended incrementally as needed.
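
As a rough idea of what such an emulation layer might look like for those two functions, here is a sketch that uses only SSE2. It is not taken from SSEPlus or from the library; the names `emu_min_epu32`, `emu_max_epu32`, `simd_min_epu32` and `simd_max_epu32` are hypothetical. SSE2 only has a signed 32-bit compare, so the sketch flips the top bit of both operands first, which maps unsigned order onto signed order, and then selects lanes with a mask. It assumes an x86 target and a compiler that defines `__SSE4_1__` when SSE 4.1 is enabled.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 */

/* Lane mask: all ones where a > b as unsigned 32-bit integers. */
static __m128i emu_cmpgt_epu32(__m128i a, __m128i b) {
    const __m128i sign = _mm_set1_epi32(INT32_MIN);
    /* Flipping the top bit turns an unsigned compare into a signed one. */
    return _mm_cmpgt_epi32(_mm_xor_si128(a, sign), _mm_xor_si128(b, sign));
}

/* SSE2-only stand-in for _mm_min_epu32. */
static __m128i emu_min_epu32(__m128i a, __m128i b) {
    __m128i a_gt_b = emu_cmpgt_epu32(a, b);
    /* Where a > b take b, elsewhere take a. */
    return _mm_or_si128(_mm_and_si128(a_gt_b, b),
                        _mm_andnot_si128(a_gt_b, a));
}

/* SSE2-only stand-in for _mm_max_epu32. */
static __m128i emu_max_epu32(__m128i a, __m128i b) {
    __m128i a_gt_b = emu_cmpgt_epu32(a, b);
    /* Where a > b take a, elsewhere take b. */
    return _mm_or_si128(_mm_and_si128(a_gt_b, a),
                        _mm_andnot_si128(a_gt_b, b));
}

/* Compile-time selection: native intrinsics when SSE 4.1 is enabled,
   the SSE2 emulation otherwise. */
#ifdef __SSE4_1__
#include <smmintrin.h>
#define simd_min_epu32 _mm_min_epu32
#define simd_max_epu32 _mm_max_epu32
#else
#define simd_min_epu32 emu_min_epu32
#define simd_max_epu32 emu_max_epu32
#endif
```

Code written against `simd_min_epu32` and `simd_max_epu32` would then build with or without `-msse4.1`, and further functions could be added to the layer the same way as they are needed.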