Commit

Remove GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
Fixes #117
jart committed Jan 13, 2018
1 parent fcf32e7 commit 01448f1
Showing 4 changed files with 13 additions and 32 deletions.
20 changes: 13 additions & 7 deletions README.md
@@ -10,6 +10,8 @@ The meaning of "low precision" is detailed in this document:

Some of the general design is explained in [doc/design.md](doc/design.md).

+**Warning:** This library goes very slow if compiled incorrectly; see below.
+
## Disclaimer

This is not an official Google product (experimental or otherwise), it is just
@@ -54,13 +56,17 @@ Current optimized code paths:
* ARM with NEON (both 32bit and 64bit).
* Intel x86 with SSE 4.1 (both 32bit and 64bit).

-If you are building for x86, it's important that you pass in the `-msse4.1`
-compiler flag when building, or you'll end up using slow reference code. If
-you're building with Bazel, you can do this by running `bazel build gemmlowp:all
---copt=-msse4.1`. If you're building for a machine with no SIMD support in
-gemmlowp then by default you'll see an error. If you want to run with the
-reference implementations anyway, you can override the error by specifying
-`GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK` as a build define.
+When building for x86, it's very important to pass `-msse4.1` to the compiler,
+otherwise gemmlowp will use slow reference code. Bazel users can compile by
+running `bazel build --copt=-msse4.1 //gemmlowp:all`. The compiled binary should
+work on all Intel CPUs since 2008 (including low power microarchitectures) as
+well as AMD CPUs since 2011.
+
+Please note when compiling binaries that don't need to be distributed, it's
+generally a better idea to pass `-march=native` to the compiler. That flag
+implies `-msse4.1` flag, along with others that might be helpful. This of course
+assumes the host machine supports those instructions. Bazel users should prefer
+to run `bazel build --config=opt //gemmlowp:all` instead.

Details of what it takes to make an efficient port of gemmlowp, namely writing a
suitable GEMM kernel and accompanying packing code, are explained in this file:
5 changes: 0 additions & 5 deletions contrib/CMakeLists.txt
@@ -12,11 +12,6 @@ set(CMAKE_CXX_STANDARD 11)

get_filename_component(gemmlowp_src ${gemmlowp_SOURCE_DIR} PATH)

-# Enabling SIMD is recommended for modern x86 machines
-#add_definitions(-msse4)
-# However here we take the best compatibility
-add_definitions(-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK)
-
if(WIN32)
# one can enable simd from the cmake command line, ie -DCMAKE_CXX_FLAGS="/arch:AVX2
add_definitions(-DNOMINMAX -DWIN64 -DWIN32_LEAN_AND_MEAN -DNOGDI)
3 changes: 0 additions & 3 deletions eight_bit_int_gemm/eight_bit_int_gemm.cc
@@ -12,9 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.

-#ifndef GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
-#define GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
-#endif
#include "eight_bit_int_gemm.h"

#include <memory>
17 changes: 0 additions & 17 deletions internal/kernel_default.h
@@ -79,23 +79,6 @@ GEMMLOWP_SET_DEFAULT_KERNEL(false, false, SSE4_32_Kernel4x4Depth2)
#include "kernel_sse.h"
GEMMLOWP_SET_DEFAULT_KERNEL(false, false, SSE4_64_Kernel12x4Depth2)
#else
-#ifndef GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
-#if defined __ARM_ARCH_5TE__
-// SIMD is not available on this platform. The slow fallback will be used.
-// Don't require GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK because there's nothing
-// the user can do about it.
-#elif defined __powerpc__
-// There is currently no fast kernel using SIMD instructions on POWER. Don't
-// require GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK because there's nothing the user
-// can do about it.
-#else
-#error \
-"SIMD not enabled, you'd be getting a slow software fallback. Consider \
-enabling SIMD extensions (for example using -msse4 if you're on modern x86). \
-If that's not an option, and you would like to continue with the \
-slow fallback, define GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK."
-#endif
-#endif
#include "kernel_reference.h"
namespace gemmlowp {
typedef ReferenceKernel<KernelFormat<
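Reviewer's note: with the `#error` guard removed from `internal/kernel_default.h`, a build without SIMD flags appears to compile cleanly and use the reference kernel, so the slowdown is no longer reported at compile time. A project that wants to notice this could check the compiler's own predefined macros. The sketch below is only an illustration under that assumption: `SimdPathLabel` and `UsesScalarFallback` are hypothetical helpers, not part of gemmlowp, and the macro checks (`__SSE4_1__`, `__ARM_NEON`, `__ARM_NEON__`) are the ones GCC and Clang predefine, not a copy of gemmlowp's own platform detection.

```cpp
#include <cstring>

// Hypothetical helper (not part of gemmlowp): reports which SIMD extension
// the current compiler flags enable, based only on predefined macros.
const char* SimdPathLabel() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  return "neon";              // ARM build with NEON enabled
#elif defined(__SSE4_1__)
  return "sse4.1";            // x86 build with -msse4.1 (or a superset)
#else
  return "scalar-reference";  // neither macro defined: expect slow code
#endif
}

// Hypothetical helper: true when neither SIMD macro was defined at
// compile time, i.e. the case the removed #error used to catch.
bool UsesScalarFallback() {
  return std::strcmp(SimdPathLabel(), "scalar-reference") == 0;
}
```

A caller might log `SimdPathLabel()` at startup, or turn `UsesScalarFallback()` into a test failure in CI, to get back the early warning that the removed `#error` used to provide.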
