Commit

Issue #75 (frontend function to bypass cpu-id and to set arch_id): renamed LIBXSMM_JIT environment variable to LIBXSMM_TARGET and updated the documentation accordingly. Adjusted dispatch sample code to rely on the target_arch API rather than parsing an environment variable. Updated interface documentation (C/C++ and FORTRAN). Clamped the result of libxsmm_get_target_archid in case the JIT backend is disabled at compile-time. Code cleanup.
hfp committed May 17, 2016
1 parent f093543 commit c7ea23c
Showing 7 changed files with 68 additions and 68 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -20,7 +20,7 @@ env:
global:
- UPLOAD_DIR=libxsmm
matrix:
- LIBXSMM_JIT=0 LIBXSMM_VERBOSE=1
- LIBXSMM_TARGET=0 LIBXSMM_VERBOSE=1
- DBG=1

language: cpp
4 changes: 2 additions & 2 deletions README.md
@@ -228,7 +228,7 @@ HSW/SP TRY JIT STA COL
23..80 3 3 0 0
```
The tables are distinct between single-precision and double-precision, but either table is pruned if all counters are zero. If both tables are pruned, the library shows the code path which would have been used for JIT'ting the code: `LIBXSMM_JIT=hsw` (otherwise the code path is shown in the table's header). The actual counters are collected for three buckets: small kernels (MNK<sup>1/3</sup>&#160;\<=&#160;13), medium-sized kernels (13&#160;\<&#160;MNK<sup>1/3</sup>&#160;\<=&#160;23), and larger kernels (23&#160;\<&#160;MNK<sup>1/3</sup>&#160;\<=&#160;80; the actual upper bound depends on LIBXSMM_MAX_MNK as selected at compile-time). Keep in mind, that "larger" is supposedly still fairly small in terms of arithmetic intensity (which grows linearly with the kernel size). Unfortunately, the arithmetic intensity depends on the way a kernel is used (which operands are loaded/stored into main memory) and it is not performance-neutral to collect this information.
The tables are distinct between single-precision and double-precision, but either table is pruned if all counters are zero. If both tables are pruned, the library shows the code path which would have been used for JIT'ting the code: `LIBXSMM_TARGET=hsw` (otherwise the code path is shown in the table's header). The actual counters are collected for three buckets: small kernels (MNK<sup>1/3</sup>&#160;\<=&#160;13), medium-sized kernels (13&#160;\<&#160;MNK<sup>1/3</sup>&#160;\<=&#160;23), and larger kernels (23&#160;\<&#160;MNK<sup>1/3</sup>&#160;\<=&#160;80; the actual upper bound depends on LIBXSMM_MAX_MNK as selected at compile-time). Keep in mind, that "larger" is supposedly still fairly small in terms of arithmetic intensity (which grows linearly with the kernel size). Unfortunately, the arithmetic intensity depends on the way a kernel is used (which operands are loaded/stored into main memory) and it is not performance-neutral to collect this information.
The TRY counter represents all attempts to register statically generated kernels and all attempts to dynamically generate and register kernels. The JIT and STA counters distinct the aforementioned event into dynamically (JIT) and statically (STA) generated code but also count only actually registered kernels. In case the capacity (O(*n*)&#160;=&#160;10<sup>5</sup>) of the code registry is exhausted, no more kernels can be registered although further attempts are not prevented. Registering many kernels (O(*n*)&#160;=&#160;10<sup>3</sup>) may ramp the number of hash key collisions (COL), which can degrade performance. The latter is prevented if the small thread-local cache is effectively utilized.
@@ -338,7 +338,7 @@ There might be situations in which it is up-front not clear which problem sizes
2. There is no support for the Intel&#160;SSE (Intel&#160;Xeon 5500/5600 series) and IMCI (Intel&#160;Xeon&#160;Phi coprocessor code-named Knights Corner) instruction set extensions. However, statically generated SSE-kernels can be leveraged without disabling support for JIT'ting AVX kernels.
3. There is no support for the Windows calling convention (only kernels with PREFETCH=0 signature).
The JIT backend can also be disabled at build time (`make JIT=0`) as well as at runtime (`LIBXSMM_JIT=0`). The latter is an environment variable which is also allowing to set a code path independent of the CPUID (LIBXSMM_JIT=0|1|snb|hsw|knl|skx). Please note that LIBXSMM_JIT=1 is only supported for symmetry, and this environment setup cannot enable the JIT backend if it was disabled at build time (JIT=0).
The JIT backend can also be disabled at build time (`make JIT=0`) as well as at runtime (`LIBXSMM_TARGET=0`, or anything prior to Intel&#160;AVX). The latter is an environment variable which allows to set a code path independent of the CPUID (LIBXSMM_TARGET=0|1|sse|snb|hsw|knl|skx). Please note that LIBXSMM_TARGET cannot enable the JIT backend if it was disabled at build time (JIT=0).
One can use the aforementioned THRESHOLD parameter to control the matrix sizes for which the JIT compilation will be automatically performed. However, explicitly requested kernels (by calling `libxsmm_?mmdispatch`) are not subject to a problem size threshold. In any case, JIT code generation can be used for accompanying statically generated code.
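The runtime rule described in the README text above (a `LIBXSMM_TARGET` value of `0`, or anything prior to Intel AVX, leaves the JIT backend unused) can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the `sketch_*` names and the numeric enum values are hypothetical stand-ins for the `LIBXSMM_X86_*` constants, and only their relative order is assumed to match. Only a subset of the accepted names is handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the LIBXSMM_X86_* constants. The numeric values
 * are assumptions; only the relative order matters for this sketch. */
enum sketch_archid {
  SKETCH_ARCH_UNKNOWN = 0,
  SKETCH_ARCH_GENERIC = 1,
  SKETCH_X86_GENERIC = 1000,
  SKETCH_X86_SSE3,
  SKETCH_X86_SSE4_1,
  SKETCH_X86_SSE4_2,
  SKETCH_X86_AVX,
  SKETCH_X86_AVX2,
  SKETCH_X86_AVX512_MIC,
  SKETCH_X86_AVX512_CORE
};

/* Map a subset of the documented LIBXSMM_TARGET values to an id.
 * Unrecognized or missing values yield SKETCH_ARCH_UNKNOWN; what the
 * library does in that case is not shown in this commit excerpt. */
static int sketch_parse_target(const char* arch)
{
  if (NULL == arch || '\0' == *arch) return SKETCH_ARCH_UNKNOWN;
  if (0 == strcmp("0", arch)) return SKETCH_ARCH_GENERIC;
  if (0 == strcmp("skx", arch)) return SKETCH_X86_AVX512_CORE;
  if (0 == strcmp("knl", arch)) return SKETCH_X86_AVX512_MIC;
  if (0 == strcmp("hsw", arch)) return SKETCH_X86_AVX2;
  if (0 == strcmp("snb", arch)) return SKETCH_X86_AVX;
  if (0 == strcmp("sse", arch)) return SKETCH_X86_SSE3;
  return SKETCH_ARCH_UNKNOWN;
}

/* JIT code generation is only considered for AVX or newer targets. */
static int sketch_jit_possible(int archid)
{
  return SKETCH_X86_AVX <= archid;
}
```

For instance, `sketch_jit_possible(sketch_parse_target(getenv("LIBXSMM_TARGET")))` (with `<stdlib.h>` included) would be zero for `LIBXSMM_TARGET=0` or `LIBXSMM_TARGET=sse` and non-zero for `LIBXSMM_TARGET=hsw`.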
6 changes: 2 additions & 4 deletions samples/dispatch/dispatch.c
@@ -42,10 +42,8 @@ int main(int argc, char* argv[])
1 >= nthreads ? "" : "s");

#if 0 != LIBXSMM_JIT
{ const char *const jit = getenv("LIBXSMM_JIT");
if (0 != jit && '0' == *jit) {
fprintf(stderr, "\tWarning: JIT support has been disabled at runtime!\n");
}
if (LIBXSMM_X86_AVX > libxsmm_get_target_archid()) {
fprintf(stderr, "\tWarning: JIT support is not available at runtime!\n");
}
#else
fprintf(stderr, "\tWarning: JIT support has been disabled at build time!\n");
70 changes: 35 additions & 35 deletions src/libxsmm.c
@@ -468,11 +468,11 @@ LIBXSMM_INLINE LIBXSMM_RETARGETABLE void internal_update_statistic(const libxsmm
}


LIBXSMM_INLINE LIBXSMM_RETARGETABLE const char* internal_get_target_arch(int archid);
LIBXSMM_INLINE LIBXSMM_RETARGETABLE const char* internal_get_target_arch(int archid)
LIBXSMM_INLINE LIBXSMM_RETARGETABLE const char* internal_get_target_arch(int id);
LIBXSMM_INLINE LIBXSMM_RETARGETABLE const char* internal_get_target_arch(int id)
{
const char* target_arch = 0;
switch (archid) {
switch (id) {
case LIBXSMM_X86_AVX512_CORE: {
target_arch = "skx";
} break;
@@ -497,7 +497,7 @@ LIBXSMM_INLINE LIBXSMM_RETARGETABLE const char* internal_get_target_arch(int arc
case LIBXSMM_TARGET_ARCH_GENERIC: {
target_arch = "generic";
} break;
default: if (LIBXSMM_X86_GENERIC <= archid) {
default: if (LIBXSMM_X86_GENERIC <= id) {
target_arch = "x86";
}
else {
@@ -683,7 +683,7 @@ LIBXSMM_INLINE LIBXSMM_RETARGETABLE internal_regentry* internal_init(void)
if (0 == result) {
int init_code;
/* set internal_target_archid */
libxsmm_set_target_arch(getenv("LIBXSMM_JIT"));
libxsmm_set_target_arch(getenv("LIBXSMM_TARGET"));
{ /* select prefetch strategy for JIT */
const char *const env_prefetch = getenv("LIBXSMM_PREFETCH");
if (0 == env_prefetch || 0 == *env_prefetch) {
@@ -924,7 +924,7 @@ LIBXSMM_RETARGETABLE void libxsmm_finalize(void)
{
const unsigned int linebreak = 0 == internal_print_statistic(stderr, target_arch, 1/*SP*/, 1, 0) ? 1 : 0;
if (0 == internal_print_statistic(stderr, target_arch, 0/*DP*/, linebreak, 0) && 0 != linebreak) {
fprintf(stderr, "LIBXSMM_JIT=%s\n", target_arch);
fprintf(stderr, "LIBXSMM_TARGET=%s\n", target_arch);
}
}
}
@@ -945,14 +945,14 @@ LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE int libxsmm_get_target_archid(void)
#if !defined(_WIN32) && !defined(__MIC__) && (!defined(__CYGWIN__) || !defined(NDEBUG)/*code-coverage with Cygwin; fails@runtime!*/)
return internal_target_archid;
#else /* no JIT support */
return LIBXSMM_TARGET_ARCH_GENERIC;
return LIBXSMM_MIN(internal_target_archid, LIBXSMM_X86_SSE4_2);
#endif
}


LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_archid(int archid)
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_archid(int id)
{
switch (archid) {
switch (id) {
case LIBXSMM_X86_AVX512_CORE:
case LIBXSMM_X86_AVX512_MIC:
case LIBXSMM_X86_AVX2:
@@ -961,9 +961,9 @@ LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_archid(int archid)
case LIBXSMM_X86_SSE4_1:
case LIBXSMM_X86_SSE3:
case LIBXSMM_TARGET_ARCH_GENERIC: {
internal_target_archid = archid;
internal_target_archid = id;
} break;
default: if (LIBXSMM_X86_GENERIC <= archid) {
default: if (LIBXSMM_X86_GENERIC <= id) {
internal_target_archid = LIBXSMM_X86_GENERIC;
}
else {
@@ -973,11 +973,11 @@ LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_archid(int archid)

#if !defined(NDEBUG) /* library code is expected to be mute */
{
const int cpuid_archid = libxsmm_cpuid_x86();
if (cpuid_archid < internal_target_archid) {
const int cpuid = libxsmm_cpuid_x86();
if (cpuid < internal_target_archid) {
fprintf(stderr, "LIBXSMM: \"%s\" code will fail to run on \"%s\"!\n",
internal_get_target_arch(internal_target_archid),
internal_get_target_arch(cpuid_archid));
internal_get_target_arch(cpuid));
}
}
#endif
@@ -992,54 +992,54 @@ LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE const char* libxsmm_get_target_arch(void)


/* function serves as a helper for implementing the Fortran interface */
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void get_target_arch(char* target_arch, int length);
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void get_target_arch(char* target_arch, int length)
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void get_target_arch(char* arch, int length);
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void get_target_arch(char* arch, int length)
{
const char* c = libxsmm_get_target_arch();
int i;
assert(0 != target_arch); /* valid here since function is not in the public interface */
for (i = 0; i < length && 0 != *c; ++i, ++c) target_arch[i] = *c;
for (; i < length; ++i) target_arch[i] = ' ';
assert(0 != arch); /* valid here since function is not in the public interface */
for (i = 0; i < length && 0 != *c; ++i, ++c) arch[i] = *c;
for (; i < length; ++i) arch[i] = ' ';
}


LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_arch(const char* name)
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_arch(const char* arch)
{
int target_archid = LIBXSMM_TARGET_ARCH_UNKNOWN;

if (name && *name) {
const int jit = atoi(name);
if (0 == strcmp("0", name)) {
if (arch && *arch) {
const int jit = atoi(arch);
if (0 == strcmp("0", arch)) {
target_archid = LIBXSMM_TARGET_ARCH_GENERIC;
}
else if (1 < jit) {
target_archid = LIBXSMM_X86_GENERIC + jit;
}
else if (0 == strcmp("skx", name) || 0 == strcmp("avx3", name) || 0 == strcmp("avx512", name)) {
else if (0 == strcmp("skx", arch) || 0 == strcmp("avx3", arch) || 0 == strcmp("avx512", arch)) {
target_archid = LIBXSMM_X86_AVX512_CORE;
}
else if (0 == strcmp("knl", name) || 0 == strcmp("mic2", name)) {
else if (0 == strcmp("knl", arch) || 0 == strcmp("mic2", arch)) {
target_archid = LIBXSMM_X86_AVX512_MIC;
}
else if (0 == strcmp("hsw", name) || 0 == strcmp("avx2", name)) {
else if (0 == strcmp("hsw", arch) || 0 == strcmp("avx2", arch)) {
target_archid = LIBXSMM_X86_AVX2;
}
else if (0 == strcmp("snb", name) || 0 == strcmp("avx", name)) {
else if (0 == strcmp("snb", arch) || 0 == strcmp("avx", arch)) {
target_archid = LIBXSMM_X86_AVX;
}
else if (0 == strcmp("wsm", name) || 0 == strcmp("nhm", name) || 0 == strcmp("sse4", name) || 0 == strcmp("sse4_2", name) || 0 == strcmp("sse4.2", name)) {
else if (0 == strcmp("wsm", arch) || 0 == strcmp("nhm", arch) || 0 == strcmp("sse4", arch) || 0 == strcmp("sse4_2", arch) || 0 == strcmp("sse4.2", arch)) {
target_archid = LIBXSMM_X86_SSE4_2;
}
else if (0 == strcmp("sse4_1", name) || 0 == strcmp("sse4.1", name)) {
else if (0 == strcmp("sse4_1", arch) || 0 == strcmp("sse4.1", arch)) {
target_archid = LIBXSMM_X86_SSE4_1;
}
else if (0 == strcmp("sse3", name) || 0 == strcmp("sse", name)) {
else if (0 == strcmp("sse3", arch) || 0 == strcmp("sse", arch)) {
target_archid = LIBXSMM_X86_SSE3;
}
else if (0 == strcmp("x86", name) || 0 == strcmp("sse2", name)) {
else if (0 == strcmp("x86", arch) || 0 == strcmp("sse2", arch)) {
target_archid = LIBXSMM_X86_GENERIC;
}
else if (0 == strcmp("generic", name) || 0 == strcmp("none", name)) {
else if (0 == strcmp("generic", arch) || 0 == strcmp("none", arch)) {
target_archid = LIBXSMM_TARGET_ARCH_GENERIC;
}
}
@@ -1049,11 +1049,11 @@ LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_arch(const char* n
}
#if !defined(NDEBUG) /* library code is expected to be mute */
else {
const int cpuid_archid = libxsmm_cpuid_x86();
if (cpuid_archid < target_archid) {
const int cpuid = libxsmm_cpuid_x86();
if (cpuid < target_archid) {
fprintf(stderr, "LIBXSMM: \"%s\" code will fail to run on \"%s\"!\n",
internal_get_target_arch(target_archid),
internal_get_target_arch(cpuid_archid));
internal_get_target_arch(cpuid));
}
}
#endif
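The clamping in `libxsmm_get_target_archid` shown in the hunks above is what lets callers such as the dispatch sample test `LIBXSMM_X86_AVX > libxsmm_get_target_archid()` without knowing whether JIT support was compiled in. A minimal sketch of that idea follows. The `DEMO_*` constants and `demo_*` functions are hypothetical, the enum values are assumed (only their order matters), and the compile-time platform check of the original is modeled here as a plain `jit_supported` flag.

```c
/* Hypothetical stand-ins for the LIBXSMM_X86_* constants (values assumed). */
enum demo_archid {
  DEMO_ARCH_GENERIC = 1,
  DEMO_X86_GENERIC = 1000,
  DEMO_X86_SSE3,
  DEMO_X86_SSE4_1,
  DEMO_X86_SSE4_2,
  DEMO_X86_AVX,
  DEMO_X86_AVX2
};

#define DEMO_MIN(A, B) ((A) < (B) ? (A) : (B))

/* Report the target id; without JIT support the result never exceeds
 * SSE4.2, i.e., it always stays below the AVX threshold. */
static int demo_get_target_archid(int internal_archid, int jit_supported)
{
  return 0 != jit_supported
    ? internal_archid
    : DEMO_MIN(internal_archid, DEMO_X86_SSE4_2);
}

/* Caller-side check analogous to the updated dispatch sample. */
static int demo_jit_available(int internal_archid, int jit_supported)
{
  return DEMO_X86_AVX <= demo_get_target_archid(internal_archid, jit_supported);
}
```

With this shape, a target below the clamp (for example SSE3) is reported unchanged, while an AVX2 target on a build without JIT support is reported as SSE4.2, so the caller's AVX comparison fails as intended.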
34 changes: 17 additions & 17 deletions src/libxsmm.template.f
@@ -258,26 +258,25 @@ SUBROUTINE libxsmm_init() BIND(C)
SUBROUTINE libxsmm_finalize() BIND(C)
END SUBROUTINE

! Returns the architecture and instruction set extension
! as determined by the CPUID flags. 0 != LIBXSMM_JIT and
! LIBXSMM_X86_AVX <= result, then this instruction set
! extension is targeted by the JIT code generator.
! Returns the architecture and instruction set extension as determined
! by the CPUID flags, as set by the libxsmm_get_target_arch* functions,
! or as set by the LIBXSMM_TARGET environment variable.
INTEGER(C_INT) PURE FUNCTION libxsmm_get_target_archid() BIND(C)
IMPORT :: C_INT
END FUNCTION

! Set target architecture (archid: see PARAMETER enumeration)
! Set target architecture (id: see PARAMETER enumeration)
! for subsequent code generation (JIT).
SUBROUTINE libxsmm_set_target_archid(archid) BIND(C)
SUBROUTINE libxsmm_set_target_archid(id) BIND(C)
IMPORT :: C_INT
INTEGER(C_INT), INTENT(IN), VALUE :: archid
INTEGER(C_INT), INTENT(IN), VALUE :: id
END SUBROUTINE

! Set target architecture (id=0|wsm|snb|hsw|knl|skx, 0/NULL: CPUID)
! Set target architecture (arch="0|sse|snb|hsw|knl|skx", "0": CPUID)
! for subsequent code generation (JIT).
SUBROUTINE libxsmm_set_target_arch(name) BIND(C)
SUBROUTINE libxsmm_set_target_arch(arch) BIND(C)
IMPORT :: C_CHAR
CHARACTER(C_CHAR), INTENT(IN) :: name(*)
CHARACTER(C_CHAR), INTENT(IN) :: arch(*)
END SUBROUTINE

! Non-pure function returning the current clock tick
@@ -318,22 +317,23 @@ SUBROUTINE libxsmm_omp_dgemm(transa, transb, m, n, k, &
END INTERFACE$MNK_INTERFACE_LIST

CONTAINS
! Returns a name for the target architecture as identified
! by libxsmm_get_target_arch().
! Returns the name of the target architecture as determined by
! the CPUID flags, as set by the libxsmm_get_target_arch* functions,
! or as set by the LIBXSMM_TARGET environment variable.
!DIR$ ATTRIBUTES OFFLOAD:MIC :: libxsmm_get_target_arch
FUNCTION libxsmm_get_target_arch() RESULT(name)
CHARACTER(LEN=:), ALLOCATABLE :: name
FUNCTION libxsmm_get_target_arch() RESULT(arch)
CHARACTER(LEN=:), ALLOCATABLE :: arch
CHARACTER(LEN=16) :: tmp
!DIR$ ATTRIBUTES OFFLOAD:MIC :: get_target_arch
INTERFACE
PURE SUBROUTINE get_target_arch(name, length) BIND(C)
PURE SUBROUTINE get_target_arch(arch, length) BIND(C)
IMPORT :: C_CHAR, C_INT
CHARACTER(C_CHAR), INTENT(OUT) :: name(*)
CHARACTER(C_CHAR), INTENT(OUT) :: arch(*)
INTEGER(C_INT), VALUE, INTENT(IN) :: length
END SUBROUTINE
END INTERFACE
CALL get_target_arch(tmp, LEN(tmp))
name = TRIM(tmp)
arch = TRIM(tmp)
END FUNCTION

!DIR$ ATTRIBUTES OFFLOAD:MIC :: srealptr
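The Fortran wrapper above relies on the C helper `get_target_arch` filling a fixed-length buffer with the name and padding the remainder with blanks (no terminating NUL), so that `TRIM` recovers the name. A standalone sketch of that copy pattern is given below. The function name `sketch_copy_blank_padded` is hypothetical, and it takes the source string as a parameter rather than calling into the library.

```c
#include <stddef.h>

/* Copy at most `length` characters of `src` into `dst` and fill the rest
 * of the buffer with blanks. No NUL terminator is written, which matches
 * what a fixed-length Fortran CHARACTER variable expects. */
static void sketch_copy_blank_padded(char* dst, int length, const char* src)
{
  int i = 0;
  if (NULL == dst || NULL == src) return;
  for (; i < length && '\0' != src[i]; ++i) dst[i] = src[i];
  for (; i < length; ++i) dst[i] = ' ';
}
```

Copying `"hsw"` into a buffer of length 5 yields `h`, `s`, `w` followed by two blanks. A name longer than the buffer is silently truncated, which is worth keeping in mind when choosing the temporary's length on the Fortran side (16 in the wrapper above).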
18 changes: 10 additions & 8 deletions src/libxsmm.template.h
@@ -90,18 +90,20 @@ LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_init(void);
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_finalize(void);

/**
* Returns the architecture and instruction set extension as determined by the CPUID flags.
* If 0 != LIBXSMM_JIT and LIBXSMM_X86_AVX <= result, then this instruction set extension
* is targeted by the JIT code generator.
* Returns the architecture and instruction set extension as determined by the CPUID flags, as set
* by the libxsmm_get_target_arch* functions, or as set by the LIBXSMM_TARGET environment variable.
*/
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE int libxsmm_get_target_archid(void);
/** Set target architecture (archid: see libxsmm_typedefs.h) for subsequent code generation (JIT). */
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_archid(int archid);
/** Set target architecture (id: see libxsmm_typedefs.h) for subsequent code generation (JIT). */
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_archid(int id);

/** Returns a name for the target architecture as identified by libxsmm_get_target_archid(). */
/**
* Returns the name of the target architecture as determined by the CPUID flags, as set by the
* libxsmm_get_target_arch* functions, or as set by the LIBXSMM_TARGET environment variable.
*/
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE const char* libxsmm_get_target_arch(void);
/** Set target architecture (name=0|wsm|snb|hsw|knl|skx, 0/NULL: CPUID) for subsequent code generation (JIT). */
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_arch(const char* name);
/** Set target architecture (arch="0|sse|snb|hsw|knl|skx", NULL/"0": CPUID) for subsequent code generation (JIT). */
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE void libxsmm_set_target_arch(const char* arch);

/** Query or JIT-generate a function; return zero if it does not exist or if JIT is not supported (descriptor form). */
LIBXSMM_EXTERN_C LIBXSMM_RETARGETABLE libxsmm_xmmfunction libxsmm_xmmdispatch(const libxsmm_gemm_descriptor* descriptor);
2 changes: 1 addition & 1 deletion version.txt
@@ -1 +1 @@
master-1.4.1-78
master-1.4.1-79