Add detection for Intel Advanced Matrix Extensions (AMX) instructions #231

Merged
merged 1 commit into from
Mar 28, 2024

mingfeima
Contributor

Tested using the Intel Software Development Emulator (SDE): https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html

Test scripts:

bash scripts/local-build.sh

ISAS=()
OPTIONS=()
PLATFORMS=()

OPTIONS+=(-quark); PLATFORMS+=("Quark")
OPTIONS+=(-p4); PLATFORMS+=("Pentium4")
OPTIONS+=(-p4p); PLATFORMS+=("Pentium4 Prescott")
OPTIONS+=(-mrm); PLATFORMS+=("Merom")
OPTIONS+=(-pnr); PLATFORMS+=("Penryn")
OPTIONS+=(-nhm); PLATFORMS+=("Nehalem")
OPTIONS+=(-wsm); PLATFORMS+=("Westmere")
OPTIONS+=(-snb); PLATFORMS+=("Sandy Bridge")
OPTIONS+=(-ivb); PLATFORMS+=("Ivy Bridge")
OPTIONS+=(-hsw); PLATFORMS+=("Haswell")
OPTIONS+=(-bdw); PLATFORMS+=("Broadwell")
OPTIONS+=(-slt); PLATFORMS+=("Saltwell")
OPTIONS+=(-slm); PLATFORMS+=("Silvermont")
OPTIONS+=(-glm); PLATFORMS+=("Goldmont")
OPTIONS+=(-glp); PLATFORMS+=("Goldmont Plus")
OPTIONS+=(-tnt); PLATFORMS+=("Tremont")
OPTIONS+=(-snr); PLATFORMS+=("Snow Ridge")
OPTIONS+=(-skl); PLATFORMS+=("Skylake")
OPTIONS+=(-cnl); PLATFORMS+=("Cannon Lake")
OPTIONS+=(-icl); PLATFORMS+=("Ice Lake")
OPTIONS+=(-skx); PLATFORMS+=("Skylake server")
OPTIONS+=(-clx); PLATFORMS+=("Cascade Lake")
OPTIONS+=(-cpx); PLATFORMS+=("Cooper Lake")
OPTIONS+=(-icx); PLATFORMS+=("Ice Lake server")
OPTIONS+=(-knl); PLATFORMS+=("Knights landing")
OPTIONS+=(-knm); PLATFORMS+=("Knights mill")
OPTIONS+=(-tgl); PLATFORMS+=("Tiger Lake")
OPTIONS+=(-adl); PLATFORMS+=("Alder Lake")
OPTIONS+=(-mtl); PLATFORMS+=("Meteor Lake")
OPTIONS+=(-rpl); PLATFORMS+=("Raptor Lake")
OPTIONS+=(-spr); PLATFORMS+=("Sapphire Rapids")
OPTIONS+=(-gnr); PLATFORMS+=("Granite Rapids")
OPTIONS+=(-gnr256); PLATFORMS+=("Granite Rapids (AVX10.1 / 256VL)")
OPTIONS+=(-srf); PLATFORMS+=("Sierra Forest")
OPTIONS+=(-arl); PLATFORMS+=("Arrow Lake")
OPTIONS+=(-lnl); PLATFORMS+=("Lunar Lake")
OPTIONS+=(-future); PLATFORMS+=("Future chip")

ISAS+=("AMXBF16")
ISAS+=("AMXTILE")
ISAS+=("AMXINT8")
ISAS+=("AMXFP16")

SDE_BIN="/home/mingfeim/packages/sde-external-9.33.0-2024-01-07-lin/sde"

for I in "${!PLATFORMS[@]}"; do
  echo "${PLATFORMS[$I]}"
  for J in "${!ISAS[@]}"; do
    "${SDE_BIN}" "${OPTIONS[$I]}" -- ./build/local/isa-info | grep "${ISAS[$J]}"
  done
done

Results:

Quark
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
Pentium4
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
Pentium4 Prescott
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Merom
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Penryn
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Nehalem
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Westmere
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Sandy Bridge
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Ivy Bridge
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Haswell
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Broadwell
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Saltwell
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Silvermont
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Goldmont
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Goldmont Plus
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Tremont
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Snow Ridge
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Skylake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Cannon Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Ice Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Skylake server
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Cascade Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Cooper Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Ice Lake server
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Knights landing
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Knights mill
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Tiger Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Alder Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Meteor Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Raptor Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Sapphire Rapids
        AMXBF16: yes
        AMXTILE: yes
        AMXINT8: yes
        AMXFP16: no
Granite Rapids
        AMXBF16: yes
        AMXTILE: yes
        AMXINT8: yes
        AMXFP16: yes
Granite Rapids (AVX10.1 / 256VL)
        AMXBF16: yes
        AMXTILE: yes
        AMXINT8: yes
        AMXFP16: yes
Sierra Forest
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Arrow Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Lunar Lake
        AMXBF16: no
        AMXTILE: no
        AMXINT8: no
        AMXFP16: no
Future chip
        AMXBF16: yes
        AMXTILE: yes
        AMXINT8: yes
        AMXFP16: yes

@jgong5 jgong5 left a comment

The indentation looks problematic (guess tab vs. space). Others LGTM.

@mingfeima
Contributor Author

@malfet @xuhancn @jgong5 could you please help review this one? Thanks!

@mingfeima mingfeima force-pushed the pr_add_amx_support branch 7 times, most recently from 988603a to 94a969a Compare March 25, 2024 08:05
@mingfeima
Contributor Author

> The indentation looks problematic (guess tab vs. space). Others LGTM.

Fixed!

Contributor

@malfet malfet left a comment

Overall LGTM, but let's add a separator for those names to match how it's done for AVX512.

Another question: are there CPUs on the market that report fp16 but not int8 AMX support?

It would be good to add a much more detailed description, with links back to the docs, explaining what those extensions do and which CPUs support them.

include/cpuinfo.h Outdated Show resolved Hide resolved
tools/isa-info.c Outdated Show resolved Hide resolved
@mingfeima
Contributor Author

mingfeima commented Mar 26, 2024

> Overall LGTM, but let's add a separator for those names to match how it's done for AVX512.
>
> Another question: are there CPUs on the market that report fp16 but not int8 AMX support?
>
> It would be good to add a much more detailed description, with links back to the docs, explaining what those extensions do and which CPUs support them.

Currently no platform supports amx-fp16 without also supporting amx-int8.
I added a note before the AMX detection functions:

/* [NOTE] Intel Advanced Matrix Extensions (AMX) detection
 *
 * I.  AMX is an extension to the x86 ISA for operating on matrices. It consists of:
 *   1) 2-dimensional registers (tiles), which hold sub-matrices from larger matrices in memory
 *   2) An accelerator called Tile Matrix Multiply (TMUL), which contains instructions operating on tiles
 *
 * II. Platforms that support AMX:
 * +-----------------+-----+----------+----------+----------+----------+
 * |    Platforms    | Gen | amx-bf16 | amx-tile | amx-int8 | amx-fp16 |
 * +-----------------+-----+----------+----------+----------+----------+
 * | Sapphire Rapids | 4th |   YES    |   YES    |   YES    |    NO    |
 * +-----------------+-----+----------+----------+----------+----------+
 * | Emerald Rapids  | 5th |   YES    |   YES    |   YES    |    NO    |
 * +-----------------+-----+----------+----------+----------+----------+
 * | Granite Rapids  | 6th |   YES    |   YES    |   YES    |   YES    |
 * +-----------------+-----+----------+----------+----------+----------+
 *
 * Reference: https://www.intel.com/content/www/us/en/products/docs
 *    /accelerator-engines/advanced-matrix-extensions/overview.html
 */

@malfet If you find a better place to put this note, please let me know!
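
For readers who want the support table from the note in a form that code can query, here is a minimal sketch. The struct, array, and function names (`amx_platform`, `amx_platforms`, `find_amx_platform`) are hypothetical and are not part of cpuinfo; the rows simply restate the table above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type; one row of the table in the note above. */
struct amx_platform {
    const char* name;
    int xeon_gen; /* Xeon Scalable generation, as listed in the note */
    bool bf16;
    bool tile;
    bool int8;
    bool fp16;
};

/* Rows copied from the note; nothing here is detected at run time. */
static const struct amx_platform amx_platforms[] = {
    {"Sapphire Rapids", 4, true, true, true, false},
    {"Emerald Rapids", 5, true, true, true, false},
    {"Granite Rapids", 6, true, true, true, true},
};

/* Return the matching row, or NULL if the platform is not in the table. */
static const struct amx_platform* find_amx_platform(const char* name) {
    const size_t count = sizeof(amx_platforms) / sizeof(amx_platforms[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(amx_platforms[i].name, name) == 0) {
            return &amx_platforms[i];
        }
    }
    return NULL;
}
```

A static table like this is only documentation in code form; actual detection should still rely on the CPUID feature bits.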

@malfet malfet merged commit f42f5ea into pytorch:main Mar 28, 2024
11 checks passed
@@ -812,6 +812,10 @@ struct cpuinfo_x86_isa {
bool avx512vp2intersect;
bool avx512_4vnniw;
bool avx512_4fmaps;
bool amx_bf16;
bool amx_tile;
Contributor

Remove amx_tile?
Is tile useful? All CPUs that support amx_bf16 or amx_int8 will also support amx_tile, and amx_tile by itself is not useful?

I would suggest we keep these low-level flags mapped exactly to the underlying CPU ISA feature bits. We could add helper functions such as has_amx_support at a higher level for ease of use.

Contributor Author

It is true that the existing platforms that support amx_bf16 or amx_int8 also support amx_tile. However, I prefer to keep a low-level flag here in case this changes in the future.
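
The higher-level helper suggested in this thread could look roughly like the sketch below. This is an assumption about what such a helper might do, not code from the PR: the thread only names `has_amx_support`, and the struct `amx_isa_flags` is a hypothetical stand-in for the four flags added to `struct cpuinfo_x86_isa`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the four low-level AMX flags added by this PR. */
struct amx_isa_flags {
    bool amx_bf16;
    bool amx_tile;
    bool amx_int8;
    bool amx_fp16;
};

/*
 * Assumed semantics: AMX is usable when the tile registers are available
 * and at least one TMUL data type is supported.
 */
static bool has_amx_support(const struct amx_isa_flags* isa) {
    return isa->amx_tile && (isa->amx_bf16 || isa->amx_int8 || isa->amx_fp16);
}
```

Keeping the per-bit flags and layering a helper on top means the helper's definition can change later without losing the raw CPUID information.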

* AMX_TILE instructions:
* - Intel: edx[bit 24] in structured feature info (ecx = 0).
*/
isa.amx_tile = avx512_regs && !!(structured_feature_info0.edx & UINT32_C(0x01000000));
Contributor

Can you confirm this works on gnr256, with AVX10 but not AVX-512?

Contributor Author

@mingfeima mingfeima Mar 29, 2024

Are you referring to this:

Granite Rapids
        AMXBF16: yes
        AMXTILE: yes
        AMXINT8: yes
        AMXFP16: yes
Granite Rapids (AVX10.1 / 256VL)
        AMXBF16: yes
        AMXTILE: yes
        AMXINT8: yes
        AMXFP16: yes

Results collected with the Intel Software Development Emulator.

Quote from https://www.tomshardware.com/news/intels-new-avx10-brings-avx-512-capabilities-to-e-cores:

Intel will support AVX10 version 1 (AVX10.1) beginning with its sixth-gen Xeon "Granite Rapids" chips, but that generation will only support 512-bit vector instructions, and not the new converged 256-bit vector instructions. Instead, this first gen will serve as the transition chip from AVX-512 to AVX10.
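
For reference, the AMX_TILE check quoted in this thread tests bit 24 of EDX (mask 0x01000000) from the structured feature leaf. A minimal sketch of decoding all four AMX bits from raw register values follows. The names `amx_features` and `decode_amx` are hypothetical, and the bit positions for BF16, INT8, and FP16 are taken from Intel's CPUID documentation rather than from this PR. The sketch deliberately omits the `avx512_regs` gating used in the quoted line, so it reflects only what CPUID reports, not whether the OS has enabled the state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four AMX feature bits; not cpuinfo's struct. */
struct amx_features {
    bool bf16;
    bool tile;
    bool int8;
    bool fp16;
};

/*
 * Decode AMX feature bits from raw CPUID leaf 7 register values.
 *   leaf7_0_edx: EDX returned for EAX = 7, ECX = 0
 *   leaf7_1_eax: EAX returned for EAX = 7, ECX = 1
 */
static struct amx_features decode_amx(uint32_t leaf7_0_edx, uint32_t leaf7_1_eax) {
    struct amx_features f;
    f.bf16 = !!(leaf7_0_edx & (UINT32_C(1) << 22));
    f.tile = !!(leaf7_0_edx & (UINT32_C(1) << 24)); /* same mask as 0x01000000 */
    f.int8 = !!(leaf7_0_edx & (UINT32_C(1) << 25));
    f.fp16 = !!(leaf7_1_eax & (UINT32_C(1) << 21));
    return f;
}
```

Taking the register values as parameters keeps the decoding logic testable on any machine, independent of how CPUID is actually executed.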
