
arch-vega: Template MFMA instructions #1128

Merged — 11 commits merged into gem5:develop on May 22, 2024

Conversation

@mjkpolo (Contributor) commented May 13, 2024

templated

  • v_mfma_f64_16x16x4f64

added support for

  • v_mfma_f32_32x32x2f32
  • v_mfma_f32_4x4x1_16b_f32
  • v_mfma_f32_16x16x4f32

formula for gprs needed: https://github.com/ROCm/amd_matrix_instruction_calculator

formulas for register layouts and lanes used in computation: https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf

Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7

@mattsinc requested review from mattsinc and abmerop and removed request for mattsinc May 13, 2024 20:24
@mattsinc added the arch-vega (The VEGA ISA) label May 13, 2024
@abmerop force-pushed the feature-mfma-template branch 2 times, most recently from 9a67157 to c0f8c67 May 14, 2024 14:55
@abmerop (Member) commented May 15, 2024

Added new unit test for the new data types.

@abmerop commented May 15, 2024

This is ready from my side.

Review comments were left on:

  • src/arch/amdgpu/common/dtype/fp8_e4m3.hh
  • src/arch/amdgpu/common/dtype/fp8_e5m2.hh
  • src/arch/amdgpu/common/dtype/mxfp.hh
  • src/arch/amdgpu/vega/insts/instructions.hh
  • src/arch/amdgpu/vega/insts/mubuf.cc
  • src/arch/amdgpu/vega/insts/op_encodings.hh
  • src/arch/amdgpu/vega/insts/vop1.cc
  • src/arch/amdgpu/vega/insts/vop3.cc
@mattsinc (Contributor) commented:
@mjkpolo feel free to add comments as appropriate.

@abmerop left a comment:

Thanks for the thorough review! Fixed most of your suggestions or explained why I didn't

bias = 15,

inf = 0x7c000000,
nan = 0x7c001000,
@abmerop (Member) replied:
I fixed this btw. Last 4 hex digits should be 0 for this type. Changed to 0x7c10'0000

@mattsinc left a comment:
Just the one comment seems to be missing or I'm missing it?

@abmerop commented May 17, 2024

Just the one comment seems to be missing or I'm missing it?

Hmm, I must have messed up the rebase. I won't be able to update until Monday, unfortunately.

mjkpolo and others added 11 commits May 20, 2024 09:27
templated
- v_mfma_f64_16x16x4f64

added support for
- v_mfma_f32_32x32x2f32
- v_mfma_f32_4x4x1_16b_f32
- v_mfma_f32_16x16x4f32

[formula for gprs needed](https://github.com/ROCm/amd_matrix_instruction_calculator)

[formulas for register layouts and lanes used in computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf)

Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7
The Open Compute Project (OCP) microscaling formats (MX) are used in the
GPU model. The specification is available at [1]. This implements a C++
version of the MXFP formats, with constraints conforming to the
specification.

Arithmetic is not actually performed directly on the MXFP types; they
are instead converted to fp32 and the computation is performed in fp32.
For most of these types this is acceptable for the GPU model, as there
are no instructions which directly perform arithmetic on them. For
example, the DOT/MFMA instructions may first convert to FP32 and then
perform arithmetic.

Change-Id: I7235722627f7f66c291792b5dbf9e3ea2f67883e
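The convert-to-fp32-and-compute approach described in this commit message can be sketched for bf16, which is simply the upper 16 bits of a binary32 value. This is an illustrative sketch; the function names are invented and do not correspond to gem5's actual classes:

```cpp
#include <cstdint>
#include <cstring>

// bf16 is the upper 16 bits of an IEEE-754 binary32 value.
float bf16ToFloat(uint16_t raw)
{
    uint32_t bits = static_cast<uint32_t>(raw) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

uint16_t floatToBf16(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<uint16_t>(bits >> 16);  // truncation, no rounding
}

// A fused multiply-add on bf16 inputs, performed entirely in fp32 and
// only converted back to bf16 at the end -- the pattern the commit
// message describes for the DOT/MFMA instructions.
uint16_t bf16Fma(uint16_t a, uint16_t b, uint16_t c)
{
    return floatToBf16(bf16ToFloat(a) * bf16ToFloat(b) + bf16ToFloat(c));
}
```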
This instruction serves as a test for the MXFP8 type.

Change-Id: I2ce30bf7f3a3ecc850a445aebdf971c37c39a79e
This class can be used to load multiple operand dwords into an array and
then select bits from the span of that array. It handles cases where the
bits span two dwords (e.g., you have four dwords for a 128-bit value and
want to select bits 35:30) and cases where multiple values < 32-bits are
packed into a single dword (e.g., two bf16 values).

This is most useful for packed arrays and for instructions which have more
than two dwords. Beyond two dwords, the operator[] overload of
VectorOperand is not available, requiring additional logic to select from
an operand. This helper class handles that additional logic itself.

Change-Id: I74856d0f312f7549b3b6c405ab71eb2b174c70ac
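The dword-spanning selection this helper performs can be sketched as follows. This is a minimal free-function sketch with an invented name; the real class wraps the operand dwords rather than taking a raw pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Select the bit field [lo + width - 1 : lo] from a little-endian array
// of 32-bit dwords, handling fields that straddle a dword boundary
// (e.g., bits 35:30 of a 128-bit value held in four dwords).
uint32_t selectBits(const uint32_t *dwords, size_t lo, size_t width)
{
    assert(width >= 1 && width <= 32);
    uint64_t window = dwords[lo / 32];
    if ((lo + width - 1) / 32 != lo / 32) {
        // The field spans two dwords: widen the window to 64 bits so a
        // single shift-and-mask extracts the whole field.
        window |= static_cast<uint64_t>(dwords[lo / 32 + 1]) << 32;
    }
    return static_cast<uint32_t>((window >> (lo % 32)) &
                                 ((1ULL << width) - 1));
}
```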
The microscaling formats (MXFP) and INT8 types require additional size
checks which are not needed for the current MFMA template. The size
check is done using a constexpr method exclusive to the MXFP type, so a
special class is created for MXFP types. This is preferable to
attempting to shoehorn them into the existing template, as it helps with
readability. Similarly, INT8 requires a size check to determine the
number of elements per VGPR but is not an MXFP type; a special template
is created for that as well.

This additionally implements all of the MFMA types which have test cases
in the amd-lab-notes repository (https://github.com/amd/amd-lab-notes/).
The implementations were tested using the applications in the
matrix-cores subfolder and achieve L2 norms equivalent or better than
MI200 hardware.

Change-Id: Ia5ae89387149928905e7bcd25302ed3d1df6af38
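The constexpr-based dispatch described above can be sketched roughly as follows. The tag types and the `size()` hook are invented stand-ins, not gem5's actual MXFP classes; the point is only that a constexpr method on the type selects the specialization at compile time:

```cpp
#include <type_traits>

// Invented stand-ins: a plain fp32-like type, and an MXFP-like type that
// exposes a constexpr element size in bits.
struct Fp32Tag  { };
struct Mxfp8Tag { static constexpr int size() { return 8; } };

// Generic case: one 32-bit element per VGPR.
template <typename T, typename = void>
struct ElemsPerVgpr
{
    static constexpr int value = 1;
};

// MXFP-like case, chosen only when T has a constexpr size() method:
// pack 32 / size() elements into each 32-bit VGPR.
template <typename T>
struct ElemsPerVgpr<T, std::void_t<decltype(T::size())>>
{
    static_assert(32 % T::size() == 0, "element size must divide 32");
    static constexpr int value = 32 / T::size();
};
```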
This adds the decodings for all of the matrix fused multiply add (MFMA)
and sparse matrix fused multiply accumulate (SMFMAC) instructions up to
and including MI300. This does not yet provide the implementation for
these instructions; however, it is easier and less tedious to add them in
bulk rather than one at a time.

Change-Id: I5acd23ca8a26bdec843bead545d1f8820ad95b41
This instruction is new in MI300 and is used in some of the example
applications used to test MFMAs.

Change-Id: I739f8ab2be6a93ee3b6bdc4120d0117724edb0d4
These instructions are used in some of the F16 MFMA example applications
to convert to/from floating point types.

Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07
Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types
are not currently operated on directly. Instead they are cast to float
values and then arithmetic is performed. As a result, the unit test
simply checks that converting a value from an MXFP type to float and
back reproduces the original MXFP value. Exact values are used to avoid
discrepancies with rounding.

The test can be run using scons build/VEGA_X86/unittests.opt.

Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e
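The round-trip property this test checks can be illustrated with bf16: because the conversions are pure bit manipulation (no rounding), converting to float and back must reproduce the original bits for every pattern. A hypothetical sketch, not the gem5 test itself:

```cpp
#include <cstdint>
#include <cstring>

// bf16 <-> fp32 conversion by bit manipulation only.
float bf16ToFloat(uint16_t raw)
{
    uint32_t bits = static_cast<uint32_t>(raw) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

uint16_t floatToBf16(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<uint16_t>(bits >> 16);
}

// Exhaustively verify the round trip over all 65536 bf16 bit patterns.
bool roundTripHolds()
{
    for (uint32_t raw = 0; raw <= 0xFFFF; ++raw) {
        if (floatToBf16(bf16ToFloat(static_cast<uint16_t>(raw))) != raw)
            return false;
    }
    return true;
}
```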
Implement a bfloat16 MFMA. This was tested with PyTorch using
dtype=torch.bfloat16.

Change-Id: I35b4e60e71477553a93020ef0ee31d1bcae9ca5d
This implements some missing loads and stores that are commonly used in
applications with MFMA instructions to load 16-bit data types into
specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI,
BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI.

Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d
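The D16 / D16_HI write semantics these instructions share can be modeled as writing one half of a 32-bit VGPR while preserving the other half. A hypothetical sketch (invented names, not the gem5 implementation):

```cpp
#include <cstdint>

// D16: write the 16-bit datum into the low half of the VGPR,
// preserving the high half.
uint32_t loadShortD16(uint32_t vgpr, uint16_t data)
{
    return (vgpr & 0xFFFF0000u) | data;
}

// D16_HI: write the 16-bit datum into the high half of the VGPR,
// preserving the low half.
uint32_t loadShortD16Hi(uint32_t vgpr, uint16_t data)
{
    return (vgpr & 0x0000FFFFu) | (static_cast<uint32_t>(data) << 16);
}
```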
@abmerop added this to the v24.0 milestone May 20, 2024
@ivanaamit

@abmerop, could you let us know if this is complete and ready for a re-review? Thanks.

@abmerop commented May 20, 2024

This is complete. I think @mattsinc just wanted one additional comment, which I have added.

@ivanaamit dismissed mattsinc's stale review May 22, 2024 14:05

The requested comment was added.

@abmerop merged commit 1616d34 into gem5:develop May 22, 2024
35 checks passed