Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors #87400

Merged
merged 7 commits into from
Apr 18, 2024

Conversation

epilk
Copy link
Member

@epilk epilk commented Apr 2, 2024

These mostly are checking for various reserved bits being set. The diagnostics for gpu-dependent reserved bits have a bit more context since they seem like the most likely ones to be observed in practice.

Previously these were being ignored, but no disassembler actually used
them, so this is really NFC. This does repurpose the stated use of the
CStream parameter, which was for any comment. The AMDGPU and WebAssembly
disassemblers both just emit comments to outs(), so having a dedicated
parameter for this seems unnecessary.
These mostly are checking for various reserved bits being set. The
diagnostics for gpu-dependent reserved bits have a bit more context
since they seem like the most likely ones to be observed in practice.
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 3, 2024

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-llvm-binary-utilities
@llvm/pr-subscribers-backend-webassembly

@llvm/pr-subscribers-backend-amdgpu

Author: Emma Pilkington (epilk)

Changes

These mostly are checking for various reserved bits being set. The diagnostics for gpu-dependent reserved bits have a bit more context since they seem like the most likely ones to be observed in practice.


Patch is 31.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87400.diff

6 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+3-3)
  • (modified) llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h (+3-2)
  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+131-49)
  • (added) llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test (+73)
  • (added) llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-rsrc-errors.test (+92)
  • (modified) llvm/tools/llvm-objdump/llvm-objdump.cpp (+21-4)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 22c1d1f186ea54..0055b72160bfef 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4652,7 +4652,7 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      entry point instruction
                                                      which must be 256 byte
                                                      aligned.
-     351:272 20                                      Reserved, must be 0.
+     351:192 20                                      Reserved, must be 0.
              bytes
      383:352 4 bytes COMPUTE_PGM_RSRC3               GFX6-GFX9
                                                        Reserved, must be 0.
@@ -5248,14 +5248,14 @@ The fields used by CP for code objects before V3 also match those specified in
      5:0     6 bits  ACCUM_OFFSET                    Offset of a first AccVGPR in the unified register file. Granularity 4.
                                                      Value 0-63. 0 - accum-offset = 4, 1 - accum-offset = 8, ...,
                                                      63 - accum-offset = 256.
-     6:15    10                                      Reserved, must be 0.
+     15:6    10                                      Reserved, must be 0.
              bits
      16      1 bit   TG_SPLIT                        - If 0 the waves of a work-group are
                                                        launched in the same CU.
                                                      - If 1 the waves of a work-group can be
                                                        launched in different CUs. The waves
                                                        cannot use S_BARRIER or LDS.
-     17:31   15                                      Reserved, must be 0.
+     31:17   15                                      Reserved, must be 0.
              bits
      32      **Total size 4 bytes.**
      ======= ===================================================================================================================
diff --git a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
index 7dd8b0b7d3778d..21a6e92a2ab7c3 100644
--- a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
+++ b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
@@ -147,7 +147,8 @@ class MCDisassembler {
   /// \param Address  - The address, in the memory space of region, of the first
   ///                   byte of the symbol.
   /// \param Bytes    - A reference to the actual bytes at the symbol location.
-  /// \param CStream  - The stream to print comments and annotations on.
+  /// \param ErrS     - A stream to print newline separated error comments that
+  ///                   will be emitted if Fail is returned.
   /// \return         - MCDisassembler::Success if bytes are decoded
   ///                   successfully. Size must hold the number of bytes that
   ///                   were decoded.
@@ -160,7 +161,7 @@ class MCDisassembler {
   ///                   case.
   virtual std::optional<DecodeStatus>
   onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size, ArrayRef<uint8_t> Bytes,
-                uint64_t Address, raw_ostream &CStream) const;
+                uint64_t Address, raw_ostream &ErrS) const;
   // TODO:
   // Implement similar hooks that can be used at other points during
   // disassembly. Something along the following lines:
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 8c42304ce0bee5..95278783952d92 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -33,6 +33,7 @@
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/Support/AMDHSAKernelDescriptor.h"
+#include "llvm/Support/SaveAndRestore.h"
 
 using namespace llvm;
 
@@ -1780,6 +1781,29 @@ bool AMDGPUDisassembler::hasKernargPreload() const {
 //===----------------------------------------------------------------------===//
 // AMDGPU specific symbol handling
 //===----------------------------------------------------------------------===//
+
+/// Print a string describing the reserved bit range specified by Mask with
+/// offset BaseBytes for use in error comments. Mask is a single continuous
+/// range of 1s surrounded by zeros. The format here is meant to align with the
+/// tables that describe these bits in llvm.org/docs/AMDGPUUsage.html.
+static SmallString<20> getBitRangeFromMask(uint32_t Mask, unsigned BaseBytes) {
+  SmallString<20> Result;
+  raw_svector_ostream S(Result);
+
+  int TrailingZeros = llvm::countr_zero(Mask);
+  int PopCount = llvm::popcount(Mask);
+
+  if (PopCount == 1) {
+    S << "bit (" << (TrailingZeros + BaseBytes * CHAR_BIT) << ")";
+  } else {
+    S << "bits in range ("
+      << (TrailingZeros + PopCount - 1 + BaseBytes * CHAR_BIT) << ":"
+      << (TrailingZeros + BaseBytes * CHAR_BIT) << ")";
+  }
+
+  return Result;
+}
+
 #define GET_FIELD(MASK) (AMDHSA_BITS_GET(FourByteBuffer, MASK))
 #define PRINT_DIRECTIVE(DIRECTIVE, MASK)                                       \
   do {                                                                         \
@@ -1791,6 +1815,23 @@ bool AMDGPUDisassembler::hasKernargPreload() const {
              << GET_FIELD(MASK) << '\n';                                       \
   } while (0)
 
+#define CHECK_RESERVED_BITS_IMPL(MASK, DESC, MSG)                              \
+  do {                                                                         \
+    if (FourByteBuffer & (MASK)) {                                             \
+      *CommentStream << "kernel descriptor " DESC " reserved "                 \
+                     << getBitRangeFromMask((MASK), 0) << " set" MSG "\n";     \
+      return MCDisassembler::Fail;                                             \
+    }                                                                          \
+  } while (0)
+
+#define CHECK_RESERVED_BITS(MASK) CHECK_RESERVED_BITS_IMPL(MASK, #MASK, "")
+#define CHECK_RESERVED_BITS_MSG(MASK, MSG)                                     \
+  CHECK_RESERVED_BITS_IMPL(MASK, #MASK, ", " MSG)
+#define CHECK_RESERVED_BITS_DESC(MASK, DESC)                                   \
+  CHECK_RESERVED_BITS_IMPL(MASK, DESC, "")
+#define CHECK_RESERVED_BITS_DESC_MSG(MASK, DESC, MSG)                          \
+  CHECK_RESERVED_BITS_IMPL(MASK, DESC, ", " MSG)
+
 // NOLINTNEXTLINE(readability-identifier-naming)
 MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
     uint32_t FourByteBuffer, raw_string_ostream &KdStream) const {
@@ -1833,8 +1874,9 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
   uint32_t GranulatedWavefrontSGPRCount =
       GET_FIELD(COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT);
 
-  if (isGFX10Plus() && GranulatedWavefrontSGPRCount)
-    return MCDisassembler::Fail;
+  if (isGFX10Plus())
+    CHECK_RESERVED_BITS_MSG(COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT,
+                            "must be zero on gfx10+");
 
   uint32_t NextFreeSGPR = (GranulatedWavefrontSGPRCount + 1) *
                           AMDGPU::IsaInfo::getSGPREncodingGranule(&STI);
@@ -1845,8 +1887,7 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
   KdStream << Indent << ".amdhsa_reserve_xnack_mask " << 0 << '\n';
   KdStream << Indent << ".amdhsa_next_free_sgpr " << NextFreeSGPR << "\n";
 
-  if (FourByteBuffer & COMPUTE_PGM_RSRC1_PRIORITY)
-    return MCDisassembler::Fail;
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC1_PRIORITY);
 
   PRINT_DIRECTIVE(".amdhsa_float_round_mode_32",
                   COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_32);
@@ -1857,37 +1898,33 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
   PRINT_DIRECTIVE(".amdhsa_float_denorm_mode_16_64",
                   COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64);
 
-  if (FourByteBuffer & COMPUTE_PGM_RSRC1_PRIV)
-    return MCDisassembler::Fail;
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC1_PRIV);
 
   if (!isGFX12Plus())
     PRINT_DIRECTIVE(".amdhsa_dx10_clamp",
                     COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_DX10_CLAMP);
 
-  if (FourByteBuffer & COMPUTE_PGM_RSRC1_DEBUG_MODE)
-    return MCDisassembler::Fail;
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC1_DEBUG_MODE);
 
   if (!isGFX12Plus())
     PRINT_DIRECTIVE(".amdhsa_ieee_mode",
                     COMPUTE_PGM_RSRC1_GFX6_GFX11_ENABLE_IEEE_MODE);
 
-  if (FourByteBuffer & COMPUTE_PGM_RSRC1_BULKY)
-    return MCDisassembler::Fail;
-
-  if (FourByteBuffer & COMPUTE_PGM_RSRC1_CDBG_USER)
-    return MCDisassembler::Fail;
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC1_BULKY);
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC1_CDBG_USER);
 
   if (isGFX9Plus())
     PRINT_DIRECTIVE(".amdhsa_fp16_overflow", COMPUTE_PGM_RSRC1_GFX9_PLUS_FP16_OVFL);
 
   if (!isGFX9Plus())
-    if (FourByteBuffer & COMPUTE_PGM_RSRC1_GFX6_GFX8_RESERVED0)
-      return MCDisassembler::Fail;
-  if (FourByteBuffer & COMPUTE_PGM_RSRC1_RESERVED1)
-    return MCDisassembler::Fail;
+    CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC1_GFX6_GFX8_RESERVED0,
+                                 "COMPUTE_PGM_RSRC1", "must be zero pre-gfx9");
+
+  CHECK_RESERVED_BITS_DESC(COMPUTE_PGM_RSRC1_RESERVED1, "COMPUTE_PGM_RSRC1");
+
   if (!isGFX10Plus())
-    if (FourByteBuffer & COMPUTE_PGM_RSRC1_GFX6_GFX9_RESERVED2)
-      return MCDisassembler::Fail;
+    CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC1_GFX6_GFX9_RESERVED2,
+                                 "COMPUTE_PGM_RSRC1", "must be zero pre-gfx10");
 
   if (isGFX10Plus()) {
     PRINT_DIRECTIVE(".amdhsa_workgroup_processor_mode",
@@ -1925,14 +1962,9 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC2(
   PRINT_DIRECTIVE(".amdhsa_system_vgpr_workitem_id",
                   COMPUTE_PGM_RSRC2_ENABLE_VGPR_WORKITEM_ID);
 
-  if (FourByteBuffer & COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_ADDRESS_WATCH)
-    return MCDisassembler::Fail;
-
-  if (FourByteBuffer & COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_MEMORY)
-    return MCDisassembler::Fail;
-
-  if (FourByteBuffer & COMPUTE_PGM_RSRC2_GRANULATED_LDS_SIZE)
-    return MCDisassembler::Fail;
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_ADDRESS_WATCH);
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_MEMORY);
+  CHECK_RESERVED_BITS(COMPUTE_PGM_RSRC2_GRANULATED_LDS_SIZE);
 
   PRINT_DIRECTIVE(
       ".amdhsa_exception_fp_ieee_invalid_op",
@@ -1951,8 +1983,7 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC2(
   PRINT_DIRECTIVE(".amdhsa_exception_int_div_zero",
                   COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO);
 
-  if (FourByteBuffer & COMPUTE_PGM_RSRC2_RESERVED0)
-    return MCDisassembler::Fail;
+  CHECK_RESERVED_BITS_DESC(COMPUTE_PGM_RSRC2_RESERVED0, "COMPUTE_PGM_RSRC2");
 
   return MCDisassembler::Success;
 }
@@ -1966,11 +1997,13 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC3(
     KdStream << Indent << ".amdhsa_accum_offset "
              << (GET_FIELD(COMPUTE_PGM_RSRC3_GFX90A_ACCUM_OFFSET) + 1) * 4
              << '\n';
-    if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX90A_RESERVED0)
-      return MCDisassembler::Fail;
+
     PRINT_DIRECTIVE(".amdhsa_tg_split", COMPUTE_PGM_RSRC3_GFX90A_TG_SPLIT);
-    if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX90A_RESERVED1)
-      return MCDisassembler::Fail;
+
+    CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX90A_RESERVED0,
+                                 "COMPUTE_PGM_RSRC3", "must be zero on gfx90a");
+    CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX90A_RESERVED1,
+                                 "COMPUTE_PGM_RSRC3", "must be zero on gfx90a");
   } else if (isGFX10Plus()) {
     // Bits [0-3].
     if (!isGFX12Plus()) {
@@ -1983,8 +2016,9 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC3(
             COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT);
       }
     } else {
-      if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX12_PLUS_RESERVED0)
-        return MCDisassembler::Fail;
+      CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX12_PLUS_RESERVED0,
+                                   "COMPUTE_PGM_RSRC3",
+                                   "must be zero on gfx12+");
     }
 
     // Bits [4-11].
@@ -1999,36 +2033,41 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC3(
       PRINT_PSEUDO_DIRECTIVE_COMMENT(
           "INST_PREF_SIZE", COMPUTE_PGM_RSRC3_GFX12_PLUS_INST_PREF_SIZE);
     } else {
-      if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED1)
-        return MCDisassembler::Fail;
+      CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX10_RESERVED1,
+                                   "COMPUTE_PGM_RSRC3",
+                                   "must be zero on gfx10");
     }
 
     // Bits [12].
-    if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED2)
-      return MCDisassembler::Fail;
+    CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED2,
+                                 "COMPUTE_PGM_RSRC3", "must be zero on gfx10+");
 
     // Bits [13].
     if (isGFX12Plus()) {
       PRINT_PSEUDO_DIRECTIVE_COMMENT("GLG_EN",
                                      COMPUTE_PGM_RSRC3_GFX12_PLUS_GLG_EN);
     } else {
-      if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_GFX11_RESERVED3)
-        return MCDisassembler::Fail;
+      CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX10_GFX11_RESERVED3,
+                                   "COMPUTE_PGM_RSRC3",
+                                   "must be zero on gfx10 or gfx11");
     }
 
     // Bits [14-30].
-    if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED4)
-      return MCDisassembler::Fail;
+    CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED4,
+                                 "COMPUTE_PGM_RSRC3", "must be zero on gfx10+");
 
     // Bits [31].
     if (isGFX11Plus()) {
       PRINT_PSEUDO_DIRECTIVE_COMMENT("IMAGE_OP",
                                      COMPUTE_PGM_RSRC3_GFX11_PLUS_IMAGE_OP);
     } else {
-      if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED5)
-        return MCDisassembler::Fail;
+      CHECK_RESERVED_BITS_DESC_MSG(COMPUTE_PGM_RSRC3_GFX10_RESERVED5,
+                                   "COMPUTE_PGM_RSRC3",
+                                   "must be zero on gfx10");
     }
   } else if (FourByteBuffer) {
+    *CommentStream
+        << "kernel descriptor COMPUTE_PGM_RSRC3 must be all zero before gfx9\n";
     return MCDisassembler::Fail;
   }
   return MCDisassembler::Success;
@@ -2036,6 +2075,30 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC3(
 #undef PRINT_PSEUDO_DIRECTIVE_COMMENT
 #undef PRINT_DIRECTIVE
 #undef GET_FIELD
+#undef CHECK_RESERVED_BITS_IMPL
+#undef CHECK_RESERVED_BITS
+#undef CHECK_RESERVED_BITS_MSG
+#undef CHECK_RESERVED_BITS_DESC
+#undef CHECK_RESERVED_BITS_DESC_MSG
+
+static void printReservedKDBitsError(raw_ostream *CommentStream, uint32_t Mask,
+                                     unsigned BaseInBytes, StringRef Msg = "") {
+  *CommentStream << "kernel descriptor reserved "
+                 << getBitRangeFromMask(Mask, BaseInBytes) << " set";
+  if (!Msg.empty())
+    *CommentStream << ", " << Msg;
+  *CommentStream << '\n';
+}
+
+static void printReservedKDBytesError(raw_ostream *CommentStream,
+                                      unsigned BaseInBytes,
+                                      unsigned WidthInBytes) {
+  // Print an error comment in the same format as the "Kernel Descriptor"
+  // table here: https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor .
+  *CommentStream << "kernel descriptor reserved bits set in range "
+                 << ((BaseInBytes + WidthInBytes) * CHAR_BIT - 1) << ":"
+                 << (BaseInBytes * CHAR_BIT) << '\n';
+}
 
 MCDisassembler::DecodeStatus
 AMDGPUDisassembler::decodeKernelDescriptorDirective(
@@ -2080,6 +2143,7 @@ AMDGPUDisassembler::decodeKernelDescriptorDirective(
     ReservedBytes = DE.getBytes(Cursor, 4);
     for (int I = 0; I < 4; ++I) {
       if (ReservedBytes[I] != 0) {
+        printReservedKDBytesError(CommentStream, amdhsa::RESERVED0_OFFSET, 4);
         return MCDisassembler::Fail;
       }
     }
@@ -2097,6 +2161,7 @@ AMDGPUDisassembler::decodeKernelDescriptorDirective(
     ReservedBytes = DE.getBytes(Cursor, 20);
     for (int I = 0; I < 20; ++I) {
       if (ReservedBytes[I] != 0) {
+        printReservedKDBytesError(CommentStream, amdhsa::RESERVED1_OFFSET, 20);
         return MCDisassembler::Fail;
       }
     }
@@ -2135,12 +2200,18 @@ AMDGPUDisassembler::decodeKernelDescriptorDirective(
     PRINT_DIRECTIVE(".amdhsa_user_sgpr_private_segment_size",
                     KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE);
 
-    if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED0)
+    if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED0) {
+      printReservedKDBitsError(CommentStream, KERNEL_CODE_PROPERTY_RESERVED0,
+                               amdhsa::KERNEL_CODE_PROPERTIES_OFFSET);
       return MCDisassembler::Fail;
+    }
 
     // Reserved for GFX9
     if (isGFX9() &&
         (TwoByteBuffer & KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32)) {
+      printReservedKDBitsError(
+          CommentStream, KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32,
+          amdhsa::KERNEL_CODE_PROPERTIES_OFFSET, "must be zero on gfx9");
       return MCDisassembler::Fail;
     } else if (isGFX10Plus()) {
       PRINT_DIRECTIVE(".amdhsa_wavefront_size32",
@@ -2151,8 +2222,11 @@ AMDGPUDisassembler::decodeKernelDescriptorDirective(
       PRINT_DIRECTIVE(".amdhsa_uses_dynamic_stack",
                       KERNEL_CODE_PROPERTY_USES_DYNAMIC_STACK);
 
-    if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED1)
+    if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED1) {
+      printReservedKDBitsError(CommentStream, KERNEL_CODE_PROPERTY_RESERVED1,
+                               amdhsa::KERNEL_CODE_PROPERTIES_OFFSET);
       return MCDisassembler::Fail;
+    }
 
     return MCDisassembler::Success;
 
@@ -2174,8 +2248,10 @@ AMDGPUDisassembler::decodeKernelDescriptorDirective(
     // 4 bytes from here are reserved, must be 0.
     ReservedBytes = DE.getBytes(Cursor, 4);
     for (int I = 0; I < 4; ++I) {
-      if (ReservedBytes[I] != 0)
+      if (ReservedBytes[I] != 0) {
+        printReservedKDBytesError(CommentStream, amdhsa::RESERVED3_OFFSET, 4);
         return MCDisassembler::Fail;
+      }
     }
     return MCDisassembler::Success;
 
@@ -2189,8 +2265,10 @@ AMDGPUDisassembler::decodeKernelDescriptorDirective(
 MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeKernelDescriptor(
     StringRef KdName, ArrayRef<uint8_t> Bytes, uint64_t KdAddress) const {
   // CP microcode requires the kernel descriptor to be 64 aligned.
-  if (Bytes.size() != 64 || KdAddress % 64 != 0)
+  if (Bytes.size() != 64 || KdAddress % 64 != 0) {
+    *CommentStream << "kernel descriptor must be 64-byte aligned\n";
     return MCDisassembler::Fail;
+  }
 
   // FIXME: We can't actually decode "in order" as is done below, as e.g. GFX10
   // requires us to know the setting of .amdhsa_wavefront_size32 in order to
@@ -2235,9 +2313,12 @@ AMDGPUDisassembler::onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size,
   // Fix the spurious symbol issue for AMDGPU kernels. Exists for both Code
   // Object V2 and V3 when symbols are marked protected.
 
+  SaveAndRestore CommentStreamRAII(CommentStream, &CStream);
+
   // amd_kernel_code_t for Code Object V2.
   if (Symbol.Type == ELF::STT_AMDGPU_HSA_KERNEL) {
     Size = 256;
+    *CommentStream << "code object v2 is not supported";
     return MCDisassembler::Fail;
   }
 
@@ -2247,6 +2328,7 @@ AMDGPUDisassembler::onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size,
     Size = 64; // Size = 64 regardless of success or failure.
     return decodeKernelDescriptor(Name.drop_back(3), Bytes, Address);
   }
+
   return std::nullopt;
 }
 
diff --git a/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test b/llvm/test/MC/Disassembler/AMDGPU/kernel-descriptor-errors.test
new file mode 100644
index 00000000000000..bdfd70f8a022d1
--- /dev/null
+++ b/llvm/test/MC/Disassembler/AMD...
[truncated]

@@ -160,7 +161,7 @@ class MCDisassembler {
/// case.
virtual std::optional<DecodeStatus>
onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size, ArrayRef<uint8_t> Bytes,
uint64_t Address, raw_ostream &CStream) const;
uint64_t Address, raw_ostream &ErrS) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have an error string stream, probably should convert this to use Error return instead of optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an Error to represent errors seems like a nice refactor, but an Error return isn't sufficient for this function. It has three cases: a) nullopt, meaning the disassembler didn't have any handling for this symbol, b) Fail, meaning an error was encountered, and c) Success. b & c both indicate to objdump that some bytes were consumed, so Size must be set. In the update I changed the return type to bool, where true means the target disassembler had some handling for this symbol, with an Error out param to indicate whether it contained errors.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@slinder1 slinder1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small nits, otherwise LGTM! Thank you for making the errors so descriptive, it is a massive improvement for the exceptional cases where the tool is the most useful.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only taken a look at the llvm-objdump part.

llvm/tools/llvm-objdump/llvm-objdump.cpp Outdated Show resolved Hide resolved
llvm/tools/llvm-objdump/llvm-objdump.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm-objdump parts LGTM.

@epilk epilk merged commit 68e814d into llvm:main Apr 18, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants