Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Support instruction sizes up to 176-bits in disassembler. #90371

Merged
merged 4 commits into from
Apr 29, 2024

Conversation

topperc
Copy link
Collaborator

@topperc topperc commented Apr 28, 2024

No description provided.

@llvmbot llvmbot added backend:RISC-V mc Machine (object) code labels Apr 28, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 28, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/90371.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp (+37-5)
  • (added) llvm/test/MC/RISCV/large-instructions.s (+29)
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index 7ca20190731ad8..c43bda8521fbf7 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -656,12 +656,44 @@ DecodeStatus RISCVDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
                                                ArrayRef<uint8_t> Bytes,
                                                uint64_t Address,
                                                raw_ostream &CS) const {
-  // TODO: This will need modification when supporting instruction set
-  // extensions with instructions > 32-bits (up to 176 bits wide).
+  // It's a 16 bit instruction if bit 0 and 1 are not 0x3.
+  if ((Bytes[0] & 0x3) != 0x3)
+    return getInstruction16(MI, Size, Bytes, Address, CS);
 
-  // It's a 32 bit instruction if bit 0 and 1 are 1.
-  if ((Bytes[0] & 0x3) == 0x3)
+  // It's a 32 bit instruction if bit 1:0 are 0x3(checked above) and bits 4:2
+  // are not 0x3.
+  if ((Bytes[0] & 0x1f) != 0x1f)
     return getInstruction32(MI, Size, Bytes, Address, CS);
 
-  return getInstruction16(MI, Size, Bytes, Address, CS);
+  // 48-bit instructions are encoded as 0bxx011111.
+  if ((Bytes[0] & 0x3f) == 0x1f) {
+    Size = Bytes.size() >= 6 ? 6 : 0;
+    return MCDisassembler::Fail;
+  }
+
+  // 64-bit instructions are encoded as 0bx0111111.
+  if ((Bytes[0] & 0x7f) == 0x3f) {
+    Size = Bytes.size() >= 8 ? 8 : 0;
+    return MCDisassembler::Fail;
+  }
+
+  // Need to read a second byte.
+  if (Bytes.size() < 2) {
+    Size = 0;
+    return MCDisassembler::Fail;
+  }
+
+  // 80-bit through 176-bit instructions are encoded as 0bxnnnxxxx_x1111111.
+  // Where number of bits is (80 + (nnn * 16)) for nnn != 0b111.
+  unsigned nnn = (Bytes[1] >> 4) & 0x7;
+  if (nnn != 0x7) {
+    Size = 10 + (nnn * 2);
+    if (Bytes.size() < Size)
+      Size = 0;
+    return MCDisassembler::Fail;
+  }
+
+  // Remaining encodings are reserved for > 176-bit instructions.
+  Size = 0;
+  return MCDisassembler::Fail;
 }
diff --git a/llvm/test/MC/RISCV/large-instructions.s b/llvm/test/MC/RISCV/large-instructions.s
new file mode 100644
index 00000000000000..b50dbde17d3801
--- /dev/null
+++ b/llvm/test/MC/RISCV/large-instructions.s
@@ -0,0 +1,29 @@
+# RUN: llvm-mc -filetype=obj -triple riscv32 < %s \
+# RUN:     | llvm-objdump -d - | FileCheck %s
+
+# CHECK: 011f 4523 8967 <unknown>
+.byte 0x1f, 0x01, 0x23, 0x45, 0x67, 0x89
+
+# CHECK: 4523013f cdab8967 <unknown>
+.byte 0x3f, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd
+
+# CHECK: 007f 4523 8967 cdab feef <unknown>
+.byte 0x7f, 0x00, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe
+
+# CHECK: 4523107f cdab8967 badcfeef <unknown>
+.byte 0x7f, 0x10, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc, 0xba
+
+# CHECK: 207f 4523 8967 cdab feef badc 7698 <unknown>
+.byte 0x7f, 0x20, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc, 0xba, 0x98, 0x76
+
+# CHECK: 4523307f cdab8967 badcfeef 32547698 <unknown>
+.byte 0x7f, 0x30, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54, 0x32
+
+# CHECK: 407f 4523 8967 cdab feef badc 7698 3254 1210 <unknown>
+.byte 0x7f, 0x40, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54, 0x32, 0x10, 0x12
+
+# CHECK: 4523507f cdab8967 badcfeef 32547698 56341210 <unknown>
+.byte 0x7f, 0x50, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54, 0x32, 0x10, 0x12, 0x34, 0x56
+
+# CHECK: 607f 4523 8967 cdab feef badc 7698 3254 1210 5634 9a78 <unknown>
+.byte 0x7f, 0x60, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54, 0x32, 0x10, 0x12, 0x34, 0x56, 0x78, 0x9a

@wangpc-pp
Copy link
Contributor

Are we going to use larger instruction length?

@topperc
Copy link
Collaborator Author

topperc commented Apr 28, 2024

Are we going to use larger instruction length?

Not that I know of, but when I updated llvm-objdump in #90093 there was concern that I didn't handle 48-bit instructions. So I figured if that was really a concern we need to fix the disassembler too.

// TODO: This will need modification when supporting instruction set
// extensions with instructions > 32-bits (up to 176 bits wide).
// It's a 16 bit instruction if bit 0 and 1 are not 0x3.
if ((Bytes[0] & 0x3) != 0x3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use 0b111, 0b111111, ..., which should be more readable.

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -656,12 +656,44 @@ DecodeStatus RISCVDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
ArrayRef<uint8_t> Bytes,
uint64_t Address,
raw_ostream &CS) const {
// TODO: This will need modification when supporting instruction set
// extensions with instructions > 32-bits (up to 176 bits wide).
// It's a 16 bit instruction if bit 0 and 1 are not 0x3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// It's a 16 bit instruction if bit 0 and 1 are not 0x3.
// It's a 16 bit instruction if bit 0 and 1 are not 0b11.

@topperc topperc merged commit 618adc7 into llvm:main Apr 29, 2024
3 of 4 checks passed
@topperc topperc deleted the pr/disassembler-large-instructions branch April 29, 2024 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants