[MLIR][Bytecode] Followup 8106c81 #157136

nikalra · 2025-09-05T16:27:44Z

Addressed code review feedback:

Fixed some issues in the unit test
Adjusted line wrapping in the docs
Clarified comments in the bytecode reader

Addressed code review feedback: - Fixed some issues in the unit test - Adjusted line wrapping in the docs - Clarified comments in the bytecode reader

llvmbot · 2025-09-05T16:28:17Z

@llvm/pr-subscribers-mlir

Author: Nikhil Kalra (nikalra)

Changes

Addressed code review feedback:

Fixed some issues in the unit test
Adjusted line wrapping in the docs
Clarified comments in the bytecode reader

Full diff: https://github.com/llvm/llvm-project/pull/157136.diff

3 Files Affected:

(modified) mlir/docs/BytecodeFormat.md (+3-1)
(modified) mlir/lib/Bytecode/Reader/BytecodeReader.cpp (+13-4)
(modified) mlir/unittests/Bytecode/BytecodeTest.cpp (+12-13)

diff --git a/mlir/docs/BytecodeFormat.md b/mlir/docs/BytecodeFormat.md
index 9846df8726295..f50ddeb25cc33 100644
--- a/mlir/docs/BytecodeFormat.md
+++ b/mlir/docs/BytecodeFormat.md
@@ -125,7 +125,9 @@ lazy-loading, and more. Each section contains a Section ID, whose high bit
 indicates if the section has alignment requirements, a length (which allows for
 skipping over the section), and an optional alignment. When an alignment is
 present, a variable number of padding bytes (0xCB) may appear before the section
-data. The alignment of a section must be a power of 2. The input bytecode buffer must satisfy the same alignment requirements as those of every section.
+data. The alignment of a section must be a power of 2. 
+The input bytecode buffer must satisfy the same alignment requirements as 
+those of every section.
 
 ## MLIR Encoding
 
diff --git a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
index d29053a2b6e65..287d926d66105 100644
--- a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+++ b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -22,8 +22,6 @@
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Endian.h"
-#include "llvm/Support/Format.h"
-#include "llvm/Support/LogicalResult.h"
 #include "llvm/Support/MemoryBufferRef.h"
 #include "llvm/Support/SourceMgr.h"
 
@@ -300,8 +298,19 @@ class EncodingReader {
       // alignment of the root buffer. If it is not, we cannot safely guarantee
       // that the specified alignment is globally correct.
       //
-      // E.g. if the buffer is 8k aligned and the section is 16k aligned,
-      // we could end up at an offset of 24k, which is not globally 16k aligned.
+      // E.g. if the buffer is 8k aligned and the section is marked to be 16k
+      // aligned:
+      // - (a) the alignTo call early returns when the pointer is 16k
+      // aligned but given the original 8k alignment we could offset into the
+      // padding by ~8k giving us 16k pointer alignment leaving another ~8k of
+      // padding in the bytecode file that will inadvertently be read when we
+      // attempt to parse the next section.
+      // - (b) we update alignTo to align relative to the start of the buffer,
+      // but given an 8k aligned buffer and section alignment of 16k, we could
+      // end up with a pointer that is 24k aligned (8k start alignment + 16k
+      // offset) instead of globally 16k aligned (versus 16k start alignment +
+      // 16k offset). This would result in incorrectly stated alignment for
+      // resources that reference data inside of the bytecode buffer.
       if (failed(alignmentValidator(alignment)))
         return emitError("failed to align section ID: ", unsigned(sectionID));
 
diff --git a/mlir/unittests/Bytecode/BytecodeTest.cpp b/mlir/unittests/Bytecode/BytecodeTest.cpp
index 9ea6560f712a1..d7b442f6832d0 100644
--- a/mlir/unittests/Bytecode/BytecodeTest.cpp
+++ b/mlir/unittests/Bytecode/BytecodeTest.cpp
@@ -72,6 +72,8 @@ TEST(Bytecode, MultiModuleWithResource) {
   ASSERT_TRUE(module);
 
   // Write the module to bytecode.
+  // Ensure that reserveExtraSpace is called with the size needed to write the
+  // bytecode buffer.
   MockOstream ostream;
   EXPECT_CALL(ostream, reserveExtraSpace).WillOnce([&](uint64_t space) {
     ostream.buffer = std::make_unique<std::byte[]>(space);
@@ -128,31 +130,28 @@ TEST(Bytecode, AlignmentFailure) {
   ASSERT_TRUE(module);
 
   // Write the module to bytecode.
-  MockOstream ostream;
-  EXPECT_CALL(ostream, reserveExtraSpace).WillOnce([&](uint64_t space) {
-    ostream.buffer = std::make_unique<std::byte[]>(space);
-    ostream.size = space;
-  });
+  std::string serializedBytecode;
+  llvm::raw_string_ostream ostream(serializedBytecode);
   ASSERT_TRUE(succeeded(writeBytecodeToFile(module.get(), ostream)));
 
   // Create copy of buffer which is not aligned to requested resource alignment.
-  std::string buffer((char *)ostream.buffer.get(),
-                     (char *)ostream.buffer.get() + ostream.size);
+  std::string buffer(serializedBytecode);
   size_t bufferSize = buffer.size();
 
-  // Increment into the buffer until we get to a power of 2 alignment that is
-  // not 32 bit aligned.
+  // Increment into the buffer until we get to an address that is 2 byte aligned
+  // but not 32 byte aligned.
   size_t pad = 0;
   while (true) {
-    if (llvm::isAddrAligned(Align(2), &buffer[pad]) &&
-        !llvm::isAddrAligned(Align(32), &buffer[pad]))
+    if (llvm::isAddrAligned(Align(2), buffer.data() + pad) &&
+        !llvm::isAddrAligned(Align(32), buffer.data() + pad))
       break;
 
     pad++;
-    buffer.reserve(bufferSize + pad);
+    // Pad the beginning of the buffer to push the start point to an unaligned
+    // value.
+    buffer.insert(0, 1, ' ');
   }
 
-  buffer.insert(0, pad, ' ');
   StringRef alignedBuffer(buffer.data() + pad, bufferSize);
 
   // Attach a diagnostic handler to get the error message.

llvmbot · 2025-09-05T16:28:18Z

@llvm/pr-subscribers-mlir-core

Author: Nikhil Kalra (nikalra)

Changes

Addressed code review feedback:

Fixed some issues in the unit test
Adjusted line wrapping in the docs
Clarified comments in the bytecode reader

Full diff: https://github.com/llvm/llvm-project/pull/157136.diff

3 Files Affected:

(modified) mlir/docs/BytecodeFormat.md (+3-1)
(modified) mlir/lib/Bytecode/Reader/BytecodeReader.cpp (+13-4)
(modified) mlir/unittests/Bytecode/BytecodeTest.cpp (+12-13)

diff --git a/mlir/docs/BytecodeFormat.md b/mlir/docs/BytecodeFormat.md
index 9846df8726295..f50ddeb25cc33 100644
--- a/mlir/docs/BytecodeFormat.md
+++ b/mlir/docs/BytecodeFormat.md
@@ -125,7 +125,9 @@ lazy-loading, and more. Each section contains a Section ID, whose high bit
 indicates if the section has alignment requirements, a length (which allows for
 skipping over the section), and an optional alignment. When an alignment is
 present, a variable number of padding bytes (0xCB) may appear before the section
-data. The alignment of a section must be a power of 2. The input bytecode buffer must satisfy the same alignment requirements as those of every section.
+data. The alignment of a section must be a power of 2. 
+The input bytecode buffer must satisfy the same alignment requirements as 
+those of every section.
 
 ## MLIR Encoding
 
diff --git a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
index d29053a2b6e65..287d926d66105 100644
--- a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+++ b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -22,8 +22,6 @@
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Endian.h"
-#include "llvm/Support/Format.h"
-#include "llvm/Support/LogicalResult.h"
 #include "llvm/Support/MemoryBufferRef.h"
 #include "llvm/Support/SourceMgr.h"
 
@@ -300,8 +298,19 @@ class EncodingReader {
       // alignment of the root buffer. If it is not, we cannot safely guarantee
       // that the specified alignment is globally correct.
       //
-      // E.g. if the buffer is 8k aligned and the section is 16k aligned,
-      // we could end up at an offset of 24k, which is not globally 16k aligned.
+      // E.g. if the buffer is 8k aligned and the section is marked to be 16k
+      // aligned:
+      // - (a) the alignTo call early returns when the pointer is 16k
+      // aligned but given the original 8k alignment we could offset into the
+      // padding by ~8k giving us 16k pointer alignment leaving another ~8k of
+      // padding in the bytecode file that will inadvertently be read when we
+      // attempt to parse the next section.
+      // - (b) we update alignTo to align relative to the start of the buffer,
+      // but given an 8k aligned buffer and section alignment of 16k, we could
+      // end up with a pointer that is 24k aligned (8k start alignment + 16k
+      // offset) instead of globally 16k aligned (versus 16k start alignment +
+      // 16k offset). This would result in incorrectly stated alignment for
+      // resources that reference data inside of the bytecode buffer.
       if (failed(alignmentValidator(alignment)))
         return emitError("failed to align section ID: ", unsigned(sectionID));
 
diff --git a/mlir/unittests/Bytecode/BytecodeTest.cpp b/mlir/unittests/Bytecode/BytecodeTest.cpp
index 9ea6560f712a1..d7b442f6832d0 100644
--- a/mlir/unittests/Bytecode/BytecodeTest.cpp
+++ b/mlir/unittests/Bytecode/BytecodeTest.cpp
@@ -72,6 +72,8 @@ TEST(Bytecode, MultiModuleWithResource) {
   ASSERT_TRUE(module);
 
   // Write the module to bytecode.
+  // Ensure that reserveExtraSpace is called with the size needed to write the
+  // bytecode buffer.
   MockOstream ostream;
   EXPECT_CALL(ostream, reserveExtraSpace).WillOnce([&](uint64_t space) {
     ostream.buffer = std::make_unique<std::byte[]>(space);
@@ -128,31 +130,28 @@ TEST(Bytecode, AlignmentFailure) {
   ASSERT_TRUE(module);
 
   // Write the module to bytecode.
-  MockOstream ostream;
-  EXPECT_CALL(ostream, reserveExtraSpace).WillOnce([&](uint64_t space) {
-    ostream.buffer = std::make_unique<std::byte[]>(space);
-    ostream.size = space;
-  });
+  std::string serializedBytecode;
+  llvm::raw_string_ostream ostream(serializedBytecode);
   ASSERT_TRUE(succeeded(writeBytecodeToFile(module.get(), ostream)));
 
   // Create copy of buffer which is not aligned to requested resource alignment.
-  std::string buffer((char *)ostream.buffer.get(),
-                     (char *)ostream.buffer.get() + ostream.size);
+  std::string buffer(serializedBytecode);
   size_t bufferSize = buffer.size();
 
-  // Increment into the buffer until we get to a power of 2 alignment that is
-  // not 32 bit aligned.
+  // Increment into the buffer until we get to an address that is 2 byte aligned
+  // but not 32 byte aligned.
   size_t pad = 0;
   while (true) {
-    if (llvm::isAddrAligned(Align(2), &buffer[pad]) &&
-        !llvm::isAddrAligned(Align(32), &buffer[pad]))
+    if (llvm::isAddrAligned(Align(2), buffer.data() + pad) &&
+        !llvm::isAddrAligned(Align(32), buffer.data() + pad))
       break;
 
     pad++;
-    buffer.reserve(bufferSize + pad);
+    // Pad the beginning of the buffer to push the start point to an unaligned
+    // value.
+    buffer.insert(0, 1, ' ');
   }
 
-  buffer.insert(0, pad, ' ');
   StringRef alignedBuffer(buffer.data() + pad, bufferSize);
 
   // Attach a diagnostic handler to get the error message.

joker-eph · 2025-09-06T09:46:30Z

mlir/lib/Bytecode/Reader/BytecodeReader.cpp

+      // end up with a pointer that is 24k aligned (8k start alignment + 16k
+      // offset) instead of globally 16k aligned (versus 16k start alignment +
+      // 16k offset). This would result in incorrectly stated alignment for
+      // resources that reference data inside of the bytecode buffer.


I found this explanation still quite confusing to be honest: the "24k alignment" in particular.
I iterated with ChatGPT to get something more clear (I think it can likely be pruned though):

// Sanity check: the requested alignment must not exceed the alignment of the // root buffer itself. Otherwise we cannot guarantee that pointers derived // from this buffer will actually satisfy the requested alignment globally. // // Why is this necessary? // // Consider a root buffer that is guaranteed to be 8k aligned, but not 16k // aligned. For example, suppose the buffer starts at absolute address // 5×8k = 40960. If a section inside this buffer declares a 16k alignment // requirement, two problems can arise: // // (a) If we simply "align forward" the current pointer to the next // 16k boundary, the amount of padding we skip depends on the buffer's // starting address. For example: // // buffer_start = 40960 // next 16k boundary = 49152 // bytes skipped = 49152 - 40960 = 8192 // // If the buffer had started at a different 8k-aligned address, the // skipped bytes would change accordingly. This makes the section start // unpredictable and leaves behind variable padding that could be // misinterpreted as part of the next section. // // (b) If instead we align relative to the buffer start, we may obtain // addresses that are multiples of "buffer_start + section_alignment" // rather than truly globally aligned addresses. For example: // // buffer_start = 40960 (5×8k, 8k aligned but not 16k) // offset = 16384 (first multiple of 16k) // section_ptr = 40960 + 16384 = 57344 // // 57344 is divisible by 8192, so it looks "8k aligned", but: // // 57344 % 16384 = 24576 ≠ 0 // // i.e. the section pointer is at absolute address 57344, which is not // truly 16k aligned. Any consumer expecting true 16k alignment would // see this as a violation. // // In short: the section's declared alignment must not exceed the alignment // of the root buffer; otherwise we cannot enforce it in a globally // consistent and deterministic way.

nikalra · 2025-09-11T02:05:02Z

@joker-eph Would you mind looking at this again when you get a chance? Thanks!

joker-eph

Thanks!

[MLIR][Bytecode] Followup 8106c81

4ace8e2

Addressed code review feedback: - Fixed some issues in the unit test - Adjusted line wrapping in the docs - Clarified comments in the bytecode reader

nikalra requested a review from joker-eph September 5, 2025 16:27

llvmbot added mlir:core MLIR Core Infrastructure mlir labels Sep 5, 2025

nikalra mentioned this pull request Sep 5, 2025

[MLIR][Bytecode] Enforce alignment requirements #157004

Merged

joker-eph reviewed Sep 6, 2025

View reviewed changes

update comment

bf9f0a7

joker-eph approved these changes Sep 11, 2025

View reviewed changes

nikalra merged commit 3168a62 into llvm:main Sep 11, 2025
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MLIR][Bytecode] Followup 8106c81 #157136

[MLIR][Bytecode] Followup 8106c81 #157136

Uh oh!

nikalra commented Sep 5, 2025

Uh oh!

llvmbot commented Sep 5, 2025

Uh oh!

llvmbot commented Sep 5, 2025

Uh oh!

joker-eph Sep 6, 2025 •

edited

Loading

Uh oh!

nikalra Sep 8, 2025

Uh oh!

nikalra commented Sep 11, 2025

Uh oh!

joker-eph left a comment

Uh oh!

Uh oh!

Uh oh!

[MLIR][Bytecode] Followup 8106c81 #157136

[MLIR][Bytecode] Followup 8106c81 #157136

Uh oh!

Conversation

nikalra commented Sep 5, 2025

Uh oh!

llvmbot commented Sep 5, 2025

Uh oh!

llvmbot commented Sep 5, 2025

Uh oh!

joker-eph Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikalra Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

nikalra commented Sep 11, 2025

Uh oh!

joker-eph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

joker-eph Sep 6, 2025 •

edited

Loading