Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlir][bytecode] Check that bytecode source buffer is sufficiently aligned. #66380

Merged
merged 4 commits into from
Sep 17, 2023

Conversation

chsigg
Copy link
Contributor

@chsigg chsigg commented Sep 14, 2023

Before this change, the ByteCode test failed on CentOS 7 with devtoolset-9, because strings happen to be only 8 byte aligned. In general though, strings have no alignment guarantee.

Increase resource alignment in test.
Adjust test to make a copy of the buffer that is sufficiently aligned.
Add test to check error when buffer is insufficiently aligned.

@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels Sep 14, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 14, 2023

@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Changes Adjust test to make a copy of the source buffer that is sufficiently aligned.

I'm not entirely sure a check is sufficient here, or whether the required buffer alignment should somehow be reported back to the user by the ByteCodeWriter. If the current change is reasonable, I can add a test.

Full diff: https://github.com/llvm/llvm-project/pull/66380.diff

3 Files Affected:

  • (modified) mlir/lib/Bytecode/Reader/BytecodeReader.cpp (+24-9)
  • (modified) mlir/unittests/Bytecode/BytecodeTest.cpp (+15-3)
  • (modified) utils/bazel/llvm-project-overlay/mlir/unittests/BUILD.bazel (+21)
diff --git a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
index 483cbfda8d0e565..d6a7ab7f366d3bd 100644
--- a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+++ b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -11,29 +11,33 @@
 #include "mlir/Bytecode/BytecodeImplementation.h"
 #include "mlir/Bytecode/BytecodeOpInterface.h"
 #include "mlir/Bytecode/Encoding.h"
-#include "mlir/IR/BuiltinDialect.h"
+#include "mlir/IR/AsmState.h"
 #include "mlir/IR/BuiltinOps.h"
 #include "mlir/IR/Diagnostics.h"
+#include "mlir/IR/Dialect.h"
 #include "mlir/IR/OpImplementation.h"
 #include "mlir/IR/Verifier.h"
 #include "mlir/IR/Visitors.h"
 #include "mlir/Support/LLVM.h"
 #include "mlir/Support/LogicalResult.h"
 #include "llvm/ADT/ArrayRef.h"
-#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/ScopeExit.h"
-#include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Endian.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MemoryBufferRef.h"
-#include "llvm/Support/SaveAndRestore.h"
 #include "llvm/Support/SourceMgr.h"
+
+#include <cassert>
 #include <cstddef>
+#include <cstdint>
+#include <cstring>
 #include <list>
 #include <memory>
 #include <numeric>
 #include <optional>
+#include <string>
 
 #define DEBUG_TYPE "mlir-bytecode-reader"
 
@@ -93,23 +97,31 @@ namespace {
 class EncodingReader {
 public:
   explicit EncodingReader(ArrayRef<uint8_t> contents, Location fileLoc)
-      : dataIt(contents.data()), dataEnd(contents.end()), fileLoc(fileLoc) {}
+      : buffer(contents), dataIt(buffer.begin()), fileLoc(fileLoc) {}
   explicit EncodingReader(StringRef contents, Location fileLoc)
       : EncodingReader({reinterpret_cast<const uint8_t *>(contents.data()),
                         contents.size()},
                        fileLoc) {}
 
   /// Returns true if the entire section has been read.
-  bool empty() const { return dataIt == dataEnd; }
+  bool empty() const { return dataIt == buffer.end(); }
 
   /// Returns the remaining size of the bytecode.
-  size_t size() const { return dataEnd - dataIt; }
+  size_t size() const { return buffer.end() - dataIt; }
 
   /// Align the current reader position to the specified alignment.
   LogicalResult alignTo(unsigned alignment) {
     if (!llvm::isPowerOf2_32(alignment))
       return emitError("expected alignment to be a power-of-two");
 
+    // Ensure the data buffer was sufficiently aligned in the first place.
+    if (LLVM_UNLIKELY(
+            !llvm::isAddrAligned(llvm::Align(alignment), buffer.begin()))) {
+      return emitError("expected bytecode buffer to be aligned to ", alignment,
+                       ", but got pointer: '0x" +
+                           llvm::utohexstr((uintptr_t)buffer.begin()) + "'");
+    }
+
     // Shift the reader position to the next alignment boundary.
     while (uintptr_t(dataIt) & (uintptr_t(alignment) - 1)) {
       uint8_t padding;
@@ -320,8 +332,11 @@ class EncodingReader {
     return success();
   }
 
-  /// The current data iterator, and an iterator to the end of the buffer.
-  const uint8_t *dataIt, *dataEnd;
+  /// The bytecode buffer.
+  ArrayRef<uint8_t> buffer;
+
+  /// The current iterator within the 'buffer'.
+  const uint8_t *dataIt;
 
   /// A location for the bytecode used to report errors.
   Location fileLoc;
diff --git a/mlir/unittests/Bytecode/BytecodeTest.cpp b/mlir/unittests/Bytecode/BytecodeTest.cpp
index fc86f132dd60b4d..26e63a3bf9ed34b 100644
--- a/mlir/unittests/Bytecode/BytecodeTest.cpp
+++ b/mlir/unittests/Bytecode/BytecodeTest.cpp
@@ -6,7 +6,6 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "mlir/Bytecode/BytecodeReader.h"
 #include "mlir/Bytecode/BytecodeWriter.h"
 #include "mlir/IR/AsmState.h"
 #include "mlir/IR/BuiltinAttributes.h"
@@ -19,6 +18,10 @@
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 
+#include <algorithm>
+#include <cstdlib>
+#include <memory>
+
 using namespace llvm;
 using namespace mlir;
 
@@ -50,9 +53,18 @@ TEST(Bytecode, MultiModuleWithResource) {
   llvm::raw_string_ostream ostream(buffer);
   ASSERT_TRUE(succeeded(writeBytecodeToFile(module.get(), ostream)));
 
+  // Make a sufficiently aligned copy of the buffer for reading it back.
+  ostream.flush();
+  constexpr std::size_t kAlignment = 16; // AsmResourceBlob alignment.
+  auto deleter = [](char *ptr) { std::free(ptr); };
+  std::unique_ptr<char, decltype(deleter)> aligned_buffer(
+      static_cast<char *>(std::aligned_alloc(kAlignment, buffer.size())),
+      deleter);
+  std::copy(buffer.begin(), buffer.end(), aligned_buffer.get());
+
   // Parse it back
-  OwningOpRef<Operation *> roundTripModule =
-      parseSourceString<Operation *>(ostream.str(), parseConfig);
+  OwningOpRef<Operation *> roundTripModule = parseSourceString<Operation *>(
+      {aligned_buffer.get(), buffer.size()}, parseConfig);
   ASSERT_TRUE(roundTripModule);
 
   // FIXME: Parsing external resources does not work on big-endian
diff --git a/utils/bazel/llvm-project-overlay/mlir/unittests/BUILD.bazel b/utils/bazel/llvm-project-overlay/mlir/unittests/BUILD.bazel
index 13f9822a61d1489..ca6c294171af6dd 100644
--- a/utils/bazel/llvm-project-overlay/mlir/unittests/BUILD.bazel
+++ b/utils/bazel/llvm-project-overlay/mlir/unittests/BUILD.bazel
@@ -358,6 +358,27 @@ cc_test(
     ],
 )
 
+cc_test(
+    name = "bytecode_tests",
+    size = "small",
+    srcs = glob([
+        "Bytecode/*.cpp",
+        "Bytecode/*.h",
+        "Bytecode/*/*.cpp",
+        "Bytecode/*/*.h",
+    ]),
+    deps = [
+        "//llvm:Support",
+        "//mlir:BytecodeReader",
+        "//mlir:BytecodeWriter",
+        "//mlir:IR",
+        "//mlir:Parser",
+        "//third-party/unittest:gmock",
+        "//third-party/unittest:gtest",
+        "//third-party/unittest:gtest_main",
+    ],
+)
+
 cc_test(
     name = "conversion_tests",
     size = "small",


/// Align the current reader position to the specified alignment.
LogicalResult alignTo(unsigned alignment) {
if (!llvm::isPowerOf2_32(alignment))
return emitError("expected alignment to be a power-of-two");

// Ensure the data buffer was sufficiently aligned in the first place.
if (LLVM_UNLIKELY(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the cost of checking the main buffer alignment every time we need to realign?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a few unnecessary instructions before, now it's just 2.

@River707
Copy link
Contributor

Adding the checks is really nice! Just wondering how the current placement impacts perf during reading.

#include <cstddef>
#include <cstdint>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that all of these includes are necessary, were these auto-imported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got carried away with fixing IWYU nags. Undone.

@chsigg
Copy link
Contributor Author

chsigg commented Sep 14, 2023

Adding the checks is really nice! Just wondering how the current placement impacts perf during reading.

It looks pretty cheap, cheaper than checking the values of the alignment chunk. Maybe I'm missing something, but does this get called often?

What I still don't understand though, how does the user know what alignment the input buffer needs?

@chsigg
Copy link
Contributor Author

chsigg commented Sep 15, 2023

I added a test for insufficient alignment and increased the resource's alignment to 32, so that we don't get lucky on systems where strings happen to be 16 byte aligned.

@chsigg chsigg merged commit 1c8c365 into llvm:main Sep 17, 2023
2 checks passed
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
…igned. (llvm#66380)

Before this change, the `ByteCode` test failed on CentOS 7 with
devtoolset-9, because strings happen to be only 8 byte aligned. In
general though, strings have no alignment guarantee.

Increase resource alignment in test to 32 bytes. 
Adjust test to sufficiently align buffer.
Add test to check error when buffer is insufficiently aligned.
Mogball added a commit that referenced this pull request Sep 30, 2023
This partially reverts #66380. The assertion that the underlying buffer
of an EncodingReader is aligned to any required alignments for resource
sections. Resources know their own alignment and pad their buffers
accordingly, but the bytecode reader doesn't know that ahead of time.
Consequently, it cannot give the resource EncodingReader a base buffer
aligned to the maximum required alignment.

A simple example from the test fails without this:

```mlir
module @TestDialectResources attributes {
  bytecode.test = dense_resource<resource> : tensor<4xi32>
} {}
{-#
  dialect_resources: {
    builtin: {
      resource: "0x2000000001000000020000000300000004000000",
      resource_2: "0x2000000001000000020000000300000004000000"
    }
  }
```
zahiraam pushed a commit to tahonermann/llvm-project that referenced this pull request Oct 24, 2023
…igned. (llvm#66380)

Before this change, the `ByteCode` test failed on CentOS 7 with
devtoolset-9, because strings happen to be only 8 byte aligned. In
general though, strings have no alignment guarantee.

Increase resource alignment in test to 32 bytes. 
Adjust test to sufficiently align buffer.
Add test to check error when buffer is insufficiently aligned.
zahiraam pushed a commit to tahonermann/llvm-project that referenced this pull request Oct 24, 2023
…igned. (llvm#66380)

Before this change, the `ByteCode` test failed on CentOS 7 with
devtoolset-9, because strings happen to be only 8 byte aligned. In
general though, strings have no alignment guarantee.

Increase resource alignment in test to 32 bytes. 
Adjust test to sufficiently align buffer.
Add test to check error when buffer is insufficiently aligned.
@chsigg chsigg deleted the piper_export_cl_565358964 branch February 15, 2024 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mlir:core MLIR Core Infrastructure mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants