[HIP] Support compressing device binary #67162

yxsamliu · 2023-09-22T16:30:19Z

Add option -f[no-]offload-compress to clang to enable/disable compression of device binary for HIP. By default it is disabled.

Add option -compress to clang-offload-bundler to enable compression of offload bundle. By default it is disabled.

When enabled, zstd or zlib is used for compression when available.

When disabled, it is NFC compared to previous behavior. The same offload bundle format is used as before.

Clang-offload-bundler automatically detects whether the input file to be unbundled is compressed, identifies the compression method, and decompresses it if necessary.

llvmbot · 2023-09-22T16:31:25Z

@llvm/pr-subscribers-llvm-binary-utilities
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Patch is 36.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67162.diff

8 Files Affected:

(modified) clang/docs/ClangOffloadBundler.rst (+27)
(modified) clang/include/clang/Driver/OffloadBundler.h (+37)
(modified) clang/include/clang/Driver/Options.td (+4)
(modified) clang/lib/Driver/OffloadBundler.cpp (+293-44)
(modified) clang/lib/Driver/ToolChains/Clang.cpp (+7)
(added) clang/test/Driver/clang-offload-bundler-zlib.c (+75)
(added) clang/test/Driver/clang-offload-bundler-zstd.c (+72)
(modified) clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp (+10)

diff --git a/clang/docs/ClangOffloadBundler.rst b/clang/docs/ClangOffloadBundler.rst
index d08bf4b97781fa4..1e21d3e7264d5c3 100644
--- a/clang/docs/ClangOffloadBundler.rst
+++ b/clang/docs/ClangOffloadBundler.rst
@@ -309,3 +309,30 @@ target by comparing bundle ID's. Two bundle ID's are considered compatible if:
   * Their offload kind are the same
   * Their target triple are the same
   * Their GPUArch are the same
+
+Compression and Decompression
+=============================
+
+``clang-offload-bundler`` provides features to compress and decompress the full
+bundle, leveraging inherent redundancies within the bundle entries. Use the
+`-compress` command-line option to enable this compression capability.
+
+The compressed offload bundle begins with a header followed by the compressed binary data:
+
+- **Magic Number (4 bytes)**:
+    This is a unique identifier to distinguish compressed offload bundles. The value is the string 'CCOB' (Compressed Clang Offload Bundle).
+
+- **Version Number (16-bit unsigned int)**:
+    This denotes the version of the compressed offload bundle format. The current version is `1`.
+
+- **Compression Method (16-bit unsigned int)**:
+    This field indicates the compression method used. The value corresponds to either `zlib` or `zstd`, represented as a 16-bit unsigned integer cast from the LLVM compression enumeration.
+
+- **Uncompressed Binary Size (32-bit unsigned int)**:
+    This is the size (in bytes) of the binary data before it was compressed.
+
+- **Hash (64-bit unsigned int)**:
+    This is a 64-bit truncated MD5 hash of the uncompressed binary data. It serves for verification and caching purposes.
+
+- **Compressed Data**:
+    The actual compressed binary data follows the header. Its size can be inferred from the total size of the file minus the header size.
diff --git a/clang/include/clang/Driver/OffloadBundler.h b/clang/include/clang/Driver/OffloadBundler.h
index 28473c53662de2c..17df31d31071d99 100644
--- a/clang/include/clang/Driver/OffloadBundler.h
+++ b/clang/include/clang/Driver/OffloadBundler.h
@@ -19,6 +19,7 @@
 
 #include "llvm/Support/Error.h"
 #include "llvm/TargetParser/Triple.h"
+#include <llvm/Support/MemoryBuffer.h>
 #include <string>
 #include <vector>
 
@@ -26,11 +27,15 @@ namespace clang {
 
 class OffloadBundlerConfig {
 public:
+  OffloadBundlerConfig();
+
   bool AllowNoHost = false;
   bool AllowMissingBundles = false;
   bool CheckInputArchive = false;
   bool PrintExternalCommands = false;
   bool HipOpenmpCompatible = false;
+  bool Compress = false;
+  bool Verbose = false;
 
   unsigned BundleAlignment = 1;
   unsigned HostInputIndex = ~0u;
@@ -84,6 +89,38 @@ struct OffloadTargetInfo {
   std::string str() const;
 };
 
+// CompressedOffloadBundle represents the format for the compressed offload
+// bundles.
+//
+// The format is as follows:
+// - Magic Number (4 bytes) - A constant "CCOB".
+// - Version (2 bytes)
+// - Compression Method (2 bytes) - Uses the values from
+// llvm::compression::Format.
+// - Uncompressed Size (4 bytes).
+// - Truncated MD5 Hash (8 bytes).
+// - Compressed Data (variable length).
+
+class CompressedOffloadBundle {
+private:
+  static inline const size_t MagicSize = 4;
+  static inline const size_t VersionFieldSize = sizeof(uint16_t);
+  static inline const size_t MethodFieldSize = sizeof(uint16_t);
+  static inline const size_t SizeFieldSize = sizeof(uint32_t);
+  static inline const size_t HashFieldSize = 8;
+  static inline const size_t HeaderSize = MagicSize + VersionFieldSize +
+                                          MethodFieldSize + SizeFieldSize +
+                                          HashFieldSize;
+  static inline const llvm::StringRef MagicNumber = "CCOB";
+  static inline const uint16_t Version = 1;
+
+public:
+  static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+  compress(const llvm::MemoryBuffer &Input, bool Verbose = false);
+  static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+  decompress(const llvm::MemoryBuffer &Input, bool Verbose = false);
+};
+
 } // namespace clang
 
 #endif // LLVM_CLANG_DRIVER_OFFLOADBUNDLER_H
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 0f93479170d73bc..b8295fa47327157 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1183,6 +1183,10 @@ def fgpu_inline_threshold_EQ : Joined<["-"], "fgpu-inline-threshold=">,
 def fgpu_sanitize : Flag<["-"], "fgpu-sanitize">, Group<f_Group>,
   HelpText<"Enable sanitizer for supported offloading devices">;
 def fno_gpu_sanitize : Flag<["-"], "fno-gpu-sanitize">, Group<f_Group>;
+
+def foffload_compress : Flag<["-"], "foffload-compress">,
+  HelpText<"Compress offload device binaries (HIP only)">;
+def fno_offload_compress : Flag<["-"], "fno-offload-compress">;
 }
 
 // CUDA options
diff --git a/clang/lib/Driver/OffloadBundler.cpp b/clang/lib/Driver/OffloadBundler.cpp
index d11c41605bf39ee..7737c71485f5bf3 100644
--- a/clang/lib/Driver/OffloadBundler.cpp
+++ b/clang/lib/Driver/OffloadBundler.cpp
@@ -21,6 +21,7 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Object/Archive.h"
@@ -28,17 +29,20 @@
 #include "llvm/Object/Binary.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/Compression.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/EndianStream.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/ErrorOr.h"
 #include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MD5.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/Program.h"
 #include "llvm/Support/Signals.h"
 #include "llvm/Support/StringSaver.h"
+#include "llvm/Support/Timer.h"
 #include "llvm/Support/WithColor.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TargetParser/Host.h"
@@ -48,6 +52,7 @@
 #include <cstddef>
 #include <cstdint>
 #include <forward_list>
+#include <llvm/Support/Process.h>
 #include <memory>
 #include <set>
 #include <string>
@@ -58,6 +63,10 @@ using namespace llvm;
 using namespace llvm::object;
 using namespace clang;
 
+static llvm::TimerGroup
+    ClangOffloadBundlerTimerGroup("Clang Offload Bundler Timer Group",
+                                  "Timer group for clang offload bundler");
+
 /// Magic string that marks the existence of offloading data.
 #define OFFLOAD_BUNDLER_MAGIC_STR "__CLANG_OFFLOAD_BUNDLE__"
 
@@ -224,20 +233,22 @@ class FileHandler {
 
   /// Write the header of the bundled file to \a OS based on the information
   /// gathered from \a Inputs.
-  virtual Error WriteHeader(raw_fd_ostream &OS,
+  virtual Error WriteHeader(raw_ostream &OS,
                             ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) = 0;
 
   /// Write the marker that initiates a bundle for the triple \a TargetTriple to
   /// \a OS.
-  virtual Error WriteBundleStart(raw_fd_ostream &OS,
-                                 StringRef TargetTriple) = 0;
+  virtual Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) = 0;
 
   /// Write the marker that closes a bundle for the triple \a TargetTriple to \a
   /// OS.
-  virtual Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) = 0;
+  virtual Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) = 0;
 
   /// Write the bundle from \a Input into \a OS.
-  virtual Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) = 0;
+  virtual Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) = 0;
+
+  /// Finalize output file.
+  virtual Error finalizeOutputFile() { return Error::success(); }
 
   /// List bundle IDs in \a Input.
   virtual Error listBundleIDs(MemoryBuffer &Input) {
@@ -311,7 +322,7 @@ static uint64_t Read8byteIntegerFromBuffer(StringRef Buffer, size_t pos) {
 }
 
 /// Write 8-byte integers to a buffer in little-endian format.
-static void Write8byteIntegerToBuffer(raw_fd_ostream &OS, uint64_t Val) {
+static void Write8byteIntegerToBuffer(raw_ostream &OS, uint64_t Val) {
   llvm::support::endian::write(OS, Val, llvm::support::little);
 }
 
@@ -435,7 +446,7 @@ class BinaryFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
 
     // Compute size of the header.
@@ -472,19 +483,27 @@ class BinaryFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     CurWriteBundleTarget = TargetTriple.str();
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     auto BI = BundlesInfo[CurWriteBundleTarget];
-    OS.seek(BI.Offset);
+
+    // Pad with 0 to reach specified offset.
+    size_t CurrentPos = OS.tell();
+    size_t PaddingSize = BI.Offset > CurrentPos ? BI.Offset - CurrentPos : 0;
+    for (size_t I = 0; I < PaddingSize; ++I)
+      OS.write('\0');
+    assert(OS.tell() == BI.Offset);
+
     OS.write(Input.getBufferStart(), Input.getBufferSize());
+
     return Error::success();
   }
 };
@@ -607,7 +626,7 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
     assert(BundlerConfig.HostInputIndex != ~0u &&
            "Host input index not defined.");
@@ -617,12 +636,16 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     ++NumberOfProcessedInputs;
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
+    return Error::success();
+  }
+
+  Error finalizeOutputFile() final {
     assert(NumberOfProcessedInputs <= NumberOfInputs &&
            "Processing more inputs that actually exist!");
     assert(BundlerConfig.HostInputIndex != ~0u &&
@@ -640,10 +663,6 @@ class ObjectFileHandler final : public FileHandler {
     assert(BundlerConfig.ObjcopyPath != "" &&
            "llvm-objcopy path not specified");
 
-    // We write to the output file directly. So, we close it and use the name
-    // to pass down to llvm-objcopy.
-    OS.close();
-
     // Temporary files that need to be removed.
     TempFileHandlerRAII TempFiles;
 
@@ -684,7 +703,7 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     return Error::success();
   }
 
@@ -781,22 +800,22 @@ class TextFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     OS << BundleStartString << TargetTriple << "\n";
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
     OS << BundleEndString << TargetTriple << "\n";
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     OS << Input.getBuffer();
     return Error::success();
   }
@@ -881,6 +900,184 @@ CreateFileHandler(MemoryBuffer &FirstInput,
                            "'" + FilesType + "': invalid file type specified");
 }
 
+OffloadBundlerConfig::OffloadBundlerConfig() {
+  auto IgnoreEnvVarOpt =
+      llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_IGNORE_ENV_VAR");
+  if (IgnoreEnvVarOpt.has_value() && IgnoreEnvVarOpt.value() == "1")
+    return;
+
+  auto VerboseEnvVarOpt = llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_VERBOSE");
+  if (VerboseEnvVarOpt.has_value())
+    Verbose = VerboseEnvVarOpt.value() == "1";
+
+  auto CompressEnvVarOpt =
+      llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_COMPRESS");
+  if (CompressEnvVarOpt.has_value())
+    Compress = CompressEnvVarOpt.value() == "1";
+}
+
+llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
+                                  bool Verbose) {
+  llvm::Timer HashTimer("Hash Calculation Timer", "Hash calculation time",
+                        ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    HashTimer.startTimer();
+  llvm::MD5 Hash;
+  llvm::MD5::MD5Result Result;
+  Hash.update(Input.getBuffer());
+  Hash.final(Result);
+  uint64_t TruncatedHash = Result.low();
+  if (Verbose)
+    HashTimer.stopTimer();
+
+  SmallVector<uint8_t, 0> CompressedBuffer;
+  auto BufferUint8 = llvm::ArrayRef<uint8_t>(
+      reinterpret_cast<const uint8_t *>(Input.getBuffer().data()),
+      Input.getBuffer().size());
+
+  llvm::compression::Format CompressionFormat;
+
+  if (llvm::compression::zstd::isAvailable())
+    CompressionFormat = llvm::compression::Format::Zstd;
+  else if (llvm::compression::zlib::isAvailable())
+    CompressionFormat = llvm::compression::Format::Zlib;
+  else
+    return createStringError(llvm::inconvertibleErrorCode(),
+                             "Compression not supported");
+
+  llvm::Timer CompressTimer("Compression Timer", "Compression time",
+                            ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    CompressTimer.startTimer();
+  llvm::compression::compress(CompressionFormat, BufferUint8, CompressedBuffer);
+  if (Verbose)
+    CompressTimer.stopTimer();
+
+  uint16_t CompressionMethod = static_cast<uint16_t>(CompressionFormat);
+  uint32_t UncompressedSize = Input.getBuffer().size();
+
+  SmallVector<char, 0> FinalBuffer;
+  FinalBuffer.append(MagicNumber.begin(), MagicNumber.end());
+  FinalBuffer.append(reinterpret_cast<const char *>(&Version),
+                     reinterpret_cast<const char *>(&Version) +
+                         sizeof(Version));
+  FinalBuffer.append(reinterpret_cast<char *>(&CompressionMethod),
+                     reinterpret_cast<char *>(&CompressionMethod) +
+                         sizeof(CompressionMethod));
+  FinalBuffer.append(reinterpret_cast<char *>(&UncompressedSize),
+                     reinterpret_cast<char *>(&UncompressedSize) +
+                         sizeof(UncompressedSize));
+  FinalBuffer.append(reinterpret_cast<char *>(&TruncatedHash),
+                     reinterpret_cast<char *>(&TruncatedHash) +
+                         sizeof(TruncatedHash));
+  FinalBuffer.append(CompressedBuffer.begin(), CompressedBuffer.end());
+
+  if (Verbose) {
+    auto MethodUsed =
+        CompressionFormat == llvm::compression::Format::Zstd ? "zstd" : "zlib";
+    llvm::errs() << "Compressed bundle format version: " << Version << "\n"
+                 << "Compression method used: " << MethodUsed << "\n"
+                 << "Binary size before compression: " << UncompressedSize
+                 << " bytes\n"
+                 << "Binary size after compression: " << CompressedBuffer.size()
+                 << " bytes\n"
+                 << "Truncated MD5 hash: "
+                 << llvm::format_hex(TruncatedHash, 16) << "\n";
+  }
+
+  return llvm::MemoryBuffer::getMemBufferCopy(
+      llvm::StringRef(FinalBuffer.data(), FinalBuffer.size()));
+}
+
+llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+CompressedOffloadBundle::decompress(const llvm::MemoryBuffer &Input,
+                                    bool Verbose) {
+
+  StringRef Blob = Input.getBuffer();
+
+  if (Blob.size() < HeaderSize) {
+    return llvm::MemoryBuffer::getMemBufferCopy(Blob);
+  }
+  StringRef Magic = Blob.substr(0, MagicSize);
+  if (Magic != MagicNumber) {
+    if (Verbose)
+      llvm::errs() << "Uncompressed bundle.\n";
+    return llvm::MemoryBuffer::getMemBufferCopy(Blob);
+  }
+
+  uint16_t ThisVersion = *reinterpret_cast<const uint16_t *>(
+      Input.getBuffer().data() + MagicNumber.size());
+  uint16_t CompressionMethod = *reinterpret_cast<const uint16_t *>(
+      Blob.data() + MagicSize + VersionFieldSize);
+  uint32_t UncompressedSize = *reinterpret_cast<const uint32_t *>(
+      Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize);
+  uint64_t StoredHash = *reinterpret_cast<const uint64_t *>(
+      Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize +
+      SizeFieldSize);
+
+  llvm::compression::Format CompressionFormat;
+  if (CompressionMethod ==
+      static_cast<uint16_t>(llvm::compression::Format::Zlib))
+    CompressionFormat = llvm::compression::Format::Zlib;
+  else if (CompressionMethod ==
+           static_cast<uint16_t>(llvm::compression::Format::Zstd))
+    CompressionFormat = llvm::compression::Format::Zstd;
+  else
+    return createStringError(inconvertibleErrorCode(),
+                             "Unknown compressing method");
+
+  llvm::Timer DecompressTimer("Decompression Timer", "Decompression time",
+                              ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    DecompressTimer.startTimer();
+
+  SmallVector<uint8_t, 0> DecompressedData;
+  StringRef CompressedData = Blob.substr(HeaderSize);
+  if (llvm::Error DecompressionError = llvm::compression::decompress(
+          CompressionFormat, llvm::arrayRefFromStringRef(CompressedData),
+          DecompressedData, UncompressedSize))
+    return createStringError(inconvertibleErrorCode(),
+                             "Could not decompress embedded file contents: " +
+                                 llvm::toString(std::move(DecompressionError)));
+
+  if (Verbose) {
+    DecompressTimer.stopTimer();
+
+    // Recalculate MD5 hash
+    llvm::Timer HashRecalcTimer("Hash Recalculation Timer",
+                                "Hash recalculation time",
+                                ClangOffloadBundlerTimerGroup);
+    HashRecalcTimer.startTimer();
+    llvm::MD5 Hash;
+    llvm::MD5::MD5Result Result;
+    Hash.update(llvm::ArrayRef<uint8_t>(DecompressedData.data(),
+                                        DecompressedData.size()));
+    Hash.final(Result);
+    uint64_t RecalculatedHash = Result.low();
+    HashRecalcTimer.stopTimer();
+    bool HashMatch = (StoredHash == RecalculatedHash);
+
+    llvm::errs() << "Compressed bundle format version: " << ThisVersion << "\n"
+                 << "Decompression method: "
+                 << (CompressionFormat == llvm::compression::Format::Zlib
+                         ? "zlib"
+                         : "zstd")
+                 << "\n"
+                 << "Size before decompression: " << CompressedData.size()
+                 << " bytes\n"
+                 << "Size after decompression: " << UncompressedSize
+                 << " bytes\n"
+                 << "Stored hash: " << llvm::format_hex(StoredHash, 16) << "\n"
+                 << "Recalculated hash: "
+                 << llvm::format_hex(RecalculatedHash, 16) << "\n"
+                 << "Hashes match: " << (HashMatch ? "Yes" : "No") << "\n";
+  }
+
+  return llvm::MemoryBuffer::getMemBufferCopy(
+      llvm::toStringRef(DecompressedData));
+}
+
 // List bundle IDs. Return true if an error was found.
 Error OffloadBundler::ListBundleIDsInFile(
     StringRef InputFileName, const OffloadBundlerConfig &BundlerConfig) {
@@ -890,28 +1087,35 @@ Error OffloadBundler::ListBundleIDsInFile(
   if (std::error_cod...
[truncated]

yxsamliu · 2023-09-28T13:23:16Z

ping

A little background: some HIP applications face size restrictions on their packages. Compression can significantly reduce the bundle size at an acceptable decompression cost, so they need this feature.

jhuber6 · 2023-09-29T15:38:14Z

Unrelated, I need to hurry up and try to make the new driver the default for CUDA and HIP upstream at some point.

I've had some thoughts about how to accomplish this in the past. For the new driver, .llvm.offloading is a single section that contains all of the embedded device code to create a fat binary. It should be possible to use ELF compression to reduce the size of that binary. Can we do something similar here? If memory serves, the clang-offload-bundler does a similar section embedding for object files. This wouldn't work for LLVM-IR host binaries however, since the host binary won't be an ELF. Also it's not exactly portable with COFF.

@MaskRay do you know if ELF compression is feasible for this application?

jhuber6 · 2023-09-29T15:38:50Z

clang/include/clang/Driver/Options.td

@@ -1183,6 +1183,10 @@ def fgpu_inline_threshold_EQ : Joined<["-"], "fgpu-inline-threshold=">,
 def fgpu_sanitize : Flag<["-"], "fgpu-sanitize">, Group<f_Group>,
  HelpText<"Enable sanitizer for supported offloading devices">;
 def fno_gpu_sanitize : Flag<["-"], "fno-gpu-sanitize">, Group<f_Group>;
+
+def foffload_compress : Flag<["-"], "foffload-compress">,


All the other offload options use -- so we should probably stay consistent.

changed to --offload-compress

General comment: can we make sure --offload is not language specific.
The hip in the title makes me worry.

The convention is to introduce a generic option which works for a specific offloading language, and the help message indicates which languages are supported. It is not feasible to support a feature in all offloading languages at once by one developer since a developer is usually only familiar with one offloading language. Once a feature is introduced for one offloading language, it can be extended or adopted by other offloading languages. This is better than each offloading language introducing an individual option for the same purpose.

yxsamliu · 2023-09-29T16:40:28Z

> Unrelated, I need to hurry up and try to make the new driver the default for CUDA and HIP upstream at some point.
>
> I've had some thoughts about how to accomplish this in the past. For the new driver, .llvm.offloading is a single section that contains all of the embedded device code to create a fat binary. It should be possible to use ELF compression to reduce the size of that binary. Can we do something similar here? If memory serves, the clang-offload-bundler does a similar section embedding for object files. This wouldn't work for LLVM-IR host binaries however, since the host binary won't be an ELF. Also it's not exactly portable with COFF.
>
> @MaskRay do you know if ELF compression is feasible for this application?

We need to support compression/uncompression of standalone bundled bitcode, and standalone bundled code objects that are not embedded in ELF or COFF binaries. An OS-neutral binary format is preferred.

jhuber6 · 2023-10-02T16:58:24Z

> We need to support compression/uncompression of standalone bundled bitcode, and standalone bundled code objects that are not embedded in ELF or COFF binaries. An OS-neutral binary format is preferred.

I think my ideal solution would be LLVM supporting some module metadata such that a section emitted by the backend automatically uses an ELF compressed section, though this wouldn't apply to LTO objects unfortunately. That being said, it's not unreasonable to do this manually for this specified use-case.

I see this introduces a new header to the bundled format, will this break any backwards compatibility? I'm guessing old binaries can check for the LLVM bitcode, ELF, or COFF magic directly instead. If that's the case I'd probably recommend adding your magic number to the file_magic implementation.

yxsamliu · 2023-10-02T18:16:38Z

> > We need to support compression/uncompression of standalone bundled bitcode, and standalone bundled code objects that are not embedded in ELF or COFF binaries. An OS-neutral binary format is preferred.
>
> I think my ideal solution would be LLVM supporting some module metadata such that a section emitted by the backend automatically uses an ELF compressed section, though this wouldn't apply to LTO objects unfortunately. That being said, it's not unreasonable to do this manually for this specified use-case.
>
> I see this introduces a new header to the bundled format, will this break any backwards compatibility? I'm guessing old binaries can check for the LLVM bitcode, ELF, or COFF magic directly instead. If that's the case I'd probably recommend adding your magic number to the file_magic implementation.

Good point. The new compiler and runtime can consume offload bundle files created by the old compiler. If compression is not enabled, the old compiler and runtime can still consume bundles produced by the new compiler. However, the old compiler and runtime cannot consume compressed bundles.

I have added recognition of clang offload bundles (compressed and uncompressed) to the magic-number detection.

jhuber6 · 2023-10-02T18:19:12Z

clang/lib/Driver/OffloadBundler.cpp

+  uint16_t CompressionMethod = static_cast<uint16_t>(CompressionFormat);
+  uint32_t UncompressedSize = Input.getBuffer().size();
+
+  SmallVector<char, 0> FinalBuffer;


Wouldn't this be easier with a raw_svector_ostream?
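
To make the suggestion concrete without depending on LLVM headers, here is a rough sketch of stream-style header construction in standard C++. `std::string` stands in for the output stream, and `appendLE` and `buildHeader` are hypothetical names, not functions from the patch. The sketch writes each field in explicit little-endian order. That is an assumption: the patch as posted appends the fields in host byte order, which only matches on little-endian hosts.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Append `value` to `out` in little-endian byte order, one byte at a time.
template <typename T> void appendLE(std::string &out, T value) {
  static_assert(std::is_unsigned_v<T>, "expects an unsigned integer type");
  for (std::size_t i = 0; i < sizeof(T); ++i)
    out.push_back(static_cast<char>((value >> (8 * i)) & 0xff));
}

// Hypothetical helper: serialize the header fields in the documented order
// (magic, version, method, uncompressed size, truncated hash).
std::string buildHeader(std::uint16_t version, std::uint16_t method,
                        std::uint32_t uncompressedSize,
                        std::uint64_t truncatedHash) {
  std::string header("CCOB");
  appendLE(header, version);
  appendLE(header, method);
  appendLE(header, uncompressedSize);
  appendLE(header, truncatedHash);
  return header;
}
```

Compared with the repeated `reinterpret_cast` appends, one helper per field keeps the field order visible at a glance and pins down the byte order of the on-disk format.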

jhuber6

I think in the long term we need to move away from the bundler, which would suggest we probably want something similar in the "new driver". But overall the changes here are pretty self contained and the concept is straightforward.

This reverts commit a1e81d2.

Revert "Fix test hip-offload-compress-zlib.hip"

This reverts commit ba01ce6.

Revert due to sanity fail at
https://lab.llvm.org/buildbot/#/builders/5/builds/37188
https://lab.llvm.org/buildbot/#/builders/238/builds/5955

/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0xaaaae2d90e7c: note: pointer points here
  bc 00 00 00 94 dc 29 9a 89 fb ca 2b 78 9c 8b 8f 77 f6 71 f4 73 8f f7 77 73 f3 f1 77 74 89 77 0a
              ^
#0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25
#1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7

Will reland after fixing it.

mikaelholmen · 2023-10-05T08:04:54Z

Hi,

I noticed that with UBSan this new testcase fails

09:59:08 Failed Tests (1):
09:59:08   Clang :: Driver/clang-offload-bundler-zlib.c

with

09:39:53 ../../clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0x55ceca16e46c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
09:39:53 0x55ceca16e46c: note: pointer points here
09:39:53   bc 00 00 00 94 dc 29 9a  89 fb ca 2b 78 9c 8b 8f  77 f6 71 f4 73 8f f7 77  73 f3 f1 77 74 89 77 0a
09:39:53               ^ 
09:39:53 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../clang/lib/Driver/OffloadBundler.cpp:1012:25 in

yxsamliu · 2023-10-05T13:08:03Z

I have reverted the commit. Will fix and reland. Thanks.

Sam

jhuber6 · 2023-10-05T13:22:18Z

For reference, I've had similar problems in the past. Usually it happens because the header needs to be aligned on a uint64_t to be read. The ELF section has 8-byte alignment, but that can be incorrect when extracting from something like an archive. I just make a copy in that case https://github.com/llvm/llvm-project/blob/main/llvm/lib/Object/OffloadBinary.cpp#L156.
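As a rough illustration of the "make a copy" approach described above, the following sketch checks whether a buffer is suitably aligned for `uint64_t` and, if not, copies it into aligned storage. The helper names `isAlignedForU64` and `copyToAligned` are hypothetical and are not taken from `OffloadBinary.cpp`; the linked code may do this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true if P is suitably aligned for a uint64_t load.
inline bool isAlignedForU64(const void *P) {
  return reinterpret_cast<std::uintptr_t>(P) % alignof(uint64_t) == 0;
}

// Hypothetical helper: copy Size bytes into storage that is aligned for
// uint64_t. std::vector<uint64_t> is used only because its element storage
// is guaranteed to have that alignment; the last word is zero-padded.
inline std::vector<uint64_t> copyToAligned(const char *Data, size_t Size) {
  std::vector<uint64_t> Aligned((Size + sizeof(uint64_t) - 1) /
                                sizeof(uint64_t));
  if (Size != 0)
    std::memcpy(Aligned.data(), Data, Size);
  return Aligned;
}
```

A caller would typically test `isAlignedForU64` first and only pay for the copy when the input (for example, a member extracted from an archive) turns out to be misaligned.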

yxsamliu · 2023-10-05T15:13:48Z

Thanks. I will fix it by using memcpy instead of dereferencing an int* cast from a char*. Sam
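A minimal sketch of the memcpy-based fix mentioned above, assuming a raw byte buffer and a known field offset. The helper name `loadU64` is hypothetical and this is not the actual code from `OffloadBundler.cpp`; it only shows why `memcpy` avoids the misaligned load that UBSan reported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a uint64_t from an arbitrary byte offset.
// Dereferencing reinterpret_cast<const uint64_t *>(Data + Offset) is
// undefined behavior when the address is not 8-byte aligned; memcpy has no
// alignment requirement on its source.
inline uint64_t loadU64(const char *Data, size_t Size, size_t Offset) {
  assert(Offset <= Size && Size - Offset >= sizeof(uint64_t) &&
         "read out of bounds");
  uint64_t Value;
  std::memcpy(&Value, Data + Offset, sizeof(Value));
  return Value; // Value is in host byte order.
}
```

Compilers generally lower a fixed-size `memcpy` like this to a single load on targets that permit unaligned access, so the fix is not expected to cost much at run time.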

Original PR: #67162. The commit was reverted due to UB detected by the sanitizer (https://lab.llvm.org/buildbot/#/builders/238/builds/5955): clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment. It was fixed by using memcpy instead of dereferencing an int* cast from an unaligned char*.

Reland "[HIP] Support compressing device binary". Original PR: llvm#67162. The commit was reverted due to UB detected by the sanitizer (https://lab.llvm.org/buildbot/#/builders/238/builds/5955): clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment. It was fixed by using memcpy instead of dereferencing an int* cast from an unaligned char*. Co-Authored-By: Martin Schwaighofer <mschwaig@users.noreply.github.com> (only did the backport)

Original PR: llvm#67162. The commit was reverted due to UB detected by the sanitizer (https://lab.llvm.org/buildbot/#/builders/238/builds/5955): clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment. It was fixed by using memcpy instead of dereferencing an int* cast from an unaligned char*. Note from committer: The original patch is split into two parts, one only for Clang and the other only for LLVM, to allow easier packaging for Nix. Signed-off-by: Gavin Zhao <git@gzgz.dev>


yxsamliu requested review from Artem-B and lamb-j September 22, 2023 16:30

llvmbot added the clang and clang:driver labels Sep 22, 2023

yxsamliu force-pushed the compress3 branch from dbe296e to aa1aee8 Compare September 23, 2023 01:53

yxsamliu requested review from MaskRay and jhuber6 September 29, 2023 15:30

jhuber6 reviewed Sep 29, 2023


yxsamliu force-pushed the compress3 branch from aa1aee8 to 71dc0a3 Compare October 2, 2023 16:45

yxsamliu force-pushed the compress3 branch from 71dc0a3 to b1c9550 Compare October 2, 2023 18:10

llvmbot added the llvm:binary-utilities label Oct 2, 2023

jhuber6 reviewed Oct 2, 2023


yxsamliu force-pushed the compress3 branch from b1c9550 to b4b4480 Compare October 3, 2023 00:46

yxsamliu force-pushed the compress3 branch from b4b4480 to c2d5f80 Compare October 3, 2023 03:35

jhuber6 approved these changes Oct 4, 2023


yxsamliu merged commit a1e81d2 into llvm:main Oct 4, 2023

stepthomas mentioned this pull request Oct 10, 2023

AMDGPU stepthomas atomic csub no rtn forms ver2 stepthomas/llvm-project#1

Closed

mschwaig mentioned this pull request Apr 16, 2024

rocmPackages.composable_kernel: compress output NixOS/nixpkgs#299589

Merged

13 tasks
