[HIP] Support compressing device binary #67162

Merged (1 commit) on Oct 4, 2023

yxsamliu
Collaborator

Add option -f[no-]offload-compress to clang to enable or disable compression of the device binary for HIP. Compression is disabled by default.

Add option -compress to clang-offload-bundler to enable compression of the offload bundle. It is also disabled by default.

When compression is enabled, zstd is used if it is available; otherwise zlib is used.

When compression is disabled, this is NFC (no functional change) relative to the previous behavior: the same offload bundle format is used as before.

clang-offload-bundler automatically detects whether the input file to be unbundled is compressed, determines the compression method, and decompresses the file if necessary.
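
As a rough illustration of the auto-detection step, the check can be reduced to comparing the leading bytes of the input against the two magic strings that appear in the patch (`CCOB` for compressed bundles, `__CLANG_OFFLOAD_BUNDLE__` for uncompressed binary bundles). This is a standalone sketch using only the C++ standard library; `BundleKind` and `classifyBundle` are hypothetical names that do not appear in the patch, and the real `decompress` also returns a plain copy when the input is shorter than the header.

```cpp
#include <string_view>

// Hypothetical helper type for this sketch; not part of the patch.
enum class BundleKind { Compressed, Uncompressed, Unknown };

// Classify a buffer by its leading magic bytes.
// "CCOB" marks a compressed bundle; "__CLANG_OFFLOAD_BUNDLE__" marks an
// uncompressed binary bundle. Anything else is reported as Unknown here.
inline BundleKind classifyBundle(std::string_view Blob) {
  constexpr std::string_view CompressedMagic = "CCOB";
  constexpr std::string_view PlainMagic = "__CLANG_OFFLOAD_BUNDLE__";

  // substr clamps the length, so short inputs are handled safely.
  if (Blob.substr(0, CompressedMagic.size()) == CompressedMagic)
    return BundleKind::Compressed;
  if (Blob.substr(0, PlainMagic.size()) == PlainMagic)
    return BundleKind::Uncompressed;
  return BundleKind::Unknown;
}
```

A caller would decompress only in the `Compressed` case and pass the buffer through unchanged otherwise.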

llvmbot added the clang and clang:driver labels Sep 22, 2023
@llvmbot
Collaborator

llvmbot commented Sep 22, 2023

@llvm/pr-subscribers-llvm-binary-utilities
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Changes


Patch is 36.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67162.diff

8 Files Affected:

  • (modified) clang/docs/ClangOffloadBundler.rst (+27)
  • (modified) clang/include/clang/Driver/OffloadBundler.h (+37)
  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/lib/Driver/OffloadBundler.cpp (+293-44)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+7)
  • (added) clang/test/Driver/clang-offload-bundler-zlib.c (+75)
  • (added) clang/test/Driver/clang-offload-bundler-zstd.c (+72)
  • (modified) clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp (+10)
diff --git a/clang/docs/ClangOffloadBundler.rst b/clang/docs/ClangOffloadBundler.rst
index d08bf4b97781fa4..1e21d3e7264d5c3 100644
--- a/clang/docs/ClangOffloadBundler.rst
+++ b/clang/docs/ClangOffloadBundler.rst
@@ -309,3 +309,30 @@ target by comparing bundle ID's. Two bundle ID's are considered compatible if:
   * Their offload kind are the same
   * Their target triple are the same
   * Their GPUArch are the same
+
+Compression and Decompression
+=============================
+
+``clang-offload-bundler`` provides features to compress and decompress the full
+bundle, leveraging inherent redundancies within the bundle entries. Use the
+`-compress` command-line option to enable this compression capability.
+
+The compressed offload bundle begins with a header followed by the compressed binary data:
+
+- **Magic Number (4 bytes)**:
+    This is a unique identifier to distinguish compressed offload bundles. The value is the string 'CCOB' (Compressed Clang Offload Bundle).
+
+- **Version Number (16-bit unsigned int)**:
+    This denotes the version of the compressed offload bundle format. The current version is `1`.
+
+- **Compression Method (16-bit unsigned int)**:
+    This field indicates the compression method used. The value corresponds to either `zlib` or `zstd`, represented as a 16-bit unsigned integer cast from the LLVM compression enumeration.
+
+- **Uncompressed Binary Size (32-bit unsigned int)**:
+    This is the size (in bytes) of the binary data before it was compressed.
+
+- **Hash (64-bit unsigned int)**:
+    This is a 64-bit truncated MD5 hash of the uncompressed binary data. It serves for verification and caching purposes.
+
+- **Compressed Data**:
+    The actual compressed binary data follows the header. Its size can be inferred from the total size of the file minus the header size.
diff --git a/clang/include/clang/Driver/OffloadBundler.h b/clang/include/clang/Driver/OffloadBundler.h
index 28473c53662de2c..17df31d31071d99 100644
--- a/clang/include/clang/Driver/OffloadBundler.h
+++ b/clang/include/clang/Driver/OffloadBundler.h
@@ -19,6 +19,7 @@
 
 #include "llvm/Support/Error.h"
 #include "llvm/TargetParser/Triple.h"
+#include <llvm/Support/MemoryBuffer.h>
 #include <string>
 #include <vector>
 
@@ -26,11 +27,15 @@ namespace clang {
 
 class OffloadBundlerConfig {
 public:
+  OffloadBundlerConfig();
+
   bool AllowNoHost = false;
   bool AllowMissingBundles = false;
   bool CheckInputArchive = false;
   bool PrintExternalCommands = false;
   bool HipOpenmpCompatible = false;
+  bool Compress = false;
+  bool Verbose = false;
 
   unsigned BundleAlignment = 1;
   unsigned HostInputIndex = ~0u;
@@ -84,6 +89,38 @@ struct OffloadTargetInfo {
   std::string str() const;
 };
 
+// CompressedOffloadBundle represents the format for the compressed offload
+// bundles.
+//
+// The format is as follows:
+// - Magic Number (4 bytes) - A constant "CCOB".
+// - Version (2 bytes)
+// - Compression Method (2 bytes) - Uses the values from
+// llvm::compression::Format.
+// - Uncompressed Size (4 bytes).
+// - Truncated MD5 Hash (8 bytes).
+// - Compressed Data (variable length).
+
+class CompressedOffloadBundle {
+private:
+  static inline const size_t MagicSize = 4;
+  static inline const size_t VersionFieldSize = sizeof(uint16_t);
+  static inline const size_t MethodFieldSize = sizeof(uint16_t);
+  static inline const size_t SizeFieldSize = sizeof(uint32_t);
+  static inline const size_t HashFieldSize = 8;
+  static inline const size_t HeaderSize = MagicSize + VersionFieldSize +
+                                          MethodFieldSize + SizeFieldSize +
+                                          HashFieldSize;
+  static inline const llvm::StringRef MagicNumber = "CCOB";
+  static inline const uint16_t Version = 1;
+
+public:
+  static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+  compress(const llvm::MemoryBuffer &Input, bool Verbose = false);
+  static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+  decompress(const llvm::MemoryBuffer &Input, bool Verbose = false);
+};
+
 } // namespace clang
 
 #endif // LLVM_CLANG_DRIVER_OFFLOADBUNDLER_H
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 0f93479170d73bc..b8295fa47327157 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1183,6 +1183,10 @@ def fgpu_inline_threshold_EQ : Joined<["-"], "fgpu-inline-threshold=">,
 def fgpu_sanitize : Flag<["-"], "fgpu-sanitize">, Group<f_Group>,
   HelpText<"Enable sanitizer for supported offloading devices">;
 def fno_gpu_sanitize : Flag<["-"], "fno-gpu-sanitize">, Group<f_Group>;
+
+def foffload_compress : Flag<["-"], "foffload-compress">,
+  HelpText<"Compress offload device binaries (HIP only)">;
+def fno_offload_compress : Flag<["-"], "fno-offload-compress">;
 }
 
 // CUDA options
diff --git a/clang/lib/Driver/OffloadBundler.cpp b/clang/lib/Driver/OffloadBundler.cpp
index d11c41605bf39ee..7737c71485f5bf3 100644
--- a/clang/lib/Driver/OffloadBundler.cpp
+++ b/clang/lib/Driver/OffloadBundler.cpp
@@ -21,6 +21,7 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Object/Archive.h"
@@ -28,17 +29,20 @@
 #include "llvm/Object/Binary.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/Compression.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/EndianStream.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/ErrorOr.h"
 #include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MD5.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/Program.h"
 #include "llvm/Support/Signals.h"
 #include "llvm/Support/StringSaver.h"
+#include "llvm/Support/Timer.h"
 #include "llvm/Support/WithColor.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TargetParser/Host.h"
@@ -48,6 +52,7 @@
 #include <cstddef>
 #include <cstdint>
 #include <forward_list>
+#include <llvm/Support/Process.h>
 #include <memory>
 #include <set>
 #include <string>
@@ -58,6 +63,10 @@ using namespace llvm;
 using namespace llvm::object;
 using namespace clang;
 
+static llvm::TimerGroup
+    ClangOffloadBundlerTimerGroup("Clang Offload Bundler Timer Group",
+                                  "Timer group for clang offload bundler");
+
 /// Magic string that marks the existence of offloading data.
 #define OFFLOAD_BUNDLER_MAGIC_STR "__CLANG_OFFLOAD_BUNDLE__"
 
@@ -224,20 +233,22 @@ class FileHandler {
 
   /// Write the header of the bundled file to \a OS based on the information
   /// gathered from \a Inputs.
-  virtual Error WriteHeader(raw_fd_ostream &OS,
+  virtual Error WriteHeader(raw_ostream &OS,
                             ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) = 0;
 
   /// Write the marker that initiates a bundle for the triple \a TargetTriple to
   /// \a OS.
-  virtual Error WriteBundleStart(raw_fd_ostream &OS,
-                                 StringRef TargetTriple) = 0;
+  virtual Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) = 0;
 
   /// Write the marker that closes a bundle for the triple \a TargetTriple to \a
   /// OS.
-  virtual Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) = 0;
+  virtual Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) = 0;
 
   /// Write the bundle from \a Input into \a OS.
-  virtual Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) = 0;
+  virtual Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) = 0;
+
+  /// Finalize output file.
+  virtual Error finalizeOutputFile() { return Error::success(); }
 
   /// List bundle IDs in \a Input.
   virtual Error listBundleIDs(MemoryBuffer &Input) {
@@ -311,7 +322,7 @@ static uint64_t Read8byteIntegerFromBuffer(StringRef Buffer, size_t pos) {
 }
 
 /// Write 8-byte integers to a buffer in little-endian format.
-static void Write8byteIntegerToBuffer(raw_fd_ostream &OS, uint64_t Val) {
+static void Write8byteIntegerToBuffer(raw_ostream &OS, uint64_t Val) {
   llvm::support::endian::write(OS, Val, llvm::support::little);
 }
 
@@ -435,7 +446,7 @@ class BinaryFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
 
     // Compute size of the header.
@@ -472,19 +483,27 @@ class BinaryFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     CurWriteBundleTarget = TargetTriple.str();
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     auto BI = BundlesInfo[CurWriteBundleTarget];
-    OS.seek(BI.Offset);
+
+    // Pad with 0 to reach specified offset.
+    size_t CurrentPos = OS.tell();
+    size_t PaddingSize = BI.Offset > CurrentPos ? BI.Offset - CurrentPos : 0;
+    for (size_t I = 0; I < PaddingSize; ++I)
+      OS.write('\0');
+    assert(OS.tell() == BI.Offset);
+
     OS.write(Input.getBufferStart(), Input.getBufferSize());
+
     return Error::success();
   }
 };
@@ -607,7 +626,7 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
     assert(BundlerConfig.HostInputIndex != ~0u &&
            "Host input index not defined.");
@@ -617,12 +636,16 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     ++NumberOfProcessedInputs;
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
+    return Error::success();
+  }
+
+  Error finalizeOutputFile() final {
     assert(NumberOfProcessedInputs <= NumberOfInputs &&
            "Processing more inputs that actually exist!");
     assert(BundlerConfig.HostInputIndex != ~0u &&
@@ -640,10 +663,6 @@ class ObjectFileHandler final : public FileHandler {
     assert(BundlerConfig.ObjcopyPath != "" &&
            "llvm-objcopy path not specified");
 
-    // We write to the output file directly. So, we close it and use the name
-    // to pass down to llvm-objcopy.
-    OS.close();
-
     // Temporary files that need to be removed.
     TempFileHandlerRAII TempFiles;
 
@@ -684,7 +703,7 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     return Error::success();
   }
 
@@ -781,22 +800,22 @@ class TextFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     OS << BundleStartString << TargetTriple << "\n";
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
     OS << BundleEndString << TargetTriple << "\n";
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     OS << Input.getBuffer();
     return Error::success();
   }
@@ -881,6 +900,184 @@ CreateFileHandler(MemoryBuffer &FirstInput,
                            "'" + FilesType + "': invalid file type specified");
 }
 
+OffloadBundlerConfig::OffloadBundlerConfig() {
+  auto IgnoreEnvVarOpt =
+      llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_IGNORE_ENV_VAR");
+  if (IgnoreEnvVarOpt.has_value() && IgnoreEnvVarOpt.value() == "1")
+    return;
+
+  auto VerboseEnvVarOpt = llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_VERBOSE");
+  if (VerboseEnvVarOpt.has_value())
+    Verbose = VerboseEnvVarOpt.value() == "1";
+
+  auto CompressEnvVarOpt =
+      llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_COMPRESS");
+  if (CompressEnvVarOpt.has_value())
+    Compress = CompressEnvVarOpt.value() == "1";
+}
+
+llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
+                                  bool Verbose) {
+  llvm::Timer HashTimer("Hash Calculation Timer", "Hash calculation time",
+                        ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    HashTimer.startTimer();
+  llvm::MD5 Hash;
+  llvm::MD5::MD5Result Result;
+  Hash.update(Input.getBuffer());
+  Hash.final(Result);
+  uint64_t TruncatedHash = Result.low();
+  if (Verbose)
+    HashTimer.stopTimer();
+
+  SmallVector<uint8_t, 0> CompressedBuffer;
+  auto BufferUint8 = llvm::ArrayRef<uint8_t>(
+      reinterpret_cast<const uint8_t *>(Input.getBuffer().data()),
+      Input.getBuffer().size());
+
+  llvm::compression::Format CompressionFormat;
+
+  if (llvm::compression::zstd::isAvailable())
+    CompressionFormat = llvm::compression::Format::Zstd;
+  else if (llvm::compression::zlib::isAvailable())
+    CompressionFormat = llvm::compression::Format::Zlib;
+  else
+    return createStringError(llvm::inconvertibleErrorCode(),
+                             "Compression not supported");
+
+  llvm::Timer CompressTimer("Compression Timer", "Compression time",
+                            ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    CompressTimer.startTimer();
+  llvm::compression::compress(CompressionFormat, BufferUint8, CompressedBuffer);
+  if (Verbose)
+    CompressTimer.stopTimer();
+
+  uint16_t CompressionMethod = static_cast<uint16_t>(CompressionFormat);
+  uint32_t UncompressedSize = Input.getBuffer().size();
+
+  SmallVector<char, 0> FinalBuffer;
+  FinalBuffer.append(MagicNumber.begin(), MagicNumber.end());
+  FinalBuffer.append(reinterpret_cast<const char *>(&Version),
+                     reinterpret_cast<const char *>(&Version) +
+                         sizeof(Version));
+  FinalBuffer.append(reinterpret_cast<char *>(&CompressionMethod),
+                     reinterpret_cast<char *>(&CompressionMethod) +
+                         sizeof(CompressionMethod));
+  FinalBuffer.append(reinterpret_cast<char *>(&UncompressedSize),
+                     reinterpret_cast<char *>(&UncompressedSize) +
+                         sizeof(UncompressedSize));
+  FinalBuffer.append(reinterpret_cast<char *>(&TruncatedHash),
+                     reinterpret_cast<char *>(&TruncatedHash) +
+                         sizeof(TruncatedHash));
+  FinalBuffer.append(CompressedBuffer.begin(), CompressedBuffer.end());
+
+  if (Verbose) {
+    auto MethodUsed =
+        CompressionFormat == llvm::compression::Format::Zstd ? "zstd" : "zlib";
+    llvm::errs() << "Compressed bundle format version: " << Version << "\n"
+                 << "Compression method used: " << MethodUsed << "\n"
+                 << "Binary size before compression: " << UncompressedSize
+                 << " bytes\n"
+                 << "Binary size after compression: " << CompressedBuffer.size()
+                 << " bytes\n"
+                 << "Truncated MD5 hash: "
+                 << llvm::format_hex(TruncatedHash, 16) << "\n";
+  }
+
+  return llvm::MemoryBuffer::getMemBufferCopy(
+      llvm::StringRef(FinalBuffer.data(), FinalBuffer.size()));
+}
+
+llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+CompressedOffloadBundle::decompress(const llvm::MemoryBuffer &Input,
+                                    bool Verbose) {
+
+  StringRef Blob = Input.getBuffer();
+
+  if (Blob.size() < HeaderSize) {
+    return llvm::MemoryBuffer::getMemBufferCopy(Blob);
+  }
+  StringRef Magic = Blob.substr(0, MagicSize);
+  if (Magic != MagicNumber) {
+    if (Verbose)
+      llvm::errs() << "Uncompressed bundle.\n";
+    return llvm::MemoryBuffer::getMemBufferCopy(Blob);
+  }
+
+  uint16_t ThisVersion = *reinterpret_cast<const uint16_t *>(
+      Input.getBuffer().data() + MagicNumber.size());
+  uint16_t CompressionMethod = *reinterpret_cast<const uint16_t *>(
+      Blob.data() + MagicSize + VersionFieldSize);
+  uint32_t UncompressedSize = *reinterpret_cast<const uint32_t *>(
+      Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize);
+  uint64_t StoredHash = *reinterpret_cast<const uint64_t *>(
+      Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize +
+      SizeFieldSize);
+
+  llvm::compression::Format CompressionFormat;
+  if (CompressionMethod ==
+      static_cast<uint16_t>(llvm::compression::Format::Zlib))
+    CompressionFormat = llvm::compression::Format::Zlib;
+  else if (CompressionMethod ==
+           static_cast<uint16_t>(llvm::compression::Format::Zstd))
+    CompressionFormat = llvm::compression::Format::Zstd;
+  else
+    return createStringError(inconvertibleErrorCode(),
+                             "Unknown compressing method");
+
+  llvm::Timer DecompressTimer("Decompression Timer", "Decompression time",
+                              ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    DecompressTimer.startTimer();
+
+  SmallVector<uint8_t, 0> DecompressedData;
+  StringRef CompressedData = Blob.substr(HeaderSize);
+  if (llvm::Error DecompressionError = llvm::compression::decompress(
+          CompressionFormat, llvm::arrayRefFromStringRef(CompressedData),
+          DecompressedData, UncompressedSize))
+    return createStringError(inconvertibleErrorCode(),
+                             "Could not decompress embedded file contents: " +
+                                 llvm::toString(std::move(DecompressionError)));
+
+  if (Verbose) {
+    DecompressTimer.stopTimer();
+
+    // Recalculate MD5 hash
+    llvm::Timer HashRecalcTimer("Hash Recalculation Timer",
+                                "Hash recalculation time",
+                                ClangOffloadBundlerTimerGroup);
+    HashRecalcTimer.startTimer();
+    llvm::MD5 Hash;
+    llvm::MD5::MD5Result Result;
+    Hash.update(llvm::ArrayRef<uint8_t>(DecompressedData.data(),
+                                        DecompressedData.size()));
+    Hash.final(Result);
+    uint64_t RecalculatedHash = Result.low();
+    HashRecalcTimer.stopTimer();
+    bool HashMatch = (StoredHash == RecalculatedHash);
+
+    llvm::errs() << "Compressed bundle format version: " << ThisVersion << "\n"
+                 << "Decompression method: "
+                 << (CompressionFormat == llvm::compression::Format::Zlib
+                         ? "zlib"
+                         : "zstd")
+                 << "\n"
+                 << "Size before decompression: " << CompressedData.size()
+                 << " bytes\n"
+                 << "Size after decompression: " << UncompressedSize
+                 << " bytes\n"
+                 << "Stored hash: " << llvm::format_hex(StoredHash, 16) << "\n"
+                 << "Recalculated hash: "
+                 << llvm::format_hex(RecalculatedHash, 16) << "\n"
+                 << "Hashes match: " << (HashMatch ? "Yes" : "No") << "\n";
+  }
+
+  return llvm::MemoryBuffer::getMemBufferCopy(
+      llvm::toStringRef(DecompressedData));
+}
+
 // List bundle IDs. Return true if an error was found.
 Error OffloadBundler::ListBundleIDsInFile(
     StringRef InputFileName, const OffloadBundlerConfig &BundlerConfig) {
@@ -890,28 +1087,35 @@ Error OffloadBundler::ListBundleIDsInFile(
   if (std::error_cod...
[truncated]
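
For readers following the truncated patch, the header layout described in the ClangOffloadBundler.rst hunk (4-byte magic, 16-bit version, 16-bit method, 32-bit uncompressed size, 64-bit hash) adds up to 20 bytes. The following is a minimal standalone sketch of that layout, assuming native byte order as in the patch's pointer-based appends. `CompressedHeader`, `appendField`, and `encodeHeader` are hypothetical names introduced for illustration and are not the patch's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Byte offsets of each field, derived from the documented field sizes.
constexpr std::size_t MagicSize = 4;
constexpr std::size_t VersionOffset = MagicSize;                            // 4
constexpr std::size_t MethodOffset = VersionOffset + sizeof(std::uint16_t); // 6
constexpr std::size_t SizeOffset = MethodOffset + sizeof(std::uint16_t);    // 8
constexpr std::size_t HashOffset = SizeOffset + sizeof(std::uint32_t);      // 12
constexpr std::size_t HeaderSize = HashOffset + sizeof(std::uint64_t);      // 20
static_assert(HeaderSize == 20, "documented header is 20 bytes");

// Hypothetical in-memory view of the header fields (magic excluded).
struct CompressedHeader {
  std::uint16_t Version;
  std::uint16_t Method;
  std::uint32_t UncompressedSize;
  std::uint64_t Hash;
};

// Append the raw bytes of a trivially copyable value in native byte order.
template <typename T> void appendField(std::string &Out, T Value) {
  char Bytes[sizeof(T)];
  std::memcpy(Bytes, &Value, sizeof(T));
  Out.append(Bytes, sizeof(T));
}

// Serialize the magic followed by the four fields, with no padding.
inline std::string encodeHeader(const CompressedHeader &H) {
  std::string Out("CCOB", MagicSize);
  appendField(Out, H.Version);
  appendField(Out, H.Method);
  appendField(Out, H.UncompressedSize);
  appendField(Out, H.Hash);
  return Out;
}
```

The compressed payload would follow immediately after these 20 bytes, so its length is the file size minus `HeaderSize`.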

@yxsamliu
Collaborator Author

ping

A little background: some HIP applications face size restrictions for their packages. Compression can significantly reduce the bundle size at an acceptable decompression cost, so they need this feature.

@jhuber6
Contributor

jhuber6 commented Sep 29, 2023

Unrelated, I need to hurry up and try to make the new driver the default for CUDA and HIP upstream at some point.

I've had some thoughts about how to accomplish this in the past. For the new driver, .llvm.offloading is a single section that contains all of the embedded device code to create a fat binary. It should be possible to use ELF compression to reduce the size of that binary. Can we do something similar here? If memory serves, the clang-offload-bundler does a similar section embedding for object files. This wouldn't work for LLVM-IR host binaries however, since the host binary won't be an ELF. Also it's not exactly portable with COFF.

@MaskRay do you know if ELF compression is feasible for this application?

@@ -1183,6 +1183,10 @@ def fgpu_inline_threshold_EQ : Joined<["-"], "fgpu-inline-threshold=">,
 def fgpu_sanitize : Flag<["-"], "fgpu-sanitize">, Group<f_Group>,
   HelpText<"Enable sanitizer for supported offloading devices">;
 def fno_gpu_sanitize : Flag<["-"], "fno-gpu-sanitize">, Group<f_Group>;
+
+def foffload_compress : Flag<["-"], "foffload-compress">,
Contributor

All the other offload options use -- so we should probably stay consistent.

Collaborator Author

changed to --offload-compress

Member

General comment: can we make sure --offload is not language-specific?
The "HIP" in the title makes me worry.

Collaborator Author

The convention is to introduce a generic option which works for a specific offloading language, and the help message indicates which languages are supported. It is not feasible to support a feature in all offloading languages at once by one developer since a developer is usually only familiar with one offloading language. Once a feature is introduced for one offloading language, it can be extended or adopted by other offloading languages. This is better than each offloading language introducing an individual option for the same purpose.

@yxsamliu
Collaborator Author

> Unrelated, I need to hurry up and try to make the new driver the default for CUDA and HIP upstream at some point.
>
> I've had some thoughts about how to accomplish this in the past. For the new driver, .llvm.offloading is a single section that contains all of the embedded device code to create a fat binary. It should be possible to use ELF compression to reduce the size of that binary. Can we do something similar here? If memory serves, the clang-offload-bundler does a similar section embedding for object files. This wouldn't work for LLVM-IR host binaries however, since the host binary won't be an ELF. Also it's not exactly portable with COFF.
>
> @MaskRay do you know if ELF compression is feasible for this application?

We need to support compression/uncompression of standalone bundled bitcode, and standalone bundled code objects that are not embedded in ELF or COFF binaries. An OS-neutral binary format is preferred.

@jhuber6
Contributor

jhuber6 commented Oct 2, 2023

> We need to support compression/uncompression of standalone bundled bitcode, and standalone bundled code objects that are not embedded in ELF or COFF binaries. An OS-neutral binary format is preferred.

I think my ideal solution would be LLVM supporting some module metadata such that a section emitted by the backend automatically uses an ELF compressed section, though this wouldn't apply to LTO objects unfortunately. That being said, it's not unreasonable to do this manually for this specified use-case.

I see this introduces a new header to the bundled format, will this break any backwards compatibility? I'm guessing old binaries can check for the LLVM bitcode, ELF, or COFF magic directly instead. If that's the case I'd probably recommend adding your magic number to the file_magic implementation.

@yxsamliu
Collaborator Author

yxsamliu commented Oct 2, 2023

> We need to support compression/uncompression of standalone bundled bitcode, and standalone bundled code objects that are not embedded in ELF or COFF binaries. An OS-neutral binary format is preferred.
>
> I think my ideal solution would be LLVM supporting some module metadata such that a section emitted by the backend automatically uses an ELF compressed section, though this wouldn't apply to LTO objects unfortunately. That being said, it's not unreasonable to do this manually for this specified use-case.
>
> I see this introduces a new header to the bundled format, will this break any backwards compatibility? I'm guessing old binaries can check for the LLVM bitcode, ELF, or COFF magic directly instead. If that's the case I'd probably recommend adding your magic number to the file_magic implementation.

Good point. The new compiler and runtime can consume offload bundle files created by the old compiler. If compression is not enabled, the old compiler and runtime can consume bundles created by the new compiler. However, the old compiler and runtime cannot consume compressed bundles.

I have added recognition of the clang offload bundle (compressed and uncompressed) to the file_magic implementation.

+  uint16_t CompressionMethod = static_cast<uint16_t>(CompressionFormat);
+  uint32_t UncompressedSize = Input.getBuffer().size();
+
+  SmallVector<char, 0> FinalBuffer;
Contributor

Wouldn't this be easier with a raw_svector_ostream?

Collaborator Author

Yes. fixed

Contributor

jhuber6 left a comment

I think in the long term we need to move away from the bundler, which would suggest we probably want something similar in the "new driver". But overall the changes here are pretty self contained and the concept is straightforward.

@yxsamliu yxsamliu merged commit a1e81d2 into llvm:main Oct 4, 2023
3 checks passed
yxsamliu added a commit that referenced this pull request Oct 5, 2023
This reverts commit a1e81d2.

Revert "Fix test hip-offload-compress-zlib.hip"

This reverts commit ba01ce6.

Reverted due to sanitizer failures at

https://lab.llvm.org/buildbot/#/builders/5/builds/37188

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0xaaaae2d90e7c: note: pointer points here
  bc 00 00 00 94 dc 29 9a  89 fb ca 2b 78 9c 8b 8f  77 f6 71 f4 73 8f f7 77  73 f3 f1 77 74 89 77 0a
              ^
    #0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25
    #1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7

Will reland after fixing it.
@mikaelholmen
Collaborator

Hi,

I noticed that with UBSan this new testcase fails

09:59:08 Failed Tests (1):
09:59:08   Clang :: Driver/clang-offload-bundler-zlib.c

with

09:39:53 ../../clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0x55ceca16e46c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
09:39:53 0x55ceca16e46c: note: pointer points here
09:39:53   bc 00 00 00 94 dc 29 9a  89 fb ca 2b 78 9c 8b 8f  77 f6 71 f4 73 8f f7 77  73 f3 f1 77 74 89 77 0a
09:39:53               ^ 
09:39:53 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../clang/lib/Driver/OffloadBundler.cpp:1012:25 in 

@yxsamliu
Collaborator Author

yxsamliu commented Oct 5, 2023 via email

@jhuber6
Contributor

jhuber6 commented Oct 5, 2023

For reference, I've had similar problems in the past. Usually it happens because the header must sit on a uint64_t boundary to be read directly. The ELF section has 8-byte alignment, but that alignment may not hold when extracting from something like an archive. In that case I just make a copy: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Object/OffloadBinary.cpp#L156.
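
To illustrate the copy-when-misaligned idea in general terms, here is a minimal sketch in standard C++. It is not the code behind the link above; the helper names `isAlignedForU64` and `copyToAligned` are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true if P is suitably aligned for a direct
// uint64_t load.
inline bool isAlignedForU64(const void *P) {
  return reinterpret_cast<std::uintptr_t>(P) % alignof(std::uint64_t) == 0;
}

// Hypothetical helper: copies Len bytes into storage whose alignment is
// guaranteed by its element type. The final word is zero-padded when Len
// is not a multiple of sizeof(uint64_t).
inline std::vector<std::uint64_t> copyToAligned(const unsigned char *Data,
                                                std::size_t Len) {
  const std::size_t WordSize = sizeof(std::uint64_t);
  std::vector<std::uint64_t> Words((Len + WordSize - 1) / WordSize, 0);
  if (Len != 0)
    std::memcpy(Words.data(), Data, Len);
  return Words;
}
```

A caller would check `isAlignedForU64` on the incoming buffer and fall back to `copyToAligned` only when the check fails, so the common, already-aligned case pays no copy cost.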

@yxsamliu
Collaborator Author

yxsamliu commented Oct 5, 2023 via email

yxsamliu added a commit that referenced this pull request Oct 5, 2023
Original PR: #67162

The commit was reverted due to UB detected by the sanitizer:

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error:
 load of misaligned address 0xaaaae2d90e7c for type
 'const uint64_t' (aka 'const unsigned long'), which
 requires 8 byte alignment

It was fixed by using memcpy instead of dereferencing an int*
cast from an unaligned char*.
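
The kind of fix described in this commit message can be sketched as follows. This is an illustration under assumptions, not the actual patch: `readU64Unaligned` is a hypothetical name, and the value is read in host byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a uint64_t from an address that may not be
// 8-byte aligned. The pattern UBSan flags is the direct load
//   *reinterpret_cast<const uint64_t *>(P)
// which is undefined behavior when P is misaligned. Copying the bytes
// with memcpy has no alignment requirement on the source.
inline std::uint64_t readU64Unaligned(const char *P) {
  std::uint64_t Value;
  std::memcpy(&Value, P, sizeof(Value));
  return Value;
}
```

Compilers typically lower a fixed-size memcpy like this to a single load on targets that allow unaligned access, so the safe form usually costs nothing extra.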
mschwaig pushed a commit to mschwaig/llvm-project that referenced this pull request Apr 4, 2024
Reland "[HIP] Support compressing device binary"

Original PR: llvm#67162

The commit was reverted due to UB detected by the sanitizer:

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error:
 load of misaligned address 0xaaaae2d90e7c for type
 'const uint64_t' (aka 'const unsigned long'), which
 requires 8 byte alignment

It was fixed by using memcpy instead of dereferencing an int*
cast from an unaligned char*.

Co-Authored-By: Martin Schwaighofer <mschwaig@users.noreply.github.com>
(only did the backport)
GZGavinZhao pushed a commit to GZGavinZhao/rocm-llvm-project that referenced this pull request Apr 21, 2024
Original PR: llvm#67162

The commit was reverted due to UB detected by the sanitizer:

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error:
 load of misaligned address 0xaaaae2d90e7c for type
 'const uint64_t' (aka 'const unsigned long'), which
 requires 8 byte alignment

It was fixed by using memcpy instead of dereferencing an int*
cast from an unaligned char*.

Note from committer:
The original patch is split into two parts, one only for clang, and
the other only for LLVM. This is to allow easier packaging for Nix.

Signed-off-by: Gavin Zhao <git@gzgz.dev>
GZGavinZhao pushed a commit to GZGavinZhao/rocm-llvm-project that referenced this pull request Apr 21, 2024
Original PR: llvm#67162

The commit was reverted due to UB detected by the sanitizer:

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error:
 load of misaligned address 0xaaaae2d90e7c for type
 'const uint64_t' (aka 'const unsigned long'), which
 requires 8 byte alignment

It was fixed by using memcpy instead of dereferencing an int*
cast from an unaligned char*.

Note from committer:
The original patch is split into two parts, one only for clang, and
the other only for LLVM. This is to allow easier packaging for Nix.

Signed-off-by: Gavin Zhao <git@gzgz.dev>