[LLVM] add LZMA for compression/decompression #83297
Conversation
Author: Yaxun (Sam) Liu (yxsamliu)

LZMA (Lempel-Ziv/Markov-chain Algorithm) provides a better compression ratio than zstd and zlib for clang-offload-bundler bundles, which often contain a large number of similar entries. This patch adds liblzma as an alternative to the existing compression/decompression methods zlib and zstd in LLVM and lets clang-offload-bundler use it as the preferred compression/decompression method. Full diff: https://github.com/llvm/llvm-project/pull/83297.diff 15 Files Affected:
diff --git a/clang/lib/Driver/OffloadBundler.cpp b/clang/lib/Driver/OffloadBundler.cpp
index 99a34d25cfcd56..4497944f70c42d 100644
--- a/clang/lib/Driver/OffloadBundler.cpp
+++ b/clang/lib/Driver/OffloadBundler.cpp
@@ -943,7 +943,9 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
llvm::compression::Format CompressionFormat;
- if (llvm::compression::zstd::isAvailable())
+ if (llvm::compression::lzma::isAvailable())
+ CompressionFormat = llvm::compression::Format::Lzma;
+ else if (llvm::compression::zstd::isAvailable())
CompressionFormat = llvm::compression::Format::Zstd;
else if (llvm::compression::zlib::isAvailable())
CompressionFormat = llvm::compression::Format::Zlib;
@@ -977,7 +979,10 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
if (Verbose) {
auto MethodUsed =
- CompressionFormat == llvm::compression::Format::Zstd ? "zstd" : "zlib";
+ CompressionFormat == llvm::compression::Format::Lzma
+ ? "lzma"
+ : (CompressionFormat == llvm::compression::Format::Zstd ? "zstd"
+ : "zlib");
llvm::errs() << "Compressed bundle format version: " << Version << "\n"
<< "Compression method used: " << MethodUsed << "\n"
<< "Binary size before compression: " << UncompressedSize
@@ -1026,7 +1031,10 @@ CompressedOffloadBundle::decompress(const llvm::MemoryBuffer &Input,
llvm::compression::Format CompressionFormat;
if (CompressionMethod ==
- static_cast<uint16_t>(llvm::compression::Format::Zlib))
+ static_cast<uint16_t>(llvm::compression::Format::Lzma))
+ CompressionFormat = llvm::compression::Format::Lzma;
+ else if (CompressionMethod ==
+ static_cast<uint16_t>(llvm::compression::Format::Zlib))
CompressionFormat = llvm::compression::Format::Zlib;
else if (CompressionMethod ==
static_cast<uint16_t>(llvm::compression::Format::Zstd))
@@ -1070,7 +1078,9 @@ CompressedOffloadBundle::decompress(const llvm::MemoryBuffer &Input,
<< "Decompression method: "
<< (CompressionFormat == llvm::compression::Format::Zlib
? "zlib"
- : "zstd")
+ : (CompressionFormat == llvm::compression::Format::Lzma
+ ? "lzma"
+ : "zstd"))
<< "\n"
<< "Size before decompression: " << CompressedData.size()
<< " bytes\n"
diff --git a/clang/test/CMakeLists.txt b/clang/test/CMakeLists.txt
index fcfca354f4a75f..ca57daa6fc8651 100644
--- a/clang/test/CMakeLists.txt
+++ b/clang/test/CMakeLists.txt
@@ -12,6 +12,7 @@ llvm_canonicalize_cmake_booleans(
ENABLE_BACKTRACES
LLVM_ENABLE_ZLIB
LLVM_ENABLE_ZSTD
+ LLVM_ENABLE_LZMA
LLVM_ENABLE_PER_TARGET_RUNTIME_DIR
LLVM_ENABLE_THREADS
LLVM_ENABLE_REVERSE_ITERATION
diff --git a/clang/test/Driver/clang-offload-bundler-lzma.c b/clang/test/Driver/clang-offload-bundler-lzma.c
new file mode 100644
index 00000000000000..3c254af85936fb
--- /dev/null
+++ b/clang/test/Driver/clang-offload-bundler-lzma.c
@@ -0,0 +1,76 @@
+// REQUIRES: lzma
+// REQUIRES: x86-registered-target
+// UNSUPPORTED: target={{.*}}-darwin{{.*}}, target={{.*}}-aix{{.*}}
+
+//
+// Generate the host binary to be bundled.
+//
+// RUN: %clang -O0 -target %itanium_abi_triple %s -c -emit-llvm -o %t.bc
+
+//
+// Generate an empty file to help with the checks of empty files.
+//
+// RUN: touch %t.empty
+
+//
+// Generate device binaries to be bundled.
+//
+// RUN: echo 'Content of device file 1' > %t.tgt1
+// RUN: echo 'Content of device file 2' > %t.tgt2
+
+//
+// Check compression/decompression of offload bundle.
+//
+// RUN: env OFFLOAD_BUNDLER_COMPRESS=1 OFFLOAD_BUNDLER_VERBOSE=1 \
+// RUN: clang-offload-bundler -type=bc -targets=hip-amdgcn-amd-amdhsa--gfx900,hip-amdgcn-amd-amdhsa--gfx906 \
+// RUN: -input=%t.tgt1 -input=%t.tgt2 -output=%t.hip.bundle.bc 2>&1 | \
+// RUN: FileCheck -check-prefix=COMPRESS %s
+// RUN: clang-offload-bundler -type=bc -list -input=%t.hip.bundle.bc | FileCheck -check-prefix=NOHOST %s
+// RUN: env OFFLOAD_BUNDLER_VERBOSE=1 \
+// RUN: clang-offload-bundler -type=bc -targets=hip-amdgcn-amd-amdhsa--gfx900,hip-amdgcn-amd-amdhsa--gfx906 \
+// RUN: -output=%t.res.tgt1 -output=%t.res.tgt2 -input=%t.hip.bundle.bc -unbundle 2>&1 | \
+// RUN: FileCheck -check-prefix=DECOMPRESS %s
+// RUN: diff %t.tgt1 %t.res.tgt1
+// RUN: diff %t.tgt2 %t.res.tgt2
+
+//
+// COMPRESS: Compression method used: lzma
+// DECOMPRESS: Decompression method: lzma
+// NOHOST-NOT: host-
+// NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx900
+// NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx906
+//
+
+//
+// Check -bundle-align option.
+//
+
+// RUN: clang-offload-bundler -bundle-align=4096 -type=bc -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -input=%t.bc -input=%t.tgt1 -input=%t.tgt2 -output=%t.bundle3.bc -compress
+// RUN: clang-offload-bundler -type=bc -targets=host-%itanium_abi_triple,openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu -output=%t.res.bc -output=%t.res.tgt1 -output=%t.res.tgt2 -input=%t.bundle3.bc -unbundle
+// RUN: diff %t.bc %t.res.bc
+// RUN: diff %t.tgt1 %t.res.tgt1
+// RUN: diff %t.tgt2 %t.res.tgt2
+
+//
+// Check unbundling archive.
+//
+// RUN: clang-offload-bundler -type=bc -targets=hip-amdgcn-amd-amdhsa--gfx900,hip-amdgcn-amd-amdhsa--gfx906 \
+// RUN: -input=%t.tgt1 -input=%t.tgt2 -output=%t.hip_bundle1.bc -compress
+// RUN: clang-offload-bundler -type=bc -targets=hip-amdgcn-amd-amdhsa--gfx900,hip-amdgcn-amd-amdhsa--gfx906 \
+// RUN: -input=%t.tgt1 -input=%t.tgt2 -output=%t.hip_bundle2.bc -compress
+// RUN: rm -f %t.hip_archive.a
+// RUN: llvm-ar cr %t.hip_archive.a %t.hip_bundle1.bc %t.hip_bundle2.bc
+// RUN: clang-offload-bundler -unbundle -type=a -targets=hip-amdgcn-amd-amdhsa--gfx900,hip-amdgcn-amd-amdhsa--gfx906 \
+// RUN: -output=%t.hip_900.a -output=%t.hip_906.a -input=%t.hip_archive.a
+// RUN: llvm-ar t %t.hip_900.a | FileCheck -check-prefix=HIP-AR-900 %s
+// RUN: llvm-ar t %t.hip_906.a | FileCheck -check-prefix=HIP-AR-906 %s
+// HIP-AR-900-DAG: hip_bundle1-hip-amdgcn-amd-amdhsa--gfx900
+// HIP-AR-900-DAG: hip_bundle2-hip-amdgcn-amd-amdhsa--gfx900
+// HIP-AR-906-DAG: hip_bundle1-hip-amdgcn-amd-amdhsa--gfx906
+// HIP-AR-906-DAG: hip_bundle2-hip-amdgcn-amd-amdhsa--gfx906
+
+// Some code so that we can create a binary out of this file.
+int A = 0;
+void test_func(void) {
+ ++A;
+}
diff --git a/clang/test/lit.site.cfg.py.in b/clang/test/lit.site.cfg.py.in
index ef75770a2c3c9a..0ad5d0887c103e 100644
--- a/clang/test/lit.site.cfg.py.in
+++ b/clang/test/lit.site.cfg.py.in
@@ -22,6 +22,7 @@ config.host_cxx = "@CMAKE_CXX_COMPILER@"
config.llvm_use_sanitizer = "@LLVM_USE_SANITIZER@"
config.have_zlib = @LLVM_ENABLE_ZLIB@
config.have_zstd = @LLVM_ENABLE_ZSTD@
+config.have_lzma = @LLVM_ENABLE_LZMA@
config.clang_arcmt = @CLANG_ENABLE_ARCMT@
config.clang_default_pie_on_linux = @CLANG_DEFAULT_PIE_ON_LINUX@
config.clang_default_cxx_stdlib = "@CLANG_DEFAULT_CXX_STDLIB@"
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index f5f7d3f3253fd3..be500d51d22a7a 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -552,6 +552,8 @@ set(LLVM_ENABLE_ZLIB "ON" CACHE STRING "Use zlib for compression/decompression i
set(LLVM_ENABLE_ZSTD "ON" CACHE STRING "Use zstd for compression/decompression if available. Can be ON, OFF, or FORCE_ON")
+set(LLVM_ENABLE_LZMA "ON" CACHE STRING "Use lzma for compression/decompression if available. Can be ON, OFF, or FORCE_ON")
+
set(LLVM_USE_STATIC_ZSTD FALSE CACHE BOOL "Use static version of zstd. Can be TRUE, FALSE")
set(LLVM_ENABLE_CURL "OFF" CACHE STRING "Use libcurl for the HTTP client if available. Can be ON, OFF, or FORCE_ON")
diff --git a/llvm/cmake/config-ix.cmake b/llvm/cmake/config-ix.cmake
index bf1b110245bb2f..4ac1e58cf565b1 100644
--- a/llvm/cmake/config-ix.cmake
+++ b/llvm/cmake/config-ix.cmake
@@ -162,6 +162,31 @@ if(LLVM_ENABLE_ZSTD)
endif()
set(LLVM_ENABLE_ZSTD ${zstd_FOUND})
+set(LZMA_FOUND 0)
+if(LLVM_ENABLE_LZMA)
+ if(LLVM_ENABLE_LZMA STREQUAL FORCE_ON)
+ find_package(LibLZMA REQUIRED)
+ if(NOT LIBLZMA_FOUND)
+ message(FATAL_ERROR "Failed to configure lzma, but LLVM_ENABLE_LZMA is FORCE_ON")
+ endif()
+ else()
+ find_package(LibLZMA QUIET)
+ endif()
+ if(LIBLZMA_FOUND)
+ # Check if lzma we found is usable; for example, we may have found a 32-bit
+ # library on a 64-bit system which would result in a link-time failure.
+ cmake_push_check_state()
+ list(APPEND CMAKE_REQUIRED_INCLUDES ${LIBLZMA_INCLUDE_DIRS})
+ list(APPEND CMAKE_REQUIRED_LIBRARIES ${LIBLZMA_LIBRARIES})
+ check_symbol_exists(lzma_lzma_preset lzma.h HAVE_LZMA)
+ cmake_pop_check_state()
+ if(LLVM_ENABLE_LZMA STREQUAL FORCE_ON AND NOT HAVE_LZMA)
+ message(FATAL_ERROR "Failed to configure lzma")
+ endif()
+ endif()
+endif()
+set(LLVM_ENABLE_LZMA ${LIBLZMA_FOUND})
+
if(LLVM_ENABLE_LIBXML2)
if(LLVM_ENABLE_LIBXML2 STREQUAL FORCE_ON)
find_package(LibXml2 REQUIRED)
diff --git a/llvm/cmake/modules/LLVMConfig.cmake.in b/llvm/cmake/modules/LLVMConfig.cmake.in
index 770a9caea322e6..660e056f113859 100644
--- a/llvm/cmake/modules/LLVMConfig.cmake.in
+++ b/llvm/cmake/modules/LLVMConfig.cmake.in
@@ -80,6 +80,11 @@ if(LLVM_ENABLE_ZSTD)
find_package(zstd)
endif()
+set(LLVM_ENABLE_LZMA @LLVM_ENABLE_LZMA@)
+if(LLVM_ENABLE_LZMA)
+ find_package(LibLZMA)
+endif()
+
set(LLVM_ENABLE_LIBXML2 @LLVM_ENABLE_LIBXML2@)
if(LLVM_ENABLE_LIBXML2)
find_package(LibXml2)
diff --git a/llvm/docs/CMake.rst b/llvm/docs/CMake.rst
index abef4f8103140f..d7f86caa71202b 100644
--- a/llvm/docs/CMake.rst
+++ b/llvm/docs/CMake.rst
@@ -629,6 +629,11 @@ enabled sub-projects. Nearly all of these variable names begin with
zstd. Allowed values are ``OFF``, ``ON`` (default, enable if zstd is found),
and ``FORCE_ON`` (error if zstd is not found).
+**LLVM_ENABLE_LZMA**:STRING
+ Used to decide if LLVM tools should support compression/decompression with
+ lzma. Allowed values are ``OFF``, ``ON`` (default, enable if lzma is found),
+ and ``FORCE_ON`` (error if lzma is not found).
+
**LLVM_EXPERIMENTAL_TARGETS_TO_BUILD**:STRING
Semicolon-separated list of experimental targets to build and linked into
llvm. This will build the experimental target without needing it to add to the
diff --git a/llvm/include/llvm/Config/llvm-config.h.cmake b/llvm/include/llvm/Config/llvm-config.h.cmake
index 6605ea60df99e1..47e53f8b4ee7bc 100644
--- a/llvm/include/llvm/Config/llvm-config.h.cmake
+++ b/llvm/include/llvm/Config/llvm-config.h.cmake
@@ -173,6 +173,9 @@
/* Define if zstd compression is available */
#cmakedefine01 LLVM_ENABLE_ZSTD
+/* Define if lzma compression is available */
+#cmakedefine01 LLVM_ENABLE_LZMA
+
/* Define if LLVM is using tflite */
#cmakedefine LLVM_HAVE_TFLITE
diff --git a/llvm/include/llvm/Support/Compression.h b/llvm/include/llvm/Support/Compression.h
index c3ba3274d6ed87..6dc7b162772d90 100644
--- a/llvm/include/llvm/Support/Compression.h
+++ b/llvm/include/llvm/Support/Compression.h
@@ -73,9 +73,31 @@ Error decompress(ArrayRef<uint8_t> Input, SmallVectorImpl<uint8_t> &Output,
} // End of namespace zstd
+namespace lzma {
+
+constexpr int NoCompression = 0;
+constexpr int BestSpeedCompression = 1;
+constexpr int DefaultCompression = 6;
+constexpr int BestSizeCompression = 9;
+
+bool isAvailable();
+
+void compress(ArrayRef<uint8_t> Input,
+ SmallVectorImpl<uint8_t> &CompressedBuffer,
+ int Level = DefaultCompression);
+
+Error decompress(ArrayRef<uint8_t> Input, uint8_t *Output,
+ size_t &UncompressedSize);
+
+Error decompress(ArrayRef<uint8_t> Input, SmallVectorImpl<uint8_t> &Output,
+ size_t UncompressedSize);
+
+} // End of namespace lzma
+
enum class Format {
Zlib,
Zstd,
+ Lzma,
};
inline Format formatFor(DebugCompressionType Type) {
@@ -104,8 +126,8 @@ struct Params {
};
// Return nullptr if LLVM was built with support (LLVM_ENABLE_ZLIB,
-// LLVM_ENABLE_ZSTD) for the specified compression format; otherwise
-// return a string literal describing the reason.
+// LLVM_ENABLE_ZSTD, LLVM_ENABLE_LZMA) for the specified compression format;
+// otherwise return a string literal describing the reason.
const char *getReasonIfUnsupported(Format F);
// Compress Input with the specified format P.Format. If Level is -1, use
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 1f2d82427552f7..1ed0dcd435ecf8 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -37,6 +37,10 @@ if(LLVM_ENABLE_ZSTD)
list(APPEND imported_libs ${zstd_target})
endif()
+if(LLVM_ENABLE_LZMA)
+ list(APPEND imported_libs LibLZMA::LibLZMA)
+endif()
+
if( MSVC OR MINGW )
# libuuid required for FOLDERID_Profile usage in lib/Support/Windows/Path.inc.
# advapi32 required for CryptAcquireContextW in lib/Support/Windows/Path.inc.
@@ -323,6 +327,19 @@ if(LLVM_ENABLE_ZSTD)
set(llvm_system_libs ${llvm_system_libs} "${zstd_library}")
endif()
+if(LLVM_ENABLE_LZMA)
+ # CMAKE_BUILD_TYPE is only meaningful to single-configuration generators.
+ if(CMAKE_BUILD_TYPE)
+ string(TOUPPER ${CMAKE_BUILD_TYPE} build_type)
+ get_property(lzma_library TARGET LibLZMA::LibLZMA PROPERTY LOCATION_${build_type})
+ endif()
+ if(NOT lzma_library)
+ get_property(lzma_library TARGET LibLZMA::LibLZMA PROPERTY LOCATION)
+ endif()
+ get_library_name(${lzma_library} lzma_library)
+ set(llvm_system_libs ${llvm_system_libs} "${lzma_library}")
+endif()
+
if(LLVM_ENABLE_TERMINFO)
if(NOT terminfo_library)
get_property(terminfo_library TARGET Terminfo::terminfo PROPERTY LOCATION)
diff --git a/llvm/lib/Support/Compression.cpp b/llvm/lib/Support/Compression.cpp
index 8e57ba798f5207..f88560e58e8135 100644
--- a/llvm/lib/Support/Compression.cpp
+++ b/llvm/lib/Support/Compression.cpp
@@ -23,6 +23,9 @@
#if LLVM_ENABLE_ZSTD
#include <zstd.h>
#endif
+#if LLVM_ENABLE_LZMA
+#include <lzma.h>
+#endif
using namespace llvm;
using namespace llvm::compression;
@@ -39,6 +42,11 @@ const char *compression::getReasonIfUnsupported(compression::Format F) {
return nullptr;
return "LLVM was not built with LLVM_ENABLE_ZSTD or did not find zstd at "
"build time";
+ case compression::Format::Lzma:
+ if (lzma::isAvailable())
+ return nullptr;
+ return "LLVM was not built with LLVM_ENABLE_LZMA or did not find lzma at "
+ "build time";
}
llvm_unreachable("");
}
@@ -52,6 +60,9 @@ void compression::compress(Params P, ArrayRef<uint8_t> Input,
case compression::Format::Zstd:
zstd::compress(Input, Output, P.level);
break;
+ case compression::Format::Lzma:
+ lzma::compress(Input, Output, P.level);
+ break;
}
}
@@ -62,6 +73,8 @@ Error compression::decompress(DebugCompressionType T, ArrayRef<uint8_t> Input,
return zlib::decompress(Input, Output, UncompressedSize);
case compression::Format::Zstd:
return zstd::decompress(Input, Output, UncompressedSize);
+ case compression::Format::Lzma:
+ break;
}
llvm_unreachable("");
}
@@ -74,6 +87,8 @@ Error compression::decompress(compression::Format F, ArrayRef<uint8_t> Input,
return zlib::decompress(Input, Output, UncompressedSize);
case compression::Format::Zstd:
return zstd::decompress(Input, Output, UncompressedSize);
+ case compression::Format::Lzma:
+ return lzma::decompress(Input, Output, UncompressedSize);
}
llvm_unreachable("");
}
@@ -218,3 +233,86 @@ Error zstd::decompress(ArrayRef<uint8_t> Input,
llvm_unreachable("zstd::decompress is unavailable");
}
#endif
+#if LLVM_ENABLE_LZMA
+
+bool lzma::isAvailable() { return true; }
+
+void lzma::compress(ArrayRef<uint8_t> Input,
+ SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {
+ lzma_options_lzma Opt;
+ if (lzma_lzma_preset(&Opt, Level) != LZMA_OK) {
+ report_bad_alloc_error("lzma::compress failed: preset error");
+ return;
+ }
+
+ lzma_filter Filters[] = {{LZMA_FILTER_LZMA2, &Opt},
+ {LZMA_VLI_UNKNOWN, nullptr}};
+
+ size_t MaxOutSize = lzma_stream_buffer_bound(Input.size());
+ CompressedBuffer.resize_for_overwrite(MaxOutSize);
+
+ size_t OutPos = 0;
+ lzma_ret Ret = lzma_stream_buffer_encode(
+ Filters, LZMA_CHECK_CRC64, nullptr, Input.data(), Input.size(),
+ CompressedBuffer.data(), &OutPos, MaxOutSize);
+ if (Ret == LZMA_OK)
+ CompressedBuffer.resize(OutPos);
+ else
+ report_bad_alloc_error("lzma::compress failed");
+}
+
+Error lzma::decompress(ArrayRef<uint8_t> Input, uint8_t *Output,
+ size_t &UncompressedSize) {
+ const size_t DecoderMemoryLimit = 100 * 1024 * 1024;
+ lzma_stream Strm = LZMA_STREAM_INIT;
+ size_t InPos = 0;
+ size_t OutPos = 0;
+
+ lzma_ret Ret = lzma_auto_decoder(&Strm, DecoderMemoryLimit, 0);
+ if (Ret != LZMA_OK)
+ return make_error<StringError>("Failed to initialize LZMA decoder",
+ inconvertibleErrorCode());
+
+ Strm.next_in = Input.data();
+ Strm.avail_in = Input.size();
+ Strm.next_out = Output;
+ Strm.avail_out = UncompressedSize;
+
+ Ret = lzma_code(&Strm, LZMA_FINISH);
+ if (Ret == LZMA_STREAM_END) {
+ UncompressedSize = Strm.total_out;
+ lzma_end(&Strm);
+ return Error::success();
+ } else {
+ lzma_end(&Strm);
+ return make_error<StringError>("LZMA decompression failed",
+ inconvertibleErrorCode());
+ }
+}
+
+Error lzma::decompress(ArrayRef<uint8_t> Input,
+ SmallVectorImpl<uint8_t> &Output,
+ size_t UncompressedSize) {
+ Output.resize_for_overwrite(UncompressedSize);
+ Error E = lzma::decompress(Input, Output.data(), UncompressedSize);
+ if (UncompressedSize < Output.size())
+ Output.truncate(UncompressedSize);
+ return E;
+}
+
+#else
+bool lzma::isAvailable() { return false; }
+void lzma::compress(ArrayRef<uint8_t> Input,
+ SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {
+ llvm_unreachable("lzma::compress is unavailable");
+}
+Error lzma::decompress(ArrayRef<uint8_t> Input, uint8_t *Output,
+ size_t &UncompressedSize) {
+ llvm_unreachable("lzma::decompress is unavailable");
+}
+Error lzma::decompress(ArrayRef<uint8_t> Input,
+ SmallVectorImpl<uint8_t> &Output,
+ size_t UncompressedSize) {
+ llvm_unreachable("lzma::decompress is unavailable");
+}
+#endif
diff --git a/llvm/test/CMakeLists.txt b/llvm/test/CMakeLists.txt
index 6127b76db06b7f..777a54784203a4 100644
--- a/llvm/test/CMakeLists.txt
+++ b/llvm/test/CMakeLists.txt
@@ -8,6 +8,7 @@ llvm_canonicalize_cmake_booleans(
LLVM_ENABLE_HTTPLIB
LLVM_ENABLE_ZLIB
LLVM_ENABLE_ZSTD
+ LLVM_ENABLE_LZMA
LLVM_ENABLE_LIBXML2
LLVM_LINK_LLVM_DYLIB
LLVM_TOOL_LTO_BUILD
diff --git a/llvm/test/lit.site.cfg.py.in b/llvm/test/lit.site.cfg.py.in
index b6f255d472d16f..7cdca4083295f5 100644
--- a/llvm/test/lit.site.cfg.py.in
+++ b/llvm/test/lit.site.cfg.py.in
@@ -35,6 +35,7 @@ config.llvm_use_intel_jitevents = @LLVM_USE_INTEL_JITEVENTS@
config.llvm_use_sanitizer = "@LLVM_USE_SANITIZER@"
config.have_zlib = @LLVM_ENABLE_ZLIB@
config.have_zstd = @LLVM_ENABLE_ZSTD@
+config.have_lzma = @LLVM_ENABLE_LZMA@
config.have_libxml2 = @LLVM_ENABLE_LIBXML2@
config.have_curl = @LLVM_ENABLE_CURL@
config.have_httplib = @LLVM_ENABLE_HTTPLIB@
diff --git a/llvm/utils/lit/lit/llvm/config.py b/llvm/utils/lit/lit/llvm/config.py
index 96b4f7bc86772d..6e307da7354118 100644
--- a/llvm/utils/lit/lit/llvm/config.py
+++ b/llvm/utils/lit/lit/llvm/config.py
@@ -131,6 +131,9 @@ def __init__(self, lit_config, config):
have_zstd = getattr(config, "have_zstd", None)
if have_zstd:
features.add("zstd")
+ have_lzma = getattr(config, "have_lzma", None)
+ if have_lzma:
+ features.add("lzma")
if getattr(config, "reverse_iteration", None):
features.add("reverse_iteration")
This seems to be adding an entirely new compression scheme to LLVM. I feel like that should be a separate patch, and the part where we make HIP use it a follow-up.
This patch adds liblzma as an alternative compression/decompression method to zlib/zstd.
I'll keep this PR for LLVM changes only and will open another PR for the clang changes.
Thanks, this seems pretty straightforward, so it looks good to me. However, I'll wait until some of the other LLVM contributors chime in.
Thanks for doing this @yxsamliu !
Do you have any benchmarks to support this assertion? For huge binaries, decompression speed may be more important than compression ratio. E.g. it's not unusual to have large ML apps carrying O(gigabyte) of GPU code blobs. lzma's somewhat better compression ratio (vs zstd) comes at the price of relatively slow decompression. zstd gives a comparable compression ratio at much higher decompression speed.
I have heard that several applications switched from lzma/lzma2 to zstd, and therefore I am curious to see justifications for adding lzma. The compression ratio is better, but compression/decompression is extremely slow. In addition, lzma is not good at compressing binary data. I surveyed multiple compression implementations when I added zstd to llvm: https://groups.google.com/g/generic-abi/c/satyPkuMisk
I do have a different perspective here. I worked with LZMA in the past and it is by far one of the best compression schemes out there in many regards. I do not understand the assertion about its decompression speed. Compression is certainly slower, but it is not a lot slower than the competition. I also have practical use cases for it today, as opposed to "weaker" compression formats. In the past I had a realtime streaming LZMA decompressor running on a 16 MHz ARM7TDMI, sharing timeslices with many other runtime jobs for rendering a video stream. Admittedly it was hand-optimized asm, but we had the same issues with memory latency as today, and the low bitrate of the LZMA stream meant less data had to be read from the ROM. The gap has only widened since: memory reads are a lot more expensive than CPU cycles, even if the data is already in the caches. Most likely the LZMA window would have to be tuned for today's cache hierarchy/target CPU architecture. Even though COFF doesn't support internal compression today AFAIK, I tried compressing the .OBJ files for an LLVM Windows build including debug info, in this folder:
All figures are single-threaded. The assumption is that
A practical counter-argument to the comp/decomp speed (which does not seem to be that terrible in the light of the figures I'm seeing above) is that people working from home are usually on poor/not-that-great internet connections. Upload speed on their end isn't that great, but their CPU power is. To save on cloud costs, it makes sense to distribute compilation over a private network onto users' PCs, including at-home PCs. In that case, the size of the generated assets/.OBJs is more important than the time spent compressing/decompressing them, as long as it stays within reasonable bounds. If 40 seconds are spent compiling an .OBJ and 2-3 seconds on compression, that has great value if it produces 2x smaller assets (great value for network transfer, that is). However, I understand these figures could be different when compressing individual sections within a DWARF file. I'd like to give the OP the benefit of the doubt, if they can come up with tangible figures for their use case, in terms of compression/decompression speed and size, compared with the existing compression schemes in LLVM. @yxsamliu
I was able to build liblzma with https://github.com/tukaani-project/xz on Windows with VS 2022. LLVM's cmake config is able to find its include file but not the library with -DLIBLZMA_ROOT. I am still investigating.
I will collect some benchmarking results.
The following are measurements of compressing/decompressing Blender 4.1 bundled bitcode for 6 GPU archs: It is surprising that LZMA level 9 gets a higher compression ratio with less compression/decompression time, but it did happen.
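The shape of such a measurement can be sketched with Python's stdlib `lzma` module (which binds the same liblzma). This is a synthetic stand-in, not the Blender data set: a pseudo-random blob repeated six times merely mimics a bundle of near-identical per-arch entries.

```python
import lzma
import os
import time

# Synthetic stand-in for an offload bundle: one pseudo-random "blob"
# repeated six times, mimicking near-identical per-GPU-arch entries.
blob = os.urandom(256 * 1024)
data = blob * 6

for preset in (1, 6, 9):
    t0 = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    t_comp = time.perf_counter() - t0

    t0 = time.perf_counter()
    restored = lzma.decompress(packed)
    t_dec = time.perf_counter() - t0

    assert restored == data  # round-trip sanity check
    print(f"preset {preset}: {len(data)} -> {len(packed)} bytes, "
          f"compress {t_comp:.3f}s, decompress {t_dec:.3f}s")
```

The presets here map onto the `NoCompression`/`BestSpeedCompression`/`DefaultCompression`/`BestSizeCompression` levels the patch defines; real ratios and timings will of course differ on actual bitcode.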
For small apps (let's say < 100MB), it probably does not matter. Most compression algorithms will be fast enough.
An order of magnitude difference on @yxsamliu's sample would qualify as "a lot", IMO. @yxsamliu zstd's compression levels don't seem to match those of lzma (e.g. 9 is the highest compression for lzma, but only about the middle of the range for zstd). Could you also measure with
On a side note, given that there's a huge jump in compression ratio between lzma -6 and -9, it suggests that it may have something to do with the compression window size. It may need to be large enough to cover multiple similar chunks of the binaries. I suspect we may be able to tweak zstd parameters to improve its compression ratio, too. Did anybody try training zstd on binaries and check how much it would help us in this case?
It seems we could use zstd level 20 for clang-offload-bundler to achieve a compression ratio similar to lzma level 9.
This compression ratio cliff bothers me a bit. I wonder if there's something special about the data the benchmark was run on that triggers it for both compression algorithms. @yxsamliu would it be possible for you to rerun the benchmarks one more time with the data set split into 1/3 and 2/3 of the original input in size and see if the compression ratio cliff happens at lower compression levels for smaller inputs?
What is special about the data is that the bitcode for different GPU archs is very similar, which is common for HIP; the file to be compressed therefore contains N similar portions for N GPU archs. The following tables show zstd level 20 results for bundled bitcode for 2, 4, and 6 GPU archs:
You can see that the compressed size, compression time, and decompression time are almost the same. This means the more GPU archs, the better the compression ratio we get. Only zstd level 20 and above can achieve this.
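The "additional archs are nearly free" behavior is easy to reproduce with stdlib `lzma` as a proxy. Note the blobs below are identical rather than merely similar, which idealizes the real per-arch bitcode:

```python
import lzma
import os

# One pseudo-random "blob" stands in for a single GPU arch's bitcode.
blob = os.urandom(200 * 1024)
single = len(lzma.compress(blob, preset=9))

# Preset 9 uses a 64 MiB dictionary, so the whole bundle fits in the
# window and every copy after the first compresses to almost nothing.
for n_archs in (2, 4, 6):
    bundle = blob * n_archs
    packed = len(lzma.compress(bundle, preset=9))
    print(f"{n_archs} archs: {len(bundle)} -> {packed} bytes "
          f"(one blob alone: {single})")
```

The compressed size stays close to that of a single blob regardless of the arch count, matching the tables above.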
I do get the part that multiple GPU variants give us a lot of redundancy in the data to compress away. It was not quite clear to me why the compression ratio dramatically improves between
Though the bit that we're compressing multiple similar GPU blobs may be the likely explanation here, too. If the compression window is smaller than the size of one GPU blob, it may not benefit from the commonality across multiple blobs. By the time we get to the beginning of the second GPU variant, we've essentially forgotten what we had at the beginning of the first one.
Interesting. So, the compression ratio for a single GPU blob is around 2.0-3.0x, and all subsequent blobs for other GPU variants get compressed essentially into nothing, as long as we can squeeze one complete GPU blob into the compression window. It sounds like there may be further room for improvement by tweaking zstd parameters to exploit specific properties of the data we're packing.
Excuse the outlandish suggestion, but given:
Is there any chance of some sort of domain-specific compression, especially one that would be more resilient to the size of the kernels? (It seems increasing the compression level increases the compression window size, which has some cliff/break points for kernels of certain sizes, which seems unfortunately non-general. It'd be nice not to have to push the compression algorithm so hard for smaller kernels, and it'd be nice if larger kernels could still be deduplicated.)
The key domain-specific quirk we can exploit here is that we produce N very similar blobs (same code, with minor differences due to GPU-specific intrinsics, etc.) There's nothing particularly interesting about the individual blobs.
One way to achieve that would be to interleave GPU blobs. Instead of
Increasing the compression window while keeping the rest of the parameters at a lower compression level may work, too. At least in my experiments
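The window-size effect that interleaving works around can be illustrated with zlib, whose window is only 32 KiB. This is a toy sketch with identical blobs, not the bundler's actual layout:

```python
import os
import zlib

# Two identical 128 KiB "blobs" stand in for two similar GPU images.
blob = os.urandom(128 * 1024)

# Concatenated, the second blob starts 128 KiB after the first, far
# beyond zlib's 32 KiB window, so the redundancy is invisible to it.
concat = blob + blob

# Interleaved in 4 KiB chunks, each repeated chunk sits right next to
# its twin, well within the window.
chunk = 4096
pieces = [blob[i:i + chunk] for i in range(0, len(blob), chunk)]
interleaved = b"".join(p + p for p in pieces)

print("concatenated:", len(zlib.compress(concat, 9)))
print("interleaved: ", len(zlib.compress(interleaved, 9)))
```

The interleaved layout compresses to roughly half the size of the concatenated one here, because the back-references can finally reach the duplicate data.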
I was thinking something even more domain specific (like an actual domain-specific compression scheme; not that it couldn't be further compressed by something generic, but encoding the data with less duplication to start with), but I don't know enough about the structure/contents of these kernels to know what that'd look like. If I were speculating rampantly: maybe some kind of macro scheme to describe the architectural differences, which could be quickly stripped out when the arch-specific version was needed on-device. (I wonder if it'd be feasible to even compile for multiple targets simultaneously, keeping these differences in conditional blocks, rather than redundantly generating all the kernels and then trying to figure out their commonalities/merge them again.) But I realize this is all quite out of my depth and you folks who work on this stuff probably already know what's feasible or not here.
That sounds pretty promising (though perhaps still interesting to know how much the window size helps/hurts compared to the distribution of kernel sizes? Like, do we have population data about kernel sizes? Does wlog=25 cover the 90% case? Is the population widely distributed, or fairly tightly clustered? Is it increasing over time, such that today wlog=25 is 90%, but in a year or two it'll be only 50%?)
That sounds like a reasonable research project topic. :-)
That's largely what happens in CUDA and AMDGPU. Unfortunately, those minor differences percolate through the rest of the code and we usually end up with similar, but not identical compiler outputs, and it's hard to generalize which parts will be affected, so in practice we do need to compile everything.
Anecdotally, individual kernel size varies from nothing to O(megabytes). Individual TUs (I think that's what the object bundler ends up dealing with) will likely be on the smaller side, but outliers are fairly common.
My guess is that it should be sufficient for most of the use cases.
Closing this PR since we decided to use zstd.
LZMA (Lempel-Ziv/Markov-chain Algorithm) provides a better compression ratio than zstd and zlib for clang-offload-bundler bundles, which often contain a large number of similar entries.
This patch adds liblzma to LLVM as an alternative to the existing compression/decompression methods zlib and zstd.