Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm-objcopy] Add --compress-sections #85036

Closed

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Mar 13, 2024

--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

  • --compress-sections <section>=none: decompress sections
  • --compress-sections <section>=[zlib|zstd]: compress sections with zlib/zstd

Like --remove-section, the pattern is by default a glob, but a regex
when --regex is specified.

For --remove-section like options, ! prevents matches and is not
dependent on ordering (see ELF/wildcard-syntax.test). Since
--compress-sections a=zlib --compress-sections a=none naturally allows
overriding, having an order-independent ! would be confusing.
Therefore, ! is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. SHF_ALLOC sections in a relocatable
file can be compressed, but linkers usually reject them.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Created using spr 1.3.5-bogner

[skip ci]
Created using spr 1.3.5-bogner
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 13, 2024

@llvm/pr-subscribers-llvm-binary-utilities

Author: Fangrui Song (MaskRay)

Changes

--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

  • --compress-sections &lt;section&gt;=none: decompress sections
  • --compress-sections &lt;section&gt;=[zlib|zstd]: compress sections with zlib/zstd

Like --remove-section, the pattern is by default a glob, but a regex
when --regex is specified.

For --remove-section like options, ! prevents matches and is not
dependent on ordering (see ELF/wildcard-syntax.test). Since
--compress-sections a=zlib --compress-sections a=none naturally allows
overriding, having an order-independent ! would be confusing.
Therefore, ! is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. SHF_ALLOC sections in a relocatable
file can be compressed, but linkers usually reject them.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674


Full diff: https://github.com/llvm/llvm-project/pull/85036.diff

8 Files Affected:

  • (modified) llvm/docs/CommandGuide/llvm-objcopy.rst (+9-1)
  • (modified) llvm/include/llvm/ObjCopy/CommonConfig.h (+3)
  • (modified) llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp (+23-7)
  • (added) llvm/test/tools/llvm-objcopy/ELF/compress-sections-within-segment.s (+32)
  • (added) llvm/test/tools/llvm-objcopy/ELF/compress-sections.s (+117)
  • (modified) llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test (+29)
  • (modified) llvm/tools/llvm-objcopy/ObjcopyOptions.cpp (+36)
  • (modified) llvm/tools/llvm-objcopy/ObjcopyOpts.td (+6)
diff --git a/llvm/docs/CommandGuide/llvm-objcopy.rst b/llvm/docs/CommandGuide/llvm-objcopy.rst
index 9d0cb7ad119589..3b1af210c28bf7 100644
--- a/llvm/docs/CommandGuide/llvm-objcopy.rst
+++ b/llvm/docs/CommandGuide/llvm-objcopy.rst
@@ -309,6 +309,14 @@ them.
  Compress DWARF debug sections in the output, using the specified format.
  Supported formats are ``zlib`` and ``zstd``. Use ``zlib`` if ``<format>`` is omitted.
 
+.. option:: --compress-sections <section>=<format>
+
+ Compress or decompress sections matched by ``<section>`` using the specified
+ format. Supported formatss are ``zlib`` and ``zstd``. Specify ``none`` for
+ decompression. When a section is matched by multiple options, the last one
+ wins. A wildcard ``<section>`` starting with '!' is disallowed.
+ Sections within a segment cannot be (de)compressed.
+
 .. option:: --decompress-debug-sections
 
  Decompress any compressed DWARF debug sections in the output.
@@ -331,7 +339,7 @@ them.
 
 .. option:: --gap-fill <value>
 
- For binary outputs, fill the gaps between sections with ``<value>`` instead
+ For binary outputs, fill the gaps between sections with ````<value>`` instead
  of zero. The value must be an unsigned 8-bit integer.
 
 .. option:: --globalize-symbol <symbol>
diff --git a/llvm/include/llvm/ObjCopy/CommonConfig.h b/llvm/include/llvm/ObjCopy/CommonConfig.h
index 8f69c9fbeaf5ef..13d0d1443b2d66 100644
--- a/llvm/include/llvm/ObjCopy/CommonConfig.h
+++ b/llvm/include/llvm/ObjCopy/CommonConfig.h
@@ -261,6 +261,9 @@ struct CommonConfig {
   bool DecompressDebugSections = false;
 
   DebugCompressionType CompressionType = DebugCompressionType::None;
+
+  llvm::SmallVector<std::pair<NameMatcher, llvm::DebugCompressionType>, 0>
+      compressSections;
 };
 
 } // namespace objcopy
diff --git a/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp b/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
index e4d6e02f3aa60d..c95b866dfa1c89 100644
--- a/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
+++ b/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
@@ -215,23 +215,39 @@ static Error dumpSectionToFile(StringRef SecName, StringRef Filename,
 }
 
 Error Object::compressOrDecompressSections(const CommonConfig &Config) {
-  // Build a list of the debug sections we are going to replace.
+  // Build a list of sections we are going to replace.
   // We can't call `AddSection` while iterating over sections,
   // because it would mutate the sections array.
   SmallVector<std::pair<SectionBase *, std::function<SectionBase *()>>, 0>
       ToReplace;
   for (SectionBase &Sec : sections()) {
-    if ((Sec.Flags & SHF_ALLOC) || !StringRef(Sec.Name).starts_with(".debug"))
+    std::optional<DebugCompressionType> CType;
+    for (auto &[Matcher, T] : Config.compressSections)
+      if (Matcher.matches(Sec.Name))
+        CType = T;
+    if (!(Sec.Flags & SHF_ALLOC) && StringRef(Sec.Name).starts_with(".debug")) {
+      if (Config.CompressionType != DebugCompressionType::None)
+        CType = Config.CompressionType;
+      else if (Config.DecompressDebugSections)
+        CType = DebugCompressionType::None;
+    }
+    if (!CType)
       continue;
+
+    if (Sec.ParentSegment)
+      return createStringError(
+          errc::invalid_argument,
+          "section '" + Sec.Name +
+              "' within a segment cannot be (de)compressed");
+
     if (auto *CS = dyn_cast<CompressedSection>(&Sec)) {
-      if (Config.DecompressDebugSections) {
+      if (*CType == DebugCompressionType::None)
         ToReplace.emplace_back(
             &Sec, [=] { return &addSection<DecompressedSection>(*CS); });
-      }
-    } else if (Config.CompressionType != DebugCompressionType::None) {
-      ToReplace.emplace_back(&Sec, [&, S = &Sec] {
+    } else if (*CType != DebugCompressionType::None) {
+      ToReplace.emplace_back(&Sec, [=, S = &Sec] {
         return &addSection<CompressedSection>(
-            CompressedSection(*S, Config.CompressionType, Is64Bits));
+            CompressedSection(*S, *CType, Is64Bits));
       });
     }
   }
diff --git a/llvm/test/tools/llvm-objcopy/ELF/compress-sections-within-segment.s b/llvm/test/tools/llvm-objcopy/ELF/compress-sections-within-segment.s
new file mode 100644
index 00000000000000..b0c1364bfcca13
--- /dev/null
+++ b/llvm/test/tools/llvm-objcopy/ELF/compress-sections-within-segment.s
@@ -0,0 +1,32 @@
+## Disallow (de)compression for sections within a segment as they are
+## effectively immutable.
+# RUN: rm -rf %t && mkdir %t && cd %t
+# RUN: yaml2obj %s -o a
+# RUN: not llvm-objcopy a /dev/null --compress-sections .text=zlib --compress-sections foo=none 2>&1 | FileCheck %s
+
+# CHECK: error: 'a': section '.text' within a segment cannot be (de)compressed
+
+--- !ELF
+FileHeader:
+  Class:      ELFCLASS64
+  Data:       ELFDATA2LSB
+  Type:       ET_EXEC
+  Machine:    EM_X86_64
+ProgramHeaders:
+  - Type:     PT_LOAD
+    FirstSec: .text
+    LastSec:  foo
+    VAddr:    0x201000
+    Align:    0x1000
+    Offset:   0x1000
+Sections:
+  - Name:     .text
+    Type:     SHT_PROGBITS
+    Flags:    [ SHF_ALLOC ]
+    Address:  0x201000
+    Offset:   0x1000
+    Content:  C3
+  - Name:     foo
+    Type:     SHT_PROGBITS
+    Flags:    [ SHF_ALLOC, SHF_COMPRESSED ]
+    Content:  010000000000000040000000000000000100000000000000789cd36280002d3269002f800151
diff --git a/llvm/test/tools/llvm-objcopy/ELF/compress-sections.s b/llvm/test/tools/llvm-objcopy/ELF/compress-sections.s
new file mode 100644
index 00000000000000..77064a6fef0e54
--- /dev/null
+++ b/llvm/test/tools/llvm-objcopy/ELF/compress-sections.s
@@ -0,0 +1,117 @@
+# REQUIRES: x86-registered-target, zlib, zstd
+
+# RUN: rm -rf %t && mkdir %t && cd %t
+# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o a.o
+# RUN: llvm-objcopy a.o out --compress-sections='*0=zlib' --compress-sections '*0=none' --compress-sections 'nomatch=none'
+# RUN: llvm-readelf -S out | FileCheck %s --check-prefix=CHECK1
+
+# CHECK1:      Name           Type          Address     Off      Size     ES Flg Lk Inf Al
+# CHECK1:      .text          PROGBITS [[#%x,TEXT:]]    [[#%x,]] [[#%x,]] 00 AX   0   0  4
+# CHECK1:      foo0           PROGBITS [[#%x,FOO0:]]    [[#%x,]] [[#%x,]] 00 A    0   0  8
+# CHECK1-NEXT: .relafoo0      RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  I  11   3  8
+# CHECK1-NEXT: foo1           PROGBITS [[#%x,FOO1:]]    [[#%x,]] [[#%x,]] 00 A    0   0  8
+# CHECK1-NEXT: .relafoo1      RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  I  11   5  8
+# CHECK1:      nonalloc0      PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00      0   0  8
+# CHECK1-NEXT: .relanonalloc0 RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  I  11   7  8
+# CHECK1-NEXT: nonalloc1      PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00      0   0  8
+# CHECK1-NEXT: .debug_str     PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS   0   0  1
+
+# RUN: llvm-objcopy a.o out2 --compress-sections '*c0=zlib' --compress-sections .debug_str=zstd
+# RUN: llvm-readelf -Sr -x nonalloc0 -x .debug_str out2 2>&1 | FileCheck %s --check-prefix=CHECK2
+# RUN: llvm-readelf -z -x nonalloc0 -x .debug_str out2 | FileCheck %s --check-prefix=CHECK2DE
+
+# RUN: llvm-objcopy a.o out2 --regex --compress-sections '.*c0=zlib' --compress-sections .debug_str=zstd
+# RUN: llvm-readelf -Sr -x nonalloc0 -x .debug_str out2 2>&1 | FileCheck %s --check-prefix=CHECK2
+
+# CHECK2:      Name           Type          Address     Off      Size     ES Flg Lk Inf Al
+# CHECK2:      .text          PROGBITS [[#%x,TEXT:]]    [[#%x,]] [[#%x,]] 00 AX   0   0  4
+# CHECK2:      foo0           PROGBITS [[#%x,FOO0:]]    [[#%x,]] [[#%x,]] 00 A    0   0  8
+# CHECK2-NEXT: .relafoo0      RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  I  11   3  8
+# CHECK2-NEXT: foo1           PROGBITS [[#%x,FOO1:]]    [[#%x,]] [[#%x,]] 00 A    0   0  8
+# CHECK2-NEXT: .relafoo1      RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  I  11   5  8
+# CHECK2:      nonalloc0      PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00   C  0   0  8
+# CHECK2-NEXT: .relanonalloc0 RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  IC 11   7  8
+# CHECK2-NEXT: nonalloc1      PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00      0   0  8
+# CHECK2-NEXT: .debug_str     PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MSC  0   0  8
+
+## llvm-readelf doesn't support SHF_COMPRESSED SHT_RELA.
+# CHECK2: warning: {{.*}}: unable to read relocations from SHT_RELA section with index 8: section [index 8] has an invalid sh_size ([[#]]) which is not a multiple of its sh_entsize (24)
+
+# CHECK2:      Hex dump of section 'nonalloc0':
+## zlib with ch_size=0x10
+# CHECK2-NEXT: 01000000 00000000 10000000 00000000
+# CHECK2-NEXT: 08000000 00000000 {{.*}}
+# CHECK2:      Hex dump of section '.debug_str':
+## zstd with ch_size=0x38
+# CHECK2-NEXT: 02000000 00000000 38000000 00000000
+# CHECK2-NEXT: 01000000 00000000 {{.*}}
+
+# CHECK2DE:       Hex dump of section 'nonalloc0':
+# CHECK2DE-NEXT:  0x00000000 00000000 00000000 00000000 00000000 ................
+# CHECK2DE-EMPTY:
+# CHECK2DE-NEXT:  Hex dump of section '.debug_str':
+# CHECK2DE-NEXT:  0x00000000 41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA
+
+## --compress-debug-sections=none takes precedence.
+# RUN: llvm-objcopy a.o out3 --decompress-debug-sections --compress-sections .debug_str=zstd
+# RUN: llvm-readelf -S out3 | FileCheck %s --check-prefix=CHECK3
+
+# CHECK3:      .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS   0   0  1
+
+# RUN: llvm-objcopy a.o out4 --compress-sections '*0=zlib'
+# RUN: llvm-readelf -S out4 | FileCheck %s --check-prefix=CHECK4
+
+# CHECK4:      Name           Type          Address     Off      Size     ES Flg Lk Inf Al
+# CHECK4:      .text          PROGBITS [[#%x,TEXT:]]    [[#%x,]] [[#%x,]] 00  AX  0   0  4
+# CHECK4:      foo0           PROGBITS [[#%x,FOO0:]]    [[#%x,]] [[#%x,]] 00  AC  0   0  8
+# CHECK4-NEXT: .relafoo0      RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  IC 11   3  8
+# CHECK4-NEXT: foo1           PROGBITS [[#%x,FOO1:]]    [[#%x,]] [[#%x,]] 00   A  0   0  8
+# CHECK4-NEXT: .relafoo1      RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18   I 11   5  8
+# CHECK4:      nonalloc0      PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00   C  0   0  8
+# CHECK4-NEXT: .relanonalloc0 RELA     [[#%x,]]         [[#%x,]] [[#%x,]] 18  IC 11   7  8
+# CHECK4-NEXT: nonalloc1      PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00      0   0  8
+# CHECK4-NEXT: .debug_str     PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01  MS  0   0  1
+
+# RUN: not llvm-objcopy --compress-sections=foo a.o out 2>&1 | \
+# RUN:   FileCheck %s --check-prefix=ERR1 --implicit-check-not=error:
+# ERR1:      error: --compress-sections: parse error, not 'section-glob=[none|zlib|zstd]'
+
+# RUN: llvm-objcopy --compress-sections 'a[=zlib' a.o out 2>&1 | \
+# RUN:   FileCheck %s --check-prefix=ERR2 --implicit-check-not=error:
+# ERR2:      warning: invalid glob pattern, unmatched '['
+
+# RUN: not llvm-objcopy a.o out --compress-sections='.debug*=zlib-gabi' --compress-sections='.debug*=' 2>&1 | \
+# RUN:   FileCheck -check-prefix=ERR3 %s
+# ERR3:      error: invalid or unsupported --compress-sections format: .debug*=zlib-gabi
+
+# RUN: not llvm-objcopy a.o out --compress-sections='!.debug*=zlib' 2>&1 | \
+# RUN:   FileCheck -check-prefix=ERR4 %s
+# ERR4:      error: --compress-sections: negative pattern is unsupported
+
+.globl _start
+_start:
+  ret
+
+.section foo0,"a"
+.balign 8
+.quad .text-.
+.quad .text-.
+.section foo1,"a"
+.balign 8
+.quad .text-.
+.quad .text-.
+.section nonalloc0,""
+.balign 8
+.quad .text+1
+.quad .text+2
+sym0:
+.section nonalloc1,""
+.balign 8
+.quad 42
+sym1:
+
+.section .debug_str,"MS",@progbits,1
+.Linfo_string0:
+  .asciz "AAAAAAAAAAAAAAAAAAAAAAAAAAA"
+.Linfo_string1:
+  .asciz "BBBBBBBBBBBBBBBBBBBBBBBBBBB"
diff --git a/llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test b/llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test
index 4258ddbe66a3e5..6a6ae461903a6e 100644
--- a/llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test
+++ b/llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test
@@ -4,6 +4,8 @@
 # RUN: yaml2obj %s -o %t
 # RUN: llvm-objcopy --decompress-debug-sections %t %t.de
 # RUN: llvm-readelf -S %t.de | FileCheck %s
+# RUN: llvm-objcopy --compress-sections '*nonalloc=none' --compress-sections .debugx=none %t %t.de
+# RUN: llvm-readelf -S %t.de | FileCheck %s
 
 # CHECK:        Name              Type            Address          Off      Size     ES Flg Lk Inf Al
 # CHECK:        .debug_alloc      PROGBITS        0000000000000000 [[#%x,]] [[#%x,]] 00  AC  0   0  0
@@ -11,6 +13,33 @@
 # CHECK-NEXT:   .debugx           PROGBITS        0000000000000000 [[#%x,]] [[#%x,]] 00      0   0  1
 # CHECK-NEXT:   nodebug           PROGBITS        0000000000000000 [[#%x,]] [[#%x,]] 00   C  0   0  0
 
+# RUN: llvm-objcopy --compress-sections '.debug*=none' %t %t2.de
+# RUN: llvm-readelf -S -x .debug_alloc -x .debug_nonalloc -x .debugx %t2.de | FileCheck %s --check-prefix=CHECK2
+
+# CHECK2:        Name              Type            Address          Off      Size     ES Flg Lk Inf Al
+# CHECK2:        .debug_alloc      PROGBITS        0000000000000000 [[#%x,]] [[#%x,]] 00  A   0   0  1
+# CHECK2-NEXT:   .debug_nonalloc   PROGBITS        0000000000000000 [[#%x,]] [[#%x,]] 00      0   0  1
+# CHECK2-NEXT:   .debugx           PROGBITS        0000000000000000 [[#%x,]] [[#%x,]] 00      0   0  1
+# CHECK2-NEXT:   nodebug           PROGBITS        0000000000000000 [[#%x,]] [[#%x,]] 00   C  0   0  0
+
+# CHECK2:       Hex dump of section '.debug_alloc':
+# CHECK2-NEXT:  0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-EMPTY:
+# CHECK2:       Hex dump of section '.debug_nonalloc':
+# CHECK2-NEXT:  0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-EMPTY:
+# CHECK2-NEXT:  Hex dump of section '.debugx':
+# CHECK2-NEXT:  0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
+# CHECK2-NEXT:  0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......
+
 --- !ELF
 FileHeader:
   Class:   ELFCLASS64
diff --git a/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp b/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
index a0c6415bf0e6ae..3e38f26f058998 100644
--- a/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
+++ b/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
@@ -736,6 +736,42 @@ objcopy::parseObjcopyOptions(ArrayRef<const char *> RawArgsArr,
       return createStringError(errc::invalid_argument, Reason);
   }
 
+  for (const auto *A : InputArgs.filtered(OBJCOPY_compress_sections)) {
+    SmallVector<StringRef, 0> Fields;
+    StringRef(A->getValue()).split(Fields, '=');
+    if (Fields.size() != 2 || Fields[1].empty()) {
+      return createStringError(
+          errc::invalid_argument,
+          A->getSpelling() +
+              ": parse error, not 'section-glob=[none|zlib|zstd]'");
+    }
+
+    auto Type = StringSwitch<DebugCompressionType>(Fields[1])
+                    .Case("zlib", DebugCompressionType::Zlib)
+                    .Case("zstd", DebugCompressionType::Zstd)
+                    .Default(DebugCompressionType::None);
+    if (Type == DebugCompressionType::None && Fields[1] != "none") {
+      return createStringError(
+          errc::invalid_argument,
+          "invalid or unsupported --compress-sections format: %s",
+          A->getValue());
+    }
+
+    auto &P = Config.compressSections.emplace_back();
+    P.second = Type;
+    auto Matcher =
+        NameOrPattern::create(Fields[0], SectionMatchStyle, ErrorCallback);
+    // =none allows overriding a previous =zlib or =zstd. Reject negative
+    // patterns, which would be confusing.
+    if (Matcher && !Matcher->isPositiveMatch()) {
+      return createStringError(
+          errc::invalid_argument,
+          "--compress-sections: negative pattern is unsupported");
+    }
+    if (Error E = P.first.addMatcher(std::move(Matcher)))
+      return std::move(E);
+  }
+
   Config.AddGnuDebugLink = InputArgs.getLastArgValue(OBJCOPY_add_gnu_debuglink);
   // The gnu_debuglink's target is expected to not change or else its CRC would
   // become invalidated and get rejected. We can avoid recalculating the
diff --git a/llvm/tools/llvm-objcopy/ObjcopyOpts.td b/llvm/tools/llvm-objcopy/ObjcopyOpts.td
index 3c0e5cd475a36b..2d93eeae257875 100644
--- a/llvm/tools/llvm-objcopy/ObjcopyOpts.td
+++ b/llvm/tools/llvm-objcopy/ObjcopyOpts.td
@@ -35,6 +35,12 @@ def : Flag<["--"], "compress-debug-sections">, Alias<compress_debug_sections>,
       AliasArgs<["zlib"]>;
 def decompress_debug_sections : Flag<["--"], "decompress-debug-sections">,
                                 HelpText<"Decompress DWARF debug sections">;
+defm compress_sections
+    : Eq<"compress-sections",
+         "Compress or decompress sections using specified format. Supported "
+         "formats: zlib, zstd. Specify 'none' for decompression">,
+      MetaVarName<"<section-glob>=<format>">;
+
 defm split_dwo
     : Eq<"split-dwo", "Equivalent to --extract-dwo and <dwo-file> as the output file and no other options, "
                       "and then --strip-dwo on the input file">,

goldsteinn and others added 2 commits March 13, 2024 13:13
Created using spr 1.3.5-bogner

[skip ci]
Created using spr 1.3.5-bogner
@MaskRay MaskRay changed the base branch from users/MaskRay/spr/main.llvm-objcopy-add-compress-sections to main March 13, 2024 20:14
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Mar 13, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: llvm#85036
@MaskRay
Copy link
Member Author

MaskRay commented Mar 26, 2024

Ping:)

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was waiting for some sort of traction on the RFC (of which there hasn't really been any from what I can tell).

I'll go through the PR as I have nothing specifically against the option, but I think there needs to be some clarity that there is more widespread appeal of this option.

llvm/include/llvm/ObjCopy/CommonConfig.h Outdated Show resolved Hide resolved
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp Show resolved Hide resolved
// We can't call `AddSection` while iterating over sections,
// because it would mutate the sections array.
SmallVector<std::pair<SectionBase *, std::function<SectionBase *()>>, 0>
ToReplace;
for (SectionBase &Sec : sections()) {
if ((Sec.Flags & SHF_ALLOC) || !StringRef(Sec.Name).starts_with(".debug"))
std::optional<DebugCompressionType> CType;
for (auto &[Matcher, T] : Config.compressSections)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought: does it make sense to ignore sections specified with compressSections if their compression state already matches the requested one? Alternatively, report a specific error for that case? So e.g. if requested to decompress the already decompressed .symtab, we'd either do nothing or emit an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With recent GNU objcopy, --compress-debug-sections=zstd on a zlib-compressed section will recompress the content with zstd. This is whether our behavior is different from GNU. I've actually tried implementing this behavior, but in the end the benefit doesn't feel clear and the complexity seems quite high (I don't even find a good way to implement it related to our CompressedSection abstraction).

I think the scenario is likely very rare and users might not expect a specific behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps add a test to show the current behaviour then? I'm not sure either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added
## If a section is already compressed, compression request for another format is ignored.

llvm/test/tools/llvm-objcopy/ELF/compress-sections.s Outdated Show resolved Hide resolved
llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test Outdated Show resolved Hide resolved
@MaskRay
Copy link
Member Author

MaskRay commented Mar 26, 2024

Sorry, I was waiting for some sort of traction on the RFC (of which there hasn't really been any from what I can tell).

I'll go through the PR as I have nothing specifically against the option, but I think there needs to be some clarity that there is more widespread appeal of this option.

Thanks. This is to complement [[RFC] Compress arbitrary sections with ld.lld –compress-sections, so that ld.lld --compress-sections=... compressed sections can be decompressed. My previous refactoring and simplification is to decrease the complexity.

[RFC] Compressed SHT_SYMTAB/SHT_STRTAB for ELF is a related proposal that doesn't get much traction yet. While I agree we should be more careful about whether we should go that direction (supporting SHF_COMPRESSED SHT_STRTAB), I think its acceptance is less related to this patch.

Created using spr 1.3.5-bogner
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a release note.

Looks like most of my test comments haven't been addressed?

llvm/docs/CommandGuide/llvm-objcopy.rst Outdated Show resolved Hide resolved
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp Outdated Show resolved Hide resolved
MaskRay and others added 2 commits March 28, 2024 17:08
Co-authored-by: James Henderson <46713263+jh7370@users.noreply.github.com>
Created using spr 1.3.5-bogner
@MaskRay
Copy link
Member Author

MaskRay commented Mar 29, 2024

This should have a release note.

Looks like most of my test comments haven't been addressed?

Sorry, I did not notice them. Addresses all, I think. #87027 is about DebugCompressionType renaming

MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Mar 29, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: llvm#85036
// We can't call `AddSection` while iterating over sections,
// because it would mutate the sections array.
SmallVector<std::pair<SectionBase *, std::function<SectionBase *()>>, 0>
ToReplace;
for (SectionBase &Sec : sections()) {
if ((Sec.Flags & SHF_ALLOC) || !StringRef(Sec.Name).starts_with(".debug"))
std::optional<DebugCompressionType> CType;
for (auto &[Matcher, T] : Config.compressSections)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps add a test to show the current behaviour then? I'm not sure either way.

Created using spr 1.3.5-bogner
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Apr 3, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: llvm#85036
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this now LGTM, with a comment that still needs tweaking in my opinion.

MaskRay and others added 2 commits April 4, 2024 00:15
Co-authored-by: James Henderson <46713263+jh7370@users.noreply.github.com>
… it is before --compress-sections."

Created using spr 1.3.5-bogner
MaskRay added a commit that referenced this pull request Apr 4, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: #85036
@MaskRay
Copy link
Member Author

MaskRay commented Apr 4, 2024

Landed as 9e3b64b (sorry, forgot spr land / web UI so this is not shown as "Merged").

@MaskRay MaskRay closed this Apr 4, 2024
@MaskRay MaskRay deleted the users/MaskRay/spr/llvm-objcopy-add-compress-sections branch April 4, 2024 17:28
@hctim
Copy link
Collaborator

hctim commented Apr 5, 2024

Hi Ray,

Looks like this broke the UBSan bot (https://lab.llvm.org/buildbot/#/builders/5/builds/42341/steps/9/logs/stdio).

FAIL: LLVM :: tools/llvm-objcopy/ELF/compress-sections.s (81669 of 81716)
******************** TEST 'LLVM :: tools/llvm-objcopy/ELF/compress-sections.s' FAILED ********************
Exit Code: 1
Command Output (stderr):
--
RUN: at line 3: rm -rf /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/test/tools/llvm-objcopy/ELF/Output/compress-sections.s.tmp && mkdir /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/test/tools/llvm-objcopy/ELF/Output/compress-sections.s.tmp && cd /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/test/tools/llvm-objcopy/ELF/Output/compress-sections.s.tmp
+ rm -rf /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/test/tools/llvm-objcopy/ELF/Output/compress-sections.s.tmp
+ mkdir /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/test/tools/llvm-objcopy/ELF/Output/compress-sections.s.tmp
+ cd /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/test/tools/llvm-objcopy/ELF/Output/compress-sections.s.tmp
RUN: at line 4: /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-mc -filetype=obj -triple=x86_64 /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/tools/llvm-objcopy/ELF/compress-sections.s -o a.o
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-mc -filetype=obj -triple=x86_64 /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/tools/llvm-objcopy/ELF/compress-sections.s -o a.o
RUN: at line 7: /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-objcopy a.o out --compress-sections='*0=zlib' --compress-sections '*0=none' --compress-sections 'nomatch=none' 2>&1 | /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/count 0
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-objcopy a.o out '--compress-sections=*0=zlib' --compress-sections '*0=none' --compress-sections nomatch=none
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/count 0
RUN: at line 8: /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-readelf -S out | /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/tools/llvm-objcopy/ELF/compress-sections.s --check-prefix=CHECK1
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-readelf -S out
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/tools/llvm-objcopy/ELF/compress-sections.s --check-prefix=CHECK1
RUN: at line 22: /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-objcopy a.o out2 --compress-sections '*c0=zlib' --compress-sections .debug_str=zstd
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-objcopy a.o out2 --compress-sections '*c0=zlib' --compress-sections .debug_str=zstd
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/ObjCopy/ELF/ELFObject.cpp:2166:30: runtime error: reference binding to misaligned address 0x000000000001 for type 'const SectionBase', which requires 8 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/ObjCopy/ELF/ELFObject.cpp:2166:30 

Quickest commands to repro (some parts can probably be elided):

$ cmake \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_USE_LINKER=lld \
-GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_FLAGS="-fsanitize=undefined" \
-DCMAKE_CXX_FLAGS="-fsanitize=undefined" \
-DLLVM_ENABLE_PROJECTS="'clang;lld'" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
-DLLVM_LIBC_ENABLE_LINTING=OFF \
-DLLVM_USE_SANITIZER=Undefined \
-DLLVM_ENABLE_ASSERTIONS=ON \
/usr/local/google/home/mitchp/llvm/llvm
$ LIT_OPTS='--filter compress-sections' ninja check-llvm

hctim added a commit that referenced this pull request Apr 5, 2024
This reverts commit 9e3b64b.

Reason: Broke the UBSan buildbot. See the comments in the pull request
(#85036) for more information.
MaskRay added a commit that referenced this pull request Apr 15, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Note: Before this patch, a compressed relocation section is recognized
as a `RelocationSectionBase` as well and `removeSections` `!ToRemove(*ToRelSec)`
may incorrectly interpret a `CompressedSections` as `RelocationSectionBase`,
leading to ubsan failure for the new test. Fix this by setting
`OriginalFlags` in CompressedSection::CompressedSection.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: #85036
bazuzi pushed a commit to bazuzi/llvm-project that referenced this pull request Apr 15, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Note: Before this patch, a compressed relocation section is recognized
as a `RelocationSectionBase` as well and `removeSections` `!ToRemove(*ToRelSec)`
may incorrectly interpret a `CompressedSections` as `RelocationSectionBase`,
leading to ubsan failure for the new test. Fix this by setting
`OriginalFlags` in CompressedSection::CompressedSection.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: llvm#85036
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 15, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Note: Before this patch, a compressed relocation section is recognized
as a `RelocationSectionBase` as well and `removeSections` `!ToRemove(*ToRelSec)`
may incorrectly interpret a `CompressedSections` as `RelocationSectionBase`,
leading to ubsan failure for the new test. Fix this by setting
`OriginalFlags` in CompressedSection::CompressedSection.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: llvm#85036
tmatheson-arm pushed a commit to tmatheson-arm/llvm-project that referenced this pull request Apr 22, 2024
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.

* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd

Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.

For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.

Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.

Note: Before this patch, a compressed relocation section is recognized
as a `RelocationSectionBase` as well and `removeSections` `!ToRemove(*ToRelSec)`
may incorrectly interpret a `CompressedSections` as `RelocationSectionBase`,
leading to ubsan failure for the new test. Fix this by setting
`OriginalFlags` in CompressedSection::CompressedSection.

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: llvm#85036
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants