Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm-ar] Use COFF archive format for COFF targets. #82642

Merged
merged 1 commit into from
Feb 24, 2024
Merged

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented Feb 22, 2024

Detect COFF files by default and allow specifying it with --format argument.

This is important for ARM64EC, which uses a separated symbol map for EC symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't really make a difference for other targets.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 22, 2024

@llvm/pr-subscribers-llvm-binary-utilities

Author: Jacek Caban (cjacek)

Changes

Detect COFF files by default and allow specifying it with --format argument.

This is important for ARM64EC, which uses a separated symbol map for EC symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't really make a difference for other targets.


Full diff: https://github.com/llvm/llvm-project/pull/82642.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Object/Archive.h (+1)
  • (modified) llvm/lib/Object/Archive.cpp (+11-4)
  • (modified) llvm/lib/Object/ArchiveWriter.cpp (+11-10)
  • (added) llvm/test/tools/llvm-ar/coff-symtab.test (+85)
  • (modified) llvm/tools/llvm-ar/llvm-ar.cpp (+6-1)
diff --git a/llvm/include/llvm/Object/Archive.h b/llvm/include/llvm/Object/Archive.h
index 3dd99a46507a24..66f07939b11050 100644
--- a/llvm/include/llvm/Object/Archive.h
+++ b/llvm/include/llvm/Object/Archive.h
@@ -339,6 +339,7 @@ class Archive : public Binary {
   Kind kind() const { return (Kind)Format; }
   bool isThin() const { return IsThin; }
   static object::Archive::Kind getDefaultKindForHost();
+  static object::Archive::Kind getDefaultKindForTriple(Triple &T);
 
   child_iterator child_begin(Error &Err, bool SkipInternal = true) const;
   child_iterator child_end() const;
diff --git a/llvm/lib/Object/Archive.cpp b/llvm/lib/Object/Archive.cpp
index e447e5b23316f1..d3fdcd9ee88111 100644
--- a/llvm/lib/Object/Archive.cpp
+++ b/llvm/lib/Object/Archive.cpp
@@ -969,12 +969,19 @@ Archive::Archive(MemoryBufferRef Source, Error &Err)
   Err = Error::success();
 }
 
+object::Archive::Kind Archive::getDefaultKindForTriple(Triple &T) {
+  if (T.isOSDarwin())
+    return object::Archive::K_DARWIN;
+  if (T.isOSAIX())
+    return object::Archive::K_AIXBIG;
+  if (T.isOSWindows())
+    return object::Archive::K_COFF;
+  return object::Archive::K_GNU;
+}
+
 object::Archive::Kind Archive::getDefaultKindForHost() {
   Triple HostTriple(sys::getProcessTriple());
-  return HostTriple.isOSDarwin()
-             ? object::Archive::K_DARWIN
-             : (HostTriple.isOSAIX() ? object::Archive::K_AIXBIG
-                                     : object::Archive::K_GNU);
+  return getDefaultKindForTriple(HostTriple);
 }
 
 Archive::child_iterator Archive::child_begin(Error &Err,
diff --git a/llvm/lib/Object/ArchiveWriter.cpp b/llvm/lib/Object/ArchiveWriter.cpp
index 155926a8c5949d..02f72521c8b544 100644
--- a/llvm/lib/Object/ArchiveWriter.cpp
+++ b/llvm/lib/Object/ArchiveWriter.cpp
@@ -62,12 +62,16 @@ object::Archive::Kind NewArchiveMember::detectKindFromObject() const {
   Expected<std::unique_ptr<object::ObjectFile>> OptionalObject =
       object::ObjectFile::createObjectFile(MemBufferRef);
 
-  if (OptionalObject)
-    return isa<object::MachOObjectFile>(**OptionalObject)
-               ? object::Archive::K_DARWIN
-               : (isa<object::XCOFFObjectFile>(**OptionalObject)
-                      ? object::Archive::K_AIXBIG
-                      : object::Archive::K_GNU);
+  if (OptionalObject) {
+    if (isa<object::MachOObjectFile>(**OptionalObject))
+      return object::Archive::K_DARWIN;
+    if (isa<object::XCOFFObjectFile>(**OptionalObject))
+      return object::Archive::K_AIXBIG;
+    if (isa<object::COFFObjectFile>(**OptionalObject) ||
+        isa<object::COFFImportFile>(**OptionalObject))
+      return object::Archive::K_COFF;
+    return object::Archive::K_GNU;
+  }
 
   // Squelch the error in case we had a non-object file.
   consumeError(OptionalObject.takeError());
@@ -80,10 +84,7 @@ object::Archive::Kind NewArchiveMember::detectKindFromObject() const {
             MemBufferRef, file_magic::bitcode, &Context)) {
       auto &IRObject = cast<object::IRObjectFile>(**ObjOrErr);
       auto TargetTriple = Triple(IRObject.getTargetTriple());
-      return TargetTriple.isOSDarwin()
-                 ? object::Archive::K_DARWIN
-                 : (TargetTriple.isOSAIX() ? object::Archive::K_AIXBIG
-                                           : object::Archive::K_GNU);
+      return object::Archive::getDefaultKindForTriple(TargetTriple);
     } else {
       // Squelch the error in case this was not a SymbolicFile.
       consumeError(ObjOrErr.takeError());
diff --git a/llvm/test/tools/llvm-ar/coff-symtab.test b/llvm/test/tools/llvm-ar/coff-symtab.test
new file mode 100644
index 00000000000000..50d08fba3b02f6
--- /dev/null
+++ b/llvm/test/tools/llvm-ar/coff-symtab.test
@@ -0,0 +1,85 @@
+Verify that llvm-ar uses COFF archive format by ensuring that archive map is sorted.
+
+RUN: rm -rf %t.dif && split-file %s %t.dir && cd %t.dir
+
+RUN: yaml2obj coff-symtab.yaml -o coff-symtab.obj
+RUN: llvm-ar crs out.a coff-symtab.obj
+RUN: llvm-nm --print-armap out.a | FileCheck %s
+
+RUN: llvm-as coff-symtab.ll -o coff-symtab.bc
+RUN: llvm-ar crs out2.a coff-symtab.bc
+RUN: llvm-nm --print-armap out2.a | FileCheck %s
+
+RUN: yaml2obj elf.yaml -o coff-symtab.o
+RUN: llvm-ar crs --format coff out3.a coff-symtab.o
+RUN: llvm-nm --print-armap out3.a | FileCheck %s
+
+CHECK: Archive map
+CHECK-NEXT: a in coff-symtab
+CHECK-NEXT: b in coff-symtab
+CHECK-NEXT: c in coff-symtab
+CHECK-EMPTY:
+
+#--- coff-symtab.yaml
+--- !COFF
+header:
+  Machine:           IMAGE_FILE_MACHINE_UNKNOWN
+  Characteristics:   [  ]
+sections:
+  - Name:            .text
+    Characteristics: [ IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_READ ]
+    Alignment:       4
+    SectionData:     ''
+symbols:
+  - Name:            b
+    Value:           0
+    SectionNumber:   1
+    SimpleType:      IMAGE_SYM_TYPE_NULL
+    ComplexType:     IMAGE_SYM_DTYPE_FUNCTION
+    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
+  - Name:            c
+    Value:           0
+    SectionNumber:   1
+    SimpleType:      IMAGE_SYM_TYPE_NULL
+    ComplexType:     IMAGE_SYM_DTYPE_FUNCTION
+    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
+  - Name:            a
+    Value:           0
+    SectionNumber:   1
+    SimpleType:      IMAGE_SYM_TYPE_NULL
+    ComplexType:     IMAGE_SYM_DTYPE_FUNCTION
+    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
+...
+
+
+#--- coff-symtab.ll
+target triple = "x86_64-unknown-windows-msvc"
+
+define void @b() { ret void }
+define void @c() { ret void }
+define void @a() { ret void }
+
+#--- elf.yaml
+--- !ELF
+FileHeader:
+  Class:             ELFCLASS64
+  Data  :            ELFDATA2LSB
+  Type:              ET_REL
+  Machine:           EM_X86_64
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    AddressAlign:    0x0000000000000004
+    Content:         ''
+Symbols:
+  - Name:            b
+    Binding:         STB_GLOBAL
+    Section:         .text
+  - Name:            c
+    Binding:         STB_GLOBAL
+    Section:         .text
+  - Name:            a
+    Binding:         STB_GLOBAL
+    Section:         .text
+...
diff --git a/llvm/tools/llvm-ar/llvm-ar.cpp b/llvm/tools/llvm-ar/llvm-ar.cpp
index c8800303bc1e42..c58a85c695dacf 100644
--- a/llvm/tools/llvm-ar/llvm-ar.cpp
+++ b/llvm/tools/llvm-ar/llvm-ar.cpp
@@ -82,6 +82,7 @@ static void printArHelp(StringRef ToolName) {
     =darwin             -   darwin
     =bsd                -   bsd
     =bigarchive         -   big archive (AIX OS)
+    =coff               -   coff
   --plugin=<string>     - ignored for compatibility
   -h --help             - display this help and exit
   --output              - the directory to extract archive members to
@@ -193,7 +194,7 @@ static SmallVector<const char *, 256> PositionalArgs;
 static bool MRI;
 
 namespace {
-enum Format { Default, GNU, BSD, DARWIN, BIGARCHIVE, Unknown };
+enum Format { Default, GNU, COFF, BSD, DARWIN, BIGARCHIVE, Unknown };
 }
 
 static Format FormatType = Default;
@@ -1044,6 +1045,9 @@ static void performWriteOperation(ArchiveOperation Operation,
   case GNU:
     Kind = object::Archive::K_GNU;
     break;
+  case COFF:
+    Kind = object::Archive::K_COFF;
+    break;
   case BSD:
     if (Thin)
       fail("only the gnu format has a thin mode");
@@ -1376,6 +1380,7 @@ static int ar_main(int argc, char **argv) {
                        .Case("darwin", DARWIN)
                        .Case("bsd", BSD)
                        .Case("bigarchive", BIGARCHIVE)
+                       .Case("coff", COFF)
                        .Default(Unknown);
       if (FormatType == Unknown)
         fail(std::string("Invalid format ") + Match);

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! This looks like a very straightforward refactoring, with a minimal behaviour change on top.

(In principle, this could maybe have been split into a pure NFC refactoring, then a separate step that only adds the COFF cases on top, but I don't think it's worth the extra effort here, as it's trivial enough anyway, and stacked PRs are annoying.)

Comment on lines +977 to +978
if (T.isOSWindows())
return object::Archive::K_COFF;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little nervous about how this impacts the default archive format on a Windows host: our downstream uses a cross-compiler, typically hosted on Windows, but using GNU-style archives (and the objects therefore have a different triple too).

I'm going to suggest to our internal binutils team that they run some testing with this patch applied, to see if it makes any difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note the host matters only when llvm-ar can't infer the format from any other source. In practice, it means that it matters only when creating empty archives and archives who's first member is a non-symbolic files. In typical use case, when you pass object files, this PR shouldn't change anything unless you use COFF files.

I could limit impact of this PR by not changing the effect of getDefaultKindForHost, but it feels right to be consistent. If we consider those cases to be a problem, then it's also problematic for existing isOSDarwin() and isOSAIX() checks (which have even more significant format differences). If user cares about the format and uses llvm-ar on non-symbolic files, then an explicit --format argument is the only reliable way I can see to handle all cross compilation variants.

Copy link
Collaborator

@gbreynoo gbreynoo Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jh7370, I checked and this will change our downstream ar's output in the cases @cjacek highlights above. However, due to @cjacek's points I cannot argue against it's submission.

The current behavior that default output format is defined by host does not seem good for the cross-compile use case in general. It may be preferable for the default to be based on the CMAKE option LLVM_DEFAULT_TARGET_TRIPLE or something similar to gnu binutils and the GNUTARGET environment variable. Both are likely out of the scope of this PR though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM_DEFAULT_TARGET_TRIPLE sounds like a good idea, I will create a separated PR for it, thanks.

@bd1976bris
Copy link
Collaborator

After years of shipping llvm-ar this change has brought up some interesting questions :) The main question I have is: do we really want the behaviour where the input file type determines the output file type? I'm not sure why this was implemented - and now that I think about it is seems indefensible. For some reason I accepted this behaviour without questioning it, but, looking at it with fresh eyes, I think that the behaviour should be that the output type has a fixed default and can be overridden by an explicit --target option?

@efriedma-quic
Copy link
Collaborator

In theory, the behavior difference between llvm-ar and native "ar" could cause issues in edge cases. In practice, I've never heard of it causing anyone issues. Build systems generally don't create empty archives, or stick random files into archives. And even if they do, it only matters if you try to mix llvm-ar with a non-LLVM native archiver/linker; lld can parse any archive format.

Detect COFF files by default and allow specifying it with --format argument.
@cjacek
Copy link
Contributor Author

cjacek commented Feb 24, 2024

I pushed the commit and created #82888 with @gbreynoo's suggestion. Thanks for reviews!

@cjacek cjacek restored the ar-coff branch February 24, 2024 16:41
cjacek added a commit that referenced this pull request Feb 24, 2024
cjacek added a commit that referenced this pull request Feb 24, 2024
Reverts #82642 for
lld/test/ELF/invalid/Output/data-encoding.test.tmp.a failures on
Windows.
@bd1976bris
Copy link
Collaborator

bd1976bris commented Feb 26, 2024

In theory, the behavior difference between llvm-ar and native "ar" could cause issues in edge cases. In practice, I've never heard of it causing anyone issues. Build systems generally don't create empty archives, or stick random files into archives. And even if they do, it only matters if you try to mix llvm-ar with a non-LLVM native archiver/linker; lld can parse any archive format.

Yes. I think for the reasons you mentioned this is isn't particularly important. It would be nice for the behavior make sense as far as possible though. In offline conversation @jh7370 noted the following:

Original behaviour of using the host comes from 9 years ago in a change by Rafael: https://github.com/llvm/llvm-project/commit/2535ea0b836bc3f1fd48d420f3611d1f97ad461d, and has just been incrementally been built on.
Commit message has no justification why using the host rather than e.g. the default triple (if it even existed in 2015) would be the right thing to do.

So, I think that no one has noticed this up to now and we could change this behaviour.

cjacek added a commit to cjacek/llvm-project that referenced this pull request Mar 12, 2024
Detect COFF files by default and allow specifying it with --format
argument.

This is important for ARM64EC, which uses a separated symbol map for EC
symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't
really make a difference for other targets.
cjacek added a commit to cjacek/llvm-project that referenced this pull request Mar 13, 2024
Detect COFF files by default and allow specifying it with --format
argument.

This is important for ARM64EC, which uses a separated symbol map for EC
symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't
really make a difference for other targets.

This originally landed as llvm#82642, but was reverted due to test failures
in tests using no symbol table. Since COFF symbol can't express it,
fallback to GNU format in that case.
cjacek added a commit that referenced this pull request Mar 13, 2024
Detect COFF files by default and allow specifying it with --format
argument.

This is important for ARM64EC, which uses a separated symbol map for EC
symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't
really make a difference for other targets.

This originally landed as #82642, but was reverted due to test failures
in tests using no symbol table. Since COFF symbol can't express it,
fallback to GNU format in that case.
@cjacek cjacek deleted the ar-coff branch March 13, 2024 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants