-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm-ar] Use COFF archive format for COFF targets. #82642
Conversation
@llvm/pr-subscribers-llvm-binary-utilities Author: Jacek Caban (cjacek) ChangesDetect COFF files by default and allow specifying it with --format argument. This is important for ARM64EC, which uses a separated symbol map for EC symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't really make a difference for other targets. Full diff: https://github.com/llvm/llvm-project/pull/82642.diff 5 Files Affected:
diff --git a/llvm/include/llvm/Object/Archive.h b/llvm/include/llvm/Object/Archive.h
index 3dd99a46507a24..66f07939b11050 100644
--- a/llvm/include/llvm/Object/Archive.h
+++ b/llvm/include/llvm/Object/Archive.h
@@ -339,6 +339,7 @@ class Archive : public Binary {
Kind kind() const { return (Kind)Format; }
bool isThin() const { return IsThin; }
static object::Archive::Kind getDefaultKindForHost();
+ static object::Archive::Kind getDefaultKindForTriple(Triple &T);
child_iterator child_begin(Error &Err, bool SkipInternal = true) const;
child_iterator child_end() const;
diff --git a/llvm/lib/Object/Archive.cpp b/llvm/lib/Object/Archive.cpp
index e447e5b23316f1..d3fdcd9ee88111 100644
--- a/llvm/lib/Object/Archive.cpp
+++ b/llvm/lib/Object/Archive.cpp
@@ -969,12 +969,19 @@ Archive::Archive(MemoryBufferRef Source, Error &Err)
Err = Error::success();
}
+object::Archive::Kind Archive::getDefaultKindForTriple(Triple &T) {
+ if (T.isOSDarwin())
+ return object::Archive::K_DARWIN;
+ if (T.isOSAIX())
+ return object::Archive::K_AIXBIG;
+ if (T.isOSWindows())
+ return object::Archive::K_COFF;
+ return object::Archive::K_GNU;
+}
+
object::Archive::Kind Archive::getDefaultKindForHost() {
Triple HostTriple(sys::getProcessTriple());
- return HostTriple.isOSDarwin()
- ? object::Archive::K_DARWIN
- : (HostTriple.isOSAIX() ? object::Archive::K_AIXBIG
- : object::Archive::K_GNU);
+ return getDefaultKindForTriple(HostTriple);
}
Archive::child_iterator Archive::child_begin(Error &Err,
diff --git a/llvm/lib/Object/ArchiveWriter.cpp b/llvm/lib/Object/ArchiveWriter.cpp
index 155926a8c5949d..02f72521c8b544 100644
--- a/llvm/lib/Object/ArchiveWriter.cpp
+++ b/llvm/lib/Object/ArchiveWriter.cpp
@@ -62,12 +62,16 @@ object::Archive::Kind NewArchiveMember::detectKindFromObject() const {
Expected<std::unique_ptr<object::ObjectFile>> OptionalObject =
object::ObjectFile::createObjectFile(MemBufferRef);
- if (OptionalObject)
- return isa<object::MachOObjectFile>(**OptionalObject)
- ? object::Archive::K_DARWIN
- : (isa<object::XCOFFObjectFile>(**OptionalObject)
- ? object::Archive::K_AIXBIG
- : object::Archive::K_GNU);
+ if (OptionalObject) {
+ if (isa<object::MachOObjectFile>(**OptionalObject))
+ return object::Archive::K_DARWIN;
+ if (isa<object::XCOFFObjectFile>(**OptionalObject))
+ return object::Archive::K_AIXBIG;
+ if (isa<object::COFFObjectFile>(**OptionalObject) ||
+ isa<object::COFFImportFile>(**OptionalObject))
+ return object::Archive::K_COFF;
+ return object::Archive::K_GNU;
+ }
// Squelch the error in case we had a non-object file.
consumeError(OptionalObject.takeError());
@@ -80,10 +84,7 @@ object::Archive::Kind NewArchiveMember::detectKindFromObject() const {
MemBufferRef, file_magic::bitcode, &Context)) {
auto &IRObject = cast<object::IRObjectFile>(**ObjOrErr);
auto TargetTriple = Triple(IRObject.getTargetTriple());
- return TargetTriple.isOSDarwin()
- ? object::Archive::K_DARWIN
- : (TargetTriple.isOSAIX() ? object::Archive::K_AIXBIG
- : object::Archive::K_GNU);
+ return object::Archive::getDefaultKindForTriple(TargetTriple);
} else {
// Squelch the error in case this was not a SymbolicFile.
consumeError(ObjOrErr.takeError());
diff --git a/llvm/test/tools/llvm-ar/coff-symtab.test b/llvm/test/tools/llvm-ar/coff-symtab.test
new file mode 100644
index 00000000000000..50d08fba3b02f6
--- /dev/null
+++ b/llvm/test/tools/llvm-ar/coff-symtab.test
@@ -0,0 +1,85 @@
+Verify that llvm-ar uses COFF archive format by ensuring that archive map is sorted.
+
+RUN: rm -rf %t.dif && split-file %s %t.dir && cd %t.dir
+
+RUN: yaml2obj coff-symtab.yaml -o coff-symtab.obj
+RUN: llvm-ar crs out.a coff-symtab.obj
+RUN: llvm-nm --print-armap out.a | FileCheck %s
+
+RUN: llvm-as coff-symtab.ll -o coff-symtab.bc
+RUN: llvm-ar crs out2.a coff-symtab.bc
+RUN: llvm-nm --print-armap out2.a | FileCheck %s
+
+RUN: yaml2obj elf.yaml -o coff-symtab.o
+RUN: llvm-ar crs --format coff out3.a coff-symtab.o
+RUN: llvm-nm --print-armap out3.a | FileCheck %s
+
+CHECK: Archive map
+CHECK-NEXT: a in coff-symtab
+CHECK-NEXT: b in coff-symtab
+CHECK-NEXT: c in coff-symtab
+CHECK-EMPTY:
+
+#--- coff-symtab.yaml
+--- !COFF
+header:
+ Machine: IMAGE_FILE_MACHINE_UNKNOWN
+ Characteristics: [ ]
+sections:
+ - Name: .text
+ Characteristics: [ IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_READ ]
+ Alignment: 4
+ SectionData: ''
+symbols:
+ - Name: b
+ Value: 0
+ SectionNumber: 1
+ SimpleType: IMAGE_SYM_TYPE_NULL
+ ComplexType: IMAGE_SYM_DTYPE_FUNCTION
+ StorageClass: IMAGE_SYM_CLASS_EXTERNAL
+ - Name: c
+ Value: 0
+ SectionNumber: 1
+ SimpleType: IMAGE_SYM_TYPE_NULL
+ ComplexType: IMAGE_SYM_DTYPE_FUNCTION
+ StorageClass: IMAGE_SYM_CLASS_EXTERNAL
+ - Name: a
+ Value: 0
+ SectionNumber: 1
+ SimpleType: IMAGE_SYM_TYPE_NULL
+ ComplexType: IMAGE_SYM_DTYPE_FUNCTION
+ StorageClass: IMAGE_SYM_CLASS_EXTERNAL
+...
+
+
+#--- coff-symtab.ll
+target triple = "x86_64-unknown-windows-msvc"
+
+define void @b() { ret void }
+define void @c() { ret void }
+define void @a() { ret void }
+
+#--- elf.yaml
+--- !ELF
+FileHeader:
+ Class: ELFCLASS64
+ Data : ELFDATA2LSB
+ Type: ET_REL
+ Machine: EM_X86_64
+Sections:
+ - Name: .text
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+ AddressAlign: 0x0000000000000004
+ Content: ''
+Symbols:
+ - Name: b
+ Binding: STB_GLOBAL
+ Section: .text
+ - Name: c
+ Binding: STB_GLOBAL
+ Section: .text
+ - Name: a
+ Binding: STB_GLOBAL
+ Section: .text
+...
diff --git a/llvm/tools/llvm-ar/llvm-ar.cpp b/llvm/tools/llvm-ar/llvm-ar.cpp
index c8800303bc1e42..c58a85c695dacf 100644
--- a/llvm/tools/llvm-ar/llvm-ar.cpp
+++ b/llvm/tools/llvm-ar/llvm-ar.cpp
@@ -82,6 +82,7 @@ static void printArHelp(StringRef ToolName) {
=darwin - darwin
=bsd - bsd
=bigarchive - big archive (AIX OS)
+ =coff - coff
--plugin=<string> - ignored for compatibility
-h --help - display this help and exit
--output - the directory to extract archive members to
@@ -193,7 +194,7 @@ static SmallVector<const char *, 256> PositionalArgs;
static bool MRI;
namespace {
-enum Format { Default, GNU, BSD, DARWIN, BIGARCHIVE, Unknown };
+enum Format { Default, GNU, COFF, BSD, DARWIN, BIGARCHIVE, Unknown };
}
static Format FormatType = Default;
@@ -1044,6 +1045,9 @@ static void performWriteOperation(ArchiveOperation Operation,
case GNU:
Kind = object::Archive::K_GNU;
break;
+ case COFF:
+ Kind = object::Archive::K_COFF;
+ break;
case BSD:
if (Thin)
fail("only the gnu format has a thin mode");
@@ -1376,6 +1380,7 @@ static int ar_main(int argc, char **argv) {
.Case("darwin", DARWIN)
.Case("bsd", BSD)
.Case("bigarchive", BIGARCHIVE)
+ .Case("coff", COFF)
.Default(Unknown);
if (FormatType == Unknown)
fail(std::string("Invalid format ") + Match);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! This looks like a very straightforward refactoring, with a minimal behaviour change on top.
(In principle, this could maybe have been split into a pure NFC refactoring, then a separate step that only adds the COFF
cases on top, but I don't think it's worth the extra effort here, as it's trivial enough anyway, and stacked PRs are annoying.)
if (T.isOSWindows()) | ||
return object::Archive::K_COFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little nervous about how this impacts the default archive format on a Windows host: our downstream uses a cross-compiler, typically hosted on Windows, but using GNU-style archives (and the objects therefore have a different triple too).
I'm going to suggest to our internal binutils team that they run some testing with this patch applied, to see if it makes any difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note the host matters only when llvm-ar
can't infer the format from any other source. In practice, it means that it matters only when creating empty archives and archives who's first member is a non-symbolic files. In typical use case, when you pass object files, this PR shouldn't change anything unless you use COFF files.
I could limit impact of this PR by not changing the effect of getDefaultKindForHost
, but it feels right to be consistent. If we consider those cases to be a problem, then it's also problematic for existing isOSDarwin()
and isOSAIX()
checks (which have even more significant format differences). If user cares about the format and uses llvm-ar
on non-symbolic files, then an explicit --format
argument is the only reliable way I can see to handle all cross compilation variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jh7370, I checked and this will change our downstream ar's output in the cases @cjacek highlights above. However, due to @cjacek's points I cannot argue against it's submission.
The current behavior that default output format is defined by host does not seem good for the cross-compile use case in general. It may be preferable for the default to be based on the CMAKE option LLVM_DEFAULT_TARGET_TRIPLE or something similar to gnu binutils and the GNUTARGET environment variable. Both are likely out of the scope of this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM_DEFAULT_TARGET_TRIPLE sounds like a good idea, I will create a separated PR for it, thanks.
After years of shipping llvm-ar this change has brought up some interesting questions :) The main question I have is: do we really want the behaviour where the input file type determines the output file type? I'm not sure why this was implemented - and now that I think about it is seems indefensible. For some reason I accepted this behaviour without questioning it, but, looking at it with fresh eyes, I think that the behaviour should be that the output type has a fixed default and can be overridden by an explicit --target option? |
In theory, the behavior difference between llvm-ar and native "ar" could cause issues in edge cases. In practice, I've never heard of it causing anyone issues. Build systems generally don't create empty archives, or stick random files into archives. And even if they do, it only matters if you try to mix llvm-ar with a non-LLVM native archiver/linker; lld can parse any archive format. |
Detect COFF files by default and allow specifying it with --format argument.
Reverts #82642 for lld/test/ELF/invalid/Output/data-encoding.test.tmp.a failures on Windows.
Yes. I think for the reasons you mentioned this is isn't particularly important. It would be nice for the behavior make sense as far as possible though. In offline conversation @jh7370 noted the following:
So, I think that no one has noticed this up to now and we could change this behaviour. |
Detect COFF files by default and allow specifying it with --format argument. This is important for ARM64EC, which uses a separated symbol map for EC symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't really make a difference for other targets.
Detect COFF files by default and allow specifying it with --format argument. This is important for ARM64EC, which uses a separated symbol map for EC symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't really make a difference for other targets. This originally landed as llvm#82642, but was reverted due to test failures in tests using no symbol table. Since COFF symbol can't express it, fallback to GNU format in that case.
Detect COFF files by default and allow specifying it with --format argument. This is important for ARM64EC, which uses a separated symbol map for EC symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't really make a difference for other targets. This originally landed as #82642, but was reverted due to test failures in tests using no symbol table. Since COFF symbol can't express it, fallback to GNU format in that case.
Detect COFF files by default and allow specifying it with --format argument.
This is important for ARM64EC, which uses a separated symbol map for EC symbols. Since K_COFF is mostly compatible with K_GNU, this shouldn't really make a difference for other targets.