[LLD][COFF] Deduplicate common chunks when linking COFF files. #162553
@llvm/pr-subscribers-platform-windows
Author: Joshua Cranmer (jcranmer-intel)
Changes
This fixes issue 162148.
Common symbols are intended to have only a single version of the data present in the final executable. The MSVC linker is able to successfully deduplicate these chunks. If you have an application with a large number of translation units with a large block of common data (this is possible, for example, with Fortran code), then failing to deduplicate these chunks can make the data size so large that the resulting executable fails to load.
The logic in this patch doesn't catch all of the potential cases for deduplication, but it should catch the most common ones.
Full diff: https://github.com/llvm/llvm-project/pull/162553.diff
4 Files Affected:
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index ff3c89884c24d..d752a5bef7594 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -773,7 +773,7 @@ uint32_t SectionChunk::getSectionNumber() const {
return s.getIndex() + 1;
}
-CommonChunk::CommonChunk(const COFFSymbolRef s) : sym(s) {
+CommonChunk::CommonChunk(const COFFSymbolRef s) : active(false), sym(s) {
// The value of a common symbol is its size. Align all common symbols smaller
// than 32 bytes naturally, i.e. round the size up to the next power of two.
// This is what MSVC link.exe does.
diff --git a/lld/COFF/Chunks.h b/lld/COFF/Chunks.h
index 7ba58e336451f..bf05d547f9c88 100644
--- a/lld/COFF/Chunks.h
+++ b/lld/COFF/Chunks.h
@@ -520,6 +520,8 @@ class CommonChunk : public NonSectionChunk {
uint32_t getOutputCharacteristics() const override;
StringRef getSectionName() const override { return ".bss"; }
+ bool active;
+
private:
const COFFSymbolRef sym;
};
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 465d4df52c630..e166329a66bdf 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -233,6 +233,8 @@ class DefinedCommon : public DefinedCOFF {
CommonChunk *c = nullptr)
: DefinedCOFF(DefinedCommonKind, f, n, s), data(c), size(size) {
this->isExternal = true;
+ if (c)
+ c->active = true;
}
static bool classof(const Symbol *s) {
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 3d95d219a493c..e365eb140f52b 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1093,6 +1093,10 @@ void Writer::createSections() {
sc->printDiscardedMessage();
continue;
}
+ if (auto *cc = dyn_cast<CommonChunk>(c)) {
+ if (!cc->active)
+ continue;
+ }
StringRef name = c->getSectionName();
if (shouldStripSectionSuffix(sc, name, ctx.config.mingw))
name = name.split('$').first;
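To make the idea in the diff easier to follow, here is a minimal, self-contained sketch of the same "mark the chunk active when a symbol adopts it, skip inactive chunks when laying out sections" pattern. It is a toy model, not lld's code: `ToyCommonChunk`, `ToyDefinedCommon`, `ToyLinker`, `addCommon`, and `bssSize` are hypothetical names invented for illustration. It also assumes that the symbol table keeps the first definition of a common symbol and replaces it only when a larger one arrives.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a common chunk (hypothetical, not lld's CommonChunk).
struct ToyCommonChunk {
  explicit ToyCommonChunk(uint64_t size) : size(size) {}
  uint64_t size;
  bool active = false; // becomes true once a symbol points at this chunk
};

// Toy stand-in for a resolved common symbol. As in the patch, constructing
// the symbol is what marks its chunk as active.
struct ToyDefinedCommon {
  ToyDefinedCommon(uint64_t size, ToyCommonChunk *c) : size(size), data(c) {
    if (c)
      c->active = true;
  }
  uint64_t size;
  ToyCommonChunk *data;
};

class ToyLinker {
public:
  // Called once per object file that declares the common symbol `name`.
  // Every call creates a chunk, but only some calls create a symbol.
  void addCommon(const std::string &name, uint64_t size) {
    chunks.push_back(std::make_unique<ToyCommonChunk>(size));
    ToyCommonChunk *c = chunks.back().get();
    auto it = symbols.find(name);
    if (it == symbols.end())
      symbols.emplace(name, ToyDefinedCommon(size, c)); // first definition
    else if (size > it->second.size)
      it->second = ToyDefinedCommon(size, c); // assumed rule: larger wins
    // Otherwise the new chunk is never referenced and stays inactive.
  }

  // Total bytes that would be emitted for common data. With skipInactive
  // set, chunks that no symbol ever pointed at are left out.
  uint64_t bssSize(bool skipInactive) const {
    uint64_t total = 0;
    for (const auto &c : chunks) {
      if (skipInactive && !c->active)
        continue;
      total += c->size;
    }
    return total;
  }

private:
  std::vector<std::unique_ptr<ToyCommonChunk>> chunks;
  std::map<std::string, ToyDefinedCommon> symbols;
};
```

In this model, 100 calls to `addCommon("big_block", 1 << 20)` give `bssSize(false) == 100 * (1 << 20)` but `bssSize(true) == 1 << 20`. A smaller definition that is later replaced by a larger one still leaves its chunk marked active (sizes 8 then 16 give 24 bytes), which is one kind of case a flag like this would not catch.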
I apologize for the lack of a test with this change. I'm not exactly sure how to write an LLD test that links multiple objects together, and would like some guidance on how to do so. (I do have some source test files from #162148, though.)
If you need multiple input files, you can use split-file, or stick extra files in the Inputs/ directory. There are a lot of examples of both.
ed74c23 to 543500a
I've included a test based on split-file now. Let me know if you need more comprehensive testing for this change.