Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ELF] Fix unnecessary inclusion of unreferenced provide symbols #84512

Closed
wants to merge 1 commit into from

Conversation

partaror
Copy link
Contributor

@partaror partaror commented Mar 8, 2024

Previously, linker was unnecessarily including a PROVIDE symbol which was referenced by another unused PROVIDE symbol. For example, if a linker script contained the below code and 'not_used_sym' provide symbol is not included, then linker was still unnecessarily including 'foo' PROVIDE symbol because it was referenced by 'not_used_sym'. This commit fixes this behavior.

PROVIDE(not_used_sym = foo)
PROVIDE(foo = 0x1000)

This commit fixes this behavior by using dfs-like algorithm to find all the symbols referenced in provide expressions of included provide symbols.

This commit also fixes the issue of unused section not being garbage-collected if a symbol of the section is referenced by an unused PROVIDE symbol.

Closes #74771
Closes #84730

Copy link

github-actions bot commented Mar 8, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 8, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: None (partaror)

Changes

Previously, linker was unnecessarily including a PROVIDE symbol which was referenced by another unused PROVIDE symbol. For example, if a linker script contained the below code and 'not_used_sym' provide symbol is not included, then linker was still unnecessarily including 'foo' PROVIDE symbol because it was referenced by 'not_used_sym'. This commit fixes this behavior.

PROVIDE(not_used_sym = foo)
PROVIDE(foo = 0x1000)

This commit fixes this behavior by using dfs-like algorithm to find all the symbols referenced in provide expressions of included provide symbols.

Closes #74771


Full diff: https://github.com/llvm/llvm-project/pull/84512.diff

4 Files Affected:

  • (modified) lld/ELF/Driver.cpp (+51-7)
  • (modified) lld/ELF/LinkerScript.h (+10)
  • (modified) lld/ELF/ScriptParser.cpp (+12-1)
  • (modified) lld/test/ELF/linkerscript/symbolreferenced.s (+16-3)
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 24faa1753f1e3d..fff81d17755945 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2659,6 +2659,56 @@ static void postParseObjectFile(ELFFileBase *file) {
   }
 }
 
+// Returns true if the provide symbol should be added to the link.
+bool shouldAddProvideSym(StringRef symName) {
+  Symbol *b = symtab.find(symName);
+  return b && !b->isDefined() && !b->isCommon();
+}
+
+// Add symbols referred by the provide symbol to the symbol table.
+// This function must only be called for provide symbols that should be added
+// to the link.
+static void
+addProvideSymReferences(StringRef provideSym,
+                        llvm::StringSet<> &addedRefsFromProvideSym) {
+
+  if (addedRefsFromProvideSym.count(provideSym))
+    return;
+  assert((shouldAddProvideSym(provideSym) || !symtab.find(provideSym)) &&
+         "This function must only be called for provide symbols that should be "
+         "added to the link.");
+  addedRefsFromProvideSym.insert(provideSym);
+  for (StringRef name : script->provideMap[provideSym].keys()) {
+    script->referencedSymbols.push_back(name);
+    if (script->provideMap.count(name) &&
+        (shouldAddProvideSym(name) || !symtab.find(name)) &&
+        !addedRefsFromProvideSym.count(name))
+      addProvideSymReferences(name, addedRefsFromProvideSym);
+  }
+}
+
+// Add symbols that are referenced in the linker script.
+// Symbols referenced in a PROVIDE command are only added to the symbol table if
+// the PROVIDE command actually provides the symbol.
+static void addScriptReferencedSymbols() {
+  // Keeps track of references from which PROVIDE symbols have been added to the
+  // symbol table.
+  llvm::StringSet<> addedRefsFromProvideSym;
+  for (StringRef provideSym : script->provideMap.keys()) {
+    if (!addedRefsFromProvideSym.count(provideSym) &&
+        shouldAddProvideSym(provideSym))
+      addProvideSymReferences(provideSym, addedRefsFromProvideSym);
+  }
+
+  // Some symbols (such as __ehdr_start) are defined lazily only when there
+  // are undefined symbols for them, so we add these to trigger that logic.
+  for (StringRef name : script->referencedSymbols) {
+    Symbol *sym = addUnusedUndefined(name);
+    sym->isUsedInRegularObj = true;
+    sym->referenced = true;
+  }
+}
+
 // Do actual linking. Note that when this function is called,
 // all linker scripts have already been parsed.
 void LinkerDriver::link(opt::InputArgList &args) {
@@ -2735,13 +2785,7 @@ void LinkerDriver::link(opt::InputArgList &args) {
   config->hasDynSymTab =
       !ctx.sharedFiles.empty() || config->isPic || config->exportDynamic;
 
-  // Some symbols (such as __ehdr_start) are defined lazily only when there
-  // are undefined symbols for them, so we add these to trigger that logic.
-  for (StringRef name : script->referencedSymbols) {
-    Symbol *sym = addUnusedUndefined(name);
-    sym->isUsedInRegularObj = true;
-    sym->referenced = true;
-  }
+  addScriptReferencedSymbols();
 
   // Prevent LTO from removing any definition referenced by -u.
   for (StringRef name : config->undefined)
diff --git a/lld/ELF/LinkerScript.h b/lld/ELF/LinkerScript.h
index 18eaf58b785e37..9cd2d3f273d70f 100644
--- a/lld/ELF/LinkerScript.h
+++ b/lld/ELF/LinkerScript.h
@@ -16,7 +16,9 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/MapVector.h"
+#include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSet.h"
 #include "llvm/Support/Compiler.h"
 #include <cstddef>
 #include <cstdint>
@@ -379,6 +381,14 @@ class LinkerScript final {
 
   // Sections that will be warned/errored by --orphan-handling.
   SmallVector<const InputSectionBase *, 0> orphanSections;
+
+  // Stores the mapping: provide symbol -> symbols referred in the provide
+  // expression. For example, if the PROVIDE command is:
+  //
+  // PROVIDE(v = a + b + c);
+  //
+  // then provideMap should contain the mapping: 'v' -> ['a', 'b', 'c']
+  llvm::StringMap<llvm::StringSet<>> provideMap;
 };
 
 LLVM_LIBRARY_VISIBILITY extern std::unique_ptr<LinkerScript> script;
diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 3bb1de99480f30..e9953f44156e31 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -36,6 +36,7 @@
 #include "llvm/Support/TimeProfiler.h"
 #include <cassert>
 #include <limits>
+#include <optional>
 #include <vector>
 
 using namespace llvm;
@@ -138,6 +139,10 @@ class ScriptParser final : ScriptLexer {
 
   // A set to detect an INCLUDE() cycle.
   StringSet<> seen;
+
+  // If we are currently parsing a PROVIDE|PROVIDE_HIDDEN command,
+  // then this member is set to the provide symbol name.
+  std::optional<llvm::StringRef> activeProvideSym;
 };
 } // namespace
 
@@ -1055,6 +1060,9 @@ SymbolAssignment *ScriptParser::readProvideHidden(bool provide, bool hidden) {
       ;
     return nullptr;
   }
+  llvm::SaveAndRestore saveActiveProvideSym(activeProvideSym);
+  if (provide)
+    activeProvideSym = name;
   SymbolAssignment *cmd = readSymbolAssignment(name);
   cmd->provide = provide;
   cmd->hidden = hidden;
@@ -1570,7 +1578,10 @@ Expr ScriptParser::readPrimary() {
     tok = unquote(tok);
   else if (!isValidSymbolName(tok))
     setError("malformed number: " + tok);
-  script->referencedSymbols.push_back(tok);
+  if (activeProvideSym)
+    script->provideMap[activeProvideSym.value()].insert(tok);
+  else
+    script->referencedSymbols.push_back(tok);
   return [=] { return script->getSymbolValue(tok, location); };
 }
 
diff --git a/lld/test/ELF/linkerscript/symbolreferenced.s b/lld/test/ELF/linkerscript/symbolreferenced.s
index ba7a7721ea9697..587446b1dcb52a 100644
--- a/lld/test/ELF/linkerscript/symbolreferenced.s
+++ b/lld/test/ELF/linkerscript/symbolreferenced.s
@@ -23,9 +23,16 @@
 
 # CHECK:      0000000000001000 a f1
 # CHECK-NEXT: 0000000000001000 A f2
-# CHECK-NEXT: 0000000000001000 a g1
-# CHECK-NEXT: 0000000000001000 A g2
+# CHECK-NEXT: 0000000000001000 A f3
+# CHECK-NEXT: 0000000000001000 A f4
+# CHECK-NEXT: 0000000000001000 A f5
+# CHECK-NEXT: 0000000000001000 A f6
+# CHECK-NEXT: 0000000000001000 A f7
 # CHECK-NEXT: 0000000000001000 A newsym
+# CHECK-NOT: g1
+# CHECK-NOT: g2
+# CHECK-NOT: unused
+# CHECK-NOT: another_unused
 
 # RUN: not ld.lld -T chain2.t a.o 2>&1 | FileCheck %s --check-prefix=ERR --implicit-check-not=error:
 # ERR-COUNT-3: error: chain2.t:1: symbol not found: undef
@@ -40,13 +47,19 @@ patatino:
   movl newsym, %eax
 
 #--- chain.t
-PROVIDE(f2 = 0x1000);
+PROVIDE(f7 = 0x1000);
+PROVIDE(f5 = f6);
+PROVIDE(f6 = f7);
+PROVIDE(f4 = f5);
+PROVIDE(f3 = f4);
+PROVIDE(f2 = f3);
 PROVIDE_HIDDEN(f1 = f2);
 PROVIDE(newsym = f1);
 
 PROVIDE(g2 = 0x1000);
 PROVIDE_HIDDEN(g1 = g2);
 PROVIDE(unused = g1);
+PROVIDE_HIDDEN(another_unused = g1);
 
 #--- chain2.t
 PROVIDE(f2 = undef);

@partaror partaror force-pushed the provide-sym-references-fix branch 3 times, most recently from 1b58d0d to 51c2939 Compare March 11, 2024 09:05
@partaror
Copy link
Contributor Author

Gentle ping for review @MaskRay @rui314

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. The code structure looks really good.

lld/ELF/Driver.cpp Outdated Show resolved Hide resolved
// Keeps track of references from which PROVIDE symbols have been added to the
// symbol table.
llvm::StringSet<> addedRefsFromProvideSym;
for (StringRef provideSym : script->provideMap.keys()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://llvm.org/docs/CodingStandards.html#beware-of-non-determinism-due-to-ordering-of-pointers

Consider MapVector<StringRef, SmallVector<StringRef,0> > (the inner data structure doesn't have to be a Set, which is expensive)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. I did not think about non-determinism previously. I have now changed the code to use MapVector<StringRef, SmallVector<StringRef, 0>>.

I want to add one thing: Set is relatively more expensive than vector, however, it can help in avoiding more expensive computations. For example, currently, we store symbols referenced in the linker script in SmallVector<llvm::StringRef, 0> LinkerScript::referencedSymbols;. We have to do some processing on each symbol present in this list, such as, adding the symbol to the symbol table. If a symbol S is referenced 10 times in the linker script, then it will be present in referencedSymbols 10 times as well and therefore it will be processed 10 times. In such a case, would it be better to use Set instead of a vector?

lld/test/ELF/gc-sections-with-provide.s Outdated Show resolved Hide resolved
lld/test/ELF/linkerscript/symbolreferenced.s Outdated Show resolved Hide resolved
lld/ELF/Driver.cpp Outdated Show resolved Hide resolved
lld/ELF/Driver.cpp Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Mar 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@partaror
Copy link
Contributor Author

Gentle ping for review @MaskRay

lld/ELF/LinkerScript.cpp Outdated Show resolved Hide resolved
lld/ELF/LinkerScript.cpp Outdated Show resolved Hide resolved
lld/ELF/ScriptParser.cpp Outdated Show resolved Hide resolved

Symbol *SymbolTable::addUnusedUndefined(StringRef name, uint8_t binding) {
return addSymbol(Undefined{ctx.internalFile, name, binding, STV_DEFAULT, 0});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensure EOL at file end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching it. Should we add insertNewlineAtEOF option to .clang-format?

lld/test/ELF/gc-sections-with-provide.s Outdated Show resolved Hide resolved
lld/ELF/LinkerScript.cpp Outdated Show resolved Hide resolved
static void
addProvideSymReferences(StringRef provideSym,
llvm::StringSet<> &addedRefsFromProvideSym) {
assert(LinkerScript::shouldAddProvideSym(provideSym) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define the static function as a lambda in addScriptReferencedSymbolsToSymTable instead so that you can just reference the local variable addedRefsFromProvideSym.

Then this assert doesn't seem that useful since the code is very short.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This static function is recursive. Lambdas cannot be recursive without using more complicated mechanism such as std::function or adding an explicit parameter that would refer to the lambda in the lambda call operator. These techniques are explained here: Recursive lambda functions in C++14

Please give your suggestion regarding what should we do here. std::function has performance cost due to the use of type-erasure and passing the lambda to the lambda call (for ex, L(L, x, ...)) leads to (slightly) ugly code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am writing a version that uses a stack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/MaskRay/llvm-project/tree/my-provide
Feel free to cherry pick it into this patch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for providing the stack-based implementation. I have cherry-picked your commit into this patch.

lld/test/ELF/gc-sections-with-provide.s Show resolved Hide resolved
@partaror partaror force-pushed the provide-sym-references-fix branch 3 times, most recently from 34a77c8 to 16fad83 Compare March 25, 2024 21:06
@partaror partaror requested a review from MaskRay March 25, 2024 21:08
Previously, linker was unnecessarily including a PROVIDE symbol which
was referenced by another unused PROVIDE symbol. For example, if a
linker script contained the below code and 'not_used_sym' provide symbol
is not included, then linker was still unnecessarily including 'foo' PROVIDE
symbol because it was referenced by 'not_used_sym'. This commit fixes
this behavior.

PROVIDE(not_used_sym = foo)
PROVIDE(foo = 0x1000)

This commit fixes this behavior by using dfs-like algorithm to find
all the symbols referenced in provide expressions of included provide
symbols.

This commit also fixes the issue of unused section not being garbage-collected
if a symbol of the section is referenced by an unused PROVIDE symbol.

Closes llvm#74771
Closes llvm#84730

Co-authored-by: Fangrui Song <i@maskray.me>
@MaskRay
Copy link
Member

MaskRay commented Mar 25, 2024

Congratulations on the first patch:) Very well prepared.

I am landing this patch manually as you have a hidden email address. This repository prefers a non-hidden email address
https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it/74223/32
(otherwise, you won't even receive build bot notifications)

@MaskRay
Copy link
Member

MaskRay commented Mar 25, 2024

Landed as ebb326a

@MaskRay MaskRay closed this Mar 25, 2024
@partaror
Copy link
Contributor Author

Congratulations on the first patch:) Very well prepared.

Thank you :).

I am landing this patch manually as you have a hidden email address. This repository prefers a non-hidden email address

Thank you for informing about this. I will fix the hidden-email issue.

@statham-arm
Copy link
Collaborator

Hi @partaror ,

I think this commit has caused some link failures in edge cases. I have two related examples here which didn't fail before this commit landed, and do fail immediately after it.

Input file foo.c:

char *_start(void) {
    extern char var2[];
    extern int i;
    return var2 + i;
}

Input file bar.c:

int i;

Input file quux.c:

char var1[123];

Link script foo.ld:

ENTRY(_start)
PROVIDE(var1 = 0x1000);
PROVIDE(var2 = var1);

Link script quux.ld:

ENTRY(_start)
PROVIDE(var2 = var1);

Build commands:

clang --target=arm-none-eabi -march=armv7m -c foo.c bar.c quux.c
llvm-ar -rcs foo.a foo.o
llvm-ar -rcs quux.a foo.o quux.o
ld.lld -T foo.ld -o foo bar.o foo.a
ld.lld -T quux.ld -o quux bar.o quux.a

Both links fail with errors like this:

ld.lld: error: foo.ld:3: symbol not found: var1
ld.lld: error: foo.ld:3: symbol not found: var1
ld.lld: error: foo.ld:3: symbol not found: var1
ld.lld: error: quux.ld:2: symbol not found: var1
ld.lld: error: quux.ld:2: symbol not found: var1
ld.lld: error: quux.ld:2: symbol not found: var1

My analysis:

In both of these links, the entry point is in the object file foo.o, which isn't listed explicitly on the lld command line, but instead is loaded from a library. That object refers to var2, which is defined in the ld script via PROVIDE. lld successfully works out that it must turn on that provide statement, but it doesn't manage to pull in the transitive requirement of var1, which var2 is defined in terms of.

In the foo.ld example, var1 is another PROVIDE in the same ld script. The function that normally deals with transitive dependencies from one provide to another is LinkerScript::addScriptReferencedSymbolsToSymTable(), but in this case, I think it's already run by the time this dependency is discovered, so it's too late.

In the quux.ld example, var1 is defined in another library object file, and the library quux.a is not searched for the symbol.

@statham-arm statham-arm reopened this Apr 3, 2024
@MaskRay
Copy link
Member

MaskRay commented Apr 3, 2024

edit: Thanks for the report. -u _start can work around the case you reported. I think we shall handle --entry like -u, but there is some complexity because config->entry can be an integer.

I am testing a patch.

MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Apr 3, 2024
When archive member extraction involving ENTRY happens after
`addScriptReferencedSymbolsToSymTable`,
`addScriptReferencedSymbolsToSymTable` may fail to define some PROVIDE
symbols used by ENTRY. This is an edge case that regressed after llvm#84512.
(The interaction with PROVIDE and ENTRY-in-archive was not considered
before).

Fixes: ebb326a
@MaskRay
Copy link
Member

MaskRay commented Apr 3, 2024

@statham-arm Created #87530 . Will it address your issue?

@partaror
Copy link
Contributor Author

partaror commented Apr 3, 2024

Hi @statham-arm Thank you for reporting the issue and detailed analysis. @MaskRay has the fix ready. Thank you @MaskRay for the quick resolution.

MaskRay added a commit that referenced this pull request Apr 4, 2024
When archive member extraction involving ENTRY happens after
`addScriptReferencedSymbolsToSymTable`,
`addScriptReferencedSymbolsToSymTable` may fail to define some PROVIDE
symbols used by ENTRY. This is an edge case that regressed after #84512.
(The interaction with PROVIDE and ENTRY-in-archive was not considered
before).

While here, also ensure that --undefined-glob extracted object files
are parsed before `addScriptReferencedSymbolsToSymTable`.

Fixes: ebb326a

Pull Request: #87530
@MaskRay MaskRay closed this Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ELF] Unused symbol is not being garbage-collected lld provide - provides more symbols than needed
4 participants