Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ELF] Merge verdefIndex into versionId. NFC #72208

Merged
merged 1 commit into from
Nov 14, 2023
Merged

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Nov 14, 2023

The two fields are similar.

versionId is the Verdef index in the output file. It is set for
--exclude-id=, version script patterns, and sym@ver symbols.

verdefIndex is the Verdef index of a Sharedfile (SharedSymbol or a
copy-relocated Defined), the default value -1 is also used to indicate
that the symbol has not been matched by a version script pattern
(https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for #70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).

The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
version script patterns and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a SharedSymbol. The default value
-1 is also used to indicate that the symbol has not been matched by a
version script pattern (https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for llvm#70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 14, 2023

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: Fangrui Song (MaskRay)

Changes

The two fields are similar.

versionId is the Verdef index in the output file. It is set for
version script patterns and sym@<!-- -->ver symbols.

verdefIndex is the Verdef index of a SharedSymbol. The default value
-1 is also used to indicate that the symbol has not been matched by a
version script pattern (https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for #70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).


Full diff: https://github.com/llvm/llvm-project/pull/72208.diff

5 Files Affected:

  • (modified) lld/ELF/InputFiles.cpp (+2-2)
  • (modified) lld/ELF/Relocations.cpp (-1)
  • (modified) lld/ELF/SymbolTable.cpp (+5-7)
  • (modified) lld/ELF/Symbols.h (+5-5)
  • (modified) lld/ELF/SyntheticSections.cpp (+4-6)
diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index 8c7f2c8773f2cbc..4b4d7d6db93cd57 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -1546,7 +1546,7 @@ template <class ELFT> void SharedFile::parse() {
           SharedSymbol{*this, name, sym.getBinding(), sym.st_other,
                        sym.getType(), sym.st_value, sym.st_size, alignment});
       if (s->file == this)
-        s->verdefIndex = ver;
+        s->versionId = ver;
     }
 
     // Also add the symbol with the versioned name to handle undefined symbols
@@ -1563,7 +1563,7 @@ template <class ELFT> void SharedFile::parse() {
         SharedSymbol{*this, saver().save(name), sym.getBinding(), sym.st_other,
                      sym.getType(), sym.st_value, sym.st_size, alignment});
     if (s->file == this)
-      s->verdefIndex = idx;
+      s->versionId = idx;
   }
 }
 
diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index 6765461805dadb0..d4685ce70ece8cb 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -309,7 +309,6 @@ static void replaceWithDefined(Symbol &sym, SectionBase &sec, uint64_t value,
           size, &sec)
       .overwrite(sym);
 
-  sym.verdefIndex = old.verdefIndex;
   sym.exportDynamic = true;
   sym.isUsedInRegularObj = true;
   // A copy relocated alias may need a GOT entry.
diff --git a/lld/ELF/SymbolTable.cpp b/lld/ELF/SymbolTable.cpp
index fe7edd5b0eb49aa..b3d97e4de779068 100644
--- a/lld/ELF/SymbolTable.cpp
+++ b/lld/ELF/SymbolTable.cpp
@@ -92,7 +92,6 @@ Symbol *SymbolTable::insert(StringRef name) {
   memset(sym, 0, sizeof(Symbol));
   sym->setName(name);
   sym->partition = 1;
-  sym->verdefIndex = -1;
   sym->versionId = VER_NDX_GLOBAL;
   if (pos != StringRef::npos)
     sym->hasVersionSuffix = true;
@@ -235,10 +234,9 @@ bool SymbolTable::assignExactVersion(SymbolVersion ver, uint16_t versionId,
         sym->getName().contains('@'))
       continue;
 
-    // If the version has not been assigned, verdefIndex is -1. Use an arbitrary
-    // number (0) to indicate the version has been assigned.
-    if (sym->verdefIndex == uint16_t(-1)) {
-      sym->verdefIndex = 0;
+    // If the version has not been assigned, assign versionId to the symbol.
+    if (!sym->versionScriptAssigned) {
+      sym->versionScriptAssigned = true;
       sym->versionId = versionId;
     }
     if (sym->versionId == versionId)
@@ -256,8 +254,8 @@ void SymbolTable::assignWildcardVersion(SymbolVersion ver, uint16_t versionId,
   // so we set a version to a symbol only if no version has been assigned
   // to the symbol. This behavior is compatible with GNU.
   for (Symbol *sym : findAllByVersion(ver, includeNonDefault))
-    if (sym->verdefIndex == uint16_t(-1)) {
-      sym->verdefIndex = 0;
+    if (!sym->versionScriptAssigned) {
+      sym->versionScriptAssigned = true;
       sym->versionId = versionId;
     }
 }
diff --git a/lld/ELF/Symbols.h b/lld/ELF/Symbols.h
index 4addb79d1257914..0c79efccb548680 100644
--- a/lld/ELF/Symbols.h
+++ b/lld/ELF/Symbols.h
@@ -313,11 +313,12 @@ class Symbol {
   uint32_t auxIdx;
   uint32_t dynsymIndex;
 
-  // This field is a index to the symbol's version definition.
-  uint16_t verdefIndex;
-
-  // Version definition index.
+  // For a Defined symbol, this represents the Verdef index (VER_NDX_LOCAL,
+  // VER_NDX_GLOBAL, or a named version). For a SharedSymbol, this represents
+  // the Verdef index within the input DSO, which will be converted to a Verneed
+  // index in the output.
   uint16_t versionId;
+  uint8_t versionScriptAssigned : 1;
 
   void setFlags(uint16_t bits) {
     flags.fetch_or(bits, std::memory_order_relaxed);
@@ -357,7 +358,6 @@ class Defined : public Symbol {
   }
   void overwrite(Symbol &sym) const {
     Symbol::overwrite(sym, DefinedKind);
-    sym.verdefIndex = -1;
     auto &s = static_cast<Defined &>(sym);
     s.value = value;
     s.size = size;
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index 0f7ebf9d7ba840b..2b32eb3a0fe3558 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -3140,10 +3140,8 @@ bool VersionTableSection::isNeeded() const {
 
 void elf::addVerneed(Symbol *ss) {
   auto &file = cast<SharedFile>(*ss->file);
-  if (ss->verdefIndex == VER_NDX_GLOBAL) {
-    ss->versionId = VER_NDX_GLOBAL;
+  if (ss->versionId == VER_NDX_GLOBAL)
     return;
-  }
 
   if (file.vernauxs.empty())
     file.vernauxs.resize(file.verdefs.size());
@@ -3152,10 +3150,10 @@ void elf::addVerneed(Symbol *ss) {
   // already allocated one. The verdef identifiers cover the range
   // [1..getVerDefNum()]; this causes the vernaux identifiers to start from
   // getVerDefNum()+1.
-  if (file.vernauxs[ss->verdefIndex] == 0)
-    file.vernauxs[ss->verdefIndex] = ++SharedFile::vernauxNum + getVerDefNum();
+  if (file.vernauxs[ss->versionId] == 0)
+    file.vernauxs[ss->versionId] = ++SharedFile::vernauxNum + getVerDefNum();
 
-  ss->versionId = file.vernauxs[ss->verdefIndex];
+  ss->versionId = file.vernauxs[ss->versionId];
 }
 
 template <class ELFT>

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me. May be worth waiting a bit to see if any other reviewers have any comments too.

uint16_t versionId;
uint8_t versionScriptAssigned : 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to move this below thunkAccessed as I think that would mean that bit could be part of the uint8_t container (I count 7 bits so far). Or is the intention to start a new container.

Copy link
Member

@arichardson arichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LGTM, but I also believe moving the bitfield to be part of the previous uint8 would be possible. Unless you already have a follow-up patch that makes use of that bit?

@MaskRay
Copy link
Member Author

MaskRay commented Nov 14, 2023

Thanks for the review.

Would it be better to move this below thunkAccessed as I think that would mean that bit could be part of the uint8_t container (I count 7 bits so far). Or is the intention to start a new container.

Unfortunately no, thunkAccessed is the last bit before byte offset 26.

(ccls-member-hierarchy)

a

@wolfy1961
Copy link
Collaborator

For some reason this is causing the test compiler-rt/test/fuzzer/gc-sections.test to fail on a vanilla Ubuntu 22.04 x86_64 machine with an asan failure:

INFO: Seed: 3367176086
INFO: Loaded 1 modules   (1 inline 8-bit counters): 1 [0x5653fa023928, 0x5653fa023929),
INFO: Loaded 1 PC tables (1 PCs): 1 [0x5653fa023930,0x5653fa023940),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
=================================================================
==1817959==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete) on 0x516000000380
    #0 0x5653fa01b257  (/home/test/build/llvm/llvm-RelWithDebInfo/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/gc-sections.test.tmp+0x1f1257)
    ...
    0x516000000380 is located 0 bytes inside of 513-byte region [0x516000000380,0x516000000581)
allocated by thread T0 here:
    #0 0x5653f9fdef1f  (/home/test/build/llvm/llvm-RelWithDebInfo/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/gc-sections.test.tmp+0x1b4f1f)
    ...

Strangely, I don't see it reflected on any other buildbot anywhere, but some investigation shows the following:

The patch somehow seems to prevent asan instrumentation from intercepting operator new (allocating a string buffer in a fuzzer routine). Asan is only able to intercept the call to malloc(), and so it attributes it as being allocated by malloc. When the buffer is deallocated at the end of the routine, it is deallocated with operator delete, which asan treats as a mismatch of allocation types.
When I generate the same executable with a compiler/linker built from the commit immediately preceding this one, asan does intercept operator new and the deallocation type matches the allocation type (i.e. from new).

@arichardson
Copy link
Member

Thanks for the review.

Would it be better to move this below thunkAccessed as I think that would mean that bit could be part of the uint8_t container (I count 7 bits so far). Or is the intention to start a new container.

Unfortunately no, thunkAccessed is the last bit before byte offset 26.

(ccls-member-hierarchy) a

Ah yes, I failed to scroll up far enough!

@glandium
Copy link
Contributor

This is also breaking building clang itself with instrumentation for PGO:

[4032/4536] : && /builds/worker/fetches/llvm-project/build/stage2/clang/bin/clang++ --sysroot=/builds/worker/fetches/sysroot -fPIC -fPIC -Qunused-arguments -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG  -Wl,-z,nodelete -fuse-ld=lld -Wl,--color-diagnostics   -Wl,--gc-sections  -Wl,--version-script,"/builds/worker/fetches/llvm-project/build/stage3/build/tools/clang/lib/Analysis/plugins/SampleAnalyzer/SampleAnalyzerPlugin.exports" -shared  -o lib/SampleAnalyzerPlugin.so tools/clang/lib/Analysis/plugins/SampleAnalyzer/CMakeFiles/SampleAnalyzerPlugin.dir/MainCallChecker.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/builds/worker/fetches/llvm-project/build/stage3/build/lib"  lib/libclang-cpp.so.18git  lib/libLLVM-18git.so && :
FAILED: lib/SampleAnalyzerPlugin.so 
: && /builds/worker/fetches/llvm-project/build/stage2/clang/bin/clang++ --sysroot=/builds/worker/fetches/sysroot -fPIC -fPIC -Qunused-arguments -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG  -Wl,-z,nodelete -fuse-ld=lld -Wl,--color-diagnostics   -Wl,--gc-sections  -Wl,--version-script,"/builds/worker/fetches/llvm-project/build/stage3/build/tools/clang/lib/Analysis/plugins/SampleAnalyzer/SampleAnalyzerPlugin.exports" -shared  -o lib/SampleAnalyzerPlugin.so tools/clang/lib/Analysis/plugins/SampleAnalyzer/CMakeFiles/SampleAnalyzerPlugin.dir/MainCallChecker.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/builds/worker/fetches/llvm-project/build/stage3/build/lib"  lib/libclang-cpp.so.18git  lib/libLLVM-18git.so && :
ld.lld: error: corrupt input file: version definition index 2 for symbol _end is out of bounds
>>> defined in lib/libclang-cpp.so.18git

ld.lld: error: corrupt input file: version definition index 2 for symbol __bss_start is out of bounds
>>> defined in lib/libclang-cpp.so.18git

ld.lld: error: corrupt input file: version definition index 2 for symbol _edata is out of bounds
>>> defined in lib/libclang-cpp.so.18git
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

MaskRay added a commit that referenced this pull request Nov 16, 2023
MaskRay added a commit that referenced this pull request Nov 16, 2023
Reverts #72208

If a unversioned Defined preempts a versioned DSO definition, the
version ID will not be reset.
@MaskRay
Copy link
Member Author

MaskRay commented Nov 16, 2023

For some reason this is causing the test compiler-rt/test/fuzzer/gc-sections.test to fail on a vanilla Ubuntu 22.04 x86_64 machine with an asan failure:

INFO: Seed: 3367176086
INFO: Loaded 1 modules   (1 inline 8-bit counters): 1 [0x5653fa023928, 0x5653fa023929),
INFO: Loaded 1 PC tables (1 PCs): 1 [0x5653fa023930,0x5653fa023940),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
=================================================================
==1817959==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete) on 0x516000000380
    #0 0x5653fa01b257  (/home/test/build/llvm/llvm-RelWithDebInfo/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/gc-sections.test.tmp+0x1f1257)
    ...
    0x516000000380 is located 0 bytes inside of 513-byte region [0x516000000380,0x516000000581)
allocated by thread T0 here:
    #0 0x5653f9fdef1f  (/home/test/build/llvm/llvm-RelWithDebInfo/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/gc-sections.test.tmp+0x1b4f1f)
    ...

Strangely, I don't see it reflected on any other buildbot anywhere, but some investigation shows the following:

The patch somehow seems to prevent asan instrumentation from intercepting operator new (allocating a string buffer in a fuzzer routine). Asan is only able to intercept the call to malloc(), and so it attributes it as being allocated by malloc. When the buffer is deallocated at the end of the routine, it is deallocated with operator delete, which asan treats as a mismatch of allocation types. When I generate the same executable with a compiler/linker built from the commit immediately preceding this one, asan does intercept operator new and the deallocation type matches the allocation type (i.e. from new).

Thanks for the report and sorry that I just saw this. I've reverted it in e845754

If a unversioned Defined preempts a versioned DSO definition, the version ID will not be reset.

There is a case I didn't capture in the improved test obj-preempt-dso.s

% fld.lld @response.txt -o bad -y _ZnamSt11align_val_tRKSt9nothrow_t
usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.so: shared definition of _ZnamSt11align_val_tRKSt9nothrow_t
tmp/Rel/lib/clang/18/lib/x86_64-unknown-linux-gnu/libclang_rt.asan_cxx.a(../../../../../projects/compiler-rt/lib/asan/CMakeFiles/RTAsan_cxx.x86_64.dir/asan_new_delete.cpp.o): definition of _ZnamSt11align_val_tRKSt9nothrow_t
% readelf -Ws usr/lib/gcc/x86_64-linux-gnu/13/libstdc++.so | grep _ZdaPvRKSt9nothrow_t
  1980: 00000000000aeaa0     9 FUNC    GLOBAL DEFAULT   13 _ZdaPvRKSt9nothrow_t@@GLIBCXX_3.4

Improved the test in 3510c48

Relanded in 255ea48

MaskRay added a commit that referenced this pull request Nov 16, 2023
The new RUN line `ld.lld --version-script=a.ver b.so a.o -o a1`
`c3` would be able to caught `replaceWithDefined` bug in
commit 667ea2c (#72208).
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Nov 16, 2023
The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
version script patterns and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a SharedSymbol. The default value
-1 is also used to indicate that the symbol has not been matched by a
version script pattern (https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for llvm#70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
MaskRay added a commit that referenced this pull request Nov 16, 2023
The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
`--exclude-id=`, version script patterns, and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a Sharedfile (SharedSymbol or a
copy-relocated Defined), the default value -1 is also used to indicate
that the symbol has not been matched by a version script pattern
(https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for #70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
…#72484)

Reverts llvm#72208

If a unversioned Defined preempts a versioned DSO definition, the
version ID will not be reset.
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
The new RUN line `ld.lld --version-script=a.ver b.so a.o -o a1`
`c3` would be able to caught `replaceWithDefined` bug in
commit 667ea2c (llvm#72208).
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
`--exclude-id=`, version script patterns, and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a Sharedfile (SharedSymbol or a
copy-relocated Defined), the default value -1 is also used to indicate
that the symbol has not been matched by a version script pattern
(https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for llvm#70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
version script patterns and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a SharedSymbol. The default value
-1 is also used to indicate that the symbol has not been matched by a
version script pattern (https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for llvm#70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…#72484)

Reverts llvm#72208

If a unversioned Defined preempts a versioned DSO definition, the
version ID will not be reset.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
The new RUN line `ld.lld --version-script=a.ver b.so a.o -o a1`
`c3` would be able to caught `replaceWithDefined` bug in
commit 667ea2c (llvm#72208).
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
The two fields are similar.

`versionId` is the Verdef index in the output file. It is set for
`--exclude-id=`, version script patterns, and `sym@ver` symbols.

`verdefIndex` is the Verdef index of a Sharedfile (SharedSymbol or a
copy-relocated Defined), the default value -1 is also used to indicate
that the symbol has not been matched by a version script pattern
(https://reviews.llvm.org/D65716).

It seems confusing to have two fields. Merge them so that we can
allocate one bit for llvm#70130 (suppress --no-allow-shlib-undefined
error in the presence of a DSO definition).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants