-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WebAssembly][Object]Use file offset as function symbol address for linked files #76198
Conversation
…inked files WebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context. Function symbol addresses in object files are offsets from the start of the code section. This is good for linking and relocation. However when dealing with linked binaries, offsets from the start of the file/module are more often used (e.g. for stack traces in browsers), and are more useful for use cases like binary size attribution. This PR changes Object to use the file offset instead of the section offset for function symbols, but only for linked (non-DSO) files.
@llvm/pr-subscribers-backend-webassembly @llvm/pr-subscribers-llvm-binary-utilities Author: Derek Schuff (dschuff) ChangesWebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context. This implements item number 4 from #76107 Full diff: https://github.com/llvm/llvm-project/pull/76198.diff 3 Files Affected:
diff --git a/llvm/lib/Object/WasmObjectFile.cpp b/llvm/lib/Object/WasmObjectFile.cpp
index dfe86a45df3227..99c6f1a8fbd4ee 100644
--- a/llvm/lib/Object/WasmObjectFile.cpp
+++ b/llvm/lib/Object/WasmObjectFile.cpp
@@ -1662,10 +1662,16 @@ Expected<StringRef> WasmObjectFile::getSymbolName(DataRefImpl Symb) const {
Expected<uint64_t> WasmObjectFile::getSymbolAddress(DataRefImpl Symb) const {
auto &Sym = getWasmSymbol(Symb);
if (Sym.Info.Kind == wasm::WASM_SYMBOL_TYPE_FUNCTION &&
- isDefinedFunctionIndex(Sym.Info.ElementIndex))
- return getDefinedFunction(Sym.Info.ElementIndex).CodeSectionOffset;
- else
- return getSymbolValue(Symb);
+ isDefinedFunctionIndex(Sym.Info.ElementIndex)) {
+ // For object files, use the section offset. For linked files, use the file
+ // offset
+ uint32_t Adjustment = isRelocatableObject() || isSharedObject()
+ ? 0
+ : Sections[CodeSection].Offset;
+ return getDefinedFunction(Sym.Info.ElementIndex).CodeSectionOffset +
+ Adjustment;
+ }
+ return getSymbolValue(Symb);
}
uint64_t WasmObjectFile::getWasmSymbolValue(const WasmSymbol &Sym) const {
diff --git a/llvm/test/tools/llvm-nm/wasm/linked.yaml b/llvm/test/tools/llvm-nm/wasm/linked.yaml
new file mode 100644
index 00000000000000..992c1811743b7a
--- /dev/null
+++ b/llvm/test/tools/llvm-nm/wasm/linked.yaml
@@ -0,0 +1,74 @@
+# RUN: yaml2obj %s -o %t.wasm
+# RUN: llvm-nm %t.wasm | FileCheck %s
+
+# CHECK: 0000009f T my_func_export
+# CHECK-NEXT: 0000002a D my_global_export
+# CHECK-NEXT: 00000000 D my_table_export
+
+--- !WASM
+FileHeader:
+ Version: 0x1
+Sections:
+ - Type: TYPE
+ Signatures:
+ - Index: 0
+ ParamTypes: []
+ ReturnTypes: []
+ - Type: IMPORT
+ Imports:
+ - Module: env
+ Field: foo
+ Kind: FUNCTION
+ SigIndex: 0
+ - Module: env
+ Field: bar
+ Kind: GLOBAL
+ GlobalType: I32
+ GlobalMutable: true
+ - Module: env
+ Field: memory
+ Kind: MEMORY
+ Memory:
+ Minimum: 0x1
+ - Type: FUNCTION
+ FunctionTypes: [ 0 ]
+ - Type: TABLE
+ Tables:
+ - Index: 0
+ ElemType: FUNCREF
+ Limits:
+ Flags: [ HAS_MAX ]
+ Minimum: 0x1
+ Maximum: 0x1
+ - Type: GLOBAL
+ Globals:
+ - Index: 1
+ Mutable: false
+ Type: I32
+ InitExpr:
+ Opcode: I32_CONST
+ Value: 42
+ - Type: EXPORT
+ Exports:
+ - Name: my_func_export
+ Kind: FUNCTION
+ Index: 1
+ - Name: my_global_export
+ Kind: GLOBAL
+ Index: 1
+ - Name: my_table_export
+ Kind: TABLE
+ Index: 0
+ - Type: CODE
+ Functions:
+ - Index: 1
+ Locals:
+ Body: 00
+ - Type: DATA
+ Segments:
+ - SectionOffset: 0
+ InitFlags: 0
+ Offset:
+ Opcode: I32_CONST
+ Value: 0
+ Content: ''
diff --git a/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table.yaml b/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table.yaml
new file mode 100644
index 00000000000000..6dd949a441496c
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table.yaml
@@ -0,0 +1,75 @@
+# RUN: yaml2obj %s -o %t.wasm
+# RUN: llvm-objdump -t %t.wasm | FileCheck %s
+#
+# CHECK: SYMBOL TABLE:
+# CHECK-NEXT: 0000009f g F CODE my_func_export
+# CHECK-NEXT: 0000002a g O DATA my_global_export
+# CHECK-NEXT: 00000000 g TABLE my_table_export
+
+--- !WASM
+FileHeader:
+ Version: 0x1
+Sections:
+ - Type: TYPE
+ Signatures:
+ - Index: 0
+ ParamTypes: []
+ ReturnTypes: []
+ - Type: IMPORT
+ Imports:
+ - Module: env
+ Field: foo
+ Kind: FUNCTION
+ SigIndex: 0
+ - Module: env
+ Field: bar
+ Kind: GLOBAL
+ GlobalType: I32
+ GlobalMutable: true
+ - Module: env
+ Field: memory
+ Kind: MEMORY
+ Memory:
+ Minimum: 0x1
+ - Type: FUNCTION
+ FunctionTypes: [ 0 ]
+ - Type: TABLE
+ Tables:
+ - Index: 0
+ ElemType: FUNCREF
+ Limits:
+ Flags: [ HAS_MAX ]
+ Minimum: 0x1
+ Maximum: 0x1
+ - Type: GLOBAL
+ Globals:
+ - Index: 1
+ Mutable: false
+ Type: I32
+ InitExpr:
+ Opcode: I32_CONST
+ Value: 42
+ - Type: EXPORT
+ Exports:
+ - Name: my_func_export
+ Kind: FUNCTION
+ Index: 1
+ - Name: my_global_export
+ Kind: GLOBAL
+ Index: 1
+ - Name: my_table_export
+ Kind: TABLE
+ Index: 0
+ - Type: CODE
+ Functions:
+ - Index: 1
+ Locals:
+ Body: 00
+ - Type: DATA
+ Segments:
+ - SectionOffset: 0
+ InitFlags: 0
+ Offset:
+ Opcode: I32_CONST
+ Value: 0
+ Content: ''
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if we instead just changes the offset to be absolute in all cases?
@sbc100 we could potentially also do this for DSOs, with the reasoning that the use case is similar (e.g. such addresses could appear in stack traces, and users might want to do binary size attribution on DSOs as well). But I'm not sure if that would make linking worse (maybe not, since the code is PIC anyway, and the dynamic linker doesn't need to know code offsets?) |
Sorry, we raced. |
I yes, I suppose it must. I wonder it would be simpler to update that.. of course that can be a followup too |
Generally speaking I think it's better for most purposes to have things be section-relative rather than file-relative anyway. You don't have to worry about changes to the known sections or custom sections like the dylink section that come before the code section affecting anything else. |
…ss for linked files (#76198)" This reverts commit fc5f51c. Reason: Broke the sanitizer buildbot - https://lab.llvm.org/buildbot/#/builders/5/builds/39751/steps/12/logs/stdio
Thanks, working on a local repro. If I can't fix soon I'll revert. |
…ss for linked files (#76198)" WebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context. Function symbol addresses in object files are offsets from the start of the code section. This is good for linking and relocation. However when dealing with linked binaries, offsets from the start of the file/module are more often used (e.g. for stack traces in browsers), and are more useful for use cases like binary size attribution. This PR changes Object to use the file offset instead of the section offset for function symbols, but only for linked (non-DSO) files. This is a reland of fc5f51c with a fix for the MSan failure (it was not caused by this change, but it was revealed by the new tests).
WebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context.
Function symbol addresses in object files are offsets from the start of the code section. This is good for linking and relocation. However when dealing with linked binaries, offsets from the start of the file/module are more often used (e.g. for stack traces in browsers), and are more useful for use cases like binary size attribution. This PR changes Object to use the file offset instead of the section offset for function symbols, but only for linked (non-DSO) files.
This implements item number 4 from #76107