Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WebAssembly][Object]Use file offset as function symbol address for linked files #76198

Merged
merged 2 commits into from
Jan 2, 2024

Conversation

dschuff
Copy link
Member

@dschuff dschuff commented Dec 22, 2023

WebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context.
Function symbol addresses in object files are offsets from the start of the code section. This is good for linking and relocation. However when dealing with linked binaries, offsets from the start of the file/module are more often used (e.g. for stack traces in browsers), and are more useful for use cases like binary size attribution. This PR changes Object to use the file offset instead of the section offset for function symbols, but only for linked (non-DSO) files.

This implements item number 4 from #76107

…inked files

WebAssembly doesn't have a single virtual memory space the way other object
formats or architectures do, so "addresses" mean different things depending
on the context.
Function symbol addresses in object files are offsets from the start of the code
section. This is good for linking and relocation. However when dealing with
linked binaries, offsets from the start of the file/module are more often
used (e.g. for stack traces in browsers), and are more useful for use
cases like binary size attribution. This PR changes Object to use
the file offset instead of the section offset for function symbols, but
only for linked (non-DSO) files.
@dschuff dschuff changed the title [WebAssembly][Object]Use file offset as function symbol address for l… [WebAssembly][Object]Use file offset as function symbol address for linked files Dec 22, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 22, 2023

@llvm/pr-subscribers-backend-webassembly

@llvm/pr-subscribers-llvm-binary-utilities

Author: Derek Schuff (dschuff)

Changes

WebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context.
Function symbol addresses in object files are offsets from the start of the code section. This is good for linking and relocation. However when dealing with linked binaries, offsets from the start of the file/module are more often used (e.g. for stack traces in browsers), and are more useful for use cases like binary size attribution. This PR changes Object to use the file offset instead of the section offset for function symbols, but only for linked (non-DSO) files.

This implements item number 4 from #76107


Full diff: https://github.com/llvm/llvm-project/pull/76198.diff

3 Files Affected:

  • (modified) llvm/lib/Object/WasmObjectFile.cpp (+10-4)
  • (added) llvm/test/tools/llvm-nm/wasm/linked.yaml (+74)
  • (added) llvm/test/tools/llvm-objdump/wasm/linked-symbol-table.yaml (+75)
diff --git a/llvm/lib/Object/WasmObjectFile.cpp b/llvm/lib/Object/WasmObjectFile.cpp
index dfe86a45df3227..99c6f1a8fbd4ee 100644
--- a/llvm/lib/Object/WasmObjectFile.cpp
+++ b/llvm/lib/Object/WasmObjectFile.cpp
@@ -1662,10 +1662,16 @@ Expected<StringRef> WasmObjectFile::getSymbolName(DataRefImpl Symb) const {
 Expected<uint64_t> WasmObjectFile::getSymbolAddress(DataRefImpl Symb) const {
   auto &Sym = getWasmSymbol(Symb);
   if (Sym.Info.Kind == wasm::WASM_SYMBOL_TYPE_FUNCTION &&
-      isDefinedFunctionIndex(Sym.Info.ElementIndex))
-    return getDefinedFunction(Sym.Info.ElementIndex).CodeSectionOffset;
-  else
-    return getSymbolValue(Symb);
+      isDefinedFunctionIndex(Sym.Info.ElementIndex)) {
+    // For object files, use the section offset. For linked files, use the file
+    // offset
+    uint32_t Adjustment = isRelocatableObject() || isSharedObject()
+                              ? 0
+                              : Sections[CodeSection].Offset;
+    return getDefinedFunction(Sym.Info.ElementIndex).CodeSectionOffset +
+           Adjustment;
+  }
+  return getSymbolValue(Symb);
 }
 
 uint64_t WasmObjectFile::getWasmSymbolValue(const WasmSymbol &Sym) const {
diff --git a/llvm/test/tools/llvm-nm/wasm/linked.yaml b/llvm/test/tools/llvm-nm/wasm/linked.yaml
new file mode 100644
index 00000000000000..992c1811743b7a
--- /dev/null
+++ b/llvm/test/tools/llvm-nm/wasm/linked.yaml
@@ -0,0 +1,74 @@
+# RUN: yaml2obj %s -o %t.wasm
+# RUN: llvm-nm %t.wasm | FileCheck %s
+
+# CHECK: 0000009f T my_func_export
+# CHECK-NEXT: 0000002a D my_global_export
+# CHECK-NEXT: 00000000 D my_table_export
+
+--- !WASM
+FileHeader:
+  Version:         0x1
+Sections:
+  - Type:            TYPE
+    Signatures:
+      - Index:           0
+        ParamTypes:      []
+        ReturnTypes:     []
+  - Type:            IMPORT
+    Imports:
+      - Module:          env
+        Field:           foo
+        Kind:            FUNCTION
+        SigIndex:        0
+      - Module:          env
+        Field:           bar
+        Kind:            GLOBAL
+        GlobalType:      I32
+        GlobalMutable:   true
+      - Module:          env
+        Field:           memory
+        Kind:            MEMORY
+        Memory:
+          Minimum:         0x1
+  - Type:            FUNCTION
+    FunctionTypes:   [ 0 ]
+  - Type:            TABLE
+    Tables:
+      - Index:           0
+        ElemType:        FUNCREF
+        Limits:
+          Flags:           [ HAS_MAX ]
+          Minimum:         0x1
+          Maximum:         0x1
+  - Type:            GLOBAL
+    Globals:
+      - Index:           1
+        Mutable:         false
+        Type:            I32
+        InitExpr:
+          Opcode:          I32_CONST
+          Value:           42
+  - Type:            EXPORT
+    Exports:
+      - Name:            my_func_export
+        Kind:            FUNCTION
+        Index:           1
+      - Name:            my_global_export
+        Kind:            GLOBAL
+        Index:           1
+      - Name:            my_table_export
+        Kind:            TABLE
+        Index:           0
+  - Type:            CODE
+    Functions:
+      - Index:           1
+        Locals:
+        Body:            00
+  - Type:            DATA
+    Segments:
+      - SectionOffset:   0
+        InitFlags:       0
+        Offset:
+          Opcode:          I32_CONST
+          Value:           0
+        Content:         ''
diff --git a/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table.yaml b/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table.yaml
new file mode 100644
index 00000000000000..6dd949a441496c
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table.yaml
@@ -0,0 +1,75 @@
+# RUN: yaml2obj %s -o %t.wasm
+# RUN: llvm-objdump -t %t.wasm | FileCheck %s
+#
+# CHECK:      SYMBOL TABLE:
+# CHECK-NEXT: 0000009f g F CODE my_func_export
+# CHECK-NEXT: 0000002a g O DATA my_global_export
+# CHECK-NEXT: 00000000 g   TABLE my_table_export
+
+--- !WASM
+FileHeader:
+  Version:         0x1
+Sections:
+  - Type:            TYPE
+    Signatures:
+      - Index:           0
+        ParamTypes:      []
+        ReturnTypes:     []
+  - Type:            IMPORT
+    Imports:
+      - Module:          env
+        Field:           foo
+        Kind:            FUNCTION
+        SigIndex:        0
+      - Module:          env
+        Field:           bar
+        Kind:            GLOBAL
+        GlobalType:      I32
+        GlobalMutable:   true
+      - Module:          env
+        Field:           memory
+        Kind:            MEMORY
+        Memory:
+          Minimum:         0x1
+  - Type:            FUNCTION
+    FunctionTypes:   [ 0 ]
+  - Type:            TABLE
+    Tables:
+      - Index:           0
+        ElemType:        FUNCREF
+        Limits:
+          Flags:           [ HAS_MAX ]
+          Minimum:         0x1
+          Maximum:         0x1
+  - Type:            GLOBAL
+    Globals:
+      - Index:           1
+        Mutable:         false
+        Type:            I32
+        InitExpr:
+          Opcode:          I32_CONST
+          Value:           42
+  - Type:            EXPORT
+    Exports:
+      - Name:            my_func_export
+        Kind:            FUNCTION
+        Index:           1
+      - Name:            my_global_export
+        Kind:            GLOBAL
+        Index:           1
+      - Name:            my_table_export
+        Kind:            TABLE
+        Index:           0
+  - Type:            CODE
+    Functions:
+      - Index:           1
+        Locals:
+        Body:            00
+  - Type:            DATA
+    Segments:
+      - SectionOffset:   0
+        InitFlags:       0
+        Offset:
+          Opcode:          I32_CONST
+          Value:           0
+        Content:         ''

@dschuff dschuff requested a review from sbc100 December 22, 2023 00:51
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if we instead just changes the offset to be absolute in all cases?

@dschuff
Copy link
Member Author

dschuff commented Dec 22, 2023

@sbc100 we could potentially also do this for DSOs, with the reasoning that the use case is similar (e.g. such addresses could appear in stack traces, and users might want to do binary size attribution on DSOs as well). But I'm not sure if that would make linking worse (maybe not, since the code is PIC anyway, and the dynamic linker doesn't need to know code offsets?)

@dschuff
Copy link
Member Author

dschuff commented Dec 22, 2023

Sorry, we raced.
If we did this for relocatable objects, I think the linker relies on this address being a section offset? I haven't tried it though.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 22, 2023

Sorry, we raced. If we did this for relocatable objects, I think the linker relies on this address being a section offset? I haven't tried it though.

I yes, I suppose it must. I wonder it would be simpler to update that.. of course that can be a followup too

@dschuff
Copy link
Member Author

dschuff commented Dec 22, 2023

Generally speaking I think it's better for most purposes to have things be section-relative rather than file-relative anyway. You don't have to worry about changes to the known sections or custom sections like the dylink section that come before the code section affecting anything else.

llvm/lib/Object/WasmObjectFile.cpp Outdated Show resolved Hide resolved
@dschuff dschuff merged commit fc5f51c into llvm:main Jan 2, 2024
3 of 4 checks passed
@dschuff dschuff deleted the symbolvalue branch January 2, 2024 22:55
@vitalybuka
Copy link
Collaborator

hctim added a commit that referenced this pull request Jan 3, 2024
@dschuff
Copy link
Member Author

dschuff commented Jan 3, 2024

Thanks, working on a local repro. If I can't fix soon I'll revert.

dschuff added a commit that referenced this pull request Jan 3, 2024
…ss for linked files (#76198)"

WebAssembly doesn't have a single virtual memory space the way other object
formats or architectures do, so "addresses" mean different things depending
on the context.
Function symbol addresses in object files are offsets from the start of the code
section. This is good for linking and relocation. However when dealing with
linked binaries, offsets from the start of the file/module are more often
used (e.g. for stack traces in browsers), and are more useful for use
cases like binary size attribution. This PR changes Object to use
the file offset instead of the section offset for function symbols, but
only for linked (non-DSO) files.

This is a reland of fc5f51c with a fix for the MSan failure (it was not caused
by this change, but it was revealed by the new tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants