Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Object][Wasm] Use offset instead of index for Global address and store size #81781

Merged
merged 5 commits into from
Feb 15, 2024

Conversation

dschuff
Copy link
Member

@dschuff dschuff commented Feb 14, 2024

Currently the address reported by binutils for a global is its index; but its
offset (in the file or section) is more useful for binary size attribution.
This PR treats globals similarly to functions, and tracks their offset and
size. It also centralizes the logic differentiating linked from object and
dylib files (where section addresses are 0).

…re size

Currently the address reported by binutils for a global is its index; but its
offset (in the file or section) is more useful for binary size attribution.
This PR treats globals similarly to functions, and tracks their offset and
size. It also centralizes the logic differentiating linked from object and
dylib files (where section addresses are 0).
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 14, 2024

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-backend-webassembly

Author: Derek Schuff (dschuff)

Changes

Currently the address reported by binutils for a global is its index; but its
offset (in the file or section) is more useful for binary size attribution.
This PR treats globals similarly to functions, and tracks their offset and
size. It also centralizes the logic differentiating linked from object and
dylib files (where section addresses are 0).


Full diff: https://github.com/llvm/llvm-project/pull/81781.diff

6 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Wasm.h (+3)
  • (modified) llvm/include/llvm/Object/Wasm.h (+1-1)
  • (modified) llvm/lib/Object/WasmObjectFile.cpp (+23-13)
  • (modified) llvm/test/tools/llvm-nm/wasm/exports.yaml (+1-1)
  • (modified) llvm/test/tools/llvm-nm/wasm/weak-symbols.yaml (+1-1)
  • (modified) llvm/test/tools/llvm-objdump/wasm/linked-symbol-table-namesec.yaml (+12-3)
diff --git a/llvm/include/llvm/BinaryFormat/Wasm.h b/llvm/include/llvm/BinaryFormat/Wasm.h
index aec6ea0b757799..26d64228797777 100644
--- a/llvm/include/llvm/BinaryFormat/Wasm.h
+++ b/llvm/include/llvm/BinaryFormat/Wasm.h
@@ -350,6 +350,8 @@ struct WasmGlobal {
   WasmGlobalType Type;
   WasmInitExpr InitExpr;
   StringRef SymbolName; // from the "linking" section
+  uint32_t Offset;
+  uint32_t Size;
 };
 
 struct WasmTag {
@@ -453,6 +455,7 @@ struct WasmSymbolInfo {
     // For a data symbols, the address of the data relative to segment.
     WasmDataReference DataRef;
   };
+  uint32_t Size;
 };
 
 enum class NameType {
diff --git a/llvm/include/llvm/Object/Wasm.h b/llvm/include/llvm/Object/Wasm.h
index b8f7bb45d73a4b..cda896b2b7805b 100644
--- a/llvm/include/llvm/Object/Wasm.h
+++ b/llvm/include/llvm/Object/Wasm.h
@@ -239,7 +239,7 @@ class WasmObjectFile : public ObjectFile {
   bool isValidSectionSymbol(uint32_t Index) const;
   wasm::WasmFunction &getDefinedFunction(uint32_t Index);
   const wasm::WasmFunction &getDefinedFunction(uint32_t Index) const;
-  wasm::WasmGlobal &getDefinedGlobal(uint32_t Index);
+  const wasm::WasmGlobal &getDefinedGlobal(uint32_t Index) const;
   wasm::WasmTag &getDefinedTag(uint32_t Index);
 
   const WasmSection &getWasmSection(DataRefImpl Ref) const;
diff --git a/llvm/lib/Object/WasmObjectFile.cpp b/llvm/lib/Object/WasmObjectFile.cpp
index 04e2b809c01493..6ff11e9338d6c4 100644
--- a/llvm/lib/Object/WasmObjectFile.cpp
+++ b/llvm/lib/Object/WasmObjectFile.cpp
@@ -534,7 +534,8 @@ Error WasmObjectFile::parseNameSection(ReadContext &Ctx) {
                                   /* ImportModule */ std::nullopt,
                                   /* ImportName */ std::nullopt,
                                   /* ExportName */ std::nullopt,
-                                  {/* ElementIndex */ Index}};
+                                  {/* ElementIndex */ Index},
+                                  /* Size */ 0};
         const wasm::WasmSignature *Signature = nullptr;
         const wasm::WasmGlobalType *GlobalType = nullptr;
         const wasm::WasmTableType *TableType = nullptr;
@@ -556,6 +557,7 @@ Error WasmObjectFile::parseNameSection(ReadContext &Ctx) {
             } else {
               Info.Flags |= wasm::WASM_SYMBOL_BINDING_LOCAL;
             }
+            Info.Size = functions()[Index - getNumImportedFunctions()].Size;
           } else {
             Info.Flags |= wasm::WASM_SYMBOL_UNDEFINED;
           }
@@ -1400,18 +1402,20 @@ Error WasmObjectFile::parseTagSection(ReadContext &Ctx) {
 
 Error WasmObjectFile::parseGlobalSection(ReadContext &Ctx) {
   GlobalSection = Sections.size();
+  const uint8_t *SectionStart = Ctx.Ptr;
   uint32_t Count = readVaruint32(Ctx);
   Globals.reserve(Count);
   while (Count--) {
     wasm::WasmGlobal Global;
     Global.Index = NumImportedGlobals + Globals.size();
+    const uint8_t *GlobalStart = Ctx.Ptr;
+    Global.Offset = static_cast<uint32_t>(GlobalStart - SectionStart);
     auto GlobalOpcode = readVaruint32(Ctx);
-    auto GlobalType = parseValType(Ctx, GlobalOpcode);
-    // assert(GlobalType <= std::numeric_limits<wasm::ValType>::max());
-    Global.Type.Type = (uint8_t)GlobalType;
+    Global.Type.Type = (uint8_t)parseValType(Ctx, GlobalOpcode);
     Global.Type.Mutable = readVaruint1(Ctx);
     if (Error Err = readInitExpr(Global.InitExpr, Ctx))
       return Err;
+    Global.Size = static_cast<uint32_t>(Ctx.Ptr - GlobalStart);
     Globals.push_back(Global);
   }
   if (Ctx.Ptr != Ctx.End)
@@ -1564,7 +1568,7 @@ WasmObjectFile::getDefinedFunction(uint32_t Index) const {
   return Functions[Index - NumImportedFunctions];
 }
 
-wasm::WasmGlobal &WasmObjectFile::getDefinedGlobal(uint32_t Index) {
+const wasm::WasmGlobal &WasmObjectFile::getDefinedGlobal(uint32_t Index) const {
   assert(isDefinedGlobalIndex(Index));
   return Globals[Index - NumImportedGlobals];
 }
@@ -1815,18 +1819,22 @@ Expected<StringRef> WasmObjectFile::getSymbolName(DataRefImpl Symb) const {
 
 Expected<uint64_t> WasmObjectFile::getSymbolAddress(DataRefImpl Symb) const {
   auto &Sym = getWasmSymbol(Symb);
+  if (!Sym.isDefined())
+    return 0;
+  Expected<section_iterator> Sec = getSymbolSection(Symb);
+  if (!Sec)
+    return Sec.takeError();
+  uint32_t SectionAddress = getSectionAddress(Sec.get()->getRawDataRefImpl());
   if (Sym.Info.Kind == wasm::WASM_SYMBOL_TYPE_FUNCTION &&
       isDefinedFunctionIndex(Sym.Info.ElementIndex)) {
-    // For object files, use the section offset. The linker relies on this.
-    // For linked files, use the file offset. This behavior matches the way
-    // browsers print stack traces and is useful for binary size analysis.
-    // (see https://webassembly.github.io/spec/web-api/index.html#conventions)
-    uint32_t Adjustment = isRelocatableObject() || isSharedObject()
-                              ? 0
-                              : Sections[CodeSection].Offset;
     return getDefinedFunction(Sym.Info.ElementIndex).CodeSectionOffset +
-           Adjustment;
+           SectionAddress;
   }
+  if (Sym.Info.Kind == wasm::WASM_SYMBOL_TYPE_GLOBAL &&
+      isDefinedGlobalIndex(Sym.Info.ElementIndex)) {
+    return getDefinedGlobal(Sym.Info.ElementIndex).Offset + SectionAddress;
+  }
+
   return getSymbolValue(Symb);
 }
 
@@ -1936,6 +1944,8 @@ uint32_t WasmObjectFile::getSymbolSize(SymbolRef Symb) const {
   const WasmSymbol &Sym = getWasmSymbol(Symb);
   if (!Sym.isDefined())
     return 0;
+  if (Sym.isTypeGlobal())
+    return globals()[Sym.Info.ElementIndex - getNumImportedGlobals()].Size;
   if (Sym.isTypeData())
     return Sym.Info.DataRef.Size;
   if (Sym.isTypeFunction())
diff --git a/llvm/test/tools/llvm-nm/wasm/exports.yaml b/llvm/test/tools/llvm-nm/wasm/exports.yaml
index 15c98ae2bf4a22..27edb052203aa0 100644
--- a/llvm/test/tools/llvm-nm/wasm/exports.yaml
+++ b/llvm/test/tools/llvm-nm/wasm/exports.yaml
@@ -64,4 +64,4 @@ Sections:
 
 # CHECK:      00000000 D dexport
 # CHECK-NEXT: 00000001 T fexport
-# CHECK-NEXT: 00000000 D gexport
+# CHECK-NEXT: 00000001 D gexport
diff --git a/llvm/test/tools/llvm-nm/wasm/weak-symbols.yaml b/llvm/test/tools/llvm-nm/wasm/weak-symbols.yaml
index d8e4cece0fef77..41297044794cd3 100644
--- a/llvm/test/tools/llvm-nm/wasm/weak-symbols.yaml
+++ b/llvm/test/tools/llvm-nm/wasm/weak-symbols.yaml
@@ -80,6 +80,6 @@ Sections:
 
 # CHECK:      00000000 W weak_defined_data
 # CHECK-NEXT: 00000001 W weak_defined_func
-# CHECK-NEXT: 00000000 W weak_defined_global
+# CHECK-NEXT: 00000001 W weak_defined_global
 # CHECK-NEXT:          w weak_import_data
 # CHECK-NEXT:          w weak_import_func
diff --git a/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table-namesec.yaml b/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table-namesec.yaml
index dc87e62bcaac37..147175f713a831 100644
--- a/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table-namesec.yaml
+++ b/llvm/test/tools/llvm-objdump/wasm/linked-symbol-table-namesec.yaml
@@ -3,10 +3,11 @@
 #
 # CHECK:      SYMBOL TABLE:
 # CHECK-NEXT: 00000000   F *UND* 00000000 my_func_import_name
-# CHECK-NEXT: 00000083 g F CODE 00000003 my_func_export_name
-# CHECK-NEXT: 00000086 l F CODE 00000003 my_func_local_name
+# CHECK-NEXT: 0000008c g F CODE 00000003 my_func_export_name
+# CHECK-NEXT: 0000008f l F CODE 00000003 my_func_local_name
 # CHECK-NEXT: 00000000    *UND* 00000000 my_global_import_name
-# CHECK-NEXT: 00000001 g  GLOBAL 00000000 my_global_export_name
+# CHECK-NEXT: 0000004c g  GLOBAL 00000005 my_global_export_name
+# CHECK-NEXT: 00000051 g  GLOBAL 00000009 my_global_local_name
 # CHECK-NEXT: 00000000 l O DATA 00000004 my_datasegment_name
 
 --- !WASM
@@ -44,6 +45,12 @@ Sections:
         InitExpr:
           Opcode:          I32_CONST
           Value:           42
+      - Index:           2
+        Mutable:         true
+        Type:            I64
+        InitExpr:
+          Opcode:          I64_CONST
+          Value:           5000000000
   - Type:            EXPORT
     Exports:
       - Name:            my_func_export
@@ -82,6 +89,8 @@ Sections:
        Name:         my_global_import_name
      - Index:        1
        Name:         my_global_export_name
+     - Index:        2
+       Name:         my_global_local_name
     DataSegmentNames:
      - Index:        0
        Name:         my_datasegment_name

@sbc100
Copy link
Collaborator

sbc100 commented Feb 14, 2024

Currently the address reported by binutils for a global is its index; but its offset (in the file or section) is more useful for binary size attribution. This PR treats globals similarly to functions, and tracks their offset and size. It also centralizes the logic differentiating linked from object and dylib files (where section addresses are 0).

On the other hand, the "address" of a global is kind of its index right? i.e. if you want to access a global you need to know its address (index) to pass to global.get/global.set.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

Copy link

github-actions bot commented Feb 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@dschuff
Copy link
Member Author

dschuff commented Feb 14, 2024

On the other hand, the "address" of a global is kind of its index right? i.e. if you want to access a global you need to know its address (index) to pass to global.get/global.set.

Yes, this is an interesting difference in the meanings or use cases you might have for symbols. And in fact getSymbolValue() does still return indices for both globals and functions whereas getSymbolAddress now returns offsets. It started doing this for functions way back when Wouter added disassembly support IIUC, and now globals match. This does mean that the symbol's "address" doesn't match its "value", whereas IIUC it's usually the case that they are the same thing. In principle this could violate some assumption somewhere, but it's been that way a long time for functions and hasn't been a problem. I think it just so happens that the linker always uses the "value" (since for linking/relocation use cases you do want the index as you pointed out) and the binutils use the "address". So maybe it's ok?

@dschuff dschuff merged commit 2eaeae7 into llvm:main Feb 15, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants