entryPoints do not include dynamic symbols #12

Closed
RyanGlScott opened this issue Apr 5, 2022 · 0 comments · Fixed by #15
Labels
bug Something isn't working

RyanGlScott commented Apr 5, 2022

Consider this program:

static int ONE = 1;

int getzero(void) {
  return 0;
}

int getone(void) {
  return ONE + getzero();
}

Compile it to a shared library and strip it:

$ gcc -nostdlib -shared getone.c -o libgetone-stripped.so
$ strip libgetone-stripped.so

The disassembly looks like this:

$ objdump -d libgetone-stripped.so 

libgetone-stripped.so:     file format elf64-x86-64


Disassembly of section .plt:

0000000000001000 <getzero@plt-0x10>:
    1000:       ff 35 02 30 00 00       pushq  0x3002(%rip)        # 4008 <getone+0x2fdd>
    1006:       ff 25 04 30 00 00       jmpq   *0x3004(%rip)        # 4010 <getone+0x2fe5>
    100c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000001010 <getzero@plt>:
    1010:       ff 25 02 30 00 00       jmpq   *0x3002(%rip)        # 4018 <getzero+0x2ff8>
    1016:       68 00 00 00 00          pushq  $0x0
    101b:       e9 e0 ff ff ff          jmpq   1000 <getzero@plt-0x10>

Disassembly of section .text:

0000000000001020 <getzero>:
    1020:       55                      push   %rbp
    1021:       48 89 e5                mov    %rsp,%rbp
    1024:       b8 00 00 00 00          mov    $0x0,%eax
    1029:       5d                      pop    %rbp
    102a:       c3                      retq   

000000000000102b <getone>:
    102b:       55                      push   %rbp
    102c:       48 89 e5                mov    %rsp,%rbp
    102f:       e8 dc ff ff ff          callq  1010 <getzero@plt>
    1034:       8b 15 e6 2f 00 00       mov    0x2fe6(%rip),%edx        # 4020 <getone+0x2ff5>
    103a:       01 d0                   add    %edx,%eax
    103c:       5d                      pop    %rbp
    103d:       c3                      retq

Note that there are two function entry points here: one for getzero (at address 0x1020) and another for getone (at address 0x102b). macaw-loader, however, discovers only the entry point for getzero. This is due to a limitation in how entryPoints is defined. Here is the x86-64 implementation:

x86EntryPoints :: (X.MonadThrow m)
               => BL.LoadedBinary MX.X86_64 (E.ElfHeaderInfo 64)
               -> m (NEL.NonEmpty (MM.MemSegmentOff 64))
x86EntryPoints loadedBinary = do
  case BLE.resolveAbsoluteAddress mem addrWord of
    -- n.b. no guarantee of uniqueness, and in particular, entryPoint is probably in symbols somewhere
    Just entryPoint -> return (entryPoint NEL.:| mapMaybe (BLE.resolveAbsoluteAddress mem) symbolWords)
    Nothing -> X.throwM (InvalidEntryPoint addrWord)
  where
    offset = fromMaybe 0 (LC.loadOffset (BL.loadOptions loadedBinary))
    mem = BL.memoryImage loadedBinary
    addrWord = MM.memWord (offset + (fromIntegral (E.headerEntry (E.header (elf (BL.binaryFormatData loadedBinary))))))
    elfData = elf (BL.binaryFormatData loadedBinary)
    symbolWords = [ MM.memWord (fromIntegral (offset + (E.steValue entry)))
                  | Just (Right st) <- [E.decodeHeaderSymtab elfData]
                  , entry <- F.toList (E.symtabEntries st)
                  , E.steType entry == E.STT_FUNC
                  ]

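This implementation uses decodeHeaderSymtab, which only consults the static symbol table (.symtab), and stripping the library removes that table. The result still contains the address of getzero, but only because that address happens to be the entry point address recorded in the ELF header of the shared library: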

$ readelf -h libgetone-stripped.so | grep "Entry point address:"
  Entry point address:               0x1020

However, libgetone-stripped.so also contains dynamic symbols:

$ readelf --dyn-syms libgetone-stripped.so 

Symbol table '.dynsym' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000001020    11 FUNC    GLOBAL DEFAULT    7 getzero
     2: 000000000000102b    19 FUNC    GLOBAL DEFAULT    7 getone

If the entryPoints function consulted the dynamic symbols, similarly to how it is done in macaw, it would be able to find the address for getone.

This example uses x86, but it applies to AArch32 and PPC32 as well, which use identical implementations for entryPoints.

RyanGlScott added the bug label May 26, 2022
RyanGlScott added commits to GaloisInc/elf-edit that referenced this issue on Feb 15, 2023 and Feb 23, 2023
Just like `decodeHeaderSymtab` decodes the static function symbol table,
`decodeHeaderDynsym` serves the same role for dynamic function symbol tables.
The functionality of `decodeHeaderDynsym` largely overlaps with the niche that
the `dynamicEntries`/`dynSymEntry` functions provide, so I have included a
comparison to those functions in the Haddocks for `decodeHeaderDynsym`.

This will be useful for eventual fixes for GaloisInc/macaw#277 and
GaloisInc/macaw-loader#12.
RyanGlScott added commits that referenced this issue on Feb 23, 2023 and Feb 25, 2023
This:

* Bumps the `elf-edit` submodule to bring in the changes from
  GaloisInc/elf-edit#34, which adds `decodeHeaderDynsym`.
* Bumps the `macaw` submodule to bring in the changes from
  GaloisInc/macaw#320, which changes the ELF loader to always load
  dynamic function symbols.

  (Bumping the `macaw` submodule also requires bumping the `crucible`,
  `llvm-pretty`, and `semmc` submodules to adapt to recent changes.)
* Modifies the code for X86-64, AArch32, and PPC to always include dynamic
  function symbols.

Fixes #12.