New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in _Backtrace_Unwind #47551

Open
bossmc opened this Issue Jan 18, 2018 · 9 comments

Comments

Projects
None yet
6 participants
@bossmc
Contributor

bossmc commented Jan 18, 2018

Upstream issue: https://bugs.llvm.org/show_bug.cgi?id=36005

The version of libunwind used in the x86_64-unknown-linux-musl (and possibly the i686-...-musl one too?) standard library has a bug where it will sometimes walk off the end of the segment containing the .eh_frame section and segfault.

I've struggled to reproduce this except in some proprietary code, but the backtrace looks like:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000fa733b in libunwind::LocalAddressSpace::get32(unsigned long) ()
(gdb) bt
#0  0x0000000000fa733b in libunwind::LocalAddressSpace::get32(unsigned long) ()
#1  0x0000000000faa0a2 in libunwind::CFI_Parser<libunwind::LocalAddressSpace>::findFDE(libunwind::LocalAddressSpace&, unsigned long, unsigned long, unsigned int, unsigned long, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info*, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info*) ()
#2  0x0000000000fa983d in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::getInfoFromDwarfSection(unsigned long, libunwind::UnwindInfoSections const&, unsigned int) ()
#3  0x0000000000fa923d in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) ()
#4  0x0000000000fa8fff in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() ()
#5  0x0000000000fa820e in unw_step ()
#6  0x0000000000fa6ac0 in _Unwind_Backtrace ()

There's a proposed fix in the linked LLVM issue, which can possibly be patched into libunwind before building the musl target if taking an updated libunwind isn't possible.

@sfackler

This comment has been minimized.

Member

sfackler commented Feb 1, 2018

I ran into this as well.

@bossmc

This comment has been minimized.

Contributor

bossmc commented Feb 1, 2018

There doesn't seem to be any movement on the LLVM issue since I raised it, not even an assignee. Do you know if this is normal behaviour for them? Or if I've mis-raised the issue?

@jwilm

This comment has been minimized.

jwilm commented May 7, 2018

Running into this as well at OneSignal

@aidanhs

This comment has been minimized.

Member

aidanhs commented Jun 21, 2018

I can reproduce this on latest stable on an open source project:

docker run -it ubuntu:18.04 bash
apt update
apt install curl git gcc make musl-tools file
curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
rustup target add x86_64-unknown-linux-musl
git clone https://github.com/mozilla/sccache.git
cd sccache
export TARGET=x86_64-unknown-linux-musl && export OPENSSL_DIR=/openssl-musl
./scripts/travis-musl-openssl.sh
cargo build --target x86_64-unknown-linux-musl
RUST_LOG=sccache=debug RUST_BACKTRACE=1 SCCACHE_NO_DAEMON=1 SCCACHE_START_SERVER=1 $(pwd)/target/x86_64-unknown-linux-musl/debug/sccache &
RUST_LOG=sccache=debug $(pwd)/target/x86_64-unknown-linux-musl/debug/sccache gcc -c src/test/test.c -o /tmp/test.o
RUST_LOG=sccache=debug $(pwd)/target/x86_64-unknown-linux-musl/debug/sccache gcc -c src/test/test.c -o /tmp/test.o

On the second run of the final command, the background process will segfault.

@bossmc

This comment has been minimized.

Contributor

bossmc commented Jul 20, 2018

After much investigation, I think there's two bugs here, one that LLVM's unwinder will consider unreadable memory as part of the .eh_frame section, and one that the rust compiler creates invalid (unusual?) unwind information.

The crash is specifically happening when trying to find unwind information for the frame above main (not crate::main but the "c-runtime" main. Since the frames above the C entry point are provided by the runtime (musl in this case), they are contained in crti.o/crt1.o which, in rust's musl-targeting stdlib have been provided. The provided object files have no unwind information in them:

$ readelf -S ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/crti.o 
There are 19 section headers, starting at offset 0x4a0:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       0000000000000000  0000000000000000  AX       0     0     1
  [ 2] .data             PROGBITS         0000000000000000  00000040
       0000000000000000  0000000000000000  WA       0     0     1
  [ 3] .bss              NOBITS           0000000000000000  00000040
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .init             PROGBITS         0000000000000000  00000040
       0000000000000001  0000000000000000  AX       0     0     1
  [ 5] .fini             PROGBITS         0000000000000000  00000041
       0000000000000001  0000000000000000  AX       0     0     1
  [ 6] .note.GNU-stack   PROGBITS         0000000000000000  00000042
       0000000000000000  0000000000000000           0     0     1
  [ 7] .debug_line       PROGBITS         0000000000000000  00000042
       0000000000000056  0000000000000000           0     0     1
  [ 8] .rela.debug_line  RELA             0000000000000000  000002e0
       0000000000000030  0000000000000018   I      17     7     8
  [ 9] .debug_info       PROGBITS         0000000000000000  00000098
       0000000000000049  0000000000000000           0     0     1
  [10] .rela.debug_info  RELA             0000000000000000  00000310
       0000000000000048  0000000000000018   I      17     9     8
  [11] .debug_abbrev     PROGBITS         0000000000000000  000000e1
       0000000000000012  0000000000000000           0     0     1
  [12] .debug_aranges    PROGBITS         0000000000000000  00000100
       0000000000000040  0000000000000000           0     0     16
  [13] .rela.debug_arang RELA             0000000000000000  00000358
       0000000000000048  0000000000000018   I      17    12     8
  [14] .debug_ranges     PROGBITS         0000000000000000  00000140
       0000000000000040  0000000000000000           0     0     16
  [15] .rela.debug_range RELA             0000000000000000  000003a0
       0000000000000060  0000000000000018   I      17    14     8
  [16] .shstrtab         STRTAB           0000000000000000  00000400
       000000000000009f  0000000000000000           0     0     1
  [17] .symtab           SYMTAB           0000000000000000  00000180
       0000000000000150  0000000000000018          18    12     8
  [18] .strtab           STRTAB           0000000000000000  000002d0
       000000000000000d  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

This in and of itself isn't a huge problem, the libunwind code stops trying to unwind when it can't find unwind information for the next frame (https://github.com/llvm-mirror/libunwind/blob/release_39/src/UnwindCursor.hpp#L1323-L1324), so it should stop after main which would be fine. so long as we can safely fail to find the unwind information.

The issue is that libunwind's logic for searching for FDE (frame description entry) for the parent frame is pretty all-encompassing and runs in many phases, and one of these crashes (sometimes) if there searched for address is not present in the unwind information). At a high level the process looks like (https://github.com/llvm-mirror/libunwind/blob/release_39/src/UnwindCursor.hpp#L1196)

  • Look for a "compact unwinding table"
  • Look for "DWARF unwind information"
  • Look for ARM EHABI unwind info
  • Look for dynamically created DWARF unwind entries

For Rust x86_64 musl binaries, the compiler has provided DWARF unwind information, so we fall into the second bullet (https://github.com/llvm-mirror/libunwind/blob/release_39/src/UnwindCursor.hpp#L866):

  • No compact encoding hint, so the first section isn't interesting
  • Search the .eh_frame_hdr section for an index for the frame - not present
  • Check the local cache in case we've seen this frame before - we haven't
  • Scan the whole of the .eh_frame section for the address of interest - CRASH!

Digging further, we see that the logic for scanning the .eh_frame sector looks like (https://github.com/llvm-mirror/libunwind/blob/release_39/src/DwarfParser.hpp#L175):

  • Read a length field
    • If it's 0 stop, we're done - rustc never seems to generate this "end-of-information" marker
  • Skip over CIE entries
  • For non-CIE entries work out if the address we're looking for is in the range of the entry, if so we've got our FDE
  • Jump forward by the length of the CFI entry

In each loop we check that the newly found CIE entry is in a reasonable place (between ehSectionStart and ehSectionEnd). If it's not, bail out for safety and fail the lookup.

Unfortunately, as in the linked issue originally ehSectionEnd is incorrectly set too big (it's set to ehSectionStart + **segmentLength** which is almost always wrong) so the unwind code will step right off the end of the .eh_frame segment and into one of:

  • .gcc_except_table (LSDA) - This starts with a signature that is approximately 0x9c9bff, which is bigger than the error introduced by the ehSectionEnd miscalculation, so (after failing to parse the LSDA as a CFI entry) it jumps forward out of the range ehSectionStart..ehSectionEnd and bails out because the of the "reasonable location" check I mentioned above.
  • The gap before .gcc_except_table (if the .eh_frame did not have a length that's a multiple of 32 and thus there's padding between the sections). In this case when it tries to read the next entry one of three things happens:
    • The padding bytes happen to have a large value (when interpreted as a u32) and we're in a similar position to the case where there's no padding, fail to parse the CFI entry, try to jump to the next one, realise we're jumping out of the allowed range and bail out for safety.
    • The padding bytes are medium sized when interpreted as a u32, in this case, after the failed parse, the unwinder jumps far enough to be off the end of the loaded segment, but not far enough to be past the ehSectionEnd so the sanity check passes and the unwinder attempts to read the next CFI entry, reads a 32-bit number from an invalid address and... 💥
    • The padding bytes are small when interpreted as a u32, and the unwinder jumps into a random part of the LSDA and we're back in a situation that's functionally similar to when we're reading the padding bytes and the same three options are still applicable in the next lookup.

So, what's the bug(s) here?

  • LLVM is miscalculating the section length (the original issue)
  • Rustc (or the linker?) doesn't include the end-of-records marker at the end of .eh_frame

Either of these would fix this issue, probably both should be done?

@bossmc

This comment has been minimized.

Contributor

bossmc commented Jul 20, 2018

Ah, there's another possible route for the crash:

  • If the LOAD segement is really big, the signature at the start of the LSDA might not be big enough to jump past ehSectionEnd leading to an invalid read and a 💥
@cetra3

This comment has been minimized.

cetra3 commented Sep 3, 2018

I've ran into this issue as well. What's the status of this?

@bossmc

This comment has been minimized.

Contributor

bossmc commented Sep 7, 2018

I've been looking at a proper fix for this and neither of the two paths I proposed above are yeilding simple results:

  • LLVM miscalculating the section length:
    • The simple fix I proposed in the upstream issue allows the unwinder to read arbitrary parts of the LOAD segment before bailing, which might have unexpected effects
    • A more complex fix needs to calculate the section length at runtime, which is tricky as sections don't exist at runtime...
      • Neither the .eh_frame nor .eh_frame_ptr section contains a length field so we're not simply told the length
      • Maybe read the executable off disk (look in /proc/self/exe) and parse the section table? Might not be readable to the current user. Might not even exist (not sure if this is possible, can you create a linux executable from an in-memory image?)
  • Add terminator entry to the end of the .eh_frames section
    • This relies on the static linker's behaviour, which is a shame as rustc allows you to bring your own linker
    • ld and gold both seem inconsistent about whether they add terminator entries, sometimes they do, sometimes they don't
    • This feels like something that rustc can't fix locally or rely on externally

Another option would be for LLVM to skip walking the .eh_frame section if the lookup in the .eh_frame_ptr fails. It's unlikely to ever reveal anything new, and it seems that this section is just un-walkable with any kind of reliability.

@bossmc

This comment has been minimized.

Contributor

bossmc commented Sep 21, 2018

Here are two tested workarounds, neither very pretty, but both work:

1 - Custom link script

  1. Add rustflags = ["-C", "-Wl,--verbose"] to your Cargo config

  2. Do a build, extract the link script from the linker verbose output

  3. Find the lines:

     .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) }
     ...
     .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
    
  4. Change them to:

     .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) LONG(0x0) }
     ...
     .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) LONG(0x0) }
    

    (0x00000000 is the CIE terminator)

  5. Save off this script somewhere

  6. Set rustflags = ["-C", "-Wl,-T<script>"] in Cargo config (probably under the Musl target)

  7. Rebuild

2 - Add a custom object

  1. Create an object containing just the .eh_frame terminator:

    ; Create a terminator entry for the `.eh_frame` section of rust binaries.
    ;
    ; See https://github.com/rust-lang/rust/issues/47551 for details
    ;
    ; You can build this with:
    ;
    ;    nasm -f elf64 eh_frame_terminator.asm
    ;
    section .eh_frame
      ; The terminator is a 0-length CIE, the first field of which is the length
      ; as a 32-bit number.
      dd 0x00000000
    

2, Add rustflags = ["-C", "link-args=-Wl,<path/to/>eh_frame_terminator.o"] to your Cargo config (probably under the Musl target)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment