Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit symbol_map to only nonzero-sized symbols #447

Closed
wants to merge 2 commits into from

Conversation

sfackler
Copy link

The documentation for symbol_map claims that this filtering should be happening, but it doesn't appear that it was. In particular, I'm seeing that code produced by rustc has symbols with the Unknown kind named like $x.224 defined at the same address as many functions, and depending on the ordering can be returned by SymbolMap::get rather than the "real" function name.

@sfackler
Copy link
Author

sfackler commented Jun 30, 2022

A snippet from readelf -s where you can see an example of this:

   345: 00000000016077bc    92 FUNC    LOCAL  DEFAULT   13 _ZN4core3ptr71dr[...]
   346: 00000000016077bc     0 NOTYPE  LOCAL  DEFAULT   13 $x.229

@sfackler
Copy link
Author

sfackler commented Jul 1, 2022

Hmm, should this STT_NOTYPE actually be here? https://github.com/gimli-rs/object/blob/master/src/read/elf/symbol.rs#L476

It looks like the backtrace crate limits itself to STT_FUNC and STT_OBJECT: https://github.com/rust-lang/backtrace-rs/blob/master/src/symbolize/gimli/elf.rs#L113

@philipc
Copy link
Contributor

philipc commented Jul 1, 2022

I've seen STT_NOTYPE for some text or data symbols in the past. I'll see if I can find them again.

@sfackler
Copy link
Author

sfackler commented Jul 1, 2022

Alternatively, the logic could try to pick the "best" name for a given address. Beyond the weird $x.224 symbols, this could help more generally. For example, in Ubuntu 20.04's libpthread.so.0, I see both of these symbols in its .symtab:

   780: 000000000000f500   928 FUNC    LOCAL  DEFAULT   14 __pthread_cond_timedwait
  1226: 000000000000f500   928 FUNC    GLOBAL DEFAULT   14 pthread_cond_timedwait@@GLIBC_2.17

There could be some logic that e.g. prefers FUNC over NOTYPE and GLOBAL over LOCAL. Currently the name you end up with is at the whims of sort_unstable which isn't ideal.

@philipc
Copy link
Contributor

philipc commented Jul 1, 2022

Examples of STT_NOTYPE are rare. One from libffi.so:

   158: 0000000000004b98   140 NOTYPE  GLOBAL DEFAULT   11 ffi_closure_SYSV

A more dubious one from a vmlinux for aarch64:

191893: ffffffc000083800  1924 NOTYPE  GLOBAL DEFAULT    2 vectors

My recollection is that these are generated by something other than gcc/llvm (e.g. maybe GNU as).

is_definition is primarily for use by things like symbol_map, so additional checks should be added to there instead of changing symbol_map. I think checking for non-zero size will avoid the symbols causing you problems, but there will probably still be some cases for which this isn't enough.

There could be some logic that e.g. prefers FUNC over NOTYPE and GLOBAL over LOCAL.

This sounds desirable, although the way the code is written doesn't make this easy to do.

1226: 000000000000f500 928 FUNC GLOBAL DEFAULT 14 pthread_cond_timedwait@@GLIBC_2.17

As an aside, we don't currently include versions in the map, but maybe that would be useful.

@sfackler
Copy link
Author

sfackler commented Jul 1, 2022

As an aside, we don't currently include versions in the map, but maybe that would be useful.

It appears that the version is directly embedded in the symbol name, so I'd expect the map to return the versioned name for roughly half the symbols depending on how the unstable sort behaved.

@sfackler
Copy link
Author

sfackler commented Jul 1, 2022

Updated to filter out zero sized symbols instead.

@sfackler sfackler changed the title Limit symbol_map to only text and data symbols Limit symbol_map to only nonzero-sized symbols Jul 1, 2022
@bjorn3
Copy link
Contributor

bjorn3 commented Jul 1, 2022

It looks like all GCC_except_table* symbols are marked as zero-sized. Furthermore on the rustup executable it shows several symbols that should be included as zero sized. For example _ZN4home6OS_ENV17hd513f13b3a826f77E.

$ readelf -W --symbols ~/.cargo/bin/rustup | grep -v 00000000000000 | grep "  0" | grep -v GCC_except_table | grep -v SECTION
   280: 0000000000cc49ac     0 NOTYPE  GLOBAL DEFAULT   27 _edata
   312: 0000000000cc93e0     0 NOTYPE  GLOBAL DEFAULT   28 _end
   321: 0000000000cc49ac     0 NOTYPE  GLOBAL DEFAULT   28 __bss_start
    41: 000000000010d6e8     0 FUNC    LOCAL  DEFAULT   13 call_gmon_start
  2574: 000000000084c5e0     0 OBJECT  LOCAL  DEFAULT   15 _ZN14num_bigint_dig7bigrand20SMALL_PRIMES_PRODUCT17hbe491e16a58b374bE
  6262: 0000000000837f6d     0 OBJECT  LOCAL  DEFAULT   15 _ZN4home6OS_ENV17hd513f13b3a826f77E
  6326: 0000000000891ec0     0 OBJECT  LOCAL  DEFAULT   15 _ZN4ring2ec10curve255197ed2551912verification7ED2551917h8b881db946aa7bc1E
  9259: 0000000000c09178     0 OBJECT  LOCAL  DEFAULT   23 __JCR_LIST__
  9260: 000000000010d700     0 FUNC    LOCAL  DEFAULT   13 deregister_tm_clones
  9261: 000000000010d740     0 FUNC    LOCAL  DEFAULT   13 register_tm_clones
  9262: 000000000010d790     0 FUNC    LOCAL  DEFAULT   13 __do_global_dtors_aux
  9264: 0000000000c09170     0 OBJECT  LOCAL  DEFAULT   22 __do_global_dtors_aux_fini_array_entry
  9265: 000000000010d7d0     0 FUNC    LOCAL  DEFAULT   13 frame_dummy
  9266: 0000000000c09168     0 OBJECT  LOCAL  DEFAULT   21 __frame_dummy_init_array_entry
  9502: 00000000005f2081     0 NOTYPE  LOCAL  DEFAULT   13 _vpaes_schedule_low_round
  9513: 00000000005f45c0     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqr8x_internal
  9514: 00000000005f4b39     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqr8x_reduction
  9515: 00000000005f5b60     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqrx8x_internal
  9516: 00000000005f6197     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqrx8x_reduction
  9551: 00000000005ff680     0 OBJECT  LOCAL  DEFAULT   13 K512
  9553: 0000000000600dc0     0 NOTYPE  LOCAL  DEFAULT   13 chacha20_poly1305_constants
  9557: 000000000060506d     0 NOTYPE  LOCAL  DEFAULT   13 process_extra_in_trailer
  9561: 000000000060bd80     0 OBJECT  LOCAL  DEFAULT   13 K256
 10192: 00000000006c0f20     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqr8x_internal
 10193: 00000000006c1499     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqr8x_reduction
 10194: 00000000006c24c0     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqrx8x_internal
 10195: 00000000006c2af7     0 NOTYPE  LOCAL  DEFAULT   13 __bn_sqrx8x_reduction
 11282: 0000000000722420     0 NOTYPE  LOCAL  DEFAULT   13 K_XX_XX
 11283: 0000000000720060     0 NOTYPE  LOCAL  DEFAULT   13 _avx2_shortcut
 11284: 000000000071df00     0 NOTYPE  LOCAL  DEFAULT   13 _avx_shortcut
 11285: 000000000071d8c0     0 NOTYPE  LOCAL  DEFAULT   13 _shaext_shortcut
 11290: 0000000000726d40     0 NOTYPE  LOCAL  DEFAULT   13 K_XX_XX
 11291: 00000000007253f0     0 NOTYPE  LOCAL  DEFAULT   13 _avx2_shortcut
 11292: 0000000000724660     0 NOTYPE  LOCAL  DEFAULT   13 _avx_shortcut
 11293: 00000000007235a0     0 NOTYPE  LOCAL  DEFAULT   13 _shaext_shortcut
 11294: 0000000000723820     0 NOTYPE  LOCAL  DEFAULT   13 _ssse3_shortcut
 11303: 000000000072e700     0 NOTYPE  LOCAL  DEFAULT   13 K256
 11304: 000000000072ef20     0 NOTYPE  LOCAL  DEFAULT   13 K256_shaext
 11305: 000000000072c2a0     0 NOTYPE  LOCAL  DEFAULT   13 _avx2_shortcut
 11306: 000000000072a1a0     0 NOTYPE  LOCAL  DEFAULT   13 _avx_shortcut
 11307: 0000000000729900     0 NOTYPE  LOCAL  DEFAULT   13 _shaext_shortcut
 11312: 0000000000730280     0 OBJECT  LOCAL  DEFAULT   13 K256
 11313: 0000000000730540     0 NOTYPE  LOCAL  DEFAULT   13 _shaext_shortcut
 11733: 000000000075a920     0 NOTYPE  LOCAL  DEFAULT   13 _avx_cbc_dec_shortcut
 11734: 000000000075a200     0 NOTYPE  LOCAL  DEFAULT   13 _avx_cbc_enc_shortcut
 11738: 000000000075d440     0 NOTYPE  LOCAL  DEFAULT   13 K_XX_XX
 11743: 000000000075dac0     0 OBJECT  LOCAL  DEFAULT   13 K256
 11773: 00000000007667e1     0 NOTYPE  LOCAL  DEFAULT   13 _vpaes_schedule_low_round
 12251: 00000000007d0e00     0 OBJECT  LOCAL  DEFAULT   13 K512
 12512: 00000000009e34e0     0 OBJECT  LOCAL  DEFAULT   17 __FRAME_END__
 12513: 0000000000c09178     0 OBJECT  LOCAL  DEFAULT   23 __JCR_END__
 12652: 0000000000cbd120     0 OBJECT  LOCAL  DEFAULT   26 _GLOBAL_OFFSET_TABLE_
 12937: 0000000000c09170     0 NOTYPE  LOCAL  DEFAULT   21 __init_array_end
 13010: 0000000000c09160     0 NOTYPE  LOCAL  DEFAULT   21 __init_array_start
 13080: 0000000000cbced0     0 OBJECT  LOCAL  DEFAULT   25 _DYNAMIC
 13333: 000000000010d6bc     0 FUNC    GLOBAL DEFAULT   13 _start
 13704: 00000000007eeb50     0 FUNC    GLOBAL DEFAULT   14 _fini
 14809: 0000000000cc49b0     0 OBJECT  GLOBAL HIDDEN    27 __TMC_END__
 14894: 0000000000cbe000     0 OBJECT  GLOBAL HIDDEN    27 __dso_handle
 15366: 0000000000cc49ac     0 NOTYPE  GLOBAL DEFAULT   28 __bss_start
 15766: 0000000000cc93e0     0 NOTYPE  GLOBAL DEFAULT   28 _end
 16162: 0000000000cc49ac     0 NOTYPE  GLOBAL DEFAULT   27 _edata
 16433: 00000000000f7a28     0 FUNC    GLOBAL DEFAULT   11 _init

@philipc
Copy link
Contributor

philipc commented Jul 2, 2022

I think that GCC_except_table* should be treated as labels, not definitions. Same for some of those other zero size symbols.

Some of the symbols appear to be incorrectly specified as zero size. For example, register_tm_clones appears to be 0x4d bytes of instructions.

So I think it'll be a case of trying the best we can, rather than expecting it to be perfect. How about only doing the zero size check for STT_NOTYPE?

@philipc
Copy link
Contributor

philipc commented Aug 1, 2022

How about only doing the zero size check for STT_NOTYPE?

@sfackler Would this work for you? Or do you have another suggestion?

@sfackler
Copy link
Author

sfackler commented Aug 1, 2022

This is the logic I ended up using, which also incorporates that heuristic to pick the "best" symbol:

    fn symbol_map(file: &File<'a, &'a [u8]>) -> SymbolMap<SymbolMapName<'a>> {
        let mut map = BTreeMap::<u64, PotentialName<'_>>::new();

        let symbols = file
            .symbol_table()
            // https://github.com/gimli-rs/object/pull/443
            .filter(|t| t.symbols().next().is_some())
            .or_else(|| file.dynamic_symbol_table());

        // Binaries commonly have multiple symbols for a given address, so we put a bit of effort in here to pick the
        // "best" one. In particular, we limit to function and data object symbols, and prefer globally visible symbols
        // over private symbols.
        for symbol in symbols.into_iter().flat_map(|s| s.symbols()) {
            if !symbol.is_definition() {
                continue;
            }

            if symbol.kind() == SymbolKind::Unknown {
                continue;
            }

            let name = match symbol.name() {
                Ok(name) => name,
                Err(_) => continue,
            };

            let name = PotentialName {
                name: SymbolMapName::new(symbol.address(), name),
                global: symbol.is_global(),
            };

            match map.entry(symbol.address()) {
                Entry::Occupied(mut e) => {
                    if name.global && !e.get().global {
                        e.insert(name);
                    }
                }
                Entry::Vacant(e) => {
                    e.insert(name);
                }
            }
        }

        SymbolMap::new(map.into_values().map(|n| n.name).collect())
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants