Demangle Function Names #270

data-pup · 2019-03-15T22:10:01Z

Fixes #174.

This is a much larger PR than I expected it to be, and apologies on that regard. 😬

I tried a simpler fix here: master...data-pup:simple-function-name-demangle-fix

Notably, we need to include the type of item in the output, so that we can differentiate between code and function items. Unfortunately, this meant that the user would need to specify items by their entire name and type/location when using subcommands that accept arguments.

This change, while more involved, allows us to query for items by name, while still showing the demangled names of functions and code blocks! By moving the name of an item from the ir::Item struct into the ir::ItemKind field, we can handle these properly.

The other notable part is that this involves storing the "annotation" (i.e. func[1]) in that struct, and adding an annotated_name method which is used when displaying items. This is required because the diff command breaks if we use the annotated name, because we can't tell that two items are related if an item moved in the index space.

Because this is a hefty PR, I kept changes in separate commits, to help make reviewing this easier. I'll try and find some time to annotate the important parts over the weekend, or early next week 🙂

data-pup · 2019-03-15T22:22:45Z

twiggy/tests/all/expectations/dominators_wee_alloc_with_depth_and_row

-            387 ┊     13.74% ┊ func[2]
-            378 ┊     13.42% ┊   ⤷ wee_alloc::alloc_with_refill::hb32c1bbce9ebda8e
+            387 ┊     13.74% ┊ func[2]: wee_alloc::alloc_with_refill::hb32c1bbce9ebda8e
+            378 ┊     13.42% ┊   ⤷ code[2]: wee_alloc::alloc_with_refill::hb32c1bbce9ebda8e


Here's what we see now when running dominators now 😸

bjorn3 · 2019-03-16T09:59:02Z

twiggy/tests/all/expectations/diff_wee_alloc

         -25 ┊ data[1]
         -25 ┊ data[2]
-         +15 ┊ hello
+         +15 ┊ code[0]: hello
         +15 ┊ import env::rust_oom


Maybe give this and custom section ... a prefix like the others too.

fitzgen · 2019-03-19T19:53:42Z

Sorry for the delay -- I'll get to this tomorrow for sure. It is a bit bigger than I expected it to be :-p

fitzgen · 2019-03-20T19:50:43Z

Sorry for the delay -- I'll get to this tomorrow for sure. It is a bit bigger than I expected it to be :-p

fitzgen · 2019-03-20T19:50:54Z

(Woops, mis-click)

fitzgen

Thanks for the PR @data-pup

I have concerns about the usability/ergonomics of this new output, and I'm not sure it is an improvement over what we have right now. Additionally, non-wasm inputs don't have the func vs code distinction, but are still getting the annotation prefix, so their output is particularly worse with this in that they get the annotation prefix but it doesn't even need disambiguation.

I think I would prefer, only on wasm where there is an ambiguity in the first place, output that is something like

123 | foo::bar::Baz::quux (func[7])
456 | foo::bar::Baz::quux (code[7])

over putting the annotation in the front.

Even this I am not super happy with, though. I wonder if it makes sense to collapse func[123] and code[123] into the same IR item, since no body is used by more than one func and vice versa. They are essentially the same, even if they are split at teh physical binary level (to allow eager type checking and validation). Twiggy's original goal/design was to try and model exactly what the binary layout was, but in this case, since there is no sharing, we might want to loosen that for usability.

cc @alexcrichton

fitzgen · 2019-03-20T19:53:20Z

ir/ir.rs

+            ItemKind::Func(func) => {
+                if let Some(name) = func.name() {
+                    // format!("{}: {}", func.decorator(), name)
+                    func.decorator().to_string()


Seems like this is WIP? both sides of the if/else are the same, but there is some commented out code.

~~wooops, good catch. Thanks 😅~~

Turns out that this did get cleaned up in a later commit, was using this to make sure I was still generating the same output when annotations were removed 👍

fitzgen · 2019-03-20T19:55:08Z

ir/ir.rs

@@ -517,6 +530,9 @@ pub enum ItemKind {
    /// with the executable code.
    Data(Data),

+    /// Function definition. Declares the signature of a function.
+    Func(Function),


This is pretty wasm-specific, but I guess that's fine, since other kinds of binaries can simply not generate any IR items of this variant.

fitzgen · 2019-03-20T20:01:46Z

ir/ir.rs

+    }
+
+    /// Get this item's decorated name.
+    /// TODO: This needs better documentation.


Yes, please :)

Ditto here, the TODO was removed, but I still have the same sentiment. Maybe a simple example, something like:

/// Get this item's decorated name, i.e. 'foo (code[7])'

That might warrant adding some other info to the existing name method, which I'm also happy to add if you think that makes sense 🙂

fitzgen · 2019-03-20T20:19:19Z

twiggy/tests/all/expectations/elf_top_25_hello_world_rs

+          7917 ┊     0.38% ┊ Subroutine[68][131]: je_stats_print
+          7852 ┊     0.37% ┊ Subroutine[52][2800]: mallocx
+          7019 ┊     0.33% ┊ Subroutine[53][4212]: je_arena_boot
+          6221 ┊     0.30% ┊ Subroutine[71][86]: je_malloc_vsnprintf


I'm concerned that this is a regression in usability for our output, and not an improvement. It seems that every item has an extra prefix now that is not meaningful or actionable to the user.

Yeah, totally agree. Maybe an alternative would be to represent subroutines in an ELF/etc. file with a different variant that wouldn't display this prefix?

similar to the Func variant, other things need not create a Subroutine variant if that isn't applicable, and in the case of DWARF info, we can hide that behind a cfg flag anyhow

data-pup · 2019-03-20T21:07:42Z

Even this I am not super happy with, though. I wonder if it makes sense to collapse func[123] and code[123] into the same IR item, since no body is used by more than one func and vice versa. They are essentially the same, even if they are split at teh physical binary level (to allow eager type checking and validation). Twiggy's original goal/design was to try and model exactly what the binary layout was, but in this case, since there is no sharing, we might want to loosen that for usability.

Totally agree w/ you about the ergonomics of this re: output being iffy. I can refactor the output to resemble the format you provided, & agree w/ you that the extra info would be best listed after the name (if at all).

I've noticed that the code/func relationship is confusing to people when explaining some of the output of this tool, so maybe collapsing these together would be useful. Happy to wait for Alex's feedback on this. At that point, it might be worth deciding whether I should try solving this w/ a different approach altogether, or refactoring this into something w/ nicer output 🙂

Thanks for the review @fitzgen ❕

data-pup · 2019-03-20T21:17:04Z

ir/ir.rs

-            code.demangled().unwrap_or(&self.name)
-        } else {
-            &self.name
+        match &self.kind {


Another note, while we're in review: I sort of detest that this is called the item's "kind" if it now also stores the name due to this approach. Would this be best referred to as ItemMetadata or something along those lines?

heh, we used kind in rust-bindgen, so it is familiar to me, but I don't have much of a first-principles argument here

alexcrichton · 2019-03-20T21:47:13Z

I think I agree with @fitzgen that the best way to handle this on WebAssembly is to merge the func/code nodes. When looking at the size of a function we may as well account for the bytes that the function type pointer takes up (although it's like 1-3 at most realistically). It'll definitely make reading the output a bit easier as I know in the past I've had to follow chains a few levels deep to get to the real part of a function!

data-pup · 2019-03-21T14:11:41Z

That sounds good to me! Thanks for the review/advice y'all 🎉 🙏

I'm going to close this for now, because it is probably easiest to start over if we should go that route. If I run into any trouble I'll ping for advice back in #174.

fitzgen · 2019-03-21T22:09:07Z

Thanks @data-pup! Sorry for the delay followed by the back-and-forth :(

data-pup · 2019-03-22T16:07:47Z

@fitzgen not a problem at all! I know you're all kinda busy w/ gloo & other projects atm 🙂 Most of this work did feel speculative & exploratory anyhow, so it's nice to feel confident about why merging func/code items in the IR model is a better approach than this.

data-pup added 26 commits March 14, 2019 18:39

move 'name' field into ir::ItemKind variants

c2dc0f5

add Func variant. Adjust Code variant.

49cce26

move analyze/analyses/diff.rs into submodule

c36c091

move DiffEntry into submodule

78e28c1

move diff emission logic into submodule

73c9042

diff refactoring

efd55fe

print decorated names of functions in diff output

6089604

update diff expectations w/ decorated function names

235c33d

add decorated names to code blocks

680ed8f

update diff expectations with decorated code block names

d2dbd79

annotate names in dominators output

d649215

update dominators test outputs

26809f5

display annotated names in garbage output

324b4d9

update garbage expected outputs

1fd121d

use demangled name for code blocks in decorated output

0c39d9b

update code block names in test expectations

6824407

outline ir::Code::demangle

47a652a

use demangled name for functions

811f798

update function name demangling in test expectations

fd2102c

decorate item names in paths command

2765fff

update paths test expectations

639ea65

use decorated name for top output

2ed65af

update top test expectations

704bb68

move demangle function into submodule

042f21c

rename decorator -> annotation

cd7b565

ir::Item::name should still return &str

f29095e

data-pup commented Mar 15, 2019

View reviewed changes

bjorn3 reviewed Mar 16, 2019

View reviewed changes

fitzgen closed this Mar 20, 2019

fitzgen reopened this Mar 20, 2019

fitzgen reviewed Mar 20, 2019

View reviewed changes

data-pup commented Mar 20, 2019

View reviewed changes

data-pup closed this Mar 21, 2019

data-pup mentioned this pull request Jun 25, 2019

Demangle function names #308

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Demangle Function Names #270

Demangle Function Names #270

data-pup commented Mar 15, 2019 •

edited

data-pup Mar 15, 2019

bjorn3 Mar 16, 2019

fitzgen commented Mar 19, 2019

fitzgen commented Mar 20, 2019

fitzgen commented Mar 20, 2019

fitzgen left a comment

fitzgen Mar 20, 2019

data-pup Mar 20, 2019 •

edited

data-pup Mar 20, 2019

fitzgen Mar 20, 2019

fitzgen Mar 20, 2019

data-pup Mar 20, 2019

fitzgen Mar 20, 2019

data-pup Mar 20, 2019

data-pup Mar 20, 2019

data-pup commented Mar 20, 2019 •

edited

data-pup Mar 20, 2019

fitzgen Mar 21, 2019

alexcrichton commented Mar 20, 2019

data-pup commented Mar 21, 2019

fitzgen commented Mar 21, 2019

data-pup commented Mar 22, 2019

Demangle Function Names #270

Demangle Function Names #270

Conversation

data-pup commented Mar 15, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fitzgen commented Mar 19, 2019

fitzgen commented Mar 20, 2019

fitzgen commented Mar 20, 2019

fitzgen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

data-pup Mar 20, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

data-pup commented Mar 20, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Mar 20, 2019

data-pup commented Mar 21, 2019

fitzgen commented Mar 21, 2019

data-pup commented Mar 22, 2019

data-pup commented Mar 15, 2019 •

edited

data-pup Mar 20, 2019 •

edited

data-pup commented Mar 20, 2019 •

edited