Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demangle Function Names #270

Closed
wants to merge 26 commits into from

Conversation

data-pup
Copy link
Member

@data-pup data-pup commented Mar 15, 2019

Fixes #174.

This is a much larger PR than I expected it to be, and apologies on that regard. 😬

I tried a simpler fix here: master...data-pup:simple-function-name-demangle-fix

Notably, we need to include the type of item in the output, so that we can differentiate between code and function items. Unfortunately, this meant that the user would need to specify items by their entire name and type/location when using subcommands that accept arguments.

This change, while more involved, allows us to query for items by name, while still showing the demangled names of functions and code blocks! By moving the name of an item from the ir::Item struct into the ir::ItemKind field, we can handle these properly.

The other notable part is that this involves storing the "annotation" (i.e. func[1]) in that struct, and adding an annotated_name method which is used when displaying items. This is required because the diff command breaks if we use the annotated name, because we can't tell that two items are related if an item moved in the index space.

Because this is a hefty PR, I kept changes in separate commits, to help make reviewing this easier. I'll try and find some time to annotate the important parts over the weekend, or early next week 🙂

387 ┊ 13.74% ┊ func[2]
378 ┊ 13.42% ┊ ⤷ wee_alloc::alloc_with_refill::hb32c1bbce9ebda8e
387 ┊ 13.74% ┊ func[2]: wee_alloc::alloc_with_refill::hb32c1bbce9ebda8e
378 ┊ 13.42% ┊ ⤷ code[2]: wee_alloc::alloc_with_refill::hb32c1bbce9ebda8e
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what we see now when running dominators now 😸

-25 ┊ data[1]
-25 ┊ data[2]
+15 ┊ hello
+15 ┊ code[0]: hello
+15 ┊ import env::rust_oom
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe give this and custom section ... a prefix like the others too.

@fitzgen
Copy link
Member

fitzgen commented Mar 19, 2019

Sorry for the delay -- I'll get to this tomorrow for sure. It is a bit bigger than I expected it to be :-p

1 similar comment
@fitzgen
Copy link
Member

fitzgen commented Mar 20, 2019

Sorry for the delay -- I'll get to this tomorrow for sure. It is a bit bigger than I expected it to be :-p

@fitzgen fitzgen closed this Mar 20, 2019
@fitzgen fitzgen reopened this Mar 20, 2019
@fitzgen
Copy link
Member

fitzgen commented Mar 20, 2019

(Woops, mis-click)

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @data-pup

I have concerns about the usability/ergonomics of this new output, and I'm not sure it is an improvement over what we have right now. Additionally, non-wasm inputs don't have the func vs code distinction, but are still getting the annotation prefix, so their output is particularly worse with this in that they get the annotation prefix but it doesn't even need disambiguation.

I think I would prefer, only on wasm where there is an ambiguity in the first place, output that is something like

123 | foo::bar::Baz::quux (func[7])
456 | foo::bar::Baz::quux (code[7])

over putting the annotation in the front.

Even this I am not super happy with, though. I wonder if it makes sense to collapse func[123] and code[123] into the same IR item, since no body is used by more than one func and vice versa. They are essentially the same, even if they are split at teh physical binary level (to allow eager type checking and validation). Twiggy's original goal/design was to try and model exactly what the binary layout was, but in this case, since there is no sharing, we might want to loosen that for usability.

cc @alexcrichton

ir/ir.rs Outdated
ItemKind::Func(func) => {
if let Some(name) = func.name() {
// format!("{}: {}", func.decorator(), name)
func.decorator().to_string()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is WIP? both sides of the if/else are the same, but there is some commented out code.

Copy link
Member Author

@data-pup data-pup Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wooops, good catch. Thanks 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that this did get cleaned up in a later commit, was using this to make sure I was still generating the same output when annotations were removed 👍

@@ -517,6 +530,9 @@ pub enum ItemKind {
/// with the executable code.
Data(Data),

/// Function definition. Declares the signature of a function.
Func(Function),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty wasm-specific, but I guess that's fine, since other kinds of binaries can simply not generate any IR items of this variant.

ir/ir.rs Outdated
}

/// Get this item's decorated name.
/// TODO: This needs better documentation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, the TODO was removed, but I still have the same sentiment. Maybe a simple example, something like:

/// Get this item's decorated name, i.e. 'foo (code[7])'

That might warrant adding some other info to the existing name method, which I'm also happy to add if you think that makes sense 🙂

7917 ┊ 0.38% ┊ Subroutine[68][131]: je_stats_print
7852 ┊ 0.37% ┊ Subroutine[52][2800]: mallocx
7019 ┊ 0.33% ┊ Subroutine[53][4212]: je_arena_boot
6221 ┊ 0.30% ┊ Subroutine[71][86]: je_malloc_vsnprintf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that this is a regression in usability for our output, and not an improvement. It seems that every item has an extra prefix now that is not meaningful or actionable to the user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, totally agree. Maybe an alternative would be to represent subroutines in an ELF/etc. file with a different variant that wouldn't display this prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to the Func variant, other things need not create a Subroutine variant if that isn't applicable, and in the case of DWARF info, we can hide that behind a cfg flag anyhow

@data-pup
Copy link
Member Author

data-pup commented Mar 20, 2019

Even this I am not super happy with, though. I wonder if it makes sense to collapse func[123] and code[123] into the same IR item, since no body is used by more than one func and vice versa. They are essentially the same, even if they are split at teh physical binary level (to allow eager type checking and validation). Twiggy's original goal/design was to try and model exactly what the binary layout was, but in this case, since there is no sharing, we might want to loosen that for usability.

Totally agree w/ you about the ergonomics of this re: output being iffy. I can refactor the output to resemble the format you provided, & agree w/ you that the extra info would be best listed after the name (if at all).

I've noticed that the code/func relationship is confusing to people when explaining some of the output of this tool, so maybe collapsing these together would be useful. Happy to wait for Alex's feedback on this. At that point, it might be worth deciding whether I should try solving this w/ a different approach altogether, or refactoring this into something w/ nicer output 🙂

Thanks for the review @fitzgen

code.demangled().unwrap_or(&self.name)
} else {
&self.name
match &self.kind {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note, while we're in review: I sort of detest that this is called the item's "kind" if it now also stores the name due to this approach. Would this be best referred to as ItemMetadata or something along those lines?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, we used kind in rust-bindgen, so it is familiar to me, but I don't have much of a first-principles argument here

@alexcrichton
Copy link
Contributor

I think I agree with @fitzgen that the best way to handle this on WebAssembly is to merge the func/code nodes. When looking at the size of a function we may as well account for the bytes that the function type pointer takes up (although it's like 1-3 at most realistically). It'll definitely make reading the output a bit easier as I know in the past I've had to follow chains a few levels deep to get to the real part of a function!

@data-pup
Copy link
Member Author

That sounds good to me! Thanks for the review/advice y'all 🎉 🙏

I'm going to close this for now, because it is probably easiest to start over if we should go that route. If I run into any trouble I'll ping for advice back in #174.

@data-pup data-pup closed this Mar 21, 2019
@fitzgen
Copy link
Member

fitzgen commented Mar 21, 2019

Thanks @data-pup! Sorry for the delay followed by the back-and-forth :(

@data-pup
Copy link
Member Author

@fitzgen not a problem at all! I know you're all kinda busy w/ gloo & other projects atm 🙂 Most of this work did feel speculative & exploratory anyhow, so it's nice to feel confident about why merging func/code items in the IR model is a better approach than this.

@data-pup data-pup mentioned this pull request Jun 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dominators with depth 1 seemingly doesn't demangle names
4 participants