[dwarf] Ignore DW_AT_linkage_name and DIEs with DW_AT_declaration #268
Couple of minor thoughts/tangents - though broader question: What does Bloaty do with the information if it isn't a declaration? The test case doesn't seem realistic to me - I wouldn't expect to see a declaration with a high/low_pc (a declaration with code seems like a contradiction) - so maybe something more realistic would help explain/motivate the feature?
For any DIE that is not a declaration (and is not stripped), Bloaty will look for any information it can find in the DIE that will associate a region of the binary with the current compileunit. That includes
The actual case seen in the wild didn't have low_pc/high_pc, it's true, but it did have For the test case I decided to use low/high_pc instead to limit the test case to pure DWARF only, instead of exercising a DWARF/ELF interaction. That said, I could change it to use
@dwblaikie back to you, I updated the test to use
Are the first and last cases needed, or could low/high/ranges be adequate? Then maybe you wouldn't need to check for declaration?
Ah, yeah, maybe the more suitable filter would be to only use the low/high/ranges data to attach a given subprogram to .text code?
This is tricky; I added Maybe I should try to determine whether there are any compilers today that would produce DIEs of this sort. Unfortunately it's hard to prove a negative.
Here is an example I found where a variable was discoverable by
Here Here is another example where the DIE gives us
However I also discovered cases where Bloaty was being misled, and the
After some experimentation, I'm inclined to think that:
Does that sound reasonable?
Hmm, there's a case I hadn't thought of that maybe demonstrates the need for the DW_AT_declaration checking: variables whose actual object definitions have been optimized away, but where the variable is still described. This can come up for any variable, really. It happens easily with static or inline variables, where the normal compilation stage can see all uses of this copy of the variable (i.e., like an inline function, if all calls/uses are inlined then the definition can be dropped), but with LTO it can happen to any variable:
Compiled with optimizations, LLVM generates this DWARF for 'i':
(Unoptimized, this DW_TAG_variable DIE does include a DW_AT_location.)
So, yeah, maybe the original direction of this patch is good: checking for DW_AT_declaration. I guess it depends what you're using this information for? What observations are built from this? (What does Bloaty conclude/say/report if it doesn't ignore a declaration DIE?)
Ultimately the whole goal of this exercise is to build a memory map for the binary. The memory map is trying to determine, for every byte of the file (or byte of the address space), what compileunit emitted it. In essence, we are trying to build a linker map after-the-fact, using only the information that is still present in the final binary. For each DIE in a CU, we attribute whatever regions we can find to the enclosing CU. Anything that was optimized out of the final binary we don't care about, because it doesn't have a footprint in the final binary. So it doesn't need to be part of the map.
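The attribution described above can be sketched roughly as follows. This is a toy illustration, not Bloaty's actual implementation: the `MemoryMap` class, its methods, and the addresses are all hypothetical, and it stores one entry per byte purely for clarity. It assumes that the first CU to claim a byte keeps it.

```python
from typing import Dict, Optional


class MemoryMap:
    """Toy per-byte map from an address to the compile unit that claimed it."""

    def __init__(self) -> None:
        self._owner: Dict[int, str] = {}

    def add_range(self, addr: int, size: int, cu: str) -> None:
        # Assumption: the first claim wins, so bytes that are already
        # attributed to a CU are left untouched.
        for a in range(addr, addr + size):
            self._owner.setdefault(a, cu)

    def owner(self, addr: int) -> Optional[str]:
        # Returns None for bytes that no DIE described.
        return self._owner.get(addr)

    def totals(self) -> Dict[str, int]:
        # Number of bytes attributed to each CU.
        out: Dict[str, int] = {}
        for cu in self._owner.values():
            out[cu] = out.get(cu, 0) + 1
        return out


mm = MemoryMap()
mm.add_range(0x1000, 0x20, "foo.c")  # hypothetical function defined in foo.c
mm.add_range(0x1020, 0x10, "bar.c")  # hypothetical function defined in bar.c
# A variable that was optimized out has no address in the binary,
# so there is nothing to add to the map for it.
print(mm.totals())
```

An entity with no footprint in the final binary never reaches `add_range`, which matches the point that optimized-out entities do not need to be part of the map.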
It seems like what I proposed would handle this case successfully?
For the second part of the question: I'll use the example from this PR's test case:
This will lead Bloaty to report something like:
Whereas if it did not ignore the declaration, it would have reported:
Because now Since
Are these meant to be attributed by CU (ie: to know which source file ultimately used the entity and caused it to be pulled in), or by original source (ie: to know where in the source code the entity is written, to go and make changes to it for instance)? Because in general I'd expect the declaration to have source/line numbers on it - so despite bar.c not being built with debug info (I assume that's the scenario here - otherwise you'd get the entity from bar.c's CU, right?)
Ah, fair enough. Maybe the bug/issue then is in using DW_AT_linkage_name to find symbols/attribute entities? If the DWARF producer knows the entity, it'll use high/low/ranges/location to describe it. But also, yeah, attributing anything that's only a declaration to its CU would be wrong too.
The first. This is specifically for the
They are both built with debug info (see the We do get the entity from bar.c's CU. But since it's also present in foo.c's CU, and because we are scanning foo.c's CU first, foo.c wins and the DIE from bar.c "loses the race", so to speak. (unless we ignore
Yep, agreed. That is what I concluded above:
It's a slight bummer that
You could still use the linkage_name only on entities that already have a location to find the size - but the DWARF way would be to look at the
Ok PTAL, I changed the PR to both:
There is still work to do, but this seems like a good stopping point for now. It should be enough to fix the issue #236 which motivated this PR to begin with. As follow-up work I want to:
DWARF DIEs which have DW_AT_declaration are declarations, not definitions, and shouldn't be counted against the compileunit where they appear. This PR will cause us to completely ignore any DIE which has DW_AT_declaration set.
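To illustrate why a declaration can be miscounted, here is a minimal sketch of the scan-order effect discussed in the conversation. It is not Bloaty's code: the `attribute` function, the tuple layout, and the address are hypothetical, and each DIE is assumed to have already been resolved to an address and size.

```python
from typing import Dict, List, Tuple

# Each DIE is reduced to (is_declaration, address, size); values are hypothetical.
Die = Tuple[bool, int, int]


def attribute(cus: List[Tuple[str, List[Die]]],
              skip_declarations: bool) -> Dict[int, str]:
    """Map each byte to the first CU whose DIE covers it."""
    owner: Dict[int, str] = {}
    for cu_name, dies in cus:  # CUs are scanned in the order given
        for is_decl, addr, size in dies:
            if skip_declarations and is_decl:
                continue  # a declaration has no footprint of its own
            for a in range(addr, addr + size):
                owner.setdefault(a, cu_name)  # assumption: first claim wins
    return owner


BAR_ADDR = 0x2000  # hypothetical address of the entity defined in bar.c
cus = [
    ("foo.c", [(True, BAR_ADDR, 4)]),   # declaration only, scanned first
    ("bar.c", [(False, BAR_ADDR, 4)]),  # the actual definition
]

print(attribute(cus, skip_declarations=False)[BAR_ADDR])  # foo.c
print(attribute(cus, skip_declarations=True)[BAR_ADDR])   # bar.c
```

With declarations honored, foo.c claims the bytes simply because it is scanned first. With declarations skipped, the bytes go to bar.c, where the entity is defined.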
We also stop paying attention to DW_AT_linkage_name. We can always get the same information from either DW_AT_location or low_pc/high_pc pairs.
This also improves the logic around deciding whether the high_pc value is a size or an absolute address. This logic should be based on the DWARF form.
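As a rough sketch of form-based handling (not the code in this PR): in DWARF 4 and later, a DW_AT_high_pc of class address is an absolute address, while one of class constant is an offset from DW_AT_low_pc. The `function_size` helper below is hypothetical. The form codes are the values from the DWARF specification, and the DWARF 5 indexed address forms are left out for brevity.

```python
# DWARF form codes (subset), as defined by the DWARF specification.
DW_FORM_addr = 0x01
DW_FORM_data2 = 0x05
DW_FORM_data4 = 0x06
DW_FORM_data8 = 0x07
DW_FORM_data1 = 0x0B
DW_FORM_sdata = 0x0D
DW_FORM_udata = 0x0F

# Forms of class "constant": high_pc is an offset from low_pc.
CONSTANT_FORMS = {
    DW_FORM_data1, DW_FORM_data2, DW_FORM_data4, DW_FORM_data8,
    DW_FORM_sdata, DW_FORM_udata,
}


def function_size(low_pc: int, high_pc_value: int, high_pc_form: int) -> int:
    """Return the size in bytes of the range described by low_pc/high_pc."""
    if high_pc_form == DW_FORM_addr:
        # Class "address": high_pc is the first address past the entity.
        return high_pc_value - low_pc
    if high_pc_form in CONSTANT_FORMS:
        # Class "constant": high_pc is already the size.
        return high_pc_value
    raise ValueError(f"unhandled form for DW_AT_high_pc: {high_pc_form:#x}")


print(hex(function_size(0x1000, 0x1040, DW_FORM_addr)))   # 0x40
print(hex(function_size(0x1000, 0x40, DW_FORM_data4)))    # 0x40
```

Deciding by form avoids guessing from the magnitude of the value, which can go wrong when a small absolute address looks like a plausible size.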
Fixes: #236