New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cached random access to CUs and DIEs #264
Conversation
I've been working on this for a few weeks now, developing a test to locate static variables. I've been testing using a full Linux kernel build so testing a few is useful. This part has been refactored enough that I'm comfortable pushing it; the bigger script still needs a few rounds. This addresses #262 so I've put some efffort into getting this commented and pushed. I just added the LUTEntry object today. |
I'm dealing with cases of both intra-CU and cross-CU references (Xcode loves the latter). If one tries to follow the reference to another CU where not a single DIE has been parsed/cached so far, will that work? Please test the scenario. If it works as it is, remove the misleading comment please :)
|
It does. |
38e210e
to
f8892cb
Compare
I initially read this as can I factor this into two commits, but then I realized you asked for multiple PRs.
Based on your request I squashed DIE cache cleanup (which was always just showing my work but caused review noise), and consolidated the find a DIE pieces into a helpers commit. Review note: github seems to show the commits by authored date instead of parent order so the test case is appearing before the API helpers in the GUI. |
@rhelmot @rupran @woodruffw pinging some folks who've been active recently -- since this is a large change, I'd really appreciate some help with reviewing :-) My review queue for pycparser has grown rather long. |
@mdmillerii do you have some measurements / benchmark numbers to report? What kind of operations does this make faster, and how much faster? Note to self: we really need some portfolio of files for performance measurements |
I can review (and test) here in a bit, thanks for the ping! |
Thanks! I'll assign you, just unassign when you're done |
elftools/dwarf/dwarfinfo.py
Outdated
@@ -112,6 +117,84 @@ def has_debug_info(self): | |||
""" | |||
return bool(self.debug_info_sec) | |||
|
|||
def get_DIE_from_lute(self, lutentry): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe rename to either get_DIE_from_lut
or get_DIE_from_lut_entry
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lut_entry SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good to me! I left one (style) nit, and am currently testing this against a big suite of internal projects to see if it breaks anything.
Confirmed that this works with my internal tooling. I'll do some additional performance testing. |
@woodruffw out of curiosity, what is this big suite of projects you have? Do you have a setup to analyze many different ELF binaries with pyelftools? |
elftools/dwarf/die.py
Outdated
""" Search for our parent by, starting with the CU top die, | ||
setting the _parent link of each child and then iterating on | ||
the child which was closest to object DIE offset without over, | ||
whithan to the object DIE offset, which is an ancestor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence does not make sense to me, maybe reword or split the sentence in two parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that was trying to stuff everything into a sentence and it got lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. I left some minor comments. Feel free to fix these and the comments made by others (thanks for the reviews!), and rebase.
elftools/dwarf/compileunit.py
Outdated
self.cu_die_offset <= refaddr < self.cu_offset + self.size, | ||
'refaddr %s not in DIE range of CU %s' % (refaddr, self.cu_offset)) | ||
|
||
die = self._get_cached_DIE(refaddr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just return this directly without the die =
intermediate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
elftools/dwarf/die.py
Outdated
search = self.cu.get_top_DIE() | ||
|
||
while search.offset < self.offset: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove this empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
elftools/dwarf/die.py
Outdated
|
||
search = prev | ||
|
||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what's going on here. pass
outside the loop?
Should this line be removed?
Also the commented code following it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactoring remenants, fixed.
elftools/dwarf/dwarfinfo.py
Outdated
@@ -112,6 +117,84 @@ def has_debug_info(self): | |||
""" | |||
return bool(self.debug_info_sec) | |||
|
|||
def get_DIE_from_lute(self, lutentry): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lut_entry SGTM
@@ -253,23 +336,41 @@ def range_lists(self): | |||
|
|||
#------ PRIVATE ------# | |||
|
|||
def _parse_CUs_iter(self): | |||
def _parse_CUs_iter(self, offset=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document what offset means
elftools/dwarf/dwarfinfo.py
Outdated
See get_CU_at(). | ||
""" | ||
# If the insert point is 0 the entry does not match (perhaps the cache | ||
# is empty). Otherwise on a miss parse and insert. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what the second sentence in this comment means. Please rephrase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was saying if the first sentence was false, then do the actions. Hopefully the new is clearer,
test/test_dwarf_cu_and_die_cache.py
Outdated
for (k, v) in pt.items(): | ||
ndie = dwarf.get_DIE_from_lute(v) | ||
self.dprint(ndie) | ||
# if not ndie.tag == 'DW_TAG_variable': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove?
Sorry for the ambiguity there: it's both a big suite of projects (i.e. test binaries) and tools that use pyelftools (as a dependency) to extract information from those binaries. There's nothing publicly available at the moment, but it's all part of the DARPA CHESS program. |
Add a function to set the _parent link of known chldren, iterating down each parent of a target DIE. Walk all children of a given parent and set each child's ._parent to avoid O(n^2) walking. A future commit will add other methods to instatiate a DIE that will not set the _parent link as the DIE is instantiated. This walk uses the knowledge that in a flattened tree a parents offset will always be less than the childs. The call to die.set_parent in compile_unit iter_DIE_children could be removed to make the method private,, but it is free to set starting from the top DIE. Alternativly make it an optional argument to DIE creation.
Accept a resolved reference address for a DIE in a compile unit and parse the DIE at that location. Insert into the _diemap / _dielist cache shared with iter_DIE_children() for fast repeated lookups. This can be used to follow attribute references to a DIE that be referenced several times (eg for a DW_AT_type reference) or find a DIE referenced in a lookup table.
Maintain a cache of compile units parsed and a map of their offsets similar to the one mainained of DIEs by compile units. Add the ability to parse a random compile unit when the offset of the compile unit header is known. Add the ability to search for a compile unit containing (spanning) a given refaddr, such as that obtained from a DIE reference class attribute, starting from the closest previous cached compile unit.
Add APIs to lookup a DIE from: (a) a DIE reference class attribute taking into account the attribute form, (b) from a lookup table entry (NameLUTEntry) from a .pub_types or .pub_names section, or (c) directly from a reference addresss (.debug_info offset) regardless of how it was obtained. Add a test that will lookup dies from pubnames and follow die by ref. This is a simple test that exercises the new cache lookup methods and provides a starting point on how to determine a variables type. For now raise NotImplemented exception for type signature lookup and supplemental dwarf object files.
This commit addresses the review comments in eliben#264 showing the changes to the prior submission after rebasing before folding them into the prior commits.
Here's my update. I also pushed a branch random-die-review that shows the previous content rebased then 1 patch reverted so reviewers can see what was changed, specifically on a5d42a4 |
Here is a series that refactors the DIE cache to cache entries without walking from the top of a CU,
and also one that adds a similar cache to the CU lookup in the DWARFInfo.