Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for parsing and reading C++ type templates #60

Merged
merged 6 commits into from
Jan 9, 2021

Conversation

jgkamat
Copy link
Contributor

@jgkamat jgkamat commented Apr 23, 2020

There are a couple of things missing, but I want to try to keep this as small as possible for now.

Completed:

  • Parsing type (not value) templates from functions and structs (and probably classes too).
  • Accessing template parameters over python
  • Extremely basic tests

Missing:

  • Parsing for value templates
  • Ability to construct templated types from python

Right now this support is always enabled (not only for C++ units), as we won't see this dwarf tag (I hope) in C code, and plumbing the current language down might be a little difficult. If there's a straightforward (and desirable) way to do this, let me know!

I'm the most unsure about the python bindings (and the TypeTemplateParameter python object). Most of it was derived from the Members array, so hopefully it's correct, but if you could take a close look at that it would be nice. Hopefully I didn't miss anything too obvious!

The commits are not too well seperated at the moment (too much is in the first commit). I can break it out or move things around if it would make more sense, however!)


r = dwarf_child(die, &child);
while (r == 0) {
if (dwarf_tag(&child) == DW_TAG_template_type_parameter) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since implementing support for incomplete types was a little sensitive as I moved stuff around, I made it into its own commit for now. If it looks good, I'll squash it into the others where they belong.

Right now, I ended up doing two passes over the children, to avoid creating the members array for incomplete types. However I could also do a single pass, and delete any (erroneous) member entries for incomplete types. Let me know what you like more!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep it as one pass by modifying the loop over the DWARF children to check if (!declaration && dwarf_tag(&child) == DW_TAG_member). You'll also have to move the incomplete type initialization to happen in the same place as the complete types and skip the dwarf_bytesize() for incomplete types

Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me awhile to get to this. In terms of splitting up the commits, I think it'd make the most sense to have two commits: one which adds this support for structs, unions, and classes (including the Python scaffolding + tests) and a separate commit which adds the same to functions. Would it be painful to respin that way?

libdrgn/drgn.h.in Outdated Show resolved Hide resolved
libdrgn/drgn.h.in Outdated Show resolved Hide resolved
libdrgn/drgn.h.in Outdated Show resolved Hide resolved
libdrgn/type.c Outdated
@@ -197,6 +204,7 @@ void drgn_int_type_init(struct drgn_type *type, const char *name, uint64_t size,
}
type->_private.size = size;
type->_private.is_signed = is_signed;
type->_private.num_template_parameters = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't bother initializing this for types without template parameters. It should be caught by asserting in the getter.

_drgn.pyi Outdated

template_parameters: Sequence[TypeTemplateParameter]
"""
List of template parameters making up this type.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "List of template parameters of this type"? This should also mention what types it's valid for (structure, union, class, and function?).


r = dwarf_child(die, &child);
while (r == 0) {
if (dwarf_tag(&child) == DW_TAG_template_type_parameter) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep it as one pass by modifying the loop over the DWARF children to check if (!declaration && dwarf_tag(&child) == DW_TAG_member). You'll also have to move the incomplete type initialization to happen in the same place as the complete types and skip the dwarf_bytesize() for incomplete types

self.assertEqual(die_type.template_parameters[0].name, "T")
self.assertEqual(die_type.template_parameters[1].name, "U")
self.assertEqual(die_type.template_parameters[0].type, int_type("int", 4, True))
self.assertEqual(die_type.template_parameters[1].type, int_type("int", 4, True))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add the tests after the Python binding support so you can compare against a constructed type instead of having to check each template parameter.

@@ -1808,67 +1863,73 @@ static DrgnType *compound_type(PyObject *tag_obj, PyObject *size_obj,
DrgnType *struct_type(PyObject *self, PyObject *args, PyObject *kwds)
{
static char *keywords[] = {
"tag", "size", "members", "qualifiers", "language", NULL,
"tag", "size", "members", "qualifiers", "template_parameters",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have template_parameters between members and qualifiers? Or does that break a bunch of tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it only breaks one, so I'll fix it and change the order

Copy link
Contributor Author

@jgkamat jgkamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I focused most on my efforts on the areas that conflicted, so I'm not sure if changes are needed in the places that didnt (most of the python boilerplate). It seems to still work though, so I'll leave that as is for now. I'm expecting the rebase onto the members patch to have less conflicts however.

bool can_cache_members = true;
bool can_cache_templates = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these 'can_cache' variables for? I don't think I understand what they're doing for members, so I'm not sure if something is needed if it is true on templates as well.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we access some attribute of a type, we translate the C attribute to Python and cache it for subsequent uses. When a type is created in Python, we translate the Python arguments to C to build the type. As an optimization, we can sometimes cache the Python argument as an attribute directly when the type is created (that's what all the _PyDict_SetItemId() calls at the end of this function do). We can't cache the list of members/parameters/template parameters if they use a callback, because they each need their own thunk. That's what these are for, and you do it for template parameters (as well as a call to _PyDict_SetItemId()).

libdrgn/type.c Outdated
@@ -499,12 +511,50 @@ drgn_compound_type_builder_add_member(struct drgn_compound_type_builder *builder
return NULL;
}

struct drgn_error *
drgn_type_builder_add_template_parameter(struct drgn_program *prog,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just flagging this - I didn't want to duplicate almost the same thing for both compound and function type builders - so I unified the "add template parameter path" by splitting the arguments and omitting the builder. Not sure if there's a nicer way to de-duplicate the code without something like this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to have drgn_compound_type_builder_add_template_parameter() and drgn_function_type_builder_add_template_parameter() be the public interfaces, but they can both call through to this (static) helper internally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the public interfaces are separate, is there a way I could not duplicate the callers, ie unpack_template and parse_template? I could pass both builders and see which one is non-null but that seems a bit hacky.

name = dwarf_formstring(attr);
if (!name) {
return drgn_error_create(DRGN_ERROR_OTHER,
"DW_TAG_member has invalid DW_AT_name");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DW_TAG_template_type_parameter, not DW_TAG_member, right?

name = NULL;
}

err = drgn_lazy_type_from_dwarf(dbinfo, die, false, "DW_TAG_member",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.


Dwarf_Die child;
int r = dwarf_child(die, &child);
while (r == 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do this in the same pass as parsing the members. The main difference is that if the type is a declaration, you shouldn't get the size or endianness, and you should ignore any DW_TAG_member children.

libdrgn/type.c Outdated
Comment on lines 540 to 543
enum drgn_type_kind kind = builder->kind;
assert(kind == DRGN_TYPE_STRUCT ||
kind == DRGN_TYPE_UNION ||
kind == DRGN_TYPE_CLASS);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drgn_compound_type_builder_init() already checked this, so no need to repeat it.

libdrgn/type.c Outdated
@@ -828,6 +859,9 @@ drgn_function_type_create(struct drgn_function_type_builder *builder,
type->_private.language =
lang ? lang : drgn_program_language(builder->prog);
*ret = type;
// TODO fix this, functions can have templates
type->_private.num_template_parameters = 0;
type->_private.language = drgn_language_or_default(lang);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this snuck in from a conflict. We're already assigning the language.

libdrgn/type.h Outdated
@@ -348,8 +364,9 @@ drgn_compound_type_builder_add_member(struct drgn_compound_type_builder *builder
* member must remain valid for the lifetime of @c builder->prog.
* @param[in] tag Name of the type. Not copied; must remain valid for the
* lifetime of @c builder->prog. May be @c NULL if the type is anonymous.
* @param[in] size Size of the type in bytes.
* @param[in] size Size of the type in bytes. Ignored if type is incomplete.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, I think we want to explicitly return an error if is_complete and size != 0. We should also return an error if is_complete and members were added to the builder.

_drgn.pyi Outdated
@@ -529,6 +529,7 @@ class Program:
tag: Optional[str],
size: IntegerLike,
members: Sequence[TypeMember],
template_parameters: Optional[Sequence[TypeTemplateParameter]] = None,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for template_parameters to be None? For members, there's a distinction: an empty sequence is no members, and None is an incomplete type. I don't see a distinction for template_parameters. Perhaps this should be template_parameters: Sequence[TypeTemplateParameter] = () (also, template_parameters should be mentioned in the docstring).

_drgn.pyi Outdated
Comment on lines 1642 to 1644
:param type: Type of the template parameter. This may be a :class:`Type` or ``None``.
:param name: Name of the template parameter. This may be ``None`` if the parameter is
unnamed.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should go in the docstring for __init__() now (and you can copy the text for TypeMember.__init__() since that correctly mentions that type can be a callable).

Comment on lines 614 to 618
if (append_member(parts, self, &first, template_parameters) == -1)
goto out_repr_leave;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this conditional on the type having template parameters? That'll make repr() less noisy for the common case of no parameters.

bool can_cache_members = true;
bool can_cache_templates = true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we access some attribute of a type, we translate the C attribute to Python and cache it for subsequent uses. When a type is created in Python, we translate the Python arguments to C to build the type. As an optimization, we can sometimes cache the Python argument as an attribute directly when the type is created (that's what all the _PyDict_SetItemId() calls at the end of this function do). We can't cache the list of members/parameters/template parameters if they use a callback, because they each need their own thunk. That's what these are for, and you do it for template parameters (as well as a call to _PyDict_SetItemId()).

cached_members : Py_None) == -1))
cached_members : Py_None) == -1) ||
(can_cache_templates &&
_PyDict_SetItemId(type_obj->attr_cache, &DrgnType_attr_template_parameters.id,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm doing this right - can you take a closer look at this (and the function version as well). I have no idea how to test it.

@jgkamat
Copy link
Contributor Author

jgkamat commented Oct 13, 2020

I rebased this diff on top of #72 - still no value templates support but it shouldn't be too difficult. I'm just trying to resolve conflicts right now and make sure things are working as is.

@jgkamat
Copy link
Contributor Author

jgkamat commented Jan 8, 2021

This is more of a note to myself so I don't forget than anything else -

on void templates there is no DW_AT_type (at least on clang), but there is a name, which means we have to do something to represent these instead of just failing - or we can just drop them completely in a worst case.

In preparation for struct drgn_type referencing struct drgn_object, move
the former after the latter.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
This will be needed for types containing reference objects.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
In preparation for calling the object parsing code from the type parsing
code, move it up in the file (and update the coding style in
drgn_object_from_dwarf_enumerator() while we're at it).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
@osandov
Copy link
Owner

osandov commented Jan 9, 2021

I rebased your patches on the lazy object rework I did, added support for non-type parameters, and cleaned it up. If all the tests pass, I think this should be ready to go in!

@osandov osandov merged commit 352c31e into osandov:master Jan 9, 2021
@osandov
Copy link
Owner

osandov commented Jan 9, 2021

Merged! Let me know if you get the chance to test it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants