Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-frontend] Inheritance and overriding #1639

Merged
merged 7 commits into from Feb 7, 2024

Conversation

brcfarias
Copy link
Collaborator

This PR adds supports for inheritance in Python, addressing the following aspects:

  • When a class lacks a definition for __init__, the corresponding method from the base class must be referenced. In such cases, the first class in the inheritance list with a defined __init__ method should be utilized.
  • Instance attributes are now added in all instance methods.
  • Methods overriding is also addressed.

The regression tests provides a overview of the features that are being handled.

@brcfarias brcfarias force-pushed the python-frontend branch 3 times, most recently from c5a2c96 to 88a3221 Compare February 3, 2024 23:14
auto self_instance = instance_attr_map.find(self_id);
if (self_instance != instance_attr_map.end())
{
std::vector<std::string> &attr_list = instance_attr_map[obj_symbol_id];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the order of elements in attr_list important? If not, you could consider using std::unordered_map and std::unordered_set for faster lookups.

}
struct_typet &class_type = static_cast<struct_typet &>(class_symbol->type);
for (auto component : class_type.components())
clazz.components().push_back(component);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use emplace_back instead of push_back for better performance?

assert(obj1.data == 10)
obj1.data = 20
assert(obj1.data == 20)
assert(obj1.x == 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend keeping the original regression and creating a new one with these changes.

Copy link
Contributor

@lucasccordeiro lucasccordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@fbrausse fbrausse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the delegation to the first __init__ in the list of base classes work recursively, e.g., will instantiating D invoke B's init method here?

class A: pass
class B:
  def __init__(self): pass
class C(B): pass
class D(A, C): pass

I am not a big fan of encoding the relationship between methods and classes in the symbol's identifier. Is the C++ frontend using the same mechanism, or could we possibly find a way to make these relationships accessible via some kind of API?

src/python-frontend/python_converter.cpp Outdated Show resolved Hide resolved
Comment on lines 384 to 388
symbol_id.replace(
pos,
std::string("@C@" + current_class + "@F@" + current_func_name).length(),
std::string("@C@" + base_class + "@F@" + method_name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the value given to parameter symbol_id is modified in each iteration of the loop. Likely, this should operate on a copy of that value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but with the current change a copy is being created outside of the loop, and the loop still repeatedly modifies this single copy. If the intention is to replace current_class with base_class and current_func_name with method_name, I think this will only work for the first iteration, as after that the replacement will have been stored in sym_id.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will only work for the first iteration.

I guess I got your point now. method_name doesn't need to be updated.
We are searching for the same method across different classes, so only the classname needs to vary.

For example if main.py contains:

class A:
  def foo()

class B(A):
  pass
  
obj = B()
obj.foo()

After parsing A and B we will have the following symbol for the method foo in the context: py:main.py@C@A@foo

Given that the obj is of type B, we first look for "py:main.py@C@B@foo" (which doesn't exist), then look for "py:main.py@C@A@foo".

Copy link
Member

@fbrausse fbrausse Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wasn't my point but seems to be valid, too. :)

Consider a class Z (one character) and one XY (2 characters) to be iterated over in the loop, in this order, and let the current class name be ABC (3 characters). What will happen is sym_id = "py:main.py@C@ABC@F@foo" is getting replaced by sym_id = "py:main.py@C@Z@F@foo". In the second iteration, the substring @C@ABC will not be found anymore in sym_id, because after the first iteration the current_class ABC has already been replaced with Z.

That's why I thought that this should work on a new copy of the original string in each iteration.

Edit: Never mind, you're updating the current_class. Didn't see that, sorry. From my point of view this method could use a comment and I think a proper way of identifying sub-/base-class relationships and overridden methods in the symbol table is needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the second iteration, the substring @C@ABC will not be found anymore in sym_id, because after the first iteration the current_class ABC has already been replaced with Z.

current_class is updated at the end of each iteration. So, in the second call to sym_id.rfind("@C@" + current_class);, current_class would hold "Z", and the substring would be found.

src/python-frontend/python_converter.cpp Outdated Show resolved Hide resolved
src/python-frontend/python_converter.cpp Outdated Show resolved Hide resolved
@brcfarias
Copy link
Collaborator Author

brcfarias commented Feb 5, 2024

Does the delegation to the first __init__ in the list of base classes work recursively?

Not yet, but I'll implement it. Thanks for pointing that out.

I am not a big fan of encoding the relationship between methods and classes in the symbol's identifier. Is the C++ frontend using the same mechanism, or could we possibly find a way to make these relationships accessible via some kind of API?

The relationship is encoded in the AST/JSON: https://github.com/brcfarias/esbmc/blob/035bcf33fbcbf8b0a74762fc47fec05854ebd71c/src/python-frontend/python_converter.cpp#L377-L379

@fbrausse
Copy link
Member

fbrausse commented Feb 6, 2024

Does the delegation to the first __init__ in the list of base classes work recursively?

Not yet, but I'll implement it. Thanks for pointing that out.

👍

I am not a big fan of encoding the relationship between methods and classes in the symbol's identifier. Is the C++ frontend using the same mechanism, or could we possibly find a way to make these relationships accessible via some kind of API?

The relationship is encoded in the AST/JSON: https://github.com/brcfarias/esbmc/blob/035bcf33fbcbf8b0a74762fc47fec05854ebd71c/src/python-frontend/python_converter.cpp#L377-L379

Right. I meant an API in ESBMC that allows obtaining symbols corresponding to base classes and/or overridden functions without having to fiddle with the identifiers in a non-documented way. Do you know whether or how the C++ frontend is doing this?

@brcfarias
Copy link
Collaborator Author

Right. I meant an API in ESBMC that allows obtaining symbols corresponding to base classes and/or overridden functions without having to fiddle with the identifiers in a non-documented way. Do you know whether or how the C++ frontend is doing this?

I see. In C++ this part is handled as follows:

if (!perform_virtual_dispatch(member))
{
exprt comp;
if (get_decl(*member.getMemberDecl(), comp))
return true;
if (!is_member_decl_static(member))
{
exprt base;
if (get_expr(*member.getBase(), base))
return true;
assert(!comp.name().empty());
// for MemberExpr referring to struct field (or method in case of C++ class)
new_expr = member_exprt(base, comp.name(), comp.type());
}
else
{
// for static members, use the member decl symbol directly
// without making a member_exprt, e.g.
// If the member_exprt refers to a class static member, then
// replace "OBJECT.MyStatic = 1" with "MyStatic = 1;"
assert(comp.statement() == "decl");
assert(comp.op0().is_symbol());
new_expr = comp.op0();
}
}
else
{
if (get_vft_binding_expr(member, new_expr))
return true;
}

If the method isn't virtual we simply call "new_expr = member_exprt(base, comp.name(), comp.type());", as seen on line 1706 above.

If it is virtual, this part is responsible for managing dynamic binding:

// get the parent class id of the method to which this MemberExpr refers
const auto md = llvm::dyn_cast<clang::CXXMethodDecl>(member.getMemberDecl());
assert(md);
std::string base_class_id, base_class_name;
get_decl_name(*md->getParent(), base_class_name, base_class_id);
exprt comp;
if (get_decl(*member.getMemberDecl(), comp))
return true;
/*
* This is the component name as in vtable's type symbol
* e.g. virtual_table::tag.Bird::c:@S@Bird@F@do_something#
*/
std::string member_comp_name =
vtable_type_prefix + base_class_id + "::" + comp.name().as_string();
pointer_typet member_type(comp.type());
member_exprt deref_member(vtable_ptr_deref, member_comp_name, member_type);
deref_member.set("#lvalue", true);
new_expr = dereference_exprt(deref_member, member_type);
new_expr.set("#lvalue", true);

I'm unsure if it's worth reusing this approach or perhaps extending struct_typet to include "base classes".

@fbrausse
Copy link
Member

fbrausse commented Feb 6, 2024

I'm unsure if it's worth reusing this approach or perhaps extending struct_typet to include "base classes".

I also don't know, yet. To my mind, the best option would be to extend the symbolt class with that kind of information, not the types themselves. But we can leave this for later.

@lucasccordeiro lucasccordeiro merged commit 9f1bd5d into esbmc:master Feb 7, 2024
9 checks passed
@lucasccordeiro
Copy link
Contributor

Thanks for submitting this PR, @brcfarias.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants